How EarlyNarratives works
Evidence-first signal tracking across public sources. This document explains the pipeline, the meaning of the dashboard metrics, and how to audit any signal back to sources.
No investment advice. Signals & sources only. EarlyNarratives provides informational signals derived from public sources. It does not provide financial, legal, or tax advice. Use at your own risk. We do not add new facts; we display summaries and metrics derived from collected posts and their metadata.
EarlyNarratives Deep Dive
Public version. Version 0.1. Date: 18 Dec 2025.
1. Executive summary
EarlyNarratives helps you spot emerging signals early across news and social channels, and back every signal with inspectable evidence.
It is not investment advice. It does not predict outcomes. It helps you answer practical questions faster:
- What topics are gaining traction right now?
- Is this idea showing up across independent sources, or is it mostly repetition?
- What changed recently that explains the spike?
- Where are the primary sources, and who is amplifying them?
Three commitments shape the product:
- Evidence first: every signal is backed by a list of sources you can click
- Diversity over volume: independent origin breadth matters more than raw mention counts
- Anti-amplification: duplicates, syndication, and coordinated repost patterns are treated as noise unless supported by independent origins
2. The problem
Information moves faster than verification:
- The same article is republished across many outlets
- Social platforms amplify repetition and reposts
- Technical signals often appear before mainstream coverage
- Volume alone is a poor proxy for independent confirmation
For research teams, journalists, and technical users, the bottleneck is not access to information. It is deciding which ideas are becoming signals, and doing it in a way that is early, explainable, and traceable.
EarlyNarratives treats signals as the unit of analysis: clusters of related items measured over short time windows, scored with evidence and spread proxies, and surfaced with transparent provenance.
3. What EarlyNarratives is, and what it is not
What it is
A system that:
- collects items from configured sources (news feeds and social channels, plus optional technical sources)
- normalizes them into a consistent evidence format
- reduces duplication noise
- groups related items into signals within a timeframe (for example 24 hours)
- scores signals using evidence and spread proxies
- serves a radar-style view, signal detail views with evidence, and a Weekly Briefing
What it is not
- Not investment advice, trade recommendations, or a forecasting system
- Not a substitute for primary source verification
- Not a black box: evidence is always visible, and metrics are intended to be readable
4. The user contract
EarlyNarratives is built around one simple contract: every signal must be backed by evidence you can inspect.
In practice that means:
- A signal card is never just a claim. It is a summary plus sources.
- The UI is optimized for evidence inspection: open the signal, open the source list, click through to primary sources.
- Confidence is expressed as proxies (breadth, concentration, duplication share, momentum), not as certainty.
5. Core concepts
Source
A single input stream, such as:
- an RSS feed
- a subreddit or forum
- a social channel
- a curated stream for a platform
Sources can be enabled or disabled and grouped by coverage.
Post
The normalized evidence unit. A post contains a URL (where available), a text snippet, a timestamp, and metadata that helps judge provenance.
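As a rough illustration of what such a record might look like, here is a minimal sketch. The class name and every field name are assumptions made for this example; the actual schema is not published.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Post:
    """Hypothetical normalized evidence unit; all field names are illustrative."""
    text_snippet: str
    timestamp: datetime
    url: Optional[str] = None            # a URL is only present where available
    source_id: str = ""                  # which configured source produced the item
    source_type: str = ""                # e.g. "news" or "social"
    origin_domain: Optional[str] = None  # provenance metadata
    publisher: Optional[str] = None      # provenance metadata


post = Post(
    text_snippet="Example headline about a new protocol release",
    timestamp=datetime(2025, 12, 18, 9, 30, tzinfo=timezone.utc),
    url="https://example.com/article",
    source_id="rss:example",
    source_type="news",
    origin_domain="example.com",
    publisher="Example News",
)
```

Keeping the record immutable (`frozen=True`) is one way to make sure evidence is not altered after collection.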
Signal
A signal is a cluster of related posts within a timeframe. It has:
- a title and short summary
- a "why now" explanation
- metrics describing evidence and spread
- an evidence list: the linked posts and URLs
Storyline
A storyline is a continuity layer that helps you follow a theme over multiple time windows, even if individual signal clusters shift as new posts appear.
6. Design principles
Evidence first
No evidence, no signal. Every signal is backed by sources, and the evidence list is a first-class output.
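One way to picture this rule is as a constraint enforced when a signal is constructed. The sketch below is hypothetical: the `Signal` class and its field names are assumptions for illustration, not the product's data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Signal:
    """Hypothetical signal record; field names are illustrative."""
    title: str
    summary: str
    why_now: str
    evidence_urls: List[str]

    def __post_init__(self) -> None:
        # "No evidence, no signal": refuse to build a signal without sources.
        if not self.evidence_urls:
            raise ValueError("a signal needs at least one evidence item")


ok = Signal(
    title="Example signal",
    summary="Short summary derived from collected posts.",
    why_now="Several new posts appeared in the last 24 hours.",
    evidence_urls=["https://example.com/article"],
)
```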
Diversity over volume
A thousand reposts can still be one idea from one origin. EarlyNarratives prioritizes independent origin breadth and publisher diversity over raw mentions.
Anti-amplification
The system reduces the impact of:
- syndication and rewrites
- exact duplicates across different inputs
- near duplicates and repost dynamics
- overly dominant publishers inside a short window
This is signal hygiene, not censorship.
Explainability by default
EarlyNarratives favors transparent, inspectable outputs:
- the evidence list is always present
- metrics are designed to be interpretable at a glance
- editorial phrasing is separated from evidence
7. How signals are produced (conceptual)
This section is intentionally conceptual. Exact implementation details, scoring weights, and thresholds are proprietary and evolve over time.
7.1 Collect and normalize
Items are pulled from configured sources and normalized into a consistent post format.
7.2 Reduce duplication noise
Repetition is common and not the same as independent confirmation. The system detects and labels duplication patterns so they do not dominate clustering and scoring.
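The production method is not documented here. As a stand-in, the sketch below labels near duplicates with word shingles and Jaccard similarity, a common textbook technique. The function names, the shingle size, and the 0.8 threshold are all assumptions chosen for this example.

```python
import re
from typing import List, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, k: int = 3) -> Set[Shingle]:
    """Return the set of k-word shingles of a text (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Jaccard similarity of two shingle sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def label_duplicates(texts: List[str], threshold: float = 0.8) -> List[bool]:
    """Mark each text True if it is a near duplicate of an earlier, kept text."""
    kept: List[Set[Shingle]] = []
    labels: List[bool] = []
    for text in texts:
        current = shingles(text)
        is_dup = any(jaccard(current, prev) >= threshold for prev in kept)
        labels.append(is_dup)
        if not is_dup:
            kept.append(current)
    return labels


texts = [
    "Regulator opens inquiry into exchange outage after trading halt",
    "Regulator opens inquiry into exchange outage after trading halt.",
    "Open source client ships a new release with faster sync",
]
labels = label_duplicates(texts)  # [False, True, False]
```

Note that duplicates are labeled rather than deleted, which matches the idea that repetition is tracked but should not dominate clustering and scoring.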
7.3 Group related items into signals
Within a timeframe, related posts are grouped into signal clusters using semantic similarity and evidence constraints that prevent a single publisher from dominating.
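The following is a simplified sketch of the grouping idea, not the actual clustering method. It assumes bag-of-words cosine similarity as a stand-in for semantic similarity, a greedy single-pass assignment, and a hypothetical per-publisher cap (`max_per_publisher`) that limits how much one publisher can shape a cluster's centroid.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def vectorize(text: str) -> Counter:
    """Bag-of-words vector; a real system would likely use embeddings."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(posts: List[Tuple[str, str]],
            threshold: float = 0.5,
            max_per_publisher: int = 2) -> List[List[int]]:
    """posts are (publisher, text) pairs; returns clusters as lists of indices."""
    clusters: List[dict] = []
    for idx, (publisher, text) in enumerate(posts):
        vec = vectorize(text)
        for c in clusters:
            if cosine(vec, c["centroid"]) >= threshold:
                c["members"].append(idx)
                # Only let a publisher move the centroid up to the cap,
                # so one outlet cannot dominate what the cluster is "about".
                if c["by_publisher"][publisher] < max_per_publisher:
                    c["centroid"] += vec
                c["by_publisher"][publisher] += 1
                break
        else:
            clusters.append({
                "members": [idx],
                "centroid": Counter(vec),
                "by_publisher": Counter({publisher: 1}),
            })
    return [c["members"] for c in clusters]


posts = [
    ("OutletA", "central bank signals rate cut next quarter"),
    ("OutletB", "central bank hints at rate cut next quarter"),
    ("OutletC", "new open source database release improves query speed"),
]
groups = cluster(posts)  # [[0, 1], [2]]
```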
7.4 Score and track change
Signals are scored using proxies for:
- evidence mix and breadth
- concentration and duplication share
- coherence and stability
- momentum (how fast the signal is changing)
Discrete signal states can be derived from score changes, such as "new" or "accelerating".
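A minimal sketch of how such states could be derived follows. The threshold value and the fallback label "steady" are assumptions for illustration; only "new" and "accelerating" are named in this document, and the real thresholds are not published.

```python
from typing import Optional


def signal_state(previous_score: Optional[float],
                 current_score: float,
                 accelerate_delta: float = 0.15) -> str:
    """Map a score change to a discrete state (hypothetical rules)."""
    if previous_score is None:
        return "new"            # no earlier snapshot for this signal
    if current_score - previous_score >= accelerate_delta:
        return "accelerating"   # score rose by at least the chosen delta
    return "steady"             # assumed fallback label, not from the source


states = [
    signal_state(None, 0.4),   # "new"
    signal_state(0.4, 0.6),    # "accelerating"
    signal_state(0.5, 0.55),   # "steady"
]
```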
7.5 Produce product outputs
The pipeline publishes:
- Signals dashboard views for a timeframe
- Signal detail pages with evidence
- Weekly Briefing format
- Continuity layers (storylines) where applicable
8. Metrics you will see and how to read them
Metrics are proxies. They do not claim truth. They help you judge signal quality quickly.
Evidence mix
How the supporting posts are distributed across source types and platforms.
How to read it:
- A signal supported across multiple source types is usually more robust than one isolated to a single platform.
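As an illustration, evidence mix can be thought of as the share of supporting posts per source type. The function name and the source type labels below are assumptions for this sketch.

```python
from collections import Counter
from typing import Dict, Iterable


def evidence_mix(source_types: Iterable[str]) -> Dict[str, float]:
    """Share of supporting posts per source type (empty input gives {})."""
    counts = Counter(source_types)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()} if total else {}


mix = evidence_mix(["news", "news", "social", "technical"])
# {'news': 0.5, 'social': 0.25, 'technical': 0.25}
```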
Origin breadth
How many independent origins appear to be present, expressed through counts of unique origin domains or publishers.
How to read it:
- Higher breadth generally suggests more independent confirmation.
- Low breadth with high volume often suggests amplification.
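A simple way to approximate origin breadth is to count unique hostnames among the evidence URLs. This sketch assumes hostname equality (after dropping a leading `www.`) as the notion of "same origin"; grouping subdomains under one registrable domain would need a public suffix list and is left out here.

```python
from typing import Iterable
from urllib.parse import urlparse


def origin_domain(url: str) -> str:
    """Lowercased hostname without a leading 'www.', or '' if none is found."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def origin_breadth(urls: Iterable[str]) -> int:
    """Number of distinct origin domains among the given URLs."""
    domains = {origin_domain(u) for u in urls}
    domains.discard("")
    return len(domains)


urls = [
    "https://www.example.com/a",
    "https://example.com/b",
    "https://news.example.org/x",
]
breadth = origin_breadth(urls)  # 2
```

Here three items reduce to two origins, which is the kind of gap between volume and breadth that suggests amplification.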
Concentration
How dominant the top origin or publisher is inside the evidence set.
How to read it:
- High concentration suggests fragility: the signal may collapse if the origin is weak or misleading.
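Read as "the top origin's share of items", concentration can be sketched in a few lines. The function name is an assumption for this example.

```python
from collections import Counter
from typing import Sequence


def concentration(origins: Sequence[str]) -> float:
    """Share of items that come from the single most frequent origin."""
    if not origins:
        return 0.0
    top_count = Counter(origins).most_common(1)[0][1]
    return top_count / len(origins)


share = concentration(["a.example", "a.example", "a.example", "b.example"])  # 0.75
```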
Duplicate share
How much of the supporting set is repetition.
How to read it:
- Higher duplicate share suggests the signal is being repeated more than it is being independently confirmed.
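Assuming each post already carries a duplicate label from an earlier deduplication step, duplicate share is a plain ratio. The function name and the boolean label format are assumptions for this sketch.

```python
from typing import Sequence


def duplicate_share(is_duplicate: Sequence[bool]) -> float:
    """Fraction of items in the evidence set that are labeled as duplicates."""
    if not is_duplicate:
        return 0.0
    return sum(is_duplicate) / len(is_duplicate)


ratio = duplicate_share([False, True, True, False])  # 0.5
```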
Coherence
How tightly related the posts inside the cluster are.
How to read it:
- Low coherence can mean the topic is drifting or too broad.
- Very high coherence with low breadth can still be a single source repeated.
Momentum
How fast a signal is changing.
How to read it:
- Momentum signals acceleration, not correctness.
- Momentum without breadth is a caution flag.
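One possible reading of momentum is the relative change in item count between the latest window and the one before it. The sketch below assumes that definition, a 24-hour window, and hypothetical thresholds for the caution flag; the actual formula and thresholds are not published.

```python
from datetime import datetime, timedelta, timezone
from typing import Sequence


def momentum(timestamps: Sequence[datetime],
             now: datetime,
             window: timedelta = timedelta(hours=24)) -> float:
    """Relative change in item count: latest window vs the previous window."""
    current = sum(1 for t in timestamps if now - window < t <= now)
    previous = sum(1 for t in timestamps if now - 2 * window < t <= now - window)
    return (current - previous) / max(previous, 1)


def caution_flag(momentum_value: float, origin_breadth: int,
                 high_momentum: float = 1.0, min_breadth: int = 3) -> bool:
    """True when attention is rising fast but few independent origins exist."""
    return momentum_value >= high_momentum and origin_breadth < min_breadth


now = datetime(2025, 12, 18, 12, 0, tzinfo=timezone.utc)
stamps = [now - timedelta(hours=h) for h in (1, 2, 3, 4, 5, 6, 30, 40)]
m = momentum(stamps, now)                 # (6 - 2) / 2 = 2.0
flag = caution_flag(m, origin_breadth=2)  # True
```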
9. Outputs you can use immediately
Signals dashboard
The Signals dashboard typically shows:
- top signals in a timeframe
- sorting by momentum or overall score
- filters by coverage packs or tags (where available)
The main action is always the same: open evidence.
Weekly Briefing
A stable format designed to be read quickly and archived.
A typical signal entry includes:
- Summary
- Why now
- Why it matters (framed as implications or hypotheses where appropriate)
- Evidence
- Watch next
Storylines
A longer-lived theme view that helps you follow a topic over days or weeks, with trend and maturity indicators.
Alerts and integrations
Alert events can be generated from rules and delivered via common channels (email, webhooks, chat tools). Delivery is kept separate from the core signal generation logic.
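The separation between rule evaluation and delivery can be sketched as two independent steps. Everything named here (`AlertEvent`, `evaluate`, `deliver`, the rule name, the metric keys) is hypothetical, and the channel is a plain in-memory list standing in for email, a webhook, or a chat tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AlertEvent:
    signal_title: str
    reason: str


Rule = Callable[[Dict[str, float]], bool]
Channel = Callable[[AlertEvent], None]


def evaluate(signal_title: str, metrics: Dict[str, float],
             rules: Dict[str, Rule]) -> List[AlertEvent]:
    """Produce one event per rule that matches; knows nothing about delivery."""
    return [AlertEvent(signal_title, name)
            for name, rule in rules.items() if rule(metrics)]


def deliver(events: List[AlertEvent], channels: List[Channel]) -> None:
    """Hand each event to each channel; knows nothing about how events arise."""
    for event in events:
        for send in channels:
            send(event)


rules: Dict[str, Rule] = {
    "momentum_spike": lambda m: m["momentum"] >= 1.0,
    "broad_confirmation": lambda m: m["origin_breadth"] >= 5,
}
outbox: List[AlertEvent] = []  # stand-in channel for this example
events = evaluate("Example signal", {"momentum": 2.0, "origin_breadth": 2}, rules)
deliver(events, [outbox.append])
```

Because delivery only receives finished events, a channel can be added or replaced without touching the rule logic.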
10. Security, privacy, and trust
EarlyNarratives organizes third-party content and metadata. It is not a social network and does not require user-generated content to function.
Trust is built primarily through transparency:
- users can inspect the evidence list directly
- provenance and diversity signals are explicit
- the system is designed to encourage primary source verification
11. How to interpret signals safely
A signal is a starting point for investigation.
Good practices:
- click through to primary sources for anything you plan to act on
- treat high momentum with low origin breadth as a caution flag
- treat "why it matters" as implications unless the evidence clearly supports a factual claim
- remember that the system measures attention and spread, not objective truth
12. Current limitations
EarlyNarratives is an evolving product. Current constraints to keep in mind:
- coverage depends on configured sources and can be expanded over time
- some platforms require curated approaches or special access
- signal clusters can split or merge as new information arrives
- thresholds and scoring evolve as the system is tuned
Appendix: Glossary
- Source: a single input stream
- Post: the normalized evidence unit
- Signal: a cluster of related posts in a timeframe
- Storyline: a continuity layer above signals
- Evidence mix: distribution across platforms and source types
- Origin breadth: proxy for independent confirmation
- Concentration: dominance of the top origin or publisher
- Momentum: acceleration of attention over time
- Evidence-first: every signal should point to underlying sources (URLs + timestamps).
- Transparency over hype: show diversity, concentration, duplicates, and confidence signals.
- Multi-source confirmation matters: patterns across multiple publishers are stronger than single-source spikes.
- No recommendations: metrics describe attention and evidence structure, not what to do.
- Collect public posts from configured sources (e.g. news feeds and social platforms).
- Normalize and canonicalize URLs, extract origin domains, and enrich metadata.
- Deduplicate near-identical content to reduce amplification bias.
- Cluster related posts into signals and compute metrics per time window.
- Render weekly/daily briefings from the same payload shown in the UI.
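To make the URL canonicalization step concrete, here is a minimal sketch using only the Python standard library. The specific rules (dropping a leading `www.`, removing `utm_` parameters, stripping fragments and trailing slashes) are assumptions for illustration, not the product's actual rule set.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PREFIXES = ("utm_",)  # assumed list of tracking-parameter prefixes


def canonicalize(url: str) -> str:
    """Return a simplified canonical form of a URL (illustrative rules only)."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # lowercased; port and userinfo are dropped
    if host.startswith("www."):
        host = host[4:]
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.lower().startswith(TRACKING_PREFIXES)]
    path = parts.path.rstrip("/") or "/"
    # The fragment is discarded by passing an empty string.
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(kept), ""))


canonical = canonicalize("HTTPS://WWW.Example.com/News/Story/?utm_source=x&id=7#top")
# 'https://example.com/News/Story?id=7'
```

Two links that differ only in tracking parameters or fragments then map to the same string, which helps duplicate detection and origin counting downstream.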
Overall signal score for this signal in the selected window. Higher means more evidence/consistency, not a prediction.
Change in signal activity/score over the last 24 hours. Higher means accelerating attention, not performance.
Number of items included in the signal cluster for this window.
Unique origin domains/publishers (original sources) contributing to this signal. Higher = broader origin coverage.
Unique publishers/accounts observed (can include amplifiers and re-posters). Higher = broader publisher participation.
Share of near-duplicate items in the cluster. Higher can indicate repetition or amplification.
Concentration of origins: the top origin's share of items. Higher means fewer origins dominate the signal.
How many source types are represented (e.g., news vs social). Higher means more mixed evidence types.
Earliest post timestamp linked to a signal across all runs.
Latest post timestamp in the most recent snapshot for that signal.
Heuristic maturity/confidence score derived from breadth and consistency signals. Not a recommendation.
- Open the evidence drawer and click source links; verify publisher/domain diversity.
- Check concentration (top origin share) to see whether one outlet dominates.
- Check duplicate ratio to spot copy-paste waves and amplification.
- Use momentum to track acceleration; treat it as attention velocity, not outcome.
- Coverage depends on configured sources and can miss important context.
- Metrics are heuristic and can be noisy; always validate via sources.
- Summaries are constrained to observed content; they are not exhaustive reporting.
For a live example, open the signals dashboard.
