Citation methodology

When you cite a dossier, the reader on the other end deserves to know how the cite was produced. This page walks through the pipeline, start to finish, so you can answer that question with confidence.

How Dossier extracts claims

Every dossier starts as audio. Tier 1 of the pipeline runs the recording through Gemini, which produces a timestamped transcript with speaker turns. Gemini does transcription only. It never sees the proprietary analysis prompts and never makes judgments about what the speakers said.

Tier 2 is where the intelligence work happens. Mistral Large 2512 reads the transcript, holds it in a 128K context window, and extracts a structured set of claims. Each claim is paired with a timestamp pointing back to the moment it was said, the speaker who said it, and a claim type that captures what kind of assertion it is: empirical, statistical, anecdotal, prescriptive, predictive, and so on. Where the content is procedural rather than argumentative, the same pass instead identifies sections and extracts takeaways from each one. The output is structured data, not prose, which is what makes everything that follows possible.
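As a minimal sketch of what "structured data, not prose" could look like, the Python below models a single claim record and serializes a claim set to JSON. The names Claim, ClaimType, timestamp_s, and claims_to_json are assumptions made for illustration, as is the sample claim in the usage note; Dossier's actual schema is not described on this page.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class ClaimType(str, Enum):
    # Only the types named in the text; the real list continues ("and so on").
    EMPIRICAL = "empirical"
    STATISTICAL = "statistical"
    ANECDOTAL = "anecdotal"
    PRESCRIPTIVE = "prescriptive"
    PREDICTIVE = "predictive"


@dataclass(frozen=True)
class Claim:
    text: str              # the assertion as extracted
    speaker: str           # who said it
    timestamp_s: float     # hypothetical field: offset into the audio, in seconds
    claim_type: ClaimType  # what kind of assertion it is


def claims_to_json(claims):
    """Serialize a list of Claim records to a JSON array."""
    return json.dumps([asdict(c) for c in claims], indent=2)
```

For example, claims_to_json([Claim("Churn fell after the redesign", "Speaker A", 754.2, ClaimType.STATISTICAL)]) yields a JSON array with one object whose claim_type is the string "statistical". Because each record carries its own speaker and timestamp, later passes can grade or link a claim without re-reading prose.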

Why every claim links back to the transcript

The transcript is the primary object. Every claim in a dossier carries a timestamp that points to the exact moment in the source audio where it was spoken. If a reader wants to verify a claim, they can click through and listen to the speaker say it, in context, in their own words. Nothing is paraphrased into a void.

This is the difference between a dossier and a digest. A digest collapses the source into a shorter version of itself, and the reader has to trust the collapser. A dossier preserves the chain of evidence. The cite points to the dossier, the dossier points to the claim, the claim points to the transcript line, and the transcript line points to the audio. At any step, a careful reader can stop and check the work. That property is the entire reason the product exists.
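One link in that chain, from a claim's timestamp to the transcript line that contains it, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Dossier's implementation: TranscriptLine, line_for_timestamp, and format_offset are hypothetical names, and the sketch assumes transcript lines are sorted by start time and that each line runs until the next one begins.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptLine:
    start_s: float  # hypothetical field: where this speaker turn begins, in seconds
    speaker: str
    text: str


def line_for_timestamp(lines, timestamp_s):
    """Return the transcript line whose turn contains timestamp_s.

    Assumes `lines` is sorted by start_s.
    """
    starts = [ln.start_s for ln in lines]
    # Index of the last line that starts at or before the timestamp.
    i = bisect_right(starts, timestamp_s) - 1
    if i < 0:
        raise ValueError("timestamp precedes the first transcript line")
    return lines[i]


def format_offset(timestamp_s):
    """Render a second offset as HH:MM:SS for a click-through link."""
    total = int(timestamp_s)
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
```

Given lines starting at 0.0, 12.5, and 40.0 seconds, a claim stamped at 30.0 resolves to the second line, and format_offset(3725.9) gives "01:02:05". A lookup that fails loudly on an out-of-range timestamp is the safer choice here, since a silently wrong link would break the chain the reader is relying on.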

What evidence-grading means

Extraction tells you what was said. Evidence-grading tells you how much weight it can bear. After Tier 2 produces the claim set, a separate Tier 2.5 pass sends the claims and the transcript back through Mistral with a different prompt, this time evaluating each claim on two independent axes.

The first axis is evidence quality: HIGH, MEDIUM, or LOW. The score depends on the claim type and the strength of the support actually present in the transcript. A statistical claim with a named source and a verifiable number scores higher than the same claim hedged with "I read somewhere." Anecdotal and predictive claims start at LOW: an anecdote cannot be generalized, and a prediction cannot be falsified at the time it is made.
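The ordering described above can be made concrete with a toy rule. In Dossier the grade comes from the model pass, not from a fixed rule, so the Python below is only a sketch of the stated ordering under loud assumptions: grade_evidence, its two boolean inputs, and the idea of counting support signals are all hypothetical, and the sketch does not model how a claim that starts at LOW might later be raised.

```python
# Claim types the text says start at LOW.
LOW_START_TYPES = {"anecdotal", "predictive"}

GRADES = ("LOW", "MEDIUM", "HIGH")


def grade_evidence(claim_type, has_named_source, has_verifiable_number):
    """Toy evidence-quality grade: LOW, MEDIUM, or HIGH.

    ASSUMPTION: support is approximated by counting two signals.
    The real pass weighs support from the transcript itself.
    """
    if claim_type in LOW_START_TYPES:
        return "LOW"
    support = int(has_named_source) + int(has_verifiable_number)
    return GRADES[support]
```

Under this rule a statistical claim with both a named source and a verifiable number grades HIGH, while the same claim with neither ("I read somewhere") grades LOW, which matches the comparison in the paragraph above.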

The second axis is source credibility: VERIFIED for speakers with institutional affiliation or published work stated in the audio, PROFESSIONAL for speakers with relevant experience but no independent backing, and UNVERIFIED when credentials cannot be confirmed from the transcript itself. The two axes combine into a single composite confidence rating. The scoring pass also surfaces reasoning red flags such as appeals to authority, single studies presented as settled science, and extrapolation beyond the data. When two speakers in the same audio contradict each other, the disagreement is surfaced rather than buried.
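This page does not say how the two axes combine, so the Python below is a sketch of one plausible rule, not the actual formula. It assumes a weakest-link combination, where the composite is capped by whichever axis is lower, and a three-level composite scale; composite_confidence and both enum names are hypothetical.

```python
from enum import IntEnum


class EvidenceQuality(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class SourceCredibility(IntEnum):
    UNVERIFIED = 1
    PROFESSIONAL = 2
    VERIFIED = 3


# Hypothetical three-level scale for the composite rating.
COMPOSITE_LABELS = {1: "LOW", 2: "MEDIUM", 3: "HIGH"}


def composite_confidence(quality, credibility):
    """Combine the two axes into one label.

    ASSUMPTION: weakest-link rule. The composite can never exceed
    the lower of the two axes. The real rule may differ.
    """
    return COMPOSITE_LABELS[min(int(quality), int(credibility))]
```

With this rule, strong evidence from an UNVERIFIED speaker still rates LOW, and a VERIFIED speaker making a MEDIUM-quality claim rates MEDIUM. A weakest-link rule is conservative by design: neither a credentialed speaker nor a well-sourced number can lift the rating alone.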

Why this matters for accountable researchers

Dossier is built for the person who has to defend what they cite: a journalist who quotes a podcast in print, a graduate student grounding a paper in expert interviews, an analyst whose pitch deck rests on a claim that needs a real source. For these readers, a confident-sounding paragraph is not enough. They need to know the speaker, hear the moment, and read the grade on the evidence before they put their name next to it.

That is what every layer of the pipeline is designed to deliver. Transcription is kept separate from analysis so the commodity provider never sees the proprietary work. Extraction returns structured claims with timestamps so nothing floats free of its source. Grading is run as a separate pass, with a different prompt, so the pass that extracts a claim is not the pass that grades it. The result is a document a careful reader can interrogate, line by line, and a citation a careful writer can stand behind.
