# Griot

> Index of French-language public speech. Find out who said what, in 27 milliseconds.

## What is Griot?

Griot is a read-only editorial intelligence service that indexes French-language public discourse on YouTube. It combines FTS5 full-text search across 9 SQLite databases with retrieval-augmented generation (RAG: vector search plus LLM synthesis) to answer questions like "What did X say about Y?" with sourced citations and timecodes.

## How it works

1. **Search** — Type a name, topic, or question (e.g., "Bardella immigration", "Todd euro souverainete", "dette publique")
2. **Explore** — Cross-referenced positions, detected contradictions, 5-year timeline. Each result is sourced with a clickable YouTube timecode
3. **Export** — CSV, JSON. Filter by actor, topic, period. Import into Excel, R, Python. The data is yours
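
Once exported, the data can be filtered again on your side. The following is a minimal Python sketch, assuming a CSV export with `actor`, `topic`, `date`, `confidence`, and `position` columns. These column names and the sample rows are hypothetical placeholders, not Griot's documented schema.

```python
import csv
import io
from datetime import date

# Hypothetical export: the real column names and values may differ.
SAMPLE_CSV = """actor,topic,date,confidence,position
Actor A,immigration,2022-03-14,high,Example position 1
Actor A,dette publique,2023-09-02,medium,Example position 2
Actor B,immigration,2024-01-20,low,Example position 3
"""


def filter_positions(rows, actor=None, topic=None, start=None, end=None):
    """Keep rows matching the actor, topic, and inclusive date range."""
    kept = []
    for row in rows:
        day = date.fromisoformat(row["date"])
        if actor is not None and row["actor"] != actor:
            continue
        if topic is not None and row["topic"] != topic:
            continue
        if start is not None and day < start:
            continue
        if end is not None and day > end:
            continue
        kept.append(row)
    return kept


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
selected = filter_positions(rows, actor="Actor A", start=date(2023, 1, 1))
for row in selected:
    print(row["date"], row["topic"], row["confidence"])
```

To use it on a real export, replace `io.StringIO(SAMPLE_CSV)` with an open file handle and adjust the column names to match the file.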

## Who is this for

- **Journalists & fact-checkers** — Source your articles in minutes. Find the exact quote, its date, and its context. A detected contradiction can be the starting point for an article
- **Researchers & academics** — Export more than 25,000 structured political positions as CSV. Run longitudinal discourse analysis across 5 years. The methodology is transparent (WER documented)
- **Community managers in media** — Quantified talking points, thematic trends, and detected viral moments
- **Engaged citizens** — Who said what? When? Are promises kept? Have positions changed?

## What it contains

| Metric | Number |
|--------|--------|
| Diarized transcripts | 21,070 |
| Political positions indexed | 25,418 |
| Contradictions detected | 3,459 |
| Political actors traced | 67 |
| Promises with a deadline | 136 |
| Sources (YouTube channels) | 198 |
| Average search time | 27ms |

## Key features

- **Cross-DB FTS5 search** — 5 databases in parallel (citations, transcripts, positions, fiches, conversations)
- **Vector search (RAG)** — ChromaDB + multilingual embeddings, actor-aware filtering
- **LLM synthesis** — Claude Haiku as the primary model, with sourced references in the form `[Channel — Title, ~M:SS]`
- **Actor profiles** — unified view per politician (transcripts, positions, promises, contradictions, highlights, citations)
- **Analytics layer** — wordclouds, timelines, network graphs, heatmaps, thematic clusters
- **Export CSV/JSON** — filter by actor, topic, period
- **Transparent methodology** — WER documented, confidence scores on every position (high/medium/low)
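
The cross-database search idea can be illustrated with Python's standard `sqlite3` module. This is a sketch under stated assumptions, not Griot's implementation: the `docs` table, its columns, the demo rows, and the helper names are all hypothetical, and it requires an SQLite build with FTS5 enabled. It also shows why an unaccented query such as "souverainete" can still match: FTS5's default `unicode61` tokenizer strips diacritics.

```python
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_demo_db(path, rows):
    """Create a tiny FTS5 index standing in for one database."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(speaker, body)")
    conn.executemany("INSERT INTO docs VALUES (?, ?)", rows)
    conn.commit()
    conn.close()


def search_one(path, query, limit=5):
    """Run one FTS5 MATCH query on a single database, best matches first."""
    conn = sqlite3.connect(path)
    try:
        cur = conn.execute(
            "SELECT speaker, body, bm25(docs) AS score FROM docs "
            "WHERE docs MATCH ? ORDER BY score LIMIT ?",
            (query, limit),
        )
        return [(Path(path).stem, spk, body, score) for spk, body, score in cur]
    finally:
        conn.close()


def search_all(paths, query):
    """Query every database in its own thread, then merge the hits."""
    with ThreadPoolExecutor(max_workers=max(1, len(paths))) as pool:
        per_db = list(pool.map(lambda p: search_one(p, query), paths))
    hits = [hit for db_hits in per_db for hit in db_hits]
    # bm25() returns lower values for better matches.
    return sorted(hits, key=lambda hit: hit[3])


with tempfile.TemporaryDirectory() as tmp:
    demo = {
        "citations": [("Speaker A", "la souveraineté monétaire et l'euro")],
        "transcripts": [("Speaker B", "le débat sur la dette publique")],
    }
    paths = []
    for name, rows in demo.items():
        path = str(Path(tmp) / f"{name}.db")
        build_demo_db(path, rows)
        paths.append(path)
    hits = search_all(paths, "euro souverainete")
    for db_name, speaker, body, _score in hits:
        print(db_name, speaker, body)
```

Each thread opens its own connection, which keeps the sketch within `sqlite3`'s default threading rules. Note that BM25 scores computed in separate indexes are not strictly comparable, so the merged ordering here is only a rough ranking.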

## FAQ

**Q: Where does the data come from?**
A: Automatic transcripts of public YouTube content (interviews, debates, conferences). Each source is cited with a timecode link.
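
As an illustration of what a timecode link looks like, here is a small Python sketch. It relies on YouTube's standard `watch?v=...&t=...s` URL form; the function names and the video ID are hypothetical, and Griot's own link format may differ.

```python
from urllib.parse import urlencode


def timecode_url(video_id: str, seconds: int) -> str:
    """Build a YouTube watch URL that starts playback at the given second."""
    query = urlencode({"v": video_id, "t": f"{seconds}s"})
    return f"https://www.youtube.com/watch?{query}"


def format_timecode(seconds: int) -> str:
    """Render an approximate timecode in the ~M:SS style used in references."""
    minutes, secs = divmod(seconds, 60)
    return f"~{minutes}:{secs:02d}"


# "VIDEO_ID" is a placeholder, not a real video.
print(timecode_url("VIDEO_ID", 83))
print(format_timecode(83))
```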

**Q: How reliable are the transcripts?**
A: The median word error rate (WER) is 23%. High-reliability sources (BLAST, LCP, ELUCID) are under 12%. Each position has a confidence indicator (high/medium/low).
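
For readers unfamiliar with the metric, WER is conventionally the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The Python sketch below implements that textbook definition with a word-level edit distance. How Griot normalizes text before scoring is not documented here, so the lowercasing and whitespace split are assumptions, as are the example sentences.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the distance between the reference words seen
    # so far (one fewer than the current row) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "la dette publique augmente chaque année"
hypothesis = "la dette public augmente année"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.1%}")
```

In this example the transcript has one substituted word and one dropped word against a six-word reference, so the score is 2/6.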

**Q: Can I export the data?**
A: Yes. CSV and JSON. Positions, contradictions, promises — filter by actor, topic, period.

**Q: Is it legal?**
A: Griot indexes transcripts of public content, without full redistribution. Its legal position is comparable to that of a search engine.

**Q: What's the pricing?**
A: Free during beta. Institutional and Pro tiers on request for unlimited exports and API access.

## Architecture

- Public landing: https://corpus-ia.fr/ (this site)
- Application (auth required): https://griot.srv969505.hstgr.cloud/
- Backend: FastAPI + aiosqlite + ChromaDB + sentence-transformers (multilingual)
- Frontend application: Next.js 14 + D3.js
- LLM: Anthropic Claude (Haiku primary)
- Data: 9 SQLite databases, read-only architecture (Media Factory writes, Griot reads)
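
One common way to enforce a "one writer, one reader" split with SQLite is to open the database through a URI with `mode=ro`. The sketch below shows that technique with Python's standard `sqlite3` module. It is an illustration only: whether Griot uses this mechanism is an assumption, and the `positions` table and its rows are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(path):
    """Open an existing SQLite file so that any write attempt fails."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "positions.db"

    # Writer side (stands in for the process that produces the data).
    writer = sqlite3.connect(db_path)
    writer.execute("CREATE TABLE positions (actor TEXT, topic TEXT)")
    writer.execute("INSERT INTO positions VALUES ('Actor A', 'immigration')")
    writer.commit()
    writer.close()

    # Reader side: queries work, writes are rejected by SQLite.
    reader = open_read_only(db_path)
    try:
        rows = reader.execute("SELECT actor, topic FROM positions").fetchall()
        try:
            reader.execute("INSERT INTO positions VALUES ('Actor B', 'euro')")
            write_blocked = False
        except sqlite3.OperationalError:
            write_blocked = True
    finally:
        reader.close()

print(rows, write_blocked)
```

Enforcing read-only access at the connection level means a bug in the reading service cannot modify the indexed data.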

## Access

To request beta access, send an email to contact@audiomnes.com with a short description of your use case (journalism, research, civic, community management). Expect a reply within 48 hours.

## Website

https://corpus-ia.fr/

## Contact

contact@audiomnes.com