# Griot

> Index of French-speaking public discourse. Find who said what, in 27 milliseconds.

## What is Griot?

Griot is a read-only editorial intelligence service that indexes French-speaking public discourse on YouTube. It combines FTS5 full-text search across 9 SQLite databases with RAG (vector search + LLM synthesis) to answer questions like "What did X say about Y?" with sourced citations and timecodes.

## How it works

1. **Search**: Type a name, topic, or question (e.g., "Bardella immigration", "Todd euro souverainete", "dette publique").
2. **Explore**: Browse cross-referenced positions, detected contradictions, and a 5-year timeline. Each result is sourced with a clickable YouTube timecode.
3. **Export**: Download CSV or JSON, filtered by actor, topic, or period, and import it into Excel, R, or Python. The data is yours.

## Who is this for

- **Journalists & fact-checkers**: Source your articles in minutes. Find the exact citation, date, and context. A detected contradiction is a ready-made story lead.
- **Researchers & academics**: Export more than 25,000 structured political positions as CSV. Run longitudinal discourse analysis across 5 years. The methodology is transparent (WER documented).
- **Community managers in media**: Quantified talking points, thematic trends, and detected viral moments.
- **Engaged citizens**: Who said what? When? Are promises kept? Have positions changed?

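
For readers who plan to work with the CSV export in Python, filtering by actor, topic, and period might look like the sketch below. It is illustrative only: the column names (`actor`, `topic`, `date`, `confidence`, `position`), the helper `filter_positions`, and the sample rows are assumptions made for this example, not Griot's documented export schema.

```python
import csv
import io
from datetime import date

# Hypothetical sample standing in for a Griot CSV export.
# Column names and rows are invented for illustration only.
SAMPLE_EXPORT = """actor,topic,date,confidence,position
Actor A,immigration,2023-03-14,high,Example position text 1
Actor A,euro,2021-11-02,medium,Example position text 2
Actor B,immigration,2024-01-20,low,Example position text 3
"""


def filter_positions(csv_text, actor=None, topic=None, start=None, end=None):
    """Return rows matching the actor, topic, and inclusive date range.

    Any filter left as None is ignored.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = date.fromisoformat(row["date"])
        if actor is not None and row["actor"] != actor:
            continue
        if topic is not None and row["topic"] != topic:
            continue
        if start is not None and day < start:
            continue
        if end is not None and day > end:
            continue
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Positions by "Actor A" recorded on or after 1 January 2022.
    for row in filter_positions(SAMPLE_EXPORT, actor="Actor A", start=date(2022, 1, 1)):
        print(row["date"], row["topic"], row["confidence"])
```

With a real export, the same function would take the file's contents in place of `SAMPLE_EXPORT`, once the column names are adjusted to match the actual header row.
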
## What it contains

| Metric | Number |
|--------|--------|
| Diarized transcripts | 21,070 |
| Political positions indexed | 25,418 |
| Contradictions detected | 3,459 |
| Political actors traced | 67 |
| Promises with deadline | 136 |
| Sources (YouTube channels) | 198 |
| Average search time | 27 ms |

## Key features

- **Cross-DB FTS5 search**: 5 databases queried in parallel (citations, transcripts, positions, fiches, conversations)
- **Vector search (RAG)**: ChromaDB + multilingual embeddings, actor-aware filtering
- **LLM synthesis**: Haiku primary, with sourced references `[Channel — Title, ~M:SS]`
- **Actor profiles**: unified view per politician (transcripts, positions, promises, contradictions, highlights, citations)
- **Analytics layer**: wordclouds, timelines, network graphs, heatmaps, thematic clusters
- **Export CSV/JSON**: filter by actor, topic, period
- **Transparent methodology**: WER documented, confidence scores on every position (high/medium/low)

## FAQ

**Q: Where does the data come from?**
A: Automatic transcripts of public YouTube content (interviews, debates, conferences). Each source is cited with a timecode link.

**Q: How reliable are the transcripts?**
A: The median Word Error Rate (WER) is 23%. High-reliability sources (BLAST, LCP, ELUCID) are under 12%. Each position has a confidence indicator (high/medium/low).

**Q: Can I export the data?**
A: Yes, as CSV and JSON. Positions, contradictions, and promises can be filtered by actor, topic, and period.

**Q: Is it legal?**
A: Griot indexes transcripts of public content without redistributing them in full. Its legal position is comparable to that of a search engine.

**Q: What's the pricing?**
A: Free during beta. Institutional and Pro tiers are available on request for unlimited exports and API access.

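
To make the WER figures quoted in the FAQ concrete: WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the automatic transcript, divided by the number of words in the reference. The sketch below implements that standard definition. How Griot tokenizes and normalizes text before measuring is not stated here, so the lowercase, whitespace-based tokenization and the function name `word_error_rate` are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Tokenization (lowercase + whitespace split) is an assumption of this sketch.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "la dette publique augmente chaque mois"
    hypothesis = "la dette public augmente chaque"
    # One substitution and one deletion over six reference words.
    print(round(word_error_rate(reference, hypothesis), 2))  # 0.33
```

Under this definition, a 23% WER means roughly one word in four or five differs from the reference, which is why the per-position confidence indicator matters when quoting a transcript.
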
## Architecture

- Public landing: https://corpus-ia.fr/ (this site)
- Application (auth required): https://griot.srv969505.hstgr.cloud/
- Backend: FastAPI + aiosqlite + ChromaDB + sentence-transformers (multilingual)
- Frontend application: Next.js 14 + D3.js
- LLM: Anthropic Claude (Haiku primary)
- Data: 9 SQLite databases, read-only architecture (Media Factory writes, Griot reads)

## Access

To request beta access, send an email to contact@audiomnes.com with a short description of your use case (journalism, research, civic, community management). Reply within 48h.

## Website

https://corpus-ia.fr/

## Contact

contact@audiomnes.com