FactTrace Hackathon · University of Cambridge · 31 Jan 2026
Team: Julia D, Iuliia V, Paulina C, Raj.
Award: Winning team.
We built a multi-agent fact-verification system where AI agents disagree, argue, and negotiate to determine whether an external claim is a faithful representation of a source fact or a mutation.
Instead of a single black-box verdict, our system exposes the reasoning process through an adversarial, courtroom-style debate.
🔗 Project visualization demo:
👉 https://v0.app/t/KEPLER_FACTTRACE_DEMO
(interactive UI showing agent debates, verdicts, and confidence)
Built on a dataset of claim-truth pairs (link), the system addresses the challenge: "Is an external claim a faithful representation of an internal fact, or is it a mutation?"
Our system employs 4 specialized agents in an adversarial debate architecture:
- Prosecutor - Aggressively hunts for mutations, distortions, and misrepresentations
- Defense - Argues for faithful interpretation and semantic equivalence
- Epistemologist - Quantifies uncertainty and identifies ambiguous cases
- Jury Foreman - Synthesizes arguments and delivers the final verdict
Key features:

- Adversarial Design: Forces consideration of multiple perspectives
- Transparent Reasoning: Full debate transcripts show the decision-making process
- Uncertainty Quantification: Explicitly identifies ambiguous cases
- Multi-Round Debates: Agents can challenge and respond to each other's arguments
- Mutation Detection: Identifies 8 types of claim mutations (numerical distortion, missing context, causal confusion, etc.)
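The debate loop can be sketched roughly as follows. The class and function names here are illustrative stand-ins, not the actual `agents.py` API, and the LLM-backed agents are replaced by stubs:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class DebateState:
    claim: str       # external claim under scrutiny
    fact: str        # internal source fact
    transcript: list[str] = field(default_factory=list)

# An agent turns the debate state (including prior arguments) into a new argument.
Agent = Callable[[DebateState], str]

def run_debate(state: DebateState, agents: dict[str, Agent],
               foreman: Callable[[DebateState], str], rounds: int = 2) -> str:
    """Each round, every agent reads the transcript so far and appends an
    argument; the Jury Foreman then synthesizes the full transcript."""
    for r in range(1, rounds + 1):
        for name, argue in agents.items():
            state.transcript.append(f"[round {r}] {name}: {argue(state)}")
    return foreman(state)

# Stub agents standing in for the LLM-backed Prosecutor/Defense/Epistemologist:
agents = {
    "Prosecutor": lambda s: f"'{s.claim}' distorts '{s.fact}'",
    "Defense": lambda s: "the paraphrase is semantically equivalent",
    "Epistemologist": lambda s: "uncertainty here is moderate",
}
state = DebateState(claim="over 500 died", fact="at least 480 died")
verdict = run_debate(state, agents, foreman=lambda s: "AMBIGUOUS")
print(verdict)                 # AMBIGUOUS
print(len(state.transcript))   # 6 arguments: 3 agents x 2 rounds
```

In the real system each agent callable would wrap an OpenAI chat call with its role-specific prompt; the multi-round structure is what lets agents respond to each other's arguments.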
Requirements:

- Python: 3.11 or higher
- OpenAI API Key: Required for running the agents
```bash
# Create environment from environment.yml
conda env create -f environment.yml

# Activate environment
conda activate hackathon
```

Set your OpenAI API key as an environment variable:

```bash
export OPENAI_API_KEY='your-api-key-here'
```

Or create a .env file in the project root:

```
OPENAI_API_KEY=your-api-key-here
```
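A quick sanity check that the key is visible before running the agents (whether the project uses `python-dotenv` to read the `.env` file is an assumption, hence the commented-out line):

```python
import os

# If using a .env file, python-dotenv can load it first (optional, assumed):
#   from dotenv import load_dotenv; load_dotenv()
key_present = "OPENAI_API_KEY" in os.environ
print("API key configured:", key_present)
```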
Navigate to the kepler directory:

```bash
cd kepler
```

Run with pre-selected strategic cases that showcase different mutation types:

```bash
python main.py
```

This will analyze 5 carefully selected cases demonstrating:
- Numerical boundary manipulation
- Added information
- Negation framing
- Borderline rounding
- Faithful representation
```bash
# Run specific cases by index
python main.py --cases 0,1,2

# Interactive case selection
python main.py --interactive

# Run all cases (expensive!)
python main.py --all
```

Compare the multi-agent system against a single-agent baseline:
```bash
cd kepler
python compare_systems.py
```

This will:

- Run both single-agent and multi-agent systems on the same cases
- Generate a detailed comparison report (`comparison_report.md`)
- Export results to JSON files for further analysis
- Show verdict agreements/disagreements and confidence differences
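The agreement/disagreement tally can be sketched as below; the per-case record schema (`case` and `verdict` fields) is an assumption for illustration, not the actual layout the comparison script emits:

```python
def agreement_report(single: list[dict], multi: list[dict]) -> dict:
    """Compare verdicts case-by-case between the two systems.
    Records are assumed to carry 'case' and 'verdict' keys (illustrative)."""
    s = {r["case"]: r for r in single}
    m = {r["case"]: r for r in multi}
    shared = sorted(s.keys() & m.keys())
    agree = [c for c in shared if s[c]["verdict"] == m[c]["verdict"]]
    return {
        "cases": len(shared),
        "agreement_rate": len(agree) / len(shared),
        "disagreements": [c for c in shared if c not in agree],
    }

# Toy data in place of the real JSON result files:
single = [{"case": 0, "verdict": "FAITHFUL"}, {"case": 1, "verdict": "MUTATED"}]
multi = [{"case": 0, "verdict": "FAITHFUL"}, {"case": 1, "verdict": "AMBIGUOUS"}]
report = agreement_report(single, multi)
print(report)  # 2 cases, 50% agreement, disagreement on case 1
```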
```
cambridge-dis-hackathon/
├── kepler/                       # Main source code
│   ├── agents.py                 # Multi-agent debate system
│   ├── main.py                   # Primary entry point
│   ├── compare_systems.py        # Single vs multi-agent comparison
│   ├── single_agent_baseline.py  # Simple baseline for comparison
│   ├── visualize.py              # Visualization and export utilities
│   ├── demo.py                   # Demo script
│   ├── export_comparison_data.py # Data export utilities
│   ├── export_debates.py         # Debate transcript export
│   ├── view_raw_responses.py     # View raw agent responses
│   ├── Kepler.csv                # Dataset (claim-truth pairs)
│   └── requirements.txt          # Python dependencies
├── requirements.txt              # Root dependencies
├── environment.yml               # Conda environment specification
├── README.md                     # This file
├── Instructions.md               # Hackathon instructions
├── LICENSE                       # License file
└── *.json                        # Output files (results, debates, etc.)
```
Each case receives one of three verdicts:

- FAITHFUL: The external claim accurately represents the internal fact
- MUTATED: The claim distorts, exaggerates, or misrepresents the fact
- AMBIGUOUS: Genuine uncertainty exists; reasonable interpretations differ
The system flags eight mutation types:

- Numerical Distortion: Changed numbers or statistical boundaries
- Missing Context: Omitted crucial contextual information
- Causal Confusion: Misrepresented cause-effect relationships
- Exaggeration: Amplified or dramatized claims
- Scope Change: Altered the scope or generality of the claim
- Temporal Mismatch: Changed time references or periods
- Added Information: Introduced details not in the source
- Negation Framing: Reframed using negation (e.g., "failed to" vs "did not")
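These categories could be modeled as an enum; the identifiers below are illustrative, not the codebase's own names:

```python
from enum import Enum

class MutationType(Enum):
    """The eight mutation categories the Prosecutor hunts for
    (names are illustrative, not taken from agents.py)."""
    NUMERICAL_DISTORTION = "numerical distortion"
    MISSING_CONTEXT = "missing context"
    CAUSAL_CONFUSION = "causal confusion"
    EXAGGERATION = "exaggeration"
    SCOPE_CHANGE = "scope change"
    TEMPORAL_MISMATCH = "temporal mismatch"
    ADDED_INFORMATION = "added information"
    NEGATION_FRAMING = "negation framing"

print(len(MutationType))  # 8
```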
Example debate output:

```
FINAL VERDICT: AMBIGUOUS (80% confidence)

REASONING: The external claim closely approximates the original death toll
figure with minor inequality inversion and omission of additional
epidemiological data...

PROSECUTOR ARGUMENTS:
- Inverts inequality direction from lower bound to upper bound
- Removes broader epidemiological context

DEFENSE ARGUMENTS:
- Uses close numerical figure within narrow range
- Focusing on death toll is common journalistic practice

EPISTEMOLOGIST ANALYSIS:
- Core uncertainty: Whether inequality inversion constitutes meaningful
  distortion or acceptable paraphrasing
```
Running the system generates several output files:
- `debate_results.json` - Full debate results with all agent responses
- `multi_agent_results.json` - Multi-agent system results
- `single_agent_results.json` - Single-agent baseline results
- `visualization_data_*.json` - Data for visualizations
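A sketch of consuming the results downstream; the per-record fields (`verdict`, `confidence`) are assumed from the example output above, not a verified schema:

```python
import json
from collections import Counter

# In practice: results = json.load(open("debate_results.json"))
# Toy records standing in for the real file (schema assumed):
results = [
    {"claim": "...", "verdict": "MUTATED", "confidence": 90},
    {"claim": "...", "verdict": "AMBIGUOUS", "confidence": 80},
]

verdict_counts = Counter(r["verdict"] for r in results)
mean_confidence = sum(r["confidence"] for r in results) / len(results)
print(verdict_counts, mean_confidence)
```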
Why the multi-agent design helps:

- Adversarial Testing: Prosecutor and Defense challenge each other
- Bias Reduction: Multiple perspectives prevent single-viewpoint bias
- Calibrated Confidence: Debate leads to more realistic confidence scores
- Transparent Process: Full debate transcript enables human oversight
- Nuanced Analysis: Multi-round exchanges capture subtle distinctions
```bash
# Export debate transcripts
python export_debates.py
```