VerityNgn Methodology Documentation
Version 1.0 | Last Updated: October 23, 2025
Executive Summary
VerityNgn is a multimodal AI-powered video verification system that analyzes YouTube videos to assess the truthfulness of claims made within them. The system combines multimodal LLM analysis, counter-intelligence techniques, and probabilistic reasoning to generate truthfulness reports.
Core Innovation: VerityNgn is the first system to combine:
- Frame-by-frame multimodal video analysis (1 FPS sampling)
- YouTube counter-intelligence (analyzing review videos for contradictory evidence)
- Press release bias detection and penalty system
- Probabilistic truthfulness assessment with evidence weighting
Table of Contents
- System Architecture
- Multimodal Analysis Pipeline
- Claims Extraction Methodology
- Counter-Intelligence System
- Probability Distribution Model
- Evidence Classification & Weighting
- Verification Algorithm
- Report Generation
- Limitations & Future Work
1. System Architecture
Overview
VerityNgn follows a pipeline architecture with six main stages.
Technology Stack
- Multimodal LLM: Google Gemini 2.5 Flash (via Vertex AI)
- Context Window: 64K tokens for comprehensive analysis
- Video Processing: yt-dlp for video download and metadata
- Web Search: Google Custom Search API
- Framework: LangGraph for workflow orchestration
- Storage: Local filesystem or Google Cloud Storage
2. Multimodal Analysis Pipeline
Video Processing Strategy
VerityNgn uses an aggressive frame-sampling approach (1 FPS sampling).
Frame-by-Frame Analysis
For each frame, the system analyzes:
- Visual Content:
  - On-screen text and graphics
  - Demonstrations and visual evidence
  - Charts, graphs, and data visualizations
  - Product displays or examples
- Audio Track:
  - Spoken statements with timestamps
  - Speaker identification
  - Tone and delivery analysis
- Metadata:
  - Video title and description
  - Channel information
  - Upload date and view count
  - Comments and engagement metrics
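The 1 FPS sampling mentioned above amounts to picking one timestamp per second of video. The following is a minimal sketch of that idea, not the project's implementation; `sample_timestamps` and its parameters are hypothetical names introduced for illustration.

```python
def sample_timestamps(duration_seconds: float, fps: float = 1.0) -> list[float]:
    """Return the timestamps (in seconds) at which frames would be sampled."""
    if duration_seconds < 0 or fps <= 0:
        raise ValueError("duration must be non-negative and fps must be positive")
    step = 1.0 / fps
    # Only count frames that start inside the video.
    count = int(duration_seconds * fps)
    return [round(i * step, 3) for i in range(count)]


# A 10-minute video sampled at 1 FPS yields 600 frames.
print(len(sample_timestamps(600)))
```

At this rate the number of frames grows linearly with video length, which is why the approach is described as aggressive.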
CRAAP Criteria Integration
All claims are evaluated using the CRAAP framework:
- Currency: When was the information published?
- Relevance: Is it relevant to the video’s main message?
- Authority: What credentials or expertise is claimed?
- Accuracy: Can it be verified with external sources?
- Purpose: Is it promotional, educational, or persuasive?
Prompt Engineering
The system uses carefully crafted prompts to ensure high-quality claim extraction. The prompts target the following claim mix:
- 70% Scientific & Verifiable Claims (studies, statistics, product effectiveness)
- 10% Speaker Credibility Claims (credentials, affiliations, experience)
- 20% Other Verifiable Claims (dates, locations, specific statements)
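To make the 70/10/20 mix concrete, the sketch below splits a fixed claim budget across the three categories. It is an illustration under assumptions: the category keys, the `allocate_claims` helper, and the largest-remainder rounding are hypothetical, and the source does not say how (or whether) the system enforces the mix numerically.

```python
# Target shares in percent, taken from the list above. Keys are hypothetical.
TARGET_MIX = {"scientific": 70, "credibility": 10, "other": 20}


def allocate_claims(total: int) -> dict[str, int]:
    """Split a claim budget across categories using largest-remainder rounding."""
    if total < 0:
        raise ValueError("total must be non-negative")
    scaled = {name: total * pct for name, pct in TARGET_MIX.items()}
    counts = {name: value // 100 for name, value in scaled.items()}
    leftover = total - sum(counts.values())
    # Give any remaining slots to the categories with the largest remainders.
    by_remainder = sorted(scaled, key=lambda name: scaled[name] % 100, reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


print(allocate_claims(20))
```

Integer percentages are used so that the arithmetic stays exact and the counts always sum to the requested total.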
Output Format
Claims are extracted as structured JSON.
3. Claims Extraction Methodology
Claim Quality Criteria
Each extracted claim must meet these requirements:
- Specificity: Concrete, not vague or general
- Verifiability: Can be checked against external sources
- Relevance: Central to the video’s message
- Factuality: Stated as fact, not opinion
- Significance: Meaningful for truthfulness assessment
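A claim that passes these criteria could be carried as a small structured record and serialized to JSON. The sketch below is an assumption about shape only: the field names (`claim_text`, `timestamp`, `speaker`, `claim_type`) are hypothetical and chosen to mirror attributes this document mentions, not the system's actual schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Claim:
    claim_text: str   # the statement as made in the video
    timestamp: str    # position in the video, e.g. "02:15"
    speaker: str      # who made the statement
    claim_type: str   # hypothetical labels: "scientific", "credibility", "other"


def claims_to_json(claims: list[Claim]) -> str:
    """Serialize extracted claims to a JSON document."""
    return json.dumps({"claims": [asdict(c) for c in claims]}, indent=2)


# Illustrative record only; the claim text echoes the statistic pattern used in this document.
example = Claim(
    claim_text="75% of patients experienced Z",
    timestamp="02:15",
    speaker="Narrator",
    claim_type="scientific",
)
print(claims_to_json([example]))
```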
Claim Types
Primary Claims (70%):
- Study results: “Study X found Y outcome”
- Statistics: “75% of patients experienced Z”
- Product effectiveness: “Product causes W in N days”
- Scientific findings: “Research shows…”
Speaker Credibility Claims (10%):
- Educational credentials: “Dr. X studied at Y”
- Professional experience: “X years in field”
- Institutional affiliations: “Works at Hospital Y”
- Awards and recognitions
Other Verifiable Claims (20%):
- Historical facts
- Geographic information
- Temporal data
- Specific events or occurrences
Anti-Patterns (Avoided)
The system explicitly avoids:
- Vague motivational statements
- General health advice without specifics
- Obvious facts not requiring verification
- Micro-claims too granular for meaningful assessment
- Subjective opinions without factual basis
4. Counter-Intelligence System
Overview
VerityNgn’s counter-intelligence system is designed to detect and weight contradictory evidence that challenges claims made in the analyzed video. This is one of the system’s most innovative features.
Two-Pronged Approach
A. Press Release Detection
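Purpose: Identify self-promotional content that lacks independent validation.
When a source is identified as a press release, the following penalty applies:
- Probability Adjustment: -0.4 (significant negative bias)
- Rationale: Press releases are promotional, not independent verification
- Distribution Impact:
  - Reduce TRUE by 60% of adjustment
  - Increase FALSE by 40% of adjustment
  - Increase UNCERTAIN by 40% of adjustment
Because the two increases together exceed the reduction, the distribution no longer sums to 1 after this step and is renormalized (see Normalization in Section 5).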
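The arithmetic of the press release penalty can be sketched as follows. This is one reading of the percentages above, not the project's code: the function name, the clamp at zero, and the starting distribution are assumptions made for illustration.

```python
def apply_press_release_penalty(
    dist: dict[str, float], adjustment: float = 0.4
) -> dict[str, float]:
    """Shift probability away from TRUE, then renormalize.

    `adjustment` is the magnitude of the -0.4 adjustment described above.
    """
    shifted = {
        "TRUE": max(dist["TRUE"] - 0.6 * adjustment, 0.0),  # clamp at zero (assumption)
        "FALSE": dist["FALSE"] + 0.4 * adjustment,
        "UNCERTAIN": dist["UNCERTAIN"] + 0.4 * adjustment,
    }
    total = sum(shifted.values())
    return {state: p / total for state, p in shifted.items()}


# Hypothetical starting distribution, chosen only for illustration.
before = {"TRUE": 0.5, "FALSE": 0.2, "UNCERTAIN": 0.3}
after = apply_press_release_penalty(before)
print({state: round(p, 3) for state, p in after.items()})
```

With this starting point, TRUE drops from 0.50 to 0.26 before renormalization, FALSE rises to 0.36, and UNCERTAIN rises to 0.46. The raw total is 1.08, so dividing by it gives roughly 0.24, 0.33, and 0.43.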
B. YouTube Counter-Intelligence
Purpose: Find and analyze review videos that contradict claims in the original video.
Search Strategy:
- Counter-Signals:
  - ‘scam’, ‘fake’, ‘fraud’, ‘lie’, ‘misleading’
  - ‘doesn’t work’, ‘waste of money’, ‘no results’
  - ‘red flags’, ‘warning’, ‘beware’, ‘avoid’
  - ‘overpriced’, ‘overhyped’, ‘disappointing’
  - ‘fabricated’, ‘deceptive’, ‘predatory’, ‘exposed’
- Supporting Signals:
  - ‘works’, ‘effective’, ‘results’, ‘recommend’
  - ‘good’, ‘helps’, ‘success’, ‘positive’
  - ‘beneficial’, ‘worth it’, ‘legit’
Weighting Factors:
- View Count: Videos with 100K+ views = stronger signal
- Channel Credibility: Established channels weighted higher
- Video Age: Recent videos weighted higher
- Engagement: Like ratio and comment sentiment
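The counter and supporting signal lists above lend themselves to a simple keyword-based stance check. The sketch below is a hedged illustration rather than the system's detector: `detect_stance`, the helper, and the tie-breaking rule are hypothetical. It matches whole words only, and it strips counter phrases first so that ‘no results’ is not also counted as the supporting signal ‘results’.

```python
import re

COUNTER_SIGNALS = [
    "scam", "fake", "fraud", "lie", "misleading",
    "doesn't work", "waste of money", "no results",
    "red flags", "warning", "beware", "avoid",
    "overpriced", "overhyped", "disappointing",
    "fabricated", "deceptive", "predatory", "exposed",
]
SUPPORTING_SIGNALS = [
    "works", "effective", "results", "recommend",
    "good", "helps", "success", "positive",
    "beneficial", "worth it", "legit",
]


def _count_and_strip(text: str, phrases: list[str]) -> tuple[int, str]:
    """Count whole-word phrase matches and remove them from the text."""
    count = 0
    for phrase in sorted(phrases, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text, hits = re.subn(pattern, " ", text)
        count += hits
    return count, text


def detect_stance(text: str) -> str:
    """Classify text as 'counter', 'supporting', or 'neutral' by signal counts."""
    normalized = text.lower().replace("\u2019", "'")  # straighten curly apostrophes
    counter, remaining = _count_and_strip(normalized, COUNTER_SIGNALS)
    supporting, _ = _count_and_strip(remaining, SUPPORTING_SIGNALS)
    if counter > supporting:
        return "counter"
    if supporting > counter:
        return "supporting"
    return "neutral"


print(detect_stance("This supplement is a SCAM, it doesn't work"))
```

A raw keyword count says nothing about signal strength. Factors such as view count and channel credibility would scale the result, but the source does not give their magnitudes, so they are left out here.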
Counter-Intelligence in Action
Example Case:
5. Probability Distribution Model
Three-State Model
VerityNgn uses a probabilistic approach rather than binary true/false. Each claim receives a probability distribution over three states: TRUE, FALSE, and UNCERTAIN.
Base Probability Initialization
Enhancement Factors
The base distribution is adjusted based on five key factors:
Factor 1: Evidence Coverage (10-30% boost)
Factor 2: Independent Source Ratio (10-25% boost)
Factor 3: Scientific Evidence (up to 40% boost)
Factor 4: YouTube Counter-Intelligence (-20% impact)
Factor 5: Press Release Penalty (-40% impact)
Normalization
After all adjustments, probabilities are normalized so that TRUE, FALSE, and UNCERTAIN sum to 1.
Verdict Mapping
The continuous probabilities are mapped to human-readable verdicts:
- 65% threshold for “likely” verdicts (relaxed from 70% to avoid over-conservatism)
- 70% threshold for “highly likely” verdicts
- Combination with UNCERTAIN allows for nuanced assessment
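Normalization and verdict mapping can be sketched together. The code below is a minimal illustration under assumptions: the function names are hypothetical, the thresholds are the two values listed above, and the sketch ignores how UNCERTAIN is combined with the other states because the source does not spell that rule out.

```python
def normalize(dist: dict[str, float]) -> dict[str, float]:
    """Clip negatives to zero and rescale so the probabilities sum to 1."""
    clipped = {state: max(p, 0.0) for state, p in dist.items()}
    total = sum(clipped.values())
    if total == 0:
        # Degenerate case: fall back to a uniform distribution (assumption).
        return {state: 1.0 / len(clipped) for state in clipped}
    return {state: p / total for state, p in clipped.items()}


def map_verdict(
    dist: dict[str, float], likely: float = 0.65, highly_likely: float = 0.70
) -> str:
    """Map a TRUE/FALSE/UNCERTAIN distribution to a human-readable verdict."""
    p = normalize(dist)
    for state in ("TRUE", "FALSE"):
        if p[state] >= highly_likely:
            return f"Highly Likely {state.title()}"
        if p[state] >= likely:
            return f"Likely {state.title()}"
    return "Uncertain"


print(map_verdict({"TRUE": 0.8, "FALSE": 0.1, "UNCERTAIN": 0.1}))
```

Checking the higher threshold first ensures a claim at 0.8 is reported as highly likely rather than merely likely.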
6. Evidence Classification & Weighting
Source Categories
Evidence is classified into six categories, each with distinct validation power:
1. Scientific Sources (Highest Weight: 2.5-4.0)
Characteristics:
- Peer-reviewed journals
- Academic institutions (.edu domains)
- Research papers and studies
- Government research agencies
Examples:
- PubMed/NCBI publications
- Journal articles (Nature, Science, NEJM)
- University research papers
- NIH/CDC reports
2. Independent Sources (High Weight: 1.5-2.0)
Characteristics:
- Reputable news organizations
- Fact-checking organizations
- Independent analysis
- No financial stake in outcome
Examples:
- Reuters, AP News, BBC
- FactCheck.org, Snopes, PolitiFact
- Independent medical websites (Mayo Clinic, WebMD)
3. Government Sources (High Weight: 1.5-2.5)
Characteristics:
- .gov domains
- Official government publications
- Regulatory agency reports
Examples:
- FDA reports
- CDC guidelines
- NIH publications
- FTC consumer alerts
4. Press Releases (Negative Weight: -0.5 to -1.0)
Characteristics:
- Self-promotional content
- Newswire services
- Company press statements
- Marketing materials
Examples:
- PR Newswire articles
- Business Wire releases
- Company blog posts
- Marketing pages
5. YouTube Counter-Intelligence (Variable: 0.5-2.0)
Characteristics:
- Review videos
- Debunking content
- Investigation videos
- Consumer experiences
Weighting Factors:
- View count (100K+ = higher weight)
- Channel credibility
- Stance confidence
- Counter-signal strength
6. Other Web Sources (Standard Weight: 1.0)
Characteristics:
- General web pages
- Blogs
- Forums
- Social media
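The six weight ranges above can be collected into a lookup table. The sketch below is illustrative only: the category keys, the `strength` parameter, and the linear interpolation within each range are assumptions, since the source states the ranges but not how a value inside a range is chosen.

```python
# (weight at strength 0, weight at strength 1), using the ranges listed above.
WEIGHT_RANGES = {
    "scientific": (2.5, 4.0),
    "independent": (1.5, 2.0),
    "government": (1.5, 2.5),
    "press_release": (-0.5, -1.0),   # negative: counts against the claim
    "youtube_counter": (0.5, 2.0),
    "other": (1.0, 1.0),
}


def validation_power(category: str, strength: float = 0.5) -> float:
    """Interpolate a weight inside the category's range; strength is clamped to [0, 1]."""
    start, end = WEIGHT_RANGES[category]
    strength = min(max(strength, 0.0), 1.0)
    return start + (end - start) * strength


def total_validation_power(evidence: list[tuple[str, float]]) -> float:
    """Sum the weights of (category, strength) evidence items."""
    return sum(validation_power(category, strength) for category, strength in evidence)


# One strong study and one press release: the press release subtracts from the total.
print(total_validation_power([("scientific", 1.0), ("press_release", 1.0)]))
```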
Self-Referential Detection
The system identifies “self-referential” sources that cite the original video or its creators. These sources are:
- Separated into the press_releases category
- Given negative validation power
- Excluded from supporting evidence counts
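Separating self-referential sources before counting support might look like the sketch below. The `Source` type, the substring match on video ID and creator names, and the function names are all hypothetical; the source does not describe the actual matching logic.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Source:
    url: str
    text: str
    category: str


def is_self_referential(source: Source, video_id: str, creator_names: list[str]) -> bool:
    """True if the source mentions the analyzed video or one of its creators."""
    haystack = f"{source.url} {source.text}".lower()
    if video_id and video_id.lower() in haystack:
        return True
    return any(name and name.lower() in haystack for name in creator_names)


def partition_sources(
    sources: list[Source], video_id: str, creator_names: list[str]
) -> tuple[list[Source], list[Source]]:
    """Return (supporting candidates, sources moved to the press_releases category)."""
    supporting, press_releases = [], []
    for source in sources:
        if is_self_referential(source, video_id, creator_names):
            press_releases.append(replace(source, category="press_releases"))
        else:
            supporting.append(source)
    return supporting, press_releases
```

Only the first list would feed the supporting evidence count. The second list carries negative validation power, as described above.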
7. Verification Algorithm
Claim Verification Process
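For each extracted claim, the system follows the same verification process.
Multi-Factor Probability Calculation
The complete algorithm applies the five enhancement factors from Section 5 to the base distribution, normalizes the result, and maps it to a verdict.
8. Report Generation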
Report Structure
Each report contains:
- Executive Summary
  - Total claims analyzed
  - Verdict distribution (Highly Likely True, Likely False, etc.)
  - Overall truthfulness assessment
- Video Information
  - Title, channel, duration
  - Upload date and view count
  - Video embed/thumbnail
- Detailed Claims Analysis
  - Claim text with timestamp
  - Speaker identification
  - Verification verdict
  - Probability distribution
  - Detailed explanation
  - Source links
- Evidence Summary
  - Categorized by source type
  - Validation power indicators
  - Quality metrics
- Counter-Intelligence Section
  - Press releases identified
  - YouTube review analysis
  - Impact on truthfulness scores
Output Formats
HTML Report:
- Interactive source links (modal popups)
- Video embed
- Formatted tables
- Visual probability bars
Markdown Report:
- Clean, readable format
- Source citations with links
- Table-formatted claims analysis
- GitHub-compatible
JSON Report:
- Structured data format
- All probabilities and evidence
- Machine-readable
- API-friendly
Explanation Generation
For each claim, the system generates a human-readable explanation from a template.
9. Limitations & Future Work
Current Limitations
1. Language Support
- Current: English-only
- Impact: Cannot analyze non-English videos
- Mitigation: Rely on auto-translated transcripts (degraded quality)
2. Visual Understanding
- Current: Text-based analysis of visual content
- Impact: May miss subtle visual cues or demonstrations
- Mitigation: 1 FPS sampling captures most visual information
3. Context Understanding
- Current: Individual claim verification
- Impact: May miss broader narrative context
- Mitigation: Video analysis summary provides context
4. Search API Limitations
- Current: Dependent on Google Custom Search API
- Impact: Results limited by search algorithm bias
- Mitigation: Multiple query strategies, diverse sources
5. Real-Time Updates
- Current: Analysis is point-in-time
- Impact: New evidence published later not included
- Mitigation: Reports include generation timestamp
6. Subjective Claims
- Current: Best for factual, verifiable claims
- Impact: Opinion-based claims harder to verify
- Mitigation: Focus on CRAAP criteria, objective claims
Future Enhancements
- Multi-Language Support
  - Integrate translation APIs
  - Support major world languages
  - Cross-language evidence validation
- Enhanced Visual Analysis
  - Deeper computer vision integration
  - Object detection and recognition
  - Facial recognition for speaker verification
- Real-Time Monitoring
  - Continuous evidence updates
  - Alert system for new contradictory evidence
  - Live verification scores
- Community Validation
  - Crowd-sourced evidence submission
  - Expert reviewer network
  - Public challenge system
- Automated Fact-Checking
  - Integration with fact-checking databases
  - Automated source verification
  - Claim similarity detection
- Enhanced Counter-Intelligence
  - Social media analysis (Twitter, Reddit)
  - Forum scanning (specialized communities)
  - Academic preprint servers
- Explainability Improvements
  - Visual probability explanations
  - Interactive evidence exploration
  - Confidence intervals on scores
Conclusion
VerityNgn combines multimodal AI analysis, counter-intelligence techniques, and probabilistic reasoning to provide transparent, evidence-based truthfulness assessments of video content.
Key Strengths:
- Comprehensive multimodal analysis
- Transparent probability calculations
- Counter-intelligence innovation
- Evidence-based reasoning
- Detailed explanations
- Transparent methodology
- Source attribution
- Probabilistic rather than binary verdicts
- Acknowledgment of uncertainty
- Open documentation
Users are encouraged to:
- Review the evidence themselves
- Consider the probability distributions
- Understand the limitations
- Make informed judgments
For Technical Implementation Details, see:
- ARCHITECTURE.md: System architecture
- API_DOCUMENTATION.md: API reference
- DEPLOYMENT_GUIDE.md: Deployment instructions
- papers/verityngn_methodology.pdf: Academic paper
- papers/counter_intelligence_techniques.pdf: CI methods paper
- papers/probability_model.pdf: Mathematical foundations
This methodology documentation is maintained as part of the VerityNgn open-source project. For updates and contributions, visit the GitHub repository.
