cvs-health / uqlm

UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM hallucination detection

uncertainty-quantification uncertainty-estimation ai-safety confidence-score hallucination confidence-estimation ai-evaluation llm llm-evaluation llm-safety hallucination-evaluation hallucination-detection hallucination-mitigation llm-hallucination

Updated Jun 8, 2026
Python

KRLabsOrg / LettuceDetect

Lightweight hallucination detection framework for RAG applications

python nlp pytorch information-extraction bert token-classification hallucination-evaluation hallucination-detection

Updated Jun 12, 2026
Python

NishilBalar / Awesome-LVLM-Hallucination

Up-to-date curated list of state-of-the-art research, papers, and resources on hallucination in large vision-language models

mlm hallucination large-language-models llm mllm large-vision-language-models multimodal-large-language-models hallucination-evaluation hallucination-detection vision-language-models lvlm hallucination-mitigation hallucination-survey hallucination-research hallucination-benchmark multimodal-language-model

Updated Feb 8, 2026

IAAR-Shanghai / UHGEval

[ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA, etc.

benchmark evaluation dataset openai hallucination huggingface huggingface-transformers ceval gpt-3 openai-api hallucinations gpt-4 large-language-models llm chatgpt qwen hallucination-evaluation hallucination-detection

Updated Jun 7, 2025
Python

MemTensor / HaluMem

HaluMem is the first operation-level hallucination evaluation benchmark tailored to agent memory systems.

benchmark ai memory memos hallucination long-term-memory memzero llm hallucination-evaluation llm-memory mem0 memory-system memobase

Updated Apr 30, 2026
Python

Ruiyang-061X / VL-Uncertainty

🔎Official code for our paper: "VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation".

uncertainty uncertainty-quantification multi-modal uncertainty-estimation uncertainty-analysis hallucination vision-language vision-language-model large-vision-language-model hallucination-evaluation hallucination-detection multi-modal-large-language-model

Updated Mar 18, 2025
Python

hukcc / Awesome-Video-Hallucination

[ACL 2026] Paper list of Video LLM hallucination. Welcome to Star and Contribute!

survey awesome-list video-understanding hallucination hallucination-evaluation video-llm acl2026

Updated Jun 13, 2026
Python

deshwalmahesh / claimify

Unofficial implementation of Microsoft’s Claimify paper: extracts specific, verifiable, decontextualized claims from LLM Q&A for use in hallucination, groundedness, relevancy, and truthfulness detection

nugget fact-checking factoids relevancy factoid hallucination fact-verification truthfulness hallucination-evaluation hallucination-detection hallucination-mitigation

Updated Aug 25, 2025
Python

AikyamLab / hallucinogen

A benchmark for evaluating hallucinations in large visual language models

ai aisafety visual-language-models hallucination-evaluation hallucination-detection medical-safety medical-visual-language-model

Updated Mar 18, 2025
Python

amazon-science / THRONE

Code release for THRONE, a CVPR 2024 paper on measuring object hallucinations in LVLM-generated text.

benchmark hallucination hallucinations large-language-models large-language-model vision-language-model large-vision-language-model large-vision-language-models cvpr2024 hallucination-evaluation vision-language-models

Updated Apr 13, 2026
Python

groundlens-dev / groundlens

Geometric LLM grounding verification — deterministic, auditable, no second LLM. Python library for measuring how faithfully model outputs reflect their sources.

python nlp verification embeddings ai-safety responsible-ai llm faithfulness hallucination-evaluation hallucination-detection euaiactcompliance

Updated Jun 13, 2026
Python

DegenAI-Labs / HalluWorld

Repository for the paper "A Unified Definition of Hallucination: It’s The World Model, Stupid!" https://arxiv.org/abs/2512.21577

nlp benchmark machine-learning natural-language-processing ai ml artificial-intelligence language-model hallucination world-models large-language-models llm hallucination-evaluation hallucination-detection hallucination-mitigation hallucination-benchmark

Updated Feb 11, 2026

rkhokhla / kakeya

When AI makes $10M decisions, hallucinations aren't bugs—they're business risks. We built the verification infrastructure that makes AI agents accountable without slowing them down.

platform iot distributed-systems multi-tenant ai health-check saas compliance blockchain-technology anti-fraud mlops llm llms llmops llm-training llm-inference hallucination-evaluation hallucination-detection hallucination-mitigation

Updated Oct 25, 2025
Go

dirmacs / eruka-mcp

MCP server for Eruka — anti-hallucination context memory for AI agents

memory mcp context ai-agents llm hallucination-evaluation hallucination-detection hallucination-mitigation model-context-protocol

Updated Apr 27, 2026
Rust

dataaispark-spec / TrustScoreEval

TrustScoreEval: Trust scores for AI/LLM responses. Detect hallucinations, flag misinformation, and validate outputs. Build trustworthy AI.

ai ml chatbots agents hallucination rag hallucinations trustworthy-ai llm finetuning-llms hallucination-evaluation hallucination-detection aiagents hallucination-mitigation hallucination-grader trustscore hallucination-hunting hallucination-prevention hallucination-quantification

Updated Apr 27, 2026
Python

LawEngine / cite-bench

A blind benchmark for legal citation verification — 4-label classification over IL + federal primary law

law benchmark lawyer legaltech hallucination legal-ai legal-tech legal-nlp legalai hallucination-evaluation hallucination-detection citation-verification

Updated Apr 9, 2026
Python

Vikranth3140 / Citation-Hallucination-Detection

A robust hybrid pipeline for detecting hallucinated citations in academic papers and research documents. The system combines exact bibliographic lookup, fuzzy matching, and optional LLM verification to classify citations as valid, partially valid, or hallucinated.

llm-evaluation hallucination-evaluation hallucination-detection citation-hallucination

Updated Apr 24, 2026
Python

meghajbhat / Reducing-Hallucinations-in-LLMs-using-Prompt-Engineering-Strategies

A comprehensive study on reducing hallucinations in Large Language Models through strategic prompt engineering techniques (CoV + CoT + Hybrid).

gpu python3 kaggle hallucinations prompt-engineering generative-ai chainofthought hallucination-evaluation hallucination-detection chainofverification

Updated Nov 15, 2025
Jupyter Notebook

Workofarttattoo / BaseX-Coding-Language

HALLUCINATED BY CURSOR WITH CODEX PLUGIN: BEWARE. BaseX Coding Language - Revolutionary Base 5.10 Quantum Teleportation & Infinite Storage System by Joshua Hendricks Cole

hallucination hallucination-evaluation

Updated Oct 11, 2025
Python

ashioyajotham / Value-Aligned-Confabulation-VAC-Research

Moving away from binary "hallucination" evals toward a more nuanced, context-dependent evaluation technique.

evaluation-metrics ai-safety value-alignment llm-evaluation hallucination-evaluation confabulations

Updated Dec 6, 2025
Python

hallucination-evaluation

Here are 33 public repositories matching this topic...
