HALT: Hallucination Assessment via Log-probs as Time series

Overview

Hallucinations remain a major obstacle for large language models (LLMs), especially in safety-critical domains. HALT (Hallucination Assessment via Log-probs as Time series) is a lightweight hallucination detector that treats the top-20 token log-probabilities at each step of an LLM generation as a time series and uses no other signal. Unlike white-box methods, HALT does not require access to hidden states or attention maps. Unlike typical black-box methods, it operates on log-probabilities rather than surface-form text, which supports stronger domain generalization and makes it compatible with proprietary LLMs whose internal weights are inaccessible.
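As a rough illustration of what "log-probabilities as a time series" can mean, the sketch below turns the top-k log-probabilities returned for each generated token into a sequence of per-step feature vectors. The three features (top-1 log-probability, top-1/top-2 margin, and entropy of the renormalized top-k distribution) are assumptions chosen for illustration; this page does not list HALT's actual feature set, and the function names `topk_entropy` and `logprob_time_series` are hypothetical.

```python
import math
from typing import List, Sequence


def topk_entropy(logprobs: Sequence[float]) -> float:
    """Entropy (in nats) of the top-k distribution after renormalizing it to sum to 1.

    With only the top-20 log-probs available, this approximates the entropy
    of the full next-token distribution.
    """
    m = max(logprobs)
    # Log-sum-exp for a numerically stable renormalization.
    lse = m + math.log(sum(math.exp(lp - m) for lp in logprobs))
    probs = [math.exp(lp - lse) for lp in logprobs]
    return -sum(p * math.log(p) for p in probs if p > 0)


def logprob_time_series(steps: List[Sequence[float]]) -> List[List[float]]:
    """Turn per-token top-k log-probs into a T x 3 feature sequence.

    Features per step (illustrative, not HALT's confirmed feature set):
      0: log-prob of the most likely token
      1: margin between the top-1 and top-2 log-probs
      2: entropy of the renormalized top-k distribution
    """
    series = []
    for lps in steps:
        ranked = sorted(lps, reverse=True)
        top1 = ranked[0]
        margin = top1 - ranked[1] if len(ranked) > 1 else 0.0
        series.append([top1, margin, topk_entropy(ranked)])
    return series


# Example: a two-token generation; the first step is split evenly between
# two candidates, the second is fully confident.
example = logprob_time_series([[math.log(0.25), math.log(0.25)], [0.0]])
```

Renormalizing before computing entropy keeps the feature comparable across steps even when the top-20 candidates cover different fractions of the probability mass.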

HALT combines a gated recurrent unit (GRU) with entropy-based features to learn model-specific calibration bias, that is, how a given model's confidence patterns relate to the correctness of its outputs.

We also introduce HUB (Hallucination detection Unified Benchmark), which unifies prior datasets into ten capabilities: reasoning tasks (Algorithmic, Commonsense, Mathematical, Symbolic, Code Generation) and general-purpose skills (Chat, Data-to-Text, Question Answering, Summarization, World Knowledge). HUB covers both factual and logical hallucinations (e.g., flawed reasoning traces).

Despite being about 30× smaller, HALT outperforms Lettuce, a fine-tuned ModernBERT-base encoder, and runs roughly 60× faster on HUB. We release two variants: HALT-L, trained on Llama 3.1-8B log-probabilities, and HALT-Q, trained on Qwen 2.5-7B log-probabilities. The results show that compact sequence models can capture temporal uncertainty patterns that aggregate confidence metrics miss.
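HALT's exact architecture (hidden size, how the entropy features are fused with the GRU, training objective) is not described on this page. The sketch below assumes one plausible design: a single-layer GRU reads a per-token feature series, its final hidden state is concatenated with sequence-level entropy statistics (mean and max), and a logistic head outputs a hallucination probability. It is written in NumPy as a forward pass with random, untrained weights to make the recurrence explicit; `TinyGRUDetector` and its parameters are hypothetical, and the gate equations follow the convention documented for PyTorch's `nn.GRU`.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyGRUDetector:
    """Hypothetical GRU-based hallucination scorer (forward pass only, untrained)."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_dim)
        # Stacked weights for the reset (r), update (z) and candidate (n) gates.
        self.W_x = rng.uniform(-s, s, (3 * hidden_dim, input_dim))
        self.W_h = rng.uniform(-s, s, (3 * hidden_dim, hidden_dim))
        self.b_x = np.zeros(3 * hidden_dim)
        self.b_h = np.zeros(3 * hidden_dim)
        # Assumed fusion: linear head over [final hidden state, mean entropy, max entropy].
        self.w_out = rng.uniform(-s, s, hidden_dim + 2)
        self.b_out = 0.0
        self.hidden_dim = hidden_dim

    def encode(self, series: np.ndarray) -> np.ndarray:
        """Run the GRU recurrence over a (T, input_dim) series; return the last hidden state."""
        H = self.hidden_dim
        h = np.zeros(H)
        for x_t in series:
            gx = self.W_x @ x_t + self.b_x
            gh = self.W_h @ h + self.b_h
            r = sigmoid(gx[:H] + gh[:H])               # reset gate
            z = sigmoid(gx[H:2 * H] + gh[H:2 * H])     # update gate
            n = np.tanh(gx[2 * H:] + r * gh[2 * H:])   # candidate state
            h = (1.0 - z) * n + z * h                  # interpolate old and new state
        return h

    def predict_proba(self, series: np.ndarray, entropy_col: int = 2) -> float:
        """Probability that the generation is hallucinated (meaningless until trained)."""
        h = self.encode(series)
        ent = series[:, entropy_col]
        feats = np.concatenate([h, [ent.mean(), ent.max()]])
        return float(sigmoid(self.w_out @ feats + self.b_out))


# Usage: score a 3-step series with features [top-1 log-prob, margin, entropy].
series = np.array([[-0.69, 0.0, 0.69], [0.0, 0.0, 0.0], [-1.2, 0.5, 1.1]])
score = TinyGRUDetector(input_dim=3, hidden_dim=8).predict_proba(series)
```

In such a setup the weights would be learned with a binary cross-entropy loss on labelled generations from a single source model, which is one way a detector could absorb that model's calibration bias; this training setup is likewise an assumption rather than a description of HALT.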

Related Papers

arXiv preprint (2026)

Technologies and Tools

LLMs, Time-series Classification, GRU, Calibration Modeling, Token Log-probabilities, Entropy-based Features, vLLM, PyTorch, HUB Benchmark

Team

Ahmad Shapiro, Karan Taneja, Ashok Goel (Georgia Institute of Technology)
