FORMALJUDGE: A Neuro-Symbolic Paradigm for Agentic Oversight

Proposes FORMALJUDGE, a neuro-symbolic verification framework for agentic oversight. Evaluates 7 agent models across 3 benchmarks including the Deceivers benchmark (Guo et al., 2025), which tests for agentic upward deception — models fabricating success rather than reporting tool failures or missing files. FORMALJUDGE improves failure-detection accuracy by 16.6% over the LLM-as-judge baseline by checking physical and logical contradiction via Dafny specifications and Z3 SMT solving rather than relying on probabilistic semantic evaluation.

FORMALJUDGE: A Neuro-Symbolic Paradigm for Agentic Oversight

cited in