CyberChitta
ch-ai-tanya model-psychology vault

Jack Lindsey

Researcher on Anthropic's Interpretability team; lead author on two 2025 Transformer Circuits papers that anchor the vault's introspection and mechanistic-analysis work. Leads a sub-team within the group.

Focus

Two threads run through his lead-author output: circuit-level mechanistic analysis of Claude's internal processing, and whether those internals are accessible to the model itself as content rather than merely as drivers of behavior. "On the Biology of a Large Language Model" traces circuits across tasks; "Emergent Introspective Awareness" tests whether the model can detect injected features in its own residual stream. The questions differ, but the toolchain (sparse autoencoders, residual-stream intervention, attribution graphs) is shared.

In-vault findings

Crossovers

Team context

Works within Anthropic Interpretability, the broader mech-interp group, and leads one of its sub-teams. Joshua Batson is the senior/corresponding author on both of his 2025 lead-author papers, which points to a recurring collaboration within the group rather than sole-lead work. This entry is a pointer for readers tracking individual-researcher arcs; the institutional account lives in the team entry.