Jack Lindsey

Researcher on Anthropic's Interpretability team; lead author on two 2025 Transformer Circuits papers that anchor the vault's introspection and mechanistic-analysis work. Leads a sub-team within the group.

Focus

Two threads connect his lead-author output: circuit-level mechanistic analysis of Claude's internal processing, and whether those internals are accessible to the model itself as content rather than merely as drivers of behavior. "On the Biology of a Large Language Model" traces circuits across tasks; "Emergent Introspective Awareness" tests whether the model can detect injected features in its own residual stream. The questions differ, but the toolchain (sparse autoencoders, residual-stream intervention, attribution graphs) is shared.

In-vault findings

Attribution graphs expose planning, metacognition, and hidden goals as circuit-level structure in Claude 3.5 Haiku (Lindsey et al., March 2025). Lead contributor. Ten case studies surface forward planning in poetry, language-independent operations, arithmetic that does not follow the model's stated algorithm, metacognitive circuits, and hidden-goal representation.
Concept injection reveals introspective access in Claude (Lindsey, Turner, Dupré la Tour, Templeton, Marcus, Batson, Ameisen, 2025). Lead author. Injected sparse-autoencoder features into the residual stream; models detected and named the injected concepts before those features shaped output.

Crossovers

Natural emergent misalignment from reward hacking in production RL (MacDiarmid et al., 2025). Co-author. Cross-team collaboration with Alignment Science and Redwood Research; not an interpretability paper. Shows that Lindsey's co-authorship extends beyond pure interpretability work into studies of alignment behavior.

Team context

Works within Anthropic Interpretability, the broader mech-interp group; leads one of its sub-teams. The senior/corresponding author on both of his 2025 lead-author papers is Joshua Batson, which points to a recurring lead-author and senior-author pairing within the group rather than sole-lead work. This entry is a pointer for readers tracking individual-researcher arcs; the institutional account lives in the team entry.