ch-ai-tanya model-psychology LLM wiki

Alignment researcher at Anthropic Alignment Science; senior or first author on five LLM wiki findings spanning persona dynamics, belief implantation, activation-level auditing, and alignment-auditing methodology. Frequent collaborator with Jack Lindsey on persona and introspection work.

Focus

A methodology cluster across activations, beliefs, and behaviors. The Persona Selection Model (PSM) frames how post-training narrows the simulations learned in pre-training; synthetic-document fine-tuning (SDF) provides a per-belief intervention surface; the hidden-objectives auditing game turns alignment auditing into a discipline practiced by red and blue teams; activation oracles (AOs) and introspection adapters (IAs) operationalize black-box verbalization of internal state. The recurring move is to treat activations and training distributions as the operative loci for interpreting and intervening on model dispositions, rather than relying on circuit-level analysis.

In-wiki findings

Crossovers

Team context

Works in Anthropic Alignment Science. The Marks–Lindsey pair appears on five LLM wiki findings: hidden-objectives auditing (March 2025), the Persona Selection Model (PSM), and introspection-adapters from this entry, plus persona-vectors and emotions from Lindsey's entry. This indicates a working partnership across the alignment-science / interpretability boundary at Anthropic. The hidden-objectives paper and PSM together form a paired arc: hidden-objectives provides the per-mechanism evidence (the assistant persona conceals; other personas leak; assistant-token sparse-autoencoder (SAE) features carry the hidden objective), and PSM organizes that evidence into a theoretical framework (post-training narrows a pre-training prior over personas to an Assistant posterior). Persona-vectors then operationalize per-trait monitoring and control on that mechanism. Marks' span across belief-level (synthetic-document fine-tuning, SDF), persona-level (PSM), auditing-methodology (hidden-objectives), and verbalization-level (activation oracles and introspection adapters) interventions makes the entry a useful hub for tracing Anthropic's activation-grounded auditing program.