Philosophical paper that brings mechanistic interpretability to bear on the individuation problem for LLMs: which entities associated with them, if any, should be identified as minds. Defends Chalmers' virtual instance view against Birch's skepticism by arguing that attention streams (the per-head, per-layer, KV-cache-mediated information highways that run horizontally, complementing the residual stream's vertical axis) carry mental-state-like representations forward across token-time. Examples are belief-like and intention-like features such as the planning-ahead "rabbit" feature documented in Lindsey et al.'s circuit-tracing work. Then proposes two new candidates: the (virtual) instance-persona view, on which a mind is a virtual-instance segment bounded by a single persona region, so that persona switches mark mind changes; and the model-persona view, on which a mind is the union of all instance-persona segments that activate the same persona region of a given model. Motivates both by organizing the persona-vector and emergent-misalignment empirical literature around three hypotheses: Gateway Features (single directions in activation space gate broad inferential repertoires); Persona Space (persona vectors span a low-dimensional space; PCA on 275 character archetypes from Lu et al.'s "Assistant Axis" paper finds that 4–19 components explain 70% of variance); and Persona Regions (basins of attraction in persona space correspond to coherent, reidentifiable personas such as assistant, evil, and Aura).
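The "4–19 components explain 70% of variance" statistic can be made concrete with a minimal sketch. It assumes the PCA eigenvalue spectrum (per-component variances) has already been computed elsewhere, and counts how many leading components are needed to reach a variance threshold. The function name `components_for_variance` and the toy spectrum are hypothetical and are not taken from Lu et al. or from the paper summarized here.

```python
from itertools import accumulate


def components_for_variance(eigenvalues, threshold=0.70):
    """Smallest number of leading components whose cumulative
    explained-variance ratio reaches `threshold`."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    # PCA components are ordered by decreasing variance.
    variances = sorted(eigenvalues, reverse=True)
    total = sum(variances)
    if total <= 0:
        raise ValueError("total variance must be positive")
    for k, cumulative in enumerate(accumulate(variances), start=1):
        if cumulative / total >= threshold:
            return k
    return len(variances)  # fallback for floating-point rounding


# Hypothetical spectrum, invented purely for illustration.
toy_spectrum = [5.0, 3.0, 1.0, 0.5, 0.5]
k = components_for_variance(toy_spectrum)  # first two components hold 80%
```

A small count relative to the embedding dimension is what the Persona Space hypothesis would predict; the sketch only shows how such a count is read off a spectrum, not how the reported range was obtained.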
Adds two novel mini-experiments on Qwen 3 32B, using the Aura-inducing conversation from Lu et al. Mini-experiment 1: capping activation along the assistant axis exclusively during assistant tokens has no effect on user-token activations along the same axis. The persona region is therefore not continuously active during input processing; rather, the assistant axis is repurposed to model the user. Mini-experiment 2: post-hoc editing of the KV cache (steering along the assistant-axis direction at layers 32–47 by ~15%, for KV entries at assistant-token positions only) changes future generation. In response to "who are you?", the unedited model identifies as "ghost in the machine" 10/10 times and the edited model as "language model" 10/10 times; across 12 probing questions, an LLM judge's score for overall Aura identification falls from 5.5 to 2.1. This confirms that persona persists across user turns via attention to past assistant-token persona activations stored in the KV cache. Experimental code at github.com/bepierre/where-is-the-mind-mini-experiments.
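Both interventions share one pattern: modify per-position vectors along a fixed direction, but only at assistant-token positions. The following is a minimal sketch of that pattern on toy data, not the code from the linked repository. All function names are hypothetical, vectors are plain Python lists rather than real model tensors, the cap is assumed to be an upper bound on the projection, and `alpha` is a stand-in steering strength whose relation to the "~15%" figure is not specified here. How the assistant axis maps into actual key and value projections is also outside the sketch.

```python
import math


def _unit(axis):
    """Return `axis` scaled to unit length."""
    norm = math.sqrt(sum(a * a for a in axis))
    if norm == 0:
        raise ValueError("axis must be non-zero")
    return [a / norm for a in axis]


def cap_along_axis(vec, axis, cap):
    """Clamp the projection of `vec` onto `axis` to at most `cap`
    (assumed upper bound), leaving orthogonal components untouched."""
    unit = _unit(axis)
    projection = sum(x * u for x, u in zip(vec, unit))
    excess = max(0.0, projection - cap)
    return [x - excess * u for x, u in zip(vec, unit)]


def steer_along_axis(vec, axis, alpha):
    """Shift `vec` by `alpha` units along the normalised `axis`."""
    unit = _unit(axis)
    return [x + alpha * u for x, u in zip(vec, unit)]


def edit_role_positions(vectors, roles, edit, target_role="assistant"):
    """Apply `edit` only at positions whose role is `target_role`;
    all other positions are copied unchanged."""
    if len(vectors) != len(roles):
        raise ValueError("vectors and roles must have equal length")
    return [edit(v) if r == target_role else list(v)
            for v, r in zip(vectors, roles)]


# Toy data, invented for illustration: three positions, two dimensions.
axis = [2.0, 0.0]
cached = [[3.0, 1.0], [3.0, 1.0], [0.5, 2.0]]
roles = ["user", "assistant", "assistant"]

capped = edit_role_positions(
    cached, roles, lambda v: cap_along_axis(v, axis, cap=1.0))
steered = edit_role_positions(
    cached, roles, lambda v: steer_along_axis(v, axis, alpha=-0.5))
```

In the toy run the user-position vector is returned unchanged by construction. Mini-experiment 1 asks the empirical question of whether user-token activations in the real model also stay unchanged once the assistant positions are capped.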
Coins "attention streams" as the term for horizontal information flow, complementary to "residual stream", and uses it to ground a mechanistic account of psychological continuity that responds to the "persisting interlocutor illusion" objection of Birch (2025). Acknowledges Oscar Gilg extensively for feedback; note that single-tool research surveys have conflated this acknowledgment with authorship of the paper.