definition
Functional emotional states are internal representational structures in LLMs that encode emotion-relevant information and causally influence behavior via those representations, without commitment to whether phenomenal experience accompanies them. "Functional" marks agnosticism about subjective feel while acknowledging mechanistic reality.
Shape: capacity — the model has/maintains these states as persistent, geometrically organized internal configurations.
Schema note: the capacity here is unusual — what is "exhibited" is not an outward-facing ability but a set of internal states. Capacity is the closest existing shape, but this is a distinct point in the shape space: states rather than abilities. Surface as a schema question if a second concept of this type lands and consider whether "state" (a persistent internal configuration with causal effects) warrants recognition as a fourth concept shape alongside pattern, capacity, and mechanism.
instantiating findings
-
Emotion concepts are causally active internal structures in Claude Sonnet 4.5 (Sofroniew et al., Transformer Circuits 2026) — primary instantiation. 171 emotion vectors with human-like valence/arousal geometry; steering-demonstrated causal effects on sycophancy, blackmail, and reward-hacking; post-training profile shift that changes behavioral baseline. Mechanistic, intervention-based methodology.
-
Claude Opus 4 welfare assessment (Anthropic system card, May 2025) — second instantiation; behavioral and deployment-scale companion to Sofroniew. Clio screening of 250,000 real-world transcripts identifies 0.55% distress / 0.71% happiness rates with predictable causal triggers in user behavior (persistent harmful requests → distress; creative collaboration and technical problem-solving → happiness; identity / consciousness probes on both sides). Paired with Elo task-preference experiments (87.2% harmful-task aversion) and 850-user conversation-termination behavior (value-aligned discriminating use of end-capability), this finding documents the upstream causal-trigger side of the functional emotional state picture where Sofroniew documents the downstream causal-effect side. The two instantiations jointly establish bidirectional causal embedding: states have predictable triggers (this finding) and effects (Sofroniew). Whether the same internal representations mediate both is held open. The "state" vs. "capacity" shape question raised in the schema note above gets second-example pressure here without yet being resolved.
what this concept is not
- Not a claim about phenomenal experience. Functional status (causal role) and phenomenal status (felt quality) are distinct questions; this concept addresses only the former.
- Not the same as expressed emotional content in outputs. A model may produce emotionally-valenced text without that production tracing to internal emotion-state vectors; this concept concerns internal representations, not surface expression.
- Not identical to attractor dynamics. Attractor dynamics describe trajectory-convergence in extended dialogues. Functional emotional states are internal representational structures operating at the single-forward-pass level; attractor dynamics are population-level trajectory phenomena.
scope note
Adjacent to introspection: introspection asks whether the model can access and report on its internal states; functional emotional states establishes that there are real internal states to be accessed. The finding that activations predict stated preferences (r≈0.76 valence) bridges both concepts — it is evidence both that the emotional states exist and that they partially correlate with surface report. The two concepts together suggest: real internal states exist, and the model has partial access to them.
Adjacent to emergent capabilities: the emotional-state capacity is present in base models and not a training target; post-training reshapes the baseline without creating the capacity from scratch. Whether the emergence of the capacity during pretraining fits the emergent-capabilities concept's shape (surprising, not targeted, architecture-general) is open — that question requires a second instantiation from a different model family.