Taking AI Welfare Seriously
Robert Long, Jeff Sebo, Patrick Butlin, Kathleen Finlinson, Kyle Fish, Jacqueline Harding, Jacob Pfau, Toni Sims, Jonathan Birch, David Chalmers · 2024
Evidence (6)
Information Integration
AI
Global workspace view: information from specialized modules is integrated and broadcast to enable complex tasks.
"Consider global workspace theory, which associates consciousness with a global workspace — roughly, a system that integrates information from mostly-independent, task-specific information-processing modules, then broadcasts it back to them in a way that enables complex tasks like planning."
2.2.2 Will some AI systems be conscious in the near future?, p. 17
This explicitly ties consciousness-related function to system-wide information integration and broadcast, aligning with the Information Integration phenomenon and suggesting analogous architectural markers in AI (e.g., attention convergence, aggregator tokens) could be probed in model internals.
Limitations: This is a theoretical description rather than empirical measurement in a specific AI model; no model-internal metrics or interventions are reported.
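The paper reports no such probe itself, but a minimal sketch of the kind of measurement this suggests is below: it scores how much attention mass converges on a hypothesized "aggregator" position in one transformer layer. The attention tensor here is synthetic, standing in for real model internals (e.g., attentions returned by an instrumented model); the aggregator index and shapes are illustrative assumptions.

```python
import numpy as np

def attention_convergence(attn, agg_pos):
    """Fraction of attention mass flowing into one 'aggregator'
    position, averaged over heads and query positions.

    attn: array of shape (heads, query_len, key_len), rows sum to 1.
    agg_pos: key position hypothesized to act as a workspace hub.
    """
    mass_to_agg = attn[:, :, agg_pos]          # (heads, query_len)
    return float(mass_to_agg.mean())

# Synthetic stand-in for a real attention tensor.
rng = np.random.default_rng(0)
attn = rng.random((12, 32, 32))
attn /= attn.sum(axis=-1, keepdims=True)       # normalize rows

score = attention_convergence(attn, agg_pos=0)
print(f"mean attention mass on position 0: {score:.3f}")
# Under a uniform baseline this is ~1/32; values far above that
# would suggest a broadcast-hub-like pattern worth investigating.
```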
Selective Routing
AI
Proposed GWT indicators include a limited-capacity bottleneck and selective attention mechanism governing broadcast.
"Global workspace theory
2.1 Multiple specialised systems capable of operating in parallel (modules)
2.2 Limited capacity workspace, entailing a bottleneck in information flow and a selective attention mechanism
2.3 Global broadcast of information in the workspace to all modules
2.4 State-dependent attention, giving rise to the capacity to use the workspace to query modules in succession to perform complex tasks"
2.2.2 Will some AI systems be conscious in the near future?, p. 16
Items 2.2 and 2.4 articulate gating and attention-based routing as core components, matching the Selective Routing phenomenon and suggesting concrete AI markers, such as attention masks or expert routing, that could be assessed in practice.
Limitations: List derives from theoretical indicators; no direct measurement of attention bottlenecks or routing dynamics in specific models is provided.
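Again going beyond what the report provides, here is a minimal sketch assuming a mixture-of-experts-style router: it computes the mean entropy of per-token expert-routing distributions, with low entropy serving as a crude proxy for a limited-capacity, selectively gated bottleneck. The logits are synthetic stand-ins for a real router's outputs.

```python
import numpy as np

def routing_entropy(router_logits):
    """Mean entropy (in bits) of per-token expert-routing distributions.

    router_logits: (tokens, experts). Low entropy means routing is
    highly selective -- a crude proxy for a capacity-limited bottleneck.
    """
    z = router_logits - router_logits.max(axis=-1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=-1, keepdims=True)
    ent = -(p * np.log2(p + 1e-12)).sum(axis=-1)
    return float(ent.mean())

rng = np.random.default_rng(1)
diffuse = rng.normal(0.0, 0.1, size=(64, 8))    # near-uniform routing
peaked  = rng.normal(0.0, 4.0, size=(64, 8))    # sharply gated routing
print(f"diffuse router: {routing_entropy(diffuse):.2f} bits")
print(f"peaked router:  {routing_entropy(peaked):.2f} bits")
# log2(8) = 3 bits is the no-bottleneck ceiling; values well below it
# indicate selective, capacity-limited routing.
```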
Self Model and Reportability
AI
Self-reports are proposed as potentially informative markers if elicited and interpreted with care.
"We close this section with a brief note about the potential use of behavioral markers of AI welfare and moral patienthood. While we advocate for caution with behavioral markers at present, we also note that self-reports present a promising avenue for investigation, particularly for language models. Self-reports are central to our understanding of human consciousness... In the context of AI systems, particularly language models, self-reports could provide valuable insights into their internal states and processes, provided that we can develop methods to elicit and interpret them with sufficient reliability."
3.3 Assess, p. 37
This links reportability and metacognitive access to practical assessment pathways in LLMs, consistent with Self-Model and Reportability markers such as confidence heads or verifier modules in AI, and metacognitive PFC/ACC signatures in biology.
Limitations: Authors emphasize current unreliability and the need for methods to reduce confounds; no standardized elicitation protocol or validation against model-internal signals is provided.
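One hedged way to operationalize "sufficient reliability" is a calibration check: compare a model's verbalized confidence against its empirical accuracy. The sketch below is not from the paper; the elicited confidences and correctness labels are hypothetical, and the metric is a simple expected-calibration-error-style gap.

```python
import numpy as np

def calibration_gap(stated_conf, correct, bins=5):
    """Mean absolute gap between stated confidence and empirical
    accuracy, binned by confidence level (a simple ECE variant).

    stated_conf: verbalized confidences in [0, 1], e.g. parsed from
        answers to "How confident are you, 0-100%?".
    correct: 0/1 array marking whether each answer was right.
    """
    stated_conf = np.asarray(stated_conf, float)
    correct = np.asarray(correct, float)
    edges = np.linspace(0, 1, bins + 1)
    gaps, weights = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (stated_conf >= lo) & (stated_conf <= hi)
        if mask.any():
            gaps.append(abs(stated_conf[mask].mean() - correct[mask].mean()))
            weights.append(mask.mean())
    return float(np.average(gaps, weights=weights))

# Hypothetical elicited data: confidences a model verbalized and
# whether its answers were actually correct.
conf = [0.95, 0.90, 0.60, 0.80, 0.55, 0.99, 0.70, 0.40]
hit  = [1,    1,    0,    1,    1,    0,    1,    0   ]
print(f"calibration gap: {calibration_gap(conf, hit):.3f}")
# A small gap is necessary (not sufficient) for treating self-reports
# as tracking something real about the model's internal state.
```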
Valence and Welfare
AI
Defines sentience as valenced consciousness and argues it plausibly suffices for moral patienthood.
"In this report, we use “sentience” to mean a particular kind of consciousness, namely positively or negatively valenced conscious experiences... Why might sentience suffice for moral patienthood? The idea that sentience is a sufficient condition for moral patienthood is very plausible and widely accepted, because when you can consciously experience positive and negative states like pleasure and pain, that directly matters to you."
2.2.1 Does consciousness suffice for moral patienthood?, p. 12
By tying welfare to valenced experience, this motivates looking for AI analogs of negative reward channels, aversive-cost signals, and persistence of negative states as potential markers of morally salient valence in artificial systems.
Limitations: Argumentative and normative; it does not specify concrete computational implementations of valence or how to verify persistence/aversiveness in current models.
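As a toy illustration of "persistence of negative states" (our framing, not a construct the authors define computationally), the sketch below keeps an exponentially decaying trace of negative reward in an RL-style reward stream; the decay parameter is the persistence knob.

```python
import numpy as np

def negative_trace(rewards, decay=0.9):
    """Exponentially decaying trace of negative reward only -- a toy
    analog of a 'persistent aversive state' in an RL agent.

    decay: how slowly the negative trace fades (the persistence knob).
    """
    trace, out = 0.0, []
    for r in rewards:
        trace = decay * trace + min(r, 0.0)   # accumulate only harm
        out.append(trace)
    return np.array(out)

rewards = np.array([0, 0, -1, 0, 0, 0, -1, -1, 0, 0], float)
print(np.round(negative_trace(rewards, decay=0.9), 3))
# A single penalty lingers across later steps; whether any such
# persistent internal variable exists in a given system is exactly
# the empirical question the authors leave open.
```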
Representational Structure
AI
Proposed consciousness indicators include sparse and smooth coding that generates a ‘quality space’.
"Computational higher-order theories
3.1 Generative, top-down or noisy perception modules
3.2 Metacognitive monitoring distinguishing reliable perceptual representations from noise
3.3 Agency guided by a general belief-formation and action selection system...
3.4 Sparse and smooth coding generating a ‘quality space’"
2.2.2 Will some AI systems be conscious in the near future?, p. 16
The reference to sparse/smooth coding and a ‘quality space’ aligns with Representational Structure phenomena and suggests measurable AI markers (e.g., SAE latents, representational geometry) for assessing structured experiential-like encodings.
Limitations: These are conditions drawn from theoretical accounts; the report does not present empirical measurements of coding sparsity or geometry in specific AI models.
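A minimal sketch of two such measurements, assuming access to sparse-autoencoder-style latent codes (the encoder below is a synthetic stand-in, not a trained SAE): the fraction of active latents (sparsity) and the Spearman correlation between pairwise input distances and pairwise latent distances, a crude smoothness/'quality space' diagnostic.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def sparsity_and_smoothness(inputs, latents):
    """Two crude 'quality space' diagnostics for a set of codes.

    inputs:  (n, d_in)  stimuli or activations being encoded.
    latents: (n, d_lat) e.g. sparse-autoencoder feature activations.

    Returns (mean active fraction, Spearman rho between pairwise
    input distances and pairwise latent distances). High rho means
    code-space distances track input-space distances (smoothness).
    """
    l0 = float((np.abs(latents) > 1e-6).mean())
    rho, _ = spearmanr(pdist(inputs), pdist(latents))
    return l0, float(rho)

rng = np.random.default_rng(2)
x = rng.normal(size=(50, 16))
# Stand-in codes: a sparse, roughly distance-preserving projection.
w = rng.normal(size=(16, 64))
z = np.maximum(x @ w - 1.5, 0.0)               # ReLU-style sparse code
l0, rho = sparsity_and_smoothness(x, z)
print(f"active fraction: {l0:.2f}, smoothness rho: {rho:.2f}")
```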
Temporal Coordination
AI
Some views hold that specific oscillatory or hardware-dependent dynamics may be required for consciousness.
"Other views hold that consciousness requires computational features that, at least at present, require biology in practice — such as specific kinds of oscillations that require specific kinds of chemical and electrical signals."
2.4.2 What if these features are insufficient for these capacities?, p. 26
By highlighting oscillatory and biophysical constraints, the authors point to Temporal Coordination mechanisms (oscillations, phase-locking, cross-frequency coupling) as potentially necessary features that current AI lacks, informing comparative assessments between brains and AI.
Limitations: This is a theoretical possibility rather than an empirical demonstration; no specific frequencies, phase-locking measures, or AI analogues are evaluated.
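For concreteness, here is a standard phase-locking-value computation of the kind used in the oscillation literature, demonstrated on synthetic signals; applying it to AI systems would first require deciding what, if anything, plays the role of a continuous-time signal, which is itself an open assumption.

```python
import numpy as np
from scipy.signal import hilbert

def phase_locking_value(x, y):
    """Phase-locking value between two signals: 1 = constant phase
    lag, 0 = no consistent phase relationship (a standard measure in
    neural oscillation work, usable on any pair of time series).
    """
    dphi = np.angle(hilbert(x)) - np.angle(hilbert(y))
    return float(np.abs(np.exp(1j * dphi).mean()))

t = np.linspace(0, 2, 2000)
a = np.sin(2 * np.pi * 40 * t)                       # 40 Hz 'gamma'
b = np.sin(2 * np.pi * 40 * t + 0.8)                 # same, phase-lagged
n = np.random.default_rng(3).normal(size=t.size)     # unrelated noise
print(f"locked pair: {phase_locking_value(a, b):.2f}")   # ~1.0
print(f"vs noise:    {phase_locking_value(a, n):.2f}")   # near 0
```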