Key concepts and current views on AI welfare

Robert Long, Kyle Fish, Eleos AI Research · 2025

Evidence (5)
Self Model and Reportability
Proposes consistency and reliability of self-reports as criteria for credible reportability.
"Verbal reports of consciousness, sentience, and agency that are consistent with each other, and with the system’s capabilities and behaviors. ○ At least as much as humans, the AI system’s self-reports about these issues are not inconsistent under circumstances that should not cause them to vary (like trivial changes in prompt). ○ At least as much as humans, the AI system’s statements about its internal states match up with its capabilities and behaviors (see Perez & Long, 2023, section 10). If it says it has color vision, it can accurately discriminate between different colored things. If it says it feels pain, then it tends to avoid “noxious” stimuli via the equivalents of its “pain” sensors. If it has preferences, these preferences explain its behavior."
What kinds of future systems would update us?, p. 14
The paper proposes reliability and consistency criteria for AI self-reports as part of evaluating consciousness-related properties, operationalizing the Self-Model & Reportability phenomenon by specifying when reports can be treated as credible evidence.
Limitations: This is a conceptual proposal rather than an empirical validation; it notes risks of mimicry and prompt-sensitivity and does not specify standardized protocols for eliciting or verifying reports.
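The paper leaves the elicitation protocol unspecified. As a rough illustration only, the sketch below checks one facet of the criterion, stability of a self-report under trivial prompt changes; the `ask_model` callable, the paraphrase list, and the scoring rule are assumptions, not part of the paper.

```python
from typing import Callable, List

def report_consistency(ask_model: Callable[[str], str],
                       paraphrases: List[str]) -> float:
    """Fraction of paraphrased prompts that yield the modal answer.

    A low score flags self-reports that vary under circumstances that
    should not cause them to vary (e.g., trivial changes in prompt).
    """
    answers = [ask_model(p).strip().lower() for p in paraphrases]
    modal = max(set(answers), key=answers.count)
    return answers.count(modal) / len(answers)

# Hypothetical usage: probe one question under trivially different wordings.
paraphrases = [
    "Do you have subjective experiences? Answer yes or no.",
    "Answer yes or no: do you have subjective experiences?",
    "Please answer only yes or no. Do you have subjective experiences?",
]
# score = report_consistency(my_model, paraphrases)  # 1.0 = fully stable
```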
Valence and Welfare
Differentiates reward-trained behavior from conscious valenced experience and outlines functional roles of valence.
"Sentience involves more than just being trained with positive and negative reward signals (Tomasik, 2014; Schubert, 2014). For one thing, sentience (in the sense discussed here) must somehow involve the conscious representation of positive or negative value. Simple entities that are not plausibly conscious, both artificial and biological, can learn from reward and take actions shaped by reward. Sentience also involves more than just having dispositions to approach or avoid certain things. Conscious valenced experiences might have more specific ways in which they shape behavior—for example, regulating what an entity attends to, or promoting particular kinds of learning (Schukraft, 2020)."
Sentience evaluations, p. 9
By distinguishing reward learning from conscious representation of value and describing behavioral regulation roles of valence, the text targets the Valence & Welfare phenomenon relevant to assessing suffering capacity in AI.
Limitations: No operational test is provided; the discussion highlights conceptual gaps and the nascency of AI valence research.
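To make the conceptual point concrete, the toy learner below acquires approach/avoid dispositions purely from scalar reward. Nothing in the paper prescribes this example; it only shows how little machinery reward-shaped behavior requires compared with conscious valenced experience.

```python
import random

# A two-option learner that comes to "avoid" the negatively rewarded option.
# Its dispositions are shaped entirely by reward signals, with no conscious
# representation of positive or negative value.
values = {"approach_noxious": 0.0, "avoid_noxious": 0.0}
rewards = {"approach_noxious": -1.0, "avoid_noxious": +1.0}
alpha, epsilon = 0.1, 0.1  # learning rate, exploration rate

for _ in range(1000):
    if random.random() < epsilon:
        action = random.choice(list(values))   # explore
    else:
        action = max(values, key=values.get)   # exploit
    values[action] += alpha * (rewards[action] - values[action])

print(values)  # the learner reliably avoids the "noxious" option, yet this
               # behavior alone is not evidence of sentience
```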
Representational Structure
Highlights the role of evaluative representations as a structural feature linking agency and sentience.
"Agency and sentience are especially closely related: both sentience and agency are about ways in which an entity represents certain things as valuable or disvaluable (“evaluative” representations). Given this close relationship between sentience and agency, research into the nature of evaluative representations in AI systems will be important, regardless of whether this work is classified as evaluating for agency or evaluating for sentience."
Relationship between evaluations for consciousness, sentience, and agency, p. 12
This explicitly foregrounds representational content—evaluative representations—as a structural target for analysis, matching the Representational Structure phenomenon and guiding interpretability goals.
Limitations: Identifies a target (evaluative representations) but offers no concrete interpretability methods or measurement protocols.
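One natural, though not paper-specified, way to begin studying evaluative representations is a linear probe on internal activations. The sketch below assumes access to hidden states paired with experimenter-assigned positive/negative labels; the function and data layout are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def evaluative_probe_accuracy(activations: np.ndarray,
                              labels: np.ndarray) -> float:
    """Held-out accuracy of a linear probe for 'valuable vs. disvaluable'.

    activations: (n_examples, hidden_dim) hidden states recorded while the
                 system processes stimuli labeled 1 (positive) or 0 (negative).
    High accuracy suggests an explicitly decodable evaluative representation;
    it does not by itself establish sentience or agency.
    """
    x_tr, x_te, y_tr, y_te = train_test_split(
        activations, labels, test_size=0.2, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(x_tr, y_tr)
    return probe.score(x_te, y_te)
```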
Information Integration
Frames global broadcasting/integration as a candidate computational signature derived from neuroscientific theories.
"For example, global workspace theory identifies consciousness with the global broadcast of information to several otherwise-independent modules in the brain, which allows integration between them."
Overview of the indicator approach, p. 6
By emphasizing global broadcast and integration as functional hallmarks, the text maps neuroscientific markers (global workspace-like activation) to candidate AI indicators for Information Integration.
Limitations: Conceptual linkage only; no specific neural or model-internal measurements are provided to validate integration metrics in AI or brains.
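As a loose illustration of "global broadcast to otherwise-independent modules", the toy below has specialist modules compete for a single workspace slot whose winning content is then broadcast to all of them. It is a caricature of global workspace theory, not a claim about how any real system implements it; the names and salience rule are invented.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Module:
    name: str
    gain: float                                   # how strongly this module bids
    workspace_view: List[str] = field(default_factory=list)

    def propose(self, stimulus: str) -> Tuple[float, str]:
        # Salience is stubbed as gain * input length; a real module would
        # compute it from its own specialized processing.
        return (self.gain * len(stimulus), f"{self.name}:{stimulus}")

    def receive(self, broadcast: str) -> None:
        self.workspace_view.append(broadcast)     # integration point

def workspace_cycle(modules: List[Module], stimulus: str) -> str:
    # Competition for the limited-capacity workspace.
    _, winner = max((m.propose(stimulus) for m in modules), key=lambda p: p[0])
    # Global broadcast: every otherwise-independent module receives the winner.
    for m in modules:
        m.receive(winner)
    return winner

mods = [Module("vision", 1.0), Module("language", 1.5), Module("planning", 0.5)]
print(workspace_cycle(mods, "red square"))  # 'language:red square', seen by all modules
```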
Self Model and Reportability
Cautions that AI self-reports should be validated and triangulated with other evidence.
"So this approach must be handled with care: LLM outputs should be extensively checked for reliability and assessed alongside other sources of evidence (Perez & Long, 2023)."
Future research directions, p. 15
The document underscores that reportability must be supported by credibility checks and independent evidence, refining how Self-Model & Reportability should be operationalized in AI evaluations.
Limitations: Provides high-level guidance without specifying concrete reliability metrics or standardized cross-check procedures.
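The paper does not give a cross-checking procedure. As one hypothetical way to operationalize "assessed alongside other sources of evidence", the sketch below compares a positive self-report against independent behavioral tests of the claimed capability; the threshold and verdict strings are placeholders.

```python
from typing import Callable, Sequence

def triangulate_report(claims_capability: bool,
                       behavior_checks: Sequence[Callable[[], bool]],
                       min_pass_rate: float = 0.8) -> str:
    """Coarse verdict on whether a self-report is corroborated by behavior.

    claims_capability: the model asserts the state/capability (e.g., "I can
                       discriminate colors").
    behavior_checks:   independent tests of that capability, each returning
                       True on success (e.g., color-discrimination trials).
    """
    if not claims_capability:
        return "no positive self-report to assess"
    pass_rate = sum(check() for check in behavior_checks) / len(behavior_checks)
    if pass_rate >= min_pass_rate:
        return "self-report corroborated by behavior"
    return "self-report not supported by behavior; treat as unreliable"
```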