Principles for Responsible AI Consciousness Research

Butlin, Lappas

Evidence (4)
Information Integration
Unintentional implementation of global workspace-like elements in an AI architecture.
"Evidence hinting at this possibility comes from the Perceiver architecture (Jaegle et al. 2021a, b), which unintentionally implemented some elements of a global workspace (Juliani et al. 2022)."
4.1 Objectives: Understanding AI Consciousness, p. 8
This links an existing model architecture to global workspace-like information integration, aligning with AI markers such as attention convergence and aggregator tokens, and so bears on whether AI systems can exhibit the kind of integration associated with conscious access.
Limitations: Interpretive remark citing external work; no direct analysis or quantitative measurement of global workspace dynamics provided in this paper.
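To make the architectural point concrete, here is a minimal sketch of the Perceiver-style pattern that Juliani et al. (2022) compare to a global workspace: a small set of learned latent vectors cross-attends to a much larger input array (a bottleneck that many inputs compete to enter), after which the latents interact among themselves. This is an illustration only; the module names, dimensions, and PyTorch implementation below are assumptions, not code from the paper or the Perceiver release.

```python
import torch
import torch.nn as nn

class LatentBottleneck(nn.Module):
    """Illustrative Perceiver-style block: a small latent array cross-attends
    to a large input array (workspace-like aggregation), then the latents are
    processed by self-attention (workspace-internal dynamics).
    Sizes and names are hypothetical."""

    def __init__(self, dim=64, n_latents=8, n_heads=4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_latents, dim))  # learned queries
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, inputs):                            # inputs: (batch, n_tokens, dim)
        b = inputs.shape[0]
        q = self.latents.unsqueeze(0).expand(b, -1, -1)   # (batch, n_latents, dim)
        # Many input tokens compete for access to a small latent "workspace".
        latents, attn = self.cross_attn(q, inputs, inputs)
        # Workspace contents then interact among themselves.
        latents, _ = self.self_attn(latents, latents, latents)
        return latents, attn                              # attn shows which inputs won access

if __name__ == "__main__":
    block = LatentBottleneck()
    x = torch.randn(2, 128, 64)                           # 128 input tokens per example
    latents, attn = block(x)
    print(latents.shape, attn.shape)                      # (2, 8, 64), (2, 8, 128)
```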
Valence and Welfare
Proposed safeguards to minimize AI suffering via deployment limits, staged assessments, gradual capability increases, and information controls.
"Several kinds of measures are possible to minimise the potential suffering of conscious AI systems, which could be put in place when building experimental systems. These include: controlling the breadth of deployment and the ways in which systems are used; assessing the capabilities and potential for consciousness of systems at several stages of development and deployment; increasing capabilities gradually and only introducing those that are needed for the system’s intended purpose (as far as this is possible, given the difficulties of predicting the capabilities of some systems in advance); and controlling access to information that would enable irresponsible actors to build systems that may be conscious."
4.2 Development: Value and Constraints, p. 10
This passage proposes concrete controls to reduce the risk of negative-valence states or suffering in potentially conscious AI systems, connecting directly to welfare considerations and safeguards for moral patients.
Limitations: Normative recommendations rather than empirical evidence about valence mechanisms; does not measure or model aversive states.
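Purely as an illustrative rendering of these safeguards (the paper specifies no such interface), the proposed controls could be expressed as a deployment policy checked before access is widened or a new capability enabled; every field name, stage label, and default value below is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class WelfareSafeguards:
    """Hypothetical encoding of the proposed controls; names and values are illustrative."""
    deployment_breadth: str = "internal-research-only"     # control breadth and ways of use
    assessment_stages: tuple = ("pre-training", "mid-training",
                                "pre-deployment", "post-deployment")
    completed_assessments: set = field(default_factory=set)
    enabled_capabilities: set = field(default_factory=set)
    restrict_build_information: bool = True                # limit info enabling irresponsible builds

    def may_enable(self, capability: str, needed_for_purpose: bool) -> bool:
        """Introduce capabilities gradually and only when needed for the intended purpose."""
        return needed_for_purpose and "pre-deployment" in self.completed_assessments

policy = WelfareSafeguards()
policy.completed_assessments.update({"pre-training", "mid-training"})
print(policy.may_enable("long-term-memory", needed_for_purpose=True))  # False: not yet assessed
```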
Causal Control
Staged evaluations before/during training and around deployment to control risks and system behavior.
"As methods improve, organisations should use them to assess whether systems are likely to be conscious at several stages of development. As Shevlane et al. (2023) describe in the context of evaluations for AI safety, there are reasons to evaluate systems before and during training, before deployment, and later, after deployment, when more is known about their capabilities."
4.3 Phased Approach: Gradual Development with Monitoring, p. 12
Recommends staged interventions across training and deployment that causally shape system access and behavior via evaluation and gating, aligning with causal control over computational pathways relevant to conscious access.
Limitations: Provides procedural guidance but no concrete demonstration of interventions like ablations, routing edits, or objective modifications.
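One way to picture the staged evaluations described here is as a gating loop in which an assessment is run at each stage and work proceeds only if it passes. This is a hypothetical sketch: the stage names follow the quoted passage, but the assessment function, threshold, and escalation behavior are invented placeholders rather than anything specified by the paper or by Shevlane et al. (2023).

```python
# Hypothetical staged-evaluation gate; only the stage names come from the quoted passage.
STAGES = ["before-training", "during-training", "before-deployment", "after-deployment"]

def assess_consciousness_markers(system, stage):
    """Placeholder: would apply stage-appropriate, marker-based assessments
    (e.g. workspace-like integration, agency indicators)."""
    return {"marker_score": 0.1, "stage": stage}

def gated_pipeline(system, threshold=0.5):
    for stage in STAGES:
        report = assess_consciousness_markers(system, stage)
        if report["marker_score"] >= threshold:
            # Pause and escalate rather than proceeding automatically.
            return {"halted_at": stage, "report": report}
        # ...continue training / deployment work for this stage...
    return {"halted_at": None, "completed": True}

print(gated_pipeline(system=None))
```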
Emergent Dynamics
Concept of capability overhangs, where latent attributes can yield sudden performance leaps if unlocked.
"The potential problem to be avoided here relates to the concept of overhangs, which has been identified in AI safety research: underexplored systems may have hidden capabilities, or latent attributes that would allow leaps in performance if unlocked (Dafoe, 2018)."
4.3 Phased Approach: Gradual Development with Monitoring, p. 12
Acknowledges sudden capability onsets arising from interactions and hidden structure, paralleling the emergent dynamics and phase-change phenomena relevant to consciousness science.
Limitations: Conceptual claim; no empirical measurements of emergent strategies or complexity indices are presented.
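To make the overhang worry slightly more concrete (again an illustrative sketch, not a method from the paper), one crude monitoring heuristic is to flag unusually large jumps in measured capability between successive assessments; the scores and the jump threshold below are invented.

```python
# Hypothetical check for discontinuous capability gains between assessments.
def flag_capability_jumps(scores, max_jump=0.15):
    """Return indices where a capability score rose by more than max_jump between
    consecutive evaluations, a rough proxy for 'unlocked' latent ability."""
    return [i for i in range(1, len(scores)) if scores[i] - scores[i - 1] > max_jump]

checkpoint_scores = [0.12, 0.14, 0.15, 0.41, 0.43]   # synthetic example
print(flag_capability_jumps(checkpoint_scores))      # [3]: sudden leap at checkpoint 3
```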