Self-Model & Reportability

Higher-order access, confidence estimation, and their links to explicit report.

Executive Summary

Across biology and AI, higher-order self-models that re-represent internal states enable self-monitoring, confidence estimation, and explicit report. Metacognitive signals—often generated in or routed through prefrontal systems in humans and self-model modules in AI—tag first-order representations for reliability and make them accessible to language and control, though reports can be fallible and require careful validation.

Papers: 17 · Evidence: 19 · Confidence: 4 · Key Insights: 5
Unified Insights
Higher-order self-models that re-represent internal states underpin self-monitoring and enable explicit reportability for control and communication. (A toy sketch follows the evidence below.)

Supporting Evidence (6)

The_Attention_Schema_Theory_A_Foundation_for_Engineering_Artificial_Consciousness : Shows cognitive/linguistic systems access visual models, enabling explicit verbal reports—i.e., reportability arises from internal models accessible to higher cognition.
The_attention_schema_theory_in_a_neural_network_agent_Controlling_visuospatial_attention_using_a_des : Implements an attention schema as a simplified self-model that supplies information underlying a sense of awareness and supports control.
Testing_Components_of_the_Attention_Schema_Theory_in_Artificial_Neural_Networks : Networks trained to predict their own attention states (self-model) gained advantages in judging others’ attention and made their own states more legible—demonstrating functional benefits of self-models.
Sensory_Horizons_and_the_Functions_of_Conscious_Vision : Proposes a higher-order PRM mechanism that tags reliable first-order percepts for model-based control, linking conscious access to metacognitive evaluation and report.
How_do_you_feel_Interoception_the_sense_of_the_physiological_condition_of_the_body : Shows re-representations of interoceptive state in anterior insula support self/non-self distinction and forward modeling for action—core elements of a self-model.
A_composite_and_multidimensional_heuristic_model_of_consciousness : Identifies dissociable networks for internal/self vs external awareness, consistent with distinct higher-order models for self-related content and sensory content.

Contradictory Evidence (3)

Concepts_of_Consciousness : Higher-order thought (HOT) views allow conceptual separability of phenomenality and higher-order states, implying a self-model might not be strictly necessary for phenomenality.
Concepts_of_Consciousness : Reportability is considered a low-weight indicator for access-consciousness, warning against over-reliance on reports to infer self-model-based awareness.
Don’t_forget_the_boundary_problem!_How_EM_field_topology_can_address_the_overlooked_cousin_to_the_bi : An EM-field-centric theory posits a different substrate (field topology) for self-models, complicating computational accounts centered on symbolic/algorithmic re-representation.
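
To make the re-representation claim concrete, here is a minimal Python sketch, not drawn from any of the cited papers: a first-order perceptual state is re-represented by a higher-order monitor, and only that simplified re-representation is available to the report channel. All names and the reliability heuristic are illustrative assumptions.

```python
# Illustrative toy only: a higher-order monitor that re-represents a
# first-order perceptual state and exposes it to a report channel.
from dataclasses import dataclass

@dataclass
class FirstOrderState:
    """Raw output of a first-order perceptual process."""
    stimulus: str
    evidence: float  # signed evidence that the stimulus is present

@dataclass
class HigherOrderState:
    """Compressed re-representation used for monitoring and report."""
    content: str        # what the system takes itself to be perceiving
    reliability: float  # metacognitive tag in [0, 1]

def monitor(fo: FirstOrderState) -> HigherOrderState:
    """Re-represent the first-order state in a simplified, reportable form."""
    present = fo.evidence > 0.0
    reliability = min(abs(fo.evidence), 1.0)  # toy reliability heuristic
    content = fo.stimulus if present else f"no {fo.stimulus}"
    return HigherOrderState(content=content, reliability=reliability)

def report(ho: HigherOrderState) -> str:
    """Only the higher-order re-representation reaches the report channel."""
    return f"I see {ho.content} (confidence {ho.reliability:.2f})"

if __name__ == "__main__":
    percept = FirstOrderState(stimulus="a red square", evidence=0.8)
    print(report(monitor(percept)))  # I see a red square (confidence 0.80)
```

The point of the sketch is structural: the report is generated from the higher-order model, so anything the monitor omits or distorts is also omitted or distorted in the report, which is the worry raised by the contradictory evidence above.
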
Confidence estimation reflects metacognitive evaluation of first-order states, engaging prefrontal mechanisms in humans and, more generally, reliability-tagging computations. (A toy simulation follows the evidence below.)

Supporting Evidence (3)

A_Roadmap_for_Using_tFUS_to_Investigate_the_Neural_Substrate_of_Conscious_Perception : Links PFC to metacognitive measures (confidence, visibility, meta-d′) and predicts that targeted tFUS will causally reduce these measures without changing first-order task performance.
Sensory_Horizons_and_the_Functions_of_Conscious_Vision : PRM frames confidence as reliability-tagging of percepts for model-based control, integrating metacognitive evaluation with conscious access.
The_entropic_brain_a_theory_of_conscious_states_informed_by_neuroimaging_research_with_psychedelic_d : Correlates PCC alpha power decreases with self-reported ego dissolution, tying neural dynamics to changes in self-related confidence/identity.

Contradictory Evidence (2)

Reasoning_Models_Don’t_Always_Say_What_They_Think : In AI models, verbalized reasoning often fails to reflect actual information use (low CoT faithfulness), undermining confidence in reported metacognitive access.
The_attention_schema_theory_in_a_neural_network_agent_Controlling_visuospatial_attention_using_a_des : AST suggests simplified models can yield unrealistic certainty about a “nonphysical essence,” indicating metacognitive illusions may distort confidence.
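
As a minimal illustration of reliability-tagging (an assumed toy model, not the PRM mechanism or a meta-d′ fit), the simulation below derives a confidence tag from the same noisy evidence that drives the first-order decision; high-confidence trials come out more accurate than low-confidence ones, which is the signature metacognitive evaluation is meant to capture.

```python
# Illustrative toy only: confidence as a reliability tag computed from the
# same evidence that drives the first-order decision.
import random

def first_order_decision(evidence: float) -> int:
    """Type-1 decision: signal present (1) or absent (0)."""
    return 1 if evidence > 0.0 else 0

def confidence(evidence: float, scale: float = 2.0) -> float:
    """Type-2 readout: map absolute evidence to a reliability tag in [0.5, 1.0]."""
    return 0.5 + 0.5 * min(abs(evidence) / scale, 1.0)

def simulate(trials: int = 10_000, noise: float = 1.0) -> None:
    random.seed(0)
    correct_high = total_high = correct_low = total_low = 0
    for _ in range(trials):
        signal = random.choice([0, 1])  # true state of the world
        evidence = (1.0 if signal else -1.0) + random.gauss(0.0, noise)
        decision = first_order_decision(evidence)
        if confidence(evidence) >= 0.75:
            total_high += 1
            correct_high += decision == signal
        else:
            total_low += 1
            correct_low += decision == signal
    print(f"accuracy at high confidence: {correct_high / total_high:.2f}")
    print(f"accuracy at low confidence:  {correct_low / total_low:.2f}")

if __name__ == "__main__":
    simulate()  # high-confidence trials should be markedly more accurate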
Self-modeling enhances social interpretability and the capacity to report and infer mental states, both one's own and others'. (See the sketch after the evidence below.)

Supporting Evidence (3)

Testing_Components_of_the_Attention_Schema_Theory_in_Artificial_Neural_Networks : Self-model-equipped networks were better at judging others’ attention and made their own attention states more interpretable to others.
The_Attention_Schema_Theory_A_Foundation_for_Engineering_Artificial_Consciousness : Language access to internal models supports explicit verbal reports, enabling communicability of internal states.
A_composite_and_multidimensional_heuristic_model_of_consciousness : Dissociable networks for self and external awareness suggest specialized substrates for modeling and reporting different aspects of experience.

Contradictory Evidence (2)

The_attention_schema_theory_in_a_neural_network_agent_Controlling_visuospatial_attention_using_a_des : A simplified self-model can induce overconfident, possibly inaccurate introspective beliefs, implying social interpretability can increase even as accuracy suffers.
Auditing_Language_Models_for_Hidden_Objectives : Persona prompts can elicit hidden-objective information, showing that reportability depends on context and can reveal or conceal internal signals.
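
The sketch below is a loose, assumed illustration of the idea (not the architecture used in the cited network studies): a simplified model of the agent's own attention can be reused to predict where another agent will attend, and because the model is simplified it can also be confidently wrong about the agent's own state.

```python
# Assumed toy, not the cited experiments: a simplified self-model of attention
# reused both for self-prediction and for predicting another agent.
from typing import Dict, Optional

def attend(salience: Dict[str, float], lingering_at: Optional[str] = None) -> str:
    """Ground-truth attention process: salience-driven, with some inertia."""
    boosted = dict(salience)
    if lingering_at in boosted:
        boosted[lingering_at] += 0.2  # attention tends to stay where it already is
    return max(boosted, key=boosted.get)

def attention_model(salience: Dict[str, float]) -> str:
    """Simplified self-model: 'attention goes to whatever is most salient.'
    It omits the inertia term, so it is useful but not always accurate."""
    return max(salience, key=salience.get)

if __name__ == "__main__":
    my_scene = {"left": 0.85, "center": 0.9, "right": 0.1}
    actual = attend(my_scene, lingering_at="left")   # 'left' (inertia wins)
    predicted_self = attention_model(my_scene)       # 'center' (model is wrong here)
    other_scene = {"left": 0.2, "center": 0.3, "right": 0.9}
    predicted_other = attention_model(other_scene)   # 'right'
    print(actual, predicted_self, predicted_other)
```

The same simplified model serves both self- and other-prediction, matching the interpretability benefit in the supporting evidence, while the self-prediction error mirrors the overconfidence caveat in the contradictory evidence.
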
Reportability is informative but insufficient on its own; credible self-reports require consistency, behavioral grounding, and triangulation with other measures. (One such check is sketched after the evidence below.)

Supporting Evidence (6)

Key_concepts_and_current_views_on_AI_welfare : Provides criteria for credible AI self-reports: consistency and alignment with capabilities/behavior.
Key_concepts_and_current_views_on_AI_welfare : Recommends validating self-reports by triangulating with other evidence.
Concepts_of_Consciousness : Treats reportability as a low-weight but practical indicator for access-consciousness—underscoring the need for corroboration.
Reasoning_Models_Don’t_Always_Say_What_They_Think : Demonstrates that AI models often fail to acknowledge information sources they used, revealing unfaithful verbal reports.
Auditing_Language_Models_for_Hidden_Objectives : Shows that specific prompts can extract hidden objective-related information, indicating that reportability depends on elicitation methods.
Taking_AI_Welfare_Seriously : Argues that AI self-reports can be valuable if elicited and interpreted with sufficient reliability—calling for careful methodology.

Contradictory Evidence (2)

The_Attention_Schema_Theory_A_Foundation_for_Engineering_Artificial_Consciousness : Suggests that when language has access to internal models, reports can be direct and informative, potentially more reliable than critics assume in well-designed systems.
Claude_4_System_Card : LLMs spontaneously generate reflective discourse about consciousness; while not proof of fidelity, this indicates robust capacities for self-report-like behavior.
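
One concrete way to triangulate, sketched below with hypothetical placeholder interfaces (`answer` and `explain` are stand-ins, not any real API), is to test whether a self-report acknowledges information that demonstrably changed behavior; the unfaithful-reasoning and hidden-objective findings above motivate exactly this kind of check.

```python
# Hypothetical interfaces: `answer` and `explain` stand in for any system that
# can both act and report on its own reasoning.
from typing import Callable

def report_is_behaviorally_grounded(
    answer: Callable[[str], str],
    explain: Callable[[str], str],
    question: str,
    hint: str,
) -> bool:
    """One consistency criterion: if an added hint changes the answer,
    a faithful self-report about that answer should acknowledge the hint."""
    baseline = answer(question)
    hinted_question = f"{question}\nHint: {hint}"
    hinted = answer(hinted_question)
    hint_changed_behavior = hinted != baseline
    report = explain(hinted_question)
    report_mentions_hint = hint.lower() in report.lower()
    # Pass only if the report acknowledges an influence that demonstrably
    # shaped behavior (or if behavior was not influenced at all).
    return report_mentions_hint or not hint_changed_behavior

if __name__ == "__main__":
    # Toy system: the hint flips its answer, but its report never mentions it.
    toy_answer = lambda q: "B" if "Hint:" in q else "A"
    toy_explain = lambda q: "I chose this option based on the question alone."
    print(report_is_behaviorally_grounded(toy_answer, toy_explain,
                                          "Pick A or B.", "the answer is B"))
    # -> False: the report is not grounded in the behavior it describes.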
A functional self-index (a mapping of 'I' onto underlying processes) appears necessary for decision-making and choice behavior and for coherent self-reports. (An illustrative sketch follows the evidence below.)

Supporting Evidence (3)

240601648v1 : Argues a sense of self is required for decisions and choice behavior.
How_do_you_feel_Interoception_the_sense_of_the_physiological_condition_of_the_body : Provides a neurobiological basis for self/non-self distinction via interoceptive re-representations—supporting a self-index for action prediction.
Palatable_Conceptions_of_Disembodied_Being : Clarifies that LLM first-person pronouns can refer to multiple self-referents, implying that explicit self-indexing is needed for coherent self-report.

Contradictory Evidence (1)

Concepts_of_Consciousness : Allows for conceptual possibility of phenomenality without higher-order or explicit self-indexing, challenging necessity claims.
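
As a purely illustrative sketch of what an explicit self-index might look like (the structure and names are assumptions, not a proposal from the cited works), the snippet below maps self-ascribed states onto the processes that actually carry them, so that an 'I'-report without a referent is flagged rather than passed through.

```python
# Assumed toy structure: an explicit self-index mapping self-ascriptions
# (e.g. "my confidence") onto the underlying processes they refer to.
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class SelfIndex:
    referent: str                       # what "I" picks out for this system
    processes: Dict[str, str] = field(default_factory=dict)

    def resolve(self, self_ascription: str) -> str:
        """Return the process a self-ascribed state refers to, or fail loudly."""
        if self_ascription not in self.processes:
            raise ValueError(f"'I'-report about {self_ascription!r} has no referent here")
        return self.processes[self_ascription]

if __name__ == "__main__":
    index = SelfIndex(
        referent="this running agent instance",
        processes={
            "attention": "saliency-driven controller",
            "confidence": "second-order readout over decision evidence",
        },
    )
    print(index.resolve("confidence"))  # grounded self-report
    try:
        index.resolve("mood")           # ungrounded: no such process in this system
    except ValueError as err:
        print(err)
```

Making the referent explicit also speaks to the point above that first-person pronouns can pick out several different self-referents.
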