The neural architecture of language: Integrative modeling converges on predictive processing

Martin Schrimpf, Idan Asher Blank, Greta Tuckute, Carina Kauf, Eghbal A. Hosseini, Nancy Kanwisher, Joshua B. Tenenbaum, Evelina Fedorenko · 2021

Evidence (3)
Representational Structure
Transformer architectures’ internal representations closely match human language-network responses and generalize across modalities/datasets.
"Second, the best models explain nearly 100% of the explainable variance (up to the noise ceiling) in neural responses to sentences. ... Fourth, intriguingly, the scores of models initialized with random weights (prior to training, but with a trained linear readout) are well above chance and correlate with trained model scores, which suggests that network architecture is an important contributor to a model’s brain score. In particular, one architecture introduced just in 2019, the generative pretrained transformer (GPT-2), consistently outperforms all other models and explains almost all variance in both fMRI and ECoG data from sentence-processing tasks."
Results, p. 2
These results indicate that specific representational structures in transformer LMs (notably GPT-2) align with population-level neural codes during human sentence processing, supporting a shared representational geometry between AI systems and the brain that bears on conscious-access mechanisms.
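The mapping behind these brain scores can be summarized in a few lines. Below is a minimal sketch, assuming cross-validated ridge regression from model-layer activations to voxel responses and a precomputed noise ceiling; the synthetic data, parameter choices, and variable names are illustrative assumptions, not the authors' pipeline.

```python
# Sketch of a noise-ceiling-normalized "brain score": cross-validated linear
# regression from model activations to voxel responses, with predictivity
# expressed relative to the explainable variance. All data here are synthetic.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n_sentences, n_features, n_voxels = 200, 768, 50

# Stand-ins for real data: model activations per sentence and fMRI responses.
activations = rng.normal(size=(n_sentences, n_features))
responses = activations @ rng.normal(size=(n_features, n_voxels)) * 0.1
responses += rng.normal(size=responses.shape)  # measurement noise
noise_ceiling = 0.8                            # assumed split-half reliability

preds = np.zeros_like(responses)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(activations):
    model = Ridge(alpha=1.0).fit(activations[train], responses[train])
    preds[test] = model.predict(activations[test])

# Per-voxel Pearson r between held-out predictions and observed responses.
r_per_voxel = np.array([
    np.corrcoef(preds[:, v], responses[:, v])[0, 1] for v in range(n_voxels)
])
brain_score = r_per_voxel.mean() / noise_ceiling
print(f"normalized predictivity (brain score): {brain_score:.2f}")
```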
"Model scores are consistent across experiments/datasets. To test the generality of the model representations, we examined the consistency of model brain scores across datasets. Indeed, if a model achieves a high brain score on one dataset it tends to also do well on other datasets (Fig. 2D), ruling out the possibility that we are picking up on spurious, dataset-idiosyncratic predictivity and suggesting that the models’ internal representations are general enough to capture brain responses to diverse linguistic materials presented visually or auditorily, and across three independent sets of participants."
Results, p. 3
Cross-dataset generalization of model-to-brain fits suggests a stable representational structure that captures human language responses across modalities, a key property for theories linking distributed codes to unified, reportable content.
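To make the consistency analysis concrete, here is a small sketch: given each model's brain score on several datasets, correlate the score vectors between dataset pairs. The model names and score values below are invented placeholders, not the paper's measurements.

```python
# Illustrative check of cross-dataset consistency: a model that scores high on
# one dataset should also score high on the others, yielding positive
# correlations between dataset pairs.
import numpy as np

scores = {                       # hypothetical per-model brain scores
    "fMRI-sentences": np.array([0.95, 0.70, 0.45, 0.30]),
    "fMRI-stories":   np.array([0.90, 0.65, 0.50, 0.25]),
    "ECoG-sentences": np.array([0.92, 0.72, 0.40, 0.35]),
}

names = list(scores)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        r = np.corrcoef(scores[names[i]], scores[names[j]])[0, 1]
        print(f"{names[i]} vs {names[j]}: r = {r:.2f}")
```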
Figures
Fig. 1 (p. 3): By jointly mapping model representations to neural and behavioral data, the figure operationalizes representational structure comparisons across AI and brain.
Limitations: Architectural correlations with neural/behavioral fit do not establish causal mechanisms; models are off-the-shelf and task-level, leaving open how specific circuits implement these representations.
Information Integration
Significant correlations link model brain scores, behavioral scores, and next-word prediction accuracy across models, tying these otherwise separate measures to a single predictive-processing objective.
"Third, across models, significant correlations hold among all three metrics of model performance: brain scores (fit to fMRI and ECoG data), behavioral scores (fit to reading time), and model accuracy on the next-word prediction task. Importantly, no other linguistic task was predictive of models’ fit to neural or behavioral data."
Results, p. 2
The alignment among neural, behavioral, and predictive-task metrics indicates system-wide integration of information and access, consistent with integration motifs posited in consciousness theories.
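The three-way comparison reduces to correlating per-model metric vectors. A minimal sketch follows, with invented placeholder values standing in for the paper's measured brain scores, reading-time fits, and next-word accuracies.

```python
# Sketch of the triad comparison across models: correlate brain scores with
# behavioral (reading-time) fits and with next-word-prediction accuracy.
# All values are fabricated for illustration.
from scipy.stats import pearsonr

brain      = [0.95, 0.78, 0.60, 0.42, 0.30]   # per-model brain scores
behavioral = [0.90, 0.75, 0.55, 0.48, 0.25]   # fit to self-paced reading times
next_word  = [0.45, 0.40, 0.33, 0.28, 0.20]   # next-word prediction accuracy

for name, metric in [("behavioral", behavioral), ("next-word", next_word)]:
    r, p = pearsonr(brain, metric)
    print(f"brain vs {name}: r = {r:.2f}, p = {p:.3f}")
```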
Figures
Fig. 1 (p. 3): The triad of brain, behavior, and task measures provides an explicit integrated framework to assess unified access to linguistic content across systems.
Limitations: Correlational integrations across datasets do not reveal temporal binding or causal broadcasts; measures are aggregated rather than trial-resolved, limiting claims about moment-to-moment integration.
Emergent Dynamics
Even without training, transformer architectures yield representations that achieve above-chance neural predictivity and correlate with trained models.
"Fourth, intriguingly, the scores of models initialized with random weights (prior to training, but with a trained linear readout) are well above chance and correlate with trained model scores, which suggests that network architecture is an important contributor to a model’s brain score."
Results, p. 2
This suggests emergent representational dynamics arising from architectural inductive biases that partially align with human neural responses prior to learning, informing debates on when complex representations can arise from system interactions alone.
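The untrained-network control can be sketched as follows: freeze a randomly initialized feature map, train only a linear readout to the target signal, and compare cross-validated predictivity against a shuffled baseline. The toy architecture and synthetic data here are assumptions for illustration, not the paper's models.

```python
# Random, frozen features plus a trained linear readout can predict a target
# above chance, illustrating the contribution of architecture prior to learning.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_sentences, n_inputs, n_hidden = 300, 100, 512

inputs = rng.normal(size=(n_sentences, n_inputs))
W = rng.normal(size=(n_inputs, n_hidden)) / np.sqrt(n_inputs)  # random, frozen
features = np.tanh(inputs @ W)                                 # untrained "network"

# Target signal driven by a nonlinear function of the inputs, plus noise.
voxel = np.tanh(inputs[:, :10].sum(axis=1)) + rng.normal(scale=0.5, size=n_sentences)

readout = Ridge(alpha=10.0)
score = cross_val_score(readout, features, voxel, cv=5, scoring="r2").mean()
shuffled = cross_val_score(readout, features, rng.permutation(voxel), cv=5,
                           scoring="r2").mean()
print(f"random features + trained readout R^2: {score:.2f} (chance ~ {shuffled:.2f})")
```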
Limitations: Above-chance neural predictivity relies on a trained linear readout and does not establish task competence or biological plausibility of emergent states; mechanisms remain speculative.