Eghbal A. Hosseini, Martin Schrimpf, Yian Zhang, Samuel Bowman, Noga Zaslavsky, Evelina Fedorenko · 2024
Representational Structure
AI
Layer-wise model representations linearly map to human fMRI in the language network; lower perplexity models yield higher neural predictivity, with later layers capturing more contextualized structure.
"Critically, across both Experiments 1 and 2, we observed a consistent relationship between perplexity and neural predictivity, such that lower perplexity is associated with higher predictivity. However, once a model reaches a certain level of perplexity, further improvements in the model’s ability to predict the next word are no longer associated with increases in predictivity, in line with recent findings (Oh & Schuler, 2022, 2023)."
Model Perplexity Predicts Model Performance, p. 56
This establishes that representational quality (proxied by perplexity) systematically tracks how well model embeddings align with brain responses, supporting representational-structure links between ANN internals and human language representations.
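To make the perplexity measure concrete, here is a minimal sketch of per-token perplexity for a causal LM, assuming HuggingFace `transformers`; the model names and held-out sentences are hypothetical stand-ins, and the paper's own evaluation corpus and context-length protocol are not reproduced.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

def perplexity(model_name, texts):
    """Mean per-token perplexity of a causal LM over a list of texts."""
    tok = GPT2TokenizerFast.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name).eval()
    total_nll, n_tokens = 0.0, 0
    with torch.no_grad():
        for text in texts:
            ids = tok(text, return_tensors="pt").input_ids
            loss = model(ids, labels=ids).loss      # mean next-token NLL
            total_nll += loss.item() * (ids.size(1) - 1)
            n_tokens += ids.size(1) - 1
    return float(torch.exp(torch.tensor(total_nll / n_tokens)))

# Hypothetical model set and held-out text; per-model predictivity scores
# would come from the encoding analysis sketched below.
model_names = ["distilgpt2", "gpt2"]
held_out = ["The dog chased the ball.", "She read the letter twice."]
ppls = [perplexity(m, held_out) for m in model_names]
# scipy.stats.pearsonr(ppls, predictivity_scores) would then quantify the
# reported negative relationship (lower perplexity, higher predictivity).
```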
"C) Model representations were related to human representations by building a linear regression between unit activations for each layer of the model and voxel activity (in the language-selective network; Fedorenko et al., 2011) or reading times for the stimuli used in each of the benchmarks. This regression was then used to make predictions about human neural/behavioral responses for unseen language stimuli, and a Pearson correlation was computed between these predictions and the observed responses."
Human Datasets (Benchmarks), p. 45
The analysis pipeline explicitly links model-layer activations to distributed voxel patterns in the language network, operationalizing a cross-system readout of representational structure that is central to consciousness-relevant mapping work.
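The quoted pipeline is a standard cross-validated encoding model. A minimal sketch, assuming activations and voxel responses are already extracted as arrays; `Ridge` stands in for the paper's linear regression, and the shapes, regularization strength, and fold scheme are assumptions:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def neural_predictivity(X, Y, alpha=1.0, n_splits=5):
    """Cross-validated linear map from model units (X: stimuli x units) to
    voxel responses (Y: stimuli x voxels); returns mean Pearson r over voxels."""
    fold_rs = []
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(X):
        Y_hat = Ridge(alpha=alpha).fit(X[train], Y[train]).predict(X[test])
        rs = [pearsonr(Y_hat[:, v], Y[test, v])[0] for v in range(Y.shape[1])]
        fold_rs.append(np.mean(rs))
    return float(np.mean(fold_rs))

# Hypothetical shapes: 240 stimuli, 768 model units, 500 language-network voxels.
X, Y = np.random.randn(240, 768), np.random.randn(240, 500)
print(neural_predictivity(X, Y))  # ~0 for random data; real data shows the reported fit
```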
"for the Pereira2018 benchmark, we observed that for layers 4–9, performance peaks for the 1 million word model, and for the last three layers (layers 10–12), a consistent improvement in performance is observed with larger datasets (Figure 2E)."
In exploratory analyses (Results), p. 52
Layer-specific behavior indicates that deeper layers encode more contextualized linguistic structure, reinforcing a hierarchical representational geometry that can be probed for access and content in both AI and brain studies.
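Layer profiles like this require per-layer features. A minimal sketch using `output_hidden_states` in HuggingFace `transformers`; mean-pooling over tokens is an assumption, not necessarily the paper's pooling choice:

```python
import torch
from transformers import GPT2Model, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True).eval()

def layer_features(sentences):
    """Return {layer: (n_sentences x hidden) array} of token-averaged activations."""
    feats = {}
    with torch.no_grad():
        for s in sentences:
            hs = model(**tok(s, return_tensors="pt")).hidden_states
            for i, h in enumerate(hs):            # 0 = embeddings, 1..12 = layers
                feats.setdefault(i, []).append(h.mean(dim=1).squeeze(0))
    return {i: torch.stack(v).numpy() for i, v in feats.items()}

# Scoring each layer's features with neural_predictivity() above recovers a
# layer profile like the one described for Pereira2018.
```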
Figures
Figure 3 (p. 55): Shows that lower perplexity tracks higher fMRI predictivity, linking representational quality in ANN embeddings to neural responses during sentence processing.
Figure 1 (p. 44): Depicts the architecture and the model-to-brain comparison pipeline, grounding how layer-wise representations are related to fMRI signals.
Limitations: Encoding links are correlational and depend on fMRI’s coarse spatiotemporal resolution and benchmark specifics; representational–brain alignment may plateau beyond a perplexity threshold and may differ for longer narrative stimuli.
Emergent Dynamics
AI
Model-to-brain alignment increases during training and plateaus around 10% of training steps; early layers peak earlier than later layers, indicating staged emergence.
"Critically, mirroring the results from Experiment 1, we observed a consistent increase in how well the model predicts fMRI responses to sentences until the model reaches the 10% checkpoint, at which point the performance plateaus."
Models Trained on a Small Portion of a Massive Corpus Predict Human Responses, p. 53
Training induces a systematic, then saturating, increase in brain-aligned representations—an emergent dynamic that constrains expectations about when ANN representations become brain-like during learning.
"In exploratory analyses of the individual model layers, we observed that performance shows a consistent increase across layers up to the 1.0% checkpoint... earlier layers reach close to maximal performance earlier in the training (at the 1% checkpoint), whereas later layers reach their peak close to the 10% checkpoint (Figure 2F)."
Models Trained on a Small Portion of a Massive Corpus Predict Human Responses, p. 53
Layer-staged improvement suggests hierarchical emergence of contextual representations, paralleling staged processing regimes hypothesized in biological systems.
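A minimal sketch of such a checkpoint sweep, reusing `layer_features()` and `neural_predictivity()` from the sketches above; the checkpoint paths, stimulus list (`benchmark_sentences`), and voxel matrix `Y` are hypothetical stand-ins, not the paper's materials:

```python
import numpy as np
from transformers import GPT2Model

# Hypothetical paths to checkpoints saved at increasing fractions of training.
ckpts = ["ckpts/0.1pct", "ckpts/1pct", "ckpts/10pct", "ckpts/100pct"]

score_grid = []                                   # rows: checkpoints, cols: layers 1..12
for path in ckpts:
    # Rebinds the module-level `model` that layer_features() reads.
    model = GPT2Model.from_pretrained(path, output_hidden_states=True).eval()
    feats = layer_features(benchmark_sentences)
    score_grid.append([neural_predictivity(feats[l], Y) for l in range(1, 13)])

score_grid = np.asarray(score_grid)
peak_ckpt = score_grid.argmax(axis=0)
# Reported pattern: earlier layers peak near the 1% checkpoint, later layers
# near the 10% checkpoint, after which alignment plateaus.
```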
Limitations: Plateau timing may depend on dataset, objective, and measurement noise; fMRI’s temporal averaging could obscure continued improvements that might appear with intracranial data.
Causal Control
AI
Changing weight initialization or attention directionality causally alters both the model's task performance and its ability to predict fMRI responses.
"We showed that initializing a model with a normal distribution for all weights leads to the model being unable to predict fMRI response to sentences (predictivity is at ~0; of course, such a model is also unable to perform the next-word prediction task)."
Performance of Untrained Models, p. 55
An initialization change functions like a causal intervention that eliminates both task competence and brain predictivity, directly linking internal parameterization to representational access and behavior.
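A minimal sketch of the initialization intervention, assuming HuggingFace `transformers`; the standard deviation is an assumption, since the paper specifies only that all weights were drawn from a normal distribution:

```python
import torch
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True)
with torch.no_grad():
    for p in model.parameters():
        torch.nn.init.normal_(p, mean=0.0, std=0.02)  # std is an assumption
model.eval()
# Features from this all-normal model, scored with neural_predictivity(),
# should sit near r = 0, matching the quoted result.
```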
"However, the 100 million word model still performs below the fully trained model. This difference between the GPT-2 and miniBERTa models in the amount of training they require to align with human data is likely due to the difference in the directionality of the attention mechanisms, with unidirectional-attention mechanisms being more sample efficient."
Results (Experiment 1 and generalization to miniBERTa), p. 51
Manipulating attention directionality (uni- vs. bidirectional) changes data efficiency and alignment, a causal handle on routing/representation formation relevant to control analyses in AI–brain comparisons.
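Directionality comes down to the attention mask. An illustrative sketch of the two regimes (not the paper's code):

```python
import torch

T = 6                                     # toy sequence length
scores = torch.randn(T, T)                # raw attention scores
causal = torch.tril(torch.ones(T, T))     # unidirectional (GPT-2 style): token t sees <= t

uni = torch.softmax(scores.masked_fill(causal == 0, float("-inf")), dim=-1)
bi = torch.softmax(scores, dim=-1)        # bidirectional (BERT style): every token sees all
```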
Limitations: Causal inferences are within-model manipulations; differences could interact with dataset and objective, and fMRI alignment is an indirect proxy for representational adequacy.
Information Integration
BIO
Language-selective frontotemporal voxels collectively encode sentence content that can be predicted from ANN representations, indicating distributed integration across the network.
"The contrast between sentences and nonword lists has been shown to robustly identify the frontotemporal language-selective network of brain areas (Fedorenko et al., 2011; Lipkin et al., 2022). These areas support language comprehension across modalities (listening, reading, etc.) and have been established to be sensitive to both word meanings and syntactic structure processing..."
Human Datasets (Benchmarks), p. 45
By targeting a functionally defined network that integrates lexical and syntactic information, the mapping demonstrates system-level access to distributed content—key to information integration accounts bridging AI and brain data.
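The localizer logic is a voxel-wise sentences > nonwords contrast. An illustrative sketch with synthetic data; the published localizer selects top-responding voxels within anatomical parcels, so the simple significance threshold here is only a stand-in:

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
sent = rng.normal(1.0, 1.0, size=(40, 5000))   # trials x voxels, sentence condition
nonw = rng.normal(0.0, 1.0, size=(40, 5000))   # trials x voxels, nonword-list condition

t, p = ttest_rel(sent, nonw, axis=0)           # paired contrast per voxel
language_voxels = (t > 0) & (p < 0.001)        # illustrative threshold only
print(int(language_voxels.sum()), "voxels selected")
```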
Limitations: Integration is inferred from encoding performance and functional localization; fMRI does not directly reveal fast binding dynamics or causal inter-area routing.