ChineseEEG-2: An EEG Dataset for Multimodal Semantic Alignment and Neural Decoding during Reading and Listening

Representational Structure

Token-aligned EEG recordings paired with audio/text embeddings enable analysis of how linguistic information is represented across brain and model spaces.

"The data modalities included in the structure of the ChineseEEG-2 dataset are shown in Figure 3a, including raw data and derivatives. Raw data contains raw EEG, raw text, and audio materials, and derivatives contain pre-processed EEG data, audio and text embeddings generated by model Wav2Vec2 and Bert-base Chinese [22]."

Data Records, p. 4

Pairing EEG with token-level audio and text embeddings provides a direct bridge between neural population activity and the embedding spaces of pretrained speech and language models (Wav2Vec2 and Bert-base Chinese), facilitating the cross-modal mapping relevant to consciousness-related coding and access.
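One common way to compare the two spaces is representational similarity analysis (RSA), which the dataset paper does not itself report. The following is a minimal sketch under stated assumptions: each token has one EEG pattern vector (for example, channel amplitudes averaged over the token's window) and one embedding vector. The helper names `rdm`, `ranks`, and `spearman`, and the toy vectors, are hypothetical illustrations rather than part of the dataset's tooling.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rdm(patterns):
    """Condensed dissimilarity vector: one distance per unordered token pair."""
    pairs = combinations(range(len(patterns)), 2)
    return [euclidean(patterns[i], patterns[j]) for i, j in pairs]


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Toy vectors for three tokens (assumed, not taken from the dataset).
eeg_patterns = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
embeddings = [[0.0], [2.0], [10.0]]
rsa_score = spearman(rdm(eeg_patterns), rdm(embeddings))
```

A score near 1 would mean that tokens far apart in embedding space also evoke dissimilar EEG patterns. The toy vectors were constructed to agree perfectly and say nothing about real data.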

"The visualization of the source activities corresponding to each word in the example sentence is provided. The result of source localization demonstrates a more focused activation near the left middle temporal gyrus in subjects in the RA task, which is related to language comprehension [26]. For subjects in the PL task, it is observed that activation areas are more dispersed."

EEG Source localization and cross-modal analysis, p. 5

Source-resolved activations tied to word-level content are consistent with population-level cortical coding of linguistic features, in line with representational structure analyses of how information is encoded and accessed in the brain.

Figures

Figure 3 (p. 11): The figure shows the multimodal dataset organization and preprocessing steps that enable token-level alignment of EEG with language materials for representational analyses.

Limitations: Evidence is correlational and dataset-oriented; it does not identify specific neural codes at the single-neuron or assembly level, nor does it establish causal read/write access to representations.

Temporal Coordination

Hardware and character-level triggers provide precise timing of linguistic units; preprocessing targets oscillatory bands implicated in coordination of cognitive processing.

"During data acquisition in both RA and PL tasks, triggers were embedded in the EEG recordings time-locked to the onset and offset of each presented stimulus, marking the exact start and end of every text line. Character-level temporal mapping was achieved using the fixed presentation rate of 0.25 seconds per character."

Temporal Alignment of EEG, Text, and Audio Sequences, p. 4

Time-locked triggers and fixed-rate character presentation enable precise temporal segmentation of content, supporting analyses of how neural oscillations coordinate and bind linguistic information over time.
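The character-level mapping described in the quote can be sketched as simple arithmetic from the line-onset trigger. This is a minimal illustration, assuming that character i of a line starts exactly i × 0.25 s after that line's onset trigger. `CharWindow`, `char_windows`, and `to_sample` are hypothetical names, not functions from the dataset's code.

```python
import math
from dataclasses import dataclass

CHAR_DURATION_S = 0.25  # fixed presentation rate quoted from the paper


@dataclass
class CharWindow:
    char: str
    onset_s: float
    offset_s: float


def char_windows(line_text, line_onset_s):
    """Give each character a time window counted from the line-onset trigger."""
    windows = []
    for i, ch in enumerate(line_text):
        onset = line_onset_s + i * CHAR_DURATION_S
        windows.append(CharWindow(ch, onset, onset + CHAR_DURATION_S))
    return windows


def to_sample(t_s, sfreq_hz=250.0):
    """Convert seconds to a sample index, rounding halves up."""
    return math.floor(t_s * sfreq_hz + 0.5)


# Example: a three-character line whose onset trigger arrived at 10.0 s.
example = char_windows("你好吗", 10.0)
```

At the 250 Hz rate used for the pre-processed data, 0.25 s corresponds to 62.5 samples, so character boundaries do not always fall on a sample and some rounding convention is needed when cutting epochs.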

"The retained epochs were downsampled to 250 Hz... followed by 1-40 Hz bandpass filtering to isolate neurocognitive rhythms: delta (0.5-4 Hz) for sustained attention, theta (4-8 Hz) for semantic integration, alpha (8-12 Hz) for inhibitory control, and beta (12-30 Hz) for predictive processing, targeting the frequency bands most relevant to cognitive processing."

EEG data pre-processing, p. 4

Focusing analyses on canonical frequency bands directly supports investigating temporal coordination mechanisms (e.g., phase-locked dynamics) during linguistic processing.
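As a simple illustration of band-resolved analysis, the sketch below sums spectral power within the four bands named in the quote. It is not the paper's pipeline: it uses a plain discrete Fourier transform on a synthetic sinusoid, and `BANDS`, `band_power`, and the demo signal are assumptions introduced for illustration.

```python
import cmath
import math

# Band edges in Hz, copied from the quoted preprocessing description.
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 12.0),
    "beta": (12.0, 30.0),
}


def band_power(signal, sfreq_hz):
    """Sum squared DFT magnitudes per band (lower edge inclusive, upper exclusive)."""
    n = len(signal)
    max_hz = max(hi for _, hi in BANDS.values())
    power = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2 + 1):
        freq = k * sfreq_hz / n
        if freq >= max_hz:
            break  # no band extends beyond this frequency
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        for name, (lo, hi) in BANDS.items():
            if lo <= freq < hi:
                power[name] += abs(coeff) ** 2
    return power


# Synthetic 2-second, 10 Hz sinusoid sampled at 250 Hz (not real EEG).
sfreq = 250.0
demo = [math.sin(2 * math.pi * 10.0 * t / sfreq) for t in range(500)]
demo_power = band_power(demo, sfreq)
dominant = max(demo_power, key=demo_power.get)
```

For the 10 Hz test signal the alpha band dominates, as expected. Note that the quoted delta range starts at 0.5 Hz, below the 1 Hz lower cutoff of the stated bandpass filter, so the lowest delta frequencies would likely be attenuated in the pre-processed data.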

Figures

Figure 3 (p. 11): The preprocessing pipeline and segmentation are the basis for time-resolved analyses of coordination across oscillatory bands and linguistic units.

Limitations: While timing precision is high, the paper does not report phase-locking or cross-frequency coupling analyses; it establishes prerequisites rather than demonstrating specific temporal coordination signatures.

Information Integration

Cross-task correlations between language regions indicate integrated processing of semantic information across modalities.

"Cross-modal correlation analysis was performed... As shown in Figure 5b, the left superior temporal gyrus and the left middle temporal gyrus... exhibited stronger cross-task correlations than between the left superior temporal gyrus and the left pericalcarine cortex... The correlation between task-activated regions substantiates the dataset’s sensitivity to language-related neural dynamics."

EEG Source localization and cross-modal analysis, p. 5

Higher correlations between language-selective regions across reading-aloud and passive-listening tasks suggest integrated, modality-invariant representations within the language network, consistent with information integration accounts.
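The quoted passage elides the details of the correlation analysis, so the following is only a sketch of one plausible form of it: a Pearson correlation between region-level source time courses from the two tasks, aligned to the same words. The function names, region labels, and toy values are hypothetical and are not results from the dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def cross_task_correlations(task_a, task_b):
    """Correlate every region in task_a with every region in task_b."""
    return {
        (region_a, region_b): pearson(series_a, series_b)
        for region_a, series_a in task_a.items()
        for region_b, series_b in task_b.items()
    }


# Toy word-aligned source amplitudes (assumed values, for illustration only).
reading_aloud = {"left_STG": [1.0, 2.0, 3.0, 4.0]}
passive_listening = {
    "left_MTG": [2.0, 4.0, 6.0, 8.0],
    "left_pericalcarine": [4.0, 3.0, 2.0, 1.0],
}
toy_result = cross_task_correlations(reading_aloud, passive_listening)
```

Comparing the entry for a language-region pair against the entry for a language-region and visual-region pair mirrors the contrast described in the quote. Any claim about real data would also need a statistical test across subjects.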

Limitations: Findings are correlational and region-level; they do not demonstrate global workspace-like ignition or long-range causal integration, and the dataset paper does not provide direct frontoparietal hub evidence.