<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD with MathML3 v1.1d2 20140930//EN" "JATS-journalpublishing1-mathml3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="1.1d2" xml:lang="en">
  <front>
    <journal-meta>
      <journal-id journal-id-type="nlm-ta">CJIF</journal-id>
      <journal-id journal-id-type="publisher-id">ICCK</journal-id>
      <journal-title-group>
        <journal-title>Chinese Journal of Information Fusion</journal-title>
      </journal-title-group>
      <issn pub-type="ppub" publication-format="print">2998-3363</issn>
      <issn pub-type="epub" publication-format="electronic">2998-3371</issn>
      <publisher>
        <publisher-name>Institute of Central Computation and Knowledge Inc</publisher-name>
        <publisher-loc>522 W RIVERSIDE AVE STE N, SPOKANE, WA, 99201, UNITED STATES</publisher-loc>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.62762/CJIF.2024.876830</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Review Article</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>A Comprehensive Survey on Emerging Techniques and Fusion Technologies in Spatio-Temporal EEG Data Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">https://orcid.org/0009-0008-5593-2653</contrib-id>
          <name>
            <surname>Wang</surname>
            <given-names>Pengfei</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">https://orcid.org/0009-0005-6261-6702</contrib-id>
          <name>
            <surname>Zheng</surname>
            <given-names>Huanran</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">https://orcid.org/0009-0002-7065-6059</contrib-id>
          <name>
            <surname>Dai</surname>
            <given-names>Silong</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">https://orcid.org/0009-0007-2344-6055</contrib-id>
          <name>
            <surname>Wang</surname>
            <given-names>Yiqiao</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">https://orcid.org/0009-0002-8406-7511</contrib-id>
          <name>
            <surname>Gu</surname>
            <given-names>Xiaotian</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">https://orcid.org/0009-0009-5670-4940</contrib-id>
          <name>
            <surname>Wu</surname>
            <given-names>Yuanbin</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-4594-6946</contrib-id>
          <name>
            <surname>Wang</surname>
            <given-names>Xiaoling</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1"><label>1</label>School of Computer Science and Technology, East China Normal University, Shanghai 200062, China</aff>
      </contrib-group>
      <author-notes>
        <corresp id="cor6">Corresponding Author: Yuanbin Wu. Email: <email>ybwu@cs.ecnu.edu.cn</email></corresp>
        <corresp id="cor7">Corresponding Author: Xiaoling Wang. Email: <email>xlwang@cs.ecnu.edu.cn</email></corresp>
      </author-notes>
      <pub-date date-type="pub" pub-type="epub" publication-format="online">
        <day>15</day>
        <month>12</month>
        <year>2024</year>
      </pub-date>
      <volume>1</volume>
      <issue>3</issue>
      <fpage>183</fpage>
      <lpage>211</lpage>
      <history>
        <date date-type="received">
          <day>31</day>
          <month>7</month>
          <year>2024</year>
        </date>
        <date date-type="accepted">
          <day>10</day>
          <month>12</month>
          <year>2024</year>
        </date>
      </history>
      <permissions>
        <copyright-statement>© 2024 by the Authors. Published by Institute of Central Computation and Knowledge. This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/).</copyright-statement>
        <copyright-year>2024</copyright-year>
        <copyright-holder>The Authors</copyright-holder>
        <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
        </license>
      </permissions>
      <self-uri xlink:href="https://www.icck.org/article/abs/cjif.2024.876830">This article is available from https://www.icck.org/article/abs/cjif.2024.876830</self-uri>
      <abstract>
        <p>In recent years, the field of electroencephalography (EEG) analysis has witnessed remarkable advances driven by the integration of machine learning and artificial intelligence. This survey summarizes the latest developments, focusing on emerging methods and technologies that are poised to transform how we understand and interpret brain activity. The paper is organized according to a categorization common in the machine learning community, with representation learning as the foundational concept that encompasses both discriminative and generative approaches. We examine self-supervised learning methods that learn robust representations of brain signals, which are fundamental to a variety of downstream applications. Among discriminative methods, we explore advanced techniques such as graph neural networks (GNN), foundation models, and fusion approaches based on large language models (LLMs). On the generative side, we examine technologies that use EEG data to produce images or text, offering new perspectives on the visualization and interpretation of brain activity. This survey provides an extensive overview of these cutting-edge techniques, their current applications, and their implications for future research and clinical practice. The relevant literature and open-source materials have been compiled and are continuously updated at <ext-link xlink:href="https://github.com/wpf535236337/LLMs4TS">https://github.com/wpf535236337/LLMs4TS</ext-link>.</p>
      </abstract>
      <kwd-group kwd-group-type="author" xml:lang="en">
        <kwd>electroencephalography (EEG)</kwd>
        <kwd>multi-modal fusion</kwd>
        <kwd>self-supervised learning (SSL)</kwd>
        <kwd>graph neural networks (GNN)</kwd>
        <kwd>foundation models</kwd>
        <kwd>large language models (LLMs)</kwd>
        <kwd>generative models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="S1">
      <label>1.</label>
      <title>Introduction</title>
      <p id="S1.p1">Electroencephalography (EEG) has long been a cornerstone in the study of brain function, offering a non-invasive means to monitor the brain's electrical activity; the recorded signals reflect voltage fluctuations caused by ionic currents in neurons. Non-invasive methods are easier to deploy because they require no surgery, but they cannot achieve high temporal and high spatial resolution at the same time, and they have limited access to activity in deep brain structures. In contrast, invasive methods such as stereoelectroencephalography (SEEG) [<xref rid="ref001" ref-type="bibr">1</xref>] can measure brain signals more precisely and with a higher signal-to-noise ratio [<xref rid="ref002" ref-type="bibr">2</xref>], albeit at the cost of surgically implanting recording devices. Overall, non-invasive recordings are safer and more portable, and they are applicable to a much wider population, which gives them greater potential for broad use.</p>
      <p id="S1.p2">Although our understanding of the brain is deepening and computational methods continue to advance [<xref rid="ref003" ref-type="bibr">3</xref>, <xref rid="ref004" ref-type="bibr">4</xref>], EEG analysis still faces several challenges. The first is capturing effective representations of EEG data, particularly when labels are scarce or absent. The second is identifying and classifying complex, subtle patterns of brain activity, which requires advanced discriminative methods that can reliably interpret the nuanced differences indicative of distinct brain states or conditions. The third is producing meaningful visualizations or interpretations of EEG data, which calls for generative methods that can transform abstract EEG signals into more tangible and comprehensible forms, such as images or text, and thereby deepen our understanding of the brain's intricate workings. Addressing these challenges together makes EEG analysis more robust, insightful, and applicable to a wider range of scientific and clinical settings. From a broader perspective, EEG analysis is fundamentally a task of deciphering complex, noisy, and often high-dimensional brain signals. Meeting this challenge requires integrating knowledge and data from various modalities, ranging from behavioral and physiological signals to textual clinical notes and structural neuroimaging data. This naturally aligns with the goals of information fusion, which combines heterogeneous data sources to improve decision-making, robustness, and interpretability.</p>
      <p id="S1.p3">In response to the aforementioned challenges, recent developments in deep learning and artificial intelligence have paved the way for more robust and nuanced EEG analysis strategies. This paper surveys three key areas of advancement that are reshaping the field: </p>
      <p>
        <list list-type="bullet" id="S1.I1">
          <list-item id="S1.I1.i1">
            <p id="S1.I1.i1.p1"><bold>Representation Learning in EEG Analysis</bold>: Representation learning is the fundamental first step in EEG analysis and focuses on automatically extracting useful features from EEG signals. Self-supervised learning methods are employed to learn robust signal representations that improve the precision and interpretability of downstream tasks. Because they require no manual labels, these methods are naturally suited to the vast amounts of unlabeled brain signal data, and they are often likened to the way humans learn without explicit supervision.</p>
          </list-item>
          <list-item id="S1.I1.i2">
            <p id="S1.I1.i2.p1"><bold>Discriminative EEG Analysis</bold>: Discriminative methods focus on distinguishing between different categories or patterns in EEG signals. Advanced architectures such as Graph Neural Networks (GNNs), Foundation Models, and LLM-based Methods are being used to gain deeper insights into brain activity. These architectures efficiently capture discriminative patterns, which are crucial for understanding complex neural processes.</p>
          </list-item>
          <list-item id="S1.I1.i3">
            <p id="S1.I1.i3.p1"><bold>Generative EEG Analysis</bold>: Generative methods aim to generate new modalities or signal data from EEG signals. Approaches such as diffusion models that produce images or text from EEG data offer new ways to understand and visualize brain activity. These generative techniques are also an important application area of AI-generated content (AIGC).</p>
          </list-item>
        </list>
      </p>
      <p id="S1.p4">This paper provides a comprehensive overview of cutting-edge techniques, examines their technical details, and explores their implications for future research and clinical practice in EEG analysis. A taxonomy of the surveyed methods is illustrated in Figure <xref ref-type="fig" rid="F1">1</xref>. <bold>The structure of this paper follows a categorization common in the machine learning community, with representation learning as the foundational concept that encompasses both discriminative and generative approaches [<xref rid="ref005" ref-type="bibr">5</xref>].</bold> The remainder of this paper is organized as follows: Section <xref rid="S2">2</xref> summarizes the background and related surveys. Section <xref rid="S3">3</xref> discusses robust representation learning strategies and their significance in EEG data analysis. Section <xref rid="S4">4</xref> explores emerging discriminative architectures, detailing the roles of GNNs (<xref rid="S4.SS1">4.1</xref>), Foundation Models (<xref rid="S4.SS2">4.2</xref>), and LLM-based Methods (<xref rid="S4.SS3">4.3</xref>). Section <xref rid="S5">5</xref> examines generative applications of EEG data. Section <xref rid="S6">6</xref> introduces the most widely used datasets and the key metrics used to assess the performance of EEG analysis models. Finally, Section <xref rid="S7">7</xref> concludes the paper and discusses potential future directions for EEG analysis.</p>
      <p>
        <fig id="F1">
          <label>Figure 1.</label>
          <caption>
            <p>A comprehensive taxonomy of advancements in EEG analysis.</p>
          </caption>
          <graphic xlink:href="images/fig1.png"/>
        </fig>
      </p>
    </sec>
    <sec id="S2">
      <label>2.</label>
      <title>Related Surveys</title>
      <sec id="S2.SS1">
        <label>2.1</label>
        <title>Existing Surveys on EEG Analysis</title>
        <p id="S2.SS1.p1">Numerous review studies have summarized EEG-related concepts and research. Hosseini et al. [<xref rid="ref004" ref-type="bibr">4</xref>] introduced the application of machine learning to EEG signal processing, covering traditional classifiers such as Support Vector Machines (SVM), k-Nearest Neighbors (kNN), and Naive Bayes. However, this review did not discuss in depth the deep learning algorithms that have demonstrated superior performance. Jiang et al. [<xref rid="ref006" ref-type="bibr">6</xref>] discussed the removal of artifacts from EEG signals and offered a technically detailed treatment of that topic. Nevertheless, their work did not cover deep learning algorithms or a broader range of EEG downstream tasks. In contrast, Zhang et al. [<xref rid="ref007" ref-type="bibr">7</xref>] provided a more comprehensive perspective, introducing the origins and applications of Brain-Computer Interfaces (BCI) and discussing how mainstream deep learning algorithms such as Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Generative Adversarial Networks (GAN) are integrated into EEG tasks. With continuous innovation in the artificial intelligence community, EEG research based on foundation models and large language models has begun to emerge. However, to the best of our knowledge, no existing survey reviews EEG analysis from a holistic perspective on these frontier technologies, which is the gap this paper aims to fill.</p>
      </sec>
      <sec id="S2.SS2">
        <label>2.2</label>
        <title>Emerging Surveys on General Time-Series Analysis</title>
        <p id="S2.SS2.p1">In the general time-series domain, a substantial body of work has summarized the application of the latest technologies to various downstream tasks. Zhang et al. [<xref rid="ref008" ref-type="bibr">8</xref>] categorized existing self-supervised time-series analysis methods into three types, namely generative, contrastive, and adversarial, and discussed their key intuitions and main frameworks in detail. Jin et al. [<xref rid="ref009" ref-type="bibr">9</xref>] provided an overview of graph neural networks in time-series tasks such as forecasting, classification, imputation, and anomaly detection. Liang et al. [<xref rid="ref010" ref-type="bibr">10</xref>] reviewed foundation models for time-series analysis from the perspectives of model architectures, pre-training techniques, adaptation methods, and data modalities. Similarly, the surveys in [<xref rid="ref011" ref-type="bibr">11</xref>, <xref rid="ref012" ref-type="bibr">12</xref>, <xref rid="ref013" ref-type="bibr">13</xref>] systematically outlined methods and procedures for time-series analysis based on large language models. Yang et al. [<xref rid="ref014" ref-type="bibr">14</xref>] reviewed the application of diffusion models to time-series and spatio-temporal data. Other works focus on more specific model architectures or downstream tasks [<xref rid="ref015" ref-type="bibr">15</xref>, <xref rid="ref016" ref-type="bibr">16</xref>]. We refer the reader to the corresponding publications for a more in-depth treatment.</p>
        <p id="S2.SS2.p2">Although numerous reviews exist for the broader time-series field, few surveys concentrate exclusively on EEG data. Moreover, EEG data have unique characteristics, and a substantial body of related work has emerged recently, which calls for a comprehensive review and synthesis. This paper therefore offers an in-depth examination of state-of-the-art techniques, elaborates on their intricacies, and explores their implications for future EEG research and clinical applications.</p>
      </sec>
      <sec id="S2.SS3">
        <label>2.3</label>
        <title>Information Fusion in EEG Foundation Models</title>
        <p id="S2.SS3.p1">Information fusion refers to the integration of multiple data sources or modalities to achieve a more comprehensive, robust, and generalizable understanding. In the context of EEG analysis, fusion may occur at different levels:</p>
        <p>
          <list list-type="bullet" id="S2.I1">
            <list-item id="S2.I1.i1">
              <p id="S2.I1.i1.p1"><bold>Data-level fusion</bold>: Merging raw EEG signals with complementary signals such as fNIRS, fMRI, or EMG.</p>
            </list-item>
            <list-item id="S2.I1.i2">
              <p id="S2.I1.i2.p1"><bold>Feature-level fusion</bold>: Joint representation learning of EEG with language, vision, or structured clinical data using attention-based or graph-based methods.</p>
            </list-item>
            <list-item id="S2.I1.i3">
              <p id="S2.I1.i3.p1"><bold>Decision-level fusion</bold>: Ensembling predictions from multiple modalities or models to improve classification or generation.</p>
            </list-item>
          </list>
        </p>
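        <p>The three fusion levels above can be contrasted in a minimal sketch. It assumes that raw channels are lists of samples already resampled to a common rate and trimmed to equal length, that each modality has already been encoded into a fixed-length feature vector, and that each model outputs a class-probability vector. Plain concatenation stands in for the attention-based or graph-based feature fusion mentioned above, and weighted averaging stands in for more elaborate ensembling; the function names <monospace>data_level_fusion</monospace>, <monospace>feature_level_fusion</monospace>, and <monospace>decision_level_fusion</monospace> are hypothetical and do not correspond to any surveyed method.</p>
        <preformat preformat-type="code">
```python
def data_level_fusion(eeg_channels, aux_channels):
    """Stack time-aligned raw channels from EEG and an auxiliary signal.

    Each argument is a list of channels; each channel is a list of samples.
    Assumption: both sources already share one sampling rate and length,
    which a real pipeline would have to enforce by resampling/trimming.
    """
    lengths = {len(ch) for ch in eeg_channels + aux_channels}
    if len(lengths) != 1:
        raise ValueError("all channels must have the same number of samples")
    return eeg_channels + aux_channels


def feature_level_fusion(eeg_features, other_features):
    """Concatenate per-modality feature vectors into one joint vector.

    Simplest stand-in for attention- or graph-based joint representations.
    """
    return list(eeg_features) + list(other_features)


def decision_level_fusion(prob_vectors, weights=None):
    """Weighted average of class-probability vectors from several models."""
    if weights is None:
        weights = [1.0] * len(prob_vectors)  # equal weighting by default
    total = sum(weights)
    fused = [0.0] * len(prob_vectors[0])
    for w, probs in zip(weights, prob_vectors):
        for c, p in enumerate(probs):
            fused[c] += w * p / total  # normalised so the result stays a distribution
    return fused
```
        </preformat>
        <p>For instance, averaging an EEG model that outputs [0.7, 0.3] with an fNIRS model that outputs [0.4, 0.6] under equal weights gives approximately [0.55, 0.45]. Decision-level fusion is the easiest to retrofit because each modality can be modeled independently, whereas feature-level fusion lets a downstream model learn cross-modal interactions.</p>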
        <p id="S2.SS3.p3">Recent EEG foundation models have begun to incorporate such fusion mechanisms. For example, EEG-T5 aligns EEG with natural language for brain-to-text generation, while Meta-Transfer Learning (MTL) frameworks leverage cross-subject and cross-task signals. These fusion strategies enhance not only performance but also interpretability, which is vital for medical and real-world applications.</p>
        <p id="S2.SS3.p4">As the field moves toward larger and more heterogeneous datasets, information fusion will likely become a key enabler of generalization across populations, tasks, and environments.</p>
      </sec>
    </sec>
    <sec id="S3">
      <label>3.</label>
      <title>Representation Learning in EEG Analysis</title>
      <p id="S3.p1">In recent years, deep learning has excelled at extracting hidden patterns and features from data. Deep feature extraction models, however, typically rely on large volumes of labeled data, a paradigm known as supervised learning. In many practical applications, particularly those involving time-series data such as electroencephalograms (EEG), acquiring extensive labels is both time-consuming and costly. As an alternative, Self-Supervised Learning (SSL) has attracted increasing attention owing to its label efficiency and generalization capabilities. SSL, a subset of unsupervised learning, derives supervisory signals from the unlabeled data itself by solving automatically generated pretext tasks, thereby producing representations that are useful for downstream tasks.</p>
      <p id="S3.p2">With the significant success of SSL in fields such as Computer Vision (CV) [<xref rid="ref017" ref-type="bibr">17</xref>] and Natural Language Processing (NLP) [<xref rid="ref018" ref-type="bibr">18</xref>], its application to time-series data appears particularly promising. However, directly applying pretext tasks designed for visual or linguistic data to time series is challenging and often of limited effectiveness. The primary reasons include:</p>
      <p>
        <list list-type="bullet" id="S3.I1">
          <list-item id="S3.I1.i1">
            <p id="S3.I1.i1.p1">Time-series data possess unique attributes such as seasonality, trends, and frequency domain information, which are typically not considered in tasks designed for images or language.</p>
          </list-item>
          <list-item id="S3.I1.i2">
            <p id="S3.I1.i2.p1">Common data augmentation techniques in computer vision, such as rotation, flipping, and cropping, can disrupt the temporal dependencies and integrity of time-series data such as EEG signals. For instance, rotating an EEG signal or reversing the order of its time points can strip it of its physiological meaning and contextual information.</p>
          </list-item>
          <list-item id="S3.I1.i3">
            <p id="S3.I1.i3.p1">Many time-series datasets are multivariate, with each dimension potentially representing a different measurement channel. Unlike a single image or text sequence, such data require synchronized analysis and processing across multiple channels.</p>
          </list-item>
        </list>
      </p>
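      <p>The second point can be illustrated with two order-preserving augmentations widely used for time series: additive Gaussian jitter and per-channel amplitude scaling. In contrast to rotating or reversing a recording, they perturb amplitudes while keeping the channel layout and temporal order intact. The following is a minimal sketch, assuming that an EEG window is stored as a list of channels, each a list of samples; the noise levels and the helper names <monospace>jitter</monospace>, <monospace>scale</monospace>, and <monospace>make_views</monospace> are illustrative assumptions rather than settings taken from any surveyed method.</p>
      <preformat preformat-type="code">
```python
import random


def jitter(window, sigma=0.01, rng=random):
    """Add i.i.d. Gaussian noise to every sample; time order is unchanged."""
    return [[x + rng.gauss(0.0, sigma) for x in channel] for channel in window]


def scale(window, sigma=0.1, rng=random):
    """Multiply each channel by its own random factor drawn around 1.0."""
    scaled = []
    for channel in window:
        factor = rng.gauss(1.0, sigma)  # one factor per channel
        scaled.append([x * factor for x in channel])
    return scaled


def make_views(window, rng=random):
    """Return two independently augmented views of one EEG window.

    In augmentation-based contrastive learning such views typically form a
    positive pair. Channel count, length, and sample order are preserved,
    and the input window is not modified.
    """
    view_a = scale(jitter(window, rng=rng), rng=rng)
    view_b = scale(jitter(window, rng=rng), rng=rng)
    return view_a, view_b
```
      </preformat>
      <p>Whether a perturbation is harmless depends on the downstream task; amplitude scaling, for example, may be inappropriate when absolute signal power carries the label, so augmentation choices are usually validated per dataset.</p>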
      <p id="S3.p4">To address these issues, this section summarizes two main SSL paradigms: contrastive learning, which trains models to distinguish between similar and dissimilar pairs of data points, and masked autoencoders, which learn the intrinsic structure of the data by reconstructing masked portions of the input. All of the methods are summarized in Table <xref rid="T1" ref-type="table">1</xref>.</p>
      <sec id="S3.SS1">
        <label>3.1</label>
        <title>Contrastive Learning</title>
        <p id="S3.SS1.p1">Contrastive learning is a self-supervised learning method that acquires invariant representations of data by learning the similarities and differences between samples. It maps similar samples to nearby points in the representation space and dissimilar samples to distant ones, thereby learning generalizable feature representations without explicit label information. Formally, given a set of samples <inline-formula><mml:math alttext="\mathcal{X}=\left\{x^{1},x^{2},\cdots,x^{N}\right\}" display="inline"><mml:mrow><mml:mi class="ltx_font_mathcaligraphic">𝒳</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mn>1</mml:mn></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo>,</mml:mo><mml:mi mathvariant="normal">⋯</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mi>N</mml:mi></mml:msup><mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>, contrastive learning aims to learn a mapping function <inline-formula><mml:math alttext="f" display="inline"><mml:mi>f</mml:mi></mml:math></inline-formula> that maximizes the similarity within positive pairs (e.g., two augmented views of the same sample) and minimizes the similarity within negative pairs (e.g., views of different samples). 
For positive sample pairs <inline-formula><mml:math alttext="(x,x^{+})" display="inline"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mo>+</mml:mo></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> and negative sample pairs <inline-formula><mml:math alttext="(x,x^{-})" display="inline"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mo>−</mml:mo></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>, the objective of contrastive learning is to optimize the following loss function:</p>
        <p>
          <disp-formula id="S3.E1">
            <mml:math alttext="L\left(x,x^{+},x^{-}\right)=-\log\left(\frac{e^{f\left(x,x^{+}\right)/\tau}}{e^{f(x,x^{+})/\tau}+e^{f(x,x^{-})/\tau}}\right)" display="block">
              <mml:mrow>
                <mml:mrow>
                  <mml:mi>L</mml:mi>
                  <mml:mo>⁢</mml:mo>
                  <mml:mrow>
                    <mml:mo>(</mml:mo>
                    <mml:mi>x</mml:mi>
                    <mml:mo>,</mml:mo>
                    <mml:msup>
                      <mml:mi>x</mml:mi>
                      <mml:mo>+</mml:mo>
                    </mml:msup>
                    <mml:mo>,</mml:mo>
                    <mml:msup>
                      <mml:mi>x</mml:mi>
                      <mml:mo>−</mml:mo>
                    </mml:msup>
                    <mml:mo>)</mml:mo>
                  </mml:mrow>
                </mml:mrow>
                <mml:mo>=</mml:mo>
                <mml:mrow>
                  <mml:mo rspace="0.167em">−</mml:mo>
                  <mml:mrow>
                    <mml:mi>log</mml:mi>
                    <mml:mo>⁡</mml:mo>
                    <mml:mrow>
                      <mml:mo>(</mml:mo>
                      <mml:mfrac>
                        <mml:msup>
                          <mml:mi>e</mml:mi>
                          <mml:mrow>
                            <mml:mrow>
                              <mml:mi>f</mml:mi>
                              <mml:mo>⁢</mml:mo>
                              <mml:mrow>
                                <mml:mo>(</mml:mo>
                                <mml:mi>x</mml:mi>
                                <mml:mo>,</mml:mo>
                                <mml:msup>
                                  <mml:mi>x</mml:mi>
                                  <mml:mo>+</mml:mo>
                                </mml:msup>
                                <mml:mo>)</mml:mo>
                              </mml:mrow>
                            </mml:mrow>
                            <mml:mo>/</mml:mo>
                            <mml:mi>τ</mml:mi>
                          </mml:mrow>
                        </mml:msup>
                        <mml:mrow>
                          <mml:msup>
                            <mml:mi>e</mml:mi>
                            <mml:mrow>
                              <mml:mrow>
                                <mml:mi>f</mml:mi>
                                <mml:mo>⁢</mml:mo>
                                <mml:mrow>
                                  <mml:mo stretchy="false">(</mml:mo>
                                  <mml:mi>x</mml:mi>
                                  <mml:mo>,</mml:mo>
                                  <mml:msup>
                                    <mml:mi>x</mml:mi>
                                    <mml:mo>+</mml:mo>
                                  </mml:msup>
                                  <mml:mo stretchy="false">)</mml:mo>
                                </mml:mrow>
                              </mml:mrow>
                              <mml:mo>/</mml:mo>
                              <mml:mi>τ</mml:mi>
                            </mml:mrow>
                          </mml:msup>
                          <mml:mo>+</mml:mo>
                          <mml:msup>
                            <mml:mi>e</mml:mi>
                            <mml:mrow>
                              <mml:mrow>
                                <mml:mi>f</mml:mi>
                                <mml:mo>⁢</mml:mo>
                                <mml:mrow>
                                  <mml:mo stretchy="false">(</mml:mo>
                                  <mml:mi>x</mml:mi>
                                  <mml:mo>,</mml:mo>
                                  <mml:msup>
                                    <mml:mi>x</mml:mi>
                                    <mml:mo>−</mml:mo>
                                  </mml:msup>
                                  <mml:mo stretchy="false">)</mml:mo>
                                </mml:mrow>
                              </mml:mrow>
                              <mml:mo>/</mml:mo>
                              <mml:mi>τ</mml:mi>
                            </mml:mrow>
                          </mml:msup>
                        </mml:mrow>
                      </mml:mfrac>
                      <mml:mo>)</mml:mo>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
              </mml:mrow>
            </mml:math>
          </disp-formula>
        </p>
        <p>where <inline-formula><mml:math alttext="f(x,x^{+})" display="inline"><mml:mrow><mml:mi>f</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mo>+</mml:mo></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> denotes the similarity between the learned representations of a positive pair, <inline-formula><mml:math alttext="f(x,x^{-})" display="inline"><mml:mrow><mml:mi>f</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mo>−</mml:mo></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> the corresponding similarity for a negative pair, and <inline-formula><mml:math alttext="\tau" display="inline"><mml:mi>τ</mml:mi></mml:math></inline-formula> is a temperature parameter that scales the similarities. Intuitively, maximizing the similarity of positive pairs while minimizing that of negative pairs drives the model to capture high-level semantic relationships between samples, yielding more discriminative representations.</p>
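        <p>The loss above can be computed directly once the similarity function is fixed. The sketch below assumes cosine similarity between embedding vectors, a common but not the only choice, and accepts a list of negatives: with a single negative it reduces exactly to the two-term denominator above, and with several negatives it becomes the InfoNCE-style form in which all negative terms are summed in the denominator. The helper names and the default τ = 0.5 are illustrative assumptions.</p>
        <preformat preformat-type="code">
```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(z, z_pos, z_negs, tau=0.5):
    """-log( e^{s(z,z_pos)/tau} / (e^{s(z,z_pos)/tau} + sum_k e^{s(z,z_neg_k)/tau}) ).

    z, z_pos: embeddings of the anchor and its positive counterpart.
    z_negs:   list of negative embeddings; a single entry reproduces the
              two-term denominator of the loss in the text.
    Assumption: s is cosine similarity and tau is the temperature.
    """
    pos = math.exp(cosine_similarity(z, z_pos) / tau)
    neg = sum(math.exp(cosine_similarity(z, n) / tau) for n in z_negs)
    return -math.log(pos / (pos + neg))
```
        </preformat>
        <p>In practice the embeddings would come from the encoder being trained, and the loss would be averaged over a batch inside an automatic-differentiation framework; this scalar version only makes the arithmetic of the objective explicit. With an identical positive and an opposite negative at τ = 0.5, the loss equals log(1 + e<sup>−4</sup>), and it grows as the positive drifts away from the anchor.</p>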
        <p id="S3.SS1.p2">In this section, we introduce two types of contrastive learning methods: contrastive learning based on data augmentation and contrastive learning combined with expert knowledge (as shown in Figure <xref ref-type="fig" rid="F2">2</xref>). All of these methods are listed in Table <xref rid="T1" ref-type="table">1</xref>.</p>
        <p>
          <fig id="F2">
            <label>Figure 2.</label>
            <caption>
              <p>Two types of contrastive learning methods.</p>
            </caption>
            <graphic xlink:href="images/CL_pic.pdf"/>
          </fig>
        </p>
        <p>
          <table-wrap id="T1">
            <label>Table 1</label>
            <caption>
              <p>Summary of self-supervised learning for EEG analysis.</p>
            </caption>
            <table>
              <tbody>
                <tr>
                  <td style="border-top: 1px solid black;" align="center">
                    <bold>SSL</bold>
                  </td>
                  <td style="border-top: 1px solid black;" align="center">
                    <bold>Method</bold>
                  </td>
                  <td style="border-top: 1px solid black;" align="center">
                    <bold>Strategy</bold>
                  </td>
                  <td style="border-top: 1px solid black;" align="center">
                    <bold>Backbone</bold>
                  </td>
                  <td style="border-top: 1px solid black;" align="center">
                    <bold>Task</bold>
                  </td>
                  <td style="border-top: 1px solid black;" align="center">
                    <bold>Datasets</bold>
                  </td>
                  <td style="border-top: 1px solid black;" align="center">
                    <bold>Metric</bold>
                  </td>
                </tr>
                <tr>
                  <td style="border-top: 1px solid black;" rowspan="8" align="center">CL</td>
                  <td style="border-top: 1px solid black;" align="center">SeqCLR[<xref rid="ref019" ref-type="bibr">19</xref>]</td>
                  <td style="border-top: 1px solid black;" align="center">Signal transformation</td>
                  <td style="border-top: 1px solid black;" align="center">CNN &amp; GRU</td>
                  <td style="border-top: 1px solid black;" align="center">Multiple tasks</td>
                  <td style="border-top: 1px solid black;" align="center">THU[<xref rid="ref037" ref-type="bibr">37</xref>], SEED[<xref rid="ref038" ref-type="bibr">38</xref>], SleepEDF[<xref rid="ref039" ref-type="bibr">39</xref>], ISRUC-S3[<xref rid="ref040" ref-type="bibr">40</xref>]</td>
                  <td style="border-top: 1px solid black;" align="center">Accuracy</td>
                </tr>
                <tr>
                  <td align="center">TS-TCC[<xref rid="ref021" ref-type="bibr">21</xref>]</td>
                  <td align="center">Weak &amp; strong augmentation</td>
                  <td align="center">Transformer</td>
                  <td align="center">Sleep &amp; seizure detection</td>
                  <td align="center">HAR[<xref rid="ref041" ref-type="bibr">41</xref>], SleepEDF[<xref rid="ref039" ref-type="bibr">39</xref>], ESR[<xref rid="ref042" ref-type="bibr">42</xref>], FD[<xref rid="ref043" ref-type="bibr">43</xref>]</td>
                  <td align="center">Accuracy, F1</td>
                </tr>
                <tr>
                  <td align="center">SSCL for EEG[<xref rid="ref022" ref-type="bibr">22</xref>]</td>
                  <td align="center">Signal transformation</td>
                  <td align="center">CNN</td>
                  <td align="center">Sleep stage classification</td>
                  <td align="center">SleepEDF[<xref rid="ref039" ref-type="bibr">39</xref>], DOD[<xref rid="ref044" ref-type="bibr">44</xref>]</td>
                  <td align="center">Accuracy, F1</td>
                </tr>
                <tr>
                  <td align="center">MulEEG[<xref rid="ref023" ref-type="bibr">23</xref>]</td>
                  <td align="center">Multi-view contrast</td>
                  <td align="center">CNN</td>
                  <td align="center">Sleep stage classification</td>
                  <td align="center">SleepEDF[<xref rid="ref039" ref-type="bibr">39</xref>], SHHS[<xref rid="ref045" ref-type="bibr">45</xref>]</td>
                  <td align="center">Accuracy, Kappa, F1</td>
                </tr>
                <tr>
                  <td align="center">ContraWR[<xref rid="ref026" ref-type="bibr">26</xref>]</td>
                  <td align="center">Non-negative contrast</td>
                  <td align="center">CNN</td>
                  <td align="center">Sleep stage classification</td>
                  <td align="center">SHHS[<xref rid="ref045" ref-type="bibr">45</xref>], SleepEDF[<xref rid="ref039" ref-type="bibr">39</xref>], MGH[<xref rid="ref046" ref-type="bibr">46</xref>]</td>
                  <td align="center">Accuracy</td>
                </tr>
                <tr>
                  <td align="center">COMET[<xref rid="ref027" ref-type="bibr">27</xref>]</td>
                  <td align="center">Multi-level contrast</td>
                  <td align="center">CNN</td>
                  <td align="center">Disease detection</td>
                  <td align="center">AD[<xref rid="ref047" ref-type="bibr">47</xref>], PTB[<xref rid="ref048" ref-type="bibr">48</xref>], TDBRAIN[<xref rid="ref049" ref-type="bibr">49</xref>]</td>
                  <td align="center">Accuracy, F1, AUROC, AUPRC</td>
                </tr>
                <tr>
                  <td align="center">SleepPriorCL[<xref rid="ref028" ref-type="bibr">28</xref>]</td>
                  <td align="center">Expert knowledge incorporation</td>
                  <td align="center">CNN</td>
                  <td align="center">Sleep stage classification</td>
                  <td align="center">SleepEDF[<xref rid="ref039" ref-type="bibr">39</xref>], MASS-SS3[<xref rid="ref050" ref-type="bibr">50</xref>]</td>
                  <td align="center">Accuracy, F1</td>
                </tr>
                <tr>
                  <td align="center">KDC2[<xref rid="ref029" ref-type="bibr">29</xref>]</td>
                  <td align="center">Cross-view contrast</td>
                  <td align="center">CNN &amp; GNN</td>
                  <td align="center">Multiple tasks</td>
                  <td align="center">SEED[<xref rid="ref038" ref-type="bibr">38</xref>], MMI[<xref rid="ref051" ref-type="bibr">51</xref>], CHB-MIT[<xref rid="ref052" ref-type="bibr">52</xref>]</td>
                  <td align="center">Accuracy</td>
                </tr>
                <tr>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" rowspan="3" align="center">MAE</td>
                  <td style="border-top: 1px solid black;" align="center">BENDR[<xref rid="ref031" ref-type="bibr">31</xref>]</td>
                  <td style="border-top: 1px solid black;" align="center">Temporal-domain mask</td>
                  <td style="border-top: 1px solid black;" align="center">CNN &amp; Transformer</td>
                  <td style="border-top: 1px solid black;" align="center">Multiple tasks</td>
                  <td style="border-top: 1px solid black;" align="center">MMI[<xref rid="ref051" ref-type="bibr">51</xref>], BCIC[<xref rid="ref053" ref-type="bibr">53</xref>], ERN[<xref rid="ref054" ref-type="bibr">54</xref>], SSC[<xref rid="ref048" ref-type="bibr">48</xref>]</td>
                  <td style="border-top: 1px solid black;" align="center">Accuracy</td>
                </tr>
                <tr>
                  <td align="center">MAEEG[<xref rid="ref034" ref-type="bibr">34</xref>]</td>
                  <td align="center">Temporal-domain mask</td>
                  <td align="center">Transformer</td>
                  <td align="center">Sleep stage classification</td>
                  <td align="center">MGH[<xref rid="ref046" ref-type="bibr">46</xref>]</td>
                  <td align="center">Accuracy</td>
                </tr>
                <tr>
                  <td style="border-bottom: 1px solid black;" align="center">Wavelet2vec[<xref rid="ref035" ref-type="bibr">35</xref>]</td>
                  <td style="border-bottom: 1px solid black;" align="center">Frequency-domain mask</td>
                  <td style="border-bottom: 1px solid black;" align="center">ViT</td>
                  <td style="border-bottom: 1px solid black;" align="center">Seizure detection</td>
                  <td style="border-bottom: 1px solid black;" align="center">CHSZ[<xref rid="ref055" ref-type="bibr">55</xref>], TUSZ[<xref rid="ref056" ref-type="bibr">56</xref>]</td>
                  <td style="border-bottom: 1px solid black;" align="center">Accuracy, BCA, F1, MAE</td>
                </tr>
              </tbody>
            </table>
          </table-wrap>
        </p>
        <sec id="S3.SS1.SSS1">
          <label>3.1.1</label>
          <title>Based on Data Augmentation</title>
          <p id="S3.SS1.SSS1.p1">Data augmentation is an indispensable component of contrastive learning: augmentation techniques generate different views of each input sample, and the model learns representations by maximizing the similarity between views of the same sample while minimizing the similarity between views of different samples. SeqCLR [<xref rid="ref019" ref-type="bibr">19</xref>] introduces a set of data augmentation techniques designed specifically for EEG and extends the SimCLR [<xref rid="ref020" ref-type="bibr">20</xref>] framework to extract channel-level features from EEG data.</p>
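          <p>The view-generation step can be sketched as follows for a single EEG channel. This is a hypothetical illustration, not SeqCLR's implementation: the transformations (amplitude scaling, time shift, DC shift, zero-masking, and additive Gaussian noise) are of the kind used for channel-level EEG augmentation, but the exact set, the parameter ranges, and the choice of applying two distinct random transformations per view are assumptions, and all function names are invented for this example.</p>

```python
import numpy as np

rng = np.random.default_rng(0)


def amplitude_scale(x, low=0.5, high=2.0):
    """Multiply the whole channel by a random gain."""
    return x * rng.uniform(low, high)


def time_shift(x, max_shift=50):
    """Shift the channel by up to max_shift samples (circular, for simplicity)."""
    return np.roll(x, rng.integers(-max_shift, max_shift + 1))


def dc_shift(x, max_offset=10.0):
    """Add a random constant offset."""
    return x + rng.uniform(-max_offset, max_offset)


def zero_mask(x, max_len=100):
    """Set a random contiguous segment of up to max_len samples to zero."""
    x = x.copy()
    length = rng.integers(0, max_len + 1)
    start = rng.integers(0, len(x) - length + 1)
    x[start:start + length] = 0.0
    return x


def gaussian_noise(x, sigma=0.2):
    """Add zero-mean Gaussian noise."""
    return x + rng.normal(0.0, sigma, size=x.shape)


AUGMENTATIONS = [amplitude_scale, time_shift, dc_shift, zero_mask, gaussian_noise]


def random_view(x, n_ops=2):
    """Apply n_ops distinct, randomly chosen transformations to one EEG channel."""
    for idx in rng.choice(len(AUGMENTATIONS), size=n_ops, replace=False):
        x = AUGMENTATIONS[idx](x)
    return x


# Two views of the same single-channel signal form a positive pair.
t = np.arange(0, 10, 1 / 100)  # 10 s at a hypothetical 100 Hz
channel = np.sin(2 * np.pi * 10 * t)
view_a, view_b = random_view(channel), random_view(channel)
```

          <p>The two views of a channel then serve as a positive pair for a SimCLR-style loss, while views of other channels or segments in the batch act as negatives.</p>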
          <p id="S3.SS1.SSS1.p2">TS-TCC [<xref rid="ref021" ref-type="bibr">21</xref>] generates different views of the input data using both weak and strong augmentations. Weak augmentation applies jittering and scaling, whereas strong augmentation applies permutation and jittering; the resulting views are fed to a temporal contrasting module that learns temporal representations of the EEG signals. The method then maximizes the similarity between contexts of the same sample while minimizing the similarity between contexts of different samples. Jiang et al. [<xref rid="ref022" ref-type="bibr">22</xref>] apply transformations such as horizontal flipping and additive Gaussian noise to EEG signals, and then learn the correlation between signals by measuring the feature similarity of the transformed signal pairs. The authors also examine how combinations of transformations affect the representation capability of the network, in order to identify the best combination for downstream tasks. mulEEG [<xref rid="ref023" ref-type="bibr">23</xref>] proposes a novel multi-view self-supervised method: by designing EEG augmentation strategies and introducing a diversity loss, it exploits complementary information from multiple views to learn better representations. However, these augmentation-based methods often suffer from sampling bias [<xref rid="ref024" ref-type="bibr">24</xref>], and for noisy EEG data this bias can significantly degrade performance [<xref rid="ref025" ref-type="bibr">25</xref>]. To address this limitation, ContraWR [<xref rid="ref026" ref-type="bibr">26</xref>] constructs positive pairs through data augmentation but uses a global average representation, rather than individual samples, as the negative, thereby learning robust EEG representations without labels. In addition, ContraWR assigns greater weight to samples closer to the anchor when computing this global average.</p>
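          <p>The idea of contrasting against a single global average representation, as described for ContraWR, can be sketched as below. This is only an illustration under stated assumptions: cosine similarity, a hinge (margin) comparison between the positive and the averaged negative, and a softmax rule for weighting closer samples more heavily are all choices made for this sketch, and may differ from the loss actually used in the original work; <monospace>world_representation</monospace> and <monospace>world_contrast_loss</monospace> are hypothetical names.</p>

```python
import numpy as np


def l2_normalize(z):
    return z / np.linalg.norm(z, axis=-1, keepdims=True)


def world_representation(anchor, batch, weight_temp=None):
    """Global average of the batch representations, used as the negative.

    weight_temp=None gives a uniform average. A finite weight_temp gives a
    softmax-weighted average in which samples more similar to the anchor get
    larger weights (this weighting rule is an assumption of the sketch).
    """
    batch = l2_normalize(batch)
    if weight_temp is None:
        weights = np.full(len(batch), 1.0 / len(batch))
    else:
        sims = batch @ l2_normalize(anchor) / weight_temp
        weights = np.exp(sims - sims.max())
        weights /= weights.sum()
    return weights @ batch


def world_contrast_loss(anchor, positive, batch, margin=0.5, weight_temp=None):
    """Hinge loss asking the augmented positive to be more similar to the
    anchor than the world representation is, by at least `margin`."""
    a = l2_normalize(anchor)
    s_pos = a @ l2_normalize(positive)
    s_world = a @ l2_normalize(world_representation(anchor, batch, weight_temp))
    return max(0.0, margin + s_world - s_pos)
```

          <p>Because only one averaged negative is compared per anchor, a same-class sample that happens to fall in the batch contributes only a small fraction of the negative signal, which is one way to read the motivation for replacing individual negatives.</p>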
          <p id="S3.SS1.SSS1.p3">Existing contrastive learning methods mostly operate at a single data level and therefore fail to fully exploit the complexity of EEG signals. COMET [<xref rid="ref027" ref-type="bibr">27</xref>] addresses this by leveraging all data levels of medical time series, namely the patient, trial, sample, and observation levels, in a hierarchical contrastive representation learning framework. By exploiting this hierarchical structure, COMET captures the intrinsic relationships within the data more comprehensively.</p>
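          <p>One level of such a hierarchy, patient-level contrast, can be sketched by treating samples that share a patient ID as positives. The multi-positive log-softmax form, the cosine similarity, and the names <monospace>same_group_mask</monospace> and <monospace>group_contrastive_loss</monospace> are assumptions of this sketch; it does not reproduce how COMET defines or combines its losses across the four levels.</p>

```python
import numpy as np


def same_group_mask(group_ids):
    """Boolean (n, n) mask: True where two different samples share a group ID."""
    ids = np.asarray(group_ids)
    mask = ids[:, None] == ids[None, :]
    np.fill_diagonal(mask, False)
    return mask


def group_contrastive_loss(z, group_ids, tau=0.1):
    """Multi-positive contrastive loss: samples from the same group (e.g. the
    same patient) are positives, all other samples in the batch are negatives."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    logits = z @ z.T / tau
    np.fill_diagonal(logits, -np.inf)  # exclude self-similarity
    # row-wise log-softmax over all other samples
    row_max = logits.max(axis=1, keepdims=True)
    log_prob = logits - row_max - np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    pos = same_group_mask(group_ids)
    has_pos = pos.any(axis=1)
    # mean log-probability of the positives, for anchors that have at least one
    per_anchor = np.where(pos, log_prob, 0.0).sum(axis=1)[has_pos] / pos.sum(axis=1)[has_pos]
    return -per_anchor.mean()
```

          <p>Passing trial IDs instead of patient IDs gives the analogous trial-level term; how the levels are weighted against each other is left open here.</p>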
        </sec>
        <sec id="S3.SS1.SSS2">
          <label>3.1.2</label>
          <title>Combined with Expert Knowledge</title>
          <p id="S3.SS1.SSS2.p1">Contrastive learning combined with expert knowledge is a relatively new representation learning framework. It incorporates prior knowledge or information from experts into deep neural networks to guide model training; within a contrastive learning framework, such prior knowledge helps the model select appropriate positive and negative samples during training. SleepPriorCL [<xref rid="ref028" ref-type="bibr">28</xref>] was proposed to mitigate the sampling bias of data augmentation-based contrastive learning. Each sleep stage is characterized by activity in particular frequency bands, so the authors computed the energy in these bands and used it as prior knowledge for training. Specifically, for each EEG segment <inline-formula><mml:math alttext="x" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> they computed the rhythm energy vector <inline-formula><mml:math alttext="E=[E(\delta),E(\theta),E(\alpha),E(\beta)]" display="inline"><mml:mrow><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:mrow><mml:mi>E</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>δ</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mi>E</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>θ</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mi>E</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>α</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mi>E</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>β</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">]</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>, referred to as the prior feature, and then defined the dissimilarity <inline-formula><mml:math alttext="d_{i,j}" display="inline"><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> between the anchor <inline-formula><mml:math alttext="x_{i}" display="inline"><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> and a sample <inline-formula><mml:math alttext="x_{j}" display="inline"><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:math></inline-formula> as follows:</p>
          <p>
            <disp-formula id="S3.E2">
              <mml:math alttext="d_{i,j}=\log\left(\left\|E_{i}-E_{j}\right\|_{2}\right)" display="block">
                <mml:mrow>
                  <mml:msub>
                    <mml:mi>d</mml:mi>
                    <mml:mrow>
                      <mml:mi>i</mml:mi>
                      <mml:mo>,</mml:mo>
                      <mml:mi>j</mml:mi>
                    </mml:mrow>
                  </mml:msub>
                  <mml:mo>=</mml:mo>
                  <mml:mrow>
                    <mml:mi>log</mml:mi>
                    <mml:mo>⁡</mml:mo>
                    <mml:mrow>
                      <mml:mo>(</mml:mo>
                      <mml:msub>
                        <mml:mrow>
                          <mml:mo>‖</mml:mo>
                          <mml:mrow>
                            <mml:msub>
                              <mml:mi>E</mml:mi>
                              <mml:mi>i</mml:mi>
                            </mml:msub>
                            <mml:mo>−</mml:mo>
                            <mml:msub>
                              <mml:mi>E</mml:mi>
                              <mml:mi>j</mml:mi>
                            </mml:msub>
                          </mml:mrow>
                          <mml:mo>‖</mml:mo>
                        </mml:mrow>
                        <mml:mn>2</mml:mn>
                      </mml:msub>
                      <mml:mo>)</mml:mo>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </p>
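          <p>The prior feature and the dissimilarity above can be computed as in the following sketch. It assumes the rhythm energy is the summed FFT power within conventional band edges (delta 0.5 to 4 Hz, theta 4 to 8 Hz, alpha 8 to 13 Hz, beta 13 to 30 Hz); the band edges, the plain periodogram, the sampling rate, and the names <monospace>rhythm_energy</monospace> and <monospace>prior_dissimilarity</monospace> are assumptions rather than details confirmed for SleepPriorCL.</p>

```python
import numpy as np

# Conventional band edges in Hz; the exact edges used by SleepPriorCL are an assumption.
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}


def rhythm_energy(x, fs):
    """Prior feature E = [E(delta), E(theta), E(alpha), E(beta)] of a 1-D epoch x sampled at fs Hz."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    energies = []
    for lo, hi in BANDS.values():
        in_band = np.logical_and(freqs >= lo, hi > freqs)  # half-open band [lo, hi)
        energies.append(power[in_band].sum())
    return np.array(energies)


def prior_dissimilarity(E):
    """d_ij = log(||E_i - E_j||_2) for every pair of rows of E, shape (n_epochs, 4)."""
    dist = np.linalg.norm(E[:, None, :] - E[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):
        return np.log(dist)  # the diagonal (i == j) is -inf because the distance is 0


fs = 100  # hypothetical sampling rate in Hz
t = np.arange(30 * fs) / fs  # one 30-s sleep epoch
epochs = [np.sin(2 * np.pi * 2 * t), np.sin(2 * np.pi * 10 * t), np.sin(2 * np.pi * 2.5 * t)]
E = np.stack([rhythm_energy(x, fs) for x in epochs])
d = prior_dissimilarity(E)
```

          <p>Here the two delta-dominated epochs end up far less dissimilar to each other than either is to the alpha-dominated one, which is the property the ranking of candidate positives relies on.</p>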
          <p id="S3.SS1.SSS2.p2">For each anchor, the other samples are ranked by this dissimilarity: the K samples with the smallest dissimilarity are selected as positives and the rest as negatives. SleepPriorCL further adjusts the strength of the gradient penalty on each sample according to the confidence that it is a positive or negative sample, which it achieves by assigning each sample its own temperature. The multi-positive contrastive loss is thus modified as follows:</p>
          <p>
            <disp-formula id="S3.E3">
              <mml:math alttext="\mathcal{L}\left(x_{i}\right)=\frac{-1}{\left|P(i)\right|}\sum_{p\in P(i)}\log\frac{\exp\left(s_{i,p}/\tau_{p}\right)}{\exp\left(s_{i,p}/\tau_{p}\right)+\sum_{n\in N(i)}\exp\left(s_{i,n}/\tau_{n}\right)}" display="block">
                <mml:mrow>
                  <mml:mrow>
                    <mml:mi class="ltx_font_mathcaligraphic">ℒ</mml:mi>
                    <mml:mo>⁢</mml:mo>
                    <mml:mrow>
                      <mml:mo>(</mml:mo>
                      <mml:msub>
                        <mml:mi>x</mml:mi>
                        <mml:mi>i</mml:mi>
                      </mml:msub>
                      <mml:mo>)</mml:mo>
                    </mml:mrow>
                  </mml:mrow>
                  <mml:mo>=</mml:mo>
                  <mml:mrow>
                    <mml:mfrac>
                      <mml:mrow>
                        <mml:mo>−</mml:mo>
                        <mml:mn>1</mml:mn>
                      </mml:mrow>
                      <mml:mrow>
                        <mml:mo>|</mml:mo>
                        <mml:mrow>
                          <mml:mi>P</mml:mi>
                          <mml:mo>⁢</mml:mo>
                          <mml:mrow>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mi>i</mml:mi>
                            <mml:mo stretchy="false">)</mml:mo>
                          </mml:mrow>
                        </mml:mrow>
                        <mml:mo>|</mml:mo>
                      </mml:mrow>
                    </mml:mfrac>
                    <mml:mo>⁢</mml:mo>
                    <mml:mrow>
                      <mml:msub>
                        <mml:mo>∑</mml:mo>
                        <mml:mrow>
                          <mml:mi>p</mml:mi>
                          <mml:mo>∈</mml:mo>
                          <mml:mrow>
                            <mml:mi>P</mml:mi>
                            <mml:mo>⁢</mml:mo>
                            <mml:mrow>
                              <mml:mo stretchy="false">(</mml:mo>
                              <mml:mi>i</mml:mi>
                              <mml:mo stretchy="false">)</mml:mo>
                            </mml:mrow>
                          </mml:mrow>
                        </mml:mrow>
                      </mml:msub>
                      <mml:mrow>
                        <mml:mi>log</mml:mi>
                        <mml:mo lspace="0.167em">⁡</mml:mo>
                        <mml:mfrac>
                          <mml:mrow>
                            <mml:mi>exp</mml:mi>
                            <mml:mo>⁡</mml:mo>
                            <mml:mrow>
                              <mml:mo>(</mml:mo>
                              <mml:mrow>
                                <mml:msub>
                                  <mml:mi>s</mml:mi>
                                  <mml:mrow>
                                    <mml:mi>i</mml:mi>
                                    <mml:mo>,</mml:mo>
                                    <mml:mi>p</mml:mi>
                                  </mml:mrow>
                                </mml:msub>
                                <mml:mo>/</mml:mo>
                                <mml:msub>
                                  <mml:mi>τ</mml:mi>
                                  <mml:mi>p</mml:mi>
                                </mml:msub>
                              </mml:mrow>
                              <mml:mo>)</mml:mo>
                            </mml:mrow>
                          </mml:mrow>
                          <mml:mrow>
                            <mml:mrow>
                              <mml:mi>exp</mml:mi>
                              <mml:mo>⁡</mml:mo>
                              <mml:mrow>
                                <mml:mo>(</mml:mo>
                                <mml:mrow>
                                  <mml:msub>
                                    <mml:mi>s</mml:mi>
                                    <mml:mrow>
                                      <mml:mi>i</mml:mi>
                                      <mml:mo>,</mml:mo>
                                      <mml:mi>p</mml:mi>
                                    </mml:mrow>
                                  </mml:msub>
                                  <mml:mo>/</mml:mo>
                                  <mml:msub>
                                    <mml:mi>τ</mml:mi>
                                    <mml:mi>p</mml:mi>
                                  </mml:msub>
                                </mml:mrow>
                                <mml:mo>)</mml:mo>
                              </mml:mrow>
                            </mml:mrow>
                            <mml:mo>+</mml:mo>
                            <mml:mrow>
                              <mml:mstyle displaystyle="false">
                                <mml:msub>
                                  <mml:mo>∑</mml:mo>
                                  <mml:mrow>
                                    <mml:mi>n</mml:mi>
                                    <mml:mo>∈</mml:mo>
                                    <mml:mrow>
                                      <mml:mi>N</mml:mi>
                                      <mml:mo>⁢</mml:mo>
                                      <mml:mrow>
                                        <mml:mo stretchy="false">(</mml:mo>
                                        <mml:mi>i</mml:mi>
                                        <mml:mo stretchy="false">)</mml:mo>
                                      </mml:mrow>
                                    </mml:mrow>
                                  </mml:mrow>
                                </mml:msub>
                              </mml:mstyle>
                              <mml:mrow>
                                <mml:mi>exp</mml:mi>
                                <mml:mo>⁡</mml:mo>
                                <mml:mrow>
                                  <mml:mo>(</mml:mo>
                                  <mml:mrow>
                                    <mml:msub>
                                      <mml:mi>s</mml:mi>
                                      <mml:mrow>
                                        <mml:mi>i</mml:mi>
                                        <mml:mo>,</mml:mo>
                                        <mml:mi>n</mml:mi>
                                      </mml:mrow>
                                    </mml:msub>
                                    <mml:mo>/</mml:mo>
                                    <mml:msub>
                                      <mml:mi>τ</mml:mi>
                                      <mml:mi>n</mml:mi>
                                    </mml:msub>
                                  </mml:mrow>
                                  <mml:mo>)</mml:mo>
                                </mml:mrow>
                              </mml:mrow>
                            </mml:mrow>
                          </mml:mrow>
                        </mml:mfrac>
                      </mml:mrow>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </p>
          <p id="S3.SS1.SSS2.p4">where <inline-formula><mml:math alttext="x_{i}" display="inline"><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> is the anchor sleep epoch, <inline-formula><mml:math alttext="s_{i,j}" display="inline"><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the cosine similarity between <inline-formula><mml:math alttext="z_{i}" display="inline"><mml:msub><mml:mi>z</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math alttext="z_{j}" display="inline"><mml:msub><mml:mi>z</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:math></inline-formula>, and <inline-formula><mml:math alttext="z_{i}" display="inline"><mml:msub><mml:mi>z</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math alttext="z_{j}" display="inline"><mml:msub><mml:mi>z</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:math></inline-formula> are the vectors obtained by encoding and projecting <inline-formula><mml:math alttext="x_{i}" display="inline"><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math alttext="x_{j}" display="inline"><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:math></inline-formula>, respectively. The index <inline-formula><mml:math alttext="i" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula> denotes the anchor, <inline-formula><mml:math alttext="p" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula> a positive sample, and <inline-formula><mml:math alttext="n" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> a negative sample. <inline-formula><mml:math alttext="P(i)" display="inline"><mml:mrow><mml:mi>P</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> is the set of positive samples of <inline-formula><mml:math alttext="x_{i}" display="inline"><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> in the batch, <inline-formula><mml:math alttext="N(i)" display="inline"><mml:mrow><mml:mi>N</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> is the set of all negative samples in the batch, and <inline-formula><mml:math alttext="\tau_{p}" display="inline"><mml:msub><mml:mi>τ</mml:mi><mml:mi>p</mml:mi></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math alttext="\tau_{n}" display="inline"><mml:msub><mml:mi>τ</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:math></inline-formula> are the temperatures assigned to the positive sample <inline-formula><mml:math alttext="p" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula> and the negative sample <inline-formula><mml:math alttext="n" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>, respectively.</p>
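          <p>A sketch of the modified multi-positive loss, together with the top-K positive selection from the prior dissimilarities, is given below. It follows the formula term by term (each positive's denominator contains its own term plus the sum over all negatives), but two points are assumptions: the per-sample temperatures are supplied as an input array, since the rule that maps confidence to temperature is not reproduced here, and the per-anchor losses are averaged over the batch. The names <monospace>select_positives</monospace> and <monospace>prior_contrastive_loss</monospace> are hypothetical.</p>

```python
import numpy as np


def select_positives(d, k):
    """Boolean (n, n) mask marking, for each anchor i, the k samples j != i with the smallest d_ij."""
    d = np.array(d, dtype=float)
    np.fill_diagonal(d, np.inf)  # an anchor is never its own positive
    nearest = np.argsort(d, axis=1)[:, :k]
    pos = np.zeros(d.shape, dtype=bool)
    pos[np.arange(len(d))[:, None], nearest] = True
    return pos


def prior_contrastive_loss(z, d, k, tau):
    """Modified multi-positive loss, averaged over all anchors in the batch.

    z:   (n, dim) projected representations z_i
    d:   (n, n) prior dissimilarities d_ij
    k:   number of positives per anchor (the K in the text)
    tau: (n,) per-sample temperatures; how they are derived from
         sample confidence is not modelled here
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    logits = (z @ z.T) / tau[None, :]  # s_ij / tau_j
    pos = select_positives(d, k)
    neg = ~pos
    np.fill_diagonal(neg, False)  # the anchor is neither positive nor negative
    total = 0.0
    for i in range(len(z)):
        neg_term = np.exp(logits[i][neg[i]]).sum()  # sum over n in N(i)
        pos_logits = logits[i][pos[i]]  # s_ip / tau_p for p in P(i)
        total += -np.mean(pos_logits - np.log(np.exp(pos_logits) + neg_term))
    return total / len(z)
```

          <p>A smaller temperature for a sample sharpens its contribution, so assigning larger temperatures to low-confidence positives or negatives is one way to soften the penalty they impose.</p>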
          <p id="S3.SS1.SSS2.p5">KDC2 [<xref rid="ref029" ref-type="bibr">29</xref>] builds on the neural theory of EEG generation: synchronized synaptic activity excites neurons and produces a negative extracellular voltage that turns the neurons into dipoles, and the voltage generated by these dipoles is transmitted to the scalp through capacitive and volume conduction, where electrodes record it as EEG signals. Accordingly, the authors constructed a scalp view and a neural view to describe the external and internal information of brain activity, respectively, and designed a knowledge-driven cross-view contrastive loss that extracts neural knowledge by contrasting the same augmented samples across the two views. Positive pairs consist of representations of the same augmented sample in different views, and negative pairs consist of representations of different augmented samples in different views. By minimizing the distance between positive pairs and maximizing the distance between negative pairs, the model learns complementary features that describe the internal and external manifestations of brain activity. The cross-view contrastive loss is computed as follows:</p>
          <p>
            <disp-formula id="S3.E4">
              <mml:math alttext="\mathcal{L}_{cross}=-\frac{1}{|\mathcal{B}|}\log\left(\frac{pair^{+}}{pair^{+}+pair^{-}}\right)" display="block">
                <mml:mrow>
                  <mml:msub>
                    <mml:mi class="ltx_font_mathcaligraphic">ℒ</mml:mi>
                    <mml:mtext>cross</mml:mtext>
                  </mml:msub>
                  <mml:mo>=</mml:mo>
                  <mml:mrow>
                    <mml:mo>−</mml:mo>
                    <mml:mrow>
                      <mml:mfrac>
                        <mml:mn>1</mml:mn>
                        <mml:mrow>
                          <mml:mo stretchy="false">|</mml:mo>
                          <mml:mi class="ltx_font_mathcaligraphic">ℬ</mml:mi>
                          <mml:mo stretchy="false">|</mml:mo>
                        </mml:mrow>
                      </mml:mfrac>
                      <mml:mo>⁢</mml:mo>
                      <mml:mi>log</mml:mi>
                      <mml:mo>⁡</mml:mo>
                      <mml:mrow>
                        <mml:mo>(</mml:mo>
                        <mml:mfrac>
                          <mml:msup>
                            <mml:mtext>pair</mml:mtext>
                            <mml:mo>+</mml:mo>
                          </mml:msup>
                          <mml:mrow>
                            <mml:msup>
                              <mml:mtext>pair</mml:mtext>
                              <mml:mo>+</mml:mo>
                            </mml:msup>
                            <mml:mo>+</mml:mo>
                            <mml:msup>
                              <mml:mtext>pair</mml:mtext>
                              <mml:mo>−</mml:mo>
                            </mml:msup>
                          </mml:mrow>
                        </mml:mfrac>
                        <mml:mo>)</mml:mo>
                      </mml:mrow>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </p>
          <p>
            <disp-formula id="S3.E5">
              <mml:math alttext="pair^{+}=\sum_{b\in\mathcal{B}}\sum_{i=0}^{m}\exp\left(s\left(r_{sa,b}^{i},r_{ta,b}^{i}\right)/\tau\right)" display="block">
                <mml:mrow>
                  <mml:msup>
                    <mml:mtext>pair</mml:mtext>
                    <mml:mo>+</mml:mo>
                  </mml:msup>
                  <mml:mo rspace="0.111em">=</mml:mo>
                  <mml:mrow>
                    <mml:munder>
                      <mml:mo movablelimits="false" rspace="0em">∑</mml:mo>
                      <mml:mrow>
                        <mml:mi>b</mml:mi>
                        <mml:mo>∈</mml:mo>
                        <mml:mi class="ltx_font_mathcaligraphic">ℬ</mml:mi>
                      </mml:mrow>
                    </mml:munder>
                    <mml:mrow>
                      <mml:munderover>
                        <mml:mo movablelimits="false">∑</mml:mo>
                        <mml:mrow>
                          <mml:mi>i</mml:mi>
                          <mml:mo>=</mml:mo>
                          <mml:mn>0</mml:mn>
                        </mml:mrow>
                        <mml:mi>m</mml:mi>
                      </mml:munderover>
                      <mml:mrow>
                        <mml:mi>exp</mml:mi>
                        <mml:mo>⁡</mml:mo>
                        <mml:mrow>
                          <mml:mo stretchy="false">(</mml:mo>
                          <mml:mrow>
                            <mml:mrow>
                              <mml:mi>s</mml:mi>
                              <mml:mo>⁢</mml:mo>
                              <mml:mrow>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:msubsup>
                                  <mml:mi>r</mml:mi>
                                  <mml:mrow>
                                    <mml:mrow>
                                      <mml:mi>s</mml:mi>
                                      <mml:mo>⁢</mml:mo>
                                      <mml:mi>a</mml:mi>
                                    </mml:mrow>
                                    <mml:mo>,</mml:mo>
                                    <mml:mi>b</mml:mi>
                                  </mml:mrow>
                                  <mml:mi>i</mml:mi>
                                </mml:msubsup>
                                <mml:mo>,</mml:mo>
                                <mml:msubsup>
                                  <mml:mi>r</mml:mi>
                                  <mml:mrow>
                                    <mml:mrow>
                                      <mml:mi>t</mml:mi>
                                      <mml:mo>⁢</mml:mo>
                                      <mml:mi>a</mml:mi>
                                    </mml:mrow>
                                    <mml:mo>,</mml:mo>
                                    <mml:mi>b</mml:mi>
                                  </mml:mrow>
                                  <mml:mi>i</mml:mi>
                                </mml:msubsup>
                                <mml:mo stretchy="false">)</mml:mo>
                              </mml:mrow>
                            </mml:mrow>
                            <mml:mo>/</mml:mo>
                            <mml:mi>τ</mml:mi>
                          </mml:mrow>
                          <mml:mo stretchy="false">)</mml:mo>
                        </mml:mrow>
                      </mml:mrow>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </p>
          <p>
            <disp-formula id="S3.E6">
              <mml:math alttext="pair^{-}=\sum_{b\in\mathcal{B}}\sum_{i=0}^{m}\sum_{j=i+1}^{m}exp(s(r_{sa,b}^{i},r_{ta,b}^{j})/\tau)" display="block">
                <mml:mrow>
                  <mml:mrow>
                    <mml:mi>p</mml:mi>
                    <mml:mo>⁢</mml:mo>
                    <mml:mi>a</mml:mi>
                    <mml:mo>⁢</mml:mo>
                    <mml:mi>i</mml:mi>
                    <mml:mo>⁢</mml:mo>
                    <mml:msup>
                      <mml:mi>r</mml:mi>
                      <mml:mo>−</mml:mo>
                    </mml:msup>
                  </mml:mrow>
                  <mml:mo rspace="0.111em">=</mml:mo>
                  <mml:mrow>
                    <mml:munder>
                      <mml:mo movablelimits="false" rspace="0em">∑</mml:mo>
                      <mml:mrow>
                        <mml:mi>b</mml:mi>
                        <mml:mo>∈</mml:mo>
                        <mml:mi class="ltx_font_mathcaligraphic">ℬ</mml:mi>
                      </mml:mrow>
                    </mml:munder>
                    <mml:mrow>
                      <mml:munderover>
                        <mml:mo movablelimits="false" rspace="0em">∑</mml:mo>
                        <mml:mrow>
                          <mml:mi>i</mml:mi>
                          <mml:mo>=</mml:mo>
                          <mml:mn>0</mml:mn>
                        </mml:mrow>
                        <mml:mi>m</mml:mi>
                      </mml:munderover>
                      <mml:mrow>
                        <mml:munderover>
                          <mml:mo movablelimits="false">∑</mml:mo>
                          <mml:mrow>
                            <mml:mi>j</mml:mi>
                            <mml:mo>=</mml:mo>
                            <mml:mrow>
                              <mml:mi>i</mml:mi>
                              <mml:mo>+</mml:mo>
                              <mml:mn>1</mml:mn>
                            </mml:mrow>
                          </mml:mrow>
                          <mml:mi>m</mml:mi>
                        </mml:munderover>
                        <mml:mrow>
                          <mml:mi>e</mml:mi>
                          <mml:mo>⁢</mml:mo>
                          <mml:mi>x</mml:mi>
                          <mml:mo>⁢</mml:mo>
                          <mml:mi>p</mml:mi>
                          <mml:mo>⁢</mml:mo>
                          <mml:mrow>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mrow>
                              <mml:mrow>
                                <mml:mi>s</mml:mi>
                                <mml:mo>⁢</mml:mo>
                                <mml:mrow>
                                  <mml:mo stretchy="false">(</mml:mo>
                                  <mml:msubsup>
                                    <mml:mi>r</mml:mi>
                                    <mml:mrow>
                                      <mml:mrow>
                                        <mml:mi>s</mml:mi>
                                        <mml:mo>⁢</mml:mo>
                                        <mml:mi>a</mml:mi>
                                      </mml:mrow>
                                      <mml:mo>,</mml:mo>
                                      <mml:mi>b</mml:mi>
                                    </mml:mrow>
                                    <mml:mi>i</mml:mi>
                                  </mml:msubsup>
                                  <mml:mo>,</mml:mo>
                                  <mml:msubsup>
                                    <mml:mi>r</mml:mi>
                                    <mml:mrow>
                                      <mml:mrow>
                                        <mml:mi>t</mml:mi>
                                        <mml:mo>⁢</mml:mo>
                                        <mml:mi>a</mml:mi>
                                      </mml:mrow>
                                      <mml:mo>,</mml:mo>
                                      <mml:mi>b</mml:mi>
                                    </mml:mrow>
                                    <mml:mi>j</mml:mi>
                                  </mml:msubsup>
                                  <mml:mo stretchy="false">)</mml:mo>
                                </mml:mrow>
                              </mml:mrow>
                              <mml:mo>/</mml:mo>
                              <mml:mi>τ</mml:mi>
                            </mml:mrow>
                            <mml:mo stretchy="false">)</mml:mo>
                          </mml:mrow>
                        </mml:mrow>
                      </mml:mrow>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </p>
          <p id="S3.SS1.SSS2.p7">where <inline-formula><mml:math alttext="pair^{+}" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>⁢</mml:mo><mml:mi>a</mml:mi><mml:mo>⁢</mml:mo><mml:mi>i</mml:mi><mml:mo>⁢</mml:mo><mml:msup><mml:mi>r</mml:mi><mml:mo>+</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math alttext="pair^{-}" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>⁢</mml:mo><mml:mi>a</mml:mi><mml:mo>⁢</mml:mo><mml:mi>i</mml:mi><mml:mo>⁢</mml:mo><mml:msup><mml:mi>r</mml:mi><mml:mo>−</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> denote the sums of exponentiated similarities over the cross-view positive and negative pairs, respectively, <inline-formula><mml:math alttext="\mathcal{B}" display="inline"><mml:mi class="ltx_font_mathcaligraphic">ℬ</mml:mi></mml:math></inline-formula> is the sample batch, and <inline-formula><mml:math alttext="\tau" display="inline"><mml:mi>τ</mml:mi></mml:math></inline-formula> is the temperature parameter. The function <inline-formula><mml:math alttext="s(\cdot)" display="inline"><mml:mrow><mml:mi>s</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mo lspace="0em" rspace="0em">⋅</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> denotes the cosine similarity. The representation generated from the scalp view is denoted as <inline-formula><mml:math alttext="r_{s}" display="inline"><mml:msub><mml:mi>r</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:math></inline-formula>, and the representation generated from the inner neural topology view is denoted as <inline-formula><mml:math alttext="r_{t}" display="inline"><mml:msub><mml:mi>r</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:math></inline-formula>. 
<inline-formula><mml:math alttext="r_{sa}" display="inline"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mo>⁢</mml:mo><mml:mi>a</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math alttext="r_{ta}" display="inline"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>⁢</mml:mo><mml:mi>a</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denote the representations of the corresponding augmented samples, and <inline-formula><mml:math alttext="b" display="inline"><mml:mi>b</mml:mi></mml:math></inline-formula> indexes the samples in the batch. Positive pairs match the two views at the same index <inline-formula><mml:math alttext="i" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>, whereas negative pairs combine index <inline-formula><mml:math alttext="i" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula> of the scalp view with a later index <inline-formula><mml:math alttext="j&gt;i" display="inline"><mml:mrow><mml:mi>j</mml:mi><mml:mo>&gt;</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:math></inline-formula> of the topology view.</p>
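          <p>The two sums can be made concrete with a short NumPy sketch. It assumes, for illustration only, that the representations of a batch are stacked into arrays of shape (B, m+1, d), with the second axis holding the index i (or j); this array layout, the function names <monospace>cosine_sim</monospace> and <monospace>cross_view_pairs</monospace>, and the temperature value are hypothetical and not taken from the cited work.</p>
          <p>
            <code language="python">
```python
import numpy as np


def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity s(., .) along the last axis (broadcasts over the rest)."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return np.sum(a * b, axis=-1)


def cross_view_pairs(r_sa: np.ndarray, r_ta: np.ndarray, tau: float = 0.5):
    """Return (pair_pos, pair_neg) following the two sums above.

    r_sa, r_ta: arrays of shape (B, m + 1, d) holding the scalp-view and
    topology-view representations of the augmented samples (assumed layout);
    axis 1 holds the index i (or j), which runs from 0 to m.
    """
    # s(r_sa,b^i, r_ta,b^j) for every (b, i, j): shape (B, m + 1, m + 1)
    sim = cosine_sim(r_sa[:, :, None, :], r_ta[:, None, :, :])
    scores = np.exp(sim / tau)
    # Positive pairs: same index in both views (i == j), summed over b and i
    pair_pos = np.trace(scores, axis1=1, axis2=2).sum()
    # Negative pairs: j runs from i + 1 to m (strict upper triangle), summed over b
    upper = np.triu(np.ones(scores.shape[1:]), k=1)
    pair_neg = (scores * upper).sum()
    return float(pair_pos), float(pair_neg)


rng = np.random.default_rng(0)
r_sa = rng.normal(size=(4, 5, 16))  # toy batch: B = 4, m = 4, d = 16
r_ta = rng.normal(size=(4, 5, 16))
pair_pos, pair_neg = cross_view_pairs(r_sa, r_ta, tau=0.5)
print(pair_pos, pair_neg)
```
            </code>
          </p>
          <p>Taking the diagonal for the positive pairs and only the strict upper triangle for the negative pairs reproduces the summation limits of the two equations, so each index pair with j greater than i contributes exactly once to the negative sum.</p>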
        </sec>
      </sec>
      <sec id="S3.SS2">
        <label>3.2</label>
        <title>Masked Autoencoder Approaches</title>
        <p id="S3.SS2.p1">Masked language modeling is a widely adopted pre-training method in NLP. BERT [<xref rid="ref030" ref-type="bibr">30</xref>] masks a portion of the input tokens during pre-training and learns to predict the masked content, which yields effective representations for various downstream tasks. The masked autoencoder (MAE) framework generalizes this idea and can be formulated as:</p>
        <p>
          <disp-formula id="S3.E7">
            <mml:math alttext="x_{m}=\mathcal{M}(x),\quad z=E(x_{m}),\quad\tilde{x}=D(z)," display="block">
              <mml:mrow>
                <mml:mrow>
                  <mml:mrow>
                    <mml:msub>
                      <mml:mi>x</mml:mi>
                      <mml:mi>m</mml:mi>
                    </mml:msub>
                    <mml:mo>=</mml:mo>
                    <mml:mrow>
                      <mml:mi class="ltx_font_mathcaligraphic">ℳ</mml:mi>
                      <mml:mo>⁢</mml:mo>
                      <mml:mrow>
                        <mml:mo stretchy="false">(</mml:mo>
                        <mml:mi>x</mml:mi>
                        <mml:mo stretchy="false">)</mml:mo>
                      </mml:mrow>
                    </mml:mrow>
                  </mml:mrow>
                  <mml:mo rspace="1.167em">,</mml:mo>
                  <mml:mrow>
                    <mml:mrow>
                      <mml:mi>z</mml:mi>
                      <mml:mo>=</mml:mo>
                      <mml:mrow>
                        <mml:mi>E</mml:mi>
                        <mml:mo>⁢</mml:mo>
                        <mml:mrow>
                          <mml:mo stretchy="false">(</mml:mo>
                          <mml:msub>
                            <mml:mi>x</mml:mi>
                            <mml:mi>m</mml:mi>
                          </mml:msub>
                          <mml:mo stretchy="false">)</mml:mo>
                        </mml:mrow>
                      </mml:mrow>
                    </mml:mrow>
                    <mml:mo rspace="1.167em">,</mml:mo>
                    <mml:mrow>
                      <mml:mover accent="true">
                        <mml:mi>x</mml:mi>
                        <mml:mo>~</mml:mo>
                      </mml:mover>
                      <mml:mo>=</mml:mo>
                      <mml:mrow>
                        <mml:mi>D</mml:mi>
                        <mml:mo>⁢</mml:mo>
                        <mml:mrow>
                          <mml:mo stretchy="false">(</mml:mo>
                          <mml:mi>z</mml:mi>
                          <mml:mo stretchy="false">)</mml:mo>
                        </mml:mrow>
                      </mml:mrow>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
                <mml:mo>,</mml:mo>
              </mml:mrow>
            </mml:math>
          </disp-formula>
        </p>
        <p>
          <disp-formula id="S3.E8">
            <mml:math alttext="\mathcal{L}=\mathcal{M}(\|x-\tilde{x}\|_{2})" display="block">
              <mml:mrow>
                <mml:mi class="ltx_font_mathcaligraphic">ℒ</mml:mi>
                <mml:mo>=</mml:mo>
                <mml:mrow>
                  <mml:mi class="ltx_font_mathcaligraphic">ℳ</mml:mi>
                  <mml:mo>⁢</mml:mo>
                  <mml:mrow>
                    <mml:mo stretchy="false">(</mml:mo>
                    <mml:msub>
                      <mml:mrow>
                        <mml:mo stretchy="false">‖</mml:mo>
                        <mml:mrow>
                          <mml:mi>x</mml:mi>
                          <mml:mo>−</mml:mo>
                          <mml:mover accent="true">
                            <mml:mi>x</mml:mi>
                            <mml:mo>~</mml:mo>
                          </mml:mover>
                        </mml:mrow>
                        <mml:mo stretchy="false">‖</mml:mo>
                      </mml:mrow>
                      <mml:mn>2</mml:mn>
                    </mml:msub>
                    <mml:mo stretchy="false">)</mml:mo>
                  </mml:mrow>
                </mml:mrow>
              </mml:mrow>
            </mml:math>
          </disp-formula>
        </p>
        <p>where <inline-formula><mml:math alttext="\mathcal{M}(\cdot)" display="inline"><mml:mrow><mml:mi class="ltx_font_mathcaligraphic">ℳ</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mo lspace="0em" rspace="0em">⋅</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> denotes the masking operation, <inline-formula><mml:math alttext="x_{m}" display="inline"><mml:msub><mml:mi>x</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:math></inline-formula> is the masked input, <inline-formula><mml:math alttext="E(\cdot)" display="inline"><mml:mrow><mml:mi>E</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mo lspace="0em" rspace="0em">⋅</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math alttext="D(\cdot)" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mo lspace="0em" rspace="0em">⋅</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> are the encoder and decoder, respectively, and <inline-formula><mml:math alttext="\tilde{x}" display="inline"><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo>~</mml:mo></mml:mover></mml:math></inline-formula> is the reconstructed input. Applying <inline-formula><mml:math alttext="\mathcal{M}" display="inline"><mml:mi class="ltx_font_mathcaligraphic">ℳ</mml:mi></mml:math></inline-formula> to the reconstruction error indicates that the loss is evaluated on the masked positions.</p>
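        <p>The masking, encoding, decoding, and loss steps above can be sketched as a toy NumPy example. It rests on several assumptions: the masking operation zeroes a random 50% of time steps shared across channels, the encoder and decoder are untrained random linear maps standing in for real networks, and the loss is the L2 norm of the error at the masked positions. Image MAEs typically drop masked patches from the encoder input rather than zeroing them, so the zeroing here is a simplification; all names are hypothetical.</p>
        <p>
          <code language="python">
```python
import numpy as np


def random_mask(x: np.ndarray, mask_ratio: float, rng: np.random.Generator):
    """M(x): hide a random subset of time steps (shared by all channels).

    Returns the masked input x_m and a boolean mask (True = masked).
    """
    n_steps = x.shape[-1]
    n_masked = int(round(mask_ratio * n_steps))
    mask = np.zeros(n_steps, dtype=bool)
    mask[rng.choice(n_steps, size=n_masked, replace=False)] = True
    x_m = np.where(mask, 0.0, x)  # masked time steps are replaced by zeros
    return x_m, mask


def masked_l2_loss(x: np.ndarray, x_tilde: np.ndarray, mask: np.ndarray) -> float:
    """L2 norm of the reconstruction error, evaluated on masked positions only."""
    return float(np.linalg.norm((x - x_tilde)[..., mask]))


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 100))  # toy EEG: 4 channels x 100 time steps
x_m, mask = random_mask(x, mask_ratio=0.5, rng=rng)

# Hypothetical, untrained linear encoder E and decoder D (stand-ins for real networks)
W_enc = rng.normal(scale=0.1, size=(100, 32))
W_dec = rng.normal(scale=0.1, size=(32, 100))
z = x_m @ W_enc        # z = E(x_m)
x_tilde = z @ W_dec    # x~ = D(z)
loss = masked_l2_loss(x, x_tilde, mask)
print(f"masked steps: {mask.sum()}, loss: {loss:.3f}")
```
          </code>
        </p>
        <p>In a real model, the encoder and decoder weights would be trained to minimize this loss, so that the network must infer the hidden time steps from the visible context.</p>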
        <p id="S3.SS2.p2">Inspired by this, BENDR [<xref rid="ref031" ref-type="bibr">31</xref>] follows the wav2vec 2.0 [<xref rid="ref032" ref-type="bibr">32</xref>] architecture. It first encodes EEG data into a sequence of temporal embeddings using 1D convolutions and then randomly masks a subset of these embeddings. A transformer-based module [<xref rid="ref033" ref-type="bibr">33</xref>] extracts temporal correlations from the masked sequence and outputs reconstructed embeddings. The contrastive loss encourages the reconstructed embedding at each masked position to be as similar as possible to the original unmasked embedding at that position and as dissimilar as possible from distractor embeddings drawn from the same sequence. It is calculated as follows:</p>
        <p>
          <disp-formula id="S3.E9">
            <mml:math alttext="\mathcal{L}=-\log\frac{\exp(\mathrm{cossim}(c_{t},b_{t})/\kappa)}{\sum_{b_{i}\in B_{D}}\exp(\mathrm{cossim}(c_{t},b_{i})/\kappa)}" display="block">
              <mml:mrow>
                <mml:mi class="ltx_font_mathcaligraphic">ℒ</mml:mi>
                <mml:mo>=</mml:mo>
                <mml:mrow>
                  <mml:mo>−</mml:mo>
                  <mml:mrow>
                    <mml:mi>log</mml:mi>
                    <mml:mo>⁡</mml:mo>
                    <mml:mfrac>
                      <mml:mrow>
                        <mml:mi>exp</mml:mi>
                        <mml:mo>⁡</mml:mo>
                        <mml:mrow>
                          <mml:mo stretchy="false">(</mml:mo>
                          <mml:mrow>
                            <mml:mrow>
                              <mml:mi>cossim</mml:mi>
                              <mml:mo>⁡</mml:mo>
                              <mml:mrow>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:msub>
                                  <mml:mi>c</mml:mi>
                                  <mml:mi>t</mml:mi>
                                </mml:msub>
                                <mml:mo>,</mml:mo>
                                <mml:msub>
                                  <mml:mi>b</mml:mi>
                                  <mml:mi>t</mml:mi>
                                </mml:msub>
                                <mml:mo stretchy="false">)</mml:mo>
                              </mml:mrow>
                            </mml:mrow>
                            <mml:mo>/</mml:mo>
                            <mml:mi>κ</mml:mi>
                          </mml:mrow>
                          <mml:mo stretchy="false">)</mml:mo>
                        </mml:mrow>
                      </mml:mrow>
                      <mml:mrow>
                        <mml:msub>
                          <mml:mo>∑</mml:mo>
                          <mml:mrow>
                            <mml:msub>
                              <mml:mi>b</mml:mi>
                              <mml:mi>i</mml:mi>
                            </mml:msub>
                            <mml:mo>∈</mml:mo>
                            <mml:msub>
                              <mml:mi>B</mml:mi>
                              <mml:mi>D</mml:mi>
                            </mml:msub>
                          </mml:mrow>
                        </mml:msub>
                        <mml:mrow>
                          <mml:mi>exp</mml:mi>
                          <mml:mo>⁡</mml:mo>
                          <mml:mrow>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mrow>
                              <mml:mrow>
                                <mml:mi>cossim</mml:mi>
                                <mml:mo>⁡</mml:mo>
                                <mml:mrow>
                                  <mml:mo stretchy="false">(</mml:mo>
                                  <mml:msub>
                                    <mml:mi>c</mml:mi>
                                    <mml:mi>t</mml:mi>
                                  </mml:msub>
                                  <mml:mo>,</mml:mo>
                                  <mml:msub>
                                    <mml:mi>b</mml:mi>
                                    <mml:mi>i</mml:mi>
                                  </mml:msub>
                                  <mml:mo stretchy="false">)</mml:mo>
                                </mml:mrow>
                              </mml:mrow>
                              <mml:mo>/</mml:mo>
                              <mml:mi>κ</mml:mi>
                            </mml:mrow>
                            <mml:mo stretchy="false">)</mml:mo>
                          </mml:mrow>
                        </mml:mrow>
                      </mml:mrow>
                    </mml:mfrac>
                  </mml:mrow>
                </mml:mrow>
              </mml:mrow>
            </mml:math>
          </disp-formula>
        </p>
        <p id="S3.SS2.p3">where <inline-formula><mml:math alttext="c_{t}" display="inline"><mml:msub><mml:mi>c</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:math></inline-formula> is the output of the transformer module at position <inline-formula><mml:math alttext="t" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math alttext="b_{i}" display="inline"><mml:msub><mml:mi>b</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> is the original (unmasked) embedding at offset <inline-formula><mml:math alttext="i" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math alttext="B_{D}" display="inline"><mml:msub><mml:mi>B</mml:mi><mml:mi>D</mml:mi></mml:msub></mml:math></inline-formula> is the candidate set consisting of <inline-formula><mml:math alttext="b_{t}" display="inline"><mml:msub><mml:mi>b</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:math></inline-formula> together with 20 negative samples selected uniformly from the same sequence, <inline-formula><mml:math alttext="cossim" display="inline"><mml:mrow><mml:mi>c</mml:mi><mml:mo>⁢</mml:mo><mml:mi>o</mml:mi><mml:mo>⁢</mml:mo><mml:mi>s</mml:mi><mml:mo>⁢</mml:mo><mml:mi>s</mml:mi><mml:mo>⁢</mml:mo><mml:mi>i</mml:mi><mml:mo>⁢</mml:mo><mml:mi>m</mml:mi></mml:mrow></mml:math></inline-formula> denotes the cosine similarity, and <inline-formula><mml:math alttext="\kappa" display="inline"><mml:mi>κ</mml:mi></mml:math></inline-formula> is a temperature parameter that controls the sharpness of the contrastive distribution.</p>
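        <p>A minimal NumPy sketch of this loss for a single masked position follows. It places κ inside the exponential, dividing each cosine similarity before the softmax, as in wav2vec 2.0. The distractor sampling (uniform and without replacement over the other positions), the value κ = 0.1, and the function names are illustrative assumptions, not BENDR's released code.</p>
        <p>
          <code language="python">
```python
import numpy as np


def cossim(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def contrastive_loss_at(c, b, t, n_negatives=20, kappa=0.1, rng=None):
    """Contrastive loss for one masked position t (hypothetical helper).

    c: (T, d) transformer outputs; b: (T, d) original (unmasked) embeddings.
    B_D = {b_t} plus n_negatives distractors drawn uniformly, without
    replacement, from the other positions of the same sequence.
    """
    rng = np.random.default_rng() if rng is None else rng
    others = np.delete(np.arange(len(b)), t)
    neg_idx = rng.choice(others, size=n_negatives, replace=False)
    candidates = np.concatenate(([t], neg_idx))  # index 0 is the true target b_t
    logits = np.array([cossim(c[t], b[i]) for i in candidates]) / kappa
    # -log( exp(logit_target) / sum(exp(logits)) ), via a stable log-sum-exp
    top = logits.max()
    log_sum_exp = top + np.log(np.exp(logits - top).sum())
    return float(log_sum_exp - logits[0])


rng = np.random.default_rng(0)
b = rng.normal(size=(100, 64))            # toy original embeddings (T = 100)
c = b + 0.1 * rng.normal(size=b.shape)    # toy transformer outputs close to b
print(contrastive_loss_at(c, b, t=10, rng=rng))
```
          </code>
        </p>
        <p>Because the true target is part of the denominator, the loss is never negative, and it approaches zero as c_t becomes much more similar to b_t than to every distractor.</p>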
        <p id="S3.SS2.p4">MAEEG [<xref rid="ref034" ref-type="bibr">34</xref>] has a similar structure to BENDR but adds two layers that map the output of the transformer module back to the original EEG dimensions. The reconstruction loss is the cosine distance between the reconstructed EEG signal <inline-formula><mml:math alttext="\mathbf{\hat{x}}" display="inline"><mml:mover accent="true"><mml:mi>𝐱</mml:mi><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> and the input EEG signal <inline-formula><mml:math alttext="\mathbf{x}" display="inline"><mml:mi>𝐱</mml:mi></mml:math></inline-formula>, computed as <inline-formula><mml:math alttext="1-\frac{\mathbf{\hat{x}}\cdot\mathbf{x}}{\|\mathbf{\hat{x}}\|\|\mathbf{x}\|}" display="inline"><mml:mrow><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mfrac><mml:mrow><mml:mover accent="true"><mml:mi>𝐱</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mo lspace="0.222em" rspace="0.222em">⋅</mml:mo><mml:mi>𝐱</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">‖</mml:mo><mml:mover accent="true"><mml:mi>𝐱</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mo stretchy="false">‖</mml:mo></mml:mrow><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">‖</mml:mo><mml:mi>𝐱</mml:mi><mml:mo stretchy="false">‖</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow></mml:math></inline-formula>. The key difference between BENDR and MAEEG is therefore that MAEEG learns representations by minimizing a reconstruction loss rather than a contrastive loss.</p>
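        <p>The cosine reconstruction loss can be sketched in a few lines of NumPy. Flattening the multi-channel signal into a single vector before computing the similarity, and the small constant <monospace>eps</monospace> that guards against division by zero, are assumptions of this sketch; MAEEG may compute the similarity per channel or per sample instead.</p>
        <p>
          <code language="python">
```python
import numpy as np


def cosine_reconstruction_loss(x_hat: np.ndarray, x: np.ndarray, eps: float = 1e-8) -> float:
    """1 - (x_hat . x) / (||x_hat|| ||x||), computed on the flattened signals."""
    x_hat, x = np.ravel(x_hat), np.ravel(x)
    cos = np.dot(x_hat, x) / (np.linalg.norm(x_hat) * np.linalg.norm(x) + eps)
    return float(1.0 - cos)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 256))                  # toy 4-channel EEG segment
noisy = x + 0.5 * rng.normal(size=x.shape)     # imperfect reconstruction
print(cosine_reconstruction_loss(x, x))        # perfect reconstruction
print(cosine_reconstruction_loss(3.0 * x, x))  # same shape, different amplitude
print(cosine_reconstruction_loss(noisy, x))
```
          </code>
        </p>
        <p>Because the cosine distance is invariant to positive rescaling of the reconstruction, this loss rewards recovering the shape of the waveform rather than its absolute amplitude.</p>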
        <p id="S3.SS2.p5">Unlike the above two methods, which mask temporal embeddings, WAVELET2VEC [<xref rid="ref035" ref-type="bibr">35</xref>] performs masking and reconstruction in different frequency bands to capture time-frequency information. Specifically, the authors apply low-pass and high-pass filtering to the raw EEG signal, recursively compute the coefficients at each decomposition level, and thereby obtain wavelet coefficients in different frequency bands. They then design an encoder consisting of six parallel ViT [<xref rid="ref036" ref-type="bibr">36</xref>] units, each processing the wavelet coefficients of one frequency band. The coefficients of each band are flattened and divided into patches, and 10% of the input patches are randomly masked. The decoder reconstructs the missing patches, and self-supervised pre-training is performed by minimizing the Euclidean distance between the patch sequences of the original signal and the reconstructed patch sequences. By masking patches of the frequency-band sequences, this method forces the model to learn time-frequency information and the correlations within it.</p>
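        <p>The frequency-band masking pipeline can be illustrated with a NumPy-only sketch. It substitutes a Haar wavelet for whichever wavelet WAVELET2VEC actually uses, and chooses five decomposition levels because they yield six bands, one per encoder unit; the patch length of 16, the toy signal, and all function names are hypothetical. The ViT encoder and decoder are omitted, so a zero array stands in for the reconstructed patches.</p>
        <p>
          <code language="python">
```python
import numpy as np


def haar_decompose(x: np.ndarray, levels: int) -> list:
    """Multi-level Haar DWT along the last axis.

    Returns [cA_L, cD_L, ..., cD_1]: the final low-pass (approximation)
    band followed by the high-pass (detail) bands, coarsest first.
    """
    details = []
    approx = x
    for _ in range(levels):
        if approx.shape[-1] % 2:  # pad odd lengths by repeating the last sample
            approx = np.concatenate([approx, approx[..., -1:]], axis=-1)
        even, odd = approx[..., 0::2], approx[..., 1::2]
        details.append((even - odd) / np.sqrt(2.0))  # high-pass + downsample
        approx = (even + odd) / np.sqrt(2.0)         # low-pass + downsample
    return [approx] + details[::-1]


def patchify(band: np.ndarray, patch_len: int) -> np.ndarray:
    """Flatten a (channels, length) band and split it into equal-length patches."""
    flat = band.reshape(-1)
    n_patches = len(flat) // patch_len
    return flat[: n_patches * patch_len].reshape(n_patches, patch_len)


def mask_patches(patches: np.ndarray, ratio: float, rng: np.random.Generator):
    """Randomly mask a fraction of patches (at least one); True = masked."""
    n_masked = max(1, int(round(ratio * len(patches))))
    mask = np.zeros(len(patches), dtype=bool)
    mask[rng.choice(len(patches), size=n_masked, replace=False)] = True
    return patches[~mask], mask


def masked_patch_distance(original: np.ndarray, recon: np.ndarray, mask: np.ndarray) -> float:
    """Euclidean distance between original and reconstructed masked patches."""
    return float(np.linalg.norm(original[mask] - recon[mask]))


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1024))          # toy 4-channel EEG segment
bands = haar_decompose(x, levels=5)     # 5 levels -> 6 bands, one per encoder unit
for band in bands:
    patches = patchify(band, patch_len=16)
    visible, mask = mask_patches(patches, ratio=0.10, rng=rng)
    recon = np.zeros_like(patches)      # placeholder for a per-band decoder output
    print(band.shape, patches.shape, int(mask.sum()), masked_patch_distance(patches, recon, mask))
```
          </code>
        </p>
        <p>Because the Haar transform is orthonormal, the total energy of the signal is preserved across the six bands, which makes the decomposition easy to sanity-check.</p>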
      </sec>
      <sec id="S3.SS3">
        <label>3.3</label>
        <title>Discussion</title>
        <p id="S3.SS3.p1">Contrastive learning and Masked Autoencoders (MAE) have demonstrated significant advantages in EEG analysis. Contrastive learning extracts feature representations by exploiting the similarities and differences among samples, while MAE strengthens the model's understanding of the data by predicting missing information. These self-supervised learning methods reduce the dependence on large amounts of labeled data and can also improve model generalization.</p>
        <p id="S3.SS3.p2">However, current self-supervised learning methods have some limitations. First, contrastive learning often relies on carefully designed data augmentation strategies, which may disrupt the temporal dependencies in EEG data and degrade the learned representations. Second, the analysis and processing of multi-channel EEG data remain complex, and existing methods still struggle to handle multi-channel signals effectively. Third, although MAE can capture the intrinsic characteristics of the input data, its masking strategy may lead to information loss in some cases, thereby affecting reconstruction quality and downstream task performance.</p>
        <p id="S3.SS3.p3">Future research directions could focus on the following aspects: developing more efficient and flexible data augmentation techniques to better preserve the structural characteristics of EEG data while ensuring that augmentation does not compromise the physiological significance of the signals. Given the complexity of multi-channel time series data, researchers should continue to design self-supervised learning frameworks capable of handling multi-channel signals effectively, thereby improving model performance in practical applications. Additionally, combining contrastive learning and MAE could be explored to ensure that the reconstructed data not only resembles the original data but also forms meaningful distinctions with other samples, potentially mitigating issues related to information loss.</p>
      </sec>
    </sec>
    <sec id="S4">
      <label>4.</label>
      <title>Discriminative-based EEG Analysis</title>
      <p id="S4.p1">To gain a deeper understanding of brain activity, this survey examines the following advanced architectures: <bold>Graph Neural Networks (GNNs)</bold> in <xref rid="S4.SS1">section 4.1</xref>, which exploit the structural information inherent in brain connectivity; <bold>Foundation Models</bold> in <xref rid="S4.SS2">section 4.2</xref>, which are pre-trained on extensive datasets and adapted to specific EEG analysis tasks through fine-tuning; and <bold>LLMs-based Methods</bold> in <xref rid="S4.SS3">section 4.3</xref>, which leverage large language models to improve the interpretability of EEG data.</p>
      <sec id="S4.SS1">
        <label>4.1</label>
        <title>Graph Neural Networks</title>
        <p id="S4.SS1.p1">EEG data are multi-channel time series in which the channels (corresponding to brain regions) are interrelated through structural and functional connectivity [<xref rid="ref057" ref-type="bibr">57</xref>]. Because brain regions lie in a non-Euclidean space, graphs are a natural data structure for representing brain connectivity [<xref rid="ref058" ref-type="bibr">58</xref>]. In recent years, graph neural networks (GNNs), represented by graph convolutional networks (GCNs) [<xref rid="ref059" ref-type="bibr">59</xref>], have developed rapidly and become a powerful tool for learning representations of non-Euclidean data. They can capture intricate inter-variable and inter-temporal relationships and have therefore emerged as one of the mainstream frameworks for modeling multivariate time series. Motivated by the success of graph representation learning, a line of studies has applied GNNs to multivariate time series analysis and demonstrated promising results in many downstream tasks, such as classification [<xref rid="ref060" ref-type="bibr">60</xref>], forecasting [<xref rid="ref061" ref-type="bibr">61</xref>], and anomaly detection [<xref rid="ref062" ref-type="bibr">62</xref>]. The survey by Jin et al. [<xref rid="ref009" ref-type="bibr">9</xref>] summarizes the application of GNNs to time series analysis, but it does not specifically concentrate on EEG data and only briefly outlines applications in healthcare. In contrast, this paper focuses on EEG data and reviews recent advances in mainstream EEG analysis tasks with GNNs. It covers a wide range of tasks, such as epilepsy detection, sleep staging, and emotion recognition, and organizes related works from the perspectives of EEG graph construction and dependency modeling (as shown in Figure <xref ref-type="fig" rid="F3">3</xref>). All of the methods are summarized in Table <xref rid="T2" ref-type="table">2</xref>.</p>
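        <p>As a concrete illustration of the two stages named above, the following sketch builds an EEG graph from channel correlations and applies one GCN layer using the symmetric-normalized propagation rule commonly used by GCNs. Correlation-based construction is only one of several options for the graph-construction stage; the threshold of 0.3, the toy node features, and the function names are hypothetical choices made for this example.</p>
        <p>
          <code language="python">
```python
import numpy as np


def correlation_adjacency(x: np.ndarray, threshold: float = 0.3) -> np.ndarray:
    """Channel graph from |Pearson correlation| between EEG channels.

    x: (n_channels, n_samples). Edges with |r| below `threshold` are dropped,
    and self-loops are left out here (the GCN layer adds them back).
    """
    a = np.abs(np.corrcoef(x))
    a = np.where(a >= threshold, a, 0.0)
    np.fill_diagonal(a, 0.0)
    return a


def gcn_layer(a: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_tilde = a + np.eye(a.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    a_norm = d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ w, 0.0)


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 512))                      # toy EEG: 8 channels x 512 samples
x[1] = 0.8 * x[0] + 0.2 * x[1]                     # make two channels strongly related
adj = correlation_adjacency(x)

# Hypothetical node features: simple per-channel statistics
h = np.stack([x.mean(axis=1), x.std(axis=1), np.abs(np.diff(x, axis=1)).mean(axis=1)], axis=1)
w = rng.normal(size=(3, 16))                       # untrained layer weights
out = gcn_layer(adj, h, w)
print(adj.shape, out.shape)
```
          </code>
        </p>
        <p>Methods marked as "Learned" in the graph-construction column of Table 2 infer the adjacency matrix during training instead of fixing it from correlations, while the propagation step keeps the same general form.</p>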
        <p>
          <fig id="F3">
            <label>Figure 3.</label>
            <caption>
              <p>General pipeline for EEG analysis using graph neural networks.</p>
            </caption>
            <graphic xlink:href="images/GNN_pipeline.pdf"/>
          </fig>
        </p>
        <p>
          <table-wrap-group>
            <table-wrap id="T2">
              <label>Table 2</label>
              <caption>
                <p> Summary of representative GNN-based methods for EEG analysis.</p>
              </caption>
              <table-wrap-foot>
                <p>
                  <p>
                    <table-wrap>
                      <table>
                        <tbody>
                          <tr>
                            <td style="border-top: 1px solid black;" align="center">
                              <bold>Task</bold>
                            </td>
                            <td style="border-top: 1px solid black;" align="center">
                              <bold>Method</bold>
                            </td>
                            <td style="border-top: 1px solid black;" align="center">
                              <bold>Graph Construction</bold>
                            </td>
                            <td style="border-top: 1px solid black;" align="center">
                              <bold>Dependency Modeling</bold>
                            </td>
                            <td style="border-top: 1px solid black;" align="center">
                              <bold>Training</bold>
                            </td>
                            <td style="border-top: 1px solid black;" align="center">
                              <bold>Datasets</bold>
                            </td>
                            <td style="border-top: 1px solid black;" align="center">
                              <bold>Metric</bold>
                            </td>
                          </tr>
                          <tr>
                            <td style="border-top: 1px solid black;" rowspan="2" align="center">Sleep Stage Classification</td>
                            <td style="border-top: 1px solid black;" align="center">GraphSleepNet[<xref rid="ref058" ref-type="bibr">58</xref>]</td>
                            <td style="border-top: 1px solid black;" align="center">Learned</td>
                            <td style="border-top: 1px solid black;" align="center">Spectral, Attention, CNN</td>
                            <td style="border-top: 1px solid black;" align="center">-</td>
                            <td style="border-top: 1px solid black;" align="center">MASS-SS3[<xref rid="ref050" ref-type="bibr">50</xref>]</td>
                            <td style="border-top: 1px solid black;" align="center">Accuracy, F1, Kappa</td>
                          </tr>
                          <tr>
                            <td align="center">MSTGCN [<xref rid="ref069" ref-type="bibr">69</xref>]</td>
                            <td align="center">Learned</td>
                            <td align="center">Spectral, Attention, CNN</td>
                            <td align="center">-</td>
                            <td align="center">ISRUC-S3[<xref rid="ref040" ref-type="bibr">40</xref>], MASS-SS3[<xref rid="ref050" ref-type="bibr">50</xref>]</td>
                            <td align="center">Accuracy, F1, Kappa</td>
                          </tr>
                          <tr>
                            <td style="border-top: 1px solid black;" rowspan="2" align="center">Emotion Recognition</td>
                            <td style="border-top: 1px solid black;" align="center">HetEmotionNet [<xref rid="ref072" ref-type="bibr">72</xref>]</td>
                            <td style="border-top: 1px solid black;" align="center">FC</td>
                            <td style="border-top: 1px solid black;" align="center">Spectral, GRU</td>
                            <td style="border-top: 1px solid black;" align="center">-</td>
                            <td style="border-top: 1px solid black;" align="center">DEAP[<xref rid="ref074" ref-type="bibr">74</xref>], MAHNOB-HCI[<xref rid="ref075" ref-type="bibr">75</xref>]</td>
                            <td style="border-top: 1px solid black;" align="center">Valence, Arousal</td>
                          </tr>
                          <tr>
                            <td align="center">MD-AGCN [<xref rid="ref070" ref-type="bibr">70</xref>]</td>
                            <td align="center">FC, Learned</td>
                            <td align="center">Spatial</td>
                            <td align="center">-</td>
                            <td align="center">SEED, SEED-IV, SEED-V[<xref rid="ref038" ref-type="bibr">38</xref>]</td>
                            <td align="center">Accuracy</td>
                          </tr>
                          <tr>
                            <td style="border-top: 1px solid black;" rowspan="5" align="center">Seizure Detection</td>
                            <td style="border-top: 1px solid black;" align="center">Tang et al. [<xref rid="ref067" ref-type="bibr">67</xref>]</td>
                            <td style="border-top: 1px solid black;" align="center">SC, FC</td>
                            <td style="border-top: 1px solid black;" align="center">Spatial, Spectral, GRU</td>
                            <td style="border-top: 1px solid black;" align="center">Generative Learning</td>
                            <td style="border-top: 1px solid black;" align="center">TUSZ[<xref rid="ref037" ref-type="bibr">37</xref>]</td>
                            <td style="border-top: 1px solid black;" align="center">AUC, F1</td>
                          </tr>
                          <tr>
                            <td align="center">BrainNet [<xref rid="ref073" ref-type="bibr">73</xref>]</td>
                            <td align="center">Learned</td>
                            <td align="center">Spatial</td>
                            <td align="center">Contrastive Learning</td>
                            <td align="center">Private data</td>
                            <td align="center">Precision, Recall, F1, F2, AUC</td>
                          </tr>
                          <tr>
                            <td align="center">MBrain [<xref rid="ref002" ref-type="bibr">2</xref>]</td>
                            <td align="center">Learned</td>
                            <td align="center">-</td>
                            <td align="center">Contrastive Learning</td>
                            <td align="center">Private data, TUSZ[<xref rid="ref037" ref-type="bibr">37</xref>]</td>
                            <td align="center">Precision, Recall, F1, F2</td>
                          </tr>
                          <tr>
                            <td rowspan="2" align="center">EEG-CGS[<xref rid="ref068" ref-type="bibr">68</xref>]</td>
                            <td rowspan="2" align="center">SC, FC</td>
                            <td rowspan="2" align="center">-</td>
                            <td align="center">Contrastive Learning</td>
                            <td rowspan="2" align="center">TUSZ[<xref rid="ref037" ref-type="bibr">37</xref>]</td>
                            <td align="center">AUC, Precision, F1</td>
                          </tr>
                          <tr>
                            <td align="center">and Generative Learning</td>
                            <td align="center">Sensitivity, Specificity</td>
                          </tr>
                          <tr>
                            <td style="border-top: 1px solid black;" align="center">Sleep Stage Classification</td>
                            <td style="border-top: 1px solid black;border-bottom: 1px solid black;" rowspan="2" align="center">BayesEEGNet [<xref rid="ref071" ref-type="bibr">71</xref>]</td>
                            <td style="border-top: 1px solid black;border-bottom: 1px solid black;" rowspan="2" align="center">Learned</td>
                            <td style="border-top: 1px solid black;border-bottom: 1px solid black;" rowspan="2" align="center">Spatial</td>
                            <td style="border-top: 1px solid black;border-bottom: 1px solid black;" rowspan="2" align="center">-</td>
                            <td style="border-top: 1px solid black;" align="center">MASS-SS3[<xref rid="ref050" ref-type="bibr">50</xref>], SEED[<xref rid="ref038" ref-type="bibr">38</xref>]</td>
                            <td style="border-top: 1px solid black;border-bottom: 1px solid black;" rowspan="2" align="center">Accuracy, F1, Kappa</td>
                          </tr>
                          <tr>
                            <td style="border-bottom: 1px solid black;" align="center">and Emotion Recognition</td>
                            <td style="border-bottom: 1px solid black;" align="center">ISRUC-S3 [<xref rid="ref040" ref-type="bibr">40</xref>]</td>
                          </tr>
                        </tbody>
                      </table>
                    </table-wrap>
                  </p>
                </p>
                <p>Graph Construction: "SC" and "FC" denote "structural connectivity" and "functional connectivity", respectively. "Learned" indicates that the graph structure is learned from data.</p>
              </table-wrap-foot>
            </table-wrap>
          </table-wrap-group>
        </p>
        <sec id="S4.SS1.SSS1">
          <label>4.1.1</label>
          <title>EEG Graph Construction</title>
          <p id="S4.SS1.SSS1.p1">In general, each channel of the EEG signal is treated as a node in the graph. Corresponding to structural and functional connectivity, methods for computing the adjacency matrix can be roughly divided into two categories: one based on the geometry of the EEG channels and the other based on functional connectivity between brain regions. Regarding the geometry between channels, i.e., the anatomical connections between brain regions, previous studies have shown that adjacent brain regions affect each other and that the strength of this influence is inversely proportional to their physical distance [<xref rid="ref063" ref-type="bibr">63</xref>]. Thus, the adjacency matrix is constructed from the Euclidean distances between electrodes; notably, this matrix is the same for all EEG recordings that share the same electrode layout. The second category, based on functional connectivity between brain regions, captures dynamic brain connections that vary across EEG recordings. It is usually computed from correlations or dependencies among signals, and the most common measures are the Pearson Correlation Coefficient (PCC) [<xref rid="ref064" ref-type="bibr">64</xref>], Mutual Information (MI) [<xref rid="ref065" ref-type="bibr">65</xref>], and the Phase Locking Value (PLV) [<xref rid="ref066" ref-type="bibr">66</xref>].</p>
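          <p>The two families of adjacency matrices described above can be sketched in a few lines of Python with NumPy and SciPy. This is a minimal illustration under stated assumptions: the electrode coordinates and the pruning threshold are hypothetical, PCC edges use the absolute correlation, and PLV is computed from the Hilbert phase of signals assumed to be already band-pass filtered. Mutual information is omitted, and the reviewed papers may normalize or threshold their graphs differently.</p>

```python
import numpy as np
from scipy.signal import hilbert


def distance_adjacency(coords: np.ndarray, threshold: float) -> np.ndarray:
    """Distance-based (structural) graph over N electrodes.

    coords: (N, 3) electrode positions; pairs farther apart than `threshold`
    get no edge, and the remaining weights are 1 / distance. The result depends
    only on the electrode layout, so it is shared by every recording.
    """
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    keep = np.logical_and(dist > 0, threshold >= dist)  # drop self-pairs and far pairs
    adj = np.zeros_like(dist)
    adj[keep] = 1.0 / dist[keep]
    return adj


def pcc_adjacency(signals: np.ndarray) -> np.ndarray:
    """signals: (N, T). Absolute Pearson correlation between channels, zero diagonal."""
    adj = np.abs(np.corrcoef(signals))
    np.fill_diagonal(adj, 0.0)
    return adj


def plv_adjacency(signals: np.ndarray) -> np.ndarray:
    """Phase locking value between channels (assumes band-pass filtered input)."""
    phase = np.angle(hilbert(signals, axis=-1))    # instantaneous phase per channel
    z = np.exp(1j * phase)                         # unit phasors, shape (N, T)
    # (z @ z^H)[i, j] = sum_t exp(i * (phi_i(t) - phi_j(t)))
    plv = np.abs(z @ z.conj().T) / signals.shape[-1]
    np.fill_diagonal(plv, 0.0)
    return plv


rng = np.random.default_rng(0)
coords = rng.uniform(-1, 1, size=(8, 3))    # hypothetical electrode positions
eeg = rng.standard_normal((8, 512))         # hypothetical 8-channel EEG window
a_sc = distance_adjacency(coords, threshold=1.0)   # identical for every recording
a_pcc = pcc_adjacency(eeg)                         # recomputed per recording
a_plv = plv_adjacency(eeg)                         # recomputed per recording
```

          <p>Only the two functional matrices change from one recording to the next, which is the practical difference between the structural and the functional views discussed above.</p>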
          <p id="S4.SS1.SSS1.p2">Tang et al. [<xref rid="ref067" ref-type="bibr">67</xref>] use both of the above methods to construct EEG graphs, feeding only one type of graph into the model at a time. Experimental results on the TUSZ v1.5.2 dataset show that the correlation-based graph localizes focal seizures better than the distance-based graph. For a given EEG, Ho et al. [<xref rid="ref068" ref-type="bibr">68</xref>] employ four different metrics to construct graphs: the Euclidean distance between nodes, random connections between nodes, correlations between node features, and the directed transfer function. The first two are meant to capture the geometry of the EEG channels, and the last two capture the connectivity of brain regions.</p>
          <p id="S4.SS1.SSS1.p3">Although the correlation-based graph can be used even when the physical locations of the electrodes are unknown, its adjacency matrix is still fixed, which limits performance to some extent. To address this problem, many studies have explored adaptive graph learning strategies. For example, GraphSleepNet [<xref rid="ref058" ref-type="bibr">58</xref>] learns the connection between two nodes from their input features using a single-layer neural network: the larger the distance between the features of two nodes, the smaller the weight of their connection in the adjacency matrix, and a loss function is defined to drive the learned graph in this direction. Experiments comparing the adaptive (learnable) adjacency matrix with fixed adjacency matrices demonstrate its superiority. MSTGCN [<xref rid="ref069" ref-type="bibr">69</xref>] adopts the adaptive graph learning method of GraphSleepNet [<xref rid="ref058" ref-type="bibr">58</xref>] and additionally computes a spatial distance-based brain graph. Both views are fed into the model for feature extraction, and the resulting features are fused by concatenation. Ablation experiments show that this multi-view fusion is more effective than using a single view. MD-AGCN [<xref rid="ref070" ref-type="bibr">70</xref>] constructs functional brain connectivity in both the temporal domain and the frequency domain. In the temporal domain, Pearson's correlation coefficient is used as the connectivity index. The frequency-domain adjacency matrix consists of a public part and a private part. The public part is shared by all samples and is implemented as trainable parameters, representing the general functional connectivity patterns for emotion recognition. 
The private part is obtained by computing the dot product between pairs of vertices and is unique to each sample. Before classification, the functional brain connections in the two domains are combined. Visualizations of the learned graphs indicate that the deeper layers of the model capture global connectivity. BayesEEGNet [<xref rid="ref071" ref-type="bibr">71</xref>] models the electrical impulses between two nodes in the brain as a Poisson process, so that the countless impulses generated by the brain over a period are represented as an infinite number of connection probability graphs. These graphs are then coupled into a summary graph by superposing Poisson distributions, and the summary graph is subsequently transformed into the functional connectivity graph through two three-layer MLPs. In comparisons with the adaptive learning strategy of GraphSleepNet [<xref rid="ref058" ref-type="bibr">58</xref>], the connectivity graph learned by BayesEEGNet achieves the best performance on downstream tasks.</p>
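          <p>The adaptive graph learning idea behind GraphSleepNet [<xref rid="ref058" ref-type="bibr">58</xref>] can be sketched as follows. The softmax-normalized edge score and the smoothness-plus-Frobenius loss follow the graph-learning formulation commonly associated with GraphSleepNet, but this NumPy code is an illustrative sketch rather than the authors' implementation; the feature sizes, the weight vector w, and the regularization weight lam are assumptions.</p>

```python
import numpy as np


def learn_adjacency(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Adaptive adjacency from node features (GraphSleepNet-style sketch).

    x: (N, F) node features, one row per EEG channel
    w: (F,) weights of the single-layer network (trainable in a real model)
    A[m, n] = softmax over n of ReLU(w . |x_m - x_n|), so every row sums to 1.
    """
    abs_diff = np.abs(x[:, None, :] - x[None, :, :])   # (N, N, F)
    scores = np.maximum(abs_diff @ w, 0.0)              # (N, N) ReLU scores
    scores -= scores.max(axis=1, keepdims=True)         # stabilize the softmax
    expd = np.exp(scores)
    return expd / expd.sum(axis=1, keepdims=True)


def graph_learning_loss(x: np.ndarray, adj: np.ndarray, lam: float) -> float:
    """sum_{m,n} ||x_m - x_n||^2 A[m, n] + lam * ||A||_F^2.

    The first term penalizes strong edges between nodes whose features are far
    apart; minimizing it with respect to w pushes such edges toward small weights.
    """
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    return float((sq_dist * adj).sum() + lam * (adj ** 2).sum())


rng = np.random.default_rng(0)
x = rng.standard_normal((6, 5))     # 6 channels with 5 features each (toy sizes)
w = rng.standard_normal(5)          # hypothetical single-layer weights
adj = learn_adjacency(x, w)
loss = graph_learning_loss(x, adj, lam=1e-3)
```

          <p>In a full model, w would typically be updated by gradient descent on this loss together with the task loss, so that larger feature distances end up receiving smaller edge weights; here only the forward computation is shown.</p>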
        </sec>
        <sec id="S4.SS1.SSS2">
          <label>4.1.2</label>
          <title>Dependency Modeling and Graph Representation Learning</title>
          <p id="S4.SS1.SSS2.p1">Once the EEG graph is constructed, the dependencies in the graph usually need to be modeled to learn representations that are more discriminative for the downstream task. For example, Tang et al. [<xref rid="ref067" ref-type="bibr">67</xref>] model the spatial dependency in EEG signals with graph diffusion convolution and employ Gated Recurrent Units (GRUs) to model the temporal dependency. In addition, to learn task-agnostic representations, they propose a self-supervised pretraining method that predicts the preprocessed signals of the next time period. GraphSleepNet [<xref rid="ref058" ref-type="bibr">58</xref>] designs a spatial-temporal convolution consisting of graph convolutions that capture spatial features and temporal convolutions that capture temporal context. Moreover, attention mechanisms are applied along the spatial and temporal dimensions, respectively, to extract valuable information. BayesEEGNet [<xref rid="ref071" ref-type="bibr">71</xref>] likewise employs spatial-based graph convolution to aggregate neighbor information directly in the spatial domain. For emotion recognition from multi-modal signals, HetEmotionNet [<xref rid="ref072" ref-type="bibr">72</xref>] first combines the temporal-domain feature vector with a mutual-information-based adjacency matrix to form a heterogeneous spatial-temporal graph for the current time step, and then stacks the heterogeneous graphs of all time steps into a heterogeneous graph sequence. Next, a Graph Transformer Network (GTN) models the heterogeneity of the multi-modal signals by automatically extracting meta-paths from the set of adjacency matrices. A GCN captures the correlations between the multi-modal signals, and a GRU extracts temporal-domain features from the graph sequence produced by the GCN. 
BrainNet [<xref rid="ref073" ref-type="bibr">73</xref>] uses GCNs to model two types of brain-wave diffusion processes. Concretely, cross-time diffusion models the propagation of longer epileptic waves between two consecutive time segments, while inner-time diffusion captures fast signal spreading within the same time segment. The experimental results show that both diffusion processes improve seizure detection performance.</p>
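          <p>Graph diffusion convolution, which Tang et al. [<xref rid="ref067" ref-type="bibr">67</xref>] combine with GRUs, can be sketched with the bidirectional random-walk formulation popularized by DCRNN: node features are propagated for k = 0, ..., K-1 steps along a forward transition matrix (D_O^{-1} W) and a backward one (D_I^{-1} W^T), and each step has its own weights. This NumPy sketch is illustrative only; the graph, the feature sizes, and K are hypothetical, and the recurrent (GRU) part is omitted.</p>

```python
import numpy as np


def random_walk_matrix(adj: np.ndarray) -> np.ndarray:
    """Row-normalize a non-negative adjacency into a transition matrix D^{-1} W."""
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0          # isolated nodes keep an all-zero row
    return adj / deg


def diffusion_conv(adj: np.ndarray, x: np.ndarray,
                   theta_fwd: np.ndarray, theta_bwd: np.ndarray) -> np.ndarray:
    """sum_k (P_fwd^k X Theta_fwd[k] + P_bwd^k X Theta_bwd[k]).

    adj:       (N, N) non-negative weighted adjacency, possibly directed
    x:         (N, F_in) node features
    theta_*:   (K, F_in, F_out) per-step weights for each diffusion direction
    """
    p_fwd = random_walk_matrix(adj)      # D_O^{-1} W  (follow outgoing edges)
    p_bwd = random_walk_matrix(adj.T)    # D_I^{-1} W^T (follow incoming edges)
    k_steps, _, f_out = theta_fwd.shape
    out = np.zeros((x.shape[0], f_out))
    h_fwd, h_bwd = x.copy(), x.copy()    # k = 0 term uses P^0 X = X
    for k in range(k_steps):
        out += h_fwd @ theta_fwd[k] + h_bwd @ theta_bwd[k]
        h_fwd = p_fwd @ h_fwd            # advance one diffusion step forward
        h_bwd = p_bwd @ h_bwd            # and one step backward
    return out


rng = np.random.default_rng(0)
adj = rng.uniform(0, 1, size=(5, 5))     # hypothetical directed, weighted EEG graph
np.fill_diagonal(adj, 0.0)
x = rng.standard_normal((5, 4))
theta_f = 0.1 * rng.standard_normal((3, 4, 2))   # K = 3 diffusion steps
theta_b = 0.1 * rng.standard_normal((3, 4, 2))
y = diffusion_conv(adj, x, theta_f, theta_b)     # shape (5, 2)
```

          <p>Because the backward transition uses the transposed graph, directed connectivity measures are handled naturally; for a symmetric graph the two directions coincide.</p>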
          <p id="S4.SS1.SSS2.p2">Other methods mine patterns in a graph by designing self-supervised learning tasks. To capture correlation patterns in space and time, MBrain [<xref rid="ref002" ref-type="bibr">2</xref>] proposes two self-supervised tasks. Instantaneous time shift, based on multi-channel Contrastive Predictive Coding (CPC), captures short-term correlations with a focus on spatial patterns, whereas delayed time shift targets temporal patterns at broader time scales. In addition, replace discriminative learning is designed to preserve the unique characteristics of each channel and thus achieve accurate channel-wise seizure prediction. Ho et al. [<xref rid="ref068" ref-type="bibr">68</xref>] leverage a random walk with restart (RWR) technique to create two positive sub-graphs and one negative sub-graph for every node in every constructed EEG graph, and use them for contrastive learning. They also propose a generative learning module that learns the contextual information hidden in the graph by reconstructing the anonymized target node in the positive sub-graphs from the other node features and edges of the sub-graph. To promote spatial consistency across multiple sensors, GCC [<xref rid="ref060" ref-type="bibr">60</xref>] proposes novel graph augmentations, namely node augmentations and edge augmentations, which augment the sensors and their correlations, respectively. It then designs a graph contrasting method: node-level contrasting contrasts sensors across different views within each sample, while graph-level contrasting contrasts samples within each training batch. Through these two contrasting procedures, robust sensor-level and global-level features can be learned.</p>
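          <p>Two building blocks of these self-supervised methods can be sketched generically: sampling a sub-graph around a target node by random walk with restart, as in Ho et al. [<xref rid="ref068" ref-type="bibr">68</xref>], and an in-batch InfoNCE contrastive loss of the kind used by CPC. This NumPy code is a hypothetical illustration: the ring graph, sub-graph size, restart probability, and temperature are assumptions; the construction of negative sub-graphs and the anonymization of the target node in EEG-CGS are not shown; and the exact losses used by MBrain and GCC may differ.</p>

```python
import numpy as np


def rwr_subgraph(adj, start, size, restart_prob, rng, max_steps=1000):
    """Collect up to `size` distinct nodes by a random walk with restart from `start`.

    Returns fewer nodes if the reachable component is smaller than `size`.
    """
    visited = [start]
    current = start
    for _ in range(max_steps):
        if len(visited) >= size:
            break
        neighbors = np.flatnonzero(adj[current])
        if restart_prob > rng.random() or neighbors.size == 0:
            current = start                  # jump back to the target node
            continue
        current = int(rng.choice(neighbors))
        if current not in visited:
            visited.append(current)
    return visited


def info_nce(anchors, positives, temperature=0.5):
    """In-batch InfoNCE: row i of `positives` is the positive for row i of
    `anchors`; every other row in the batch serves as a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                    # (B, B) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


rng = np.random.default_rng(0)
ring = np.roll(np.eye(6), 1, axis=1) + np.roll(np.eye(6), -1, axis=1)  # hypothetical 6-channel ring
sub = rwr_subgraph(ring, start=0, size=3, restart_prob=0.3, rng=rng)
z1 = rng.standard_normal((8, 16))                  # embeddings of 8 samples, view 1
z2 = z1 + 0.05 * rng.standard_normal((8, 16))      # slightly perturbed view 2
loss = info_nce(z1, z2)
```

          <p>The restart probability controls how local the sampled sub-graph is: higher values keep the walk close to the target node, which suits contrasting a node with its immediate context.</p>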
        </sec>
        <sec id="S4.SS1.SSS3">
          <label>4.1.3</label>
          <title>Discussion</title>
          <p id="S4.SS1.SSS3.p1">Because the relationships among EEG channels (brain regions) are non-Euclidean, graphs have become one of the most suitable data structures for modeling EEG data. By capturing both structural and functional connectivity within the brain, Graph Neural Networks (GNNs) can effectively model the diffusion of brain waves across channels (or brain regions), thereby revealing sleep patterns, emotional states, seizure activity, and more. This makes GNNs an important method for EEG data analysis.</p>
          <p id="S4.SS1.SSS3.p2">However, existing approaches still face several challenges. Most graph construction methods are heuristic and rely on prior knowledge, so extensive data are needed to experimentally validate their performance and interpretability. Moreover, for clinical deployment in real-world settings, the generalization performance of the models and the ethical implications of data usage must be thoroughly investigated and addressed.</p>
        </sec>
      </sec>
      <sec id="S4.SS2">
        <label>4.2</label>
        <title>Foundation Models</title>
        <p id="S4.SS2.p1">Foundation models (FMs) [<xref rid="ref076" ref-type="bibr">76</xref>], also known as large-scale pretrained models, are advanced neural networks trained on extensive datasets. These models encode a vast range of general knowledge and can recognize numerous patterns. As a result, they offer a flexible and comprehensive foundation for addressing various tasks across multiple domains. ChatGPT [<xref rid="ref077" ref-type="bibr">77</xref>] is the best-known textual foundation model; it has a powerful ability to understand and generate natural language text and can perform a variety of natural language processing tasks, including text classification, sentiment analysis, and machine translation, showing high flexibility and generalization capability. CLIP [<xref rid="ref078" ref-type="bibr">78</xref>] and SAM [<xref rid="ref079" ref-type="bibr">79</xref>] are representative visual foundation models that exhibit robust general understanding and reasoning performance. Foundation models consistently perform well in diverse domains, from natural language processing to computer vision, showcasing their versatility and their potential to revolutionize the way AI systems interact with and understand the world.</p>
        <p id="S4.SS2.p2">In the field of EEG data processing, researchers have usually proposed methods or models designed for specific data or tasks. However, data annotation in the medical field is more difficult and expensive than in other fields. As a result, EEG medical datasets are usually small, which greatly restricts model capability [<xref rid="ref080" ref-type="bibr">80</xref>, <xref rid="ref073" ref-type="bibr">73</xref>]. The emergence of large language models offers a new approach to processing biological signal data such as EEG. Recently, many studies have begun to draw on the ideas behind large language models, using large amounts of unlabeled data and unsupervised pretraining to build foundation models for EEG or other biological signal data [<xref rid="ref081" ref-type="bibr">81</xref>, <xref rid="ref011" ref-type="bibr">11</xref>, <xref rid="ref082" ref-type="bibr">82</xref>, <xref rid="ref083" ref-type="bibr">83</xref>, <xref rid="ref084" ref-type="bibr">84</xref>, <xref rid="ref085" ref-type="bibr">85</xref>, <xref rid="ref086" ref-type="bibr">86</xref>]. These foundation models acquire extensive knowledge about time-series signals, represent EEG data well, exhibit generalization capabilities that previous models lacked, and achieve excellent performance on different downstream tasks. Below, we outline existing work on foundation models for EEG signals with respect to three important elements: data, model structure, and training methods. While the datasets themselves are described in detail in Table <xref rid="S6">6</xref>, this section focuses on how they are used to build EEG foundation models.</p>
        <p id="S4.SS2.p3">Beyond how the datasets are used, this section presents the model structures and training methodologies of existing EEG foundation models, which are summarized in Table <xref rid="T3" ref-type="table">3</xref>.</p>
        <p>
          <table-wrap id="T3">
            <label>Table 3</label>
            <caption>
              <p>Summary of foundation models for EEG analysis.</p>
            </caption>
            <table>
              <thead>
                <tr>
                  <th style="border-top: 1px solid black;" align="center">
                    <bold>Method</bold>
                  </th>
                  <th style="border-top: 1px solid black;" align="center">
                    <bold>Model Structure</bold>
                  </th>
                  <th style="border-top: 1px solid black;" align="center">
                    <bold>Training</bold>
                  </th>
                  <th style="border-top: 1px solid black;" align="center">
                    <bold>Datasets</bold>
                  </th>
                  <th style="border-top: 1px solid black;" align="center">
                    <bold>Metric</bold>
                  </th>
                </tr>
              </thead>
              <tbody>
                <tr>
                  <td style="border-top: 1px solid black;" align="center">BrainBERT [<xref rid="ref087" ref-type="bibr">87</xref>]</td>
                  <td style="border-top: 1px solid black;" align="center">Transformer blocks</td>
                  <td style="border-top: 1px solid black;" align="center">Masked Autoencoder</td>
                  <td style="border-top: 1px solid black;" align="center">Private data</td>
                  <td style="border-top: 1px solid black;" align="center">AUC</td>
                </tr>
                <tr>
                  <td align="center">Neuro-GPT [<xref rid="ref082" ref-type="bibr">82</xref>]</td>
                  <td align="center">Convolutional blocks + Transformer blocks</td>
                  <td align="center">Future Forecast</td>
                  <td align="center">TUH EEG corpus [<xref rid="ref037" ref-type="bibr">37</xref>]</td>
                  <td align="center">MSE, Accuracy</td>
                </tr>
                <tr>
                  <td align="center">Brant [<xref rid="ref081" ref-type="bibr">81</xref>]</td>
                  <td align="center">Transformer blocks</td>
                  <td align="center">Masked Autoencoder</td>
                  <td align="center">Private data</td>
                  <td align="center">MSE, MAE, F1, F2</td>
                </tr>
                <tr>
                  <td align="center">BFM [<xref rid="ref083" ref-type="bibr">83</xref>]</td>
                  <td align="center">Convolutional blocks</td>
                  <td align="center">Contrastive Learning</td>
                  <td align="center">AHMS corpus [<xref rid="ref088" ref-type="bibr">88</xref>]</td>
                  <td align="center">AUC, MAE</td>
                </tr>
                <tr>
                  <td style="border-bottom: 1px solid black;" align="center">LaBraM [<xref rid="ref011" ref-type="bibr">11</xref>]</td>
                  <td style="border-bottom: 1px solid black;" align="center">Convolutional blocks + Transformer blocks</td>
                  <td style="border-bottom: 1px solid black;" align="center">Masked Autoencoder</td>
                  <td style="border-bottom: 1px solid black;" align="center">Public data + Private data</td>
                  <td style="border-bottom: 1px solid black;" align="center">Accuracy, AUROC, F1</td>
                </tr>
              </tbody>
            </table>
          </table-wrap>
        </p>
        <sec id="S4.SS2.SSS1">
          <label>4.2.1</label>
          <title>Model Structure</title>
          <p id="S4.SS2.SSS1.p1">With the rapid development of deep learning, many model structures have emerged, such as Convolutional Neural Networks (CNNs) [<xref rid="ref089" ref-type="bibr">89</xref>], Recurrent Neural Networks (RNNs) [<xref rid="ref090" ref-type="bibr">90</xref>], Transformers [<xref rid="ref091" ref-type="bibr">91</xref>], and Mamba [<xref rid="ref092" ref-type="bibr">92</xref>]. Designing a model structure suited to time series signals is a top priority when building a foundation model, since a good structure allows the model to better understand and learn the information contained in these signals. Most existing EEG foundation models build the main model by stacking Transformer layers or convolutional blocks, because both structures scale well and are suitable for mining information from time series signals.</p>
          <p id="S4.SS2.SSS1.p2">Brant [<xref rid="ref081" ref-type="bibr">81</xref>] has two encoders: a temporal encoder containing a 12-layer Transformer encoder and a spatial encoder containing a 5-layer Transformer encoder, which capture temporal correlation and inter-channel correlation in the signals, respectively. BFM [<xref rid="ref083" ref-type="bibr">83</xref>] is built on an EfficientNet-style 1D convolutional neural network. Neuro-GPT [<xref rid="ref082" ref-type="bibr">82</xref>] and LaBraM [<xref rid="ref011" ref-type="bibr">11</xref>] combine convolutional and Transformer layers: a few convolutional layers first extract preliminary features from the time series signals and transform their dimensions, and a larger stack of Transformer layers then captures the correlations between sequence patches to represent the signals more effectively.</p>
          <p id="S4.SS2.SSS1.p3">Since Transformer layers operate on sequences of tokens whereas time series data are continuous-valued, a foundation model must first convert the time series into patches before further computation. A common approach is to split the raw signal with a fixed window size and a fixed stride. Specifically, given a neural signal <inline-formula><mml:math alttext="\bm{x}\in\mathbb{R}^{N\times C}" display="inline"><mml:mrow><mml:mi>𝒙</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo><mml:mi>C</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math alttext="N" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> is the number of timestamps and <inline-formula><mml:math alttext="C" display="inline"><mml:mi>C</mml:mi></mml:math></inline-formula> is the number of electrode channels, we divide <inline-formula><mml:math alttext="\bm{x}" display="inline"><mml:mi>𝒙</mml:mi></mml:math></inline-formula> with window size <inline-formula><mml:math alttext="M" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula> and stride <inline-formula><mml:math alttext="S" display="inline"><mml:mi>S</mml:mi></mml:math></inline-formula> to generate a set of patches <inline-formula><mml:math alttext="\bm{p}\in\mathbb{R}^{N_{p}\times C\times M}" display="inline"><mml:mrow><mml:mi>𝒑</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo><mml:mi>C</mml:mi><mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo><mml:mi>M</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math alttext="N_{p}=\left\lfloor\frac{N-M}{S}\right\rfloor+1" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mo>⌊</mml:mo><mml:mfrac><mml:mrow><mml:mi>N</mml:mi><mml:mo>−</mml:mo><mml:mi>M</mml:mi></mml:mrow><mml:mi>S</mml:mi></mml:mfrac><mml:mo>⌋</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mrow></mml:math></inline-formula> is the number of patches in each channel. Positional or frequency encodings are then usually added to the patches to help the model learn. LaBraM [<xref rid="ref011" ref-type="bibr">11</xref>] additionally maps each patch onto a neural codebook so that, like a large language model, the foundation model has a fixed vocabulary. Specifically, each patch is first encoded into a representation, and a quantizer then replaces each representation with its nearest neighbor among the neural codebook embeddings.</p>
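          <p>The patching rule and the nearest-neighbor codebook lookup described above can be sketched in a few lines of NumPy. This is a minimal, hypothetical illustration: the function names <monospace>patchify</monospace> and <monospace>quantize</monospace>, the use of squared Euclidean distance, and the toy codebook are assumptions for exposition, not the implementation of LaBraM [<xref rid="ref011" ref-type="bibr">11</xref>] or any other cited model. The patch count follows the formula for N_p given in the text.</p>

```python
import numpy as np


def patchify(x: np.ndarray, window: int, stride: int) -> np.ndarray:
    """Split a signal x of shape (N, C) into patches of shape (N_p, C, M).

    N_p = floor((N - M) / S) + 1, with M = window and S = stride.
    """
    n, _ = x.shape
    if not (n >= window >= 1 and stride >= 1):
        raise ValueError("window must be in [1, N] and stride must be positive")
    n_p = (n - window) // stride + 1
    starts = np.arange(n_p) * stride                      # first timestamp of each patch
    idx = starts[:, None] + np.arange(window)[None, :]    # (N_p, M) timestamp indices
    return x[idx].transpose(0, 2, 1)                      # (N_p, M, C) -> (N_p, C, M)


def quantize(z: np.ndarray, codebook: np.ndarray):
    """Map each representation in z (T, D) to its nearest codebook entry (K, D).

    Returns the code indices (T,) and the quantized vectors (T, D).
    """
    # Squared Euclidean distance via ||z||^2 - 2 z.e + ||e||^2, shape (T, K)
    dist = (
        (z ** 2).sum(axis=1, keepdims=True)
        - 2.0 * z @ codebook.T
        + (codebook ** 2).sum(axis=1)[None, :]
    )
    codes = dist.argmin(axis=1)
    return codes, codebook[codes]


# Example: N = 10 timestamps, C = 2 channels, M = 4, S = 2 gives N_p = 4 patches
x = np.arange(20, dtype=float).reshape(10, 2)
patches = patchify(x, window=4, stride=2)
print(patches.shape)  # (4, 2, 4)

codebook = np.array([[0.0, 0.0], [10.0, 10.0]])          # toy 2-entry codebook
codes, _ = quantize(np.array([[1.0, 1.0], [9.0, 8.0]]), codebook)
print(codes)  # [0 1]
```

          <p>In a real model the vectors passed to <monospace>quantize</monospace> would be learned patch representations rather than raw values, and the codebook itself would be trained jointly with the encoder.</p>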
          <p id="S4.SS2.SSS1.p4">The number of parameters in existing EEG foundation models typically ranges from tens to hundreds of millions, which is still small compared with large language models, possibly because far less EEG data is available than text data. We expect, however, that as the field develops, EEG foundation models will continue to grow in scale and capability.</p>
        </sec>
        <sec id="S4.SS2.SSS2">
          <label>4.2.2</label>
          <title>Training Methods</title>
          <p id="S4.SS2.SSS2.p1">For a model to learn useful knowledge from massive amounts of unlabeled data, an effective training method is essential. A good training method acts like a good teacher, making the learning process more efficient.</p>
          <p>
            <fig id="F4">
              <label>Figure 4.</label>
              <caption>
                <p>The different training methods of EEG foundation models.</p>
              </caption>
              <graphic xlink:href="images/foundation_model.pdf"/>
            </fig>
          </p>
          <p id="S4.SS2.SSS2.p2">The existing foundation models reviewed here are all pre-trained with self-supervised methods. One mainstream approach uses masked autoencoding as the pre-training task [<xref rid="ref081" ref-type="bibr">81</xref>, <xref rid="ref011" ref-type="bibr">11</xref>, <xref rid="ref087" ref-type="bibr">87</xref>]. Masked autoencoders have proven simple and effective in many fields: the model is trained to reconstruct the whole input from a partial observation of it (as shown in Figure <xref ref-type="fig" rid="F4">4</xref>(a)). This forces the foundation model to infer the whole from partial information, encouraging it to learn powerful representations.</p>
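          <p>A minimal sketch of the masked-autoencoding objective is given below, assuming a random masking ratio of 0.5, zeroing of masked patches in the encoder input, and a mean squared error computed only over the masked patches. These choices, the helper names, and the all-zero placeholder reconstruction are illustrative assumptions; the cited models differ in their details, for example in the masking scheme and the reconstruction target.</p>

```python
import numpy as np


def random_patch_mask(num_patches: int, mask_ratio: float, rng: np.random.Generator) -> np.ndarray:
    """Return a boolean mask of shape (num_patches,), True for patches hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask


def masked_reconstruction_loss(patches: np.ndarray, reconstruction: np.ndarray, mask: np.ndarray) -> float:
    """Mean squared error computed only over the masked patches.

    patches, reconstruction: (N_p, M); mask: (N_p,).
    """
    squared_error = (patches - reconstruction) ** 2
    return float(squared_error[mask].mean())


rng = np.random.default_rng(0)
patches = rng.normal(size=(8, 16))                 # 8 patches of 16 samples (one channel)
mask = random_patch_mask(len(patches), mask_ratio=0.5, rng=rng)
visible = np.where(mask[:, None], 0.0, patches)   # input a hypothetical encoder would see
# A trained model would reconstruct `patches` from `visible`; here a trivial all-zero guess is scored.
loss = masked_reconstruction_loss(patches, np.zeros_like(patches), mask)
```

          <p>Restricting the loss to masked patches means the model gains nothing from copying visible inputs, so it must infer the hidden parts from context.</p>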
          <p id="S4.SS2.SSS2.p3">A related pre-training method can be viewed as masking only the latter part of the input (as shown in Figure <xref ref-type="fig" rid="F4">4</xref>(b)). During training, the model predicts future values from the history of the time series [<xref rid="ref082" ref-type="bibr">82</xref>]. This objective coincides with short- or long-term forecasting as a downstream task. Foundation models pre-trained in this way therefore tend to have strong predictive ability, capturing regularities in historical time series data.</p>
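          <p>The forecasting objective can be sketched as splitting a patch sequence into a history and a held-out future and scoring a prediction of the future. In this hypothetical sketch, <monospace>repeat_last_forecast</monospace> is a placeholder standing in for a trained model, and mean squared error is an assumed loss; it is not the training code of Neuro-GPT [<xref rid="ref082" ref-type="bibr">82</xref>].</p>

```python
import numpy as np


def make_forecast_pair(patches: np.ndarray, num_future: int):
    """Split patches (N_p, M) into a history (first N_p - num_future) and a target (last num_future)."""
    if not (len(patches) > num_future >= 1):
        raise ValueError("num_future must be in [1, N_p - 1]")
    return patches[:-num_future], patches[-num_future:]


def repeat_last_forecast(history: np.ndarray, num_future: int) -> np.ndarray:
    """Placeholder predictor: repeat the last observed patch (stands in for a trained model)."""
    return np.repeat(history[-1:], num_future, axis=0)


def forecast_loss(prediction: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error between predicted and true future patches."""
    return float(np.mean((prediction - target) ** 2))


patches = np.arange(12, dtype=float).reshape(4, 3)   # 4 patches of 3 samples
history, target = make_forecast_pair(patches, num_future=1)
prediction = repeat_last_forecast(history, num_future=1)
print(forecast_loss(prediction, target))  # 9.0
```

          <p>Compared with the masked objective, the only structural change is that the hidden positions are always the final ones, which is why this task mirrors downstream forecasting.</p>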
          <p id="S4.SS2.SSS2.p4">Another line of work trains foundation models with contrastive learning. The core idea is to learn representations by comparing samples, so that similar (positive) pairs are mapped close together and dissimilar (negative) pairs far apart. This helps the model capture the intrinsic structure of and relationships between data, thereby improving its generalization on downstream tasks. For example, BFM [<xref rid="ref083" ref-type="bibr">83</xref>] constructs positive and negative pairs at the participant level: positive pairs are augmented views of two different segments from the same participant, while segments from different participants serve as negatives (as shown in Figure <xref ref-type="fig" rid="F4">4</xref>(c)). With this training method, the model not only acquires strong representation capabilities but also generalizes better across subjects.</p>
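          <p>Participant-level contrastive learning can be sketched with a standard InfoNCE loss: row i of the two embedding matrices comes from two segments of the same participant, and all other rows in the batch act as negatives. The InfoNCE form, the temperature of 0.1, and the omission of the augmentation step are assumptions for illustration; the exact objective used by BFM [<xref rid="ref083" ref-type="bibr">83</xref>] may differ.</p>

```python
import numpy as np


def info_nce(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE loss for a batch of B participants.

    z1[i] and z2[i] embed two augmented segments of participant i (positive pair);
    z2[j] for j != i acts as a negative for z1[i].
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)   # unit vectors -> cosine similarity
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                        # (B, B) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_softmax)))            # positives lie on the diagonal


emb = np.eye(4)                                  # toy embeddings: 4 participants, orthogonal directions
aligned = info_nce(emb, emb)                     # near 0: each segment paired with its own participant
mismatched = info_nce(emb, emb[[1, 0, 3, 2]])    # about 10 (= 1 / temperature): positives swapped
```

          <p>Minimizing this loss pulls segments of the same participant together in embedding space and pushes different participants apart, which is the mechanism behind the improved cross-subject generalization described above.</p>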
          <p id="S4.SS2.SSS2.p5">Through such pre-training, a foundation model can acquire broad knowledge from large amounts of unlabeled data. It then needs only a small amount of labeled data for fine-tuning to adapt well to various downstream tasks, and it may even exhibit zero-shot capabilities similar to those of large language models. This makes a universal EEG foundation model a realistic prospect.</p>
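          <p>The lightest form of adaptation is linear probing: the pre-trained encoder is frozen and only a linear classifier is fitted on its output features. The sketch below uses a closed-form ridge-regression classifier as a simple stand-in for such a probe; the function names, the regularization strength, and the toy features are hypothetical, and the cited works may fine-tune more of the network.</p>

```python
import numpy as np


def fit_linear_probe(features: np.ndarray, labels: np.ndarray, num_classes: int, lam: float = 1e-3) -> np.ndarray:
    """Ridge-regression probe on frozen features: W = (X^T X + lam I)^-1 X^T Y, with Y one-hot."""
    x = features
    y = np.eye(num_classes)[labels]                  # one-hot targets (n, K)
    gram = x.T @ x + lam * np.eye(x.shape[1])        # (D, D) regularized Gram matrix
    return np.linalg.solve(gram, x.T @ y)            # (D, K) probe weights


def predict(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Predict the class with the highest linear score."""
    return (features @ weights).argmax(axis=1)


frozen = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])  # stand-in for frozen encoder features
labels = np.array([0, 0, 1, 1])
weights = fit_linear_probe(frozen, labels, num_classes=2)
print(predict(frozen, weights))  # [0 0 1 1]
```

          <p>Because only the D by K probe weights are estimated, a handful of labeled examples can suffice when the frozen features already separate the classes.</p>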
        </sec>
        <sec id="S4.SS2.SSS3">
          <label>4.2.3</label>
          <title>Discussion</title>
          <p id="S4.SS2.SSS3.p1">The emergence of foundation models for EEG data processing marks a significant advancement in the field of EEG signal analysis. By leveraging the principles and techniques of large-scale pretrained models, researchers can increasingly overcome the limitations posed by traditionally small and costly annotated EEG datasets. The ability of these foundation models to extract meaningful patterns from large amounts of unlabeled EEG data opens new avenues for improving diagnostic and therapeutic applications.</p>
          <p id="S4.SS2.SSS3.p2">Despite these advancements, several challenges remain. Although existing EEG foundation models are substantially larger than earlier models, they are still much smaller than large language models, likely because far less EEG data is available than text data. This disparity highlights the need for more extensive and standardized EEG datasets, potentially through collaborative data-sharing initiatives or synthetic data generation. Meanwhile, ethical considerations surrounding the use of EEG data must also be addressed. Privacy, data security, and informed consent are paramount, especially as these models become more integrated into clinical workflows. Developing and deploying these models within a strong ethical framework will be crucial for their acceptance and success in the medical community.</p>
        </sec>
      </sec>
      <sec id="S4.SS3">
        <label>4.3</label>
        <title>LLMs-based Methods</title>
        <p id="S4.SS3.p1">Large Language Models (LLMs) [<xref rid="ref093" ref-type="bibr">93</xref>, <xref rid="ref094" ref-type="bibr">94</xref>, <xref rid="ref095" ref-type="bibr">95</xref>] have revolutionized natural language processing (NLP) with remarkable capabilities in understanding, generating, and translating human language. Applying LLMs to EEG analysis is a novel approach to interpreting complex brain signals. Unlike traditional machine learning methods, LLMs can be fine-tuned with relatively small amounts of task-specific data, making them well suited to EEG data, which are difficult and costly to annotate.</p>
        <p>
          <table-wrap-group>
            <table-wrap id="T4">
              <label>Table 4</label>
              <caption>
                <p>Summary of LLMs-based methods for EEG analysis.</p>
              </caption>
            </table-wrap>
            <table-wrap id="T">
              <caption>
                <p><bold>LTSF</bold> contains ETTh1/h2/m1/m2, Weather, Electricity, Traffic</p>
              </caption>
            </table-wrap>
          </table-wrap-group>
        </p>
        <p id="S4.SS3.p2">The integration of LLMs into EEG analysis can take two forms. <bold>Single-tower Models</bold>: These approaches use LLMs as feature extractors for single-modality EEG data, implicitly leveraging the semantic knowledge these models contain. Here, LLMs can be fine-tuned with Parameter-Efficient Fine-Tuning (PEFT) techniques [<xref rid="ref129" ref-type="bibr">129</xref>], such as LoRA [<xref rid="ref101" ref-type="bibr">101</xref>] or soft prompts [<xref rid="ref130" ref-type="bibr">130</xref>], to classify neurological states or forecast outcomes from EEG data. Their proficiency with sequential data makes them particularly adept at time-series analysis. <bold>Dual-tower Models</bold>: These approaches handle multimodal data, pairing EEG with text through LLMs via knowledge distillation [<xref rid="ref131" ref-type="bibr">131</xref>] or cross-modal contrastive learning [<xref rid="ref078" ref-type="bibr">78</xref>]. Moreover, there has been significant progress in adapting LLMs for general time series analysis [<xref rid="ref132" ref-type="bibr">132</xref>, <xref rid="ref012" ref-type="bibr">12</xref>, <xref rid="ref010" ref-type="bibr">10</xref>]. Since EEG data are a type of time series, we expect that advances in general time series analysis can be applied to EEG analysis in the near future. We therefore also briefly review mainstream methods for general time series analysis. All of these methods are summarized in Table <xref rid="T4" ref-type="table">4</xref>.</p>
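        <p>As an illustration of PEFT, LoRA [<xref rid="ref101" ref-type="bibr">101</xref>] keeps a pre-trained weight matrix W frozen and learns only a low-rank update B A, scaled by alpha / r, where r is the rank. The NumPy sketch below shows the forward pass and the reduction in trainable parameters; the class name, initialization scale, and layer sizes are assumptions, and the training loop is omitted.</p>

```python
import numpy as np


class LoRALinear:
    """Linear layer with a frozen weight W and a trainable low-rank update B @ A."""

    def __init__(self, weight: np.ndarray, rank: int, alpha: float, rng: np.random.Generator):
        d_out, d_in = weight.shape
        self.weight = weight                                   # frozen pre-trained weight (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))      # trainable down-projection
        self.B = np.zeros((d_out, rank))                       # trainable up-projection, zero-initialized
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out); frozen path plus scaled low-rank path
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def num_trainable(self) -> int:
        return self.A.size + self.B.size


rng = np.random.default_rng(0)
layer = LoRALinear(rng.normal(size=(64, 64)), rank=4, alpha=8.0, rng=rng)
print(layer.num_trainable())  # 512, versus 4096 weights in the frozen matrix
```

        <p>Initializing B to zero makes the adapted layer start out identical to the pre-trained one, so fine-tuning begins from the LLM's original behavior.</p>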
        <sec id="S4.SS3.SSS1">
          <label>4.3.1</label>
          <title>Single-tower Models</title>
          <p>
            <fig id="F5">
              <label>Figure 5.</label>
              <caption>
                <p>Two types of LLMs-based methods.</p>
              </caption>
              <graphic xlink:href="fig5.png"/>
            </fig>
          </p>
          <p id="S4.SS3.SSS1.p1">These approaches use LLMs as the backbone, harnessing the models' inherent semantic understanding (as shown in Figure <xref ref-type="fig" rid="F5">5</xref>(a)). Some works adapt them for time-series forecasting. Victor et al. [<xref rid="ref096" ref-type="bibr">96</xref>] first employ the Kolmogorov-Chaitin algorithm to convert EEG data into a text-like format and then build a language-model-based machine-learning model to predict epilepsy. PromptCast [<xref rid="ref098" ref-type="bibr">98</xref>] introduces a "codeless" approach to time series forecasting, shifting the emphasis away from designing ever more complex architectures. TEMPO [<xref rid="ref135" ref-type="bibr">135</xref>] focuses exclusively on time series forecasting while integrating additional components such as time series decomposition and soft prompts. LLM4TS [<xref rid="ref103" ref-type="bibr">103</xref>] proposes a two-stage fine-tuning framework for time-series forecasting that addresses the challenges of combining LLMs with time-series data. Time-LLM [<xref rid="ref105" ref-type="bibr">105</xref>] reprograms input time series into the LLM's source (text) modality and uses natural-language prompting, unlocking the potential of LLMs as efficient time series machines. <inline-formula><mml:math alttext="S^{2}" display="inline"><mml:msup><mml:mi>S</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:math></inline-formula>IP-LLM [<xref rid="ref106" ref-type="bibr">106</xref>] aligns the semantic space of LLMs with time series embeddings, enhancing forecasting through semantic space-informed prompt learning. Most existing research in this area centers on time-series forecasting. This focus may stem from the similarity between the autoregressive generation of LLMs and time-series forecasting: both rely on historical data (context) to predict future data points (or, for LLMs, future words). Beyond forecasting, a few works adapt LLMs for time-series classification. GPT4TS [<xref rid="ref107" ref-type="bibr">107</xref>] presents a unified framework that freezes the self-attention and feedforward layers of the residual blocks in the LLM and fine-tunes the layer normalization layers. TEST [<xref rid="ref109" ref-type="bibr">109</xref>] converts time-series data into a format suitable for pre-trained LLMs using a three-level contrastive approach comprising instance-wise, feature-wise, and text-prototype-aligned contrasts. Zhang et al. [<xref rid="ref110" ref-type="bibr">110</xref>] use LLMs to generate labels that guide a new reading embedding representation for EEG, enabling word-level prediction of human reading comprehension. In summary, recent studies reflect a burgeoning interest in harnessing LLMs for time-series analysis by integrating them into architectures in ways that capitalize on their inherent strengths.</p>
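          <p>To make the idea of casting forecasting as language concrete, the sketch below serializes a numeric history into a natural-language question and parses numbers back out of a text answer. The template wording, rounding, unit string, and parsing regular expression are hypothetical choices for illustration and are not the templates used by PromptCast [<xref rid="ref098" ref-type="bibr">98</xref>]; no LLM call is made.</p>

```python
import re


def series_to_prompt(values, horizon: int = 1, decimals: int = 1, unit: str = "uV") -> str:
    """Serialize a numeric history into a natural-language forecasting question."""
    history = ", ".join(f"{v:.{decimals}f}" for v in values)
    return (
        f"The EEG amplitude over the last {len(values)} samples was {history} {unit}. "
        f"What are the next {horizon} values?"
    )


def parse_forecast(answer: str) -> list:
    """Extract the numbers from a model's text answer (sign and decimal point allowed)."""
    return [float(tok) for tok in re.findall(r"-?\d+(?:\.\d+)?", answer)]


prompt = series_to_prompt([12.0, 15.5, 9.0], horizon=2)
print(prompt)
# The EEG amplitude over the last 3 samples was 12.0, 15.5, 9.0 uV. What are the next 2 values?
print(parse_forecast("The next values are 7.5 and -2.25."))  # [7.5, -2.25]
```

          <p>With this framing, forecasting becomes a question-answering task that an off-the-shelf language model can attempt without architectural changes, at the cost of numeric precision lost in rounding and tokenization.</p>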
        </sec>
        <sec id="S4.SS3.SSS2">
          <label>4.3.2</label>
          <title>Dual-tower Models</title>
          <p id="S4.SS3.SSS2.p1">In addition to methods that focus solely on time series data, there have been significant efforts to develop multimodal applications (as shown in Figure <xref ref-type="fig" rid="F5">5</xref>(b)). EEG-To-Text [<xref rid="ref133" ref-type="bibr">133</xref>] presents a framework that uses LLMs to extend brain-to-text decoding to an open vocabulary and to achieve zero-shot sentiment classification. MTAM [<xref rid="ref112" ref-type="bibr">112</xref>] uses a multimodal transformer alignment model to investigate the correlation between EEG data and language, enabling the observation of synchronized representations across these modalities and using these aligned representations for various downstream tasks. METS [<xref rid="ref114" ref-type="bibr">114</xref>] employs a trainable ECG encoder alongside a frozen language model to separately embed paired ECG signals and automatically generated clinical reports through multimodal contrastive learning. GPT4MTS [<xref rid="ref118" ref-type="bibr">118</xref>] introduces a multimodal time series dataset for news impact forecasting and proposes a prompt-based LLM framework that leverages both numerical values and textual information. ESI [<xref rid="ref119" ref-type="bibr">119</xref>] integrates a retrieval-augmented generation (RAG) pipeline to obtain external medical knowledge, thereby enriching textual descriptions. InstructTime [<xref rid="ref122" ref-type="bibr">122</xref>] formulates time series classification as a multimodal understanding task, treating both task-specific instructions and raw time series data as multimodal inputs, with label information represented as text. CrossTimeNet [<xref rid="ref126" ref-type="bibr">126</xref>] designs a time series tokenization module that converts raw time series data into a sequence of discrete tokens through a reconstruction-based optimization process. CALF [<xref rid="ref136" ref-type="bibr">136</xref>] develops a cross-modal match module to align the input distributions of textual and temporal data, further bridging the modality distribution gap in both feature and output spaces. EEG-GPT [<xref rid="ref127" ref-type="bibr">127</xref>] provides intermediate reasoning steps and coordinates EEG tools across different scales, yielding a transparent, interpretable, step-by-step analysis that enhances trustworthiness in clinical applications. K-Link [<xref rid="ref128" ref-type="bibr">128</xref>] proposes a framework that enriches a signal-derived graph with a knowledge-link graph constructed using LLMs, through graph alignment. In summary, these efforts underscore the potential of combining time series methods with LLMs to build more robust and informative models, typically through dual-tower techniques such as cross-modal contrastive learning and knowledge distillation.</p>
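          <p>One way a dual-tower model aligned by cross-modal contrastive learning can be used downstream is zero-shot classification: each EEG embedding is assigned to the label whose text embedding is most similar in the shared space. The sketch below is a hypothetical illustration with toy vectors standing in for the outputs of an EEG tower and a text tower; it is not the method of any specific cited work.</p>

```python
import numpy as np


def zero_shot_classify(eeg_emb: np.ndarray, label_text_emb: np.ndarray):
    """Assign each EEG embedding (B, D) to the label whose text embedding (K, D) is most similar.

    Both towers are assumed to map into the same D-dimensional space after alignment.
    """
    e = eeg_emb / np.linalg.norm(eeg_emb, axis=1, keepdims=True)
    t = label_text_emb / np.linalg.norm(label_text_emb, axis=1, keepdims=True)
    similarity = e @ t.T                      # (B, K) cosine similarities
    return similarity.argmax(axis=1), similarity


# Toy embeddings standing in for the outputs of an EEG tower and a text tower
label_emb = np.array([[1.0, 0.0], [0.0, 1.0]])            # e.g. prompts for "negative", "positive"
eeg_emb = np.array([[0.9, 0.1], [0.2, 0.8], [-1.0, 0.1]])
pred, sim = zero_shot_classify(eeg_emb, label_emb)
print(pred)  # [0 1 1]
```

          <p>Because new classes only require new label prompts for the text tower, no EEG-side retraining is needed, which is what makes the zero-shot setting possible once the two towers are aligned.</p>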
        </sec>
        <sec id="S4.SS3.SSS3">
          <label>4.3.3</label>
          <title>Discussion</title>
          <p id="S4.SS3.SSS3.p1">The application of LLMs to EEG and other time series modalities offers a promising approach that bridges advances in natural language processing and time series analysis. By leveraging the strengths of LLMs in semantic understanding and sequence processing, these models may unify various EEG tasks, since many of them (such as neurological state classification and signal forecasting) can be framed similarly to NLP tasks. This positions LLMs as a competitive choice for building more generalizable models for EEG data analysis.</p>
          <p>
            <table-wrap id="T5">
              <label>Table 5</label>
              <caption>
                <p>Summary of EEG-To-Modality generation models.</p>
              </caption>
              <table>
                <tbody>
                  <tr>
                    <td style="border-top: 1px solid black;" align="center">
                      <bold>Modality</bold>
                    </td>
                    <td style="border-top: 1px solid black;" align="center">
                      <bold>Method</bold>
                    </td>
                    <td style="border-top: 1px solid black;" align="center">
                      <bold>Encoder</bold>
                    </td>
                    <td style="border-top: 1px solid black;" align="center">
                      <bold>Decoder</bold>
                    </td>
                    <td style="border-top: 1px solid black;" align="center">
                      <bold>Pretrained</bold>
                    </td>
                    <td style="border-top: 1px solid black;" align="center">
                      <bold>Dataset</bold>
                    </td>
                    <td style="border-top: 1px solid black;" align="center">
                      <bold>Eval Metric</bold>
                    </td>
                  </tr>
                  <tr>
                    <td style="border-top: 1px solid black;" rowspan="6" align="center">Image</td>
                    <td style="border-top: 1px solid black;" align="center">Brain2Image[<xref rid="ref143" ref-type="bibr">143</xref>]</td>
                    <td style="border-top: 1px solid black;" align="center">LSTM</td>
                    <td style="border-top: 1px solid black;" align="center">VAE</td>
                    <td style="border-top: 1px solid black;" align="center">Classification</td>
                    <td style="border-top: 1px solid black;" align="center">Spampinato[<xref rid="ref144" ref-type="bibr">144</xref>]</td>
                    <td style="border-top: 1px solid black;" align="center">IS</td>
                  </tr>
                  <tr>
                    <td align="center">ThoughtViz[<xref rid="ref145" ref-type="bibr">145</xref>]</td>
                    <td align="center">CNN</td>
                    <td align="center">GAN</td>
                    <td align="center">Classification</td>
                    <td align="center">Kumar[<xref rid="ref146" ref-type="bibr">146</xref>]</td>
                    <td align="center">IS &amp; Accuracy</td>
                  </tr>
                  <tr>
                    <td align="center">EEG2Image[<xref rid="ref147" ref-type="bibr">147</xref>]</td>
                    <td align="center">LSTM</td>
                    <td align="center">DCGAN</td>
                    <td align="center">Contrastive learning</td>
                    <td align="center">Kumar[<xref rid="ref146" ref-type="bibr">146</xref>]</td>
                    <td align="center">IS</td>
                  </tr>
                  <tr>
                    <td align="center">EEGStyleGAN-ADA[<xref rid="ref148" ref-type="bibr">148</xref>]</td>
                    <td align="center">LSTM</td>
                    <td align="center">StyleGAN-ADA</td>
                    <td align="center">Contrastive learning</td>
                    <td align="center">Spampinato[<xref rid="ref144" ref-type="bibr">144</xref>] Kumar[<xref rid="ref146" ref-type="bibr">146</xref>] Kaneshiro[<xref rid="ref149" ref-type="bibr">149</xref>]</td>
                    <td align="center">IS &amp; FID &amp; KID</td>
                  </tr>
                  <tr>
                    <td align="center">DreamDiffusion[<xref rid="ref150" ref-type="bibr">150</xref>]</td>
                    <td align="center">VQ</td>
                    <td align="center">LDM</td>
                    <td align="center">MAE</td>
                    <td align="center">Spampinato[<xref rid="ref144" ref-type="bibr">144</xref>]</td>
                    <td align="center">Accuracy</td>
                  </tr>
                  <tr>
                    <td align="center">NeuroImagen[<xref rid="ref151" ref-type="bibr">151</xref>]</td>
                    <td align="center">Saliency Map, BLIP</td>
                    <td align="center">LDM</td>
                    <td align="center">Map</td>
                    <td align="center">Spampinato[<xref rid="ref144" ref-type="bibr">144</xref>]</td>
                    <td align="center">IS &amp; Accuracy &amp; SSIM</td>
                  </tr>
                  <tr>
                    <td style="border-top: 1px solid black;" rowspan="4" align="center">Text</td>
                    <td style="border-top: 1px solid black;" align="center">EEG-To-Text[<xref rid="ref133" ref-type="bibr">133</xref>]</td>
                    <td style="border-top: 1px solid black;" align="center">Transformer</td>
                    <td style="border-top: 1px solid black;" rowspan="4" align="center">BART[<xref rid="ref134" ref-type="bibr">134</xref>]</td>
                    <td style="border-top: 1px solid black;" align="center">Map</td>
                    <td style="border-top: 1px solid black;" align="center">ZuCo[<xref rid="ref111" ref-type="bibr">111</xref>]</td>
                    <td style="border-top: 1px solid black;" rowspan="4" align="center">BLEU-N &amp; ROUGE-1</td>
                  </tr>
                  <tr>
                    <td align="center">EEG2Text[<xref rid="ref152" ref-type="bibr">152</xref>]</td>
                    <td align="center">Convolutional Transformer</td>
                    <td align="center">MAE</td>
                    <td align="center">ZuCo[<xref rid="ref111" ref-type="bibr">111</xref>] Image-EEG[<xref rid="ref153" ref-type="bibr">153</xref>]</td>
                  </tr>
                  <tr>
                    <td align="center">E2T-PTR[<xref rid="ref154" ref-type="bibr">154</xref>]</td>
                    <td align="center">Multi-stream Transformer</td>
                    <td align="center">MAE</td>
                    <td align="center">ZuCo[<xref rid="ref111" ref-type="bibr">111</xref>]</td>
                  </tr>
                  <tr>
                    <td align="center">DeWave[<xref rid="ref155" ref-type="bibr">155</xref>]</td>
                    <td align="center">VQ-VAE</td>
                    <td align="center">-</td>
                    <td align="center">ZuCo[<xref rid="ref111" ref-type="bibr">111</xref>]</td>
                  </tr>
                  <tr>
                    <td style="border-top: 1px solid black;border-bottom: 1px solid black;" rowspan="2" align="center">Others</td>
                    <td style="border-top: 1px solid black;" align="center">ETCAS[<xref rid="ref156" ref-type="bibr">156</xref>]</td>
                    <td style="border-top: 1px solid black;" align="center">-</td>
                    <td style="border-top: 1px solid black;" align="center">Dual-DualGAN</td>
                    <td style="border-top: 1px solid black;" align="center">-</td>
                    <td style="border-top: 1px solid black;" align="center">Private data</td>
                    <td style="border-top: 1px solid black;" align="center">Accuracy &amp; PCC &amp; MCD</td>
                  </tr>
                  <tr>
                    <td style="border-bottom: 1px solid black;" align="center">NDMusic[<xref rid="ref157" ref-type="bibr">157</xref>]</td>
                    <td style="border-bottom: 1px solid black;" align="center">-</td>
                    <td style="border-bottom: 1px solid black;" align="center">BiLSTM</td>
                    <td style="border-bottom: 1px solid black;" align="center">-</td>
                    <td style="border-bottom: 1px solid black;" align="center">MusicAffect</td>
                    <td style="border-bottom: 1px solid black;" align="center">Rank accuracy</td>
                  </tr>
                </tbody>
              </table>
            </table-wrap>
          </p>
          <p id="S4.SS3.SSS3.p2">However, it is important to acknowledge the inherent limitations of current approaches that leverage LLMs as backbones for EEG analysis. LLMs may encounter difficulties with the unique characteristics of EEG data, including the requirement to capture fine-grained temporal patterns and the dynamic nature of evolving brain signals. Additional challenges arise as LLMs attempt to model the intricate dependencies within EEG data or to fully account for the broader topological relationships that may become relevant when integrating EEG with other data sources, such as clinical notes or external physiological signals. These limitations underscore the need for continued research and innovation in the application of LLM-based models to EEG data.</p>
          <p id="S4.SS3.SSS3.p3">There remains active debate over whether LLMs are truly effective for time series analysis [<xref rid="ref137" ref-type="bibr">137</xref>, <xref rid="ref138" ref-type="bibr">138</xref>]. Stronger theoretical support is needed. For single-tower structures, it has been suggested that the self-attention module functions analogously to principal component analysis (PCA) [<xref rid="ref139" ref-type="bibr">139</xref>]. For dual-tower structures, a probabilistic perspective has been proposed that supports cross-modal fine-tuning techniques [<xref rid="ref136" ref-type="bibr">136</xref>].</p>
          <p id="S4.SS3.SSS3.p4">Future research directions should focus on enhancing the ability of LLMs to more efficiently process and understand the temporal and structural information embedded in EEG data. Given that LLMs were not originally designed to directly handle time series, new modeling techniques—such as structured embeddings and task-specific adaptations—are required to bridge the gap between natural language prompts and the detailed temporal patterns present in EEG. Insights from a recent survey [<xref rid="ref140" ref-type="bibr">140</xref>] may provide valuable guidance. These developments will likely help unlock the full potential of LLMs in this domain and drive further advancements in EEG analysis through a language model-based approach.</p>
        </sec>
      </sec>
    </sec>
    <sec id="S5">
      <label>5.</label>
      <title>Generative-based EEG Analysis</title>
      <p id="S5.p1">In this section, we explore generative applications that use EEG data to produce images, text, and other modalities, offering new ways to visualize and understand brain activity. Previous work has shown that EEG signals carry rich semantic information. It is therefore natural to go beyond learning representations from raw EEG data and instead reconstruct the underlying semantics with the help of generative models, such as GANs [<xref rid="ref141" ref-type="bibr">141</xref>], diffusion models [<xref rid="ref142" ref-type="bibr">142</xref>], and Transformer-based models. All of the methods are summarized in Table <xref rid="T5" ref-type="table">5</xref>.</p>
      <sec id="S5.SS1">
        <label>5.1</label>
        <title>Image Generation</title>
        <p id="S5.SS1.p1">EEG-to-image generation typically follows a Map-Train-Finetune paradigm, which aims for high semantic fidelity but makes training and fine-tuning challenging. As shown in Figure <xref ref-type="fig" rid="F6">6</xref>, the task involves three phases: data collection, model training, and testing. During data collection, EEG signals are recorded while the subject views each image, which yields paired EEG-image samples. These pairs are then used to jointly train the EEG encoder and the image generator. In the testing phase, the trained model generates images directly from EEG signals. Brain2Image [<xref rid="ref143" ref-type="bibr">143</xref>] addresses these training challenges by splitting the task into two distinct phases. In the first phase, Brain2Image encodes EEG signals into a lower-dimensional feature vector that conditions image generation. Specifically, a standard LSTM layer followed by a nonlinear layer is trained to classify the EEG signals and then serves as the encoder. An additional fully connected layer is added so that the learned EEG feature vector follows a Gaussian distribution, as required by variational autoencoders (VAEs). In the second phase, the encoder output for each EEG sequence is used to train the VAE decoder to generate an image of what the subject was observing at that moment. In contrast to Brain2Image, ThoughtViz [<xref rid="ref145" ref-type="bibr">145</xref>] uses an EEG classifier built from a 1D-CNN followed by a 2D-CNN as its encoder. Building on the standard GAN architecture, ThoughtViz adds a pre-trained classifier that classifies the samples produced by the generator. The generator loss therefore combines the adversarial loss from the discriminator with the classification loss from this classifier.</p>
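        <p>The ThoughtViz generator objective described above, an adversarial term plus a classification term, can be illustrated with the following minimal plain-Python sketch. It makes several hypothetical assumptions. The discriminator is taken to output the probability that a sample is real. The pre-trained classifier is taken to output a probability vector over classes. The two terms are combined with a hypothetical weight <monospace>cls_weight</monospace>. The actual ThoughtViz loss formulation and weighting may differ.</p>

```python
import math
from typing import Sequence

EPS = 1e-12  # numerical floor to avoid log(0)


def binary_cross_entropy(prob: float, target: float) -> float:
    """BCE for a single probability and a 0/1 target."""
    p = min(max(prob, EPS), 1.0 - EPS)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def generator_loss(
    disc_real_probs: Sequence[float],
    class_probs: Sequence[Sequence[float]],
    target_classes: Sequence[int],
    cls_weight: float = 1.0,
) -> float:
    """Hypothetical ThoughtViz-style generator loss, averaged over a batch.

    disc_real_probs[i]: discriminator's probability that generated sample i is real.
    class_probs[i]: pre-trained classifier's class probabilities for generated sample i.
    target_classes[i]: class the generator was conditioned on (decoded from EEG).
    """
    n = len(disc_real_probs)
    # Adversarial term: the generator wants its samples labeled as real (target 1).
    adv = sum(binary_cross_entropy(p, 1.0) for p in disc_real_probs) / n
    # Classification term: the classifier should recognize the intended class.
    cls = sum(-math.log(max(probs[c], EPS)) for probs, c in zip(class_probs, target_classes)) / n
    return adv + cls_weight * cls


loss = generator_loss([0.5], [[0.25, 0.5, 0.25]], [1])
print(round(loss, 4))  # 1.3863, i.e. 2 * ln 2
```

Keeping the two terms separate makes it easy to see how the classifier term pushes generated images toward the EEG-decoded class, independently of how realistic the discriminator judges them to be.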
        <p>
          <fig id="F6">
            <label>Figure 6.</label>
            <caption>
              <p>Pipeline of the EEG-based image generation task.</p>
            </caption>
            <graphic xlink:href="images/EEG2Image.pdf"/>
          </fig>
        </p>
        <p id="S5.SS1.p2">Rather than training the EEG encoder on a supervised classification task, EEG2Image [<xref rid="ref147" ref-type="bibr">147</xref>] and EEGStyleGAN-ADA [<xref rid="ref148" ref-type="bibr">148</xref>] learn EEG features with triplet-loss-based contrastive learning. The triplet loss pulls an anchor embedding toward a positive sample with the same label and pushes it away from a negative sample with a different label by at least a margin. This prevents the EEG encoder from compressing the representations into small, indistinct clusters. EEG2Image uses a conditional DCGAN [<xref rid="ref158" ref-type="bibr">158</xref>] architecture trained with a hinge loss for stability, whereas EEGStyleGAN-ADA uses StyleGAN-ADA [<xref rid="ref159" ref-type="bibr">159</xref>] with adaptive discriminator augmentation. This augmentation helps the discriminator learn effectively from limited data by augmenting the images it sees during training, with an augmentation strength that is adjusted adaptively.</p>
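        <p>The two losses mentioned above can be sketched in plain Python. This is a minimal illustration, not the EEG2Image or EEGStyleGAN-ADA implementation. It assumes Euclidean distance between embeddings and a hypothetical margin of 1.0. It also uses the common hinge formulation of the GAN loss, in which the discriminator outputs unbounded real-valued scores.</p>

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector, margin: float = 1.0) -> float:
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    d_pos = euclidean(anchor, positive)  # same-label pair: pulled together
    d_neg = euclidean(anchor, negative)  # different-label pair: pushed apart
    return max(0.0, d_pos - d_neg + margin)


def hinge_d_loss(real_scores: Sequence[float], fake_scores: Sequence[float]) -> float:
    """Discriminator hinge loss: push real scores above +1 and fake scores below -1."""
    real_term = sum(max(0.0, 1.0 - s) for s in real_scores) / len(real_scores)
    fake_term = sum(max(0.0, 1.0 + s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


def hinge_g_loss(fake_scores: Sequence[float]) -> float:
    """Generator hinge loss: raise the discriminator's scores on generated images."""
    return -sum(fake_scores) / len(fake_scores)


print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))  # 0.0 (negative already far enough)
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 1.5]))  # 0.5
print(hinge_d_loss([2.0, 0.5], [-2.0, 0.0]), hinge_g_loss([-2.0, 0.0]))  # 0.75 1.0
```

The margin is what stops the encoder from trivially collapsing all embeddings to one point. Once every negative is farther away than its positive by the margin, the triplet contributes no gradient.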
        <p id="S5.SS1.p3">Given the strong generative capabilities of diffusion models, a growing number of researchers are applying them to EEG-to-image generation. DreamDiffusion [<xref rid="ref150" ref-type="bibr">150</xref>], for instance, collects a large-scale unlabeled EEG dataset from the MOABB [<xref rid="ref160" ref-type="bibr">160</xref>] platform and pretrains its EEG encoder with masked autoencoding (MAE). During fine-tuning, DreamDiffusion uses a projection layer to align the EEG latent representations with CLIP image embeddings. NeuroImagen [<xref rid="ref151" ref-type="bibr">151</xref>], in contrast, uses detail and semantic extractors to map EEG signals to pixel-level and CLIP-text priors, which are then decoded by a pretrained Stable Diffusion model following the image-to-image pipeline.</p>
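        <p>The following plain-Python sketch makes the masked-pretraining idea concrete. It splits a multichannel EEG segment into non-overlapping temporal patches and randomly hides a fraction of them. It then computes the reconstruction error only on the hidden patches, which is the core of MAE-style masked signal modeling. The patch length, the mask ratio of 0.75, and all function names are hypothetical choices for illustration. DreamDiffusion's actual tokenization, encoder, and masking settings may differ.</p>

```python
import random
from typing import List, Optional, Tuple

Signal = List[List[float]]  # channels x samples
Patch = List[List[float]]   # channels x patch_len


def patchify(signal: Signal, patch_len: int) -> List[Patch]:
    """Split the time axis into non-overlapping patches; trailing samples are dropped."""
    n_patches = len(signal[0]) // patch_len
    return [[ch[i * patch_len:(i + 1) * patch_len] for ch in signal] for i in range(n_patches)]


def random_mask(n_patches: int, mask_ratio: float, seed: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Return (visible, masked) patch indices; the encoder would only see the visible ones."""
    rng = random.Random(seed)
    order = list(range(n_patches))
    rng.shuffle(order)
    n_masked = int(round(n_patches * mask_ratio))
    return sorted(order[n_masked:]), sorted(order[:n_masked])


def masked_mse(pred: List[Patch], target: List[Patch], masked: List[int]) -> float:
    """Mean squared error computed over the masked patches only."""
    total, count = 0.0, 0
    for i in masked:
        for pred_ch, true_ch in zip(pred[i], target[i]):
            for p, t in zip(pred_ch, true_ch):
                total += (p - t) ** 2
                count += 1
    return total / count if count else 0.0


eeg = [[float(t) for t in range(8)], [float(-t) for t in range(8)]]  # 2 channels, 8 samples
patches = patchify(eeg, patch_len=2)  # 4 patches, each 2 channels x 2 samples
visible, masked = random_mask(len(patches), mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # 1 3
```

The loss is restricted to the masked patches. The model therefore has to infer hidden portions of the signal from the visible context rather than copy its input.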
      </sec>
      <sec id="S5.SS2">
        <label>5.2</label>
        <title>Text Generation</title>
        <p id="S5.SS2.p1">Unlike EEG-to-image generation, EEG-to-text generation is a sequence-to-sequence process. As shown in Figure <xref ref-type="fig" rid="F7">7</xref>, the task involves collecting word-level EEG signals while the subject reads text (e.g., "He likes apple"). Eye-tracking may also be used to align EEG signals with specific words. These word-level EEG signals are then processed by an EEG-to-text model, which decodes them into the corresponding text. Inspired by machine translation with pretrained BART [<xref rid="ref134" ref-type="bibr">134</xref>], Wang et al. [<xref rid="ref133" ref-type="bibr">133</xref>] treat the human brain as a special kind of encoder. Each EEG feature sequence is regarded as a sentence encoded by the brain, and an additional encoder is trained to map these brain embeddings to the embeddings of the pretrained BART model. Instead of relying on word-level EEG features derived from eye-tracking data, as in [<xref rid="ref133" ref-type="bibr">133</xref>], EEG2Text [<xref rid="ref152" ref-type="bibr">152</xref>] takes sentence-level EEG signals directly as input. Specifically, EEG2Text uses EEG pretraining to improve the learning of semantics from EEG signals. It also proposes a multi-view transformer to model the EEG signals processed by different spatial regions of the brain. Wang et al. [<xref rid="ref154" ref-type="bibr">154</xref>] introduced CET-MAE, a model that combines contrastive learning and masked signal modeling through a multi-stream encoder. It learns EEG and text representations by balancing the self-reconstruction of latent embeddings against the alignment of text and EEG features. The authors also propose E2T-PTR, an EEG-to-text decoding framework based on Pretrained Transferable Representations. E2T-PTR leverages LLMs for language understanding and generation and makes full use of the pretrained representations from CET-MAE.
To address the large distribution differences in EEG waves across individuals and to correct order mismatches between raw wave sequences and text, DeWave [<xref rid="ref155" ref-type="bibr">155</xref>] uses a vector-quantized variational encoder. This encoder converts EEG waves into a discrete codex by linking each encoded segment to the token of its nearest codebook entry. DeWave is the first to introduce discrete encoding into EEG signal representation, which benefits the translation of both word-level EEG features and raw EEG waves.</p>
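        <p>The discrete-codex step attributed to DeWave amounts to a nearest-neighbor lookup in a codebook, as in VQ-VAE. The sketch below shows only that lookup and rests on stated assumptions. The continuous EEG embeddings and the codebook are given as plain Python lists, distance is squared Euclidean, and the returned indices play the role of discrete tokens. The codebook values are made up for illustration. DeWave's encoder, codebook size, and training losses, including how the codebook itself is learned, are not reproduced here.</p>

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vector_quantize(embeddings: Sequence[Vector], codebook: Sequence[Vector]) -> Tuple[List[int], List[Vector]]:
    """Map each continuous embedding to the index of its nearest codebook entry.

    Returns the discrete token indices and the corresponding quantized vectors.
    """
    indices = [
        min(range(len(codebook)), key=lambda k: squared_distance(e, codebook[k]))
        for e in embeddings
    ]
    return indices, [codebook[k] for k in indices]


codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]        # 3 hypothetical code vectors
eeg_embeddings = [[0.1, -0.2], [4.0, 4.5], [0.9, 1.2]]  # e.g. three encoded EEG segments
tokens, quantized = vector_quantize(eeg_embeddings, codebook)
print(tokens)  # [0, 2, 1]
```

Because nearby embeddings snap to the same code, small subject-specific variations in the continuous representation can map to the same discrete token. This matches the motivation given above for handling inter-individual variance.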
        <p>
          <fig id="F7">
            <label>Figure 7.</label>
            <caption>
              <p>Pipeline of the EEG-based text generation task.</p>
            </caption>
            <graphic xlink:href="images/EEG2Text.pdf"/>
          </fig>
        </p>
      </sec>
      <sec id="S5.SS3">
        <label>5.3</label>
        <title>Others</title>
        <p id="S5.SS3.p1">In addition to image and text generation, many other EEG-to-modality generation tasks deserve attention. ETCAS [<xref rid="ref156" ref-type="bibr">156</xref>] is an end-to-end GAN model tailored for EEG-based sound generation. It introduces a Dual-DualGAN that maps EEG signals directly to speech signals. NDMusic [<xref rid="ref157" ref-type="bibr">157</xref>] adopts an end-to-end bidirectional LSTM (BiLSTM) architecture to establish a direct mapping from fMRI-informed EEG signals to music signals.</p>
      </sec>
      <sec id="S5.SS4">
        <label>5.4</label>
        <title>Discussion</title>
        <p id="S5.SS4.p1">The advancements in EEG-based generation tasks, spanning image, text, and even audio outputs, highlight the growing potential of generative models in decoding brain activity into multi-modal representations. The evolution of methods from simpler Map-Train-Finetune paradigms to more advanced approaches like contrastive learning and transformer-based architectures illustrates a robust progression toward higher semantic fidelity and model adaptability. Works such as Brain2Image [<xref rid="ref143" ref-type="bibr">143</xref>], ThoughtViz [<xref rid="ref145" ref-type="bibr">145</xref>], EEG2Image [<xref rid="ref147" ref-type="bibr">147</xref>], and EEGStyleGAN-ADA [<xref rid="ref148" ref-type="bibr">148</xref>] demonstrate the success of diverse model architectures—including GANs, StyleGANs, and VAEs—particularly in leveraging novel feature extraction techniques that preserve the temporal and semantic richness of EEG data. Likewise, diffusion models, such as DreamDiffusion [<xref rid="ref150" ref-type="bibr">150</xref>] and NeuroImagen [<xref rid="ref151" ref-type="bibr">151</xref>], signify a leap in the generative quality and capacity to incorporate external semantic information, revealing promising directions for highly detailed image reconstruction.</p>
        <p id="S5.SS4.p2">In the realm of EEG-text generation, models inspired by sequence-to-sequence frameworks in NLP, like BART [<xref rid="ref134" ref-type="bibr">134</xref>] and multi-view transformers, enable more coherent mapping from EEG signals to language representations. Innovations such as CET-MAE [<xref rid="ref154" ref-type="bibr">154</xref>] and DeWave [<xref rid="ref155" ref-type="bibr">155</xref>] further address challenges related to individual variability and sequence alignment, showcasing effective strategies to bridge the distinct characteristics of EEG signals and natural language representations. These frameworks mark significant progress toward the seamless integration of pre-trained language models and EEG features, opening new avenues for interpretable and accurate text generation.</p>
        <p id="S5.SS4.p3">Future research should aim to address several critical challenges: (1) improving the robustness of EEG-based generation models across diverse data sources and populations; (2) enhancing data efficiency through unsupervised or few-shot learning approaches to mitigate the need for large labeled datasets; and (3) refining alignment techniques for cross-modal integration with clinical and physiological data. Additionally, continued exploration of discrete and structured representations, as in DeWave, could prove transformative for other EEG-based tasks by establishing a consistent framework for handling the complexity of EEG signals. These efforts will be vital in pushing the boundaries of brain decoding technologies and in developing universally applicable, robust EEG-based generative models for various modalities.</p>
      </sec>
    </sec>
    <sec id="S6">
      <label>6.</label>
      <title>Datasets and Metrics</title>
      <p id="S6.p1">The analysis of spatio-temporal EEG data relies heavily on the availability of high-quality datasets and robust evaluation metrics. This section provides an overview of the most widely used datasets and the key metrics employed to assess the performance of various EEG analysis models.</p>
      <sec id="S6.SS1">
        <label>6.1</label>
        <title>Datasets</title>
        <sec id="S6.SS1.SSS1">
          <label>6.1.1</label>
          <title>
            <bold>Publicly Available EEG Datasets</bold>
          </title>
          <p id="S6.SS1.SSS1.p1">Several publicly available EEG datasets have been instrumental in advancing the field. These datasets vary in their focus, including different cognitive tasks, subject demographics, and recording conditions. <bold>Discriminative EEG Task Dataset</bold>: These datasets are typically employed for tasks that involve distinguishing between different cognitive states or mental activities, such as classifying brain signals associated with motor imagery, attention, or emotional responses. Some of the most notable datasets include:</p>
          <p>
            <list list-type="bullet" id="S6.I1">
              <list-item id="S6.I1.i1">
                <p id="S6.I1.i1.p1"><bold>BCI Competition IV [<xref rid="ref161" ref-type="bibr">161</xref>]</bold>: This dataset comprises multiple sub-datasets, each designed for a specific brain-computer interface (BCI) challenge. Most of the sub-datasets contain motor imagery EEG recordings. The remaining sub-datasets provide MEG and ECoG recordings for movement decoding.</p>
              </list-item>
              <list-item id="S6.I1.i2">
                <p id="S6.I1.i2.p1"><bold>TUH EEG Corpus [<xref rid="ref037" ref-type="bibr">37</xref>]</bold>: The Temple University Hospital EEG Corpus is one of the largest publicly available EEG datasets. It contains EEG signals collected from 14,987 subjects, with more than 40 different channel configurations and varying recording durations. The corpus includes both normal and abnormal samples, which makes it suitable for both research and clinical applications.</p>
              </list-item>
              <list-item id="S6.I1.i3">
                <p id="S6.I1.i3.p1"><bold>DEAP (Database for Emotion Analysis using Physiological Signals) [<xref rid="ref074" ref-type="bibr">74</xref>]</bold>: This dataset includes EEG and other physiological signals recorded while subjects watched music videos. It is widely used for emotion recognition and affective computing studies.</p>
              </list-item>
              <list-item id="S6.I1.i4">
                <p id="S6.I1.i4.p1"><bold>CHB-MIT Scalp EEG Database[<xref rid="ref052" ref-type="bibr">52</xref>]</bold>: This dataset contains EEG recordings from pediatric subjects with intractable seizures. It is commonly used for seizure detection and prediction research.</p>
              </list-item>
              <list-item id="S6.I1.i5">
                <p id="S6.I1.i5.p1"><bold>SEED (SJTU Emotion EEG Dataset) [<xref rid="ref038" ref-type="bibr">38</xref>]</bold>: The SEED dataset includes EEG recordings from subjects experiencing emotional stimuli, such as movie clips. It is used to study emotion recognition and related applications.</p>
              </list-item>
              <list-item id="S6.I1.i6">
                <p id="S6.I1.i6.p1"><bold>ISRUC-S3 dataset [<xref rid="ref040" ref-type="bibr">40</xref>]</bold>: This dataset contains recordings from 10 healthy subjects. Each recording includes 6 EEG channels, 2 EOG channels, 3 EMG channels, and 1 ECG channel. It is widely used for sleep stage classification studies.</p>
              </list-item>
              <list-item id="S6.I1.i7">
                <p id="S6.I1.i7.p1"><bold>MASS-SS3 dataset [<xref rid="ref050" ref-type="bibr">50</xref>]</bold>: This dataset contains recordings from 62 healthy subjects. Each recording includes 20 EEG channels, 2 EOG channels, 3 EMG channels, and 1 ECG channel. It is widely used for sleep stage classification studies.</p>
              </list-item>
            </list>
          </p>
          <p id="S6.SS1.SSS1.p2"><bold>Generative EEG Task Dataset</bold>: These datasets are typically used for tasks that involve generating images, sentences, and other signals. For image generation, Spampinato et al. [<xref rid="ref144" ref-type="bibr">144</xref>], Kumar et al. [<xref rid="ref146" ref-type="bibr">146</xref>], and Kaneshiro et al. [<xref rid="ref149" ref-type="bibr">149</xref>] capture image semantics using EEG data recorded while subjects viewed images on a screen. The classical datasets constructed for this generative task are summarized in Table <xref rid="T6" ref-type="table">6</xref>.</p>
          <p>
            <table-wrap id="T6">
              <label>Table 6</label>
              <caption>
                <p>EEG-image datasets for image generation.</p>
              </caption>
              <table>
                <tbody>
                  <tr>
                    <th style="border-top: 1px solid black;" rowspan="2" align="center">
                      Item / Dataset
                    </th>
                    <td style="border-top: 1px solid black;" align="center">
                      <bold>Spampinato</bold>
                    </td>
                    <td style="border-top: 1px solid black;" align="center">
                      <bold>Kumar</bold>
                    </td>
                    <td style="border-top: 1px solid black;" align="center">
                      <bold>Kaneshiro</bold>
                    </td>
                  </tr>
                  <tr>
                    <td align="center">[<xref rid="ref144" ref-type="bibr">144</xref>]</td>
                    <td align="center">[<xref rid="ref146" ref-type="bibr">146</xref>]</td>
                    <td align="center">[<xref rid="ref149" ref-type="bibr">149</xref>]</td>
                  </tr>
                  <tr>
                    <th style="border-top: 1px solid black;" align="center">Classes</th>
                    <td style="border-top: 1px solid black;" align="center">40</td>
                    <td style="border-top: 1px solid black;" align="center">30</td>
                    <td style="border-top: 1px solid black;" align="center">6</td>
                  </tr>
                  <tr>
                    <th align="center">Subjects</th>
                    <td align="center">6</td>
                    <td align="center">23</td>
                    <td align="center">10</td>
                  </tr>
                  <tr>
                    <th align="center">Channels</th>
                    <td align="center">128</td>
                    <td align="center">14</td>
                    <td align="center">128</td>
                  </tr>
                  <tr>
                    <th align="center">No. of stimuli</th>
                    <td align="center">2000</td>
                    <td align="center">30</td>
                    <td align="center">72</td>
                  </tr>
                  <tr>
                    <th align="center">Sampling rate (Hz)</th>
                    <td align="center">1000</td>
                    <td align="center">2048</td>
                    <td align="center">1000</td>
                  </tr>
                  <tr>
                    <th align="center">Stimulus duration (s)</th>
                    <td align="center">0.5</td>
                    <td align="center">10</td>
                    <td align="center">0.5</td>
                  </tr>
                  <tr>
                    <th style="border-bottom: 1px solid black;" align="center">Pause (s)</th>
                    <td style="border-bottom: 1px solid black;" align="center">10</td>
                    <td style="border-bottom: 1px solid black;" align="center">20</td>
                    <td style="border-bottom: 1px solid black;" align="center">0.75</td>
                  </tr>
                </tbody>
              </table>
            </table-wrap>
          </p>
          <p>
            <list list-type="bullet" id="S6.I2">
              <list-item id="S6.I2.i1">
                <p id="S6.I2.i1.p1">Spampinato et al. [<xref rid="ref144" ref-type="bibr">144</xref>] used a subset of ImageNet containing 40 classes of easily recognizable objects as visual stimuli. EEG data were acquired with a 128-channel cap (actiCAP 128Ch) and Brainvision DAQs and amplifiers. The sampling frequency and data resolution were set to 1000 Hz and 16 bits, respectively. During recording, 2,000 images (50 per class) were shown in bursts, with each image displayed for 0.5 seconds. Each burst lasted 25 seconds and was followed by a 10-second pause during which a black image was shown, for a total running time of 1,400 seconds (23 minutes and 20 seconds).</p>
              </list-item>
              <list-item id="S6.I2.i2">
                <p id="S6.I2.i2.p1">Kumar et al. [<xref rid="ref146" ref-type="bibr">146</xref>] presented subjects with a slide presentation of 20 text items and 10 non-text items drawn from three categories: digits, characters, and object images. Each slide was shown for 10 seconds, with a 20-second gap between consecutive recordings. EEG data were recorded with the wireless Emotiv EPOC+ neuro-headset at 2048 Hz.</p>
              </list-item>
              <list-item id="S6.I2.i3">
                <p id="S6.I2.i3.p1">Kaneshiro et al. [<xref rid="ref149" ref-type="bibr">149</xref>] used 72 images from 6 categories of real objects as visual stimuli. EEG data were acquired with 128-channel EGI HCGSN 110 nets at a sampling rate of 1000 Hz. Each image was displayed for 0.5 seconds, followed by a 0.75-second interval before the next image.</p>
              </list-item>
              <list-item id="S6.I2.i4">
                <p id="S6.I2.i4.p1"><bold>ZuCo [<xref rid="ref111" ref-type="bibr">111</xref>]</bold> contains EEG and eye-tracking data from 12 healthy adult native English speakers who read natural English text for 4 to 6 hours. The dataset covers two standard reading tasks and one task-specific reading task. It provides EEG and eye-tracking data for 21,629 words across 1,107 sentences, comprising 154,173 fixations.</p>
              </list-item>
            </list>
          </p>
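          <p>The timing figures reported for the Spampinato et al. protocol can be checked with a few lines of arithmetic. The sketch below assumes, as the numbers suggest, that each 25-second burst consists of 50 consecutive images shown for 0.5 seconds each. It also assumes that every burst, including the last one, is followed by a 10-second pause. Under these assumptions, 40 bursts of 35 seconds each reproduce the stated total of 1,400 seconds. The function name is hypothetical.</p>

```python
from typing import Tuple


def protocol_duration(n_images: int, images_per_burst: int, image_s: float, pause_s: float) -> Tuple[int, float, float]:
    """Return (number of bursts, burst length in s, total running time in s).

    Assumes every burst, including the last, is followed by one pause.
    """
    n_bursts = n_images // images_per_burst
    burst_s = images_per_burst * image_s
    total_s = n_bursts * (burst_s + pause_s)
    return n_bursts, burst_s, total_s


bursts, burst_s, total_s = protocol_duration(2000, 50, 0.5, 10.0)
minutes, seconds = divmod(int(total_s), 60)
print(bursts, burst_s, total_s)      # 40 25.0 1400.0
print(f"{minutes} min {seconds} s")  # 23 min 20 s
```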
        </sec>
        <sec id="S6.SS1.SSS2">
          <label>6.1.2</label>
          <title>
            <bold>Private EEG Datasets</bold>
          </title>
          <p id="S6.SS1.SSS2.p1">In addition to publicly available datasets, researchers often collect private EEG datasets tailored to specific research questions or applications. These datasets may focus on particular cognitive tasks, clinical conditions, or subject populations. In particular, private data also form the basis of foundation models, as highlighted in Section <xref rid="S4.SS2">4.2</xref>. Collecting custom datasets allows greater control over experimental conditions and data quality, but it also requires significant resources and expertise.</p>
          <p>
            <list list-type="bullet" id="S6.I3">
              <list-item id="S6.I3.i1">
                <p id="S6.I3.i1.p1">BrainBERT [<xref rid="ref087" ref-type="bibr">87</xref>] collected stereo-electroencephalography (SEEG) data over 26 sessions from 10 patients with pharmacologically intractable epilepsy (5 male and 5 female; aged 4 to 19 years, with a mean age of 11.9 and a standard deviation of 4.6).</p>
              </list-item>
              <list-item id="S6.I3.i2">
                <p id="S6.I3.i2.p1">BrainNet [<xref rid="ref073" ref-type="bibr">73</xref>] collected 796 GB of SEEG data from a first-class hospital. The subjects, who suffer from epilepsy, underwent a surgical procedure to implant 4 to 10 invasive electrodes, with 52 to 126 channels, in their brains. In total, the dataset contains 526 hours of recordings sampled at 256 Hz to 1024 Hz.</p>
              </list-item>
              <list-item id="S6.I3.i3">
                <p id="S6.I3.i3.p1">MBrain [<xref rid="ref002" ref-type="bibr">2</xref>] collected 550 GB of SEEG data from a first-class hospital. The subjects, who suffer from epilepsy, underwent a surgical procedure to implant 4 to 10 invasive electrodes, with 52 to 124 channels, in their brains. In total, the dataset contains 470 hours of recordings sampled at 1000 Hz to 2000 Hz.</p>
              </list-item>
              <list-item id="S6.I3.i4">
                <p id="S6.I3.i4.p1">Brant [<xref rid="ref081" ref-type="bibr">81</xref>] collected 1.01 TB of SEEG data from a first-class hospital. The subjects underwent a surgical procedure to implant 4 to 11 invasive electrodes, with 52 to 153 channels, in their brains. The dataset contains 2,528 hours of 1000 Hz recordings with more than 1 trillion timestamps. In addition, the authors collected 43 hours (29.39 GB) of epilepsy-labeled data for fine-tuning on specific downstream tasks.</p>
              </list-item>
              <list-item id="S6.I3.i5">
                <p id="S6.I3.i5.p1">LaBraM [<xref rid="ref011" ref-type="bibr">11</xref>] further collected 342.23 hours of data from more than 140 subjects through the ESI neural scanning system.</p>
              </list-item>
            </list>
          </p>
        </sec>
      </sec>
      <sec id="S6.SS2">
        <label>6.2</label>
        <title>Metrics</title>
        <p id="S6.SS2.p1">Evaluating the performance of EEG analysis models involves several key metrics, which are crucial for comparing different approaches and understanding their effectiveness. The most commonly used metrics include:</p>
        <p>
          <list list-type="bullet" id="S6.I4">
            <list-item id="S6.I4.i1">
              <p id="S6.I4.i1.p1"><bold>Accuracy<xref ref-type="fn" rid="fn1">1</xref><fn id="fn1"><label><sup>1</sup></label><p id="footnote1"><ext-link xlink:href="https://en.wikipedia.org/wiki/Accuracy_and_precision">https://en.wikipedia.org/wiki/Accuracy_and_precision</ext-link></p></fn></bold>: The proportion of correctly classified instances among the total instances. It is a fundamental metric for classification tasks but may be misleading for imbalanced datasets.</p>
            </list-item>
            <list-item id="S6.I4.i2">
              <p id="S6.I4.i2.p1"><bold>Precision and Recall<xref ref-type="fn" rid="fn2">2</xref><fn id="fn2"><label><sup>2</sup></label><p id="footnote2"><ext-link xlink:href="https://en.wikipedia.org/wiki/Precision_and_recall">https://en.wikipedia.org/wiki/Precision_and_recall</ext-link></p></fn></bold>: Precision is the proportion of true positive results among the predicted positives, while recall is the proportion of true positive results among the actual positives. These metrics are particularly useful for tasks with imbalanced classes. </p>
            </list-item>
            <list-item id="S6.I4.i3">
              <p id="S6.I4.i3.p1"><bold>F1 Score<xref ref-type="fn" rid="fn3">3</xref><fn id="fn3"><label><sup>3</sup></label><p id="footnote3"><ext-link xlink:href="https://en.wikipedia.org/wiki/F-score">https://en.wikipedia.org/wiki/F-score</ext-link></p></fn></bold>: The harmonic mean of precision and recall, providing a single metric that balances both concerns. It is especially useful when the dataset has imbalanced classes.</p>
            </list-item>
            <list-item id="S6.I4.i4">
              <p id="S6.I4.i4.p1"><bold>F2 Score<xref ref-type="fn" rid="fn3">3</xref></bold>: The weighted harmonic mean of precision and recall with β = 2, which treats recall as twice as important as precision. It is particularly useful in applications such as epilepsy detection, where missing positive instances (epileptic events) can be fatal.</p>
            </list-item>
            <list-item id="S6.I4.i5">
              <p id="S6.I4.i5.p1"><bold>Area Under the Receiver Operating Characteristic Curve (AUC-ROC)[<xref rid="ref162" ref-type="bibr">162</xref>]</bold>: This metric evaluates the ability of a model to distinguish between classes, considering both the true positive rate and the false positive rate. It is widely used for binary classification tasks.</p>
            </list-item>
            <list-item id="S6.I4.i6">
              <p id="S6.I4.i6.p1"><bold>Mean Squared Error (MSE)<xref ref-type="fn" rid="fn4">4</xref><fn id="fn4"><label><sup>4</sup></label><p id="footnote4"><ext-link xlink:href="https://en.wikipedia.org/wiki/Mean_squared_error">https://en.wikipedia.org/wiki/Mean_squared_error</ext-link></p></fn></bold>: Used for regression tasks, MSE measures the average squared difference between predicted and actual values. Lower MSE indicates better model performance.</p>
            </list-item>
            <list-item id="S6.I4.i7">
              <p id="S6.I4.i7.p1"><bold>Mean Absolute Error (MAE)<xref ref-type="fn" rid="fn5">5</xref><fn id="fn5"><label><sup>5</sup></label><p id="footnote5"><ext-link xlink:href="https://en.wikipedia.org/wiki/Mean_absolute_error">https://en.wikipedia.org/wiki/Mean_absolute_error</ext-link></p></fn></bold>: Another metric for regression tasks, MAE measures the average absolute difference between predicted and actual values. It is less sensitive to outliers compared to MSE.</p>
            </list-item>
            <list-item id="S6.I4.i8">
              <p id="S6.I4.i8.p1"><bold>Cohen's Kappa<xref ref-type="fn" rid="fn6">6</xref><fn id="fn6"><label><sup>6</sup></label><p id="footnote6"><ext-link xlink:href="https://en.wikipedia.org/wiki/Cohen's_kappa">https://en.wikipedia.org/wiki/Cohen's_kappa</ext-link></p></fn></bold>: A statistical measure of inter-rater agreement for categorical items, which takes into account the possibility of agreement occurring by chance. It is useful for evaluating the reliability of classifications. </p>
            </list-item>
            <list-item id="S6.I4.i9">
              <p id="S6.I4.i9.p1"><bold>Inception Score (IS)[<xref rid="ref163" ref-type="bibr">163</xref>]</bold>: A metric used to evaluate generative models, such as Generative Adversarial Networks (GANs), by assessing the quality and diversity of the generated images. A pre-trained Inception network assigns class probabilities to each generated image, and the score is the exponential of the average Kullback-Leibler divergence between each image's conditional class distribution and the marginal class distribution over all generated images. The score is high when individual images are classified confidently (suggesting realistic content) and the predicted classes are spread across many categories (suggesting diversity). Higher scores therefore indicate better generation quality and diversity.</p>
            </list-item>
            <list-item id="S6.I4.i10">
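              <p id="S6.I4.i10.p1"><bold>Fréchet Inception Distance (FID)[<xref rid="ref164" ref-type="bibr">164</xref>]</bold>: A metric for evaluating the quality of generated images by comparing the distribution of their Inception features with that of real images. Each feature set is modeled as a multivariate Gaussian, and FID is the Fréchet distance between the two Gaussians, which depends on their means and covariances. Lower FID scores indicate more realistic and diverse generated images.</p>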
            </list-item>
            <list-item id="S6.I4.i11">
              <p id="S6.I4.i11.p1"><bold>Kernel Inception Distance (KID)[<xref rid="ref165" ref-type="bibr">165</xref>]</bold>: Like FID, KID compares the Inception feature distributions of generated and real images, but instead of fitting Gaussians it computes the squared maximum mean discrepancy (MMD) with a cubic polynomial kernel. Unlike FID, KID has an unbiased estimator, which makes it more reliable when only a small number of samples is available. Lower KID scores suggest better image generation performance.</p>
            </list-item>
            <list-item id="S6.I4.i12">
              <p id="S6.I4.i12.p1"><bold>Structural Similarity Index (SSIM)[<xref rid="ref166" ref-type="bibr">166</xref>]</bold>: A metric for assessing the perceived similarity between two images. It compares their luminance, contrast, and structure, typically within local windows whose scores are averaged over the image. SSIM reaches its maximum of 1 only for identical images, and values closer to 1 indicate higher similarity; although it can become negative, in practice values usually lie between 0 and 1. It is commonly used to measure the effectiveness of image processing techniques like enhancement, compression, and super-resolution.</p>
            </list-item>
            <list-item id="S6.I4.i13">
              <p id="S6.I4.i13.p1"><bold>BLEU-N[<xref rid="ref167" ref-type="bibr">167</xref>]</bold>: A metric used to evaluate the quality of machine-translated text; BLEU stands for Bilingual Evaluation Understudy. It measures the correspondence between a machine translation and one or more human reference translations by computing clipped n-gram precisions for n = 1 to N, combining them through a geometric mean, and multiplying the result by a brevity penalty that discourages overly short outputs. Higher BLEU-N scores indicate closer agreement with the reference translations.</p>
            </list-item>
            <list-item id="S6.I4.i14">
              <p id="S6.I4.i14.p1"><bold>ROUGE-1[<xref rid="ref168" ref-type="bibr">168</xref>]</bold>: A metric used to evaluate the quality of automatic summarization and machine translation; ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. ROUGE-1 measures the overlap of unigrams (single words) between a generated summary or translation and a set of reference texts, and is usually reported as recall, precision, or their F1 score. Higher ROUGE-1 scores indicate a better match between the generated text and the reference texts.</p>
            </list-item>
            <list-item id="S6.I4.i15">
              <p id="S6.I4.i15.p1"><bold>Pearson Correlation Coefficient (PCC)<xref ref-type="fn" rid="fn7">7</xref><fn id="fn7"><label><sup>7</sup></label><p id="footnote7"><ext-link xlink:href="https://en.wikipedia.org/wiki/Pearson_correlation_coefficient">https://en.wikipedia.org/wiki/Pearson_correlation_coefficient</ext-link></p></fn></bold>: A statistical measure of the linear correlation between two variables, computed as their covariance divided by the product of their standard deviations. It ranges from -1 (perfect negative correlation) to +1 (perfect positive correlation), with 0 indicating no linear correlation. PCC is widely used to assess the strength and direction of the relationship between variables, for example between predicted and ground-truth values.</p>
            </list-item>
            <list-item id="S6.I4.i16">
              <p id="S6.I4.i16.p1"><bold>Mel-Cepstral Distance (MCD)[<xref rid="ref169" ref-type="bibr">169</xref>]</bold>: A measure used in audio processing to quantify the difference between two speech signals, commonly employed to evaluate synthesized or reconstructed speech. It is computed from mel-cepstral coefficients, which are derived from the mel-scaled spectrum obtained with the Fourier transform of the audio. Lower mel-cepstral distances indicate more similar sounds.</p>
            </list-item>
          </list>
        </p>
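        <p>The two classification metrics above can be made concrete with a short sketch. It assumes NumPy, binary labels coded as 0/1 for AUC-ROC, and integer labels from 0 to n_classes - 1 for Cohen's kappa; the function names are hypothetical. AUC-ROC is computed through its rank interpretation (the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half), and kappa is computed from a confusion matrix. In practice, library routines such as scikit-learn's <monospace>roc_auc_score</monospace> and <monospace>cohen_kappa_score</monospace> would normally be preferred.</p>
        <preformat preformat-type="code">
```python
import numpy as np


def auc_roc(y_true, scores):
    """Probability that a random positive example outscores a random negative one.

    This equals the area under the ROC curve (the normalized Mann-Whitney U
    statistic); tied scores contribute one half.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    # Compare every positive score with every negative score via broadcasting.
    wins = (pos[:, None] > neg[None, :]).astype(float)
    ties = (pos[:, None] == neg[None, :]).astype(float)
    return float(np.mean(wins + 0.5 * ties))


def cohens_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e), computed from a confusion matrix."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1  # rows: reference labels, columns: predicted labels
    n = cm.sum()
    p_o = np.trace(cm) / n  # observed agreement
    p_e = cm.sum(axis=1) @ cm.sum(axis=0) / n ** 2  # agreement expected by chance
    return float((p_o - p_e) / (1.0 - p_e))


print(auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(cohens_kappa([0, 0, 1, 1], [0, 1, 1, 1], n_classes=2))  # 0.5
```
        </preformat>
        <p>The pairwise comparison in this sketch uses memory proportional to the product of the numbers of positive and negative examples, so for large evaluation sets a sort-based rank computation is the more economical choice.</p>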
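        <p>A minimal NumPy sketch of MSE, MAE, and PCC follows; the function names are hypothetical. The toy example, in which a single prediction is off by 4, illustrates why MAE is less sensitive to outliers: the squared error of that one sample dominates MSE.</p>
        <preformat preformat-type="code">
```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    r = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.mean(r ** 2))


def mae(y_true, y_pred):
    """Mean absolute error: the average of the absolute residuals."""
    r = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.mean(np.abs(r)))


def pcc(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 8.0]  # one outlier prediction
print(mse(y_true, y_pred), mae(y_true, y_pred))  # 4.0 1.0
```
        </preformat>
        <p>PCC is unchanged when the predictions are shifted or rescaled by a positive factor, so a model can achieve a high PCC while still having a large MSE; reporting both gives a fuller picture.</p>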
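        <p>The Inception Score can be sketched directly from its definition, IS = exp(E<sub>x</sub>[KL(p(y|x) || p(y))]). The sketch assumes that class probabilities for the generated images have already been produced by a pre-trained classifier, and it uses a single split, whereas the original protocol averages the score over several splits of the generated set. The function name and the small constant <monospace>eps</monospace> (added only for numerical safety) are assumptions.</p>
        <preformat preformat-type="code">
```python
import numpy as np


def inception_score(probs, eps=1e-12):
    """IS = exp(mean over images of KL(p(y|x) || p(y))).

    probs: array of shape (n_images, n_classes); each row is the class
    distribution p(y|x) assigned to one generated image by a pre-trained classifier.
    """
    probs = np.asarray(probs, dtype=float)
    p_y = probs.mean(axis=0, keepdims=True)  # marginal class distribution p(y)
    # Per-image KL divergence between p(y|x) and p(y).
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))
```
        </preformat>
        <p>The score ranges from 1, when every image receives the same class distribution, to the number of classes, when each image is classified with certainty and all classes occur equally often.</p>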
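        <p>FID and KID can be sketched once feature vectors have been extracted for real and generated images. The original metrics use Inception-v3 pooling features; here the features are simply assumed to be given as NumPy arrays of shape (n_samples, n_features). The FID sketch evaluates the trace of the matrix square root through eigenvalues to avoid a SciPy dependency, and the KID sketch uses the cubic polynomial kernel k(x, y) = (x·y / d + 1)<sup>3</sup> with the unbiased MMD estimator; published KID values are usually averaged over several random subsets, which this sketch omits. The function names are hypothetical.</p>
        <preformat preformat-type="code">
```python
import numpy as np


def fid(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two sets of feature vectors.

    FID = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)).
    The trace of the matrix square root equals the sum of the square roots of the
    eigenvalues of S_r S_g, which are real and non-negative for covariance matrices.
    """
    feats_real = np.asarray(feats_real, dtype=float)
    feats_gen = np.asarray(feats_gen, dtype=float)
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    s_r = np.cov(feats_real, rowvar=False)
    s_g = np.cov(feats_gen, rowvar=False)
    eig = np.linalg.eigvals(s_r @ s_g)
    # Discard tiny imaginary parts and negative values caused by round-off.
    tr_sqrt = np.sum(np.sqrt(np.clip(eig.real, 0.0, None)))
    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(s_r) + np.trace(s_g) - 2.0 * tr_sqrt)


def kid(feats_real, feats_gen):
    """Unbiased estimate of the squared MMD with kernel k(x, y) = (x.y / d + 1)^3."""
    feats_real = np.asarray(feats_real, dtype=float)
    feats_gen = np.asarray(feats_gen, dtype=float)
    m, n, d = len(feats_real), len(feats_gen), feats_real.shape[1]

    def kernel(a, b):
        return (a @ b.T / d + 1.0) ** 3

    k_rr = kernel(feats_real, feats_real)
    k_gg = kernel(feats_gen, feats_gen)
    k_rg = kernel(feats_real, feats_gen)
    # Dropping the diagonal (each sample compared with itself) makes the estimate unbiased.
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_gg = (k_gg.sum() - np.trace(k_gg)) / (n * (n - 1))
    return float(term_rr + term_gg - 2.0 * k_rg.mean())
```
        </preformat>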
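        <p>The SSIM formula can be illustrated with a simplified, single-window variant that computes the statistics over the whole image. This is an assumption made for brevity: the standard index evaluates the same expression in local windows (an 11 × 11 Gaussian window with σ = 1.5 in the original formulation) and averages the resulting map, as done for example by scikit-image's <monospace>structural_similarity</monospace>. The constants K<sub>1</sub> = 0.01 and K<sub>2</sub> = 0.03 follow the usual defaults, and the function name is hypothetical.</p>
        <preformat preformat-type="code">
```python
import numpy as np


def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM computed over the whole image (simplified variant).

    SSIM = ((2 mu_x mu_y + C1)(2 cov_xy + C2)) / ((mu_x^2 + mu_y^2 + C1)(var_x + var_y + C2)),
    with C1 = (k1 L)^2 and C2 = (k2 L)^2, where L is the dynamic range of the pixel values.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)
```
        </preformat>
        <p>Comparing an image with its photographic negative (1 - x for pixel values in [0, 1]) typically yields a negative value, which shows that SSIM is not strictly confined to the interval [0, 1].</p>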
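        <p>The n-gram matching behind BLEU-N and ROUGE-1 is sketched below using only the Python standard library. The sketch assumes pre-tokenized input (lists of words), computes sentence-level BLEU with uniform weights and no smoothing, and reports ROUGE-1 as recall, precision, and F1; corpus-level BLEU instead pools n-gram counts over all sentences before combining them. The function names are hypothetical, and established tools such as sacreBLEU would normally be used for reportable scores.</p>
        <preformat preformat-type="code">
```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU-N with uniform weights and no smoothing."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            max_ref |= ngrams(ref, n)
        overlap = sum(min(c, max_ref[g]) for g, c in cand.items())
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    c = len(candidate)
    # Effective reference length: the one closest to c (shorter one on ties).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)


def rouge_1(candidate, reference):
    """ROUGE-1 recall, precision and F1 from clipped unigram overlap."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum(min(c, ref[w]) for w, c in cand.items())
    recall = overlap / max(len(reference), 1)
    precision = overlap / max(len(candidate), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return recall, precision, f1
```
        </preformat>
        <p>For instance, the candidate "the cat sat on" matched against the reference "the cat sat on the mat" has all n-gram precisions equal to 1, yet its BLEU-4 score is about 0.61 because the brevity penalty exp(1 - 6/4) applies.</p>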
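        <p>The mel-cepstral distance can be sketched as follows, assuming that mel-cepstral coefficients have already been extracted for the reference and the generated speech and that the two sequences are time-aligned (for example with dynamic time warping when their lengths differ). The per-frame value MCD = (10 / ln 10) · sqrt(2 Σ<sub>d</sub> (c<sub>d</sub> - c'<sub>d</sub>)<sup>2</sup>) is averaged over frames, and the energy coefficient c<sub>0</sub> is excluded, as is common practice. The function name is hypothetical.</p>
        <preformat preformat-type="code">
```python
import numpy as np


def mel_cepstral_distance(mc_ref, mc_syn, skip_c0=True):
    """Frame-averaged mel-cepstral distance in dB.

    mc_ref, mc_syn: arrays of shape (n_frames, n_coeffs) holding time-aligned
    mel-cepstral coefficients. Per frame:
        MCD = (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2)
    """
    mc_ref = np.asarray(mc_ref, dtype=float)
    mc_syn = np.asarray(mc_syn, dtype=float)
    if skip_c0:
        # Drop the energy coefficient c0, which mostly reflects loudness.
        mc_ref, mc_syn = mc_ref[:, 1:], mc_syn[:, 1:]
    diff_sq = np.sum((mc_ref - mc_syn) ** 2, axis=1)
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * diff_sq)
    return float(per_frame.mean())
```
        </preformat>
        <p>For example, a difference of 1 in a single coefficient in every frame gives an MCD of about 6.14 dB.</p>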
        <p id="S6.SS2.p2">In summary, the availability of diverse and high-quality datasets, combined with robust evaluation metrics, is essential for advancing spatio-temporal EEG data analysis. These resources enable researchers to develop, compare, and refine models, ultimately leading to more accurate and insightful interpretations of brain activity.</p>
      </sec>
    </sec>
    <sec id="S7">
      <label>7.</label>
      <title>Conclusions and Future Directions</title>
      <p id="S7.p1"><bold>Conclusion</bold>: This paper has reviewed recent advances in EEG analysis, focusing on three key areas: representation learning, discriminative-based methods, and generative-based methods. Together, these areas enhance the precision, interpretability, and application scope of EEG signal analysis, address significant challenges, and pave the way for future research.</p>
      <p>
        <list list-type="bullet" id="S7.I1">
          <list-item id="S7.I1.i1">
            <p id="S7.I1.i1.p1"><bold>Learning Useful Representations from EEG Signals</bold>: Representation learning is the first step in understanding EEG signals: it automatically extracts informative features from raw recordings. Self-supervised learning techniques are particularly effective here because they learn robust representations without requiring manual labels. Such representations improve the accuracy of downstream interpretation and make it feasible to process large volumes of brain signal data efficiently.</p>
          </list-item>
          <list-item id="S7.I1.i2">
            <p id="S7.I1.i2.p1"><bold>Identifying Patterns in EEG Signals</bold>: Discriminative methods are crucial for recognizing different patterns or categories within EEG signals. Using advanced techniques like Graph Neural Networks (GNNs) and foundation models, we can gain deeper insights into brain activity by capturing these patterns effectively. Understanding these patterns is essential for deciphering complex neural processes.</p>
          </list-item>
          <list-item id="S7.I1.i3">
            <p id="S7.I1.i3.p1"><bold>Generating New Insights from EEG Signals</bold>: Generative methods synthesize new forms of data from EEG signals. Techniques such as diffusion models can produce images or text conditioned on EEG data, offering new ways to visualize and understand brain activity. These methods also extend to broader AI-generated content (AIGC) applications.</p>
          </list-item>
        </list>
      </p>
      <p id="S7.p2"><bold>Future Directions</bold>: Looking ahead, several promising directions for future research in EEG signal analysis and understanding can be identified:</p>
      <p>
        <list list-type="bullet" id="S7.I2">
          <list-item id="S7.I2.i1">
            <p id="S7.I2.i1.p1"><bold>Enhanced Integration of Self-Supervised and Semi-Supervised Learning</bold>: Further exploration of how self-supervised and semi-supervised learning techniques can be combined could yield even more robust and generalizable representations. This would enable better handling of diverse and complex EEG data with minimal labeled data, improving both accuracy and efficiency.</p>
          </list-item>
          <list-item id="S7.I2.i2">
            <p id="S7.I2.i2.p1"><bold>Development of Advanced Network Architectures</bold>: Continued innovation in network architectures, such as the refinement and combination of Mamba [<xref rid="ref092" ref-type="bibr">92</xref>, <xref rid="ref170" ref-type="bibr">170</xref>], Kolmogorov-Arnold Networks (KAN) [<xref rid="ref171" ref-type="bibr">171</xref>], and Mixture-of-Experts (MoE) models [<xref rid="ref172" ref-type="bibr">172</xref>], is essential. These advancements should focus on improving training efficiency and inference speed, particularly for deployment on mobile and edge devices. Research into optimizing these architectures for real-time analysis and low power consumption is also crucial.</p>
          </list-item>
          <list-item id="S7.I2.i3">
            <p id="S7.I2.i3.p1"><bold>Expansion of Multimodal Generative Techniques</bold>: Expanding the capabilities of multimodal generative techniques to include more diverse forms of data, such as tactile or olfactory signals, could open new avenues for EEG applications. Additionally, improving the quality and realism of generated outputs, whether they be images, text, or speech, will enhance their utility in practical scenarios, particularly for assisting individuals with disabilities.</p>
          </list-item>
          <list-item id="S7.I2.i4">
            <p id="S7.I2.i4.p1"><bold>Addressing Constrained Conditions in Brain Signals</bold>: Missing variables [<xref rid="ref173" ref-type="bibr">173</xref>], class-incremental learning [<xref rid="ref174" ref-type="bibr">174</xref>], and source-free domain adaptation [<xref rid="ref175" ref-type="bibr">175</xref>] are constrained settings in brain signal analysis that pose significant challenges but also offer important research opportunities. Addressing them can improve the accuracy and stability of analyses and broaden the impact of EEG methods in practical applications.</p>
          </list-item>
          <list-item id="S7.I2.i5">
            <p id="S7.I2.i5.p1"><bold>Interdisciplinary Collaboration and Real-World Applications</bold>: Encouraging interdisciplinary collaboration between neuroscientists, computer scientists, and clinicians will be vital for translating these technological advancements into real-world applications. This includes the development of user-friendly interfaces and tools for clinical use, as well as ensuring the ethical and responsible deployment of these technologies.</p>
          </list-item>
          <list-item id="S7.I2.i6">
            <p id="S7.I2.i6.p1"><bold>Establishing a Unified Evaluation Benchmark</bold>: As the volume of EEG data, task variety, and computational capabilities increase, establishing a comprehensive and standardized evaluation system becomes crucial. Similar to the challenges observed in general time-series analysis [<xref rid="ref176" ref-type="bibr">176</xref>, <xref rid="ref177" ref-type="bibr">177</xref>], there is currently no unified benchmark or complete dataset for consistent comparisons in EEG-based methods. To address this, we advocate for the future development of unified evaluation metrics and standardized datasets, which would enable more comprehensive and fair comparisons between different methods, and help assess their practical utility more accurately.</p>
          </list-item>
        </list>
      </p>
      <p id="S7.p4">By pursuing these directions, the field of EEG signal analysis can continue to advance, providing deeper insights into brain function and enabling more effective applications in both clinical and non-clinical settings. In addition, information fusion strategies, whether through multi-modal data alignment, cross-source learning, or collaborative inference, are expected to become a foundational component of next-generation EEG systems. Exploiting the complementary strengths of different signals and models can improve not only accuracy but also the robustness and interpretability of brain-computer interfaces and clinical decision-making systems. Incorporating information fusion principles is therefore pivotal for advancing both the scientific understanding and the practical deployment of EEG technologies.</p>
    </sec>
  </body>
  <back>
    <ack>
      <title>Acknowledgments</title>
      <p id="ack.p1">This work was supported in part by the National Natural Science Foundation of China under Grant 62136002 and Grant 62477014; in part by the Ministry of Education Research Joint Fund Project under Grant 8091B042239; and in part by the Shanghai Trusted Industry Internet Software Collaborative Innovation Center.</p>
    </ack>
    <sec id="sec0100" sec-type="COI-statement">
      <title>Conflict of interest</title>
      <p>The authors declare no conflicts of interest.</p>
    </sec>
    <ref-list>
      <title>References</title>
      <ref id="ref001">
        <label>[1]</label>
        <mixed-citation> David, O., Blauwblomme, T., Job, A. S., Chabardès, S., Hoffmann, D., Minotti, L., &amp; Kahane, P. (2011). Imaging the seizure onset zone with stereo-electroencephalography. <italic>Brain, 134</italic>(10), 2898–2911. [<uri>https://doi.org/10.1093/brain/awr238</uri>] </mixed-citation>
      </ref>
      <ref id="ref002">
        <label>[2]</label>
        <mixed-citation> Cai, D., Chen, J., Yang, Y., Liu, T., &amp; Li, Y. (2023, August). MBrain: A Multi-channel Self-Supervised Learning Framework for Brain Signals. In <italic>Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining</italic> (pp. 130–141). [<uri>https://doi.org/10.1145/3580305.3599426</uri>] </mixed-citation>
      </ref>
      <ref id="ref003">
        <label>[3]</label>
        <mixed-citation> Craik, A., He, Y., &amp; Contreras-Vidal, J. L. (2019). Deep learning for electroencephalogram (EEG) classification tasks: a review. <italic>Journal of Neural Engineering, 16</italic>(3), 031001. </mixed-citation>
      </ref>
      <ref id="ref004">
        <label>[4]</label>
        <mixed-citation> Hosseini, M. P., Hosseini, A., &amp; Ahi, K. (2020). A review on machine learning for EEG signal processing in bioengineering. <italic>IEEE Reviews in Biomedical Engineering, 14</italic>, 204–218. [<uri>https://doi.org/10.1109/RBME.2020.2969915</uri>] </mixed-citation>
      </ref>
      <ref id="ref005">
        <label>[5]</label>
        <mixed-citation> Bishop, C. M. (2006). <italic>Pattern recognition and machine learning</italic>. New York: Springer. </mixed-citation>
      </ref>
      <ref id="ref006">
        <label>[6]</label>
        <mixed-citation> Jiang, X., Bian, G. B., &amp; Tian, Z. (2019). Removal of artifacts from EEG signals: a review. <italic>Sensors, 19</italic>(5), 987. [<uri>https://doi.org/10.3390/s19050987</uri>] </mixed-citation>
      </ref>
      <ref id="ref007">
        <label>[7]</label>
        <mixed-citation> Zhang, X., Yao, L., Wang, X., Monaghan, J., McAlpine, D., &amp; Zhang, Y. (2019). A survey on deep learning based brain computer interface: Recent advances and new frontiers. <italic>arXiv preprint arXiv:1905.04149</italic>. </mixed-citation>
      </ref>
      <ref id="ref008">
        <label>[8]</label>
        <mixed-citation> Zhang, K., Wen, Q., Zhang, C., Cai, R., Jin, M., Liu, Y., … &amp; Pan, S. (2024). Self-supervised learning for time series analysis: Taxonomy, progress, and prospects. <italic>IEEE Transactions on Pattern Analysis and Machine Intelligence</italic>. [<uri>https://doi.org/10.1109/TPAMI.2024.3387317</uri>] </mixed-citation>
      </ref>
      <ref id="ref009">
        <label>[9]</label>
        <mixed-citation> Jin, M., Koh, H. Y., Wen, Q., Zambon, D., Alippi, C., Webb, G. I., … &amp; Pan, S. (2024). A survey on graph neural networks for time series: Forecasting, classification, imputation, and anomaly detection. <italic>IEEE Transactions on Pattern Analysis and Machine Intelligence</italic>. [<uri>https://doi.org/10.1109/TPAMI.2024.3443141</uri>] </mixed-citation>
      </ref>
      <ref id="ref010">
        <label>[10]</label>
        <mixed-citation> Liang, Y., Wen, H., Nie, Y., Jiang, Y., Jin, M., Song, D., … &amp; Wen, Q. (2024, August). Foundation models for time series analysis: A tutorial and survey. In <italic>Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining</italic> (pp. 6555–6565). [<uri>https://doi.org/10.1145/3637528.3671451</uri>] </mixed-citation>
      </ref>
      <ref id="ref011">
        <label>[11]</label>
        <mixed-citation> Jiang, W. B., Zhao, L. M., &amp; Lu, B. L. (2024). Large brain model for learning generic representations with tremendous EEG data in BCI. <italic>arXiv preprint arXiv:2405.18765</italic>. [<uri>https://doi.org/10.48550/arXiv.2405.18765</uri>] </mixed-citation>
      </ref>
      <ref id="ref012">
        <label>[12]</label>
        <mixed-citation> Zhang, X., Chowdhury, R. R., Gupta, R. K., &amp; Shang, J. (2024). Large language models for time series: A survey. <italic>arXiv preprint arXiv:2402.01801</italic>. [<uri>https://doi.org/10.48550/arXiv.2402.01801</uri>] </mixed-citation>
      </ref>
      <ref id="ref013">
        <label>[13]</label>
        <mixed-citation> Jin, M., Wen, Q., Liang, Y., Zhang, C., Xue, S., Wang, X., … &amp; Xiong, H. (2023). Large models for time series and spatio-temporal data: A survey and outlook. <italic>arXiv preprint arXiv:2310.10196</italic>. [<uri>https://doi.org/10.48550/arXiv.2310.10196</uri>] </mixed-citation>
      </ref>
      <ref id="ref014">
        <label>[14]</label>
        <mixed-citation> Yang, Y., Jin, M., Wen, H., Zhang, C., Liang, Y., Ma, L., … &amp; Wen, Q. (2024). A survey on diffusion models for time series and spatio-temporal data. <italic>arXiv preprint arXiv:2404.18886</italic>. [<uri>https://doi.org/10.48550/arXiv.2404.18886</uri>] </mixed-citation>
      </ref>
      <ref id="ref015">
        <label>[15]</label>
        <mixed-citation> Zhang, Z., Sun, Y., Wang, Z., Nie, Y., Ma, X., Sun, P., &amp; Li, R. (2024). Large language models for mobility in transportation systems: A survey on forecasting tasks. <italic>arXiv preprint arXiv:2405.02357</italic>. [<uri>https://doi.org/10.48550/arXiv.2405.02357</uri>] </mixed-citation>
      </ref>
      <ref id="ref016">
        <label>[16]</label>
        <mixed-citation> Wen, Q., Zhou, T., Zhang, C., Chen, W., Ma, Z., Yan, J., &amp; Sun, L. (2022). Transformers in time series: A survey. <italic>arXiv preprint arXiv:2202.07125</italic>. [<uri>https://doi.org/10.48550/arXiv.2202.07125</uri>] </mixed-citation>
      </ref>
      <ref id="ref017">
        <label>[17]</label>
        <mixed-citation> Liu, K., Xiao, A., Zhang, X., Lu, S., &amp; Shao, L. (2023). FAC: 3D representation learning via foreground aware feature contrast. In <italic>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</italic> (pp. 9476–9485). </mixed-citation>
      </ref>
      <ref id="ref018">
        <label>[18]</label>
        <mixed-citation> Gao, T., Yao, X., &amp; Chen, D. (2021). SimCSE: Simple contrastive learning of sentence embeddings. <italic>arXiv preprint arXiv:2104.08821</italic>. [<uri>https://doi.org/10.48550/arXiv.2104.08821</uri>] </mixed-citation>
      </ref>
      <ref id="ref019">
        <label>[19]</label>
        <mixed-citation> Mohsenvand, M. N., Izadi, M. R., &amp; Maes, P. (2020, November). Contrastive representation learning for electroencephalogram classification. In <italic>Machine Learning for Health</italic> (pp. 238–253). PMLR. </mixed-citation>
      </ref>
      <ref id="ref020">
        <label>[20]</label>
        <mixed-citation> Chen, T., Kornblith, S., Norouzi, M., &amp; Hinton, G. (2020, November). A simple framework for contrastive learning of visual representations. In <italic>International Conference on Machine Learning</italic> (pp. 1597–1607). PMLR. </mixed-citation>
      </ref>
      <ref id="ref021">
        <label>[21]</label>
        <mixed-citation> Eldele, E., Ragab, M., Chen, Z., Wu, M., Kwoh, C. K., Li, X., &amp; Guan, C. (2021). Time-series representation learning via temporal and contextual contrasting. <italic>arXiv preprint arXiv:2106.14112</italic>. [<uri>https://doi.org/10.48550/arXiv.2106.14112</uri>] </mixed-citation>
      </ref>
      <ref id="ref022">
        <label>[22]</label>
        <mixed-citation> Jiang, X., Zhao, J., Du, B., &amp; Yuan, Z. (2021, July). Self-supervised contrastive learning for EEG-based sleep staging. In <italic>2021 International Joint Conference on Neural Networks (IJCNN)</italic> (pp. 1–8). IEEE. [<uri>https://doi.org/10.1109/IJCNN52387.2021.9533305</uri>] </mixed-citation>
      </ref>
      <ref id="ref023">
        <label>[23]</label>
        <mixed-citation> Kumar, V., Reddy, L., Kumar Sharma, S., Dadi, K., Yarra, C., Bapi, R. S., &amp; Rajendran, S. (2022, September). mulEEG: a multi-view representation learning on EEG signals. In <italic>International Conference on Medical Image Computing and Computer-Assisted Intervention</italic> (pp. 398–407). Cham: Springer Nature Switzerland. </mixed-citation>
      </ref>
      <ref id="ref024">
        <label>[24]</label>
        <mixed-citation> Chuang, C. Y., Robinson, J., Lin, Y. C., Torralba, A., &amp; Jegelka, S. (2020). Debiased contrastive learning. <italic>Advances in Neural Information Processing Systems, 33</italic>, 8765–8775. </mixed-citation>
      </ref>
      <ref id="ref025">
        <label>[25]</label>
        <mixed-citation> Robinson, J., Chuang, C. Y., Sra, S., &amp; Jegelka, S. (2020). Contrastive learning with hard negative samples. <italic>arXiv preprint arXiv:2010.04592</italic>. [<uri>https://doi.org/10.48550/arXiv.2010.04592</uri>] </mixed-citation>
      </ref>
      <ref id="ref026">
        <label>[26]</label>
        <mixed-citation> Yang, C., Xiao, C., Westover, M. B., &amp; Sun, J. (2023). Self-supervised electroencephalogram representation learning for automatic sleep staging: model development and evaluation study. <italic>JMIR AI, 2</italic>(1), e46769. [<uri>https://doi.org/10.2196/46769</uri>] </mixed-citation>
      </ref>
      <ref id="ref027">
        <label>[27]</label>
        <mixed-citation> Wang, Y., Han, Y., Wang, H., &amp; Zhang, X. (2024). Contrast everything: A hierarchical contrastive framework for medical time-series. <italic>Advances in Neural Information Processing Systems, 36</italic>. </mixed-citation>
      </ref>
      <ref id="ref028">
        <label>[28]</label>
        <mixed-citation> Zhang, H., Wang, J., Xiao, Q., Deng, J., &amp; Lin, Y. (2021). SleepPriorCL: Contrastive representation learning with prior knowledge-based positive mining and adaptive temperature for sleep staging. <italic>arXiv preprint arXiv:2110.09966</italic>. [<uri>https://doi.org/10.48550/arXiv.2110.09966</uri>] </mixed-citation>
      </ref>
      <ref id="ref029">
        <label>[29]</label>
        <mixed-citation> Weng, W., Gu, Y., Zhang, Q., Huang, Y., Miao, C., &amp; Chen, Y. (2023). A Knowledge-Driven Cross-view Contrastive Learning for EEG Representation. <italic>arXiv preprint arXiv:2310.03747</italic>. [<uri>https://doi.org/10.48550/arXiv.2310.03747</uri>] </mixed-citation>
      </ref>
      <ref id="ref030">
        <label>[30]</label>
        <mixed-citation> Devlin, J. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. <italic>arXiv preprint arXiv:1810.04805</italic>. [<uri>https://doi.org/10.48550/arXiv.1810.04805</uri>] </mixed-citation>
      </ref>
      <ref id="ref031">
        <label>[31]</label>
        <mixed-citation> Kostas, D., Aroca-Ouellette, S., &amp; Rudzicz, F. (2021). BENDR: Using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data. <italic>Frontiers in Human Neuroscience, 15</italic>, 653659. [<uri>https://doi.org/10.3389/fnhum.2021.653659</uri>] </mixed-citation>
      </ref>
      <ref id="ref032">
        <label>[32]</label>
        <mixed-citation> Baevski, A., Zhou, Y., Mohamed, A., &amp; Auli, M. (2020). wav2vec 2.0: A framework for self-supervised learning of speech representations. <italic>Advances in Neural Information Processing Systems, 33</italic>, 12449–12460. </mixed-citation>
      </ref>
      <ref id="ref033">
        <label>[33]</label>
        <mixed-citation> Vaswani, A. (2017). Attention is all you need. <italic>Advances in Neural Information Processing Systems</italic>. </mixed-citation>
      </ref>
      <ref id="ref034">
        <label>[34]</label>
        <mixed-citation> Chien, H. Y. S., Goh, H., Sandino, C. M., &amp; Cheng, J. Y. (2022). MAEEG: Masked auto-encoder for EEG representation learning. <italic>arXiv preprint arXiv:2211.02625</italic>. [<uri>https://doi.org/10.48550/arXiv.2211.02625</uri>] </mixed-citation>
      </ref>
      <ref id="ref035">
        <label>[35]</label>
        <mixed-citation> Peng, R., Zhao, C., Xu, Y., Jiang, J., Kuang, G., Shao, J., &amp; Wu, D. (2023, June). Wavelet2vec: a filter bank masked autoencoder for EEG-based seizure subtype classification. In <italic>ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</italic> (pp. 1–5). IEEE. [<uri>https://doi.org/10.1109/ICASSP49357.2023.10097183</uri>] </mixed-citation>
      </ref>
      <ref id="ref036">
        <label>[36]</label>
        <mixed-citation> Dosovitskiy, A. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. <italic>arXiv preprint arXiv:2010.11929</italic>. [<uri>https://doi.org/10.48550/arXiv.2010.11929</uri>] </mixed-citation>
      </ref>
      <ref id="ref037">
        <label>[37]</label>
        <mixed-citation> Obeid, I., &amp; Picone, J. (2016). The Temple University Hospital EEG data corpus. <italic>Frontiers in Neuroscience, 10</italic>, 196. [<uri>https://doi.org/10.3389/fnins.2016.00196</uri>] </mixed-citation>
      </ref>
      <ref id="ref038">
        <label>[38]</label>
        <mixed-citation> Zheng, W. L., Zhu, J. Y., &amp; Lu, B. L. (2017). Identifying stable patterns over time for emotion recognition from EEG. <italic>IEEE Transactions on Affective Computing, 10</italic>(3), 417–429. [<uri>https://doi.org/10.1109/TAFFC.2017.2712143</uri>] </mixed-citation>
      </ref>
      <ref id="ref039">
        <label>[39]</label>
        <mixed-citation> Kemp, B., Zwinderman, A. H., Tuk, B., Kamphuisen, H. A., &amp; Oberye, J. J. (2000). Analysis of a sleep-dependent neuronal feedback loop: the slow-wave microcontinuity of the EEG. <italic>IEEE Transactions on Biomedical Engineering, 47</italic>(9), 1185–1194. [<uri>https://doi.org/10.1109/10.867928</uri>] </mixed-citation>
      </ref>
      <ref id="ref040">
        <label>[40]</label>
        <mixed-citation> Khalighi, S., Sousa, T., Santos, J. M., &amp; Nunes, U. (2016). ISRUC-Sleep: A comprehensive public dataset for sleep researchers. <italic>Computer Methods and Programs in Biomedicine, 124</italic>, 180–192. [<uri>https://doi.org/10.1016/j.cmpb.2015.10.013</uri>] </mixed-citation>
      </ref>
      <ref id="ref041">
        <label>[41]</label>
        <mixed-citation> Anguita, D., Ghio, A., Oneto, L., Parra, X., &amp; Reyes-Ortiz, J. L. (2013, April). A public domain dataset for human activity recognition using smartphones. In <italic>ESANN</italic> (Vol. 3, p. 3). </mixed-citation>
      </ref>
      <ref id="ref042">
        <label>[42]</label>
        <mixed-citation> Andrzejak, R. G., Lehnertz, K., Mormann, F., Rieke, C., David, P., &amp; Elger, C. E. (2001). Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state. <italic>Physical Review E, 64</italic>(6), 061907. [<uri>https://doi.org/10.1103/PhysRevE.64.061907</uri>] </mixed-citation>
      </ref>
      <ref id="ref043">
        <label>[43]</label>
        <mixed-citation> Lessmeier, C., Kimotho, J. K., Zimmer, D., &amp; Sextro, W. (2016, July). Condition monitoring of bearing damage in electromechanical drive systems by using motor current signals of electric motors: A benchmark data set for data-driven classification. In <italic>PHM Society European Conference</italic> (Vol. 3, No. 1). [<uri>https://doi.org/10.36001/phme.2016.v3i1.1577</uri>] </mixed-citation>
      </ref>
      <ref id="ref044">
        <label>[44]</label>
        <mixed-citation> Guillot, A., Sauvet, F., During, E. H., &amp; Thorey, V. (2020). Dreem open datasets: Multi-scored sleep datasets to compare human and automated sleep staging. <italic>IEEE Transactions on Neural Systems and Rehabilitation Engineering, 28</italic>(9), 1955–1965. [<uri>https://doi.org/10.1109/TNSRE.2020.3011181</uri>] </mixed-citation>
      </ref>
      <ref id="ref045">
        <label>[45]</label>
        <mixed-citation> Zhang, G. Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., … &amp; Redline, S. (2018). The National Sleep Research Resource: towards a sleep data commons. <italic>Journal of the American Medical Informatics Association, 25</italic>(10), 1351–1358. [<uri>https://doi.org/10.1093/jamia/ocy064</uri>] </mixed-citation>
      </ref>
      <ref id="ref046">
        <label>[46]</label>
        <mixed-citation> Biswal, S., Sun, H., Goparaju, B., Westover, M. B., Sun, J., &amp; Bianchi, M. T. (2018). Expert-level sleep scoring with deep neural networks. <italic>Journal of the American Medical Informatics Association, 25</italic>(12), 1643–1650. [<uri>https://doi.org/10.1093/jamia/ocy131</uri>] </mixed-citation>
      </ref>
      <ref id="ref047">
        <label>[47]</label>
        <mixed-citation> Escudero, J., Abásolo, D., Hornero, R., Espino, P., &amp; López, M. (2006). Analysis of electroencephalograms in Alzheimer's disease patients with multiscale entropy. <italic>Physiological Measurement, 27</italic>(11), 1091. </mixed-citation>
      </ref>
      <ref id="ref048">
        <label>[48]</label>
        <mixed-citation> Goldberger, A. L., Amaral, L. A., Glass, L., Hausdorff, J. M., Ivanov, P. C., Mark, R. G., … &amp; Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. <italic>Circulation, 101</italic>(23), e215–e220. [<uri>https://doi.org/10.1161/01.CIR.101.23.e215</uri>] </mixed-citation>
      </ref>
      <ref id="ref049">
        <label>[49]</label>
        <mixed-citation> Van Dijk, H., Van Wingen, G., Denys, D., Olbrich, S., Van Ruth, R., &amp; Arns, M. (2022). The two decades Brainclinics research archive for insights in neurophysiology (TDBRAIN) database. <italic>Scientific Data, 9</italic>(1), 333. </mixed-citation>
      </ref>
      <ref id="ref050">
        <label>[50]</label>
        <mixed-citation> O'Reilly, C., Gosselin, N., Carrier, J., &amp; Nielsen, T. (2014). Montreal Archive of Sleep Studies: an open-access resource for instrument benchmarking and exploratory research. <italic>Journal of Sleep Research, 23</italic>(6), 628–635. [<uri>https://doi.org/10.1111/jsr.12169</uri>] </mixed-citation>
      </ref>
      <ref id="ref051">
        <label>[51]</label>
        <mixed-citation> Schalk, G., McFarland, D. J., Hinterberger, T., Birbaumer, N., &amp; Wolpaw, J. R. (2004). BCI2000: a general-purpose brain-computer interface (BCI) system. <italic>IEEE Transactions on Biomedical Engineering, 51</italic>(6), 1034–1043. [<uri>https://doi.org/10.1109/TBME.2004.827072</uri>] </mixed-citation>
      </ref>
      <ref id="ref052">
        <label>[52]</label>
        <mixed-citation> Shoeb, A. H. (2009). <italic>Application of machine learning to epileptic seizure onset detection and treatment</italic> (Doctoral dissertation, Massachusetts Institute of Technology). </mixed-citation>
      </ref>
      <ref id="ref053">
        <label>[53]</label>
        <mixed-citation> Tangermann, M., Müller, K. R., Aertsen, A., Birbaumer, N., Braun, C., Brunner, C., … &amp; Blankertz, B. (2012). Review of the BCI competition IV. <italic>Frontiers in Neuroscience, 6</italic>, 55. [<uri>https://doi.org/10.3389/fnins.2012.00055</uri>] </mixed-citation>
      </ref>
      <ref id="ref054">
        <label>[54]</label>
        <mixed-citation> Perrin, M., Maby, E., Daligault, S., Bertrand, O., &amp; Mattout, J. (2012). Objective and Subjective Evaluation of Online Error Correction during P300-Based Spelling. <italic>Advances in Human-Computer Interaction, 2012</italic>(1), 578295. [<uri>https://doi.org/10.1155/2012/578295</uri>] </mixed-citation>
      </ref>
      <ref id="ref055">
        <label>[55]</label>
        <mixed-citation> Peng, R., Zhao, C., Jiang, J., Kuang, G., Cui, Y., Xu, Y., … &amp; Wu, D. (2022). TIE-EEGNet: Temporal information enhanced EEGNet for seizure subtype classification. <italic>IEEE Transactions on Neural Systems and Rehabilitation Engineering, 30</italic>, 2567-2576. [<uri>https://doi.org/10.1109/TNSRE.2022.3204540</uri>] </mixed-citation>
      </ref>
      <ref id="ref056">
        <label>[56]</label>
        <mixed-citation> Loshchilov, I., &amp; Hutter, F. (2017). Decoupled weight decay regularization. <italic>arXiv preprint arXiv:1711.05101</italic>. [<uri>https://doi.org/10.48550/arXiv.1711.05101</uri>] </mixed-citation>
      </ref>
      <ref id="ref057">
        <label>[57]</label>
        <mixed-citation> Park, H. J., &amp; Friston, K. (2013). Structural and functional brain networks: from connections to cognition. <italic>Science, 342</italic>(6158), 1238411. [<uri>https://doi.org/10.1126/science.1238411</uri>] </mixed-citation>
      </ref>
      <ref id="ref058">
        <label>[58]</label>
        <mixed-citation> Jia, Z., Lin, Y., Wang, J., Zhou, R., Ning, X., He, Y., &amp; Zhao, Y. (2020, July). GraphSleepNet: Adaptive spatial-temporal graph convolutional networks for sleep stage classification. In <italic>IJCAI</italic> (pp. 1324-1330). </mixed-citation>
      </ref>
      <ref id="ref059">
        <label>[59]</label>
        <mixed-citation> Defferrard, M., Bresson, X., &amp; Vandergheynst, P. (2016). Convolutional neural networks on graphs with fast localized spectral filtering. <italic>Advances in Neural Information Processing Systems, 29</italic>. </mixed-citation>
      </ref>
      <ref id="ref060">
        <label>[60]</label>
        <mixed-citation> Wang, Y., Xu, Y., Yang, J., Wu, M., Li, X., Xie, L., &amp; Chen, Z. (2024, March). Graph-Aware Contrasting for Multivariate Time-Series Classification. In <italic>Proceedings of the AAAI Conference on Artificial Intelligence</italic> (Vol. 38, No. 14, pp. 15725-15734). [<uri>https://doi.org/10.1609/aaai.v38i14.29501</uri>] </mixed-citation>
      </ref>
      <ref id="ref061">
        <label>[61]</label>
        <mixed-citation> Cai, W., Liang, Y., Liu, X., Feng, J., &amp; Wu, Y. (2024, March). MSGNet: Learning multi-scale inter-series correlations for multivariate time series forecasting. In <italic>Proceedings of the AAAI Conference on Artificial Intelligence</italic> (Vol. 38, No. 10, pp. 11141-11149). [<uri>https://doi.org/10.1609/aaai.v38i10.28991</uri>] </mixed-citation>
      </ref>
      <ref id="ref062">
        <label>[62]</label>
        <mixed-citation> Deng, A., &amp; Hooi, B. (2021, May). Graph neural network-based anomaly detection in multivariate time series. In <italic>Proceedings of the AAAI Conference on Artificial Intelligence</italic> (Vol. 35, No. 5, pp. 4027-4035). [<uri>https://doi.org/10.1609/aaai.v35i5.16523</uri>] </mixed-citation>
      </ref>
      <ref id="ref063">
        <label>[63]</label>
        <mixed-citation> Salvador, R., Suckling, J., Coleman, M. R., Pickard, J. D., Menon, D., &amp; Bullmore, E. T. (2005). Neurophysiological architecture of functional magnetic resonance images of human brain. <italic>Cerebral Cortex, 15</italic>(9), 1332-1342. [<uri>https://doi.org/10.1093/cercor/bhi016</uri>] </mixed-citation>
      </ref>
      <ref id="ref064">
        <label>[64]</label>
        <mixed-citation> Pearson, K., &amp; Lee, A. (1903). On the laws of inheritance in man: I. Inheritance of physical characters. <italic>Biometrika, 2</italic>(4), 357-462. [<uri>https://doi.org/10.2307/2331507</uri>] </mixed-citation>
      </ref>
      <ref id="ref065">
        <label>[65]</label>
        <mixed-citation> Danon, L., Díaz-Guilera, A., Duch, J., &amp; Arenas, A. (2005). Comparing community structure identification. <italic>Journal of Statistical Mechanics: Theory and Experiment, 2005</italic>(09), P09008. </mixed-citation>
      </ref>
      <ref id="ref066">
        <label>[66]</label>
        <mixed-citation> Aydore, S., Pantazis, D., &amp; Leahy, R. M. (2013). A note on the phase locking value and its properties. <italic>NeuroImage, 74</italic>, 231-244. [<uri>https://doi.org/10.1016/j.neuroimage.2013.02.008</uri>] </mixed-citation>
      </ref>
      <ref id="ref067">
        <label>[67]</label>
        <mixed-citation> Tang, S., Dunnmon, J. A., Saab, K., Zhang, X., Huang, Q., Dubost, F., … &amp; Lee-Messer, C. (2021). Self-supervised graph neural networks for improved electroencephalographic seizure analysis. <italic>arXiv preprint arXiv:2104.08336</italic>. [<uri>https://doi.org/10.48550/arXiv.2104.08336</uri>] </mixed-citation>
      </ref>
      <ref id="ref068">
        <label>[68]</label>
        <mixed-citation> Ho, T. K. K., &amp; Armanfard, N. (2023, June). Self-supervised learning for anomalous channel detection in EEG graphs: Application to seizure analysis. In <italic>Proceedings of the AAAI Conference on Artificial Intelligence</italic> (Vol. 37, No. 7, pp. 7866-7874). </mixed-citation>
      </ref>
      <ref id="ref069">
        <label>[69]</label>
        <mixed-citation> Jia, Z., Lin, Y., Wang, J., Ning, X., He, Y., Zhou, R., … &amp; Lehman, L. W. H. (2021). Multi-view spatial-temporal graph convolutional networks with domain generalization for sleep stage classification. <italic>IEEE Transactions on Neural Systems and Rehabilitation Engineering, 29</italic>, 1977-1986. [<uri>https://doi.org/10.1109/TNSRE.2021.3110665</uri>] </mixed-citation>
      </ref>
      <ref id="ref070">
        <label>[70]</label>
        <mixed-citation> Li, R., Wang, Y., &amp; Lu, B. L. (2021, October). A multi-domain adaptive graph convolutional network for EEG-based emotion recognition. In <italic>Proceedings of the 29th ACM International Conference on Multimedia</italic> (pp. 5565-5573). [<uri>https://doi.org/10.1145/3474085.3475697</uri>] </mixed-citation>
      </ref>
      <ref id="ref071">
        <label>[71]</label>
        <mixed-citation> Wang, J., Ning, X., Shi, W., &amp; Lin, Y. (2023, April). A Bayesian Graph Neural Network for EEG Classification—A Win-Win on Performance and Interpretability. In <italic>2023 IEEE 39th International Conference on Data Engineering (ICDE)</italic> (pp. 2126-2139). IEEE. [<uri>https://doi.org/10.1109/ICDE55515.2023.00165</uri>] </mixed-citation>
      </ref>
      <ref id="ref072">
        <label>[72]</label>
        <mixed-citation> Jia, Z., Lin, Y., Wang, J., Feng, Z., Xie, X., &amp; Chen, C. (2021, October). HetEmotionNet: two-stream heterogeneous graph recurrent neural network for multi-modal emotion recognition. In <italic>Proceedings of the 29th ACM International Conference on Multimedia</italic> (pp. 1047-1056). [<uri>https://doi.org/10.1145/3474085.3475583</uri>] </mixed-citation>
      </ref>
      <ref id="ref073">
        <label>[73]</label>
        <mixed-citation> Chen, J., Yang, Y., Yu, T., Fan, Y., Mo, X., &amp; Yang, C. (2022, August). BrainNet: Epileptic wave detection from SEEG with hierarchical graph diffusion learning. In <italic>Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining</italic> (pp. 2741-2751). [<uri>https://doi.org/10.1145/3534678.3539178</uri>] </mixed-citation>
      </ref>
      <ref id="ref074">
        <label>[74]</label>
        <mixed-citation> Koelstra, S., Mühl, C., Soleymani, M., Lee, J. S., Yazdani, A., Ebrahimi, T., … &amp; Patras, I. (2011). DEAP: A database for emotion analysis; using physiological signals. <italic>IEEE Transactions on Affective Computing, 3</italic>(1), 18-31. [<uri>https://doi.org/10.1109/T-AFFC.2011.15</uri>] </mixed-citation>
      </ref>
      <ref id="ref075">
        <label>[75]</label>
        <mixed-citation> Soleymani, M., Lichtenauer, J., Pun, T., &amp; Pantic, M. (2011). A multimodal database for affect recognition and implicit tagging. <italic>IEEE Transactions on Affective Computing, 3</italic>(1), 42-55. [<uri>https://doi.org/10.1109/T-AFFC.2011.25</uri>] </mixed-citation>
      </ref>
      <ref id="ref076">
        <label>[76]</label>
        <mixed-citation> Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., … &amp; Liang, P. (2021). On the opportunities and risks of foundation models. <italic>arXiv preprint arXiv:2108.07258</italic>. [<uri>https://doi.org/10.48550/arXiv.2108.07258</uri>] </mixed-citation>
      </ref>
      <ref id="ref077">
        <label>[77]</label>
        <mixed-citation> Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., … &amp; Amodei, D. (2020). Language models are few-shot learners. <italic>arXiv preprint arXiv:2005.14165</italic>. </mixed-citation>
      </ref>
      <ref id="ref078">
        <label>[78]</label>
        <mixed-citation> Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., … &amp; Sutskever, I. (2021, July). Learning transferable visual models from natural language supervision. In <italic>International Conference on Machine Learning</italic> (pp. 8748-8763). PMLR. </mixed-citation>
      </ref>
      <ref id="ref079">
        <label>[79]</label>
        <mixed-citation> Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., … &amp; Girshick, R. (2023). Segment anything. In <italic>Proceedings of the IEEE/CVF International Conference on Computer Vision</italic> (pp. 4015-4026). </mixed-citation>
      </ref>
      <ref id="ref080">
        <label>[80]</label>
        <mixed-citation> Wagh, N., &amp; Varatharajah, Y. (2020, November). EEG-GCNN: Augmenting electroencephalogram-based neurological disease diagnosis using a domain-guided graph convolutional neural network. In <italic>Machine Learning for Health</italic> (pp. 367-378). PMLR. </mixed-citation>
      </ref>
      <ref id="ref081">
        <label>[81]</label>
        <mixed-citation> Zhang, D., Yuan, Z., Yang, Y., Chen, J., Wang, J., &amp; Li, Y. (2024). Brant: Foundation model for intracranial neural signal. <italic>Advances in Neural Information Processing Systems, 36</italic>. </mixed-citation>
      </ref>
      <ref id="ref082">
        <label>[82]</label>
        <mixed-citation> Cui, W., Jeong, W., Thölke, P., Medani, T., Jerbi, K., Joshi, A. A., &amp; Leahy, R. M. (2024, May). Neuro-GPT: Towards a foundation model for EEG. In <italic>2024 IEEE International Symposium on Biomedical Imaging (ISBI)</italic> (pp. 1-5). IEEE. [<uri>https://doi.org/10.1109/ISBI56570.2024.10635453</uri>] </mixed-citation>
      </ref>
      <ref id="ref083">
        <label>[83]</label>
        <mixed-citation> Abbaspourazad, S., Elachqar, O., Miller, A. C., Emrani, S., Nallasamy, U., &amp; Shapiro, I. (2023). Large-scale training of foundation models for wearable biosignals. <italic>arXiv preprint arXiv:2312.05409</italic>. [<uri>https://doi.org/10.48550/arXiv.2312.05409</uri>] </mixed-citation>
      </ref>
      <ref id="ref084">
        <label>[84]</label>
        <mixed-citation> Zhang, D., Yuan, Z., Chen, J., Chen, K., &amp; Yang, Y. (2024, August). Brant-X: A Unified Physiological Signal Alignment Framework. In <italic>Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining</italic> (pp. 4155-4166). [<uri>https://doi.org/10.1145/3637528.3671953</uri>] </mixed-citation>
      </ref>
      <ref id="ref085">
        <label>[85]</label>
        <mixed-citation> Yuan, Z., Zhang, D., Chen, J., Gu, G., &amp; Yang, Y. (2024). Brant-2: Foundation Model for Brain Signals. <italic>arXiv preprint arXiv:2402.10251</italic>. [<uri>https://doi.org/10.48550/arXiv.2402.10251</uri>] </mixed-citation>
      </ref>
      <ref id="ref086">
        <label>[86]</label>
        <mixed-citation> Chen, Y., Ren, K., Song, K., Wang, Y., Wang, Y., Li, D., &amp; Qiu, L. (2024). EEGFormer: Towards transferable and interpretable large-scale EEG foundation model. <italic>arXiv preprint arXiv:2401.10278</italic>. [<uri>https://doi.org/10.48550/arXiv.2401.10278</uri>] </mixed-citation>
      </ref>
      <ref id="ref087">
        <label>[87]</label>
        <mixed-citation> Wang, C., Subramaniam, V., Yaari, A. U., Kreiman, G., Katz, B., Cases, I., &amp; Barbu, A. (2023). BrainBERT: Self-supervised representation learning for intracranial recordings. <italic>arXiv preprint arXiv:2302.14367</italic>. [<uri>https://doi.org/10.48550/arXiv.2302.14367</uri>] </mixed-citation>
      </ref>
      <ref id="ref088">
        <label>[88]</label>
        <mixed-citation> Apple Heart &amp; Movement Study – Study site for information and progress updates for AH&amp;MS. Retrieved from <ext-link xlink:href="https://appleheartandmovementstudy.bwh.harvard.edu/">https://appleheartandmovementstudy.bwh.harvard.edu/</ext-link> </mixed-citation>
      </ref>
      <ref id="ref089">
        <label>[89]</label>
        <mixed-citation> LeCun, Y., Bottou, L., Bengio, Y., &amp; Haffner, P. (1998). Gradient-based learning applied to document recognition. <italic>Proceedings of the IEEE, 86</italic>(11), 2278-2324. </mixed-citation>
      </ref>
      <ref id="ref090">
        <label>[90]</label>
        <mixed-citation> Zaremba, W., Sutskever, I., &amp; Vinyals, O. (2014). Recurrent neural network regularization. <italic>arXiv preprint arXiv:1409.2329</italic>. [<uri>https://doi.org/10.48550/arXiv.1409.2329</uri>] </mixed-citation>
      </ref>
      <ref id="ref091">
        <label>[91]</label>
        <mixed-citation> Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … &amp; Polosukhin, I. (2017). Attention is all you need. <italic>Advances in Neural Information Processing Systems, 30</italic>. </mixed-citation>
      </ref>
      <ref id="ref092">
        <label>[92]</label>
        <mixed-citation> Gu, A., &amp; Dao, T. (2023). Mamba: Linear-time sequence modeling with selective state spaces. <italic>arXiv preprint arXiv:2312.00752</italic>. [<uri>https://doi.org/10.48550/arXiv.2312.00752</uri>] </mixed-citation>
      </ref>
      <ref id="ref093">
        <label>[93]</label>
        <mixed-citation> Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M. A., Lacroix, T., … &amp; Lample, G. (2023). LLaMA: Open and efficient foundation language models. <italic>arXiv preprint arXiv:2302.13971</italic>. [<uri>https://doi.org/10.48550/arXiv.2302.13971</uri>] </mixed-citation>
      </ref>
      <ref id="ref094">
        <label>[94]</label>
        <mixed-citation> Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., … &amp; Scialom, T. (2023). Llama 2: Open foundation and fine-tuned chat models. <italic>arXiv preprint arXiv:2307.09288</italic>. [<uri>https://doi.org/10.48550/arXiv.2307.09288</uri>] </mixed-citation>
      </ref>
      <ref id="ref095">
        <label>[95]</label>
        <mixed-citation> Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., … &amp; McGrew, B. (2023). GPT-4 technical report. <italic>arXiv preprint arXiv:2303.08774</italic>. [<uri>https://doi.org/10.48550/arXiv.2303.08774</uri>] </mixed-citation>
      </ref>
      <ref id="ref096">
        <label>[96]</label>
        <mixed-citation> Iapascurta, V., &amp; Fiodorov, I. (2023, September). NLP Tools for Epileptic Seizure Prediction Using EEG Data: A Comparative Study of Three ML Models. In <italic>International Conference on Nanotechnologies and Biomedical Engineering</italic> (pp. 170-180). Cham: Springer Nature Switzerland. </mixed-citation>
      </ref>
      <ref id="ref097">
        <label>[97]</label>
        <mixed-citation> bbrinkm, &amp; Cukierski, W. (2014). <italic>American Epilepsy Society Seizure Prediction Challenge</italic>. Retrieved from <ext-link xlink:href="https://kaggle.com/competitions/seizure-prediction">https://kaggle.com/competitions/seizure-prediction</ext-link> </mixed-citation>
      </ref>
      <ref id="ref098">
        <label>[98]</label>
        <mixed-citation> Xue, H., &amp; Salim, F. D. (2023). PromptCast: A new prompt-based learning paradigm for time series forecasting. <italic>IEEE Transactions on Knowledge and Data Engineering</italic>. [<uri>https://doi.org/10.1109/TKDE.2023.3342137</uri>] </mixed-citation>
      </ref>
      <ref id="ref099">
        <label>[99]</label>
        <mixed-citation> Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., … &amp; Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. <italic>Journal of Machine Learning Research, 21</italic>(140), 1-67. </mixed-citation>
      </ref>
      <ref id="ref100">
        <label>[100]</label>
        <mixed-citation> Cleveland, R. B., Cleveland, W. S., McRae, J. E., &amp; Terpenning, I. (1990). STL: A seasonal-trend decomposition procedure based on loess. <italic>Journal of Official Statistics, 6</italic>(1), 3-73. </mixed-citation>
      </ref>
      <ref id="ref101">
        <label>[101]</label>
        <mixed-citation> Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., … &amp; Chen, W. (2021). LoRA: Low-rank adaptation of large language models. <italic>arXiv preprint arXiv:2106.09685</italic>. [<uri>https://doi.org/10.48550/arXiv.2106.09685</uri>] </mixed-citation>
      </ref>
      <ref id="ref102">
        <label>[102]</label>
        <mixed-citation> Wu, H., Xu, J., Wang, J., &amp; Long, M. (2021). Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. <italic>Advances in Neural Information Processing Systems, 34</italic>, 22419-22430. </mixed-citation>
      </ref>
      <ref id="ref103">
        <label>[103]</label>
        <mixed-citation> Chang, C., Peng, W. C., &amp; Chen, T. F. (2023). LLM4TS: Two-stage fine-tuning for time-series forecasting with pre-trained LLMs. <italic>arXiv preprint arXiv:2308.08469</italic>. [<uri>https://doi.org/10.48550/arXiv.2308.08469</uri>] </mixed-citation>
      </ref>
      <ref id="ref104">
        <label>[104]</label>
        <mixed-citation> Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., &amp; Sutskever, I. (2019). Language models are unsupervised multitask learners. <italic>OpenAI blog, 1</italic>(8), 9. </mixed-citation>
      </ref>
      <ref id="ref105">
        <label>[105]</label>
        <mixed-citation> Jin, M., Wang, S., Ma, L., Chu, Z., Zhang, J. Y., Shi, X., … &amp; Wen, Q. (2023). Time-LLM: Time series forecasting by reprogramming large language models. <italic>arXiv preprint arXiv:2310.01728</italic>. [<uri>https://doi.org/10.48550/arXiv.2310.01728</uri>] </mixed-citation>
      </ref>
      <ref id="ref106">
        <label>[106]</label>
        <mixed-citation> Pan, Z., Jiang, Y., Garg, S., Schneider, A., Nevmyvaka, Y., &amp; Song, D. (2024). S<sup>2</sup>IP-LLM: Semantic Space Informed Prompt Learning with LLM for Time Series Forecasting. In <italic>Forty-first International Conference on Machine Learning</italic>. </mixed-citation>
      </ref>
      <ref id="ref107">
        <label>[107]</label>
        <mixed-citation> Zhou, T., Niu, P., Sun, L., &amp; Jin, R. (2023). One fits all: Power general time series analysis by pretrained LM. <italic>Advances in Neural Information Processing Systems, 36</italic>, 43322-43355. </mixed-citation>
      </ref>
      <ref id="ref108">
        <label>[108]</label>
        <mixed-citation> Bagnall, A., Dau, H. A., Lines, J., Flynn, M., Large, J., Bostrom, A., … &amp; Keogh, E. (2018). The UEA multivariate time series classification archive, 2018. <italic>arXiv preprint arXiv:1811.00075</italic>. [<uri>https://doi.org/10.48550/arXiv.1811.00075</uri>] </mixed-citation>
      </ref>
      <ref id="ref109">
        <label>[109]</label>
        <mixed-citation> Sun, C., Li, H., Li, Y., &amp; Hong, S. (2023). TEST: Text prototype aligned embedding to activate LLM's ability for time series. <italic>arXiv preprint arXiv:2308.08241</italic>. [<uri>https://doi.org/10.48550/arXiv.2308.08241</uri>] </mixed-citation>
      </ref>
      <ref id="ref110">
        <label>[110]</label>
        <mixed-citation> Zhang, Y., Yang, S., Cauwenberghs, G., &amp; Jung, T. P. (2024). From Word Embedding to Reading Embedding Using Large Language Model, EEG and Eye-tracking. <italic>arXiv preprint arXiv:2401.15681</italic>. [<uri>https://doi.org/10.48550/arXiv.2401.15681</uri>] </mixed-citation>
      </ref>
      <ref id="ref111">
        <label>[111]</label>
        <mixed-citation> Hollenstein, N., Rotsztejn, J., Troendle, M., Pedroni, A., Zhang, C., &amp; Langer, N. (2018). ZuCo, a simultaneous EEG and eye-tracking resource for natural sentence reading. <italic>Scientific Data, 5</italic>(1), 1-13. </mixed-citation>
      </ref>
      <ref id="ref112">
        <label>[112]</label>
        <mixed-citation> Qiu, J., Han, W., Zhu, J., Xu, M., Weber, D., Li, B., &amp; Zhao, D. (2023, December). Can brain signals reveal inner alignment with human languages? In <italic>Findings of the Association for Computational Linguistics: EMNLP 2023</italic> (pp. 1789-1804). [<uri>https://doi.org/10.18653/v1/2023.findings-emnlp.120</uri>] </mixed-citation>
      </ref>
      <ref id="ref113">
        <label>[113]</label>
        <mixed-citation> Park, C. Y., Cha, N., Kang, S., Kim, A., Khandoker, A. H., Hadjileontiadis, L., … &amp; Lee, U. (2020). K-EmoCon, a multimodal sensor dataset for continuous emotion recognition in naturalistic conversations. <italic>Scientific Data, 7</italic>(1), 293. </mixed-citation>
      </ref>
      <ref id="ref114">
        <label>[114]</label>
        <mixed-citation> Li, J., Liu, C., Cheng, S., Arcucci, R., &amp; Hong, S. (2024, January). Frozen language model helps ECG zero-shot learning. In <italic>Medical Imaging with Deep Learning</italic> (pp. 402-415). PMLR. </mixed-citation>
      </ref>
      <ref id="ref115">
        <label>[115]</label>
        <mixed-citation> Alsentzer, E., Murphy, J. R., Boag, W., Weng, W. H., Jin, D., Naumann, T., &amp; McDermott, M. (2019). Publicly available clinical BERT embeddings. <italic>arXiv preprint arXiv:1904.03323</italic>. [<uri>https://doi.org/10.48550/arXiv.1904.03323</uri>] </mixed-citation>
      </ref>
      <ref id="ref116">
        <label>[116]</label>
        <mixed-citation> Wagner, P., Strodthoff, N., Bousseljot, R. D., Kreiseler, D., Lunze, F. I., Samek, W., &amp; Schaeffter, T. (2020). PTB-XL, a large publicly available electrocardiography dataset. <italic>Scientific Data, 7</italic>(1), 1-15. </mixed-citation>
      </ref>
      <ref id="ref117">
        <label>[117]</label>
        <mixed-citation> Moody, G. B., &amp; Mark, R. G. (2001). The impact of the MIT-BIH arrhythmia database. <italic>IEEE Engineering in Medicine and Biology Magazine, 20</italic>(3), 45-50. [<uri>https://doi.org/10.1109/51.932724</uri>] </mixed-citation>
      </ref>
      <ref id="ref118">
        <label>[118]</label>
        <mixed-citation> Jia, F., Wang, K., Zheng, Y., Cao, D., &amp; Liu, Y. (2024, March). GPT4MTS: Prompt-based Large Language Model for Multimodal Time-series Forecasting. In <italic>Proceedings of the AAAI Conference on Artificial Intelligence</italic> (Vol. 38, No. 21, pp. 23343-23351). [<uri>https://doi.org/10.1609/aaai.v38i21.30383</uri>] </mixed-citation>
      </ref>
      <ref id="ref119">
        <label>[119]</label>
        <mixed-citation> Yu, H., Guo, P., &amp; Sano, A. (2024). ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text. <italic>arXiv preprint arXiv:2405.19366</italic>. [<uri>https://doi.org/10.48550/arXiv.2405.19366</uri>] </mixed-citation>
      </ref>
      <ref id="ref120">
        <label>[120]</label>
        <mixed-citation> Yasunaga, M., Leskovec, J., &amp; Liang, P. (2022). LinkBERT: Pretraining language models with document links. <italic>arXiv preprint arXiv:2203.15827</italic>. [<uri>https://doi.org/10.48550/arXiv.2203.15827</uri>] </mixed-citation>
      </ref>
      <ref id="ref121">
        <label>[121]</label>
        <mixed-citation> Zheng, J., Chu, H., Struppa, D., Zhang, J., Yacoub, S. M., El-Askary, H., … &amp; Rakovski, C. (2020). Optimal multi-stage arrhythmia classification approach. <italic>Scientific Reports, 10</italic>(1), 2898. </mixed-citation>
      </ref>
      <ref id="ref122">
        <label>[122]</label>
        <mixed-citation> Cheng, M., Chen, Y., Liu, Q., Liu, Z., &amp; Luo, Y. (2024). Advancing Time Series Classification with Multimodal Language Modeling. <italic>arXiv preprint arXiv:2403.12371</italic>. [<uri>https://doi.org/10.48550/arXiv.2403.12371</uri>] </mixed-citation>
      </ref>
      <ref id="ref123">
        <label>[123]</label>
        <mixed-citation> van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., … &amp; Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. <italic>arXiv preprint arXiv:1609.03499</italic>. </mixed-citation>
      </ref>
      <ref id="ref124">
        <label>[124]</label>
        <mixed-citation> Cheng, M., Liu, Q., Liu, Z., Zhang, H., Zhang, R., &amp; Chen, E. (2023). TimeMAE: Self-supervised representations of time series with decoupled masked autoencoders. <italic>arXiv preprint arXiv:2303.00320</italic>. [<uri>https://doi.org/10.48550/arXiv.2303.00320</uri>] </mixed-citation>
      </ref>
      <ref id="ref125">
        <label>[125]</label>
        <mixed-citation> Liu, M., Ren, S., Ma, S., Jiao, J., Chen, Y., Wang, Z., &amp; Song, W. (2021). Gated transformer networks for multivariate time series classification. <italic>arXiv preprint arXiv:2103.14438</italic>. [<uri>https://doi.org/10.48550/arXiv.2103.14438</uri>] </mixed-citation>
      </ref>
      <ref id="ref126">
        <label>[126]</label>
        <mixed-citation> Cheng, M., Tao, X., Liu, Q., Zhang, H., Chen, Y., &amp; Lei, C. (2024). Learning Transferable Time Series Classifier with Cross-Domain Pre-training from Language Model. <italic>arXiv preprint arXiv:2403.12372</italic>. [<uri>https://doi.org/10.48550/arXiv.2403.12372</uri>] </mixed-citation>
      </ref>
      <ref id="ref127">
        <label>[127]</label>
        <mixed-citation> Kim, J. W., Alaa, A., &amp; Bernardo, D. (2024). EEG-GPT: exploring capabilities of large language models for EEG classification and interpretation. <italic>arXiv preprint arXiv:2401.18006</italic>. [<uri>https://doi.org/10.48550/arXiv.2401.18006</uri>] </mixed-citation>
      </ref>
      <ref id="ref128">
        <label>[128]</label>
        <mixed-citation> Wang, Y., Jin, R., Wu, M., Li, X., Xie, L., &amp; Chen, Z. (2024). K-Link: Knowledge-Link Graph from LLMs for Enhanced Representation Learning in Multivariate Time-Series Data. <italic>arXiv preprint arXiv:2403.03645</italic>. [<uri>https://doi.org/10.48550/arXiv.2403.03645</uri>] </mixed-citation>
      </ref>
      <ref id="ref129">
        <label>[129]</label>
        <mixed-citation> Han, Z., Gao, C., Liu, J., Zhang, J., &amp; Zhang, S. Q. (2024). Parameter-efficient fine-tuning for large models: A comprehensive survey. <italic>arXiv preprint arXiv:2403.14608</italic>. [<uri>https://doi.org/10.48550/arXiv.2403.14608</uri>] </mixed-citation>
      </ref>
      <ref id="ref130">
        <label>[130]</label>
        <mixed-citation> Lester, B., Al-Rfou, R., &amp; Constant, N. (2021). The power of scale for parameter-efficient prompt tuning. <italic>arXiv preprint arXiv:2104.08691</italic>. [<uri>https://doi.org/10.48550/arXiv.2104.08691</uri>] </mixed-citation>
      </ref>
      <ref id="ref131">
        <label>[131]</label>
        <mixed-citation> Hinton, G., Vinyals, O., &amp; Dean, J. (2015). Distilling the Knowledge in a Neural Network. <italic>arXiv preprint arXiv:1503.02531</italic>. [<uri>https://doi.org/10.48550/arXiv.1503.02531</uri>] </mixed-citation>
      </ref>
      <ref id="ref132">
        <label>[132]</label>
        <mixed-citation> Jiang, Y., Pan, Z., Zhang, X., Garg, S., Schneider, A., Nevmyvaka, Y., &amp; Song, D. (2024). Empowering time series analysis with large language models: A survey. <italic>arXiv preprint arXiv:2402.03182</italic>. [<uri>https://doi.org/10.48550/arXiv.2402.03182</uri>] </mixed-citation>
      </ref>
      <ref id="ref133">
        <label>[133]</label>
        <mixed-citation> Wang, Z., &amp; Ji, H. (2022, June). Open vocabulary electroencephalography-to-text decoding and zero-shot sentiment classification. In <italic>Proceedings of the AAAI Conference on Artificial Intelligence</italic> (Vol. 36, No. 5, pp. 5350-5358). [<uri>https://doi.org/10.1609/aaai.v36i5.20472</uri>] </mixed-citation>
      </ref>
      <ref id="ref134">
        <label>[134]</label>
        <mixed-citation> Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., … &amp; Zettlemoyer, L. (2019). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. <italic>arXiv preprint arXiv:1910.13461</italic>. [<uri>https://doi.org/10.48550/arXiv.1910.13461</uri>] </mixed-citation>
      </ref>
      <ref id="ref135">
        <label>[135]</label>
        <mixed-citation> Cao, D., Jia, F., Arik, S. O., Pfister, T., Zheng, Y., Ye, W., &amp; Liu, Y. (2023). TEMPO: Prompt-based generative pre-trained transformer for time series forecasting. <italic>arXiv preprint arXiv:2310.04948</italic>. [<uri>https://doi.org/10.48550/arXiv.2310.04948</uri>] </mixed-citation>
      </ref>
      <ref id="ref136">
        <label>[136]</label>
        <mixed-citation> Liu, P., Guo, H., Dai, T., Li, N., Bao, J., Ren, X., … &amp; Xia, S. T. (2024). Taming Pre-trained LLMs for Generalised Time Series Forecasting via Cross-modal Knowledge Distillation. <italic>arXiv preprint arXiv:2403.07300</italic>. [<uri>https://doi.org/10.48550/arXiv.2403.07300</uri>] </mixed-citation>
      </ref>
      <ref id="ref137">
        <label>[137]</label>
        <mixed-citation> Tan, M., Merrill, M. A., Gupta, V., Althoff, T., &amp; Hartvigsen, T. (2024, June). Are language models actually useful for time series forecasting? In <italic>The Thirty-eighth Annual Conference on Neural Information Processing Systems</italic>. </mixed-citation>
      </ref>
      <ref id="ref138">
        <label>[138]</label>
        <mixed-citation> Zheng, L. N., Dong, C. G., Zhang, W. E., Yue, L., Xu, M., Maennel, O., &amp; Chen, W. (2024). Revisited Large Language Model for Time Series Analysis through Modality Alignment. <italic>arXiv preprint arXiv:2410.12326</italic>. [<uri>https://doi.org/10.48550/arXiv.2410.12326</uri>] </mixed-citation>
      </ref>
      <ref id="ref139">
        <label>[139]</label>
        <mixed-citation> Zhou, T., Niu, P., Wang, X., Sun, L., &amp; Jin, R. (2023). One fits all: Universal time series analysis by pretrained LM and specially designed adaptors. <italic>arXiv preprint arXiv:2311.14782</italic>. [<uri>https://doi.org/10.48550/arXiv.2311.14782</uri>] </mixed-citation>
      </ref>
      <ref id="ref140">
        <label>[140]</label>
        <mixed-citation> Zhang, Y., Li, Q., Nahata, S., Jamal, T., Cheng, S. K., Cauwenberghs, G., &amp; Jung, T. P. (2024). Integrating large language model, EEG, and eye-tracking for word-level neural state classification in reading comprehension. <italic>IEEE Transactions on Neural Systems and Rehabilitation Engineering</italic>. [<uri>https://doi.org/10.1109/TNSRE.2024.3435460</uri>] </mixed-citation>
      </ref>
      <ref id="ref141">
        <label>[141]</label>
        <mixed-citation> Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … &amp; Bengio, Y. (2020). Generative adversarial networks. <italic>Communications of the ACM, 63</italic>(11), 139-144. [<uri>https://doi.org/10.1145/3422622</uri>] </mixed-citation>
      </ref>
      <ref id="ref142">
        <label>[142]</label>
        <mixed-citation> Ho, J., Jain, A., &amp; Abbeel, P. (2020). Denoising diffusion probabilistic models. <italic>Advances in Neural Information Processing Systems, 33</italic>, 6840-6851. </mixed-citation>
      </ref>
      <ref id="ref143">
        <label>[143]</label>
        <mixed-citation> Kavasidis, I., Palazzo, S., Spampinato, C., Giordano, D., &amp; Shah, M. (2017, October). Brain2Image: Converting brain signals into images. In <italic>Proceedings of the 25th ACM International Conference on Multimedia</italic> (pp. 1809-1817). [<uri>https://doi.org/10.1145/3123266.3127907</uri>] </mixed-citation>
      </ref>
      <ref id="ref144">
        <label>[144]</label>
        <mixed-citation> Spampinato, C., Palazzo, S., Kavasidis, I., Giordano, D., Souly, N., &amp; Shah, M. (2017). Deep learning human mind for automated visual classification. In <italic>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</italic> (pp. 6809-6817). </mixed-citation>
      </ref>
      <ref id="ref145">
        <label>[145]</label>
        <mixed-citation> Tirupattur, P., Rawat, Y. S., Spampinato, C., &amp; Shah, M. (2018, October). ThoughtViz: Visualizing human thoughts using generative adversarial network. In <italic>Proceedings of the 26th ACM International Conference on Multimedia</italic> (pp. 950-958). [<uri>https://doi.org/10.1145/3240508.3240641</uri>] </mixed-citation>
      </ref>
      <ref id="ref146">
        <label>[146]</label>
        <mixed-citation> Kumar, P., Saini, R., Roy, P. P., Sahu, P. K., &amp; Dogra, D. P. (2018). Envisioned speech recognition using EEG sensors. <italic>Personal and Ubiquitous Computing, 22</italic>, 185-199. </mixed-citation>
      </ref>
      <ref id="ref147">
        <label>[147]</label>
        <mixed-citation> Singh, P., Pandey, P., Miyapuram, K., &amp; Raman, S. (2023, June). EEG2IMAGE: image reconstruction from EEG brain signals. In <italic>ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</italic> (pp. 1-5). IEEE. [<uri>https://doi.org/10.1109/ICASSP49357.2023.10096587</uri>] </mixed-citation>
      </ref>
      <ref id="ref148">
        <label>[148]</label>
        <mixed-citation> Singh, P., Dalal, D., Vashishtha, G., Miyapuram, K., &amp; Raman, S. (2024). Learning Robust Deep Visual Representations from EEG Brain Recordings. In <italic>Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision</italic> (pp. 7553-7562). </mixed-citation>
      </ref>
      <ref id="ref149">
        <label>[149]</label>
        <mixed-citation> Kaneshiro, B., Perreau Guimaraes, M., Kim, H. S., Norcia, A. M., &amp; Suppes, P. (2015). A representational similarity analysis of the dynamics of object processing using single-trial EEG classification. <italic>PLoS ONE, 10</italic>(8), e0135697. [<uri>https://doi.org/10.1371/journal.pone.0135697</uri>] </mixed-citation>
      </ref>
      <ref id="ref150">
        <label>[150]</label>
        <mixed-citation> Bai, Y., Wang, X., Cao, Y. P., Ge, Y., Yuan, C., &amp; Shan, Y. (2023). DreamDiffusion: Generating high-quality images from brain EEG signals. <italic>arXiv preprint arXiv:2306.16934</italic>. [<uri>https://doi.org/10.48550/arXiv.2306.16934</uri>] </mixed-citation>
      </ref>
      <ref id="ref151">
        <label>[151]</label>
        <mixed-citation> Lan, Y. T., Ren, K., Wang, Y., Zheng, W. L., Li, D., Lu, B. L., &amp; Qiu, L. (2023). Seeing through the brain: image reconstruction of visual perception from human brain signals. <italic>arXiv preprint arXiv:2308.02510</italic>. [<uri>https://doi.org/10.48550/arXiv.2308.02510</uri>] </mixed-citation>
      </ref>
      <ref id="ref152">
        <label>[152]</label>
        <mixed-citation> Liu, H., Hajialigol, D., Antony, B., Han, A., &amp; Wang, X. (2024). EEG2TEXT: Open Vocabulary EEG-to-Text Decoding with EEG Pre-Training and Multi-View Transformer. <italic>arXiv preprint arXiv:2405.02165</italic>. [<uri>https://doi.org/10.48550/arXiv.2405.02165</uri>] </mixed-citation>
      </ref>
      <ref id="ref153">
        <label>[153]</label>
        <mixed-citation> Gifford, A. T., Dwivedi, K., Roig, G., &amp; Cichy, R. M. (2022). A large and rich EEG dataset for modeling human visual object recognition. <italic>NeuroImage, 264</italic>, 119754. [<uri>https://doi.org/10.1016/j.neuroimage.2022.119754</uri>] </mixed-citation>
      </ref>
      <ref id="ref154">
        <label>[154]</label>
        <mixed-citation> Wang, J., Song, Z., Ma, Z., Qiu, X., Zhang, M., &amp; Zhang, Z. (2024). Enhancing EEG-to-Text Decoding through Transferable Representations from Pre-trained Contrastive EEG-Text Masked Autoencoder. <italic>arXiv preprint arXiv:2402.17433</italic>. [<uri>https://doi.org/10.18653/v1/2024.acl-long.393</uri>] </mixed-citation>
      </ref>
      <ref id="ref155">
        <label>[155]</label>
        <mixed-citation> Duan, Y., Chau, C., Wang, Z., Wang, Y. K., &amp; Lin, C. T. (2024). DeWave: Discrete encoding of EEG waves for EEG to text translation. <italic>Advances in Neural Information Processing Systems, 36</italic>. </mixed-citation>
      </ref>
      <ref id="ref156">
        <label>[156]</label>
        <mixed-citation> Guo, Y., Liu, T., Zhang, X., Wang, A., &amp; Wang, W. (2023). End-to-end translation of human neural activity to speech with a dual–dual generative adversarial network. <italic>Knowledge-Based Systems, 277</italic>, 110837. [<uri>https://doi.org/10.1016/j.knosys.2023.110837</uri>] </mixed-citation>
      </ref>
      <ref id="ref157">
        <label>[157]</label>
        <mixed-citation> Daly, I. (2023). Neural decoding of music from the EEG. <italic>Scientific Reports, 13</italic>(1), 624. [<uri>https://doi.org/10.1038/s41598-022-27361-x</uri>] </mixed-citation>
      </ref>
      <ref id="ref158">
        <label>[158]</label>
        <mixed-citation> Radford, A., Metz, L., &amp; Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. <italic>arXiv preprint arXiv:1511.06434</italic>. [<uri>https://doi.org/10.48550/arXiv.1511.06434</uri>] </mixed-citation>
      </ref>
      <ref id="ref159">
        <label>[159]</label>
        <mixed-citation> Karras, T., Aittala, M., Hellsten, J., Laine, S., Lehtinen, J., &amp; Aila, T. (2020). Training generative adversarial networks with limited data. <italic>Advances in Neural Information Processing Systems, 33</italic>, 12104-12114. </mixed-citation>
      </ref>
      <ref id="ref160">
        <label>[160]</label>
        <mixed-citation> Jayaram, V., &amp; Barachant, A. (2018). MOABB: Trustworthy algorithm benchmarking for BCIs. <italic>Journal of Neural Engineering, 15</italic>(6), 066011. </mixed-citation>
      </ref>
      <ref id="ref161">
        <label>[161]</label>
        <mixed-citation> Blankertz, B., Dornhege, G., Krauledat, M., Müller, K. R., &amp; Curio, G. (2007). The non-invasive Berlin brain–computer interface: fast acquisition of effective performance in untrained subjects. <italic>NeuroImage, 37</italic>(2), 539-550. [<uri>https://doi.org/10.1016/j.neuroimage.2007.01.051</uri>] </mixed-citation>
      </ref>
      <ref id="ref162">
        <label>[162]</label>
        <mixed-citation> Bradley, A. P. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. <italic>Pattern Recognition, 30</italic>(7), 1145-1159. [<uri>https://doi.org/10.1016/S0031-3203(96)00142-2</uri>] </mixed-citation>
      </ref>
      <ref id="ref163">
        <label>[163]</label>
        <mixed-citation> Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., &amp; Chen, X. (2016). Improved techniques for training GANs. <italic>Advances in Neural Information Processing Systems, 29</italic>. </mixed-citation>
      </ref>
      <ref id="ref164">
        <label>[164]</label>
        <mixed-citation> Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., &amp; Hochreiter, S. (2017). GANs trained by a two time-scale update rule converge to a local Nash equilibrium. <italic>Advances in Neural Information Processing Systems, 30</italic>. </mixed-citation>
      </ref>
      <ref id="ref165">
        <label>[165]</label>
        <mixed-citation> Bińkowski, M., Sutherland, D. J., Arbel, M., &amp; Gretton, A. (2018). Demystifying MMD GANs. <italic>arXiv preprint arXiv:1801.01401</italic>. [<uri>https://doi.org/10.48550/arXiv.1801.01401</uri>] </mixed-citation>
      </ref>
      <ref id="ref166">
        <label>[166]</label>
        <mixed-citation> Wang, Z., Bovik, A. C., Sheikh, H. R., &amp; Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. <italic>IEEE Transactions on Image Processing, 13</italic>(4), 600-612. [<uri>https://doi.org/10.1109/TIP.2003.819861</uri>] </mixed-citation>
      </ref>
      <ref id="ref167">
        <label>[167]</label>
        <mixed-citation> Papineni, K., Roukos, S., Ward, T., &amp; Zhu, W. J. (2002, July). BLEU: A method for automatic evaluation of machine translation. In <italic>Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics</italic> (pp. 311-318). </mixed-citation>
      </ref>
      <ref id="ref168">
        <label>[168]</label>
        <mixed-citation> Lin, C. Y. (2004, July). ROUGE: A package for automatic evaluation of summaries. In <italic>Text Summarization Branches Out</italic> (pp. 74-81). </mixed-citation>
      </ref>
      <ref id="ref169">
        <label>[169]</label>
        <mixed-citation> Kubichek, R. (1993, May). Mel-cepstral distance measure for objective speech quality assessment. In <italic>Proceedings of IEEE Pacific Rim Conference on Communications, Computers and Signal Processing</italic> (Vol. 1, pp. 125-128). IEEE. [<uri>https://doi.org/10.1109/PACRIM.1993.407206</uri>] </mixed-citation>
      </ref>
      <ref id="ref170">
        <label>[170]</label>
        <mixed-citation> Dao, T., &amp; Gu, A. (2024). Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality. <italic>arXiv preprint arXiv:2405.21060</italic>. [<uri>https://doi.org/10.48550/arXiv.2405.21060</uri>] </mixed-citation>
      </ref>
      <ref id="ref171">
        <label>[171]</label>
        <mixed-citation> Liu, Z., Wang, Y., Vaidya, S., Ruehle, F., Halverson, J., Soljačić, M., … &amp; Tegmark, M. (2024). KAN: Kolmogorov-Arnold networks. <italic>arXiv preprint arXiv:2404.19756</italic>. [<uri>https://doi.org/10.48550/arXiv.2404.19756</uri>] </mixed-citation>
      </ref>
      <ref id="ref172">
        <label>[172]</label>
        <mixed-citation> Ni, R., Lin, Z., Wang, S., &amp; Fanti, G. (2024, April). Mixture-of-Linear-Experts for Long-term Time Series Forecasting. In <italic>International Conference on Artificial Intelligence and Statistics</italic> (pp. 4672-4680). PMLR. </mixed-citation>
      </ref>
      <ref id="ref173">
        <label>[173]</label>
        <mixed-citation> Yu, C., Wang, F., Shao, Z., Qian, T., Zhang, Z., Wei, W., &amp; Xu, Y. (2024, August). GinAR: An end-to-end multivariate time series forecasting model suitable for variable missing. In <italic>Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining</italic> (pp. 3989-4000). [<uri>https://doi.org/10.1145/3637528.3672055</uri>] </mixed-citation>
      </ref>
      <ref id="ref174">
        <label>[174]</label>
        <mixed-citation> Qiao, Z., Pham, Q., Cao, Z., Le, H. H., Suganthan, P. N., Jiang, X., &amp; Savitha, R. (2024). Class-incremental learning for time series: Benchmark and evaluation. <italic>arXiv preprint arXiv:2402.12035</italic>. [<uri>https://doi.org/10.48550/arXiv.2402.12035</uri>] </mixed-citation>
      </ref>
      <ref id="ref175">
        <label>[175]</label>
        <mixed-citation> Ragab, M., Eldele, E., Wu, M., Foo, C. S., Li, X., &amp; Chen, Z. (2023, August). Source-free domain adaptation with temporal imputation for time series data. In <italic>Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining</italic> (pp. 1989-1998). [<uri>https://doi.org/10.1145/3580305.3599507</uri>] </mixed-citation>
      </ref>
      <ref id="ref176">
        <label>[176]</label>
        <mixed-citation> Qiu, X., Hu, J., Zhou, L., Wu, X., Du, J., Zhang, B., … &amp; Yang, B. (2024). TFB: Towards comprehensive and fair benchmarking of time series forecasting methods. <italic>arXiv preprint arXiv:2403.20150</italic>. [<uri>https://doi.org/10.48550/arXiv.2403.20150</uri>] </mixed-citation>
      </ref>
      <ref id="ref177">
        <label>[177]</label>
        <mixed-citation> Wang, Y., Wu, H., Dong, J., Liu, Y., Long, M., &amp; Wang, J. (2024). Deep time series models: A comprehensive survey and benchmark. <italic>arXiv preprint arXiv:2407.13278</italic>. [<uri>https://doi.org/10.48550/arXiv.2407.13278</uri>] </mixed-citation>
      </ref>
      <ref id="ref178">
        <label>[178]</label>
        <mixed-citation> Savran, A., Ciftci, K., Chanel, G., Cruz-Mota, J., Viet, L. H., Sankur, B., … &amp; Rombaut, M. (2006). Emotion detection in the loop from brain signals and facial images. In <italic>eNTERFACE'06 - SIMILAR NoE Summer Workshop on Multimodal Interfaces</italic>. </mixed-citation>
      </ref>
      <ref id="ref179">
        <label>[179]</label>
        <mixed-citation> Trujillo, L. T., Stanfield, C. T., &amp; Vela, R. D. (2017). The effect of electroencephalogram (EEG) reference choice on information-theoretic measures of the complexity and integration of EEG signals. <italic>Frontiers in Neuroscience, 11</italic>, 425. [<uri>https://doi.org/10.3389/fnins.2017.00425</uri>] </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>
