<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD with MathML3 v1.1d2 20140930//EN" "JATS-journalpublishing1-mathml3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="1.1d2" xml:lang="en">
  <front>
    <journal-meta>
      <journal-id journal-id-type="nlm-ta">FEIR</journal-id>
      <journal-id journal-id-type="publisher-id">IECE</journal-id>
      <journal-title-group>
        <journal-title xml:lang="zh">Frontiers in Educational Innovation and Research</journal-title>
        <journal-title xml:lang="en">Frontiers in Educational Innovation and Research</journal-title>
      </journal-title-group>
      <issn pub-type="ppub" publication-format="print">1</issn>
      <issn pub-type="epub" publication-format="electronic">1</issn>
      <publisher>
        <publisher-name>Institute of Emerging and Computer Engineering Inc</publisher-name>
        <publisher-loc>522 W RIVERSIDE AVE STE N, SPOKANE, WA, 99201-0508, UNITED STATES</publisher-loc>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.62762/FEIR.2024.416675</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Research Article</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Enhanced Dynamic Label Allocation for Mathematical Formula Named Entity Recognition in Learning Path Recommendations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">https://orcid.org/0009-0003-5015-7807</contrib-id>
          <name>
            <surname>Liu</surname>
            <given-names>Hongchen</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0001-9225-7660</contrib-id>
          <name>
            <surname>Zhang</surname>
            <given-names>Qingchuan</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1"><label>1</label>National Engineering Research Centre for Agri-Product Quality Traceability, Beijing Technology and Business University, Beijing 100048, China</aff>
      </contrib-group>
      <author-notes>
        <corresp id="cor2">Corresponding Author: Qingchuan Zhang. Email: <email>zqc1982@126.com</email></corresp>
      </author-notes>
      <pub-date date-type="pub" pub-type="epub" publication-format="online">
        <day>20</day>
        <month>5</month>
        <year>2025</year>
      </pub-date>
      <volume>1</volume>
      <issue>1</issue>
      <fpage>10</fpage>
      <lpage>21</lpage>
      <history>
        <date date-type="received">
          <day>13</day>
          <month>12</month>
          <year>2024</year>
        </date>
        <date date-type="accepted">
          <day>14</day>
          <month>4</month>
          <year>2025</year>
        </date>
      </history>
      <permissions>
        <copyright-statement>© 2025 by the Authors. Published by Institute of Emerging and Computer Engineers. This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/).</copyright-statement>
        <copyright-year>2025</copyright-year>
        <copyright-holder>Institute of Emerging and Computer Engineering Inc</copyright-holder>
        <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
        <license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
        </license>
      </permissions>
      <self-uri xlink:href="https://www.iece.org/article/abs/feir.2024.416675">This article is available from https://www.iece.org/article/abs/feir.2024.416675</self-uri>
      <abstract>
        <p>In the field of natural language processing, named entity recognition (NER) is an essential task. Mathematical formulas usually contain a large number of technical terms, units of measure and other domain-specific knowledge, and integrating this information into a knowledge graph can significantly enhance the graph's semantic expressiveness. By identifying the named entities in mathematical formulas, the key concepts, entities and relationships between them can be extracted, establishing a basis for the construction of the knowledge graph and making it easier to interpret and analyse in practical applications. Furthermore, the structured knowledge derived from this process can facilitate personalized learning path recommendations by mapping identified entities to educational resources and prerequisite relationships. To address the insufficient ability of existing models to recognize mathematical formula entities, a mathematical formula named entity recognition method with enhanced dynamic label allocation is proposed. A mathematical formula entity recognition model consisting of BERT (Bidirectional Encoder Representation from Transformer), BiLSTM (Bidirectional Long Short-term Memory) and Transformer, named BERT-formula, is constructed. The feature representation of deep semantic information is enhanced by concatenating extra sequences to the original vector representation at the model input. The entity label prediction problem is regarded as a one-to-many linear allocation problem, and an auction algorithm is introduced to obtain the optimal allocation with the smallest cost. Experiments demonstrate that the accuracy of the model on the mathematical formula dataset is 98.8% and the F1 value is 98.8%, improvements of 1.51 and 1.05 percentage points, respectively, over BERT-BiLSTM-CRF. These results indicate that the approach performs well at identifying mathematical formula entities.</p>
      </abstract>
      <kwd-group kwd-group-type="author" xml:lang="en">
        <kwd>named entity recognition (NER)</kwd>
        <kwd>mathematics</kwd>
        <kwd>bidirectional encoder representations from transformer (BERT)</kwd>
        <kwd>deep learning</kwd>
        <kwd>auction algorithm</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="S1">
      <label>1.</label>
      <title>Introduction</title>
      <p id="S1.p1">With the rapid development of information technology, breakthroughs in big data, cloud computing, artificial intelligence and other fields continue to emerge, bringing profound changes to all walks of life. In the field of operations research, mathematical modelling and optimization analysis methods play a crucial role in solving complex problems. Therefore, the mathematical formulas in operations research need to be located and distinguished, and the named entities in them identified accurately, which helps to improve the efficiency and accuracy of operations research.</p>
      <p id="S1.p2">LaTeX is a markup language used to represent formulas in operations research, and it is also used in other industries and professions that work with mathematical formulas. Nonetheless, named entity recognition in operations research has received comparatively little attention, and no sizeable, accessible datasets have been compiled. As a result, applying existing techniques to formulas in operations research yields mediocre results because of the lack of sufficient reference and experimental data.</p>
      <p id="S1.p3">Named entity recognition is a fundamental task for several natural language processing applications such as knowledge base question answering, machine translation, information retrieval, sentiment analysis, and knowledge mapping. The main objective of NER is to find specific entities, such as the names of individuals, locations and organizations, in a piece of text and to distinguish them from non-entities. Named entity recognition on LaTeX text makes further research possible. Accordingly, this paper constructs a dataset from the text of operations research courses in schools and related digital resources, and studies entity recognition in operations research on the basis of this dataset.</p>
      <p id="S1.p4">To address the above issues, this paper proposes a mathematical formula named entity recognition model consisting of BERT, BiLSTM and Transformer. The vector representation of each character is concatenated with the vector representation of randomly initialized instance queries, and the two are jointly fed into BERT for encoding. One-way self-attention is then used to enhance the query semantics, so that the queries can model their connections with each other. Features are subsequently extracted by a BiLSTM layer and a Transformer layer, and the final predicted labels are obtained by solving the allocation problem for the minimum-cost assignment.</p>
    </sec>
    <sec id="S2">
      <label>2.</label>
      <title>Related Work</title>
      <p id="S2.p1">The research progress in named entity recognition techniques can be divided into the following stages:</p>
      <p id="S2.p2">The first stage is the dictionary- and rule-based approach. First, a candidate dictionary is obtained through statistical analysis, and manual screening is then used to extract the important terms in the domain. Using the prior knowledge in the lexicon, potential entities in a sentence are matched and then filtered according to rules. Rule-based approaches tend to rely too heavily on manually defined rules and templates, and thus may have limited coverage of complex linguistic expressions and diverse inputs. Moreover, manually constructed rules are subjective and prone to bias and errors.</p>
      <p id="S2.p3">The second stage is the statistical machine learning based approach, in which researchers began to use statistical models for named entity recognition. For instance, Yu et al. [<xref rid="ref001" ref-type="bibr">1</xref>] used a Hidden Markov Model (HMM) to recognize Chinese named entities; Huang et al. [<xref rid="ref002" ref-type="bibr">2</xref>] suggested an approach combining a Support Vector Machine (SVM) with transformation-based error-driven learning for biological entity recognition; and Feng et al. [<xref rid="ref003" ref-type="bibr">3</xref>] proposed a named entity recognition technique based on conditional random fields. Statistical machine learning methods no longer rely on tedious hand-constructed rules; however, they need large, clearly labelled training sets, which still take considerable effort and resources to produce.</p>
      <p id="S2.p4">The third stage is deep learning based named entity recognition. As the theory and application of deep learning have gained attention, and as algorithms and computer performance have improved, the depth and width of neural networks [<xref rid="ref004" ref-type="bibr">4</xref>] have also increased. As a result, many neural network structures that are well known today have emerged, including the Convolutional Neural Network (CNN) [<xref rid="ref005" ref-type="bibr">5</xref>], the Recurrent Neural Network (RNN) [<xref rid="ref006" ref-type="bibr">6</xref>], the Long Short Term Memory Network (LSTM), and more intricate deep learning models such as BERT (Bidirectional Encoder Representation from Transformer) [<xref rid="ref007" ref-type="bibr">7</xref>]. Deep learning models can learn and extract features from data on their own, in contrast to standard machine learning techniques that require feature extraction by hand. Deep learning thus greatly reduces the human effort required for feature fusion, and deep learning models can automatically learn complex language patterns with strong generalization ability, which enables them to be better applied to practical tasks. For example, unidirectional Long Short Term Memory (LSTM) networks [<xref rid="ref008" ref-type="bibr">8</xref>] are widely used in NER tasks because of their strong sequential feature extraction ability and are often combined with CRF (LSTM-CRF [<xref rid="ref009" ref-type="bibr">9</xref>]) to achieve better recognition results. However, since unidirectional LSTM networks are limited to extracting unidirectional text features, Lample et al. [<xref rid="ref010" ref-type="bibr">10</xref>] proposed a Bidirectional LSTM (BiLSTM) network on this basis to obtain global contextual deep features and combined it with CRF to form a BiLSTM-CRF neural network. This further improves the model's performance, and the model has since gradually become the mainstream deep learning approach to NER problems in various fields. For example, Zhou et al. [<xref rid="ref011" ref-type="bibr">11</xref>] promoted knowledge mining of ancient Chinese medicine books by extracting text features with the BiLSTM-CRF method; Cheng et al. [<xref rid="ref012" ref-type="bibr">12</xref>] applied the BiLSTM-CRF model to ancient Chinese literature, carrying out automatic sentence segmentation, automatic word segmentation, lexical annotation and other processing tasks on ancient Chinese text with very good results.</p>
      <p id="S2.p5">Meanwhile, many scholars have improved and extended the BiLSTM-CRF model. For example, Huang et al. [<xref rid="ref013" ref-type="bibr">13</xref>] introduced an external cybersecurity lexicon to enhance the features of cybersecurity texts based on the BiLSTM-CRF model, and obtained favorable results on a cybersecurity dataset. In the field of agriculture, Zhou et al. [<xref rid="ref014" ref-type="bibr">14</xref>] first split the long texts of the dataset into short texts, encoded them with the ERNIE model to obtain a representation that preserves semantic associations, and then fed this representation into BiLSTM-CRF to address the low efficiency and poor text processing of soil fertility named entity recognition methods. For Chinese, Li et al. [<xref rid="ref015" ref-type="bibr">15</xref>] introduced a hybrid attention mechanism into the BiLSTM-CRF model to achieve good semantic analysis ability and accurately represent the negation semantics in a sentence.</p>
      <p id="S2.p6">The Iterated Dilated Convolutional Neural Network (IDCNN), which can effectively extract local features across a wide receptive field, was first applied to named entity recognition by Strubell et al. [<xref rid="ref016" ref-type="bibr">16</xref>]. To solve the named entity recognition problem in electronic medical records, Chen et al. [<xref rid="ref017" ref-type="bibr">17</xref>] presented an attention mechanism based on the IDCNN-CRF model; the special step size of the dilated convolution allows text features to be extracted more accurately, with excellent recognition results.</p>
      <p id="S2.p7">To improve the semantic representation of word vectors, scholars have proposed pre-trained language models. Peters et al. [<xref rid="ref018" ref-type="bibr">18</xref>] proposed the ELMo model based on the BiLSTM structure, which is able to extract bidirectional textual features. Radford et al. [<xref rid="ref019" ref-type="bibr">19</xref>] proposed the GPT (Generative Pre-trained Transformer) model, which is able to capture more distant semantic information, but because it is a unidirectional model, it is unable to obtain global contextual information. Therefore, Devlin et al. [<xref rid="ref008" ref-type="bibr">8</xref>] from the Google team proposed the BERT model with a bidirectional Transformer encoder structure. This further refines the semantic representation of word vectors and boosts performance on named entity recognition tasks. For example, Li et al. [<xref rid="ref020" ref-type="bibr">20</xref>] applied the BERT-CRF model to the joint extraction of maize breeding entity relationships, which provides an effective data foundation for the construction of a maize breeding knowledge graph and other downstream tasks; Zheng et al. [<xref rid="ref021" ref-type="bibr">21</xref>] used a BERT-based BiLSTM-CRF model for web content monitoring to identify sensitive words and their variants, improving recognition compared with other models; Yu et al. [<xref rid="ref022" ref-type="bibr">22</xref>] proposed an ancient poetry place name recognition model, referred to as DABERT-CRF, which combines a data augmentation method with BERT-CRF to promote further research on Chinese classical literature; and Li et al. [<xref rid="ref023" ref-type="bibr">23</xref>] used BERT and BiLSTM, combined with a bilinear attention mechanism, to enrich the semantic information and attain favorable results in Chinese medical named entity recognition.</p>
      <p id="S2.p8">In recent years, Luo et al. [<xref rid="ref024" ref-type="bibr">24</xref>], Mengge et al. [<xref rid="ref025" ref-type="bibr">25</xref>] and Zheng et al. [<xref rid="ref026" ref-type="bibr">26</xref>] redefined the NER task as a machine-reading task and demonstrated strong performance on both flat and nested datasets. To extract entities, they construct type-specific queries based on external knowledge and treat sentences as contexts. For instance, for the sentence "U.S. President Donald Trump is enjoying his vacation in Miami", they construct a PER-specific query in natural language form in order to extract PER entities such as "U.S. President" and "Donald Trump". However, because the queries are type-specific, only one type of entity can be extracted in each inference. Besides making prediction inefficient, this approach overlooks the inherent relationships between different entity types. Furthermore, type-specific queries are constructed manually using external knowledge, which makes them difficult to apply in realistic scenarios with hundreds of entity types.</p>
    </sec>
    <sec id="S3">
      <label>3.</label>
      <title>Methodology: A mathematical formula named entity recognition method combining enhanced dynamic allocation of labels</title>
      <p>
        <fig id="F1">
          <label>Figure 1.</label>
          <caption>
            <p>Structure of the model.</p>
          </caption>
          <graphic xlink:href="fig1.jpg"/>
        </fig>
      </p>
      <p id="S3.p1">Mathematical formula sequences resemble textual information, and in everyday use mathematical formulas are often written in LaTeX encoding, so useful features can be learned from the data in textual form. BERT-formula is a hybrid model architecture built on BERT encoding. A vector representation of each character is first computed and concatenated with a randomly initialized vector representation of the instance queries, and the two are jointly fed into BERT for encoding. One-way self-attention is then applied, which allows the queries to model connections with each other and enhances the query semantics [<xref rid="ref027" ref-type="bibr">27</xref>]. Features are then extracted by a BiLSTM layer and a Transformer layer to further improve the model's generalization capacity. Finally, a dynamic label assignment mechanism is designed to determine the best assignment, with label assignment regarded as a one-to-many Linear Assignment Problem (LAP) (Burkard and Cela, 1999) [<xref rid="ref028" ref-type="bibr">28</xref>]. The final predicted labels are obtained by finding the minimum-cost solution of this assignment problem.</p>
      <p id="S3.p2">Figure <xref ref-type="fig" rid="F1">1</xref> shows the framework of the model proposed in this paper. It has three components: the encoder, entity prediction and dynamic label assignment, where entity classification and entity localization are the two subtasks of entity prediction. The input of the encoder consists of the text and the instance queries, which can learn global semantic information; embeddings are used to obtain vector representations that capture rich syntactic and semantic features. The entity prediction component predicts the boundaries and the category of each entity; if more than one prediction is made for the same entity, the one with the highest probability is kept. The dynamic label assignment component forms a cost matrix from the assignment costs generated by the previous component, and then uses an algorithm for the linear assignment problem to compute the label assignment matrix with the minimum cost. This assigns entity labels to instance queries and yields the prediction for each entity, completing the recognition task.</p>
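      <p>The dynamic label assignment step described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation used in this paper: the function name <monospace>assign_labels</monospace>, the cost values, the extra "no entity" column and the fixed number of queries per entity (<monospace>copies</monospace>) are all hypothetical, and an exhaustive search over assignments stands in for the auction algorithm, so it is practical only for very small cost matrices.</p>
      <preformat>
```python
from itertools import permutations


def assign_labels(cost, copies=1):
    """Return (total_cost, labels) for the cheapest query-to-label assignment.

    cost[i][j] is the (hypothetical) cost of giving label j to query i.
    The last column stands for "no entity"; every other column is one gold
    entity, which receives exactly `copies` queries.
    """
    num_queries = len(cost)
    none_label = len(cost[0]) - 1
    # Turn the one-to-many problem into a one-to-one problem: repeat each
    # entity `copies` times, then pad with "no entity" slots.
    slots = [j for j in range(none_label) for _ in range(copies)]
    if len(slots) > num_queries:
        raise ValueError("more entity slots than instance queries")
    slots += [none_label] * (num_queries - len(slots))

    def total_cost(labels):
        return sum(cost[i][j] for i, j in enumerate(labels))

    # Exhaustive search: try every distinct ordering of the slots.
    best = min(set(permutations(slots)), key=total_cost)
    return total_cost(best), list(best)


# Three queries, two gold entities (columns 0 and 1), "no entity" (column 2).
cost = [
    [1, 4, 6],
    [2, 9, 5],
    [8, 3, 4],
]
best_cost, labels = assign_labels(cost)
print(best_cost, labels)  # prints: 9 [0, 2, 1]
```
</preformat>
      <p>In this toy case queries 0 and 1 both prefer entity 0, but only one of them can receive it, so query 1 is given the "no entity" label and query 2 takes entity 1. Setting <monospace>copies</monospace> above 1 lets several queries share one entity, which is the one-to-many case.</p>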
      <p id="S3.p3">We denote a training example by <inline-formula><mml:math alttext="(X,Y)" display="inline"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math alttext="X" display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula> is a sentence with <inline-formula><mml:math alttext="N" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> words and <inline-formula><mml:math alttext="Y=\{&lt;X_{k}^{l},X_{k}^{r},X_{k}^{t}&gt;\}_{k=0}^{G-1}" class="ltx_math_unparsed" display="inline"><mml:mrow><mml:mi>Y</mml:mi><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mo lspace="0em">&lt;</mml:mo><mml:msubsup><mml:mi>X</mml:mi><mml:mi>k</mml:mi><mml:mi>l</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>X</mml:mi><mml:mi>k</mml:mi><mml:mi>r</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>X</mml:mi><mml:mi>k</mml:mi><mml:mi>t</mml:mi></mml:msubsup><mml:mo rspace="0em">&gt;</mml:mo><mml:mo stretchy="false">}</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>G</mml:mi><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> is the set of its entities; the three elements of each triple are the index of the left boundary of the <inline-formula><mml:math alttext="k" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-th entity, the index of the right boundary, and the index of the entity type, respectively. In our study, <inline-formula><mml:math alttext="M(&gt;G)" display="inline"><mml:mrow><mml:mi>M</mml:mi><mml:mspace width="0.3888888888888889em"/><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi/><mml:mo>&gt;</mml:mo><mml:mi>G</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> globally learnable instance queries are set up, each of which extracts one entity from the sentence. They are initialized at random and learn their query semantics independently during training. The task is therefore to use the learnable instance queries <inline-formula><mml:math alttext="I" display="inline"><mml:mi>I</mml:mi></mml:math></inline-formula> to extract the entities <inline-formula><mml:math alttext="Y" display="inline"><mml:mi>Y</mml:mi></mml:math></inline-formula> from an input sentence <inline-formula><mml:math alttext="X" display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula>.</p>
      <sec id="S3.SS1">
        <label>3.1</label>
        <title>Encoder</title>
        <p id="S3.SS1.p1">The model's input has two components: an instance query sequence of length M and the LaTeX text of a mathematical formula of length N. The encoder concatenates them into a single sequence and encodes them simultaneously. The instance query sequence of length M is a randomly initialized, fixed-length sequence that is constructed without the aid of external knowledge and learns deep semantic information between sentences.</p>
        <p id="S3.SS1.p2">First, we compute the embeddings. With the help of the BERT embedding layer, we compute the token embedding, position embedding and type embedding of each of the two sequences, and then concatenate the embeddings of the two sequences to obtain <inline-formula><mml:math alttext="E_{\text{token}}" display="inline"><mml:msub><mml:mi>E</mml:mi><mml:mtext>token</mml:mtext></mml:msub></mml:math></inline-formula>, <inline-formula><mml:math alttext="E_{\text{position}}" display="inline"><mml:msub><mml:mi>E</mml:mi><mml:mtext>position</mml:mtext></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math alttext="E_{\text{type}}" display="inline"><mml:msub><mml:mi>E</mml:mi><mml:mtext>type</mml:mtext></mml:msub></mml:math></inline-formula>.</p>
        <p>
          <disp-formula-group id="S3.E1">
            <disp-formula id="S3.E1X">
              <mml:math alttext="\displaystyle E_{\text{token}}=\text{Concat}(V,I)" display="inline">
                <mml:mrow>
                  <mml:msub>
                    <mml:mi>E</mml:mi>
                    <mml:mtext>token</mml:mtext>
                  </mml:msub>
                  <mml:mo>=</mml:mo>
                  <mml:mrow>
                    <mml:mtext>Concat</mml:mtext>
                    <mml:mo>⁢</mml:mo>
                    <mml:mrow>
                      <mml:mo stretchy="false">(</mml:mo>
                      <mml:mi>V</mml:mi>
                      <mml:mo>,</mml:mo>
                      <mml:mi>I</mml:mi>
                      <mml:mo stretchy="false">)</mml:mo>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
            <disp-formula id="S3.E1Xa">
              <mml:math alttext="\displaystyle E_{\text{position}}=\text{Concat}(P^{w},P^{q})" display="inline">
                <mml:mrow>
                  <mml:msub>
                    <mml:mi>E</mml:mi>
                    <mml:mtext>position</mml:mtext>
                  </mml:msub>
                  <mml:mo>=</mml:mo>
                  <mml:mrow>
                    <mml:mtext>Concat</mml:mtext>
                    <mml:mo>⁢</mml:mo>
                    <mml:mrow>
                      <mml:mo stretchy="false">(</mml:mo>
                      <mml:msup>
                        <mml:mi>P</mml:mi>
                        <mml:mi>w</mml:mi>
                      </mml:msup>
                      <mml:mo>,</mml:mo>
                      <mml:msup>
                        <mml:mi>P</mml:mi>
                        <mml:mi>q</mml:mi>
                      </mml:msup>
                      <mml:mo stretchy="false">)</mml:mo>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
            <disp-formula id="S3.E1Xb">
              <mml:math alttext="\displaystyle E_{\text{type}}=\text{Concat}(\left[U^{w}\right]^{N},\left[U^{q}\right]^{M})" display="inline">
                <mml:mrow>
                  <mml:msub>
                    <mml:mi>E</mml:mi>
                    <mml:mtext>type</mml:mtext>
                  </mml:msub>
                  <mml:mo>=</mml:mo>
                  <mml:mrow>
                    <mml:mtext>Concat</mml:mtext>
                    <mml:mo>⁢</mml:mo>
                    <mml:mrow>
                      <mml:mo stretchy="false">(</mml:mo>
                      <mml:msup>
                        <mml:mrow>
                          <mml:mo>[</mml:mo>
                          <mml:msup>
                            <mml:mi>U</mml:mi>
                            <mml:mi>w</mml:mi>
                          </mml:msup>
                          <mml:mo>]</mml:mo>
                        </mml:mrow>
                        <mml:mi>N</mml:mi>
                      </mml:msup>
                      <mml:mo>,</mml:mo>
                      <mml:msup>
                        <mml:mrow>
                          <mml:mo>[</mml:mo>
                          <mml:msup>
                            <mml:mi>U</mml:mi>
                            <mml:mi>q</mml:mi>
                          </mml:msup>
                          <mml:mo>]</mml:mo>
                        </mml:mrow>
                        <mml:mi>M</mml:mi>
                      </mml:msup>
                      <mml:mo stretchy="false">)</mml:mo>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </disp-formula-group>
        </p>
        <p id="S3.SS1.p3">where <inline-formula><mml:math alttext="V" display="inline"><mml:mi>V</mml:mi></mml:math></inline-formula> denotes the token embedding of the word sequence, <inline-formula><mml:math alttext="I" display="inline"><mml:mi>I</mml:mi></mml:math></inline-formula> denotes the vector representation of the instance queries, <inline-formula><mml:math alttext="P^{w}" display="inline"><mml:msup><mml:mi>P</mml:mi><mml:mi>w</mml:mi></mml:msup></mml:math></inline-formula> and <inline-formula><mml:math alttext="P^{q}" display="inline"><mml:msup><mml:mi>P</mml:mi><mml:mi>q</mml:mi></mml:msup></mml:math></inline-formula> denote the learnable positional embeddings of the text sequence and the instance query sequence, <inline-formula><mml:math alttext="U^{w}" display="inline"><mml:msup><mml:mi>U</mml:mi><mml:mi>w</mml:mi></mml:msup></mml:math></inline-formula> and <inline-formula><mml:math alttext="U^{q}" display="inline"><mml:msup><mml:mi>U</mml:mi><mml:mi>q</mml:mi></mml:msup></mml:math></inline-formula> denote the type embeddings of the text and of the instance queries, respectively, and <inline-formula><mml:math alttext="[.]^{N}" class="ltx_math_unparsed" display="inline"><mml:msup><mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:mo lspace="0em" rspace="0.167em">.</mml:mo><mml:mo stretchy="false">]</mml:mo></mml:mrow><mml:mi>N</mml:mi></mml:msup></mml:math></inline-formula> and <inline-formula><mml:math alttext="[.]^{M}" class="ltx_math_unparsed" display="inline"><mml:msup><mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:mo lspace="0em" rspace="0.167em">.</mml:mo><mml:mo stretchy="false">]</mml:mo></mml:mrow><mml:mi>M</mml:mi></mml:msup></mml:math></inline-formula> denote repetition <inline-formula><mml:math alttext="N" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> times and <inline-formula><mml:math alttext="M" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula> times, respectively.</p>
        <p id="S3.SS1.p4">The input to the encoder can thus be represented as:</p>
        <p>
          <disp-formula id="S3.Ex1">
            <mml:math alttext="H_{0}=E_{\text{token}}+E_{\text{position}}+E_{\text{type}}\in\mathbb{R}^{(N+M)\times h}" display="block">
              <mml:mrow>
                <mml:msub>
                  <mml:mi>H</mml:mi>
                  <mml:mn>0</mml:mn>
                </mml:msub>
                <mml:mo>=</mml:mo>
                <mml:mrow>
                  <mml:msub>
                    <mml:mi>E</mml:mi>
                    <mml:mtext>token</mml:mtext>
                  </mml:msub>
                  <mml:mo>+</mml:mo>
                  <mml:msub>
                    <mml:mi>E</mml:mi>
                    <mml:mtext>position</mml:mtext>
                  </mml:msub>
                  <mml:mo>+</mml:mo>
                  <mml:msub>
                    <mml:mi>E</mml:mi>
                    <mml:mtext>type</mml:mtext>
                  </mml:msub>
                </mml:mrow>
                <mml:mo>∈</mml:mo>
                <mml:msup>
                  <mml:mi>ℝ</mml:mi>
                  <mml:mrow>
                    <mml:mrow>
                      <mml:mo stretchy="false">(</mml:mo>
                      <mml:mrow>
                        <mml:mi>N</mml:mi>
                        <mml:mo>+</mml:mo>
                        <mml:mi>M</mml:mi>
                      </mml:mrow>
                      <mml:mo rspace="0.055em" stretchy="false">)</mml:mo>
                    </mml:mrow>
                    <mml:mo rspace="0.222em">×</mml:mo>
                    <mml:mi>h</mml:mi>
                  </mml:mrow>
                </mml:msup>
              </mml:mrow>
            </mml:math>
          </disp-formula>
        </p>
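        <p>The sum above can be illustrated with a minimal sketch. It assumes that each of the three embeddings is formed by stacking the sentence part on top of the instance query part along the sequence axis, and that the two type vectors are repeated N and M times. The function name <monospace>build_encoder_input</monospace>, the helper <monospace>random_matrix</monospace>, and the toy sizes are hypothetical and chosen only for illustration; plain Python lists stand in for tensors.</p>
        <p>
          <code language="python"><![CDATA[
```python
import random


def build_encoder_input(v, queries, p_w, p_q, u_w, u_q):
    """Sum token, position and type embeddings into H_0 of shape (N + M) x h."""
    n, m = len(v), len(queries)
    e_token = v + queries               # [V; I]
    e_position = p_w + p_q              # [P^w; P^q]
    e_type = [u_w] * n + [u_q] * m      # [[U^w]^N; [U^q]^M]
    return [
        [t + p + u for t, p, u in zip(tok, pos, typ)]
        for tok, pos, typ in zip(e_token, e_position, e_type)
    ]


def random_matrix(rows, cols, rng):
    """Toy stand-in for a learned embedding table (illustrative only)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
N, M, H = 4, 2, 3                       # sentence length, query count, hidden size
v = random_matrix(N, H, rng)
queries = random_matrix(M, H, rng)
p_w = random_matrix(N, H, rng)
p_q = random_matrix(M, H, rng)
u_w = random_matrix(1, H, rng)[0]       # one type vector shared by all words
u_q = random_matrix(1, H, rng)[0]       # one type vector shared by all queries

h0 = build_encoder_input(v, queries, p_w, p_q, u_w, u_q)
print(len(h0), len(h0[0]))
```
]]></code>
        </p>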
        <sec id="S3.SS1.SSS1">
          <label>3.1.1</label>
          <title>One-Way Self-Attention</title>
          <p id="S3.SS1.SSS1.p1">Standard self-attention lets the sentence interact with all instance queries. Randomly initialized instance queries can therefore alter the sentence encoding and damage its semantics. To keep the sentence semantics largely independent of the instance queries, we replace the self-attention in BERT with a one-way version:</p>
          <p>
            <disp-formula-group id="S3.E2">
              <disp-formula id="S3.E2X">
                <mml:math alttext="\displaystyle OW-SA(H)=\alpha HW_{v}" display="inline">
                  <mml:mrow>
                    <mml:mrow>
                      <mml:mrow>
                        <mml:mi>O</mml:mi>
                        <mml:mo>⁢</mml:mo>
                        <mml:mi>W</mml:mi>
                      </mml:mrow>
                      <mml:mo>−</mml:mo>
                      <mml:mrow>
                        <mml:mi>S</mml:mi>
                        <mml:mo>⁢</mml:mo>
                        <mml:mi>A</mml:mi>
                        <mml:mo>⁢</mml:mo>
                        <mml:mrow>
                          <mml:mo stretchy="false">(</mml:mo>
                          <mml:mi>H</mml:mi>
                          <mml:mo stretchy="false">)</mml:mo>
                        </mml:mrow>
                      </mml:mrow>
                    </mml:mrow>
                    <mml:mo>=</mml:mo>
                    <mml:mrow>
                      <mml:mi>α</mml:mi>
                      <mml:mo>⁢</mml:mo>
                      <mml:mi>H</mml:mi>
                      <mml:mo>⁢</mml:mo>
                      <mml:msub>
                        <mml:mi>W</mml:mi>
                        <mml:mi>v</mml:mi>
                      </mml:msub>
                    </mml:mrow>
                  </mml:mrow>
                </mml:math>
              </disp-formula>
              <disp-formula id="S3.E2Xa">
                <mml:math alttext="\displaystyle\alpha=\text{softmax}\left(\frac{HW_{q}\left(HW_{k}\right)^{T}}{%&#10;\sqrt{h}}+M\right)" display="inline">
                  <mml:mrow>
                    <mml:mi>α</mml:mi>
                    <mml:mo>=</mml:mo>
                    <mml:mrow>
                      <mml:mtext>softmax</mml:mtext>
                      <mml:mo>⁢</mml:mo>
                      <mml:mrow>
                        <mml:mo>(</mml:mo>
                        <mml:mrow>
                          <mml:mstyle displaystyle="true">
                            <mml:mfrac>
                              <mml:mrow>
                                <mml:mi>H</mml:mi>
                                <mml:mo>⁢</mml:mo>
                                <mml:msub>
                                  <mml:mi>W</mml:mi>
                                  <mml:mi>q</mml:mi>
                                </mml:msub>
                                <mml:mo>⁢</mml:mo>
                                <mml:msup>
                                  <mml:mrow>
                                    <mml:mo>(</mml:mo>
                                    <mml:mrow>
                                      <mml:mi>H</mml:mi>
                                      <mml:mo>⁢</mml:mo>
                                      <mml:msub>
                                        <mml:mi>W</mml:mi>
                                        <mml:mi>k</mml:mi>
                                      </mml:msub>
                                    </mml:mrow>
                                    <mml:mo>)</mml:mo>
                                  </mml:mrow>
                                  <mml:mi>T</mml:mi>
                                </mml:msup>
                              </mml:mrow>
                              <mml:msqrt>
                                <mml:mi>h</mml:mi>
                              </mml:msqrt>
                            </mml:mfrac>
                          </mml:mstyle>
                          <mml:mo>+</mml:mo>
                          <mml:mi>M</mml:mi>
                        </mml:mrow>
                        <mml:mo>)</mml:mo>
                      </mml:mrow>
                    </mml:mrow>
                  </mml:mrow>
                </mml:math>
              </disp-formula>
            </disp-formula-group>
          </p>
          <p id="S3.SS1.SSS1.p2">where <inline-formula><mml:math alttext="W_{q},W_{k},W_{v}" display="inline"><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mi>q</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mi>v</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are learnable weight matrices and <inline-formula><mml:math alttext="M\in\{0,-\infty\}^{(N+M)\times(N+M)}" display="inline"><mml:mrow><mml:mi>M</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mrow><mml:mo>−</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow><mml:mo stretchy="false">}</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>N</mml:mi><mml:mo>+</mml:mo><mml:mi>M</mml:mi></mml:mrow><mml:mo rspace="0.055em" stretchy="false">)</mml:mo></mml:mrow><mml:mo rspace="0.222em">×</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>N</mml:mi><mml:mo>+</mml:mo><mml:mi>M</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> is the mask matrix added to the attention scores. Elements set to 0 are kept, whereas elements set to <inline-formula><mml:math alttext="-\infty" display="inline"><mml:mrow><mml:mo>−</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:math></inline-formula> are masked out. In our approach, the top-right sub-matrix of the mask, of size <inline-formula><mml:math alttext="N\times M" display="inline"><mml:mrow><mml:mi>N</mml:mi><mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo><mml:mi>M</mml:mi></mml:mrow></mml:math></inline-formula>, is filled with <inline-formula><mml:math alttext="-\infty" display="inline"><mml:mrow><mml:mo>−</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:math></inline-formula>, and all other elements are set to 0. This prevents instance queries from participating in sentence encoding. Furthermore, the self-attention among instance queries can enrich their query semantics and model the connections between them.</p>
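          <p>A minimal sketch of the one-way mask follows. It assumes that the first N rows and columns of the attention matrix correspond to sentence tokens and the last M to instance queries, and that the block blocking sentence rows from attending to query columns is filled with −∞. The function names <monospace>one_way_mask</monospace> and <monospace>masked_softmax</monospace> are hypothetical, and the raw scores are set to zero only to make the effect of the mask visible.</p>
          <p>
            <code language="python"><![CDATA[
```python
import math

NEG_INF = float("-inf")


def one_way_mask(n_tokens, n_queries):
    """Additive mask: sentence rows cannot attend to instance query columns."""
    size = n_tokens + n_queries
    mask = [[0.0] * size for _ in range(size)]
    for row in range(n_tokens):                 # sentence rows
        for col in range(n_tokens, size):       # instance query columns
            mask[row][col] = NEG_INF
    return mask


def masked_softmax(scores, mask):
    """Row-wise softmax(scores + mask); masked positions receive weight 0."""
    weights = []
    for score_row, mask_row in zip(scores, mask):
        logits = [s + m for s, m in zip(score_row, mask_row)]
        peak = max(logits)                      # finite: every row keeps some 0 entries
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


N_TOKENS, N_QUERIES = 3, 2
size = N_TOKENS + N_QUERIES
mask = one_way_mask(N_TOKENS, N_QUERIES)
scores = [[0.0] * size for _ in range(size)]    # placeholder attention scores
alpha = masked_softmax(scores, mask)

for row in alpha:
    print([round(w, 3) for w in row])
```
]]></code>
          </p>
          <p>In this toy run the sentence rows distribute their attention over sentence tokens only, while the query rows still attend to both the sentence and the other queries.</p>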
          <p id="S3.SS1.SSS1.p3">Following BERT encoding, we use two bidirectional LSTM layers and <inline-formula><mml:math alttext="L" display="inline"><mml:mi>L</mml:mi></mml:math></inline-formula> (experimentally set to 5) extra transformer layers to further encode the sentence at the character level. Finally, we split <inline-formula><mml:math alttext="H\in\mathbb{R}^{(N+M)\times h}" display="inline"><mml:mrow><mml:mi>H</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>N</mml:mi><mml:mo>+</mml:mo><mml:mi>M</mml:mi></mml:mrow><mml:mo rspace="0.055em" stretchy="false">)</mml:mo></mml:mrow><mml:mo rspace="0.222em">×</mml:mo><mml:mi>h</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> into two parts: the sentence encoding <inline-formula><mml:math alttext="H^{w}\in\mathbb{R}^{N\times h}" display="inline"><mml:mrow><mml:msup><mml:mi>H</mml:mi><mml:mi>w</mml:mi></mml:msup><mml:mo>∈</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo><mml:mi>h</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> and the instance query encoding <inline-formula><mml:math alttext="H^{q}\in\mathbb{R}^{M\times h}" display="inline"><mml:mrow><mml:msup><mml:mi>H</mml:mi><mml:mi>q</mml:mi></mml:msup><mml:mo>∈</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:mrow><mml:mi>M</mml:mi><mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo><mml:mi>h</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>.</p>
        </sec>
      </sec>
      <sec id="S3.SS2">
        <label>3.2</label>
        <title>Entity Prediction</title>
        <p id="S3.SS2.p1">Each instance query can predict one entity in the sentence, so M instance queries can predict at most M entities in parallel. Entity prediction can be viewed as a combination of boundary prediction and category prediction, for which we design an entity pointer and an entity classifier, respectively.</p>
        <sec id="S3.SS2.SSS1">
          <label>3.2.1</label>
          <title>Entity Pointer</title>
          <p id="S3.SS2.SSS1.p1">First, for the <inline-formula><mml:math alttext="i" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>-th instance query <inline-formula><mml:math alttext="H_{i}^{q}" display="inline"><mml:msubsup><mml:mi>H</mml:mi><mml:mi>i</mml:mi><mml:mi>q</mml:mi></mml:msubsup></mml:math></inline-formula>, we use two linear layers to let the query interact with each character in the sentence. The fused representation of the <inline-formula><mml:math alttext="i" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>-th instance query and the <inline-formula><mml:math alttext="j" display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula>-th character is computed as:</p>
          <p>
            <disp-formula id="S3.E3">
              <mml:math alttext="S_{ij}^{\delta}=\text{ReLU}(H_{i}^{q}W_{\delta}^{q}+H_{j}^{w}W_{\delta}^{w})" display="block">
                <mml:mrow>
                  <mml:msubsup>
                    <mml:mi>S</mml:mi>
                    <mml:mrow>
                      <mml:mi>i</mml:mi>
                      <mml:mo>⁢</mml:mo>
                      <mml:mi>j</mml:mi>
                    </mml:mrow>
                    <mml:mi>δ</mml:mi>
                  </mml:msubsup>
                  <mml:mo>=</mml:mo>
                  <mml:mrow>
                    <mml:mtext>ReLU</mml:mtext>
                    <mml:mo>⁢</mml:mo>
                    <mml:mrow>
                      <mml:mo stretchy="false">(</mml:mo>
                      <mml:mrow>
                        <mml:mrow>
                          <mml:msubsup>
                            <mml:mi>H</mml:mi>
                            <mml:mi>i</mml:mi>
                            <mml:mi>q</mml:mi>
                          </mml:msubsup>
                          <mml:mo>⁢</mml:mo>
                          <mml:msubsup>
                            <mml:mi>W</mml:mi>
                            <mml:mi>δ</mml:mi>
                            <mml:mi>q</mml:mi>
                          </mml:msubsup>
                        </mml:mrow>
                        <mml:mo>+</mml:mo>
                        <mml:mrow>
                          <mml:msubsup>
                            <mml:mi>H</mml:mi>
                            <mml:mi>j</mml:mi>
                            <mml:mi>w</mml:mi>
                          </mml:msubsup>
                          <mml:mo>⁢</mml:mo>
                          <mml:msubsup>
                            <mml:mi>W</mml:mi>
                            <mml:mi>δ</mml:mi>
                            <mml:mi>w</mml:mi>
                          </mml:msubsup>
                        </mml:mrow>
                      </mml:mrow>
                      <mml:mo stretchy="false">)</mml:mo>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </p>
          <p id="S3.SS2.SSS1.p2">where <inline-formula><mml:math alttext="\delta\in\{l,r\}" display="inline"><mml:mrow><mml:mi>δ</mml:mi><mml:mo>∈</mml:mo><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mi>l</mml:mi><mml:mo>,</mml:mo><mml:mi>r</mml:mi><mml:mo stretchy="false">}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> denotes the left or right boundary, and <inline-formula><mml:math alttext="W_{\delta}^{q},W_{\delta}^{w}\in\mathbb{R}^{h\times h}" display="inline"><mml:mrow><mml:mrow><mml:msubsup><mml:mi>W</mml:mi><mml:mi>δ</mml:mi><mml:mi>q</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>W</mml:mi><mml:mi>δ</mml:mi><mml:mi>w</mml:mi></mml:msubsup></mml:mrow><mml:mo>∈</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:mrow><mml:mi>h</mml:mi><mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo><mml:mi>h</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> are trainable projection parameters. Next, we compute the probability that the <inline-formula><mml:math alttext="j" display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula>-th word of the sentence is a left or right boundary:</p>
          <p>
            <disp-formula id="S3.E4">
              <mml:math alttext="P_{ij}^{\delta}=\text{sigmoid}(S_{ij}^{\delta}W_{\delta}+b_{\delta})" display="block">
                <mml:mrow>
                  <mml:msubsup>
                    <mml:mi>P</mml:mi>
                    <mml:mrow>
                      <mml:mi>i</mml:mi>
                      <mml:mo>⁢</mml:mo>
                      <mml:mi>j</mml:mi>
                    </mml:mrow>
                    <mml:mi>δ</mml:mi>
                  </mml:msubsup>
                  <mml:mo>=</mml:mo>
                  <mml:mrow>
                    <mml:mtext>sigmoid</mml:mtext>
                    <mml:mo>⁢</mml:mo>
                    <mml:mrow>
                      <mml:mo stretchy="false">(</mml:mo>
                      <mml:mrow>
                        <mml:mrow>
                          <mml:msubsup>
                            <mml:mi>S</mml:mi>
                            <mml:mrow>
                              <mml:mi>i</mml:mi>
                              <mml:mo>⁢</mml:mo>
                              <mml:mi>j</mml:mi>
                            </mml:mrow>
                            <mml:mi>δ</mml:mi>
                          </mml:msubsup>
                          <mml:mo>⁢</mml:mo>
                          <mml:msub>
                            <mml:mi>W</mml:mi>
                            <mml:mi>δ</mml:mi>
                          </mml:msub>
                        </mml:mrow>
                        <mml:mo>+</mml:mo>
                        <mml:msub>
                          <mml:mi>b</mml:mi>
                          <mml:mi>δ</mml:mi>
                        </mml:msub>
                      </mml:mrow>
                      <mml:mo stretchy="false">)</mml:mo>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </p>
          <p id="S3.SS2.SSS1.p3">where <inline-formula><mml:math alttext="W_{\delta}\in\mathbb{R}^{h}" display="inline"><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mi>δ</mml:mi></mml:msub><mml:mo>∈</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:mi>h</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math alttext="b_{\delta}" display="inline"><mml:msub><mml:mi>b</mml:mi><mml:mi>δ</mml:mi></mml:msub></mml:math></inline-formula> are learnable parameters.</p>
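          <p>Equations (3) and (4) can be sketched as follows for a single boundary type. The sketch uses plain Python lists and random toy weights; the names <monospace>vec_mat</monospace>, <monospace>boundary_probs</monospace>, and <monospace>random_matrix</monospace> are hypothetical, and the bias is treated as a scalar, which is an assumption. In practice the function would be called once with the left-boundary parameters and once with the right-boundary parameters.</p>
          <p>
            <code language="python"><![CDATA[
```python
import math
import random


def vec_mat(vec, mat):
    """Row vector (length r) times matrix (r x c) -> vector of length c."""
    return [sum(v * mat[r][c] for r, v in enumerate(vec)) for c in range(len(mat[0]))]


def boundary_probs(h_q, h_w, w_q, w_w, w_out, b_out):
    """Return an M x N matrix P with P[i][j] = P_ij^delta for one boundary type."""
    q_proj = [vec_mat(q, w_q) for q in h_q]     # H_i^q W_delta^q, shape M x h
    w_proj = [vec_mat(w, w_w) for w in h_w]     # H_j^w W_delta^w, shape N x h
    probs = []
    for q in q_proj:
        row = []
        for w in w_proj:
            fused = [max(0.0, a + b) for a, b in zip(q, w)]           # ReLU, Eq. (3)
            logit = sum(f * o for f, o in zip(fused, w_out)) + b_out
            row.append(1.0 / (1.0 + math.exp(-logit)))                # sigmoid, Eq. (4)
        probs.append(row)
    return probs


def random_matrix(rows, cols, rng):
    """Toy stand-in for learned parameters or encoder outputs (illustrative only)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
N, M, H = 5, 2, 4                               # sentence length, query count, hidden size
h_w = random_matrix(N, H, rng)                  # sentence encoding H^w
h_q = random_matrix(M, H, rng)                  # instance query encoding H^q
w_q = random_matrix(H, H, rng)
w_w = random_matrix(H, H, rng)
w_out = random_matrix(1, H, rng)[0]             # W_delta in R^h
b_out = 0.0                                     # b_delta, assumed scalar

p_left = boundary_probs(h_q, h_w, w_q, w_w, w_out, b_out)
print(len(p_left), len(p_left[0]))
```
]]></code>
          </p>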
        </sec>
        <sec id="S3.SS2.SSS2">
          <label>3.2.2</label>
          <title>Entity Classifier</title>
          <p id="S3.SS2.SSS2.p1">Information about entity boundaries is helpful for classifying entities. We use <inline-formula><mml:math alttext="P_{i}^{\delta}=[P_{i,0}^{\delta},P_{i,1}^{\delta},\dots,P_{i,N-1}^{\delta}]" display="inline"><mml:mrow><mml:msubsup><mml:mi>P</mml:mi><mml:mi>i</mml:mi><mml:mi>δ</mml:mi></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mi>δ</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>δ</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mi>N</mml:mi><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mrow><mml:mi>δ</mml:mi></mml:msubsup><mml:mo stretchy="false">]</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math alttext="\delta\in\{l,r\}" display="inline"><mml:mrow><mml:mi>δ</mml:mi><mml:mo>∈</mml:mo><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mi>l</mml:mi><mml:mo>,</mml:mo><mml:mi>r</mml:mi><mml:mo stretchy="false">}</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>, to weight all words of the sentence and concatenate the weighted sums with the instance query. The boundary-aware representation of the <inline-formula><mml:math alttext="i" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>-th instance query is computed as:</p>
          <p>
            <disp-formula id="S3.E5">
              <mml:math alttext="S_{i}^{t}=\text{ReLU}([H_{i}^{q}W_{t}^{q};P_{i}^{l}H^{w};P_{i}^{r}H^{w}])" display="block">
                <mml:mrow>
                  <mml:msubsup>
                    <mml:mi>S</mml:mi>
                    <mml:mi>i</mml:mi>
                    <mml:mi>t</mml:mi>
                  </mml:msubsup>
                  <mml:mo>=</mml:mo>
                  <mml:mrow>
                    <mml:mtext>ReLU</mml:mtext>
                    <mml:mo>⁢</mml:mo>
                    <mml:mrow>
                      <mml:mo stretchy="false">(</mml:mo>
                      <mml:mrow>
                        <mml:mo stretchy="false">[</mml:mo>
                        <mml:mrow>
                          <mml:msubsup>
                            <mml:mi>H</mml:mi>
                            <mml:mi>i</mml:mi>
                            <mml:mi>q</mml:mi>
                          </mml:msubsup>
                          <mml:mo>⁢</mml:mo>
                          <mml:msubsup>
                            <mml:mi>W</mml:mi>
                            <mml:mi>t</mml:mi>
                            <mml:mi>q</mml:mi>
                          </mml:msubsup>
                        </mml:mrow>
                        <mml:mo>;</mml:mo>
                        <mml:mrow>
                          <mml:msubsup>
                            <mml:mi>P</mml:mi>
                            <mml:mi>i</mml:mi>
                            <mml:mi>l</mml:mi>
                          </mml:msubsup>
                          <mml:mo>⁢</mml:mo>
                          <mml:msup>
                            <mml:mi>H</mml:mi>
                            <mml:mi>w</mml:mi>
                          </mml:msup>
                        </mml:mrow>
                        <mml:mo>;</mml:mo>
                        <mml:mrow>
                          <mml:msubsup>
                            <mml:mi>P</mml:mi>
                            <mml:mi>i</mml:mi>
                            <mml:mi>r</mml:mi>
                          </mml:msubsup>
                          <mml:mo>⁢</mml:mo>
                          <mml:msup>
                            <mml:mi>H</mml:mi>
                            <mml:mi>w</mml:mi>
                          </mml:msup>
                        </mml:mrow>
                        <mml:mo stretchy="false">]</mml:mo>
                      </mml:mrow>
                      <mml:mo stretchy="false">)</mml:mo>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </p>
          <p id="S3.SS2.SSS2.p2">where <inline-formula><mml:math alttext="W_{t}^{q}\in\mathbb{R}^{h\times k}" display="inline"><mml:mrow><mml:msubsup><mml:mi>W</mml:mi><mml:mi>t</mml:mi><mml:mi>q</mml:mi></mml:msubsup><mml:mo>∈</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:mrow><mml:mi>h</mml:mi><mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> is a learnable parameter. The probability that the entity queried by the <inline-formula><mml:math alttext="i" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>-th instance query belongs to category <inline-formula><mml:math alttext="c" display="inline"><mml:mi>c</mml:mi></mml:math></inline-formula> is then obtained as:</p>
          <p>
            <disp-formula id="S3.E6">
              <mml:math alttext="P_{ic}^{t}=\frac{\exp(S_{i}^{t}W_{t}^{c}+b_{t}^{c})}{\sum_{c^{\prime}\in%&#10;\varepsilon}\exp(S_{i}^{t}W_{t}^{c^{\prime}}+b_{t}^{c^{\prime}})}" display="block">
                <mml:mrow>
                  <mml:msubsup>
                    <mml:mi>P</mml:mi>
                    <mml:mrow>
                      <mml:mi>i</mml:mi>
                      <mml:mo>⁢</mml:mo>
                      <mml:mi>c</mml:mi>
                    </mml:mrow>
                    <mml:mi>t</mml:mi>
                  </mml:msubsup>
                  <mml:mo>=</mml:mo>
                  <mml:mfrac>
                    <mml:mrow>
                      <mml:mi>exp</mml:mi>
                      <mml:mo>⁡</mml:mo>
                      <mml:mrow>
                        <mml:mo stretchy="false">(</mml:mo>
                        <mml:mrow>
                          <mml:mrow>
                            <mml:msubsup>
                              <mml:mi>S</mml:mi>
                              <mml:mi>i</mml:mi>
                              <mml:mi>t</mml:mi>
                            </mml:msubsup>
                            <mml:mo>⁢</mml:mo>
                            <mml:msubsup>
                              <mml:mi>W</mml:mi>
                              <mml:mi>t</mml:mi>
                              <mml:mi>c</mml:mi>
                            </mml:msubsup>
                          </mml:mrow>
                          <mml:mo>+</mml:mo>
                          <mml:msubsup>
                            <mml:mi>b</mml:mi>
                            <mml:mi>t</mml:mi>
                            <mml:mi>c</mml:mi>
                          </mml:msubsup>
                        </mml:mrow>
                        <mml:mo stretchy="false">)</mml:mo>
                      </mml:mrow>
                    </mml:mrow>
                    <mml:mrow>
                      <mml:msub>
                        <mml:mo>∑</mml:mo>
                        <mml:mrow>
                          <mml:msup>
                            <mml:mi>c</mml:mi>
                            <mml:mo>′</mml:mo>
                          </mml:msup>
                          <mml:mo>∈</mml:mo>
                          <mml:mi>ε</mml:mi>
                        </mml:mrow>
                      </mml:msub>
                      <mml:mrow>
                        <mml:mi>exp</mml:mi>
                        <mml:mo>⁡</mml:mo>
                        <mml:mrow>
                          <mml:mo stretchy="false">(</mml:mo>
                          <mml:mrow>
                            <mml:mrow>
                              <mml:msubsup>
                                <mml:mi>S</mml:mi>
                                <mml:mi>i</mml:mi>
                                <mml:mi>t</mml:mi>
                              </mml:msubsup>
                              <mml:mo>⁢</mml:mo>
                              <mml:msubsup>
                                <mml:mi>W</mml:mi>
                                <mml:mi>t</mml:mi>
                                <mml:msup>
                                  <mml:mi>c</mml:mi>
                                  <mml:mo>′</mml:mo>
                                </mml:msup>
                              </mml:msubsup>
                            </mml:mrow>
                            <mml:mo>+</mml:mo>
                            <mml:msubsup>
                              <mml:mi>b</mml:mi>
                              <mml:mi>t</mml:mi>
                              <mml:msup>
                                <mml:mi>c</mml:mi>
                                <mml:mo>′</mml:mo>
                              </mml:msup>
                            </mml:msubsup>
                          </mml:mrow>
                          <mml:mo stretchy="false">)</mml:mo>
                        </mml:mrow>
                      </mml:mrow>
                    </mml:mrow>
                  </mml:mfrac>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </p>
          <p id="S3.SS2.SSS2.p3">where <inline-formula><mml:math alttext="W_{t}^{c^{\prime}}\in\mathbb{R}^{h}" display="inline"><mml:mrow><mml:msubsup><mml:mi>W</mml:mi><mml:mi>t</mml:mi><mml:msup><mml:mi>c</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:msubsup><mml:mo>∈</mml:mo><mml:msup><mml:mi>ℝ</mml:mi><mml:mi>h</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math alttext="b_{t}^{c^{\prime}}" display="inline"><mml:msubsup><mml:mi>b</mml:mi><mml:mi>t</mml:mi><mml:msup><mml:mi>c</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:msubsup></mml:math></inline-formula> are learnable parameters.</p>
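          <p>Equations (5) and (6) can be sketched as follows. The sketch assumes that the class weight vectors are sized to match the concatenated representation (length k + 2h here), which is a choice made only so that the toy example is dimensionally consistent. The names <monospace>vec_mat</monospace>, <monospace>type_probs</monospace>, and <monospace>random_matrix</monospace>, and all toy sizes, are hypothetical.</p>
          <p>
            <code language="python"><![CDATA[
```python
import math
import random


def vec_mat(vec, mat):
    """Row vector (length r) times matrix (r x c) -> vector of length c."""
    return [sum(v * mat[r][c] for r, v in enumerate(vec)) for c in range(len(mat[0]))]


def type_probs(h_q, h_w, p_left, p_right, w_tq, w_cls, b_cls):
    """Return an M x C matrix of class probabilities, one row per instance query."""
    result = []
    for i, query in enumerate(h_q):
        q_part = vec_mat(query, w_tq)            # H_i^q W_t^q, length k
        left_part = vec_mat(p_left[i], h_w)      # P_i^l H^w, length h
        right_part = vec_mat(p_right[i], h_w)    # P_i^r H^w, length h
        s_i = [max(0.0, x) for x in q_part + left_part + right_part]  # Eq. (5)
        logits = [
            sum(a * b for a, b in zip(s_i, w_cls[c])) + b_cls[c]
            for c in range(len(w_cls))
        ]
        peak = max(logits)                       # subtract max for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        result.append([e / total for e in exps])  # softmax over categories, Eq. (6)
    return result


def random_matrix(rows, cols, rng, low=-1.0, high=1.0):
    """Toy stand-in for learned parameters or upstream outputs (illustrative only)."""
    return [[rng.uniform(low, high) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
N, M, H, K, C = 5, 2, 4, 3, 3                    # toy sizes; C is the number of categories
h_w = random_matrix(N, H, rng)
h_q = random_matrix(M, H, rng)
p_left = random_matrix(M, N, rng, 0.0, 1.0)      # stand-ins for the pointer outputs
p_right = random_matrix(M, N, rng, 0.0, 1.0)
w_tq = random_matrix(H, K, rng)
w_cls = random_matrix(C, K + 2 * H, rng)         # assumed size, see the note above
b_cls = [0.0] * C

p_type = type_probs(h_q, h_w, p_left, p_right, w_tq, w_cls, b_cls)
print(len(p_type), len(p_type[0]))
```
]]></code>
          </p>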
          <p id="S3.SS2.SSS2.p4">Finally, the entity predicted by the <inline-formula><mml:math alttext="i" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>-th instance query is <inline-formula><mml:math alttext="T_{i}=(T_{i}^{l},T_{i}^{r},T_{i}^{t})" display="inline"><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msubsup><mml:mi>T</mml:mi><mml:mi>i</mml:mi><mml:mi>l</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>T</mml:mi><mml:mi>i</mml:mi><mml:mi>r</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>T</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi></mml:msubsup><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>. <inline-formula><mml:math alttext="T_{i}^{l}=\arg\max_{j}(P_{ij}^{l})" display="inline"><mml:mrow><mml:msubsup><mml:mi>T</mml:mi><mml:mi>i</mml:mi><mml:mi>l</mml:mi></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mi>arg</mml:mi><mml:mo lspace="0.167em">⁡</mml:mo><mml:mrow><mml:msub><mml:mi>max</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>⁢</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>l</mml:mi></mml:msubsup><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math alttext="T_{i}^{r}=\arg\max_{j}(P_{ij}^{r})" display="inline"><mml:mrow><mml:msubsup><mml:mi>T</mml:mi><mml:mi>i</mml:mi><mml:mi>r</mml:mi></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mi>arg</mml:mi><mml:mo lspace="0.167em">⁡</mml:mo><mml:mrow><mml:msub><mml:mi>max</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>⁢</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>r</mml:mi></mml:msubsup><mml:mo 
stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula> are the left and right boundaries, and <inline-formula><mml:math alttext="T_{i}^{t}=\arg\max_{c}(P_{ic}^{t})" display="inline"><mml:mrow><mml:msubsup><mml:mi>T</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mi>arg</mml:mi><mml:mo lspace="0.167em">⁡</mml:mo><mml:mrow><mml:msub><mml:mi>max</mml:mi><mml:mi>c</mml:mi></mml:msub><mml:mo>⁡</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>⁢</mml:mo><mml:mi>c</mml:mi></mml:mrow><mml:mi>t</mml:mi></mml:msubsup><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula> is the entity type.</p>
          <p id="S3.SS2.SSS2.p5">We extract entities in parallel by performing entity classification and entity localization for every instance query. If multiple instance queries predict different types when locating the same entity, then we keep the one with the highest classification probability as the prediction.</p>
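          <p>The decoding step described above can be sketched as follows: each instance query takes the arg max of its left-boundary, right-boundary, and type probabilities, and when several queries land on the same span, the one with the highest classification probability is kept. The function name <monospace>decode_entities</monospace> and the toy probabilities are hypothetical. The sketch keeps every prediction and does not filter out spans whose left index exceeds the right index, which a full implementation would presumably handle.</p>
          <p>
            <code language="python"><![CDATA[
```python
def decode_entities(p_left, p_right, p_type):
    """Turn per-query probabilities into a sorted list of (left, right, type) triples."""
    best = {}                                   # span -> (label, classification probability)
    for left_row, right_row, type_row in zip(p_left, p_right, p_type):
        left = max(range(len(left_row)), key=left_row.__getitem__)     # T_i^l
        right = max(range(len(right_row)), key=right_row.__getitem__)  # T_i^r
        label = max(range(len(type_row)), key=type_row.__getitem__)    # T_i^t
        score = type_row[label]
        span = (left, right)
        # Several queries may locate the same span: keep the most confident type.
        if span not in best or score > best[span][1]:
            best[span] = (label, score)
    return sorted((left, right, label) for (left, right), (label, _) in best.items())


# Three instance queries over a three-word sentence with two categories.
p_left = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.2, 0.7]]
p_right = [[0.1, 0.8, 0.1], [0.2, 0.9, 0.0], [0.0, 0.1, 0.9]]
p_type = [[0.6, 0.4], [0.1, 0.9], [0.7, 0.3]]

entities = decode_entities(p_left, p_right, p_type)
print(entities)
```
]]></code>
          </p>
          <p>Here the first two queries both locate the span (0, 1) but disagree on the type, so the second query, whose classification probability is 0.9 rather than 0.6, determines the prediction.</p>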
        </sec>
      </sec>
      <sec id="S3.SS3">
        <label>3.3</label>
        <title>Dynamic Label Assignment</title>
        <p id="S3.SS3.p1">Since instance queries are implicit (not in natural language form), we cannot assign entities to them in advance. To address this issue, we assign labels to instance queries dynamically during training. Specifically, we treat label assignment as a linear assignment problem: any entity can be assigned to any instance query, and the cost incurred depends on the particular entity-query pair. We define the cost of assigning the <inline-formula><mml:math alttext="k" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-th entity (<inline-formula><mml:math alttext="Y_{k}=&lt;Y_{k}^{l},Y_{k}^{r},Y_{k}^{t}&gt;" class="ltx_math_unparsed" display="inline"><mml:mrow><mml:msub><mml:mi>Y</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo rspace="0em">=</mml:mo><mml:mo lspace="0em">&lt;</mml:mo><mml:msubsup><mml:mi>Y</mml:mi><mml:mi>k</mml:mi><mml:mi>l</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>Y</mml:mi><mml:mi>k</mml:mi><mml:mi>r</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>Y</mml:mi><mml:mi>k</mml:mi><mml:mi>t</mml:mi></mml:msubsup><mml:mo>&gt;</mml:mo></mml:mrow></mml:math></inline-formula>) to the <inline-formula><mml:math alttext="i" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>-th instance query as:</p>
        <p>
          <disp-formula id="S3.E7">
            <mml:math alttext="\text{Cost}_{ik}=-(P_{iY_{k}^{t}}^{t}+P_{iY_{k}^{l}}^{l}+P_{iY_{k}^{r}}^{r})" display="block">
              <mml:mrow>
                <mml:msub>
                  <mml:mtext>Cost</mml:mtext>
                  <mml:mrow>
                    <mml:mi>i</mml:mi>
                    <mml:mo>⁢</mml:mo>
                    <mml:mi>k</mml:mi>
                  </mml:mrow>
                </mml:msub>
                <mml:mo>=</mml:mo>
                <mml:mrow>
                  <mml:mo>−</mml:mo>
                  <mml:mrow>
                    <mml:mo stretchy="false">(</mml:mo>
                    <mml:mrow>
                      <mml:msubsup>
                        <mml:mi>P</mml:mi>
                        <mml:mrow>
                          <mml:mi>i</mml:mi>
                          <mml:mo>⁢</mml:mo>
                          <mml:msubsup>
                            <mml:mi>Y</mml:mi>
                            <mml:mi>k</mml:mi>
                            <mml:mi>t</mml:mi>
                          </mml:msubsup>
                        </mml:mrow>
                        <mml:mi>t</mml:mi>
                      </mml:msubsup>
                      <mml:mo>+</mml:mo>
                      <mml:msubsup>
                        <mml:mi>P</mml:mi>
                        <mml:mrow>
                          <mml:mi>i</mml:mi>
                          <mml:mo>⁢</mml:mo>
                          <mml:msubsup>
                            <mml:mi>Y</mml:mi>
                            <mml:mi>k</mml:mi>
                            <mml:mi>l</mml:mi>
                          </mml:msubsup>
                        </mml:mrow>
                        <mml:mi>l</mml:mi>
                      </mml:msubsup>
                      <mml:mo>+</mml:mo>
                      <mml:msubsup>
                        <mml:mi>P</mml:mi>
                        <mml:mrow>
                          <mml:mi>i</mml:mi>
                          <mml:mo>⁢</mml:mo>
                          <mml:msubsup>
                            <mml:mi>Y</mml:mi>
                            <mml:mi>k</mml:mi>
                            <mml:mi>r</mml:mi>
                          </mml:msubsup>
                        </mml:mrow>
                        <mml:mi>r</mml:mi>
                      </mml:msubsup>
                    </mml:mrow>
                    <mml:mo stretchy="false">)</mml:mo>
                  </mml:mrow>
                </mml:mrow>
              </mml:mrow>
            </mml:math>
          </disp-formula>
        </p>
        <p id="S3.SS3.p2">where <inline-formula><mml:math alttext="Y_{k}^{l},Y_{k}^{r},Y_{k}^{t}" display="inline"><mml:mrow><mml:msubsup><mml:mi>Y</mml:mi><mml:mi>k</mml:mi><mml:mi>l</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>Y</mml:mi><mml:mi>k</mml:mi><mml:mi>r</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>Y</mml:mi><mml:mi>k</mml:mi><mml:mi>t</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> denote the left boundary index, the right boundary index, and the entity type of the <inline-formula><mml:math alttext="k" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-th entity, respectively. In the traditional LAP (Linear Assignment Problem), at most one entity is assigned to each query and at most one query to each entity, such that the total assignment cost is minimized. However, this one-to-one rule does not fully utilize the instance queries, because many of them are left without an entity. We therefore extend the traditional LAP to a one-to-many form, in which one entity can be assigned to more than one instance query. The optimization objective of this one-to-many LAP is:</p>
        <p>
          <disp-formula-group id="S3.E8">
            <disp-formula id="S3.E8X">
              <mml:math alttext="\displaystyle\min\sum_{i=0}^{M-1}\sum_{k=0}^{G-1}A_{ik}\text{Cost}_{ik}\quad%&#10;\text{s.t.}" display="inline">
                <mml:mrow>
                  <mml:mrow>
                    <mml:mi>min</mml:mi>
                    <mml:mo lspace="0.167em">⁢</mml:mo>
                    <mml:mrow>
                      <mml:mstyle displaystyle="true">
                        <mml:munderover>
                          <mml:mo movablelimits="false">∑</mml:mo>
                          <mml:mrow>
                            <mml:mi>i</mml:mi>
                            <mml:mo>=</mml:mo>
                            <mml:mn>0</mml:mn>
                          </mml:mrow>
                          <mml:mrow>
                            <mml:mi>M</mml:mi>
                            <mml:mo>−</mml:mo>
                            <mml:mn>1</mml:mn>
                          </mml:mrow>
                        </mml:munderover>
                      </mml:mstyle>
                      <mml:mrow>
                        <mml:mstyle displaystyle="true">
                          <mml:munderover>
                            <mml:mo movablelimits="false">∑</mml:mo>
                            <mml:mrow>
                              <mml:mi>k</mml:mi>
                              <mml:mo>=</mml:mo>
                              <mml:mn>0</mml:mn>
                            </mml:mrow>
                            <mml:mrow>
                              <mml:mi>G</mml:mi>
                              <mml:mo>−</mml:mo>
                              <mml:mn>1</mml:mn>
                            </mml:mrow>
                          </mml:munderover>
                        </mml:mstyle>
                        <mml:mrow>
                          <mml:msub>
                            <mml:mi>A</mml:mi>
                            <mml:mrow>
                              <mml:mi>i</mml:mi>
                              <mml:mo>⁢</mml:mo>
                              <mml:mi>k</mml:mi>
                            </mml:mrow>
                          </mml:msub>
                          <mml:mo>⁢</mml:mo>
                          <mml:msub>
                            <mml:mtext>Cost</mml:mtext>
                            <mml:mrow>
                              <mml:mi>i</mml:mi>
                              <mml:mo>⁢</mml:mo>
                              <mml:mi>k</mml:mi>
                            </mml:mrow>
                          </mml:msub>
                        </mml:mrow>
                      </mml:mrow>
                    </mml:mrow>
                  </mml:mrow>
                  <mml:mspace width="1em"/>
                  <mml:mtext>s.t.</mml:mtext>
                </mml:mrow>
              </mml:math>
            </disp-formula>
            <disp-formula id="S3.E8Xa">
              <mml:math alttext="\displaystyle\sum_{k}A_{ik}\leq 1,\sum_{i}A_{ik}=q_{k},\forall i,k,A_{ik}\in\{%&#10;0,1\}" display="inline">
                <mml:mrow>
                  <mml:mrow>
                    <mml:mrow>
                      <mml:mstyle displaystyle="true">
                        <mml:munder>
                          <mml:mo movablelimits="false">∑</mml:mo>
                          <mml:mi>k</mml:mi>
                        </mml:munder>
                      </mml:mstyle>
                      <mml:msub>
                        <mml:mi>A</mml:mi>
                        <mml:mrow>
                          <mml:mi>i</mml:mi>
                          <mml:mo>⁢</mml:mo>
                          <mml:mi>k</mml:mi>
                        </mml:mrow>
                      </mml:msub>
                    </mml:mrow>
                    <mml:mo>≤</mml:mo>
                    <mml:mn>1</mml:mn>
                  </mml:mrow>
                  <mml:mo>,</mml:mo>
                  <mml:mrow>
                    <mml:mrow>
                      <mml:mrow>
                        <mml:mstyle displaystyle="true">
                          <mml:munder>
                            <mml:mo movablelimits="false">∑</mml:mo>
                            <mml:mi>i</mml:mi>
                          </mml:munder>
                        </mml:mstyle>
                        <mml:msub>
                          <mml:mi>A</mml:mi>
                          <mml:mrow>
                            <mml:mi>i</mml:mi>
                            <mml:mo>⁢</mml:mo>
                            <mml:mi>k</mml:mi>
                          </mml:mrow>
                        </mml:msub>
                      </mml:mrow>
                      <mml:mo>=</mml:mo>
                      <mml:mrow>
                        <mml:msub>
                          <mml:mi>q</mml:mi>
                          <mml:mi>k</mml:mi>
                        </mml:msub>
                        <mml:mo>,</mml:mo>
                        <mml:mrow>
                          <mml:mo rspace="0.167em">∀</mml:mo>
                          <mml:mi>i</mml:mi>
                        </mml:mrow>
                        <mml:mo>,</mml:mo>
                        <mml:mi>k</mml:mi>
                      </mml:mrow>
                    </mml:mrow>
                    <mml:mo>,</mml:mo>
                    <mml:mrow>
                      <mml:msub>
                        <mml:mi>A</mml:mi>
                        <mml:mrow>
                          <mml:mi>i</mml:mi>
                          <mml:mo>⁢</mml:mo>
                          <mml:mi>k</mml:mi>
                        </mml:mrow>
                      </mml:msub>
                      <mml:mo>∈</mml:mo>
                      <mml:mrow>
                        <mml:mo stretchy="false">{</mml:mo>
                        <mml:mn>0</mml:mn>
                        <mml:mo>,</mml:mo>
                        <mml:mn>1</mml:mn>
                        <mml:mo stretchy="false">}</mml:mo>
                      </mml:mrow>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </disp-formula-group>
        </p>
        <p id="S3.SS3.p4">where <inline-formula><mml:math alttext="A\in\{0,1\}^{M\times G}" display="inline"><mml:mrow><mml:mi>A</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">}</mml:mo></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo><mml:mi>G</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> is the allocation matrix, <inline-formula><mml:math alttext="M" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula> is the number of instance queries, <inline-formula><mml:math alttext="G" display="inline"><mml:mi>G</mml:mi></mml:math></inline-formula> is the number of entities, and <inline-formula><mml:math alttext="A_{ik}=1" display="inline"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula> denotes that the <inline-formula><mml:math alttext="k" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-th entity is assigned to the <inline-formula><mml:math alttext="i" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>-th instance query. <inline-formula><mml:math alttext="q_{k}" display="inline"><mml:msub><mml:mi>q</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:math></inline-formula> is the assignable quantity of the <inline-formula><mml:math alttext="k" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-th entity, that is, the number of instance queries it is assigned to, and <inline-formula><mml:math alttext="Q=\sum_{k}q_{k}" display="inline"><mml:mrow><mml:mi>Q</mml:mi><mml:mo rspace="0.111em">=</mml:mo><mml:mrow><mml:msub><mml:mo>∑</mml:mo><mml:mi>k</mml:mi></mml:msub><mml:msub><mml:mi>q</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:math></inline-formula> is the total assignable quantity over all entities. In our experiments, the assignable quantities are balanced across entity types.</p>
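        <p>As an illustration of the cost definition and the one-to-many objective above, the following minimal Python sketch builds the cost matrix and solves a toy instance by exhaustive search. Exhaustive search is used here only because it is easy to verify and is feasible for very small sizes. The function names (cost_matrix, solve_one_to_many), the toy probabilities, and the quotas are assumptions made for this example.</p>
        <p>
          <code language="python"><![CDATA[
```python
from itertools import permutations


def cost_matrix(p_type, p_left, p_right, entities):
    """Cost[i][k] = -(P_t[i][type_k] + P_l[i][left_k] + P_r[i][right_k]).

    entities holds (left, right, type) index triples, one per entity.
    """
    return [
        [-(p_type[i][t] + p_left[i][l] + p_right[i][r]) for (l, r, t) in entities]
        for i in range(len(p_type))
    ]


def solve_one_to_many(cost, quotas):
    """Return (A, total) minimising the sum of A[i][k] * cost[i][k].

    Each query (row) receives at most one entity; entity k (column) is
    assigned to exactly quotas[k] queries. Exhaustive search, toy sizes only.
    """
    num_queries = len(cost)
    # Expand entity k into quotas[k] identical slots, so the problem becomes
    # choosing a distinct query for every slot.
    slots = [k for k, q in enumerate(quotas) for _ in range(q)]
    if len(slots) > num_queries:
        raise ValueError("total quota Q must not exceed the number of queries M")
    best_total, best_queries = float("inf"), ()
    for queries in permutations(range(num_queries), len(slots)):
        total = sum(cost[i][k] for i, k in zip(queries, slots))
        if total < best_total:
            best_total, best_queries = total, queries
    assignment = [[0] * len(quotas) for _ in range(num_queries)]
    for i, k in zip(best_queries, slots):
        assignment[i][k] = 1
    return assignment, best_total


# Toy example (made-up numbers): M = 4 queries, 4 tokens, 2 types, G = 2 entities.
p_type = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.5, 0.5]]
p_left = [[0.7, 0.1, 0.1, 0.1], [0.6, 0.2, 0.1, 0.1],
          [0.1, 0.1, 0.7, 0.1], [0.25, 0.25, 0.25, 0.25]]
p_right = [[0.1, 0.7, 0.1, 0.1], [0.1, 0.6, 0.2, 0.1],
           [0.1, 0.1, 0.1, 0.7], [0.25, 0.25, 0.25, 0.25]]
entities = [(0, 1, 0), (2, 3, 1)]  # (left, right, type)
cost = cost_matrix(p_type, p_left, p_right, entities)
A, total = solve_one_to_many(cost, quotas=[2, 1])
print(A)  # [[1, 0], [1, 0], [0, 1], [0, 0]]
```
]]></code>
        </p>
        <p>With quotas of 2 and 1, the first entity is shared by the two queries that predict it most confidently, and the fourth query is left unassigned because the total quota (3) is smaller than the number of queries (4).</p>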
        <p id="S3.SS3.p5">We then use the Auction Algorithm to solve this assignment problem, producing the label allocation matrix with the lowest overall cost. Because there are more instance queries than entity labels available for allocation <inline-formula><mml:math alttext="(M&gt;Q)" display="inline"><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>M</mml:mi><mml:mo>&gt;</mml:mo><mml:mi>Q</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula>, some instance queries are not assigned to any entity label. We give these queries the "None" label by extending the allocation matrix with one additional column. The vector a of the new column is defined element-wise as:</p>
        <p>
          <disp-formula id="S3.E9">
            <mml:math alttext="a_{i}=\begin{cases}0,&amp;\sum_{k}A_{ik}=1\\ 1,&amp;\sum_{k}A_{ik}=0\end{cases}" display="block">
              <mml:mrow>
                <mml:msub>
                  <mml:mi>a</mml:mi>
                  <mml:mi>i</mml:mi>
                </mml:msub>
                <mml:mo>=</mml:mo>
                <mml:mrow>
                  <mml:mo>{</mml:mo>
                  <mml:mtable columnspacing="5pt" displaystyle="true" rowspacing="0pt">
                    <mml:mtr>
                      <mml:mtd class="ltx_align_left" columnalign="left">
                        <mml:mrow>
                          <mml:mn>0</mml:mn>
                          <mml:mo>,</mml:mo>
                        </mml:mrow>
                      </mml:mtd>
                      <mml:mtd class="ltx_align_left" columnalign="left">
                        <mml:mrow>
                          <mml:mrow>
                            <mml:mstyle displaystyle="false">
                              <mml:msub>
                                <mml:mo>∑</mml:mo>
                                <mml:mi>k</mml:mi>
                              </mml:msub>
                            </mml:mstyle>
                            <mml:msub>
                              <mml:mi>A</mml:mi>
                              <mml:mrow>
                                <mml:mi>i</mml:mi>
                                <mml:mo>⁢</mml:mo>
                                <mml:mi>k</mml:mi>
                              </mml:mrow>
                            </mml:msub>
                          </mml:mrow>
                          <mml:mo>=</mml:mo>
                          <mml:mn>1</mml:mn>
                        </mml:mrow>
                      </mml:mtd>
                    </mml:mtr>
                    <mml:mtr>
                      <mml:mtd class="ltx_align_left" columnalign="left">
                        <mml:mrow>
                          <mml:mn>1</mml:mn>
                          <mml:mo>,</mml:mo>
                        </mml:mrow>
                      </mml:mtd>
                      <mml:mtd class="ltx_align_left" columnalign="left">
                        <mml:mrow>
                          <mml:mrow>
                            <mml:mstyle displaystyle="false">
                              <mml:msub>
                                <mml:mo>∑</mml:mo>
                                <mml:mi>k</mml:mi>
                              </mml:msub>
                            </mml:mstyle>
                            <mml:msub>
                              <mml:mi>A</mml:mi>
                              <mml:mrow>
                                <mml:mi>i</mml:mi>
                                <mml:mo>⁢</mml:mo>
                                <mml:mi>k</mml:mi>
                              </mml:mrow>
                            </mml:msub>
                          </mml:mrow>
                          <mml:mo>=</mml:mo>
                          <mml:mn>0</mml:mn>
                        </mml:mrow>
                      </mml:mtd>
                    </mml:mtr>
                  </mml:mtable>
                </mml:mrow>
              </mml:mrow>
            </mml:math>
          </disp-formula>
        </p>
        <p id="S3.SS3.p6">Given the extended allocation matrix <inline-formula><mml:math alttext="\hat{A}\in\{0,1\}^{M\times(G+1)}" display="inline"><mml:mrow><mml:mover accent="true"><mml:mi>A</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mo>∈</mml:mo><mml:msup><mml:mrow><mml:mo stretchy="false">{</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">}</mml:mo></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>G</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, we obtain the labels of the <inline-formula><mml:math alttext="M" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula> instance queries as <inline-formula><mml:math alttext="\hat{Y}=Y.\text{indexby}(\pi^{*})" display="inline"><mml:mrow><mml:mrow><mml:mover accent="true"><mml:mi>Y</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mi>Y</mml:mi></mml:mrow><mml:mo lspace="0em" rspace="0.167em">.</mml:mo><mml:mrow><mml:mtext>indexby</mml:mtext><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mi>π</mml:mi><mml:mo>∗</mml:mo></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math alttext="\pi^{*}=\arg\max_{\text{dim}=1}(\hat{A})" display="inline"><mml:mrow><mml:msup><mml:mi>π</mml:mi><mml:mo>∗</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mi>arg</mml:mi><mml:mo lspace="0.167em">&#x2061;</mml:mo><mml:mrow><mml:msub><mml:mi>max</mml:mi><mml:mrow><mml:mtext>dim</mml:mtext><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mover accent="true"><mml:mi>A</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mrow></mml:math></inline-formula> is the vector of label indices of the instance queries under the optimal allocation.</p>
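        <p>The "None" column and the label lookup can be sketched in a few lines of Python. This is an illustrative reading of the two steps above, not the authors' implementation: the helper names (extend_with_none, query_labels) and the toy allocation matrix are hypothetical, and Python's None stands in for the "None" label.</p>
        <p>
          <code language="python"><![CDATA[
```python
def extend_with_none(assignment):
    """Append the column a, where a_i = 1 only if query i received no entity."""
    return [row + [1 - sum(row)] for row in assignment]


def query_labels(assignment, entities, none_label=None):
    """Return (labels, pi): the label of every query and its column index."""
    extended = extend_with_none(assignment)
    candidates = list(entities) + [none_label]
    # Every row of the extended matrix is one-hot, so the arg max along
    # dim = 1 is simply the position of its single 1.
    pi = [row.index(max(row)) for row in extended]
    return [candidates[j] for j in pi], pi


# Toy allocation matrix (M = 4 queries, G = 2 entities); values are made up.
A = [
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 0],
]
entities = [(0, 1, 0), (2, 3, 1)]  # hypothetical (left, right, type) triples
labels, pi = query_labels(A, entities)
print(pi)      # [0, 0, 1, 2]
print(labels)  # [(0, 1, 0), (0, 1, 0), (2, 3, 1), None]
```
]]></code>
        </p>
        <p>The last query has an all-zero row in the original matrix, so its index points at the added column and it receives the "None" label.</p>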
      </sec>
      <sec id="S3.SS4">
        <label>3.4</label>
        <title>Training objectives</title>
        <p id="S3.SS4.p1">At this point we have the entity predictions for the M instance queries and the labels assigned to them under the minimum total allocation cost. We define a classification loss and a boundary loss for training the model. For left and right boundary prediction, we use the binary cross-entropy function:</p>
        <p>
          <disp-formula id="S3.E10">
            <mml:math alttext="L_{b}=-\sum_{\delta\in\{l,r\}}\sum_{i=0}^{M-1}\sum_{j=0}^{N-1}\left(\left[\hat{Y}_{i}^{\delta}=j\right]\log P_{ij}^{\delta}+\left[\hat{Y}_{i}^{\delta}\neq j\right]\log(1-P_{ij}^{\delta})\right)" display="block">
              <mml:mrow>
                <mml:msub>
                  <mml:mi>L</mml:mi>
                  <mml:mi>b</mml:mi>
                </mml:msub>
                <mml:mo>=</mml:mo>
                <mml:mrow>
                  <mml:mo>−</mml:mo>
                  <mml:mrow>
                    <mml:munder>
                      <mml:mo movablelimits="false" rspace="0em">∑</mml:mo>
                      <mml:mrow>
                        <mml:mi>δ</mml:mi>
                        <mml:mo>∈</mml:mo>
                        <mml:mrow>
                          <mml:mo stretchy="false">{</mml:mo>
                          <mml:mi>l</mml:mi>
                          <mml:mo>,</mml:mo>
                          <mml:mi>r</mml:mi>
                          <mml:mo stretchy="false">}</mml:mo>
                        </mml:mrow>
                      </mml:mrow>
                    </mml:munder>
                    <mml:mrow>
                      <mml:munderover>
                        <mml:mo movablelimits="false" rspace="0em">∑</mml:mo>
                        <mml:mrow>
                          <mml:mi>i</mml:mi>
                          <mml:mo>=</mml:mo>
                          <mml:mn>0</mml:mn>
                        </mml:mrow>
                        <mml:mrow>
                          <mml:mi>M</mml:mi>
                          <mml:mo>−</mml:mo>
                          <mml:mn>1</mml:mn>
                        </mml:mrow>
                      </mml:munderover>
                      <mml:mrow>
                        <mml:munderover>
                          <mml:mo movablelimits="false" rspace="0em">∑</mml:mo>
                          <mml:mrow>
                            <mml:mi>j</mml:mi>
                            <mml:mo>=</mml:mo>
                            <mml:mn>0</mml:mn>
                          </mml:mrow>
                          <mml:mrow>
                            <mml:mi>N</mml:mi>
                            <mml:mo>−</mml:mo>
                            <mml:mn>1</mml:mn>
                          </mml:mrow>
                        </mml:munderover>
                        <mml:mrow>
                          <mml:mo>(</mml:mo>
                          <mml:mrow>
                            <mml:mrow>
                              <mml:mrow>
                                <mml:mo>[</mml:mo>
                                <mml:mrow>
                                  <mml:msubsup>
                                    <mml:mover accent="true">
                                      <mml:mi>Y</mml:mi>
                                      <mml:mo>^</mml:mo>
                                    </mml:mover>
                                    <mml:mi>i</mml:mi>
                                    <mml:mi>δ</mml:mi>
                                  </mml:msubsup>
                                  <mml:mo>=</mml:mo>
                                  <mml:mi>j</mml:mi>
                                </mml:mrow>
                                <mml:mo>]</mml:mo>
                              </mml:mrow>
                              <mml:mo lspace="0.167em">&#x2062;</mml:mo>
                              <mml:mrow>
                                <mml:mi>log</mml:mi>
                                <mml:mo lspace="0.167em">&#x2061;</mml:mo>
                                <mml:msubsup>
                                  <mml:mi>P</mml:mi>
                                  <mml:mrow>
                                    <mml:mi>i</mml:mi>
                                    <mml:mo>&#x2062;</mml:mo>
                                    <mml:mi>j</mml:mi>
                                  </mml:mrow>
                                  <mml:mi>δ</mml:mi>
                                </mml:msubsup>
                              </mml:mrow>
                            </mml:mrow>
                            <mml:mo>+</mml:mo>
                            <mml:mrow>
                              <mml:mrow>
                                <mml:mo>[</mml:mo>
                                <mml:mrow>
                                  <mml:msubsup>
                                    <mml:mover accent="true">
                                      <mml:mi>Y</mml:mi>
                                      <mml:mo>^</mml:mo>
                                    </mml:mover>
                                    <mml:mi>i</mml:mi>
                                    <mml:mi>δ</mml:mi>
                                  </mml:msubsup>
                                  <mml:mo>≠</mml:mo>
                                  <mml:mi>j</mml:mi>
                                </mml:mrow>
                                <mml:mo>]</mml:mo>
                              </mml:mrow>
                              <mml:mo lspace="0.167em">&#x2062;</mml:mo>
                              <mml:mrow>
                                <mml:mi>log</mml:mi>
                                <mml:mo>&#x2061;</mml:mo>
                                <mml:mrow>
                                  <mml:mo stretchy="false">(</mml:mo>
                                  <mml:mrow>
                                    <mml:mn>1</mml:mn>
                                    <mml:mo>−</mml:mo>
                                    <mml:msubsup>
                                      <mml:mi>P</mml:mi>
                                      <mml:mrow>
                                        <mml:mi>i</mml:mi>
                                        <mml:mo>&#x2062;</mml:mo>
                                        <mml:mi>j</mml:mi>
                                      </mml:mrow>
                                      <mml:mi>δ</mml:mi>
                                    </mml:msubsup>
                                  </mml:mrow>
                                  <mml:mo stretchy="false">)</mml:mo>
                                </mml:mrow>
                              </mml:mrow>
                            </mml:mrow>
                          </mml:mrow>
                          <mml:mo>)</mml:mo>
                        </mml:mrow>
                      </mml:mrow>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
              </mml:mrow>
            </mml:math>
          </disp-formula>
        </p>
        <p id="S3.SS4.p2">For entity classification, we use the cross-entropy function:</p>
        <p>
          <disp-formula id="S3.E11">
            <mml:math alttext="L_{t}=-\sum_{i=0}^{M-1}\sum_{c\in\varepsilon}\left[\hat{Y}_{i}^{t}=c\right]\log P_{ic}^{t}" display="block">
              <mml:mrow>
                <mml:msub>
                  <mml:mi>L</mml:mi>
                  <mml:mi>t</mml:mi>
                </mml:msub>
                <mml:mo>=</mml:mo>
                <mml:mrow>
                  <mml:mo>−</mml:mo>
                  <mml:mrow>
                    <mml:munderover>
                      <mml:mo movablelimits="false" rspace="0em">∑</mml:mo>
                      <mml:mrow>
                        <mml:mi>i</mml:mi>
                        <mml:mo>=</mml:mo>
                        <mml:mn>0</mml:mn>
                      </mml:mrow>
                      <mml:mrow>
                        <mml:mi>M</mml:mi>
                        <mml:mo>−</mml:mo>
                        <mml:mn>1</mml:mn>
                      </mml:mrow>
                    </mml:munderover>
                    <mml:mrow>
                      <mml:munder>
                        <mml:mo movablelimits="false" rspace="0em">∑</mml:mo>
                        <mml:mrow>
                          <mml:mi>c</mml:mi>
                          <mml:mo>∈</mml:mo>
                          <mml:mi>ε</mml:mi>
                        </mml:mrow>
                      </mml:munder>
                      <mml:mrow>
                        <mml:mrow>
                          <mml:mo>[</mml:mo>
                          <mml:mrow>
                            <mml:msubsup>
                              <mml:mover accent="true">
                                <mml:mi>Y</mml:mi>
                                <mml:mo>^</mml:mo>
                              </mml:mover>
                              <mml:mi>i</mml:mi>
                              <mml:mi>t</mml:mi>
                            </mml:msubsup>
                            <mml:mo>=</mml:mo>
                            <mml:mi>c</mml:mi>
                          </mml:mrow>
                          <mml:mo>]</mml:mo>
                        </mml:mrow>
                        <mml:mo lspace="0.167em">⁢</mml:mo>
                        <mml:mrow>
                          <mml:mi>log</mml:mi>
                          <mml:mo lspace="0.167em">⁡</mml:mo>
                          <mml:msubsup>
                            <mml:mi>P</mml:mi>
                            <mml:mrow>
                              <mml:mi>i</mml:mi>
                              <mml:mo>⁢</mml:mo>
                              <mml:mi>c</mml:mi>
                            </mml:mrow>
                            <mml:mi>t</mml:mi>
                          </mml:msubsup>
                        </mml:mrow>
                      </mml:mrow>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
              </mml:mrow>
            </mml:math>
          </disp-formula>
        </p>
        <p id="S3.SS4.p3">where <inline-formula><mml:math alttext="1[\omega]" display="inline"><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:mi>ω</mml:mi><mml:mo stretchy="false">]</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> is the indicator function, which equals 1 when the condition <inline-formula><mml:math alttext="\omega" display="inline"><mml:mi>ω</mml:mi></mml:math></inline-formula> is true and 0 otherwise; in the two loss equations above it is written as a bare bracket.</p>
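        <p>As a rough illustration of the two losses, the sketch below evaluates them in plain Python for labels that have already been assigned. It assumes that a query labelled "None" has no boundary, so every token position is a negative target for it, and that "None" occupies its own class index none_class. Both choices, the function names, and the small constant EPS are assumptions of this example rather than details given in the text.</p>
        <p>
          <code language="python"><![CDATA[
```python
import math

EPS = 1e-12  # guards against log(0); an implementation detail of this sketch


def boundary_loss(p_left, p_right, labels):
    """Binary cross-entropy over token positions for both boundaries.

    p_left[i][j] / p_right[i][j]: probability that token j is the left /
    right boundary of query i. labels[i] is (left, right, type) or None.
    """
    loss = 0.0
    for probs, slot in ((p_left, 0), (p_right, 1)):
        for i, label in enumerate(labels):
            target = None if label is None else label[slot]
            for j, p in enumerate(probs[i]):
                if j == target:
                    loss -= math.log(p + EPS)        # positive position
                else:
                    loss -= math.log(1.0 - p + EPS)  # negative position
    return loss


def classification_loss(p_type, labels, none_class):
    """Cross-entropy between predicted type distribution and assigned type."""
    loss = 0.0
    for i, label in enumerate(labels):
        c = none_class if label is None else label[2]
        loss -= math.log(p_type[i][c] + EPS)
    return loss


# Toy inputs (made up): query 0 is matched to an entity, query 1 is "None".
labels = [(0, 1, 0), None]
p_left = [[0.5, 0.5], [0.5, 0.5]]
p_right = [[0.5, 0.5], [0.5, 0.5]]
p_type = [[0.25, 0.25, 0.5], [0.25, 0.25, 0.5]]  # last column: "None" class
print(boundary_loss(p_left, p_right, labels))
print(classification_loss(p_type, labels, none_class=2))
```
]]></code>
        </p>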
        <p id="S3.SS4.p4">Following Al-Rfou et al. [<xref rid="ref029" ref-type="bibr">29</xref>] and Carion et al. [<xref rid="ref030" ref-type="bibr">30</xref>], we add an entity pointer and an entity classifier after every character-level transformer layer, which yields two loss values per layer. The overall loss on the training set <inline-formula><mml:math alttext="D" display="inline"><mml:mi>D</mml:mi></mml:math></inline-formula> is therefore:</p>
        <p>
          <disp-formula id="S3.E12">
            <mml:math alttext="L=\sum_{D}\sum_{\tau=1}^{L}\left(L_{t}^{\tau}+L_{b}^{\tau}\right)" display="block">
              <mml:mrow>
                <mml:mi>L</mml:mi>
                <mml:mo rspace="0.111em">=</mml:mo>
                <mml:mrow>
                  <mml:munder>
                    <mml:mo movablelimits="false" rspace="0em">∑</mml:mo>
                    <mml:mi>D</mml:mi>
                  </mml:munder>
                  <mml:mrow>
                    <mml:munderover>
                      <mml:mo movablelimits="false">∑</mml:mo>
                      <mml:mrow>
                        <mml:mi>τ</mml:mi>
                        <mml:mo>=</mml:mo>
                        <mml:mn>1</mml:mn>
                      </mml:mrow>
                      <mml:mi>L</mml:mi>
                    </mml:munderover>
                    <mml:mrow>
                      <mml:mo stretchy="false">(</mml:mo>
                      <mml:mrow>
                        <mml:msubsup>
                          <mml:mi>L</mml:mi>
                          <mml:mi>t</mml:mi>
                          <mml:mi>τ</mml:mi>
                        </mml:msubsup>
                        <mml:mo>+</mml:mo>
                        <mml:msubsup>
                          <mml:mi>L</mml:mi>
                          <mml:mi>b</mml:mi>
                          <mml:mi>τ</mml:mi>
                        </mml:msubsup>
                      </mml:mrow>
                      <mml:mo stretchy="false">)</mml:mo>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
              </mml:mrow>
            </mml:math>
          </disp-formula>
        </p>
        <p id="S3.SS4.p5">where <inline-formula><mml:math alttext="L_{t}^{\tau}" display="inline"><mml:msubsup><mml:mi>L</mml:mi><mml:mi>t</mml:mi><mml:mi>τ</mml:mi></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math alttext="L_{b}^{\tau}" display="inline"><mml:msubsup><mml:mi>L</mml:mi><mml:mi>b</mml:mi><mml:mi>τ</mml:mi></mml:msubsup></mml:math></inline-formula> are the classification loss and the boundary loss of the <inline-formula><mml:math alttext="\tau" display="inline"><mml:mi>τ</mml:mi></mml:math></inline-formula>-th layer, respectively. At inference time, we use only the entity predictions of the last layer.</p>
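        <p>The layer-wise sum and the last-layer prediction rule can be sketched as follows, assuming the per-layer losses have already been computed. The nested-list layout and the function names (total_loss, predict) are hypothetical and serve only to make the summation explicit.</p>
        <p>
          <code language="python"><![CDATA[
```python
def total_loss(dataset_losses):
    """Sum L_t + L_b over every layer of every training sample.

    dataset_losses[s][tau] is the pair (L_t, L_b) of sample s at layer tau.
    """
    return sum(l_t + l_b for layers in dataset_losses for (l_t, l_b) in layers)


def predict(per_layer_predictions):
    """At inference time only the last layer's entity predictions are used."""
    return per_layer_predictions[-1]


# Made-up loss values for two samples with two layers each.
losses = [
    [(1.0, 2.0), (0.5, 0.5)],
    [(1.0, 1.0), (0.25, 0.25)],
]
print(total_loss(losses))  # 6.5
```
]]></code>
        </p>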
        <p>
          <fig id="F2">
            <label>Figure 2.</label>
            <caption>
              <p>Knowledge-graph-based learning path recommendation scheme.</p>
            </caption>
            <graphic xlink:href="fig2.jpg"/>
          </fig>
        </p>
      </sec>
    </sec>
    <sec id="S4">
      <label>4.</label>
      <title>Research on Learning Path Recommendation Based on Knowledge Graph</title>
      <p id="S4.p1">Knowledge graphs (KGs) serve as a pedagogical and computational scaffold for structuring fragmented knowledge domains, enabling the generation of logically coherent learning paths that respect intrinsic educational dependencies such as prerequisite sequences, hierarchical inclusions, and conceptual analogies. Unlike traditional methods, which often prioritize isolated content delivery, KGs formalize knowledge topology through nodes (e.g., "Gradient Descent") and edges (e.g., "is_prerequisite_for"), allowing systematic traversal from a target node (e.g., "Convolutional Neural Networks") to prerequisite foundations (e.g., "Linear Algebra" or "Image Processing"). This structural rigor ensures learning continuity by mandating that no critical intermediate concepts are omitted, thereby preserving the integrity of pedagogical logic. For instance, omitting "Backpropagation" in a path toward "Deep Reinforcement Learning" would violate prerequisite dependencies, undermining learning efficacy. To operationalize this, a hybrid recommendation strategy is proposed, integrating inclusion-relation prioritization and multi-criteria node ranking. First, inclusion relations (e.g., "Machine Learning Fundamentals" <inline-formula><mml:math alttext="\supset" display="inline"><mml:mo>⊃</mml:mo></mml:math></inline-formula> "Supervised Learning") are extracted using subgraph isolation:</p>
      <p>
        <disp-formula id="S4.E1">
          <mml:math alttext="G_{\text{include}}=\{v\in V\mid\exists e(v,v_{\text{target}})\,r_{\text{%&#10;include}}\in E\}" display="block">
            <mml:mrow>
              <mml:msub>
                <mml:mi>G</mml:mi>
                <mml:mtext>include</mml:mtext>
              </mml:msub>
              <mml:mo>=</mml:mo>
              <mml:mrow>
                <mml:mo stretchy="false">{</mml:mo>
                <mml:mrow>
                  <mml:mi>v</mml:mi>
                  <mml:mo>∈</mml:mo>
                  <mml:mi>V</mml:mi>
                </mml:mrow>
                <mml:mo fence="true" lspace="0em" rspace="0.167em">∣</mml:mo>
                <mml:mrow>
                  <mml:mrow>
                    <mml:mo rspace="0.167em">∃</mml:mo>
                    <mml:mrow>
                      <mml:mi>e</mml:mi>
                      <mml:mo>⁢</mml:mo>
                      <mml:mrow>
                        <mml:mo stretchy="false">(</mml:mo>
                        <mml:mi>v</mml:mi>
                        <mml:mo>,</mml:mo>
                        <mml:msub>
                          <mml:mi>v</mml:mi>
                          <mml:mtext>target</mml:mtext>
                        </mml:msub>
                        <mml:mo stretchy="false">)</mml:mo>
                      </mml:mrow>
                      <mml:mo lspace="0.170em">⁢</mml:mo>
                      <mml:msub>
                        <mml:mi>r</mml:mi>
                        <mml:mtext>include</mml:mtext>
                      </mml:msub>
                    </mml:mrow>
                  </mml:mrow>
                  <mml:mo>∈</mml:mo>
                  <mml:mi>E</mml:mi>
                </mml:mrow>
                <mml:mo stretchy="false">}</mml:mo>
              </mml:mrow>
            </mml:mrow>
          </mml:math>
        </disp-formula>
      </p>
      <p id="S4.p2">where <inline-formula><mml:math alttext="G_{\text{include}}" display="inline"><mml:msub><mml:mi>G</mml:mi><mml:mtext>include</mml:mtext></mml:msub></mml:math></inline-formula> denotes the subgraph of nodes directly containing or contained by the target node <inline-formula><mml:math alttext="v_{\text{target}}" display="inline"><mml:msub><mml:mi>v</mml:mi><mml:mtext>target</mml:mtext></mml:msub></mml:math></inline-formula>. Subsequently, candidate nodes are ranked by a composite metric that balances association strength and learning cost. Association strength is quantified as a weighted combination of the inverse of the shortest-path distance <inline-formula><mml:math alttext="\text{Distance}(v,v_{\text{target}})" display="inline"><mml:mrow><mml:mtext>Distance</mml:mtext><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>v</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mtext>target</mml:mtext></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> and the node centrality <inline-formula><mml:math alttext="\text{Degree}(v)" display="inline"><mml:mrow><mml:mtext>Degree</mml:mtext><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>v</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>:</p>
      <p>
        <disp-formula id="S4.E2">
          <mml:math alttext="\text{Score}_{\text{association}}(v)=\alpha\cdot\frac{1}{\text{Distance}(v,v_{\text{target}})}+\beta\cdot\text{Degree}(v)" display="block">
            <mml:mrow>
              <mml:mrow>
                <mml:msub>
                  <mml:mtext>Score</mml:mtext>
                  <mml:mtext>association</mml:mtext>
                </mml:msub>
                <mml:mo>⁢</mml:mo>
                <mml:mrow>
                  <mml:mo stretchy="false">(</mml:mo>
                  <mml:mi>v</mml:mi>
                  <mml:mo stretchy="false">)</mml:mo>
                </mml:mrow>
              </mml:mrow>
              <mml:mo>=</mml:mo>
              <mml:mrow>
                <mml:mrow>
                  <mml:mi>α</mml:mi>
                  <mml:mo lspace="0.222em" rspace="0.222em">⋅</mml:mo>
                  <mml:mfrac>
                    <mml:mn>1</mml:mn>
                    <mml:mrow>
                      <mml:mtext>Distance</mml:mtext>
                      <mml:mo>⁢</mml:mo>
                      <mml:mrow>
                        <mml:mo stretchy="false">(</mml:mo>
                        <mml:mi>v</mml:mi>
                        <mml:mo>,</mml:mo>
                        <mml:msub>
                          <mml:mi>v</mml:mi>
                          <mml:mtext>target</mml:mtext>
                        </mml:msub>
                        <mml:mo stretchy="false">)</mml:mo>
                      </mml:mrow>
                    </mml:mrow>
                  </mml:mfrac>
                </mml:mrow>
                <mml:mo>+</mml:mo>
                <mml:mrow>
                  <mml:mrow>
                    <mml:mi>β</mml:mi>
                    <mml:mo lspace="0.222em" rspace="0.222em">⋅</mml:mo>
                    <mml:mtext>Degree</mml:mtext>
                  </mml:mrow>
                  <mml:mo>⁢</mml:mo>
                  <mml:mrow>
                    <mml:mo stretchy="false">(</mml:mo>
                    <mml:mi>v</mml:mi>
                    <mml:mo stretchy="false">)</mml:mo>
                  </mml:mrow>
                </mml:mrow>
              </mml:mrow>
            </mml:mrow>
          </mml:math>
        </disp-formula>
      </p>
      <p id="S4.p3">where <inline-formula><mml:math alttext="\alpha" display="inline"><mml:mi>α</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math alttext="\beta" display="inline"><mml:mi>β</mml:mi></mml:math></inline-formula> are weighting coefficients calibrated through learner feedback or A/B testing. Nodes with higher scores are prioritized, and ties are resolved using the learning cost <inline-formula><mml:math alttext="C(v)" display="inline"><mml:mrow><mml:mi>C</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>v</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>, modeled as a function of historical learner engagement (<inline-formula><mml:math alttext="t_{\text{avg}}" display="inline"><mml:msub><mml:mi>t</mml:mi><mml:mtext>avg</mml:mtext></mml:msub></mml:math></inline-formula>) and resource complexity <inline-formula><mml:math alttext="RC(v)" display="inline"><mml:mrow><mml:mi>R</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mi>C</mml:mi><mml:mo>&#x2062;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>v</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>:</p>
      <p>
        <disp-formula id="S4.E3">
          <mml:math alttext="C(v)=\gamma\cdot t_{\text{avg}}+(1-\gamma)\cdot RC(v)" display="block">
            <mml:mrow>
              <mml:mrow>
                <mml:mi>C</mml:mi>
                <mml:mo>⁢</mml:mo>
                <mml:mrow>
                  <mml:mo stretchy="false">(</mml:mo>
                  <mml:mi>v</mml:mi>
                  <mml:mo stretchy="false">)</mml:mo>
                </mml:mrow>
              </mml:mrow>
              <mml:mo>=</mml:mo>
              <mml:mrow>
                <mml:mrow>
                  <mml:mi>γ</mml:mi>
                  <mml:mo lspace="0.222em" rspace="0.222em">⋅</mml:mo>
                  <mml:msub>
                    <mml:mi>t</mml:mi>
                    <mml:mtext>avg</mml:mtext>
                  </mml:msub>
                </mml:mrow>
                <mml:mo>+</mml:mo>
                <mml:mrow>
                  <mml:mrow>
                    <mml:mrow>
                      <mml:mo stretchy="false">(</mml:mo>
                      <mml:mrow>
                        <mml:mn>1</mml:mn>
                        <mml:mo>−</mml:mo>
                        <mml:mi>γ</mml:mi>
                      </mml:mrow>
                      <mml:mo rspace="0.055em" stretchy="false">)</mml:mo>
                    </mml:mrow>
                    <mml:mo rspace="0.222em">⋅</mml:mo>
                    <mml:mi>R</mml:mi>
                  </mml:mrow>
                  <mml:mo>⁢</mml:mo>
                  <mml:mi>C</mml:mi>
                  <mml:mo>⁢</mml:mo>
                  <mml:mrow>
                    <mml:mo stretchy="false">(</mml:mo>
                    <mml:mi>v</mml:mi>
                    <mml:mo stretchy="false">)</mml:mo>
                  </mml:mrow>
                </mml:mrow>
              </mml:mrow>
            </mml:mrow>
          </mml:math>
        </disp-formula>
      </p>
      <p id="S4.p4">where <inline-formula><mml:math alttext="\gamma" display="inline"><mml:mi>γ</mml:mi></mml:math></inline-formula> adjusts the relative weight of temporal load versus cognitive load. This framework aligns with graph-based recommendation paradigms such as the graph convolutional network (GCN) approach of Zhang et al. [<xref rid="ref031" ref-type="bibr">31</xref>], which leverages node embeddings to capture latent educational dependencies, and the ant colony optimization variant of Zheng et al. [<xref rid="ref032" ref-type="bibr">32</xref>], which dynamically balances exploration of novel concepts against exploitation of known pathways. Furthermore, Shi et al. [<xref rid="ref033" ref-type="bibr">33</xref>] validate the necessity of multidimensional KGs, demonstrating that integrating node centrality (e.g., PageRank scores) and learner-specific factors (e.g., proficiency levels) enhances path personalization, as measured by retention metrics. The methodology is also compatible with ontology-enriched KGs, where named entity recognition (NER) refines node categorization (for example, distinguishing "Bayesian Inference" as a theoretical versus an applied concept), ensuring granular alignment with curricular taxonomies [<xref rid="ref034" ref-type="bibr">34</xref>]. Figure <xref ref-type="fig" rid="F2">2</xref> shows the general process of learning path recommendation based on a knowledge graph.</p>
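      <p>As an illustration only, the inclusion filter, the association score, and the learning cost defined by the three formulas of this section can be sketched in Python. The toy graph, the cost values, the weights (alpha = 0.7, beta = 0.3, gamma = 0.5), the undirected hop-count distance, and all function names are assumptions made for this sketch; they are not taken from the authors' implementation.</p>
      <preformat>
```python
from collections import deque

# Hypothetical toy graph: (source, relation, target) triples.
EDGES = [
    ("Machine Learning Fundamentals", "include", "Supervised Learning"),
    ("Linear Algebra", "is_prerequisite_for", "Supervised Learning"),
    ("Gradient Descent", "is_prerequisite_for", "Supervised Learning"),
    ("Linear Algebra", "is_prerequisite_for", "Gradient Descent"),
    ("Calculus", "is_prerequisite_for", "Gradient Descent"),
]

# Hypothetical (t_avg, RC) pairs per node, both scaled to the range 0..1.
COSTS = {
    "Machine Learning Fundamentals": (0.6, 0.4),
    "Linear Algebra": (0.5, 0.5),
    "Gradient Descent": (0.4, 0.6),
    "Calculus": (0.7, 0.5),
}


def include_subgraph(edges, target):
    """First formula: nodes joined to the target by an 'include' edge."""
    nodes = set()
    for src, rel, dst in edges:
        if rel == "include" and target in (src, dst):
            nodes.add(dst if src == target else src)
    return nodes


def build_adjacency(edges):
    """Undirected adjacency sets, used for distance and degree."""
    adj = {}
    for src, _rel, dst in edges:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)
    return adj


def shortest_distance(adj, start, goal):
    """Breadth-first search hop count; None if the goal is unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def association_score(adj, v, target, alpha=0.7, beta=0.3):
    """Second formula: alpha / Distance(v, target) + beta * Degree(v)."""
    dist = shortest_distance(adj, v, target)
    if not dist:  # unreachable (None) or v is the target itself (0)
        return 0.0
    return alpha / dist + beta * len(adj[v])


def learning_cost(t_avg, rc, gamma=0.5):
    """Third formula: gamma * t_avg + (1 - gamma) * RC(v)."""
    return gamma * t_avg + (1 - gamma) * rc


def rank_candidates(edges, target, costs, alpha=0.7, beta=0.3, gamma=0.5):
    """Inclusion nodes first, then higher score, then lower cost on ties."""
    adj = build_adjacency(edges)
    included = include_subgraph(edges, target)

    def key(v):
        t_avg, rc = costs[v]
        score = association_score(adj, v, target, alpha, beta)
        return (v not in included, -score, learning_cost(t_avg, rc, gamma))

    return sorted((v for v in adj if v != target), key=key)


ranking = rank_candidates(EDGES, "Supervised Learning", COSTS)
print(ranking)
```
</preformat>
      <p>In this toy graph the node linked to the target by an inclusion relation is ranked first and the remaining nodes follow in decreasing association score; the learning cost only affects the order when two scores are equal.</p>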
    </sec>
    <sec id="S5">
      <label>5.</label>
      <title>Experiments</title>
      <p id="S5.p1">In this paper, more than four thousand LaTeX mathematical formulas, which together contain more than 10,000 entities, are manually extracted and annotated from an operations research textbook of about 100,000 words. Five entity categories are defined: function operator, unary logical operator, binary relation operator, delimiter, and special operator. Table <xref rid="T1" ref-type="table">1</xref> gives the description and examples of each entity type.</p>
      <p>
        <table-wrap id="T1">
          <label>Table 1</label>
          <caption>
            <p>Representation and examples of entities.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th style="border-top: 1px solid black;" align="left">Type of entity</th>
                <th style="border-top: 1px solid black;" align="left">Notation</th>
                <th style="border-top: 1px solid black;" align="left">Description</th>
                <th style="border-top: 1px solid black;" align="left">Example</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td style="border-top: 1px solid black;" align="left">Function operator</td>
                <td style="border-top: 1px solid black;" align="left">FUNC</td>
                <td style="border-top: 1px solid black;" align="left">Minimum, maximum</td>
                <td style="border-top: 1px solid black;" align="left">\min, \max, …</td>
              </tr>
              <tr>
                <td align="left">Unary logical operator</td>
                <td align="left">ULOP</td>
                <td align="left">Membership, less than or equal to, greater than or equal to, …</td>
                <td align="left">\in, \leq, \geq, …</td>
              </tr>
              <tr>
                <td align="left">Binary relation operator</td>
                <td align="left">BROP</td>
                <td align="left">Intersection, union …</td>
                <td align="left">\land, \vee, …</td>
              </tr>
              <tr>
                <td align="left">Delimiter</td>
                <td align="left">DELI</td>
                <td align="left">Array…</td>
                <td align="left">\begin{pmatrix}, …</td>
              </tr>
              <tr>
                <td style="border-bottom: 1px solid black;" align="left">Special operator</td>
                <td style="border-bottom: 1px solid black;" align="left">SPEC</td>
                <td style="border-bottom: 1px solid black;" align="left">Root sign, fraction</td>
                <td style="border-bottom: 1px solid black;" align="left">\sqrt, \frac, …</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
      </p>
      <sec id="S5.SS1">
        <label>5.1</label>
        <title>Labeling standards</title>
        <p id="S5.SS1.p1">Two annotation schemes are commonly used in named entity recognition: BIO and BIOES. In the BIO scheme, B (begin) marks the first character of an entity, I (inside) marks each subsequent character of the same entity, and O (outside) marks a character that does not belong to any entity. The BIOES scheme keeps B, I, and O and adds E (end) for the last character of an entity and S (single) for a character that forms an entity by itself. This article uses the BIO scheme.</p>
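        <p>The BIO scheme can be illustrated with a short character-level sketch. The lexicon below is built only from the examples listed in Table 1, and the lookup-based tagging is an assumption used to show the label format; the dataset in this paper was annotated manually, and the function name is hypothetical.</p>
        <preformat>
```python
import re

# Illustrative lexicon from the examples in Table 1; not exhaustive.
LEXICON = {
    "\\min": "FUNC", "\\max": "FUNC",
    "\\in": "ULOP", "\\leq": "ULOP", "\\geq": "ULOP",
    "\\land": "BROP", "\\vee": "BROP",
    "\\sqrt": "SPEC", "\\frac": "SPEC",
}


def bio_tag_formula(formula):
    """Return one BIO tag per character of a LaTeX formula string."""
    tags = ["O"] * len(formula)
    # A LaTeX command is a backslash followed by letters.
    for match in re.finditer(r"\\[A-Za-z]+", formula):
        etype = LEXICON.get(match.group())
        if etype is None:
            continue  # command outside the lexicon stays tagged O
        start, end = match.span()
        tags[start] = "B-" + etype          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + etype          # remaining characters
    return tags


formula = r"\min x \leq y"
tags = bio_tag_formula(formula)
for char, tag in zip(formula, tags):
    print(repr(char), tag)
```
</preformat>
        <p>Each character of a recognized command receives a B or I label carrying the entity type, and every other character, including spaces and variables, is labeled O.</p>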
      </sec>
      <sec id="S5.SS2">
        <label>5.2</label>
        <title>Assessment indicators</title>
        <p id="S5.SS2.p1">Three indicators, precision (P), recall (R), and the F1 score (F1), are used in this research to assess the model. P is the proportion of samples predicted as positive that are truly positive, and R is the proportion of truly positive samples that are predicted correctly. Ideally both precision and recall would be high, but in practice the two often trade off against each other. To give a more comprehensive assessment of the model, the harmonic mean of precision and recall, F1, is introduced. P, R, and F1 are calculated as follows:</p>
        <p>
          <disp-formula id="S5.E1">
            <mml:math alttext="P=\frac{T_{p}}{T_{p}+F_{p}}\times 100\%" display="block">
              <mml:mrow>
                <mml:mi>P</mml:mi>
                <mml:mo>=</mml:mo>
                <mml:mrow>
                  <mml:mfrac>
                    <mml:msub>
                      <mml:mi>T</mml:mi>
                      <mml:mi>p</mml:mi>
                    </mml:msub>
                    <mml:mrow>
                      <mml:msub>
                        <mml:mi>T</mml:mi>
                        <mml:mi>p</mml:mi>
                      </mml:msub>
                      <mml:mo>+</mml:mo>
                      <mml:msub>
                        <mml:mi>F</mml:mi>
                        <mml:mi>p</mml:mi>
                      </mml:msub>
                    </mml:mrow>
                  </mml:mfrac>
                  <mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo>
                  <mml:mrow>
                    <mml:mn>100</mml:mn>
                    <mml:mo>%</mml:mo>
                  </mml:mrow>
                </mml:mrow>
              </mml:mrow>
            </mml:math>
          </disp-formula>
        </p>
        <p>
          <disp-formula id="S5.E2">
            <mml:math alttext="R=\frac{T_{p}}{T_{p}+F_{N}}\times 100\%" display="block">
              <mml:mrow>
                <mml:mi>R</mml:mi>
                <mml:mo>=</mml:mo>
                <mml:mrow>
                  <mml:mfrac>
                    <mml:msub>
                      <mml:mi>T</mml:mi>
                      <mml:mi>p</mml:mi>
                    </mml:msub>
                    <mml:mrow>
                      <mml:msub>
                        <mml:mi>T</mml:mi>
                        <mml:mi>p</mml:mi>
                      </mml:msub>
                      <mml:mo>+</mml:mo>
                      <mml:msub>
                        <mml:mi>F</mml:mi>
                        <mml:mi>N</mml:mi>
                      </mml:msub>
                    </mml:mrow>
                  </mml:mfrac>
                  <mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo>
                  <mml:mrow>
                    <mml:mn>100</mml:mn>
                    <mml:mo>%</mml:mo>
                  </mml:mrow>
                </mml:mrow>
              </mml:mrow>
            </mml:math>
          </disp-formula>
        </p>
        <p>
          <disp-formula id="S5.E3">
            <mml:math alttext="F1=2\times\frac{P\times R}{P+R}\times 100\%" display="block">
              <mml:mrow>
                <mml:mrow>
                  <mml:mi>F</mml:mi>
                  <mml:mo>⁢</mml:mo>
                  <mml:mn>1</mml:mn>
                </mml:mrow>
                <mml:mo>=</mml:mo>
                <mml:mrow>
                  <mml:mn>2</mml:mn>
                  <mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo>
                  <mml:mfrac>
                    <mml:mrow>
                      <mml:mi>P</mml:mi>
                      <mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo>
                      <mml:mi>R</mml:mi>
                    </mml:mrow>
                    <mml:mrow>
                      <mml:mi>P</mml:mi>
                      <mml:mo>+</mml:mo>
                      <mml:mi>R</mml:mi>
                    </mml:mrow>
                  </mml:mfrac>
                  <mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo>
                  <mml:mrow>
                    <mml:mn>100</mml:mn>
                    <mml:mo>%</mml:mo>
                  </mml:mrow>
                </mml:mrow>
              </mml:mrow>
            </mml:math>
          </disp-formula>
        </p>
        <p id="S5.SS2.p2">where <inline-formula><mml:math alttext="T_{p}" display="inline"><mml:msub><mml:mi>T</mml:mi><mml:mi>p</mml:mi></mml:msub></mml:math></inline-formula> is the number of entities that the model identified correctly, <inline-formula><mml:math alttext="F_{p}" display="inline"><mml:msub><mml:mi>F</mml:mi><mml:mi>p</mml:mi></mml:msub></mml:math></inline-formula> is the number of spurious entities that the model identified, and <inline-formula><mml:math alttext="F_{N}" display="inline"><mml:msub><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:msub></mml:math></inline-formula> is the number of true entities that the model failed to recognize.</p>
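        <p>A minimal sketch of how these three indicators can be computed at the entity level from BIO label sequences is given below. Exact span-and-type matching is assumed as the criterion for a correct entity, and the function names and the example sequences are hypothetical; the paper does not state which evaluation script was used.</p>
        <preformat>
```python
def extract_entities(tags):
    """Collect (start, end, type) spans from a BIO sequence (end exclusive)."""
    entities = set()
    start, etype = None, None
    # The trailing "O" sentinel closes an entity that ends the sequence.
    for i, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and etype == tag[2:]
        if not inside:
            if etype is not None:
                entities.add((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
            # An I- tag with no matching open entity is treated as outside.
    return entities


def precision_recall_f1(gold_tags, pred_tags):
    """Return P, R and F1 in percent, counting exact entity matches."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(pred_tags)
    tp = len(gold.intersection(pred))  # correctly identified entities
    fp = len(pred) - tp                # predicted but not in the gold set
    fn = len(gold) - tp                # gold entities that were missed
    p = tp / (tp + fp) if pred else 0.0
    r = tp / (tp + fn) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return 100 * p, 100 * r, 100 * f1


gold = ["B-FUNC", "I-FUNC", "O", "B-ULOP", "I-ULOP", "O", "B-SPEC"]
pred = ["B-FUNC", "I-FUNC", "O", "B-ULOP", "O", "O", "O"]
p, r, f1 = precision_recall_f1(gold, pred)
print(round(p, 2), round(r, 2), round(f1, 2))
```
</preformat>
        <p>In this example one of the two predicted entities matches a gold entity exactly, one predicted span is too short, and one gold entity is missed, so precision is higher than recall.</p>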
      </sec>
      <sec id="S5.SS3">
        <label>5.3</label>
        <title>Experimental environment and model parameter settings</title>
        <p id="S5.SS3.p1">The neural network models are built with the PyTorch framework, and the detailed configuration of the experimental environment is shown in Table <xref rid="T2" ref-type="table">2</xref>.</p>
        <p>
          <table-wrap id="T2">
            <label>Table 2</label>
            <caption>
              <p>Experiment environment.</p>
            </caption>
            <table>
              <tbody>
                <tr>
                  <td style="border-top: 1px solid black;" colspan="2" align="left">Environment</td>
                  <td style="border-top: 1px solid black;" align="left">Configuration</td>
                </tr>
                <tr>
                  <td style="border-top: 1px solid black;" rowspan="4" align="left">Hardware</td>
                  <td style="border-top: 1px solid black;" align="left">OS</td>
                  <td style="border-top: 1px solid black;" align="left">Windows 11</td>
                </tr>
                <tr>
                  <td align="left">CPU</td>
                  <td align="left">Intel(R) Core(TM) i7-12700H 2.30 GHz</td>
                </tr>
                <tr>
                  <td align="left">GPU</td>
                  <td align="left">GeForce RTX 3060</td>
                </tr>
                <tr>
                  <td align="left">Memory</td>
                  <td align="left">32 GB</td>
                </tr>
                <tr>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" rowspan="2" align="left">Software</td>
                  <td style="border-top: 1px solid black;" align="left">Python</td>
                  <td style="border-top: 1px solid black;" align="left">Python 3.6</td>
                </tr>
                <tr>
                  <td style="border-bottom: 1px solid black;" align="left">PyTorch</td>
                  <td style="border-bottom: 1px solid black;" align="left">torch 1.9.1</td>
                </tr>
              </tbody>
            </table>
          </table-wrap>
        </p>
        <p id="S5.SS3.p2">The main training parameters used in the experiment are shown in Table <xref rid="T3" ref-type="table">3</xref>. The BERT model uses 12 Transformer layers, a hidden size of 768, and 12 attention heads. Training uses the AdamW optimizer with a learning rate of 0.00002, an LSTM hidden size of 384, a batch size of 4, a dropout rate of 0.4, a gradient clipping threshold of 1, a weight decay of 0.1, a maximum sequence length of 512, and 50 epochs.</p>
        <p>
          <table-wrap id="T3">
            <label>Table 3</label>
            <caption>
              <p>Model parameters.</p>
            </caption>
            <table>
              <thead>
                <tr>
                  <th style="border-top: 1px solid black;" align="center">  Parameter</th>
                  <th style="border-top: 1px solid black;" align="center">  Value</th>
                </tr>
              </thead>
              <tbody>
                <tr>
                  <th style="border-top: 1px solid black;" align="center">  Transformer layers</th>
                  <td style="border-top: 1px solid black;" align="center">  12</td>
                </tr>
                <tr>
                  <th align="center">  Hidden_size</th>
                  <td align="center">  768</td>
                </tr>
                <tr>
                  <th align="center">  Number of heads</th>
                  <td align="center">  12</td>
                </tr>
                <tr>
                  <th align="center">  Optimizer</th>
                  <td align="center">  AdamW</td>
                </tr>
                <tr>
                  <th align="center">  Learning_rate</th>
                  <td align="center">  0.00002</td>
                </tr>
                <tr>
                  <th align="center">  Lstm_size</th>
                  <td align="center">  384</td>
                </tr>
                <tr>
                  <th align="center">  Drop_out</th>
                  <td align="center">  0.4</td>
                </tr>
                <tr>
                  <th align="center">  Gradient_clip</th>
                  <td align="center">  1</td>
                </tr>
                <tr>
                  <th align="center">  Weight_decay</th>
                  <td align="center">  0.1</td>
                </tr>
                <tr>
                  <th style="border-bottom: 1px solid black;" align="center">  Batch_size</th>
                  <td style="border-bottom: 1px solid black;" align="center">  4</td>
                </tr>
              </tbody>
            </table>
          </table-wrap>
        </p>
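        <p>For reproducibility, the settings in Table 3 and the accompanying text can be gathered into a single configuration object, as in the sketch below. The class name and field names are hypothetical, and the dataclasses module requires Python 3.7 or later, which is newer than the Python 3.6 listed in Table 2.</p>
        <preformat>
```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the parameters reported in Table 3."""
    transformer_layers: int = 12
    hidden_size: int = 768
    num_heads: int = 12
    optimizer: str = "AdamW"
    learning_rate: float = 2e-5
    lstm_size: int = 384
    dropout: float = 0.4
    gradient_clip: float = 1.0
    weight_decay: float = 0.1
    batch_size: int = 4
    max_len: int = 512   # stated in the text, not listed in Table 3
    epochs: int = 50     # stated in the text, not listed in Table 3


config = TrainConfig()
# Per-head dimension implied by the hidden size and the number of heads.
head_dim = config.hidden_size // config.num_heads
print(asdict(config))
print(head_dim)
```
</preformat>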
      </sec>
      <sec id="S5.SS4">
        <label>5.4</label>
        <title>Experimental results and analysis</title>
        <p id="S5.SS4.p1">To evaluate the proposed model (BERT-formula) on entity recognition of mathematical formulas in operations research, it was compared with three baseline models, BiLSTM, BiLSTM-CRF, and BERT-BiLSTM-CRF, in the same experimental environment on the three indicators defined above: precision, recall, and F1. The results of the experiments are shown in Table <xref rid="T4" ref-type="table">4</xref>.</p>
        <p>
          <table-wrap id="T4">
            <label>Table 4</label>
            <caption>
              <p>Comparison of named entity recognition results of different models.</p>
            </caption>
            <table>
              <thead>
                <tr>
                  <th style="border-top: 1px solid black;" align="center">Methods</th>
                  <th style="border-top: 1px solid black;" align="center">Precision</th>
                  <th style="border-top: 1px solid black;" align="center">Recall</th>
                  <th style="border-top: 1px solid black;" align="center">F1</th>
                </tr>
              </thead>
              <tbody>
                <tr>
                  <td style="border-top: 1px solid black;" align="center">BiLSTM</td>
                  <td style="border-top: 1px solid black;" align="center">36.93</td>
                  <td style="border-top: 1px solid black;" align="center">37.60</td>
                  <td style="border-top: 1px solid black;" align="center">37.26</td>
                </tr>
                <tr>
                  <td align="center">BiLSTM-CRF</td>
                  <td align="center">70.41</td>
                  <td align="center">74.86</td>
                  <td align="center">72.57</td>
                </tr>
                <tr>
                  <td align="center">BERT-BiLSTM-CRF</td>
                  <td align="center">97.37</td>
                  <td align="center">98.28</td>
                  <td align="center">97.82</td>
                </tr>
                <tr>
                  <td style="border-bottom: 1px solid black;" align="center">
                    <bold>Ours</bold>
                  </td>
                  <td style="border-bottom: 1px solid black;" align="center">
                    <bold>98.88</bold>
                  </td>
                  <td style="border-bottom: 1px solid black;" align="center">
                    <bold>98.88</bold>
                  </td>
                  <td style="border-bottom: 1px solid black;" align="center">
                    <bold>98.87</bold>
                  </td>
                </tr>
              </tbody>
            </table>
          </table-wrap>
        </p>
        <p id="S5.SS4.p2">Table <xref rid="T4" ref-type="table">4</xref> compares the models on entity recognition of operations research formulas. BiLSTM alone performs poorly (F1 of 37.26), while combining it with a CRF layer improves recognition substantially (F1 of 72.57). The BERT-BiLSTM-CRF model improves further (F1 of 97.82) because BERT can extract both local and contextual features, which produces feature vectors with a more accurate representation. The proposed model obtains the highest values on all three indicators (F1 of 98.87).</p>
      </sec>
    </sec>
    <sec id="S6">
      <label>6.</label>
      <title>Conclusion</title>
      <p id="S6.p1">This research investigates named entity recognition of mathematical formulas in operations research texts and proposes a model for this task, BERT-formula. An embedding-based feature representation of the mathematical formula is concatenated with vector representations of randomly initialized instance queries, and the two are jointly encoded by BERT. One-way self-attention is then applied, which allows the queries to model connections with each other and enhances the query semantics. Features are subsequently extracted by a BiLSTM layer and a Transformer layer, and the final predicted labels are obtained by solving an assignment problem, that is, by finding the allocation with the minimum total cost under the cost matrix. The experimental findings indicate that, compared with conventional NER techniques, the approach presented in this study significantly improves inference speed and also achieves better recognition results. The identified formula entities can further support adaptive learning path recommendations by mapping domain-specific knowledge components to targeted educational resources. The named entity recognition of operations research formulas achieved in this research provides a basis for downstream NLP tasks in the field. The instance queries do not rely on external knowledge; instead, they acquire the query semantics associated with entity type and location during training, which saves considerable manual effort. Because the method does not depend on knowledge of a specific domain, it can also be readily applied to other domains. In addition, to address the scarcity of labeled mathematical formula datasets in operations research, a dataset of operations research LaTeX formulas containing nearly five thousand samples is constructed, which is expected to contribute to further work in the field. In future work, the dataset will be expanded to further verify the transferability of the model and to apply the model to other fields.</p>
    </sec>
  </body>
  <back>
    <ack>
      <title>Acknowledgments</title>
      <p id="ack.p1">This work was supported in part by the Project of Construction and Support for high-level Innovative Teams of Beijing Municipal Institutions under Grant BPHR20220104; in part by the Beijing Scholars Program under Grant 099; in part by the IFLYTEK University Intelligent Teaching Innovation Research Special Project under Grant 2022XF055.</p>
    </ack>
    <sec id="sec0100" sec-type="COI-statement">
      <title>Conflict of interest</title>
      <p>The authors declare no conflicts of interest.</p>
    </sec>
    <ref-list>
      <title>References</title>
      <ref id="ref001">
        <label>[1]</label>
        <mixed-citation> Yu, H. K., Zhang, H. P., Liu, Q., Lu, X. Q., &amp; Shi, S. C. (2006). Chinese named entity identification using cascaded hidden Markov model. Tongxin Xuebao/Journal on Communications, 27(2), 87-94. </mixed-citation>
      </ref>
      <ref id="ref002">
        <label>[2]</label>
        <mixed-citation> Huang, H. W. (2009). <italic>SVM combined with error-driven learning for biological entity recognition</italic>. National University of Defense Technology. </mixed-citation>
      </ref>
      <ref id="ref003">
        <label>[3]</label>
        <mixed-citation> Feng, Y., Yu, H., Sun, G., &amp; Zhao, Y. (2016). Domain-specific term recognition method based on word embedding and conditional random field. <italic>Journal of Computer Applications, 36</italic>(11), 3146-3151. </mixed-citation>
      </ref>
      <ref id="ref004">
        <label>[4]</label>
        <mixed-citation> Hinton, G. E., &amp; Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. <italic>Science, 313</italic>(5786), 504-507. [<uri>https://doi.org/10.1126/science.1127647</uri>] </mixed-citation>
      </ref>
      <ref id="ref005">
        <label>[5]</label>
        <mixed-citation> LeCun, Y., Bottou, L., Bengio, Y., &amp; Haffner, P. (1998). Gradient-based learning applied to document recognition. <italic>Proceedings of the IEEE, 86</italic>(11), 2278-2324. [<uri>https://doi.org/10.1109/5.726791</uri>] </mixed-citation>
      </ref>
      <ref id="ref006">
        <label>[6]</label>
        <mixed-citation> Mikolov, T., Karafiát, M., Burget, L., Černocký, J., &amp; Khudanpur, S. (2010, September). Recurrent neural network based language model. In <italic>Interspeech</italic> (Vol. 2, No. 3, pp. 1045-1048). </mixed-citation>
      </ref>
      <ref id="ref007">
        <label>[7]</label>
        <mixed-citation> Hochreiter, S., &amp; Schmidhuber, J. (1997). Long short-term memory. <italic>Neural Computation, 9</italic>(8), 1735-1780. [<uri>https://doi.org/10.1162/neco.1997.9.8.1735</uri>] </mixed-citation>
      </ref>
      <ref id="ref008">
        <label>[8]</label>
        <mixed-citation> Devlin, J., Chang, M. W., Lee, K., &amp; Toutanova, K. (2019, June). BERT: Pre-training of deep bidirectional transformers for language understanding. In <italic>Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers)</italic> (pp. 4171-4186). [<uri>https://doi.org/10.18653/v1/N19-1423</uri>] </mixed-citation>
      </ref>
      <ref id="ref009">
        <label>[9]</label>
        <mixed-citation> Hammerton, J. (2003). Named entity recognition with long short-term memory. <italic>Proceedings of the 7th Conference on Natural Language Learning at HLT-NAACL</italic>, 172-175. </mixed-citation>
      </ref>
      <ref id="ref010">
        <label>[10]</label>
        <mixed-citation> Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., &amp; Dyer, C. (2016). Neural Architectures for Named Entity Recognition. In <italic>Proceedings of NAACL-HLT</italic> (pp. 260-270). [<uri>https://doi.org/10.48550/arXiv.1603.01360</uri>] </mixed-citation>
      </ref>
      <ref id="ref011">
        <label>[11]</label>
        <mixed-citation> Zhou, J. W., Wang, K., Wu, Y. L., et al. (2024). Research on Named Entity Recognition of Shen Nong's Materia Medica Based on BiLSTM-CRF. <italic>Journal of Chengdu University of Traditional Chinese Medicine, 47</italic>(03), 54-59. [<uri>https://doi.org/10.13593/j.cnki.51-1501/r.2024.03.59</uri>] </mixed-citation>
      </ref>
      <ref id="ref012">
        <label>[12]</label>
        <mixed-citation> Cheng, N., Li, B., Ge, S., Hao, X., &amp; Feng, M. (2020). A joint model of automatic sentence segmentation and lexical analysis for ancient Chinese based on BiLSTM-CRF model. <italic>Journal of Chinese Information Processing, 34</italic>(4), 1-9. </mixed-citation>
      </ref>
      <ref id="ref013">
        <label>[13]</label>
        <mixed-citation> Huang, Z. Y., Yu, Y. N., Lin, R. M., et al. (2024). Knowledge graph construction for network security base on modified BiLSTM-CRF. <italic>Modern Electronics Technique, 47</italic>(06), 15-21. [<uri>https://doi.org/10.16652/j.issn.1004-373x.2024.06.003</uri>] </mixed-citation>
      </ref>
      <ref id="ref014">
        <label>[14]</label>
        <mixed-citation> Zhou, L. L., Chen, L., Ji, F., et al. (2023). ERNIE-BiLSTM-CRF Model-based entity recognition study in soil fertility. <italic>Horticulture and Seed, 43</italic>(09), 97-101. [<uri>https://doi.org/10.16530/j.cnki.cn21-1574/s.2023.09.036</uri>] </mixed-citation>
      </ref>
      <ref id="ref015">
        <label>[15]</label>
        <mixed-citation> Li, J., Lyu, G., Li, R., et al. (2023). Chinese Negative Semantic Representation and Annotation Combined with Hybrid Attention Mechanism and BiLSTM-CRF. <italic>Computer Engineering and Applications, 59</italic>(09), 167-175. </mixed-citation>
      </ref>
      <ref id="ref016">
        <label>[16]</label>
        <mixed-citation> Strubell, E., Verga, P., Belanger, D., et al. (2017). Fast and accurate entity recognition with iterated dilated convolutions. <italic>Proceedings of EMNLP</italic>, 2670-2680. </mixed-citation>
      </ref>
      <ref id="ref017">
        <label>[17]</label>
        <mixed-citation> Chen, T. Y., &amp; Feng, S. (2022). Research on named entity recognition method and model stability of electronic medical record based on IDCNN+CRF and attention mechanism. <italic>China Digital Medicine, 17</italic>(11), 1-5. </mixed-citation>
      </ref>
      <ref id="ref018">
        <label>[18]</label>
        <mixed-citation> Peters, M. E., Neumann, M., Iyyer, M., et al. (2018). Deep contextualized word representations. <italic>Proceedings of NAACL-HLT</italic>, 2227-2237. </mixed-citation>
      </ref>
      <ref id="ref019">
        <label>[19]</label>
        <mixed-citation> Radford, A., Narasimhan, K., Salimans, T., &amp; Sutskever, I. (2018). Improving language understanding by generative pre-training. <italic>OpenAI preprint</italic>. </mixed-citation>
      </ref>
      <ref id="ref020">
        <label>[20]</label>
        <mixed-citation> Li, S., &amp; Pang, W. (2023). Joint Extraction Method of Entity and Relation in Maize Breeding Based on BERT-CRF and Word Embedding. <italic>Transactions of the Chinese Society of Agricultural Machinery</italic>, 1-16. [<uri>http://kns.cnki.net/kcms/detail/11.1964.S.20230919.1113.006.html</uri>] </mixed-citation>
      </ref>
      <ref id="ref021">
        <label>[21]</label>
        <mixed-citation> Zheng, X., Li, B., Feng, Z., et al. (2023). Entity Recognition of Network Sensitive Words and Variants Based on BERT-BiLSTM-CRF. <italic>Computer and Digital Engineering, 51</italic>(07), 1585-1589. </mixed-citation>
      </ref>
      <ref id="ref022">
        <label>[22]</label>
        <mixed-citation> Yu, X., &amp; Chang, E. (2023). Automatic Recognition of Place Names in Ancient Poetry Based on DA-BERT-CRF Models: Taking the Ancient Poetries of Nanjing as an Example. <italic>Library Journal, 42</italic>(10), 87-94+73. [<uri>https://doi.org/10.13663/j.cnki.lj.2023.10.010</uri>] </mixed-citation>
      </ref>
      <ref id="ref023">
        <label>[23]</label>
        <mixed-citation> Li, X., Feng, J., Meng, Y., Han, Q., Wu, F., &amp; Li, J. (2020). A unified MRC framework for named entity recognition. <italic>Proceedings of ACL</italic>, 5849-5859. </mixed-citation>
      </ref>
      <ref id="ref024">
        <label>[24]</label>
        <mixed-citation> Luo, X., Li, T., &amp; Jia, Z. (2024). Chinese medical named entity recognition based on self-attention mechanism and lexicon enhancement. <italic>Journal of Computer Applications, 44</italic>(2), 385-392. </mixed-citation>
      </ref>
      <ref id="ref025">
        <label>[25]</label>
        <mixed-citation> Xue, M., Yu, B., Zhang, Z., Liu, T., Zhang, Y., &amp; Wang, B. (2020). Coarse-to-Fine Pre-training for Named Entity Recognition. <italic>Proceedings of EMNLP</italic>, 6345-6354. </mixed-citation>
      </ref>
      <ref id="ref026">
        <label>[26]</label>
        <mixed-citation> Zheng, H., Qin, B., &amp; Xu, M. (2021, January). Chinese medical named entity recognition using CRF-MT-Adapt and NER-MRC. In <italic>2021 2nd International Conference on Computing and Data Science (CDS)</italic> (pp. 362-365). IEEE. [<uri>https://doi.org/10.1109/CDS52072.2021.00068</uri>] </mixed-citation>
      </ref>
      <ref id="ref027">
        <label>[27]</label>
        <mixed-citation> Shen, Y., Wang, X., Tan, Z., Xu, G., Xie, P., Huang, F., … &amp; Zhuang, Y. (2022). Parallel instance query network for named entity recognition. <italic>arXiv preprint arXiv:2203.10545</italic>. [<uri>https://doi.org/10.48550/arXiv.2203.10545</uri>] </mixed-citation>
      </ref>
      <ref id="ref028">
        <label>[28]</label>
        <mixed-citation> Burkard, R. E., &amp; Cela, E. (1999). Linear assignment problems and extensions. In <italic>Handbook of combinatorial optimization: Supplement volume A</italic> (pp. 75-149). Boston, MA: Springer US. [<uri>https://doi.org/10.1007/978-1-4757-3023-4_2</uri>] </mixed-citation>
      </ref>
      <ref id="ref029">
        <label>[29]</label>
        <mixed-citation> Al-Rfou, R., Choe, D., Constant, N., Guo, M., &amp; Jones, L. (2019, July). Character-level language modeling with deeper self-attention. In <italic>Proceedings of the AAAI conference on artificial intelligence</italic> (Vol. 33, No. 01, pp. 3159-3166). [<uri>https://doi.org/10.1609/aaai.v33i01.33013159</uri>] </mixed-citation>
      </ref>
      <ref id="ref030">
        <label>[30]</label>
        <mixed-citation> Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., &amp; Zagoruyko, S. (2020, August). End-to-end object detection with transformers. In <italic>European conference on computer vision</italic> (pp. 213-229). Cham: Springer International Publishing. [<uri>https://doi.org/10.1007/978-3-030-58452-8_13</uri>] </mixed-citation>
      </ref>
      <ref id="ref031">
        <label>[31]</label>
        <mixed-citation> Zhang, X., Liu, S., &amp; Wang, H. (2023). Personalized learning path recommendation for e-learning based on knowledge graph and graph convolutional network. <italic>International Journal of Software Engineering and Knowledge Engineering, 33</italic>(01), 109-131. [<uri>https://doi.org/10.1142/S0218194022500681</uri>] </mixed-citation>
      </ref>
      <ref id="ref032">
        <label>[32]</label>
        <mixed-citation> Zheng, Y., Wang, D., Zhang, J., Li, Y., Xu, Y., Zhao, Y., &amp; Zheng, Y. (2024). A unified framework for personalized learning pathway recommendation in e-learning contexts. <italic>Education and Information Technologies</italic>, 1-38. [<uri>https://doi.org/10.1007/s10639-024-13045-8</uri>] </mixed-citation>
      </ref>
      <ref id="ref033">
        <label>[33]</label>
        <mixed-citation> Shi, D., Wang, T., Xing, H., &amp; Xu, H. (2020). A learning path recommendation model based on a multidimensional knowledge graph framework for e-learning. <italic>Knowledge-Based Systems, 195</italic>, 105618. [<uri>https://doi.org/10.1016/j.knosys.2020.105618</uri>] </mixed-citation>
      </ref>
      <ref id="ref034">
        <label>[34]</label>
        <mixed-citation> Duan, S., Chen, K., Yang, Y., &amp; Shi, S. (2023, August). Research on personalized learning recommendation based on subject knowledge graphs and learner portraits. In <italic>International Conference on Computer Science and Educational Informatization</italic> (pp. 367-374). Singapore: Springer Nature Singapore. [<uri>https://doi.org/10.1007/978-981-99-9492-2_31</uri>] </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>
