<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD with MathML3 v1.1d2 20140930//EN" "JATS-journalpublishing1-mathml3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="1.1d2" xml:lang="en">
  <front>
    <journal-meta>
      <journal-id journal-id-type="nlm-ta">CJIF</journal-id>
      <journal-id journal-id-type="publisher-id">ICCK</journal-id>
      <journal-title-group>
        <journal-title>Chinese Journal of Information Fusion</journal-title>
      </journal-title-group>
      <issn pub-type="ppub" publication-format="print">2998-3363</issn>
      <issn pub-type="epub" publication-format="electronic">2998-3371</issn>
      <publisher>
        <publisher-name>Institute of Central Computation and Knowledge Inc</publisher-name>
        <publisher-loc>522 W RIVERSIDE AVE STE N, SPOKANE, WA, 99201, UNITED STATES</publisher-loc>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.62762/CJIF.2024.734267</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Research Article</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Unsupervised Industrial Anomaly Detection Based on Feature Mask Generation and Reverse Distillation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">https://orcid.org/0009-0007-9728-0481</contrib-id>
          <name>
            <surname>Qi</surname>
            <given-names>Pei</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-3960-8828</contrib-id>
          <name>
            <surname>Chai</surname>
            <given-names>Lin</given-names>
          </name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">https://orcid.org/0009-0001-0707-1317</contrib-id>
          <name>
            <surname>Ye</surname>
            <given-names>Xinyu</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1"><label>1</label>School of Automation, Southeast University, Nanjing 210000, China</aff>
        <aff id="aff2"><label>2</label>Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, Nanjing 210096, China</aff>
      </contrib-group>
      <author-notes>
        <corresp id="cor2">Corresponding Author: Lin Chai. Email: <email>chailin1@seu.edu.cn</email></corresp>
      </author-notes>
      <pub-date date-type="pub" pub-type="epub" publication-format="online">
        <day>30</day>
        <month>9</month>
        <year>2024</year>
      </pub-date>
      <volume>1</volume>
      <issue>2</issue>
      <fpage>160</fpage>
      <lpage>174</lpage>
      <history>
        <date date-type="received">
          <day>31</day>
          <month>7</month>
          <year>2024</year>
        </date>
        <date date-type="accepted">
          <day>23</day>
          <month>9</month>
          <year>2024</year>
        </date>
      </history>
      <permissions>
        <copyright-statement>© 2024 by the Authors. Published by Institute of Central Computation and Knowledge. This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/).</copyright-statement>
        <copyright-year>2024</copyright-year>
        <copyright-holder>The Authors</copyright-holder>
        <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
        </license>
      </permissions>
      <self-uri xlink:href="https://www.icck.org/article/abs/cjif.2024.734267">This article is available from https://www.icck.org/article/abs/cjif.2024.734267</self-uri>
      <abstract>
        <p>In industrial defect detection, unsupervised anomaly detection methods have drawn considerable attention because of their strong performance. Among these, knowledge distillation-based methods have emerged as a prominent research focus, favored for their streamlined architecture, precision, and efficiency. However, the difficulty of characterizing the variability of anomalous samples limits detection accuracy. To address this issue, we present an approach to anomaly detection and localization built on feature fusion through reverse knowledge distillation. We employ the encoder as the teacher model and the decoder as the student model, leveraging the structural disparity within the model fusion framework to mitigate the generalization challenge. Additionally, we integrate an attention-based feature fusion mechanism into the distillation process to concentrate on the precise extraction and reconstruction of image features, thereby preventing the loss of fine details. To further refine the feature fusion learning process, we develop a feature mask generation module that reduces the impact of spatial redundancy in the teacher's features, thereby improving the acquisition and fusion of pivotal information. Experiments on the MVTec AD dataset show that the proposed method outperforms prevalent methods in both detecting and localizing anomalies across 15 categories, achieving a detection AUROC, localization AUROC, and localization PRO of 99.1%, 98.5%, and 95.9%, respectively. Ablation studies confirm the contribution of each component of the model and support the efficacy of our feature fusion approach.</p>
      </abstract>
      <kwd-group kwd-group-type="author" xml:lang="en">
        <kwd>unsupervised learning</kwd>
        <kwd>feature fusion</kwd>
        <kwd>anomaly detection</kwd>
        <kwd>knowledge distillation</kwd>
        <kwd>attention mechanism</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="S1">
      <label>1.</label>
      <title>Introduction</title>
      <p id="S1.p1">In industrial vision systems, anomaly detection plays a pivotal role in guaranteeing product quality and maintaining the stability of manufacturing processes [<xref rid="ref002" ref-type="bibr">2</xref>]. Traditional manual inspection struggles to meet the demands of efficient quality control in modern intelligent manufacturing environments because of its high cost and limited throughput. Advances in computer vision and deep learning have brought deep learning-based anomaly detection methods to the forefront, with applications spanning medical image diagnosis [<xref rid="ref003" ref-type="bibr">3</xref>], industrial quality inspection [<xref rid="ref004" ref-type="bibr">4</xref>, <xref rid="ref005" ref-type="bibr">5</xref>], and video surveillance [<xref rid="ref006" ref-type="bibr">6</xref>].</p>
      <p id="S1.p2">In surface defect detection in industrial scenarios, the costly and time-consuming acquisition of anomalous samples limits the use of supervised anomaly detection [<xref rid="ref007" ref-type="bibr">7</xref>, <xref rid="ref008" ref-type="bibr">8</xref>]. In practice, normal samples dominate the dataset while abnormal samples are scarce or missing entirely, so most anomaly detection methods are unsupervised or semi-supervised [<xref rid="ref009" ref-type="bibr">9</xref>, <xref rid="ref010" ref-type="bibr">10</xref>]. This severe imbalance between normal and abnormal data makes supervised methods impractical for anomaly detection. Unsupervised anomaly detection methods are generally classified into autoencoder-based [<xref rid="ref013" ref-type="bibr">13</xref>, <xref rid="ref014" ref-type="bibr">14</xref>, <xref rid="ref015" ref-type="bibr">15</xref>], generative adversarial network-based [<xref rid="ref016" ref-type="bibr">16</xref>, <xref rid="ref017" ref-type="bibr">17</xref>, <xref rid="ref018" ref-type="bibr">18</xref>], and teacher-student (T-S) model-based [<xref rid="ref019" ref-type="bibr">19</xref>, <xref rid="ref020" ref-type="bibr">20</xref>] approaches. Among these, knowledge distillation-based methods have shown promise in unsupervised anomaly detection, attributed to their knowledge transfer and learning guidance capabilities. However, existing knowledge distillation methods train primarily on normal data and detect anomalies by identifying representational discrepancies between teacher and student models; when the student generalizes too well to unseen anomalies, these discrepancies shrink and sensitivity to anomalous samples drops.</p>
      <p id="S1.p3">To address the limitations of current knowledge distillation methods, such as low anomaly sensitivity and imprecise localization, this paper introduces an unsupervised industrial anomaly detection method that integrates feature mask generation and reverse distillation. This approach diverges from conventional T-S models by passing the original image only through the teacher model to capture feature representations. The student model then takes the latent representation and reconstructs the image features at the original scale. Specifically, images are first processed by a pre-trained encoder, which serves as the teacher model, to extract features. These features are then passed through an attention-based bottleneck module, which integrates local information via a multi-scale feature fusion module to produce a more compact encoding. This encoding is subsequently passed to the student model, in conjunction with a feature mask module, to restore the original feature representation. The loss function is built on the cosine similarity between the encoder's output features at each layer of the teacher model and the corresponding decoder output features of the student model.</p>
      <p id="S1.p4">The main contributions delineated in this paper are as follows:</p>
      <p>
        <list list-type="order" id="S1.I1">
          <list-item id="S1.I1.i1">
            <p id="S1.I1.i1.p1">We introduce a masked reverse knowledge distillation technique aimed at augmenting the structural diversity within the T-S model framework, thereby bolstering its capabilities for anomaly detection and precise localization. This approach effectively addresses the pervasive issue of overgeneralization encountered in image-based anomaly detection systems, ensuring enhanced performance and robustness.</p>
          </list-item>
          <list-item id="S1.I1.i2">
            <p id="S1.I1.i2.p1">An attention mechanism is introduced to leverage the high-quality features derived from normal data training, assisting the student model in reconstructing multi-scale normal modal information.</p>
          </list-item>
          <list-item id="S1.I1.i3">
            <p id="S1.I1.i3.p1">The development of a feature mask generation module (FMM) refines the pixel representation by emphasizing feature pixels that encapsulate information about neighboring pixels, thereby enhancing the performance of the feature-based distillation approach.</p>
          </list-item>
        </list>
      </p>
      <p id="S1.p6">The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 details the proposed method. Section 4 presents experimental results on the publicly available MVTec AD dataset, together with ablation studies that verify the effectiveness of the proposed approach. Section 5 concludes the paper and discusses directions for future research.</p>
    </sec>
    <sec id="S2">
      <label>2.</label>
      <title>Related Work</title>
      <sec id="S2.SS1">
        <label>2.1</label>
        <title>Reconstruction-Based Methods</title>
        <p id="S2.SS1.p1">At present, the primary detection approaches based on image reconstruction include the Auto-Encoder (AE), the Variational Auto-Encoder (VAE) [<xref rid="ref011" ref-type="bibr">11</xref>], and the Generative Adversarial Network (GAN) [<xref rid="ref012" ref-type="bibr">12</xref>]. Autoencoder-based methods use the reconstruction error to identify anomalous samples. Kwon et al. [<xref rid="ref013" ref-type="bibr">13</xref>] employ cosine similarity to measure the angular deviation of gradient vectors among normal samples, establishing consistency constraints for gradient vector directions to detect anomalies. Chu et al. [<xref rid="ref014" ref-type="bibr">14</xref>] analyze the loss function's change curve to identify anomalous images within unlabeled datasets. Kim et al. [<xref rid="ref015" ref-type="bibr">15</xref>] reduce the autoencoder's false positive rate by comparing the complexity of shapes, sizes, and colors in the learned samples.</p>
        <p id="S2.SS1.p2">These methods mainly rely on reconstruction error to identify abnormal samples. However, when the distribution of training samples is diversified, the reconstruction accuracy may be affected due to the lack of feature-level discriminant information, resulting in an increase of false detections and missed detections. Especially for complex anomaly patterns, relying solely on reconstruction errors may not adequately capture their properties.</p>
        <p id="S2.SS1.p3">GAN-based methods capitalize on the capability of GANs to generate realistic images, offering clearer reconstructions than autoencoders for anomaly detection. Schlegl et al. [<xref rid="ref016" ref-type="bibr">16</xref>] introduce AnoGAN, which utilizes backward iterative propagation to pinpoint potential anomaly indicators. Akcay et al. [<xref rid="ref017" ref-type="bibr">17</xref>] propose GANomaly, integrating an encoder to refine the traditional reconstruction error through multi-angle constraints, resulting in closer alignment of reconstructed images to originals. Schlegl et al. [<xref rid="ref018" ref-type="bibr">18</xref>] further propose f-AnoGAN, replacing the AE decoder with a trained GAN generator for more direct image reconstruction that leverages the GAN's generative capability.</p>
        <p id="S2.SS1.p4">While GAN-based methods are able to generate more realistic images and thus provide a sharper reconstruction than autoencoders, they are similarly limited by a single evaluation criterion for reconstruction error. These methods do not make full use of the rich information contained in large-scale datasets, and only judging anomalies by reconstruction error may ignore the deep relationship and high-order features between data.</p>
      </sec>
      <sec id="S2.SS2">
        <label>2.2</label>
        <title>Knowledge Distillation</title>
        <p id="S2.SS2.p1">Knowledge distillation builds on the interaction between a pre-trained teacher network and a student network that undergoes training. The teacher network serves as a feature extractor, while the student network is tasked with reconstructing only the normal data, guided by the teacher. When anomalies are input, the student network, not trained on anomalous samples, produces features that differ significantly from the teacher's, leading to weaker anomaly reconstruction and enabling anomaly detection based on feature map differences.</p>
        <p>
          <fig id="F1">
            <label>Figure 1.</label>
            <caption>
              <p>(a) Knowledge distillation architecture. (b) Reverse distillation architecture.</p>
            </caption>
            <graphic xlink:href="fig1.jpg"/>
          </fig>
        </p>
        <p id="S2.SS2.p2">Bergmann et al. [<xref rid="ref008" ref-type="bibr">8</xref>] apply knowledge distillation to unsupervised anomaly detection, employing a pre-trained teacher model to guide the training of a student model on a dataset comprising only normal data. This approach aims to maximize the consistency between the embedding outputs of the two models, thereby improving anomaly detection. Salehi et al. [<xref rid="ref019" ref-type="bibr">19</xref>] propose the MKD method to leverage the generalization ability of the middle-layer semantic features. However, in knowledge distillation the student model often has the same or a similar architecture to the teacher model, and the data flow is the same in both, which can lead to similar representations of anomalous data in both models. To address this issue, Deng et al. [<xref rid="ref020" ref-type="bibr">20</xref>] propose the RD architecture, featuring a heterogeneous teacher encoder and student decoder structure. The configurations of standard knowledge distillation and reverse distillation are shown in Figure <xref ref-type="fig" rid="F1">1</xref>.</p>
        <p id="S2.SS2.p3">Although the interaction between the teacher and student networks can improve anomaly detection, existing knowledge distillation methods still have shortcomings. First, the student network tends to imitate the architecture and representations of the teacher network, so the two may produce similar representations of abnormal data, which reduces sensitivity to anomalies. Second, in the traditional knowledge distillation process the data flows along a single path, which may limit the ability to extract effective features from abnormal data. In addition, when dealing with complex or subtle anomalies, existing knowledge distillation methods may not fully capture the unique characteristics of these anomalies, resulting in poor detection results.</p>
        <p id="S2.SS2.p4">To overcome these shortcomings, this paper integrates the SimAM and SCConv modules into the knowledge distillation framework. The SimAM module enhances the sensitivity of the model to abnormal data, while the SCConv module helps to capture richer contextual information. In addition, the FMM is introduced to further improve the network's ability to understand the image context. These improvements allow the model to distinguish normal samples from abnormal samples more accurately, thereby improving anomaly detection performance.</p>
        <p id="S2.SS2.p5">In summary, the improvements in this paper not only solve the limitations of existing methods in anomaly detection, but also improve the detection performance and robustness of the model by introducing new modules and mechanisms.</p>
      </sec>
    </sec>
    <sec id="S3">
      <label>3.</label>
      <title>Methodology</title>
      <sec id="S3.SS1">
        <label>3.1</label>
        <title>Overall Architecture</title>
        <p id="S3.SS1.p1">The overall framework of the proposed approach, depicted in Figure <xref ref-type="fig" rid="F2">2</xref>, consists of the reverse knowledge distillation backbone, a bottleneck module, and the feature mask generation module (FMM). Anomaly detection in this model is built on reverse knowledge distillation with an encoder-decoder structure. The multi-scale feature fusion module (MSFF) processes the input feature maps layer by layer and outputs semantically richer feature maps through hierarchical fusion. The bottleneck unit combines Spatial and Channel Reconstruction Convolution blocks (SCConv) with a Simple Attention Module (SimAM); together these form the SCSAtt module, which enables the model to capture cross-channel information and positional encoding information simultaneously. In the FMM, the model strengthens its anomaly detection capability by randomly masking pixel-level features and then employing a simple generative module to restore the resulting synthesized feature-level anomalies. These designs improve the model's accuracy and efficiency from multiple perspectives. The following sections describe the role and function of each module in detail.</p>
        <p>
          <fig id="F2">
            <label>Figure 2.</label>
            <caption>
              <p>The overall framework of the proposed method.</p>
            </caption>
            <graphic xlink:href="fig2.jpg"/>
          </fig>
        </p>
      </sec>
      <sec id="S3.SS2">
        <label>3.2</label>
        <title>Reverse Distillation Model</title>
        <p id="S3.SS2.p1">In conventional knowledge distillation frameworks, the student network often mirrors the structure of the teacher model, either closely resembling or being identical to it, and processes raw imagery or data as its primary input. However, when anomalies occur in the model's operation, maintaining congruity in the T-S network architecture and data propagation pathways can potentially introduce data confusion during the knowledge transfer process, leading to the disappearance of activation differences and thus undermining the effectiveness of anomaly detection mechanisms. Although this issue can be mitigated by simplifying the network structure, it concurrently diminishes the model's precision in both identifying and pinpointing the targeted entities.</p>
        <p id="S3.SS2.p2">In an effort to confront the aforementioned obstacles, our study presents a novel reverse knowledge distillation framework, which leverages an encoder-decoder architecture to facilitate the transfer of knowledge from the teacher's deeper hierarchical levels towards the earlier layers of the student model. In this model, the teacher model processes the image to learn feature representations, while the student model is tasked with recovering these representations. To enhance single-category refinement, a bottleneck module is designed to bridge the teacher and student models.</p>
        <p id="S3.SS2.p3">For training, considering the proficiency of ResNet [<xref rid="ref021" ref-type="bibr">21</xref>] and WideResNet [<xref rid="ref022" ref-type="bibr">22</xref>] architectures in extracting intricate features from image data, we adopt a WideResNet50 model pre-trained on ImageNet [<xref rid="ref023" ref-type="bibr">23</xref>] as the teacher encoder to extract comprehensive semantic information, and we freeze the teacher's parameters so that it remains effective at extracting anomaly features during inference. The input image <inline-formula><mml:math alttext="x_{n}" display="inline"><mml:msub><mml:mi>x</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:math></inline-formula> is passed through the teacher network, yielding a set of features <inline-formula><mml:math alttext="F_{E}^{i}" display="inline"><mml:msubsup><mml:mi>F</mml:mi><mml:mi>E</mml:mi><mml:mi>i</mml:mi></mml:msubsup></mml:math></inline-formula>, where the superscript <inline-formula><mml:math alttext="i" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula> denotes the <inline-formula><mml:math alttext="i" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>-th block of the teacher network. These features are then fed into the bottleneck module to generate a compact feature representation <inline-formula><mml:math alttext="F_{B}" display="inline"><mml:msub><mml:mi>F</mml:mi><mml:mi>B</mml:mi></mml:msub></mml:math></inline-formula>. To align with the feature representations of the teacher encoder, the student network adopts a decoder architecture that mirrors the structure of the teacher network, giving a symmetrical configuration. The student decodes the bottleneck output into features <inline-formula><mml:math alttext="F_{D}^{i}" display="inline"><mml:msubsup><mml:mi>F</mml:mi><mml:mi>D</mml:mi><mml:mi>i</mml:mi></mml:msubsup></mml:math></inline-formula> that replicate the corresponding teacher features, and it learns the representation of normal samples by comparing its outputs with the teacher's.</p>
        <p>
          <disp-formula-group id="S5.EGx1">
            <disp-formula id="S3.E1">
              <mml:math alttext="\displaystyle\begin{cases}F_{E}^{i}=\mathrm{Encoder}(x_{n})\\&#10;F_{B}=\mathrm{OCE}(\mathrm{SCAM}(\mathrm{MFF}(F_{E}^{i})))\\&#10;F_{D}^{i}=\mathrm{Decoder}(F_{B})\end{cases}" display="inline">
                <mml:mrow>
                  <mml:mo>{</mml:mo>
                  <mml:mtable columnspacing="5pt" rowspacing="0pt">
                    <mml:mtr>
                      <mml:mtd class="ltx_align_left" columnalign="left">
                        <mml:mrow>
                          <mml:msubsup>
                            <mml:mi>F</mml:mi>
                            <mml:mi>E</mml:mi>
                            <mml:mi>i</mml:mi>
                          </mml:msubsup>
                          <mml:mo>=</mml:mo>
                          <mml:mrow>
                            <mml:mi>Encoder</mml:mi>
                            <mml:mo>⁢</mml:mo>
                            <mml:mrow>
                              <mml:mo stretchy="false">(</mml:mo>
                              <mml:msub>
                                <mml:mi>x</mml:mi>
                                <mml:mi>n</mml:mi>
                              </mml:msub>
                              <mml:mo stretchy="false">)</mml:mo>
                            </mml:mrow>
                          </mml:mrow>
                        </mml:mrow>
                      </mml:mtd>
                      <mml:mtd/>
                    </mml:mtr>
                    <mml:mtr>
                      <mml:mtd class="ltx_align_left" columnalign="left">
                        <mml:mrow>
                          <mml:msub>
                            <mml:mi>F</mml:mi>
                            <mml:mi>B</mml:mi>
                          </mml:msub>
                          <mml:mo>=</mml:mo>
                          <mml:mrow>
                            <mml:mi>OCE</mml:mi>
                            <mml:mo>⁢</mml:mo>
                            <mml:mrow>
                              <mml:mo stretchy="false">(</mml:mo>
                              <mml:mrow>
                                <mml:mi>SCAM</mml:mi>
                                <mml:mo>⁢</mml:mo>
                                <mml:mrow>
                                  <mml:mo stretchy="false">(</mml:mo>
                                  <mml:mrow>
                                    <mml:mi>MFF</mml:mi>
                                    <mml:mo>⁢</mml:mo>
                                    <mml:mrow>
                                      <mml:mo stretchy="false">(</mml:mo>
                                      <mml:msubsup>
                                        <mml:mi>F</mml:mi>
                                        <mml:mi>E</mml:mi>
                                        <mml:mi>i</mml:mi>
                                      </mml:msubsup>
                                      <mml:mo stretchy="false">)</mml:mo>
                                    </mml:mrow>
                                  </mml:mrow>
                                  <mml:mo stretchy="false">)</mml:mo>
                                </mml:mrow>
                              </mml:mrow>
                              <mml:mo stretchy="false">)</mml:mo>
                            </mml:mrow>
                          </mml:mrow>
                        </mml:mrow>
                      </mml:mtd>
                      <mml:mtd/>
                    </mml:mtr>
                    <mml:mtr>
                      <mml:mtd class="ltx_align_left" columnalign="left">
                        <mml:mrow>
                          <mml:msubsup>
                            <mml:mi>F</mml:mi>
                            <mml:mi>D</mml:mi>
                            <mml:mi>i</mml:mi>
                          </mml:msubsup>
                          <mml:mo>=</mml:mo>
                          <mml:mrow>
                            <mml:mi>Decoder</mml:mi>
                            <mml:mo>⁢</mml:mo>
                            <mml:mrow>
                              <mml:mo stretchy="false">(</mml:mo>
                              <mml:msub>
                                <mml:mi>F</mml:mi>
                                <mml:mi>B</mml:mi>
                              </mml:msub>
                              <mml:mo stretchy="false">)</mml:mo>
                            </mml:mrow>
                          </mml:mrow>
                        </mml:mrow>
                      </mml:mtd>
                      <mml:mtd/>
                    </mml:mtr>
                  </mml:mtable>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </disp-formula-group>
        </p>
        <p>
          <fig id="F3">
            <label>Figure 3.</label>
            <caption>
              <p>Bottleneck module structure.</p>
            </caption>
            <graphic xlink:href="fig3.jpg"/>
          </fig>
        </p>
        <p id="S3.SS2.p4">During the testing stage, the teacher model accurately extracts distinguishing characteristics from both standard and anomalous imagery. Nonetheless, the student model's capability in reconstructing the unique features of anomaly images remains incomplete, resulting in a notable discrepancy in the outputted features between the two models, thereby achieving the identification and detection of anomaly images.</p>
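        <p>As a minimal illustrative sketch, and not the authors' implementation, the teacher-student discrepancy described above can be scored with a per-position cosine distance, in line with the cosine similarity measure on which the loss is built. The function names <monospace>cosine_distance</monospace> and <monospace>anomaly_map</monospace>, the toy feature maps, and the choice of one minus the cosine similarity as the score are all assumptions made for illustration.</p>
        <code language="python"><![CDATA[
```python
import math

def cosine_distance(u, v, eps=1e-12):
    """Return 1 - cos(u, v) for two equal-length channel vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / max(norm_u * norm_v, eps)

def anomaly_map(teacher_feat, student_feat):
    """Per-position cosine distance between two H x W x C maps (nested lists)."""
    return [
        [cosine_distance(t, s) for t, s in zip(t_row, s_row)]
        for t_row, s_row in zip(teacher_feat, student_feat)
    ]

# Toy 1 x 2 x 3 feature maps (hypothetical values): the student matches the
# teacher's direction at the first position and diverges at the second.
teacher = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]]
student = [[[2.0, 0.0, 0.0], [1.0, 0.0, 0.0]]]
scores = anomaly_map(teacher, student)
```
]]></code>
        <p>In this toy case the first position scores close to zero because the two vectors point in the same direction, while the second position receives a high score because the student fails to reproduce the teacher's feature. A high score is what would flag a region as anomalous.</p>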
      </sec>
      <sec id="S3.SS3">
        <label>3.3</label>
        <title>Bottleneck Module</title>
        <p id="S3.SS3.p1">Figure <xref ref-type="fig" rid="F3">3</xref> illustrates the structural design of the bottleneck module. In reverse knowledge distillation, the primary objective of the student network is to recover the feature representation that is embodied within the teacher network. However, directly connecting the terminal coding block of the teacher network for activation outputs may lead to redundant and anomalous information in the high-dimensional features affecting the efficacy of the student network's reconstruction process. Hence, the bottleneck module crafted within this study primarily encompasses three distinct components: the MSFF module for fusing different scale feature information, the SCSAtt for aggregating local data, and the OCE [<xref rid="ref020" ref-type="bibr">20</xref>] for suppressing anomalous information. Subsequently, we will provide an introduction and analysis of each sub-module constituting the bottleneck module.</p>
        <p id="S3.SS3.p2">The MSFF module takes three sets of input features, <inline-formula><mml:math alttext="F_{E}^{1}" display="inline"><mml:msubsup><mml:mi>F</mml:mi><mml:mi>E</mml:mi><mml:mn>1</mml:mn></mml:msubsup></mml:math></inline-formula>, <inline-formula><mml:math alttext="F_{E}^{2}" display="inline"><mml:msubsup><mml:mi>F</mml:mi><mml:mi>E</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math alttext="F_{E}^{3}" display="inline"><mml:msubsup><mml:mi>F</mml:mi><mml:mi>E</mml:mi><mml:mn>3</mml:mn></mml:msubsup></mml:math></inline-formula>. To enable their fusion, <inline-formula><mml:math alttext="F_{E}^{1}" display="inline"><mml:msubsup><mml:mi>F</mml:mi><mml:mi>E</mml:mi><mml:mn>1</mml:mn></mml:msubsup></mml:math></inline-formula> is down-sampled twice and <inline-formula><mml:math alttext="F_{E}^{2}" display="inline"><mml:msubsup><mml:mi>F</mml:mi><mml:mi>E</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:math></inline-formula> is down-sampled once, so that their dimensions match those of <inline-formula><mml:math alttext="F_{E}^{3}" display="inline"><mml:msubsup><mml:mi>F</mml:mi><mml:mi>E</mml:mi><mml:mn>3</mml:mn></mml:msubsup></mml:math></inline-formula>. Each down-sampling step uses a 3×3 convolutional layer with a stride of 2, followed by Batch Normalization (BN) and a Rectified Linear Unit (ReLU) activation that introduces non-linearity. The down-sampled features, <inline-formula><mml:math alttext="F_{E}^{1^{\prime}}" display="inline"><mml:msubsup><mml:mi>F</mml:mi><mml:mi>E</mml:mi><mml:msup><mml:mn>1</mml:mn><mml:mo>′</mml:mo></mml:msup></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math alttext="F_{E}^{2^{\prime}}" display="inline"><mml:msubsup><mml:mi>F</mml:mi><mml:mi>E</mml:mi><mml:msup><mml:mn>2</mml:mn><mml:mo>′</mml:mo></mml:msup></mml:msubsup></mml:math></inline-formula>, are then concatenated with the original feature <inline-formula><mml:math alttext="F_{E}^{3}" display="inline"><mml:msubsup><mml:mi>F</mml:mi><mml:mi>E</mml:mi><mml:mn>3</mml:mn></mml:msubsup></mml:math></inline-formula> along the channel axis. After concatenation, a 1×1 convolutional layer with a stride of 1, together with Batch Normalization and a ReLU activation, yields a feature that is both information-rich and compact. This compact embedding limits the propagation of anomalous perturbations to the student model, thereby making anomalies more distinguishable within the T-S framework.</p>
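        <p>The shape bookkeeping of the MSFF module can be sketched in a few lines of Python. The padding of 1, the unchanged channel count during down-sampling, the channel widths and the 256×256 input are all assumptions made for illustration, since none of them is specified above; the helper names are hypothetical.</p>
        <p>
          <code language="python" xml:space="preserve"><![CDATA[
```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def downsample(shape, times):
    """Apply `times` 3x3, stride-2 convolutions to a (C, H, W) shape.

    Assumptions of this sketch: padding=1, channel count unchanged.
    """
    channels, height, width = shape
    for _ in range(times):
        height = conv_out_size(height, kernel=3, stride=2, padding=1)
        width = conv_out_size(width, kernel=3, stride=2, padding=1)
    return (channels, height, width)


def msff_shapes(f1, f2, f3, out_channels):
    """Return (shape after concatenation, shape after the 1x1 convolution)."""
    f1_ds = downsample(f1, times=2)  # F_E^1 is down-sampled twice
    f2_ds = downsample(f2, times=1)  # F_E^2 is down-sampled once
    # Channel-wise concatenation requires identical spatial sizes.
    if not (f1_ds[1:] == f2_ds[1:] == f3[1:]):
        raise ValueError("spatial sizes do not match after down-sampling")
    concat = (f1_ds[0] + f2_ds[0] + f3[0], f3[1], f3[2])
    # A 1x1 convolution with stride 1 changes only the channel count.
    fused = (out_channels, concat[1], concat[2])
    return concat, fused


# Hypothetical (C, H, W) feature shapes for a 256x256 input image.
concat_shape, fused_shape = msff_shapes(
    (256, 64, 64), (512, 32, 32), (1024, 16, 16), out_channels=1024
)
print(concat_shape, fused_shape)  # (1792, 16, 16) (1024, 16, 16)
```
]]></code>
        </p>
        <p>Under these assumptions the concatenated tensor has 1792 channels, and the 1×1 convolution compresses it back to the width of the deepest feature without changing its spatial size.</p>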
      </sec>
      <sec id="S3.SS4">
        <label>3.4</label>
        <title>SCSAtt Module</title>
        <p id="S3.SS4.p1">In the bottleneck module, MSFF fuses the multi-scale features extracted from the teacher network along the channel dimension, producing a fused feature whose size matches that of the last feature. This fusion has a potential problem: because the texture information contained in the low-level features is compressed by the convolution operations, some feature information may become blurred, which makes it harder for the student network to reconstruct those features accurately. When the reconstruction differs significantly from the normal features of the teacher network, the model may misclassify the sample.</p>
        <p id="S3.SS4.p2">To improve the quality of the student network's initial features and strengthen its ability to detect anomalous features, we design the SCSAtt module within the bottleneck module. It comprises the Spatial and Channel reconstruction Convolution (SCConv) [<xref rid="ref024" ref-type="bibr">24</xref>] and the Simple Parameter-Free Attention Module (SimAM) [<xref rid="ref025" ref-type="bibr">25</xref>]. As depicted in Figure <xref ref-type="fig" rid="F3">3</xref>, using SCConv instead of standard convolution to aggregate local information reduces redundant features and enhances feature representation while lowering complexity and computational cost. SimAM is placed after the <inline-formula><mml:math alttext="1\times 1" display="inline"><mml:mrow><mml:mn>1</mml:mn><mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula> convolution that models channel interactions, where it recalibrates the fused multi-scale features. This mechanism not only helps ensure a good initial feature representation for the student network, but also identifies and suppresses anomalous features, thus blocking the propagation of anomalous information.</p>
      </sec>
      <sec id="S3.SS5">
        <label>3.5</label>
        <title>Lightweight Convolutional Module SCConv</title>
        <p id="S3.SS5.p1">The SCConv module is a lightweight convolutional component that compresses feature redundancy [<xref rid="ref024" ref-type="bibr">24</xref>]. It consists of two units: the Spatial Reconstruction Unit (SRU) and the Channel Reconstruction Unit (CRU). The structure of SCConv is shown in Figure <xref ref-type="fig" rid="F4">4</xref>.</p>
        <p>
          <fig id="F4">
            <label>Figure 4.</label>
            <caption>
              <p>SCConv structure.</p>
            </caption>
            <graphic xlink:href="fig4.jpg"/>
          </fig>
        </p>
        <p id="S3.SS5.p2">The SRU performs a separate-and-reconstruct operation. The goal of separation is to distinguish the informative feature maps from those that are redundant and carry little information. To this end, the input feature maps <inline-formula><mml:math alttext="X" display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula> are first normalized group-wise over the spatial (height and width) dimensions:</p>
        <p>
          <disp-formula-group id="S5.EGx2">
            <disp-formula id="S3.E2">
              <mml:math alttext="\displaystyle X_{\mathrm{out}}=\gamma\frac{X-\mu}{\sqrt{\sigma^{2}+\varepsilon}}+\beta,~{}X\in R^{N\times C\times H\times W}" display="inline">
                <mml:mrow>
                  <mml:mrow>
                    <mml:msub>
                      <mml:mi>X</mml:mi>
                      <mml:mi>out</mml:mi>
                    </mml:msub>
                    <mml:mo>=</mml:mo>
                    <mml:mrow>
                      <mml:mrow>
                        <mml:mi>γ</mml:mi>
                        <mml:mo>⁢</mml:mo>
                        <mml:mstyle displaystyle="true">
                          <mml:mfrac>
                            <mml:mrow>
                              <mml:mi>X</mml:mi>
                              <mml:mo>−</mml:mo>
                              <mml:mi>μ</mml:mi>
                            </mml:mrow>
                            <mml:msqrt>
                              <mml:mrow>
                                <mml:msup>
                                  <mml:mi>σ</mml:mi>
                                  <mml:mn>2</mml:mn>
                                </mml:msup>
                                <mml:mo>+</mml:mo>
                                <mml:mi>ε</mml:mi>
                              </mml:mrow>
                            </mml:msqrt>
                          </mml:mfrac>
                        </mml:mstyle>
                      </mml:mrow>
                      <mml:mo>+</mml:mo>
                      <mml:mi>β</mml:mi>
                    </mml:mrow>
                  </mml:mrow>
                  <mml:mo rspace="0.497em">,</mml:mo>
                  <mml:mrow>
                    <mml:mi>X</mml:mi>
                    <mml:mo>∈</mml:mo>
                    <mml:msup>
                      <mml:mi>R</mml:mi>
                      <mml:mrow>
                        <mml:mi>N</mml:mi>
                        <mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo>
                        <mml:mi>C</mml:mi>
                        <mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo>
                        <mml:mi>H</mml:mi>
                        <mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo>
                        <mml:mi>W</mml:mi>
                      </mml:mrow>
                    </mml:msup>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </disp-formula-group>
        </p>
        <p>where <inline-formula><mml:math alttext="N" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> is the batch size, <inline-formula><mml:math alttext="C" display="inline"><mml:mi>C</mml:mi></mml:math></inline-formula> is the number of channels, and <inline-formula><mml:math alttext="H" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math alttext="W" display="inline"><mml:mi>W</mml:mi></mml:math></inline-formula> are the height and width of the feature map, respectively. The parameters <inline-formula><mml:math alttext="\gamma" display="inline"><mml:mi>γ</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math alttext="\beta" display="inline"><mml:mi>β</mml:mi></mml:math></inline-formula> are the learnable scale and shift of the affine transformation applied after standardization, and <inline-formula><mml:math alttext="\varepsilon" display="inline"><mml:mi>ε</mml:mi></mml:math></inline-formula> is a small positive constant added for numerical stability.</p>
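        <p>As a minimal numerical illustration of Equation (2), the Python sketch below standardizes the values of a single normalization group, flattened into a list, and then applies the affine map. How elements are assigned to groups is not specified here, and the default value of <monospace>eps</monospace> and the sample numbers are assumptions of this example.</p>
        <p>
          <code language="python" xml:space="preserve"><![CDATA[
```python
import math


def normalize(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize `values`, then scale by gamma and shift by beta.

    `values` stands for all elements of one normalization group; the
    default eps is an assumed value, not one taken from the text.
    """
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = gamma / math.sqrt(var + eps)
    return [scale * (v - mean) + beta for v in values]


group = [1.0, 2.0, 3.0, 4.0]
out = normalize(group, gamma=2.0, beta=0.5)
print([round(v, 3) for v in out])
```
]]></code>
        </p>
        <p>After the transformation the group has a mean equal to the shift and a standard deviation close to the scale, which is why the scale can later serve as an indicator of how much a channel varies.</p>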
        <p id="S3.SS5.p3">The mean (<inline-formula><mml:math alttext="\mu" display="inline"><mml:mi>μ</mml:mi></mml:math></inline-formula>) and standard deviation (<inline-formula><mml:math alttext="\sigma" display="inline"><mml:mi>σ</mml:mi></mml:math></inline-formula>) of the input feature maps are computed to standardize their distribution. The trainable parameter <inline-formula><mml:math alttext="\gamma" display="inline"><mml:mi>γ</mml:mi></mml:math></inline-formula> of the group normalization layer, a vector in <inline-formula><mml:math alttext="R^{C}" display="inline"><mml:msup><mml:mi>R</mml:mi><mml:mi>C</mml:mi></mml:msup></mml:math></inline-formula>, is then used to measure the variance of the spatial pixels of each channel within the batch. Normalizing <inline-formula><mml:math alttext="\gamma" display="inline"><mml:mi>γ</mml:mi></mml:math></inline-formula> across channels gives the weights <inline-formula><mml:math alttext="W_{\gamma}" display="inline"><mml:msub><mml:mi>W</mml:mi><mml:mi>γ</mml:mi></mml:msub></mml:math></inline-formula>, which indicate the importance of the individual feature maps and let the model focus on the most informative features.</p>
        <p>
          <disp-formula-group id="S5.EGx3">
            <disp-formula id="S3.E3">
              <mml:math alttext="\displaystyle W_{\gamma}=\{w_{i}\}=\frac{\gamma_{i}}{\sum_{j=1}^{C}\gamma_{j}},~{}i,j=1,2,\ldots,C" display="inline">
                <mml:mrow>
                  <mml:mrow>
                    <mml:msub>
                      <mml:mi>W</mml:mi>
                      <mml:mi>γ</mml:mi>
                    </mml:msub>
                    <mml:mo>=</mml:mo>
                    <mml:mrow>
                      <mml:mo stretchy="false">{</mml:mo>
                      <mml:msub>
                        <mml:mi>w</mml:mi>
                        <mml:mi>i</mml:mi>
                      </mml:msub>
                      <mml:mo stretchy="false">}</mml:mo>
                    </mml:mrow>
                    <mml:mo>=</mml:mo>
                    <mml:mstyle displaystyle="true">
                      <mml:mfrac>
                        <mml:msub>
                          <mml:mi>γ</mml:mi>
                          <mml:mi>i</mml:mi>
                        </mml:msub>
                        <mml:mrow>
                          <mml:msubsup>
                            <mml:mo>∑</mml:mo>
                            <mml:mrow>
                              <mml:mi>j</mml:mi>
                              <mml:mo>=</mml:mo>
                              <mml:mn>1</mml:mn>
                            </mml:mrow>
                            <mml:mi>C</mml:mi>
                          </mml:msubsup>
                          <mml:msub>
                            <mml:mi>γ</mml:mi>
                            <mml:mi>j</mml:mi>
                          </mml:msub>
                        </mml:mrow>
                      </mml:mfrac>
                    </mml:mstyle>
                  </mml:mrow>
                  <mml:mo rspace="0.497em">,</mml:mo>
                  <mml:mrow>
                    <mml:mrow>
                      <mml:mi>i</mml:mi>
                      <mml:mo>,</mml:mo>
                      <mml:mi>j</mml:mi>
                    </mml:mrow>
                    <mml:mo>=</mml:mo>
                    <mml:mn>1</mml:mn>
                  </mml:mrow>
                  <mml:mo>,</mml:mo>
                  <mml:mrow>
                    <mml:mn>2</mml:mn>
                    <mml:mo>,</mml:mo>
                    <mml:mi mathvariant="normal">…</mml:mi>
                    <mml:mo>,</mml:mo>
                    <mml:mi>C</mml:mi>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </disp-formula-group>
        </p>
        <p id="S3.SS5.p4">Gating is implemented by thresholding the weights used to reconstruct the input features. The feature maps reweighted by <inline-formula><mml:math alttext="W_{\gamma}" display="inline"><mml:msub><mml:mi>W</mml:mi><mml:mi>γ</mml:mi></mml:msub></mml:math></inline-formula> are mapped to the interval (0, 1) by a sigmoid function, and a threshold of 0.5 is then applied. Weights above the threshold are set to 1, which gives the informative weights <inline-formula><mml:math alttext="W_{1}" display="inline"><mml:msub><mml:mi>W</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:math></inline-formula>. Weights below the threshold are set to 0, which gives the redundant weights <inline-formula><mml:math alttext="W_{2}" display="inline"><mml:msub><mml:mi>W</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:math></inline-formula>.</p>
        <p>
          <disp-formula-group id="S5.EGx4">
            <disp-formula id="S3.E4">
              <mml:math alttext="\displaystyle W_{n}=\mathrm{Gate}(\mathrm{Sigmoid}(W_{\gamma}(X_{\mathrm{out}}))),~{}n=1,2" display="inline">
                <mml:mrow>
                  <mml:mrow>
                    <mml:msub>
                      <mml:mi>W</mml:mi>
                      <mml:mi>n</mml:mi>
                    </mml:msub>
                    <mml:mo>=</mml:mo>
                    <mml:mrow>
                      <mml:mi>Gate</mml:mi>
                      <mml:mo>⁢</mml:mo>
                      <mml:mrow>
                        <mml:mo stretchy="false">(</mml:mo>
                        <mml:mrow>
                          <mml:mi>Sigmoid</mml:mi>
                          <mml:mo>⁢</mml:mo>
                          <mml:mrow>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mrow>
                              <mml:msub>
                                <mml:mi>W</mml:mi>
                                <mml:mi>γ</mml:mi>
                              </mml:msub>
                              <mml:mo>⁢</mml:mo>
                              <mml:mrow>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:msub>
                                  <mml:mi>X</mml:mi>
                                  <mml:mi>out</mml:mi>
                                </mml:msub>
                                <mml:mo stretchy="false">)</mml:mo>
                              </mml:mrow>
                            </mml:mrow>
                            <mml:mo stretchy="false">)</mml:mo>
                          </mml:mrow>
                        </mml:mrow>
                        <mml:mo stretchy="false">)</mml:mo>
                      </mml:mrow>
                    </mml:mrow>
                  </mml:mrow>
                  <mml:mo rspace="0.497em">,</mml:mo>
                  <mml:mrow>
                    <mml:mi>n</mml:mi>
                    <mml:mo>=</mml:mo>
                    <mml:mrow>
                      <mml:mn>1</mml:mn>
                      <mml:mo>,</mml:mo>
                      <mml:mn>2</mml:mn>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </disp-formula-group>
        </p>
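        <p>Equations (3) and (4) can be sketched together in plain Python. The sketch assumes that all scale factors are positive, that the two masks are binary and complementary, and that a value exactly equal to the threshold counts as redundant; these are assumptions of this example rather than details given above, and the function name and toy inputs are hypothetical.</p>
        <p>
          <code language="python" xml:space="preserve"><![CDATA[
```python
import math


def sru_gate(x_out, gamma, threshold=0.5):
    """Compute the two gating masks from normalized features.

    x_out: per-channel lists of normalized values.
    gamma: one positive scale factor per channel.
    Returns (w1, w2): masks for informative and redundant positions.
    """
    total = sum(gamma)
    w_gamma = [g / total for g in gamma]  # Equation (3)
    w1, w2 = [], []
    for weight, channel in zip(w_gamma, x_out):
        # The sigmoid maps the reweighted values into (0, 1).
        probs = [1.0 / (1.0 + math.exp(-weight * v)) for v in channel]
        keep = [1.0 if p > threshold else 0.0 for p in probs]
        w1.append(keep)
        # Assumption: W2 is the complement of W1.
        w2.append([1.0 - k for k in keep])
    return w1, w2


x_out = [[1.2, -0.7, 0.3], [-2.0, 0.5, -0.1]]
gamma = [0.9, 0.3]
w1, w2 = sru_gate(x_out, gamma)
print(w1)
print(w2)
```
]]></code>
        </p>
        <p>With positive scale factors and a threshold of 0.5, a position is marked informative exactly when its normalized value is positive, because the sigmoid of zero is 0.5.</p>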
        <p id="S3.SS5.p5">The two weights are multiplied element-wise with the input feature <inline-formula><mml:math alttext="X" display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula> to obtain the informative feature <inline-formula><mml:math alttext="X_{1}^{w}" display="inline"><mml:msubsup><mml:mi>X</mml:mi><mml:mn>1</mml:mn><mml:mi>w</mml:mi></mml:msubsup></mml:math></inline-formula> and the redundant feature <inline-formula><mml:math alttext="X_{2}^{w}" display="inline"><mml:msubsup><mml:mi>X</mml:mi><mml:mn>2</mml:mn><mml:mi>w</mml:mi></mml:msubsup></mml:math></inline-formula>. The features are then cross-reconstructed by adding complementary parts of <inline-formula><mml:math alttext="X_{1}^{w}" display="inline"><mml:msubsup><mml:mi>X</mml:mi><mml:mn>1</mml:mn><mml:mi>w</mml:mi></mml:msubsup></mml:math></inline-formula> and <inline-formula><mml:math alttext="X_{2}^{w}" display="inline"><mml:msubsup><mml:mi>X</mml:mi><mml:mn>2</mml:mn><mml:mi>w</mml:mi></mml:msubsup></mml:math></inline-formula> and concatenating the two sums, which gives the spatially refined feature <inline-formula><mml:math alttext="X^{w}" display="inline"><mml:msup><mml:mi>X</mml:mi><mml:mi>w</mml:mi></mml:msup></mml:math></inline-formula>:</p>
        <p>
          <disp-formula-group id="S5.EGx5">
            <disp-formula id="S3.E5">
              <mml:math alttext="\displaystyle\begin{cases}X_{1}^{w}=W_{1}\otimes X,\\&#10;X_{2}^{w}=W_{2}\otimes X,\\&#10;X_{11}^{w}\oplus X_{22}^{w}=X^{w1},X_{21}^{w}\oplus X_{12}^{w}=X^{w2},\\&#10;X^{w1}\cup X^{w2}=X^{w}.\end{cases}" display="inline">
                <mml:mrow>
                  <mml:mo>{</mml:mo>
                  <mml:mtable columnspacing="5pt" rowspacing="0pt">
                    <mml:mtr>
                      <mml:mtd class="ltx_align_left" columnalign="left">
                        <mml:mrow>
                          <mml:mrow>
                            <mml:msubsup>
                              <mml:mi>X</mml:mi>
                              <mml:mn>1</mml:mn>
                              <mml:mi>w</mml:mi>
                            </mml:msubsup>
                            <mml:mo>=</mml:mo>
                            <mml:mrow>
                              <mml:msub>
                                <mml:mi>W</mml:mi>
                                <mml:mn>1</mml:mn>
                              </mml:msub>
                              <mml:mo lspace="0.222em" rspace="0.222em">⊗</mml:mo>
                              <mml:mi>X</mml:mi>
                            </mml:mrow>
                          </mml:mrow>
                          <mml:mo>,</mml:mo>
                        </mml:mrow>
                      </mml:mtd>
                      <mml:mtd/>
                    </mml:mtr>
                    <mml:mtr>
                      <mml:mtd class="ltx_align_left" columnalign="left">
                        <mml:mrow>
                          <mml:mrow>
                            <mml:msubsup>
                              <mml:mi>X</mml:mi>
                              <mml:mn>2</mml:mn>
                              <mml:mi>w</mml:mi>
                            </mml:msubsup>
                            <mml:mo>=</mml:mo>
                            <mml:mrow>
                              <mml:msub>
                                <mml:mi>W</mml:mi>
                                <mml:mn>2</mml:mn>
                              </mml:msub>
                              <mml:mo lspace="0.222em" rspace="0.222em">⊗</mml:mo>
                              <mml:mi>X</mml:mi>
                            </mml:mrow>
                          </mml:mrow>
                          <mml:mo>,</mml:mo>
                        </mml:mrow>
                      </mml:mtd>
                      <mml:mtd/>
                    </mml:mtr>
                    <mml:mtr>
                      <mml:mtd class="ltx_align_left" columnalign="left">
                        <mml:mrow>
                          <mml:mrow>
                            <mml:mrow>
                              <mml:mrow>
                                <mml:msubsup>
                                  <mml:mi>X</mml:mi>
                                  <mml:mn>11</mml:mn>
                                  <mml:mi>w</mml:mi>
                                </mml:msubsup>
                                <mml:mo>⊕</mml:mo>
                                <mml:msubsup>
                                  <mml:mi>X</mml:mi>
                                  <mml:mn>22</mml:mn>
                                  <mml:mi>w</mml:mi>
                                </mml:msubsup>
                              </mml:mrow>
                              <mml:mo>=</mml:mo>
                              <mml:msup>
                                <mml:mi>X</mml:mi>
                                <mml:mrow>
                                  <mml:mi>w</mml:mi>
                                  <mml:mo>⁢</mml:mo>
                                  <mml:mn>1</mml:mn>
                                </mml:mrow>
                              </mml:msup>
                            </mml:mrow>
                            <mml:mo>,</mml:mo>
                            <mml:mrow>
                              <mml:mrow>
                                <mml:msubsup>
                                  <mml:mi>X</mml:mi>
                                  <mml:mn>21</mml:mn>
                                  <mml:mi>w</mml:mi>
                                </mml:msubsup>
                                <mml:mo>⊕</mml:mo>
                                <mml:msubsup>
                                  <mml:mi>X</mml:mi>
                                  <mml:mn>12</mml:mn>
                                  <mml:mi>w</mml:mi>
                                </mml:msubsup>
                              </mml:mrow>
                              <mml:mo>=</mml:mo>
                              <mml:msup>
                                <mml:mi>X</mml:mi>
                                <mml:mrow>
                                  <mml:mi>w</mml:mi>
                                  <mml:mo>⁢</mml:mo>
                                  <mml:mn>2</mml:mn>
                                </mml:mrow>
                              </mml:msup>
                            </mml:mrow>
                          </mml:mrow>
                          <mml:mo>,</mml:mo>
                        </mml:mrow>
                      </mml:mtd>
                      <mml:mtd/>
                    </mml:mtr>
                    <mml:mtr>
                      <mml:mtd class="ltx_align_left" columnalign="left">
                        <mml:mrow>
                          <mml:mrow>
                            <mml:mrow>
                              <mml:msup>
                                <mml:mi>X</mml:mi>
                                <mml:mrow>
                                  <mml:mi>w</mml:mi>
                                  <mml:mo>⁢</mml:mo>
                                  <mml:mn>1</mml:mn>
                                </mml:mrow>
                              </mml:msup>
                              <mml:mo>∪</mml:mo>
                              <mml:msup>
                                <mml:mi>X</mml:mi>
                                <mml:mrow>
                                  <mml:mi>w</mml:mi>
                                  <mml:mo>⁢</mml:mo>
                                  <mml:mn>2</mml:mn>
                                </mml:mrow>
                              </mml:msup>
                            </mml:mrow>
                            <mml:mo>=</mml:mo>
                            <mml:msup>
                              <mml:mi>X</mml:mi>
                              <mml:mi>w</mml:mi>
                            </mml:msup>
                          </mml:mrow>
                          <mml:mo lspace="0em">.</mml:mo>
                        </mml:mrow>
                      </mml:mtd>
                      <mml:mtd/>
                    </mml:mtr>
                  </mml:mtable>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </disp-formula-group>
        </p>
        <p>where <inline-formula><mml:math alttext="\otimes" display="inline"><mml:mo>⊗</mml:mo></mml:math></inline-formula> denotes element-wise multiplication, <inline-formula><mml:math alttext="\oplus" display="inline"><mml:mo>⊕</mml:mo></mml:math></inline-formula> denotes element-wise addition, and <inline-formula><mml:math alttext="\cup" display="inline"><mml:mo>∪</mml:mo></mml:math></inline-formula> denotes concatenation along the channel dimension. After SRU processing, the informative features are separated from the less informative ones.</p>
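        <p>The cross-reconstruction in Equation (5) can be sketched in plain Python. The sketch assumes that the second subscript in Equation (5) indexes the two halves of each weighted feature along the channel dimension and that the channel count is even; both are assumptions of this example, and the function name and toy tensors are hypothetical.</p>
        <p>
          <code language="python" xml:space="preserve"><![CDATA[
```python
def cross_reconstruct(x, w1, w2):
    """Cross-reconstruct features from two gating masks.

    x, w1, w2: lists of channels, each channel a list of values.
    Assumption: each weighted feature is split in half along channels.
    """
    def mul(a, b):
        return [[p * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

    def add(a, b):
        return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

    x1 = mul(w1, x)            # X_1^w: informative part
    x2 = mul(w2, x)            # X_2^w: redundant part
    half = len(x) // 2
    x11, x12 = x1[:half], x1[half:]
    x21, x22 = x2[:half], x2[half:]
    xw1 = add(x11, x22)        # first cross sum
    xw2 = add(x21, x12)        # second cross sum
    return xw1 + xw2           # list concatenation = channel concatenation


x = [[1, 2], [3, 4], [5, 6], [7, 8]]          # 4 channels, 2 positions
w1 = [[1, 0], [1, 1], [0, 1], [0, 0]]
w2 = [[0, 1], [0, 0], [1, 0], [1, 1]]         # complement of w1
xw = cross_reconstruct(x, w1, w2)
print(xw)  # [[6, 0], [10, 12], [0, 8], [0, 0]]
```
]]></code>
        </p>
        <p>The output keeps the channel count of the input, while each output channel mixes an informative half with a redundant half from the other group.</p>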
        <p id="S3.SS5.p6">Redundancy in the channel dimension is handled by the CRU, which operates in three steps: split, transform, and fuse. First, <inline-formula><mml:math alttext="X^{w}" display="inline"><mml:msup><mml:mi>X</mml:mi><mml:mi>w</mml:mi></mml:msup></mml:math></inline-formula> is split along the channel dimension into two parts, the upper half <inline-formula><mml:math alttext="X_{\mathrm{up}}" display="inline"><mml:msub><mml:mi>X</mml:mi><mml:mi>up</mml:mi></mml:msub></mml:math></inline-formula> and the lower half <inline-formula><mml:math alttext="X_{\mathrm{low}}" display="inline"><mml:msub><mml:mi>X</mml:mi><mml:mi>low</mml:mi></mml:msub></mml:math></inline-formula>. In the transform step, a <inline-formula><mml:math alttext="k\times k" display="inline"><mml:mrow><mml:mi>k</mml:mi><mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:math></inline-formula> group-wise convolution (GWC) and a <inline-formula><mml:math alttext="1\times 1" display="inline"><mml:mrow><mml:mn>1</mml:mn><mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula> point-wise convolution (PWC) are both applied to <inline-formula><mml:math alttext="X_{\mathrm{up}}" display="inline"><mml:msub><mml:mi>X</mml:mi><mml:mi>up</mml:mi></mml:msub></mml:math></inline-formula>. These operations are used in place of a conventional convolution to enhance feature representation, and their outputs are summed to form the feature map <inline-formula><mml:math alttext="Y_{1}" display="inline"><mml:msub><mml:mi>Y</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:math></inline-formula>. A PWC is also applied to <inline-formula><mml:math alttext="X_{\mathrm{low}}" display="inline"><mml:msub><mml:mi>X</mml:mi><mml:mi>low</mml:mi></mml:msub></mml:math></inline-formula>, and its output is concatenated with <inline-formula><mml:math alttext="X_{\mathrm{low}}" display="inline"><mml:msub><mml:mi>X</mml:mi><mml:mi>low</mml:mi></mml:msub></mml:math></inline-formula> along the channel dimension to obtain the feature map <inline-formula><mml:math alttext="Y_{2}" display="inline"><mml:msub><mml:mi>Y</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:math></inline-formula>. In the final fuse step, the outputs are pooled to reduce channel redundancy. Global average pooling yields a channel descriptor <inline-formula><mml:math alttext="S_{m}" display="inline"><mml:msub><mml:mi>S</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:math></inline-formula> that carries global spatial information:</p>
        <p>
          <disp-formula-group id="S5.EGx6">
            <disp-formula id="S3.Ex1">
              <mml:math alttext="\displaystyle S_{m}=\frac{1}{H\times W}\sum_{i=1}^{H}\sum_{j=1}^{W}Y_{m}(i,j)," display="inline">
                <mml:mrow>
                  <mml:mrow>
                    <mml:msub>
                      <mml:mi>S</mml:mi>
                      <mml:mi>m</mml:mi>
                    </mml:msub>
                    <mml:mo>=</mml:mo>
                    <mml:mrow>
                      <mml:mstyle displaystyle="true">
                        <mml:mfrac>
                          <mml:mn>1</mml:mn>
                          <mml:mrow>
                            <mml:mi>H</mml:mi>
                            <mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo>
                            <mml:mi>W</mml:mi>
                          </mml:mrow>
                        </mml:mfrac>
                      </mml:mstyle>
                      <mml:mo>⁢</mml:mo>
                      <mml:mrow>
                        <mml:mstyle displaystyle="true">
                          <mml:munderover>
                            <mml:mo movablelimits="false">∑</mml:mo>
                            <mml:mrow>
                              <mml:mi>i</mml:mi>
                              <mml:mo>=</mml:mo>
                              <mml:mn>1</mml:mn>
                            </mml:mrow>
                            <mml:mi>H</mml:mi>
                          </mml:munderover>
                        </mml:mstyle>
                        <mml:mrow>
                          <mml:mstyle displaystyle="true">
                            <mml:munderover>
                              <mml:mo movablelimits="false">∑</mml:mo>
                              <mml:mrow>
                                <mml:mi>j</mml:mi>
                                <mml:mo>=</mml:mo>
                                <mml:mn>1</mml:mn>
                              </mml:mrow>
                              <mml:mi>W</mml:mi>
                            </mml:munderover>
                          </mml:mstyle>
                          <mml:mrow>
                            <mml:msub>
                              <mml:mi>Y</mml:mi>
                              <mml:mi>m</mml:mi>
                            </mml:msub>
                            <mml:mo>⁢</mml:mo>
                            <mml:mrow>
                              <mml:mo stretchy="false">(</mml:mo>
                              <mml:mi>i</mml:mi>
                              <mml:mo>,</mml:mo>
                              <mml:mi>j</mml:mi>
                              <mml:mo stretchy="false">)</mml:mo>
                            </mml:mrow>
                          </mml:mrow>
                        </mml:mrow>
                      </mml:mrow>
                    </mml:mrow>
                  </mml:mrow>
                  <mml:mo>,</mml:mo>
                </mml:mrow>
              </mml:math>
            </disp-formula>
            <disp-formula id="S3.E6">
              <mml:math alttext="\displaystyle\quad m=1,2,~{}S_{m}\in R^{c\times 1\times 1}" display="inline">
                <mml:mrow>
                  <mml:mrow>
                    <mml:mi>m</mml:mi>
                    <mml:mo>=</mml:mo>
                    <mml:mrow>
                      <mml:mn>1</mml:mn>
                      <mml:mo>,</mml:mo>
                      <mml:mn>2</mml:mn>
                    </mml:mrow>
                  </mml:mrow>
                  <mml:mo rspace="0.497em">,</mml:mo>
                  <mml:mrow>
                    <mml:msub>
                      <mml:mi>S</mml:mi>
                      <mml:mi>m</mml:mi>
                    </mml:msub>
                    <mml:mo>∈</mml:mo>
                    <mml:msup>
                      <mml:mi>R</mml:mi>
                      <mml:mrow>
                        <mml:mi>c</mml:mi>
                        <mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo>
                        <mml:mn>1</mml:mn>
                        <mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo>
                        <mml:mn>1</mml:mn>
                      </mml:mrow>
                    </mml:msup>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </disp-formula-group>
        </p>
        <p id="S3.SS5.p8">The channel descriptors <inline-formula><mml:math alttext="S_{1},S_{2}" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> are then stacked and passed through a channel-wise soft attention operation, which generates the feature importance vectors <inline-formula><mml:math alttext="\bm{\beta}_{1},\bm{\beta}_{2}\in R^{c}" display="inline"><mml:mrow><mml:mrow><mml:msub><mml:mi>𝜷</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>𝜷</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo>∈</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mi>c</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>:</p>
        <p>
          <disp-formula-group id="S5.EGx7">
            <disp-formula id="S3.E7">
              <mml:math alttext="\displaystyle\bm{\beta}_{1}=\frac{e^{S_{1}}}{e^{S_{1}}+e^{S_{2}}},~{}\bm{\beta}_{2}=%&#10;\frac{e^{S_{2}}}{e^{S_{1}}+e^{S_{2}}},~{}\bm{\beta}_{1}+\bm{\beta}_{2}=1" display="inline">
                <mml:mrow>
                  <mml:mrow>
                    <mml:msub>
                      <mml:mi>𝜷</mml:mi>
                      <mml:mn>1</mml:mn>
                    </mml:msub>
                    <mml:mo>=</mml:mo>
                    <mml:mstyle displaystyle="true">
                      <mml:mfrac>
                        <mml:msup>
                          <mml:mi>e</mml:mi>
                          <mml:msub>
                            <mml:mi>S</mml:mi>
                            <mml:mn>1</mml:mn>
                          </mml:msub>
                        </mml:msup>
                        <mml:mrow>
                          <mml:msup>
                            <mml:mi>e</mml:mi>
                            <mml:msub>
                              <mml:mi>S</mml:mi>
                              <mml:mn>1</mml:mn>
                            </mml:msub>
                          </mml:msup>
                          <mml:mo>+</mml:mo>
                          <mml:msup>
                            <mml:mi>e</mml:mi>
                            <mml:msub>
                              <mml:mi>S</mml:mi>
                              <mml:mn>2</mml:mn>
                            </mml:msub>
                          </mml:msup>
                        </mml:mrow>
                      </mml:mfrac>
                    </mml:mstyle>
                  </mml:mrow>
                  <mml:mo rspace="0.497em">,</mml:mo>
                  <mml:mrow>
                    <mml:mrow>
                      <mml:msub>
                        <mml:mi>𝜷</mml:mi>
                        <mml:mn>2</mml:mn>
                      </mml:msub>
                      <mml:mo>=</mml:mo>
                      <mml:mstyle displaystyle="true">
                        <mml:mfrac>
                          <mml:msup>
                            <mml:mi>e</mml:mi>
                            <mml:msub>
                              <mml:mi>S</mml:mi>
                              <mml:mn>2</mml:mn>
                            </mml:msub>
                          </mml:msup>
                          <mml:mrow>
                            <mml:msup>
                              <mml:mi>e</mml:mi>
                              <mml:msub>
                                <mml:mi>S</mml:mi>
                                <mml:mn>1</mml:mn>
                              </mml:msub>
                            </mml:msup>
                            <mml:mo>+</mml:mo>
                            <mml:msup>
                              <mml:mi>e</mml:mi>
                              <mml:msub>
                                <mml:mi>S</mml:mi>
                                <mml:mn>2</mml:mn>
                              </mml:msub>
                            </mml:msup>
                          </mml:mrow>
                        </mml:mfrac>
                      </mml:mstyle>
                    </mml:mrow>
                    <mml:mo rspace="0.497em">,</mml:mo>
                    <mml:mrow>
                      <mml:mrow>
                        <mml:msub>
                          <mml:mi>𝜷</mml:mi>
                          <mml:mn>1</mml:mn>
                        </mml:msub>
                        <mml:mo>+</mml:mo>
                        <mml:msub>
                          <mml:mi>𝜷</mml:mi>
                          <mml:mn>2</mml:mn>
                        </mml:msub>
                      </mml:mrow>
                      <mml:mo>=</mml:mo>
                      <mml:mn>1</mml:mn>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </disp-formula-group>
        </p>
        <p id="S3.SS5.p9">Finally, <inline-formula><mml:math alttext="Y_{1}" display="inline"><mml:msub><mml:mi>Y</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math alttext="Y_{2}" display="inline"><mml:msub><mml:mi>Y</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:math></inline-formula> combine to yield the channel refinement feature <inline-formula><mml:math alttext="Y" display="inline"><mml:mi>Y</mml:mi></mml:math></inline-formula>:</p>
        <p>
          <disp-formula-group id="S5.EGx8">
            <disp-formula id="S3.E8">
              <mml:math alttext="\displaystyle Y=\bm{\beta}_{1}Y_{1}+\bm{\beta}_{2}Y_{2}" display="inline">
                <mml:mrow>
                  <mml:mi>Y</mml:mi>
                  <mml:mo>=</mml:mo>
                  <mml:mrow>
                    <mml:mrow>
                      <mml:msub>
                        <mml:mi>𝜷</mml:mi>
                        <mml:mn>1</mml:mn>
                      </mml:msub>
                      <mml:mo>⁢</mml:mo>
                      <mml:msub>
                        <mml:mi>Y</mml:mi>
                        <mml:mn>1</mml:mn>
                      </mml:msub>
                    </mml:mrow>
                    <mml:mo>+</mml:mo>
                    <mml:mrow>
                      <mml:msub>
                        <mml:mi>𝜷</mml:mi>
                        <mml:mn>2</mml:mn>
                      </mml:msub>
                      <mml:mo>⁢</mml:mo>
                      <mml:msub>
                        <mml:mi>Y</mml:mi>
                        <mml:mn>2</mml:mn>
                      </mml:msub>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </disp-formula-group>
        </p>
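        <p>The channel-wise soft attention and the weighted fusion described above can be illustrated with a minimal, dependency-free Python sketch. It assumes that the channel descriptors are obtained by global average pooling and represents feature maps as nested lists of shape (C, H, W). The function names <monospace>global_avg_pool</monospace> and <monospace>soft_attention_fuse</monospace> are illustrative and are not taken from the original implementation, which would normally rely on a tensor library.</p>
        <p>
          <preformat preformat-type="code"><![CDATA[
```python
import math

def global_avg_pool(feat):
    """Reduce a (C, H, W) nested list to C channel descriptors (assumed average pooling)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]

def soft_attention_fuse(y1, y2):
    """Fuse two (C, H, W) feature maps with per-channel softmax weights."""
    s1, s2 = global_avg_pool(y1), global_avg_pool(y2)
    fused, betas = [], []
    for c, (a, b) in enumerate(zip(s1, s2)):
        m = max(a, b)                          # subtract the max for numerical stability
        e1, e2 = math.exp(a - m), math.exp(b - m)
        b1, b2 = e1 / (e1 + e2), e2 / (e1 + e2)   # beta_1 + beta_2 = 1 for this channel
        betas.append((b1, b2))
        fused.append([[b1 * p + b2 * q for p, q in zip(r1, r2)]
                      for r1, r2 in zip(y1[c], y2[c])])
    return fused, betas

# Toy input: two channels with a 2 x 2 spatial size.
y1 = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
y2 = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
fused, betas = soft_attention_fuse(y1, y2)
```
]]></preformat>
        </p>
        <p>Because the two weights of every channel sum to one, the fused feature is a per-channel convex combination of the two inputs. In the toy input, the first channel is averaged equally, whereas the second channel is dominated by the branch with the larger descriptor.</p>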
      </sec>
      <sec id="S3.SS6">
        <label>3.6</label>
        <title>Parameter-free Attention Module SimAM</title>
        <p id="S3.SS6.p1">SimAM [<xref rid="ref024" ref-type="bibr">24</xref>] is a parameter-free attention mechanism that infers full three-dimensional attention weights, i.e., one weight per neuron across the channel and spatial dimensions. It assigns higher weights to neurons that carry discriminative spatial information while suppressing extraneous information from the surrounding neurons, which improves the feature extraction ability without adding learnable parameters. Figure <xref ref-type="fig" rid="F5">5</xref> illustrates the architecture of SimAM.</p>
        <p>
          <fig id="F5">
            <label>Figure 5.</label>
            <caption>
              <p>SimAM attention module.</p>
            </caption>
            <graphic xlink:href="fig5.jpg"/>
          </fig>
        </p>
        <p id="S3.SS6.p2">For each target neuron, the module defines an energy function that uses binary labels and a regularization term. Minimizing this function in closed form gives the minimal energy of the target neuron:</p>
        <p>
          <disp-formula-group id="S5.EGx9">
            <disp-formula id="S3.E9">
              <mml:math alttext="\displaystyle e_{t}^{*}=\frac{4({\hat{\sigma}}^{2}+\lambda)}{{(t-\hat{\mu})}^{%&#10;2}+2{\hat{\sigma}}^{2}+2\lambda}" display="inline">
                <mml:mrow>
                  <mml:msubsup>
                    <mml:mi>e</mml:mi>
                    <mml:mi>t</mml:mi>
                    <mml:mo>∗</mml:mo>
                  </mml:msubsup>
                  <mml:mo>=</mml:mo>
                  <mml:mstyle displaystyle="true">
                    <mml:mfrac>
                      <mml:mrow>
                        <mml:mn>4</mml:mn>
                        <mml:mo>⁢</mml:mo>
                        <mml:mrow>
                          <mml:mo stretchy="false">(</mml:mo>
                          <mml:mrow>
                            <mml:msup>
                              <mml:mover accent="true">
                                <mml:mi>σ</mml:mi>
                                <mml:mo>^</mml:mo>
                              </mml:mover>
                              <mml:mn>2</mml:mn>
                            </mml:msup>
                            <mml:mo>+</mml:mo>
                            <mml:mi>λ</mml:mi>
                          </mml:mrow>
                          <mml:mo stretchy="false">)</mml:mo>
                        </mml:mrow>
                      </mml:mrow>
                      <mml:mrow>
                        <mml:msup>
                          <mml:mrow>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mrow>
                              <mml:mi>t</mml:mi>
                              <mml:mo>−</mml:mo>
                              <mml:mover accent="true">
                                <mml:mi>μ</mml:mi>
                                <mml:mo>^</mml:mo>
                              </mml:mover>
                            </mml:mrow>
                            <mml:mo stretchy="false">)</mml:mo>
                          </mml:mrow>
                          <mml:mn>2</mml:mn>
                        </mml:msup>
                        <mml:mo>+</mml:mo>
                        <mml:mrow>
                          <mml:mn>2</mml:mn>
                          <mml:mo>⁢</mml:mo>
                          <mml:msup>
                            <mml:mover accent="true">
                              <mml:mi>σ</mml:mi>
                              <mml:mo>^</mml:mo>
                            </mml:mover>
                            <mml:mn>2</mml:mn>
                          </mml:msup>
                        </mml:mrow>
                        <mml:mo>+</mml:mo>
                        <mml:mrow>
                          <mml:mn>2</mml:mn>
                          <mml:mo>⁢</mml:mo>
                          <mml:mi>λ</mml:mi>
                        </mml:mrow>
                      </mml:mrow>
                    </mml:mfrac>
                  </mml:mstyle>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </disp-formula-group>
        </p>
        <p>where</p>
        <p>
          <disp-formula id="S3.Ex2">
            <mml:math alttext="u_{t}=\frac{1}{M-1}\sum_{i=1}^{M-1}x_{i},~{}\sigma_{t}^{2}=\frac{1}{M-1}\sum_{%&#10;i=1}^{M-1}{(x_{i}-u_{t})}^{2}" display="block">
              <mml:mrow>
                <mml:mrow>
                  <mml:msub>
                    <mml:mi>u</mml:mi>
                    <mml:mi>t</mml:mi>
                  </mml:msub>
                  <mml:mo>=</mml:mo>
                  <mml:mrow>
                    <mml:mfrac>
                      <mml:mn>1</mml:mn>
                      <mml:mrow>
                        <mml:mi>M</mml:mi>
                        <mml:mo>−</mml:mo>
                        <mml:mn>1</mml:mn>
                      </mml:mrow>
                    </mml:mfrac>
                    <mml:mo>⁢</mml:mo>
                    <mml:mrow>
                      <mml:munderover>
                        <mml:mo movablelimits="false">∑</mml:mo>
                        <mml:mrow>
                          <mml:mi>i</mml:mi>
                          <mml:mo>=</mml:mo>
                          <mml:mn>1</mml:mn>
                        </mml:mrow>
                        <mml:mrow>
                          <mml:mi>M</mml:mi>
                          <mml:mo>−</mml:mo>
                          <mml:mn>1</mml:mn>
                        </mml:mrow>
                      </mml:munderover>
                      <mml:msub>
                        <mml:mi>x</mml:mi>
                        <mml:mi>i</mml:mi>
                      </mml:msub>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
                <mml:mo rspace="0.497em">,</mml:mo>
                <mml:mrow>
                  <mml:msubsup>
                    <mml:mi>σ</mml:mi>
                    <mml:mi>t</mml:mi>
                    <mml:mn>2</mml:mn>
                  </mml:msubsup>
                  <mml:mo>=</mml:mo>
                  <mml:mrow>
                    <mml:mfrac>
                      <mml:mn>1</mml:mn>
                      <mml:mrow>
                        <mml:mi>M</mml:mi>
                        <mml:mo>−</mml:mo>
                        <mml:mn>1</mml:mn>
                      </mml:mrow>
                    </mml:mfrac>
                    <mml:mo>⁢</mml:mo>
                    <mml:mrow>
                      <mml:munderover>
                        <mml:mo movablelimits="false" rspace="0em">∑</mml:mo>
                        <mml:mrow>
                          <mml:mi>i</mml:mi>
                          <mml:mo>=</mml:mo>
                          <mml:mn>1</mml:mn>
                        </mml:mrow>
                        <mml:mrow>
                          <mml:mi>M</mml:mi>
                          <mml:mo>−</mml:mo>
                          <mml:mn>1</mml:mn>
                        </mml:mrow>
                      </mml:munderover>
                      <mml:msup>
                        <mml:mrow>
                          <mml:mo stretchy="false">(</mml:mo>
                          <mml:mrow>
                            <mml:msub>
                              <mml:mi>x</mml:mi>
                              <mml:mi>i</mml:mi>
                            </mml:msub>
                            <mml:mo>−</mml:mo>
                            <mml:msub>
                              <mml:mi>u</mml:mi>
                              <mml:mi>t</mml:mi>
                            </mml:msub>
                          </mml:mrow>
                          <mml:mo stretchy="false">)</mml:mo>
                        </mml:mrow>
                        <mml:mn>2</mml:mn>
                      </mml:msup>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
              </mml:mrow>
            </mml:math>
          </disp-formula>
        </p>
        <p><inline-formula><mml:math alttext="u_{t}" display="inline"><mml:msub><mml:mi>u</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math alttext="\sigma_{t}^{2}" display="inline"><mml:msubsup><mml:mi>σ</mml:mi><mml:mi>t</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:math></inline-formula> are the mean and variance, respectively, of all neurons in the input feature channel except the target neuron <inline-formula><mml:math alttext="t" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>, and <inline-formula><mml:math alttext="x_{i}" display="inline"><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> denotes the activations of the other neurons within the same channel. The quantities <inline-formula><mml:math alttext="\hat{\mu}" display="inline"><mml:mover accent="true"><mml:mi>μ</mml:mi><mml:mo>^</mml:mo></mml:mover></mml:math></inline-formula> and <inline-formula><mml:math alttext="\hat{\sigma}^{2}" display="inline"><mml:msup><mml:mover accent="true"><mml:mi>σ</mml:mi><mml:mo>^</mml:mo></mml:mover><mml:mn>2</mml:mn></mml:msup></mml:math></inline-formula> in the energy expression are the corresponding estimates of the channel mean and variance. The parameter <inline-formula><mml:math alttext="\lambda" display="inline"><mml:mi>λ</mml:mi></mml:math></inline-formula> is the coefficient of the regularization term in the energy function.</p>
        <p id="S3.SS6.p3">Theoretically, each channel has <inline-formula><mml:math alttext="M=H\times W" display="inline"><mml:mrow><mml:mi>M</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mi>H</mml:mi><mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo><mml:mi>W</mml:mi></mml:mrow></mml:mrow></mml:math></inline-formula> energy values, one per neuron, where <inline-formula><mml:math alttext="H" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math alttext="W" display="inline"><mml:mi>W</mml:mi></mml:math></inline-formula> are the spatial dimensions of the feature map. The energy formula implies that the lower the energy of a neuron <inline-formula><mml:math alttext="t" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>, the more it differs from its neighboring neurons, i.e., the more linearly separable and therefore the more important it is. Feature refinement is performed using the scaling operator:</p>
        <p>
          <disp-formula-group id="S5.EGx10">
            <disp-formula id="S3.E10">
              <mml:math alttext="\displaystyle\tilde{X}=\mathrm{sigmoid}\left(\frac{1}{E}\right)\odot X" display="inline">
                <mml:mrow>
                  <mml:mover accent="true">
                    <mml:mi>X</mml:mi>
                    <mml:mo>~</mml:mo>
                  </mml:mover>
                  <mml:mo>=</mml:mo>
                  <mml:mrow>
                    <mml:mrow>
                      <mml:mi>sigmoid</mml:mi>
                      <mml:mo>⁢</mml:mo>
                      <mml:mrow>
                        <mml:mo>(</mml:mo>
                        <mml:mstyle displaystyle="true">
                          <mml:mfrac>
                            <mml:mn>1</mml:mn>
                            <mml:mi>E</mml:mi>
                          </mml:mfrac>
                        </mml:mstyle>
                        <mml:mo rspace="0.055em">)</mml:mo>
                      </mml:mrow>
                    </mml:mrow>
                    <mml:mo rspace="0.222em">⊙</mml:mo>
                    <mml:mi>X</mml:mi>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </disp-formula-group>
        </p>
        <p>where <inline-formula><mml:math alttext="X" display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula> is the input feature and <inline-formula><mml:math alttext="E" display="inline"><mml:mi>E</mml:mi></mml:math></inline-formula> groups all <inline-formula><mml:math alttext="e_{t}^{*}" display="inline"><mml:msubsup><mml:mi>e</mml:mi><mml:mi>t</mml:mi><mml:mo>∗</mml:mo></mml:msubsup></mml:math></inline-formula> across the channel and spatial dimensions. The sigmoid function is applied to the reciprocal of <inline-formula><mml:math alttext="E" display="inline"><mml:mi>E</mml:mi></mml:math></inline-formula> to bound the resulting weights, so that neurons with lower energy receive larger weights. The inclusion of the SCSAtt module enables the model to allocate attention to the most informative parts of the input, which can improve feature representation and reconstruction accuracy.</p>
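        <p>To make the energy-based weighting concrete, the following sketch applies it to a single feature channel in plain Python. It assumes that the mean and variance are estimated once over the whole channel, that the variance uses the 1/(M−1) normalization, and that the channel contains at least two neurons. The value of <monospace>lam</monospace> and the function name <monospace>simam_channel</monospace> are illustrative choices rather than settings reported in this paper.</p>
        <p>
          <preformat preformat-type="code"><![CDATA[
```python
import math

def simam_channel(x, lam=1e-4):
    """Reweight one (H, W) channel with SimAM-style attention.

    Assumptions: channel-wide mean and variance, 1/(M-1) variance
    normalisation, and an illustrative regularisation coefficient lam.
    """
    flat = [v for row in x for v in row]
    m = len(flat)
    mu = sum(flat) / m
    var = sum((v - mu) ** 2 for v in flat) / (m - 1)
    refined, energy = [], []
    for row in x:
        # Minimal energy of each neuron t in this row.
        e_row = [4 * (var + lam) / ((t - mu) ** 2 + 2 * var + 2 * lam) for t in row]
        energy.append(e_row)
        # Scale each activation by sigmoid(1 / e_t*).
        refined.append([t / (1 + math.exp(-1 / e)) for t, e in zip(row, e_row)])
    return refined, energy

# A flat 3 x 3 channel with one distinctive activation in the centre.
channel = [[1.0, 1.0, 1.0], [1.0, 5.0, 1.0], [1.0, 1.0, 1.0]]
refined, energy = simam_channel(channel)
```
]]></preformat>
        </p>
        <p>In this toy channel the central activation deviates most from the channel mean, so it obtains the lowest energy and consequently the largest attention weight.</p>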
      </sec>
      <sec id="S3.SS7">
        <label>3.7</label>
        <title>Feature Mask Generation Module (FMM)</title>
        <p id="S3.SS7.p1">In the framework of reverse knowledge distillation, the student network is tasked with acquiring feature representations from the teacher network. However, when the input contains a large amount of similar information, the student network struggles to identify the key information. Although the attention mechanism introduced above optimizes the learning process, the information learned remains unbalanced. To address this problem, this paper further introduces the FMM, as shown in Figure <xref ref-type="fig" rid="F6">6</xref>. The FMM randomly masks pixel features within the student network and then uses a generative module to recover them, which effectively simulates and synthesizes anomalies at the feature level [<xref rid="ref026" ref-type="bibr">26</xref>]. This improves the efficiency of local information utilization and enhances the model's sensitivity to anomaly information.</p>
        <p>
          <fig id="F6">
            <label>Figure 6.</label>
            <caption>
              <p>An illustration of FMM.</p>
            </caption>
            <graphic xlink:href="fig6.jpg"/>
          </fig>
        </p>
        <p id="S3.SS7.p2">In the FMM, random masking is applied over the entire spatial extent of the student network's output features to simulate feature-level anomalies:</p>
        <p>
          <disp-formula-group id="S5.EGx11">
            <disp-formula id="S3.E11">
              <mml:math alttext="\displaystyle M_{A}^{i}(h,w)=\begin{cases}0,&amp;R^{i}(h,w)&lt;\lambda_{M}\\&#10;1,&amp;\text{otherwise}\end{cases}" display="inline">
                <mml:mrow>
                  <mml:mrow>
                    <mml:msubsup>
                      <mml:mi>M</mml:mi>
                      <mml:mi>A</mml:mi>
                      <mml:mi>i</mml:mi>
                    </mml:msubsup>
                    <mml:mo>⁢</mml:mo>
                    <mml:mrow>
                      <mml:mo stretchy="false">(</mml:mo>
                      <mml:mi>h</mml:mi>
                      <mml:mo>,</mml:mo>
                      <mml:mi>w</mml:mi>
                      <mml:mo stretchy="false">)</mml:mo>
                    </mml:mrow>
                  </mml:mrow>
                  <mml:mo>=</mml:mo>
                  <mml:mrow>
                    <mml:mo>{</mml:mo>
                    <mml:mtable columnspacing="5pt" rowspacing="0pt">
                      <mml:mtr>
                        <mml:mtd class="ltx_align_left" columnalign="left">
                          <mml:mrow>
                            <mml:mn>0</mml:mn>
                            <mml:mo>,</mml:mo>
                          </mml:mrow>
                        </mml:mtd>
                        <mml:mtd class="ltx_align_left" columnalign="left">
                          <mml:mrow>
                            <mml:mrow>
                              <mml:msup>
                                <mml:mi>R</mml:mi>
                                <mml:mi>i</mml:mi>
                              </mml:msup>
                              <mml:mo>⁢</mml:mo>
                              <mml:mrow>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:mi>h</mml:mi>
                                <mml:mo>,</mml:mo>
                                <mml:mi>w</mml:mi>
                                <mml:mo stretchy="false">)</mml:mo>
                              </mml:mrow>
                            </mml:mrow>
                            <mml:mo>&lt;</mml:mo>
                            <mml:msub>
                              <mml:mi>λ</mml:mi>
                              <mml:mi>M</mml:mi>
                            </mml:msub>
                          </mml:mrow>
                        </mml:mtd>
                      </mml:mtr>
                      <mml:mtr>
                        <mml:mtd class="ltx_align_left" columnalign="left">
                          <mml:mrow>
                            <mml:mn>1</mml:mn>
                            <mml:mo>,</mml:mo>
                          </mml:mrow>
                        </mml:mtd>
                        <mml:mtd class="ltx_align_left" columnalign="left">
                          <mml:mtext>otherwise</mml:mtext>
                        </mml:mtd>
                      </mml:mtr>
                    </mml:mtable>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </disp-formula-group>
        </p>
        <p>where <inline-formula><mml:math alttext="h" display="inline"><mml:mi>h</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math alttext="w" display="inline"><mml:mi>w</mml:mi></mml:math></inline-formula> index the height and width of the feature map, respectively, and <inline-formula><mml:math alttext="R^{i}" display="inline"><mml:msup><mml:mi>R</mml:mi><mml:mi>i</mml:mi></mml:msup></mml:math></inline-formula> is a random number drawn from the interval (0, 1) at the feature map coordinate (<inline-formula><mml:math alttext="h,w" display="inline"><mml:mrow><mml:mi>h</mml:mi><mml:mo>,</mml:mo><mml:mi>w</mml:mi></mml:mrow></mml:math></inline-formula>). Moreover, <inline-formula><mml:math alttext="M_{A}^{i}" display="inline"><mml:msubsup><mml:mi>M</mml:mi><mml:mi>A</mml:mi><mml:mi>i</mml:mi></mml:msubsup></mml:math></inline-formula> is the <inline-formula><mml:math alttext="i" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>-th random mask, which is applied to the <inline-formula><mml:math alttext="i" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>-th feature of the student, and <inline-formula><mml:math alttext="\lambda_{M}" display="inline"><mml:msub><mml:mi>λ</mml:mi><mml:mi>M</mml:mi></mml:msub></mml:math></inline-formula> is the mask rate. The mask is then applied to the student's feature map, and the masked feature is used to reconstruct the teacher's feature map:</p>
        <p>
          <disp-formula-group id="S5.EGx12">
            <disp-formula id="S3.Ex3">
              <mml:math alttext="\displaystyle F_{G}^{i}=G(f_{\mathrm{align}}(F_{D}^{i})\cdot M_{A}^{i})" display="inline">
                <mml:mrow>
                  <mml:msubsup>
                    <mml:mi>F</mml:mi>
                    <mml:mi>G</mml:mi>
                    <mml:mi>i</mml:mi>
                  </mml:msubsup>
                  <mml:mo>=</mml:mo>
                  <mml:mrow>
                    <mml:mi>G</mml:mi>
                    <mml:mo>⁢</mml:mo>
                    <mml:mrow>
                      <mml:mo stretchy="false">(</mml:mo>
                      <mml:mrow>
                        <mml:mrow>
                          <mml:msub>
                            <mml:mi>f</mml:mi>
                            <mml:mi>align</mml:mi>
                          </mml:msub>
                          <mml:mo>⁢</mml:mo>
                          <mml:mrow>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:msubsup>
                              <mml:mi>F</mml:mi>
                              <mml:mi>D</mml:mi>
                              <mml:mi>i</mml:mi>
                            </mml:msubsup>
                            <mml:mo rspace="0.055em" stretchy="false">)</mml:mo>
                          </mml:mrow>
                        </mml:mrow>
                        <mml:mo rspace="0.222em">⋅</mml:mo>
                        <mml:msubsup>
                          <mml:mi>M</mml:mi>
                          <mml:mi>A</mml:mi>
                          <mml:mi>i</mml:mi>
                        </mml:msubsup>
                      </mml:mrow>
                      <mml:mo stretchy="false">)</mml:mo>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
            <disp-formula id="S3.Ex4">
              <mml:math alttext="\displaystyle=W_{l2}(\mathrm{ReLU}(W_{l1}(f_{\mathrm{align}}(F_{D}^{i})\cdot M%&#10;_{A}^{i})))" display="inline">
                <mml:mrow>
                  <mml:mi/>
                  <mml:mo>=</mml:mo>
                  <mml:mrow>
                    <mml:msub>
                      <mml:mi>W</mml:mi>
                      <mml:mrow>
                        <mml:mi>l</mml:mi>
                        <mml:mo>⁢</mml:mo>
                        <mml:mn>2</mml:mn>
                      </mml:mrow>
                    </mml:msub>
                    <mml:mo>⁢</mml:mo>
                    <mml:mrow>
                      <mml:mo stretchy="false">(</mml:mo>
                      <mml:mrow>
                        <mml:mi>ReLU</mml:mi>
                        <mml:mo>⁢</mml:mo>
                        <mml:mrow>
                          <mml:mo stretchy="false">(</mml:mo>
                          <mml:mrow>
                            <mml:msub>
                              <mml:mi>W</mml:mi>
                              <mml:mrow>
                                <mml:mi>l</mml:mi>
                                <mml:mo>⁢</mml:mo>
                                <mml:mn>1</mml:mn>
                              </mml:mrow>
                            </mml:msub>
                            <mml:mo>⁢</mml:mo>
                            <mml:mrow>
                              <mml:mo stretchy="false">(</mml:mo>
                              <mml:mrow>
                                <mml:mrow>
                                  <mml:msub>
                                    <mml:mi>f</mml:mi>
                                    <mml:mi>align</mml:mi>
                                  </mml:msub>
                                  <mml:mo>⁢</mml:mo>
                                  <mml:mrow>
                                    <mml:mo stretchy="false">(</mml:mo>
                                    <mml:msubsup>
                                      <mml:mi>F</mml:mi>
                                      <mml:mi>D</mml:mi>
                                      <mml:mi>i</mml:mi>
                                    </mml:msubsup>
                                    <mml:mo rspace="0.055em" stretchy="false">)</mml:mo>
                                  </mml:mrow>
                                </mml:mrow>
                                <mml:mo rspace="0.222em">⋅</mml:mo>
                                <mml:msubsup>
                                  <mml:mi>M</mml:mi>
                                  <mml:mi>A</mml:mi>
                                  <mml:mi>i</mml:mi>
                                </mml:msubsup>
                              </mml:mrow>
                              <mml:mo stretchy="false">)</mml:mo>
                            </mml:mrow>
                          </mml:mrow>
                          <mml:mo stretchy="false">)</mml:mo>
                        </mml:mrow>
                      </mml:mrow>
                      <mml:mo stretchy="false">)</mml:mo>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </disp-formula-group>
        </p>
        <p>where <inline-formula><mml:math alttext="F_{D}^{i}" display="inline"><mml:msubsup><mml:mi>F</mml:mi><mml:mi>D</mml:mi><mml:mi>i</mml:mi></mml:msubsup></mml:math></inline-formula> is the student network's feature representation and <inline-formula><mml:math alttext="F_{G}^{i}" display="inline"><mml:msubsup><mml:mi>F</mml:mi><mml:mi>G</mml:mi><mml:mi>i</mml:mi></mml:msubsup></mml:math></inline-formula> is the feature recovered by the generative module <inline-formula><mml:math alttext="G" display="inline"><mml:mi>G</mml:mi></mml:math></inline-formula>. The generative module <inline-formula><mml:math alttext="G" display="inline"><mml:mi>G</mml:mi></mml:math></inline-formula> consists of two convolutional layers, <inline-formula><mml:math alttext="W_{l1}" display="inline"><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mo>⁢</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math alttext="W_{l2}" display="inline"><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mo>⁢</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>, with a ReLU activation between them.
This study uses a <inline-formula><mml:math alttext="1\times 1" display="inline"><mml:mrow><mml:mn>1</mml:mn><mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula> convolutional layer as the adaptation layer, denoted as <inline-formula><mml:math alttext="f_{\mathrm{align}}" display="inline"><mml:msub><mml:mi>f</mml:mi><mml:mi>align</mml:mi></mml:msub></mml:math></inline-formula>, and <inline-formula><mml:math alttext="3\times 3" display="inline"><mml:mrow><mml:mn>3</mml:mn><mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo><mml:mn>3</mml:mn></mml:mrow></mml:math></inline-formula> convolutional layers as the projection layers <inline-formula><mml:math alttext="W_{l1}" display="inline"><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mo>⁢</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math alttext="W_{l2}" display="inline"><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mo>⁢</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>. In models based on convolutional neural networks, deeper features usually have a larger receptive field and can therefore represent the original input image more comprehensively. Consequently, even if some pixels are masked, the complete feature map can be recovered from the remaining pixels. The primary goal of the FMM is to help the student network learn a better representation by generating the teacher network's features from its masked features, thereby improving its performance in anomaly detection tasks.</p>
      </sec>
      <sec id="S3.SS8">
        <label>3.8</label>
        <title>Loss Function</title>
        <p id="S3.SS8.p1">For the distillation loss, we draw on the loss functions of two methods, MKD [<xref rid="ref019" ref-type="bibr">19</xref>] and RD [<xref rid="ref020" ref-type="bibr">20</xref>]. MKD combines the Euclidean distance and cosine similarity in its loss function, exploiting the strengths of cosine similarity in traditional knowledge distillation (KD) architectures. RD goes a step further and uses cosine similarity alone. The empirical results reported for RD suggest that cosine similarity by itself can capture the correlations between low-dimensional and high-dimensional feature spaces in reverse knowledge distillation architectures. Building on these insights, this paper adopts cosine similarity as the sole KD loss function for the T-S model.</p>
        <p id="S3.SS8.p2">For a given image <inline-formula><mml:math alttext="x_{n}" display="inline"><mml:msub><mml:mi>x</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:math></inline-formula>, multi-layer intermediate features <inline-formula><mml:math alttext="F_{E}^{i}" display="inline"><mml:msubsup><mml:mi>F</mml:mi><mml:mi>E</mml:mi><mml:mi>i</mml:mi></mml:msubsup></mml:math></inline-formula> are first extracted from the first three residual stages of the pre-trained WideResNet50 teacher network. Then, <inline-formula><mml:math alttext="F_{E}^{i}" display="inline"><mml:msubsup><mml:mi>F</mml:mi><mml:mi>E</mml:mi><mml:mi>i</mml:mi></mml:msubsup></mml:math></inline-formula> is encoded into compact features <inline-formula><mml:math alttext="F_{B}" display="inline"><mml:msub><mml:mi>F</mml:mi><mml:mi>B</mml:mi></mml:msub></mml:math></inline-formula> via the bottleneck module. Next, the student network generates the corresponding feature maps <inline-formula><mml:math alttext="F_{D}^{i}" display="inline"><mml:msubsup><mml:mi>F</mml:mi><mml:mi>D</mml:mi><mml:mi>i</mml:mi></mml:msubsup></mml:math></inline-formula> from <inline-formula><mml:math alttext="F_{B}" display="inline"><mml:msub><mml:mi>F</mml:mi><mml:mi>B</mml:mi></mml:msub></mml:math></inline-formula>, and the mask generation module then outputs the features <inline-formula><mml:math alttext="F_{G}^{i}" display="inline"><mml:msubsup><mml:mi>F</mml:mi><mml:mi>G</mml:mi><mml:mi>i</mml:mi></mml:msubsup></mml:math></inline-formula>. To quantify the similarity between the teacher and student features, we use the cosine distance between feature vectors as the training loss and generate a two-dimensional anomaly score map <inline-formula><mml:math alttext="M^{i}" display="inline"><mml:msup><mml:mi>M</mml:mi><mml:mi>i</mml:mi></mml:msup></mml:math></inline-formula> at each scale:</p>
        <p>
          <disp-formula-group id="S5.EGx13">
            <disp-formula id="S3.E12">
              <mml:math alttext="\displaystyle M^{i}(h,w)=1-\frac{(F_{E}^{i}(h,w))^{\rm T}\cdot F_{G}^{i}(h,w)}{\|F_{E}^{i}(h,w)\|\|F_{G}^{i}(h,w)\|}" display="inline">
                <mml:mrow>
                  <mml:mrow>
                    <mml:msup>
                      <mml:mi>M</mml:mi>
                      <mml:mi>i</mml:mi>
                    </mml:msup>
                    <mml:mo>⁢</mml:mo>
                    <mml:mrow>
                      <mml:mo stretchy="false">(</mml:mo>
                      <mml:mi>h</mml:mi>
                      <mml:mo>,</mml:mo>
                      <mml:mi>w</mml:mi>
                      <mml:mo stretchy="false">)</mml:mo>
                    </mml:mrow>
                  </mml:mrow>
                  <mml:mo>=</mml:mo>
                  <mml:mrow>
                    <mml:mn>1</mml:mn>
                    <mml:mo>−</mml:mo>
                    <mml:mstyle displaystyle="true">
                      <mml:mfrac>
                        <mml:mrow>
                          <mml:mrow>
                            <mml:msup>
                              <mml:mrow>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:mrow>
                                  <mml:msubsup>
                                    <mml:mi>F</mml:mi>
                                    <mml:mi>E</mml:mi>
                                    <mml:mi>i</mml:mi>
                                  </mml:msubsup>
                                  <mml:mo>⁢</mml:mo>
                                  <mml:mrow>
                                    <mml:mo stretchy="false">(</mml:mo>
                                    <mml:mi>h</mml:mi>
                                    <mml:mo>,</mml:mo>
                                    <mml:mi>w</mml:mi>
                                    <mml:mo stretchy="false">)</mml:mo>
                                  </mml:mrow>
                                </mml:mrow>
                                <mml:mo stretchy="false">)</mml:mo>
                              </mml:mrow>
                              <mml:mi mathvariant="normal">T</mml:mi>
                            </mml:msup>
                            <mml:mo lspace="0.222em" rspace="0.222em">⋅</mml:mo>
                            <mml:msubsup>
                              <mml:mi>F</mml:mi>
                              <mml:mi>G</mml:mi>
                              <mml:mi>i</mml:mi>
                            </mml:msubsup>
                          </mml:mrow>
                          <mml:mo>⁢</mml:mo>
                          <mml:mrow>
                            <mml:mo stretchy="false">(</mml:mo>
                            <mml:mi>h</mml:mi>
                            <mml:mo>,</mml:mo>
                            <mml:mi>w</mml:mi>
                            <mml:mo stretchy="false">)</mml:mo>
                          </mml:mrow>
                        </mml:mrow>
                        <mml:mrow>
                          <mml:mrow>
                            <mml:mo stretchy="false">‖</mml:mo>
                            <mml:mrow>
                              <mml:msubsup>
                                <mml:mi>F</mml:mi>
                                <mml:mi>E</mml:mi>
                                <mml:mi>i</mml:mi>
                              </mml:msubsup>
                              <mml:mo>⁢</mml:mo>
                              <mml:mrow>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:mi>h</mml:mi>
                                <mml:mo>,</mml:mo>
                                <mml:mi>w</mml:mi>
                                <mml:mo stretchy="false">)</mml:mo>
                              </mml:mrow>
                            </mml:mrow>
                            <mml:mo stretchy="false">‖</mml:mo>
                          </mml:mrow>
                          <mml:mo>⁢</mml:mo>
                          <mml:mrow>
                            <mml:mo stretchy="false">‖</mml:mo>
                            <mml:mrow>
                              <mml:msubsup>
                                <mml:mi>F</mml:mi>
                                <mml:mi>G</mml:mi>
                                <mml:mi>i</mml:mi>
                              </mml:msubsup>
                              <mml:mo>⁢</mml:mo>
                              <mml:mrow>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:mi>h</mml:mi>
                                <mml:mo>,</mml:mo>
                                <mml:mi>w</mml:mi>
                                <mml:mo stretchy="false">)</mml:mo>
                              </mml:mrow>
                            </mml:mrow>
                            <mml:mo stretchy="false">‖</mml:mo>
                          </mml:mrow>
                        </mml:mrow>
                      </mml:mfrac>
                    </mml:mstyle>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </disp-formula-group>
        </p>
        <p>where <inline-formula><mml:math alttext="h" display="inline"><mml:mi>h</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math alttext="w" display="inline"><mml:mi>w</mml:mi></mml:math></inline-formula> denote the row and column of a position on the feature map. <inline-formula><mml:math alttext="F_{E}^{i}(h,w)" display="inline"><mml:mrow><mml:msubsup><mml:mi>F</mml:mi><mml:mi>E</mml:mi><mml:mi>i</mml:mi></mml:msubsup><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>h</mml:mi><mml:mo>,</mml:mo><mml:mi>w</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> is the feature vector of the <inline-formula><mml:math alttext="i" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>-th layer of the teacher network at that position, and <inline-formula><mml:math alttext="F_{G}^{i}(h,w)" display="inline"><mml:mrow><mml:msubsup><mml:mi>F</mml:mi><mml:mi>G</mml:mi><mml:mi>i</mml:mi></mml:msubsup><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>h</mml:mi><mml:mo>,</mml:mo><mml:mi>w</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> is the feature vector produced by the FMM at the same position. 
Because the feature maps differ in size across levels, each 2D anomaly map <inline-formula><mml:math alttext="M^{i}(h,w)" display="inline"><mml:mrow><mml:msup><mml:mi>M</mml:mi><mml:mi>i</mml:mi></mml:msup><mml:mo>⁢</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>h</mml:mi><mml:mo>,</mml:mo><mml:mi>w</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> is averaged over its spatial positions to give a scalar, and the scalars are summed over the levels to obtain the loss function <inline-formula><mml:math alttext="L_{kd}" display="inline"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>⁢</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>:</p>
        <p>
          <disp-formula-group id="S5.EGx14">
            <disp-formula id="S3.E13">
              <mml:math alttext="\displaystyle L_{kd}=\sum_{i=1}^{3}\left\{\frac{1}{H^{i}W^{i}}\sum_{h,w=1}^{H^{i}W^{i}}M^{i}(h,w)\right\}" display="inline">
                <mml:mrow>
                  <mml:msub>
                    <mml:mi>L</mml:mi>
                    <mml:mrow>
                      <mml:mi>k</mml:mi>
                      <mml:mo>⁢</mml:mo>
                      <mml:mi>d</mml:mi>
                    </mml:mrow>
                  </mml:msub>
                  <mml:mo>=</mml:mo>
                  <mml:mrow>
                    <mml:mstyle displaystyle="true">
                      <mml:munderover>
                        <mml:mo movablelimits="false">∑</mml:mo>
                        <mml:mrow>
                          <mml:mi>i</mml:mi>
                          <mml:mo>=</mml:mo>
                          <mml:mn>1</mml:mn>
                        </mml:mrow>
                        <mml:mn>3</mml:mn>
                      </mml:munderover>
                    </mml:mstyle>
                    <mml:mrow>
                      <mml:mo>{</mml:mo>
                      <mml:mrow>
                        <mml:mstyle displaystyle="true">
                          <mml:mfrac>
                            <mml:mn>1</mml:mn>
                            <mml:mrow>
                              <mml:msup>
                                <mml:mi>H</mml:mi>
                                <mml:mi>i</mml:mi>
                              </mml:msup>
                              <mml:mo>⁢</mml:mo>
                              <mml:msup>
                                <mml:mi>W</mml:mi>
                                <mml:mi>i</mml:mi>
                              </mml:msup>
                            </mml:mrow>
                          </mml:mfrac>
                        </mml:mstyle>
                        <mml:mo>⁢</mml:mo>
                        <mml:mrow>
                          <mml:mstyle displaystyle="true">
                            <mml:munderover>
                              <mml:mo movablelimits="false">∑</mml:mo>
                              <mml:mrow>
                                <mml:mrow>
                                  <mml:mi>h</mml:mi>
                                  <mml:mo>,</mml:mo>
                                  <mml:mi>w</mml:mi>
                                </mml:mrow>
                                <mml:mo>=</mml:mo>
                                <mml:mn>1</mml:mn>
                              </mml:mrow>
                              <mml:mrow>
                                <mml:msup>
                                  <mml:mi>H</mml:mi>
                                  <mml:mi>i</mml:mi>
                                </mml:msup>
                                <mml:mo>⁢</mml:mo>
                                <mml:msup>
                                  <mml:mi>W</mml:mi>
                                  <mml:mi>i</mml:mi>
                                </mml:msup>
                              </mml:mrow>
                            </mml:munderover>
                          </mml:mstyle>
                          <mml:mrow>
                            <mml:msup>
                              <mml:mi>M</mml:mi>
                              <mml:mi>i</mml:mi>
                            </mml:msup>
                            <mml:mo>⁢</mml:mo>
                            <mml:mrow>
                              <mml:mo stretchy="false">(</mml:mo>
                              <mml:mi>h</mml:mi>
                              <mml:mo>,</mml:mo>
                              <mml:mi>w</mml:mi>
                              <mml:mo stretchy="false">)</mml:mo>
                            </mml:mrow>
                          </mml:mrow>
                        </mml:mrow>
                      </mml:mrow>
                      <mml:mo>}</mml:mo>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </disp-formula-group>
        </p>
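        <p>As a minimal sketch of the two equations above, the following pure-Python snippet computes the cosine-distance map at every position and sums the per-scale spatial means into a scalar loss. It is an illustration, not the authors' implementation: the function names <monospace>cosine_distance_map</monospace> and <monospace>kd_loss</monospace>, the nested-list feature layout, and the <monospace>eps</monospace> guard against zero norms are assumptions made here.</p>
        <p>
          <code language="python"><![CDATA[
```python
import math


def cosine_distance_map(f_e, f_g, eps=1e-8):
    """Return M(h, w) = 1 - cos(f_e[h][w], f_g[h][w]) for every position.

    f_e and f_g are H x W grids whose entries are channel vectors
    (lists of floats) from the teacher and from the generation module.
    """
    height, width = len(f_e), len(f_e[0])
    m = [[0.0] * width for _ in range(height)]
    for h in range(height):
        for w in range(width):
            u, v = f_e[h][w], f_g[h][w]
            dot = sum(a * b for a, b in zip(u, v))
            norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
            # eps avoids division by zero; it is an assumption, not part of the paper.
            m[h][w] = 1.0 - dot / max(norm, eps)
    return m


def kd_loss(teacher_feats, recovered_feats):
    """Sum over scales of the spatial mean of each cosine-distance map."""
    total = 0.0
    for f_e, f_g in zip(teacher_feats, recovered_feats):
        m = cosine_distance_map(f_e, f_g)
        positions = len(m) * len(m[0])
        total += sum(sum(row) for row in m) / positions
    return total


# Toy input: two scales with 2-channel feature vectors.
scale1_teacher = [[[1.0, 0.0], [0.0, 2.0]]]    # 1 x 2 map
scale1_recovered = [[[3.0, 0.0], [1.0, 0.0]]]  # same direction, then orthogonal
scale2_teacher = [[[1.0, 1.0]]]                # 1 x 1 map
scale2_recovered = [[[-1.0, -1.0]]]            # opposite direction

loss = kd_loss(
    [scale1_teacher, scale2_teacher],
    [scale1_recovered, scale2_recovered],
)
```
]]></code>
        </p>
        <p>In the toy input, the first scale contributes 0.5 (one aligned and one orthogonal vector pair) and the second contributes 2.0 (opposite vectors), so the loss is about 2.5. A practical implementation would typically operate on batched tensors instead of nested lists.</p>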
        <p id="S3.SS8.p3">Because deeper layers lose more local information, this paper uses the first three residual stages of WideResNet50, which together carry low-dimensional structural information and high-dimensional semantic information.</p>
      </sec>
    </sec>
    <sec id="S4">
      <label>4.</label>
      <title>Experiments and Discussions</title>
      <sec id="S4.SS1">
        <label>4.1</label>
        <title>Dataset and Assessment Metrics</title>
        <p id="S4.SS1.p1">The experiments were conducted mainly on the MVTec AD dataset [<xref rid="ref027" ref-type="bibr">27</xref>], which contains 5354 images of industrial products across 5 texture categories and 10 object categories. The anomalous regions are annotated with pixel-level labels covering 70 different types of defects. The training set consists solely of normal samples, whereas the test set contains both normal and abnormal samples.</p>
        <p id="S4.SS1.p2">To thoroughly assess the model's effectiveness, we utilize both image-level and pixel-level metrics based on AUROC. The image-level AUROC represents a key indicator of the model's general anomaly detection proficiency, whereas the pixel-level AUROC emphasizes the model's accuracy in pinpointing anomalies. Furthermore, PRO [<xref rid="ref008" ref-type="bibr">8</xref>] acts as an additional metric, offering a more detailed evaluation of the model's anomaly localization capabilities.</p>
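        <p>The difference between the image-level and pixel-level AUROC can be illustrated with a small sketch. The snippet below computes AUROC through its rank interpretation, that is, the probability that a randomly chosen anomalous item scores higher than a randomly chosen normal one, with ties counted as one half. The toy anomaly maps, the helper names <monospace>auroc</monospace> and <monospace>flatten</monospace>, and the use of the maximum map value as the image score are assumptions for illustration; the paper does not state how image scores are derived.</p>
        <p>
          <code language="python"><![CDATA[
```python
def auroc(scores, labels):
    """AUROC via pairwise comparison of anomalous and normal scores.

    scores: anomaly scores, higher means more anomalous.
    labels: 1 for anomalous, 0 for normal.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one normal and one anomalous item")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def flatten(grid):
    """Flatten a 2D grid (list of rows) into a flat list."""
    return [v for row in grid for v in row]


# Hypothetical 2 x 2 anomaly maps and ground-truth masks for three test images.
anomaly_maps = [
    [[0.1, 0.2], [0.1, 0.1]],   # normal image
    [[0.1, 0.9], [0.2, 0.1]],   # defective image, defect at the top-right pixel
    [[0.95, 0.1], [0.1, 0.2]],  # normal image with one false alarm
]
gt_masks = [
    [[0, 0], [0, 0]],
    [[0, 1], [0, 0]],
    [[0, 0], [0, 0]],
]

# Image level: one score and one label per image (max of the map is an assumption).
image_scores = [max(flatten(m)) for m in anomaly_maps]
image_labels = [int(any(flatten(g))) for g in gt_masks]
image_auroc = auroc(image_scores, image_labels)

# Pixel level: every pixel of every image is one item.
pixel_scores = [v for m in anomaly_maps for v in flatten(m)]
pixel_labels = [v for g in gt_masks for v in flatten(g)]
pixel_auroc = auroc(pixel_scores, pixel_labels)
```
]]></code>
        </p>
        <p>Here the single false alarm drops the image-level AUROC to 0.5, while the pixel-level AUROC stays at 10/11 because only one of the eleven normal pixels outranks the defective pixel. The pairwise loop is quadratic and is meant only to make the definition explicit.</p>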
      </sec>
      <sec id="S4.SS2">
        <label>4.2</label>
        <title>Experimental Parameterization</title>
        <p id="S4.SS2.p1">The experiments were run on Ubuntu 20.04 with an RTX 3090 GPU, PyTorch 1.10.0, and CUDA 11.3. Raw images were preprocessed and uniformly resized to <inline-formula><mml:math alttext="256\times 256" display="inline"><mml:mrow><mml:mn>256</mml:mn><mml:mo lspace="0.222em" rspace="0.222em">×</mml:mo><mml:mn>256</mml:mn></mml:mrow></mml:math></inline-formula> pixels. WideResNet50 serves as the backbone network.</p>
        <p id="S4.SS2.p2">To improve the model's generalization and prevent overfitting, we combined several regularization strategies. The Adam optimizer was used with an initial learning rate of 1e-3 and a batch size of 16. Early stopping halted training if the validation loss stagnated for a set number of epochs. In addition, an L2 regularization term was added to the loss:</p>
        <p>
          <disp-formula-group id="S5.EGx15">
            <disp-formula id="S4.E1">
              <mml:math alttext="\displaystyle L=L_{kd}+\lambda_{R}\sum_{i=1}^{n}w_{i}^{2}" display="inline">
                <mml:mrow>
                  <mml:mi>L</mml:mi>
                  <mml:mo>=</mml:mo>
                  <mml:mrow>
                    <mml:msub>
                      <mml:mi>L</mml:mi>
                      <mml:mrow>
                        <mml:mi>k</mml:mi>
                        <mml:mo>⁢</mml:mo>
                        <mml:mi>d</mml:mi>
                      </mml:mrow>
                    </mml:msub>
                    <mml:mo>+</mml:mo>
                    <mml:mrow>
                      <mml:msub>
                        <mml:mi>λ</mml:mi>
                        <mml:mi>R</mml:mi>
                      </mml:msub>
                      <mml:mo>⁢</mml:mo>
                      <mml:mrow>
                        <mml:mstyle displaystyle="true">
                          <mml:munderover>
                            <mml:mo movablelimits="false">∑</mml:mo>
                            <mml:mrow>
                              <mml:mi>i</mml:mi>
                              <mml:mo>=</mml:mo>
                              <mml:mn>1</mml:mn>
                            </mml:mrow>
                            <mml:mi>n</mml:mi>
                          </mml:munderover>
                        </mml:mstyle>
                        <mml:msubsup>
                          <mml:mi>w</mml:mi>
                          <mml:mi>i</mml:mi>
                          <mml:mn>2</mml:mn>
                        </mml:msubsup>
                      </mml:mrow>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
              </mml:math>
            </disp-formula>
          </disp-formula-group>
        </p>
        <p>where <inline-formula><mml:math alttext="w_{i}" display="inline"><mml:msub><mml:mi>w</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> is the weight of the <inline-formula><mml:math alttext="i" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>-th parameter of the model, <inline-formula><mml:math alttext="n" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> is the total number of parameters, and <inline-formula><mml:math alttext="\lambda_{R}" display="inline"><mml:msub><mml:mi>λ</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:math></inline-formula> is the regularization coefficient. The model was trained for 200 epochs.</p>
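        <p>The regularized objective and the early-stopping rule described above can be sketched as follows. This is a hedged illustration only: the paper does not report the regularization coefficient, the patience, or any improvement threshold, so <monospace>lambda_r</monospace>, <monospace>patience</monospace>, <monospace>min_delta</monospace>, the class name <monospace>EarlyStopping</monospace>, and the toy loss values are all hypothetical.</p>
        <p>
          <code language="python"><![CDATA[
```python
def total_loss(kd_loss, weights, lambda_r):
    """L = L_kd + lambda_R * sum_i w_i^2 over all model weights."""
    return kd_loss + lambda_r * sum(w * w for w in weights)


class EarlyStopping:
    """Signal a stop once the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience, min_delta=0.0):
        self.patience = patience      # hypothetical value, not given in the paper
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch and return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Regularized loss for a toy model with three weights.
loss = total_loss(kd_loss=0.5, weights=[1.0, -2.0, 0.5], lambda_r=0.01)

# Early stopping on a made-up validation curve.
stopper = EarlyStopping(patience=2)
val_losses = [1.0, 0.8, 0.81, 0.79, 0.80, 0.795]
stopped_at = None
for epoch, val_loss in enumerate(val_losses):
    if stopper.step(val_loss):
        stopped_at = epoch
        break
```
]]></code>
        </p>
        <p>With these toy values the penalty adds 0.01 × 5.25 to the distillation loss, and training stops at epoch index 5, two epochs after the best validation loss of 0.79. In PyTorch a comparable penalty is often applied through the optimizer's <monospace>weight_decay</monospace> argument, which adds <monospace>weight_decay * w</monospace> to each gradient and therefore corresponds to a coefficient of <monospace>weight_decay / 2</monospace> in the equation above; whether the authors used this route is not stated.</p>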
      </sec>
      <sec id="S4.SS3">
        <label>4.3</label>
        <title>Comparison Experiments</title>
        <p id="S4.SS3.p1">We compared the proposed method with several mainstream anomaly detection algorithms on the MVTec AD dataset. The baselines are SSIM and AE-L2 [<xref rid="ref028" ref-type="bibr">28</xref>], AnoGAN [<xref rid="ref016" ref-type="bibr">16</xref>], MKD [<xref rid="ref019" ref-type="bibr">19</xref>], SPADE [<xref rid="ref029" ref-type="bibr">29</xref>], top-K-M [<xref rid="ref041" ref-type="bibr">41</xref>], Patch-SVDD [<xref rid="ref030" ref-type="bibr">30</xref>], PaDiM [<xref rid="ref031" ref-type="bibr">31</xref>], STPM [<xref rid="ref032" ref-type="bibr">32</xref>], CutPaste [<xref rid="ref033" ref-type="bibr">33</xref>], DRAEM [<xref rid="ref034" ref-type="bibr">34</xref>], RD [<xref rid="ref020" ref-type="bibr">20</xref>], RecDMs [<xref rid="ref039" ref-type="bibr">39</xref>], and CA-AE [<xref rid="ref040" ref-type="bibr">40</xref>].</p>
        <p>
          <table-wrap id="T1">
            <label>Table 1</label>
            <caption>
              <p>Comparison of anomaly detection results on the MVTec AD dataset, AUROC (%).</p>
            </caption>
            <table>
              <tbody>
                <tr>
                  <td style="border-top: 1px solid black;" colspan="2" align="center">Category/Method</td>
                  <td style="border-top: 1px solid black;" align="center">SSIM-AE</td>
                  <td style="border-top: 1px solid black;" align="center">AnoGAN</td>
                  <td style="border-top: 1px solid black;" align="center">MKD</td>
                  <td style="border-top: 1px solid black;" align="center">SPADE</td>
                  <td style="border-top: 1px solid black;" align="center">top-K-M</td>
                  <td style="border-top: 1px solid black;" align="center">Patch-SVDD</td>
                  <td style="border-top: 1px solid black;" align="center">PaDiM</td>
                  <td style="border-top: 1px solid black;" align="center">STPM</td>
                  <td style="border-top: 1px solid black;" align="center">CutPaste</td>
                  <td style="border-top: 1px solid black;" align="center">DRAEM</td>
                  <td style="border-top: 1px solid black;" align="center">RD</td>
                  <td style="border-top: 1px solid black;" align="center">RecDMs</td>
                  <td style="border-top: 1px solid black;" align="center">CA-AE</td>
                  <td style="border-top: 1px solid black;" align="center">Ours</td>
                </tr>
                <tr>
                  <td style="border-top: 1px solid black;" rowspan="5" align="center">Textures</td>
                  <td style="border-top: 1px solid black;" align="center">Carpet</td>
                  <td style="border-top: 1px solid black;" align="center">67</td>
                  <td style="border-top: 1px solid black;" align="center">49</td>
                  <td style="border-top: 1px solid black;" align="center">79.3</td>
                  <td style="border-top: 1px solid black;" align="center">92.8</td>
                  <td style="border-top: 1px solid black;" align="center">89.4</td>
                  <td style="border-top: 1px solid black;" align="center">92.9</td>
                  <td style="border-top: 1px solid black;" align="center">99.8</td>
                  <td style="border-top: 1px solid black;" align="center">–</td>
                  <td style="border-top: 1px solid black;" align="center">93.9</td>
                  <td style="border-top: 1px solid black;" align="center">97</td>
                  <td style="border-top: 1px solid black;" align="center">98.9</td>
                  <td style="border-top: 1px solid black;" align="center">94.8</td>
                  <td style="border-top: 1px solid black;" align="center">85</td>
                  <td style="border-top: 1px solid black;" align="center">99.8</td>
                </tr>
                <tr>
                  <td align="center">Grid</td>
                  <td align="center">69</td>
                  <td align="center">51</td>
                  <td align="center">78</td>
                  <td align="center">47.3</td>
                  <td align="center">96.8</td>
                  <td align="center">94.6</td>
                  <td align="center">96.7</td>
                  <td align="center">–</td>
                  <td align="center">100</td>
                  <td align="center">99.9</td>
                  <td align="center">100</td>
                  <td align="center">99.5</td>
                  <td align="center">89.6</td>
                  <td align="center">100</td>
                </tr>
                <tr>
                  <td align="center">Leather</td>
                  <td align="center">46</td>
                  <td align="center">52</td>
                  <td align="center">95.1</td>
                  <td align="center">95.4</td>
                  <td align="center">88.7</td>
                  <td align="center">90.9</td>
                  <td align="center">100</td>
                  <td align="center">–</td>
                  <td align="center">100</td>
                  <td align="center">100</td>
                  <td align="center">100</td>
                  <td align="center">100</td>
                  <td align="center">92</td>
                  <td align="center">100</td>
                </tr>
                <tr>
                  <td align="center">Tile</td>
                  <td align="center">52</td>
                  <td align="center">51</td>
                  <td align="center">91.6</td>
                  <td align="center">96.5</td>
                  <td align="center">97.8</td>
                  <td align="center">97.8</td>
                  <td align="center">98.1</td>
                  <td align="center">–</td>
                  <td align="center">94.6</td>
                  <td align="center">99.6</td>
                  <td align="center">99.3</td>
                  <td align="center">99.8</td>
                  <td align="center">92.8</td>
                  <td align="center">99.8</td>
                </tr>
                <tr>
                  <td align="center">Wood</td>
                  <td align="center">83</td>
                  <td align="center">68</td>
                  <td align="center">94.3</td>
                  <td align="center">95.8</td>
                  <td align="center">92.6</td>
                  <td align="center">96.5</td>
                  <td align="center">99.2</td>
                  <td align="center">–</td>
                  <td align="center">99.1</td>
                  <td align="center">99.1</td>
                  <td align="center">99.2</td>
                  <td align="center">99.6</td>
                  <td align="center">95.3</td>
                  <td align="center">99.3</td>
                </tr>
                <tr>
                  <td style="border-top: 1px solid black;" rowspan="10" align="center">Objects</td>
                  <td style="border-top: 1px solid black;" align="center">Bottle</td>
                  <td style="border-top: 1px solid black;" align="center">88</td>
                  <td style="border-top: 1px solid black;" align="center">69</td>
                  <td style="border-top: 1px solid black;" align="center">99.4</td>
                  <td style="border-top: 1px solid black;" align="center">97.2</td>
                  <td style="border-top: 1px solid black;" align="center">95.7</td>
                  <td style="border-top: 1px solid black;" align="center">98.6</td>
                  <td style="border-top: 1px solid black;" align="center">99.9</td>
                  <td style="border-top: 1px solid black;" align="center">–</td>
                  <td style="border-top: 1px solid black;" align="center">98.2</td>
                  <td style="border-top: 1px solid black;" align="center">99.2</td>
                  <td style="border-top: 1px solid black;" align="center">100</td>
                  <td style="border-top: 1px solid black;" align="center">98.8</td>
                  <td style="border-top: 1px solid black;" align="center">94</td>
                  <td style="border-top: 1px solid black;" align="center">100</td>
                </tr>
                <tr>
                  <td align="center">Cable</td>
                  <td align="center">61</td>
                  <td align="center">53</td>
                  <td align="center">89.2</td>
                  <td align="center">84.8</td>
                  <td align="center">60.8</td>
                  <td align="center">90.3</td>
                  <td align="center">92.7</td>
                  <td align="center">–</td>
                  <td align="center">81.2</td>
                  <td align="center">91.8</td>
                  <td align="center">95</td>
                  <td align="center">92.1</td>
                  <td align="center">93</td>
                  <td align="center">97.2</td>
                </tr>
                <tr>
                  <td align="center">Capsule</td>
                  <td align="center">61</td>
                  <td align="center">61</td>
                  <td align="center">80.5</td>
                  <td align="center">89.7</td>
                  <td align="center">74.3</td>
                  <td align="center">76.7</td>
                  <td align="center">91.3</td>
                  <td align="center">–</td>
                  <td align="center">98.2</td>
                  <td align="center">98.5</td>
                  <td align="center">96.3</td>
                  <td align="center">97.9</td>
                  <td align="center">83.7</td>
                  <td align="center">98.7</td>
                </tr>
                <tr>
                  <td align="center">Hazelnut</td>
                  <td align="center">54</td>
                  <td align="center">50</td>
                  <td align="center">98.4</td>
                  <td align="center">88.1</td>
                  <td align="center">97.2</td>
                  <td align="center">92</td>
                  <td align="center">92</td>
                  <td align="center">–</td>
                  <td align="center">98.3</td>
                  <td align="center">100</td>
                  <td align="center">99.9</td>
                  <td align="center">98.9</td>
                  <td align="center">100</td>
                  <td align="center">100</td>
                </tr>
                <tr>
                  <td align="center">Metal nut</td>
                  <td align="center">54</td>
                  <td align="center">50</td>
                  <td align="center">73.6</td>
                  <td align="center">71</td>
                  <td align="center">73.4</td>
                  <td align="center">94</td>
                  <td align="center">98.7</td>
                  <td align="center">–</td>
                  <td align="center">99.9</td>
                  <td align="center">98.7</td>
                  <td align="center">100</td>
                  <td align="center">96.6</td>
                  <td align="center">89.5</td>
                  <td align="center">100</td>
                </tr>
                <tr>
                  <td align="center">Pill</td>
                  <td align="center">60</td>
                  <td align="center">62</td>
                  <td align="center">82.7</td>
                  <td align="center">80.1</td>
                  <td align="center">52.5</td>
                  <td align="center">86.1</td>
                  <td align="center">93.3</td>
                  <td align="center">–</td>
                  <td align="center">94.9</td>
                  <td align="center">98.9</td>
                  <td align="center">96.6</td>
                  <td align="center">95.3</td>
                  <td align="center">86.3</td>
                  <td align="center">97.3</td>
                </tr>
                <tr>
                  <td align="center">Screw</td>
                  <td align="center">51</td>
                  <td align="center">35</td>
                  <td align="center">83.3</td>
                  <td align="center">66.7</td>
                  <td align="center">84.4</td>
                  <td align="center">81.3</td>
                  <td align="center">85.8</td>
                  <td align="center">–</td>
                  <td align="center">88.7</td>
                  <td align="center">93.9</td>
                  <td align="center">97</td>
                  <td align="center">99.8</td>
                  <td align="center">100</td>
                  <td align="center">98.7</td>
                </tr>
                <tr>
                  <td align="center">Toothbrush</td>
                  <td align="center">74</td>
                  <td align="center">57</td>
                  <td align="center">92.2</td>
                  <td align="center">88.9</td>
                  <td align="center">89.8</td>
                  <td align="center">100</td>
                  <td align="center">96.1</td>
                  <td align="center">–</td>
                  <td align="center">99.4</td>
                  <td align="center">100</td>
                  <td align="center">99.5</td>
                  <td align="center">96.3</td>
                  <td align="center">93.2</td>
                  <td align="center">100</td>
                </tr>
                <tr>
                  <td align="center">Transistor</td>
                  <td align="center">52</td>
                  <td align="center">67</td>
                  <td align="center">85.6</td>
                  <td align="center">90.3</td>
                  <td align="center">74.6</td>
                  <td align="center">91.5</td>
                  <td align="center">97.4</td>
                  <td align="center">–</td>
                  <td align="center">96.1</td>
                  <td align="center">93.1</td>
                  <td align="center">96.7</td>
                  <td align="center">99.5</td>
                  <td align="center">86.8</td>
                  <td align="center">98.1</td>
                </tr>
                <tr>
                  <td align="center">Zipper</td>
                  <td align="center">80</td>
                  <td align="center">59</td>
                  <td align="center">93.2</td>
                  <td align="center">96.6</td>
                  <td align="center">91.8</td>
                  <td align="center">97.9</td>
                  <td align="center">90.3</td>
                  <td align="center">–</td>
                  <td align="center">99.9</td>
                  <td align="center">100</td>
                  <td align="center">98.5</td>
                  <td align="center">98.8</td>
                  <td align="center">89.7</td>
                  <td align="center">98.2</td>
                </tr>
                <tr>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" colspan="2" align="left">Mean</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">63</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">55</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">87.8</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">85.4</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">84</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">92.1</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">95.5</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">95.5</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">96.1</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">98</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">98.5</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">98.1</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">91.2</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">99.1</td>
                </tr>
              </tbody>
            </table>
          </table-wrap>
        </p>
        <p id="S4.SS3.p2">Table <xref rid="T1" ref-type="table">1</xref> reports the image-level AUROC scores, with the best result in each category shown in bold. Our approach achieves the best or near-best detection performance in most categories, with a mean AUROC of 99.1%, indicating that the proposed method adapts well to different objects, shapes, and texture variations.</p>
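        <p>For readers unfamiliar with the metric, the following minimal sketch shows one common way to compute an image-level AUROC: it is the probability that a randomly chosen anomalous image receives a higher anomaly score than a randomly chosen normal image, with ties counted as one half. The function name <monospace>image_level_auroc</monospace> and the toy scores are illustrative assumptions and are not taken from the implementation evaluated in this paper.</p>
        <p>
          <code language="python">
```python
def image_level_auroc(scores, labels):
    """AUROC as the chance that an anomalous image outscores a normal one.

    scores: one anomaly score per image (higher means more anomalous).
    labels: 1 for an anomalous image, 0 for a normal image.
    """
    anomalous = [s for s, y in zip(scores, labels) if y == 1]
    normal = [s for s, y in zip(scores, labels) if y == 0]
    if not anomalous or not normal:
        raise ValueError("need at least one normal and one anomalous image")

    wins = 0.0
    for a in anomalous:
        for n in normal:
            if a > n:
                wins += 1.0   # anomalous image ranked above the normal one
            elif a == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(anomalous) * len(normal))


# Hypothetical toy data: five test images, two of them anomalous.
toy_scores = [0.10, 0.35, 0.40, 0.80, 0.90]
toy_labels = [0, 0, 1, 0, 1]
toy_auroc = image_level_auroc(toy_scores, toy_labels)  # 5 of 6 pairs ranked correctly
```
          </code>
        </p>
        <p>The pairwise form is quadratic in the number of images and is used here only because it mirrors the definition; a rank-based computation gives the same value more efficiently on large test sets.</p>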
        <p>
          <table-wrap id="T2">
            <label>Table 2</label>
            <caption>
              <p>Comparison of anomaly detection results on the CIFAR-10 dataset, AUROC (%).</p>
            </caption>
            <table>
              <thead>
                <tr>
                  <th style="border-right: 1px solid black;border-top: 1px solid black;" align="center">Category/Method</th>
                  <th style="border-top: 1px solid black;" align="center">OCGAN</th>
                  <th style="border-top: 1px solid black;" align="center">LSA</th>
                  <th style="border-top: 1px solid black;" align="center">CAVGA-D</th>
                  <th style="border-top: 1px solid black;" align="center">US</th>
                  <th style="border-top: 1px solid black;" align="center">AnoGAN</th>
                  <th style="border-top: 1px solid black;" align="center">GT</th>
                  <th style="border-top: 1px solid black;" align="center">MKD</th>
                  <th style="border-top: 1px solid black;" align="center">Ours</th>
                </tr>
              </thead>
              <tbody>
                <tr>
                  <th style="border-right: 1px solid black;border-top: 1px solid black;" align="center">Airplane</th>
                  <td style="border-top: 1px solid black;" align="center">75.7</td>
                  <td style="border-top: 1px solid black;" align="center">73.5</td>
                  <td style="border-top: 1px solid black;" align="center">65.3</td>
                  <td style="border-top: 1px solid black;" align="center">78.9</td>
                  <td style="border-top: 1px solid black;" align="center">67.1</td>
                  <td style="border-top: 1px solid black;" align="center">76.2</td>
                  <td style="border-top: 1px solid black;" align="center">90.5</td>
                  <td style="border-top: 1px solid black;" align="center"/>
                </tr>
                <tr>
                  <th style="border-right: 1px solid black;" align="center">Car</th>
                  <td align="center">53.1</td>
                  <td align="center">58</td>
                  <td align="center">78.4</td>
                  <td align="center">84.9</td>
                  <td align="center">54.7</td>
                  <td align="center">84.8</td>
                  <td align="center">89.1</td>
                  <td align="center">93.1</td>
                </tr>
                <tr>
                  <th style="border-right: 1px solid black;" align="center">Bird</th>
                  <td align="center">64</td>
                  <td align="center">69</td>
                  <td align="center">76.1</td>
                  <td align="center">73.4</td>
                  <td align="center">52.9</td>
                  <td align="center">77.1</td>
                  <td align="center">80</td>
                  <td align="center">93.8</td>
                </tr>
                <tr>
                  <th style="border-right: 1px solid black;" align="center">Cat</th>
                  <td align="center">63</td>
                  <td align="center">54.2</td>
                  <td align="center">74.7</td>
                  <td align="center">74.8</td>
                  <td align="center">54.5</td>
                  <td align="center">73.2</td>
                  <td align="center">76.7</td>
                  <td align="center">82.1</td>
                </tr>
                <tr>
                  <th style="border-right: 1px solid black;" align="center">Deer</th>
                  <td align="center">72.3</td>
                  <td align="center">76.1</td>
                  <td align="center">77.5</td>
                  <td align="center">85.1</td>
                  <td align="center">65.1</td>
                  <td align="center">82.8</td>
                  <td align="center">87.1</td>
                  <td align="center">81.2</td>
                </tr>
                <tr>
                  <th style="border-right: 1px solid black;" align="center">Dog</th>
                  <td align="center">62</td>
                  <td align="center">54.6</td>
                  <td align="center">55.2</td>
                  <td align="center">79.3</td>
                  <td align="center">60.3</td>
                  <td align="center">84.8</td>
                  <td align="center">91.8</td>
                  <td align="center">91.6</td>
                </tr>
                <tr>
                  <th style="border-right: 1px solid black;" align="center">Frog</th>
                  <td align="center">72.3</td>
                  <td align="center">75.1</td>
                  <td align="center">81.3</td>
                  <td align="center">89.2</td>
                  <td align="center">58.5</td>
                  <td align="center">82</td>
                  <td align="center">89.3</td>
                  <td align="center">93</td>
                </tr>
                <tr>
                  <th style="border-right: 1px solid black;" align="center">Horse</th>
                  <td align="center">57.5</td>
                  <td align="center">53.5</td>
                  <td align="center">74.5</td>
                  <td align="center">83</td>
                  <td align="center">62.5</td>
                  <td align="center">88.7</td>
                  <td align="center">86.1</td>
                  <td align="center">93.8</td>
                </tr>
                <tr>
                  <th style="border-right: 1px solid black;" align="center">Ship</th>
                  <td align="center">82</td>
                  <td align="center">71.7</td>
                  <td align="center">80.1</td>
                  <td align="center">86.2</td>
                  <td align="center">75.8</td>
                  <td align="center">89.5</td>
                  <td align="center">91.6</td>
                  <td align="center">95.4</td>
                </tr>
                <tr>
                  <th style="border-right: 1px solid black;" align="center">Truck</th>
                  <td align="center">55.4</td>
                  <td align="center">54.8</td>
                  <td align="center">74.1</td>
                  <td align="center">84.8</td>
                  <td align="center">66.5</td>
                  <td align="center">83.4</td>
                  <td align="center">88.7</td>
                  <td align="center">90.5</td>
                </tr>
                <tr>
                  <th style="border-right: 1px solid black;border-top: 1px solid black;border-bottom: 1px solid black;" align="center">Mean</th>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">65.7</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">64.1</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">73.7</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">82</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">61.8</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">82.3</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">87.1</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">90.9</td>
                </tr>
              </tbody>
            </table>
          </table-wrap>
        </p>
        <p id="S4.SS3.p3">To further evaluate the effectiveness and generality of the proposed method, anomaly detection is also carried out on the CIFAR-10 dataset and compared with seven related methods: OCGAN [<xref rid="ref035" ref-type="bibr">35</xref>], LSA [<xref rid="ref036" ref-type="bibr">36</xref>], CAVGA-D [<xref rid="ref037" ref-type="bibr">37</xref>], US [<xref rid="ref008" ref-type="bibr">8</xref>], AnoGAN [<xref rid="ref016" ref-type="bibr">16</xref>], GT [<xref rid="ref038" ref-type="bibr">38</xref>] and MKD [<xref rid="ref019" ref-type="bibr">19</xref>]. The experimental results are shown in Table <xref rid="T2" ref-type="table">2</xref>. The mean AUROC of the proposed method reaches 90.9%, which is 3.8 percentage points higher than that of MKD (87.1%), and the method achieves the best result in most categories.</p>
        <p>
          <table-wrap id="T3">
            <label>Table 3</label>
            <caption>
              <p>Comparison of anomaly localization results on the MVTec AD dataset, AUROC/PRO (%).</p>
            </caption>
            <table>
              <tbody>
                <tr>
                  <td style="border-top: 1px solid black;" colspan="2" align="center">Category/Method</td>
                  <td style="border-top: 1px solid black;" align="center">L2-AE</td>
                  <td style="border-top: 1px solid black;" align="center">SSIM-AE</td>
                  <td style="border-top: 1px solid black;" align="center">AnoGAN</td>
                  <td style="border-top: 1px solid black;" align="center">MKD</td>
                  <td style="border-top: 1px solid black;" align="center">SPADE</td>
                  <td style="border-top: 1px solid black;" align="center">Patch-SVDD</td>
                  <td style="border-top: 1px solid black;" align="center">PaDiM</td>
                  <td style="border-top: 1px solid black;" align="center">CutPaste</td>
                  <td style="border-top: 1px solid black;" align="center">STPM</td>
                  <td style="border-top: 1px solid black;" align="center">DRAEM</td>
                  <td style="border-top: 1px solid black;" align="center">RD</td>
                  <td style="border-top: 1px solid black;" align="center">RecDMs</td>
                  <td style="border-top: 1px solid black;" align="center">CA-AE</td>
                  <td style="border-top: 1px solid black;" align="center">Ours</td>
                </tr>
                <tr>
                  <td style="border-top: 1px solid black;" rowspan="5" align="center">Textures</td>
                  <td style="border-top: 1px solid black;" align="center">Carpet</td>
                  <td style="border-top: 1px solid black;" align="center">59/45.6</td>
                  <td style="border-top: 1px solid black;" align="center">87/64.7</td>
                  <td style="border-top: 1px solid black;" align="center">54/20.4</td>
                  <td style="border-top: 1px solid black;" align="center">95.6/-</td>
                  <td style="border-top: 1px solid black;" align="center">97.5/94.7</td>
                  <td style="border-top: 1px solid black;" align="center">98.1/-</td>
                  <td style="border-top: 1px solid black;" align="center">99.1/96.2</td>
                  <td style="border-top: 1px solid black;" align="center">98.3/-</td>
                  <td style="border-top: 1px solid black;" align="center">98.8/95.8</td>
                  <td style="border-top: 1px solid black;" align="center">95.5/-</td>
                  <td style="border-top: 1px solid black;" align="center">98.9/97.0</td>
                  <td style="border-top: 1px solid black;" align="center">96.1/-</td>
                  <td style="border-top: 1px solid black;" align="center">88.4/88.0</td>
                  <td style="border-top: 1px solid black;" align="center">99.3/98.1</td>
                </tr>
                <tr>
                  <td align="center">Grid</td>
                  <td align="center">90/58.2</td>
                  <td align="center">94/84.9</td>
                  <td align="center">58/22.6</td>
                  <td align="center">91.8/-</td>
                  <td align="center">93.7/86.7</td>
                  <td align="center">96.8/-</td>
                  <td align="center">97.3/94.6</td>
                  <td align="center">97.5/-</td>
                  <td align="center">99/96.6</td>
                  <td align="center">99.7/-</td>
                  <td align="center">99.3/97.6</td>
                  <td align="center">93.4/-</td>
                  <td align="center">97.2/96.3</td>
                  <td align="center">99.5/97.7</td>
                </tr>
                <tr>
                  <td align="center">Leather</td>
                  <td align="center">75/81.9</td>
                  <td align="center">78/56.1</td>
                  <td align="center">64/37.8</td>
                  <td align="center">98.1/-</td>
                  <td align="center">97.6/97.2</td>
                  <td align="center">95.8/-</td>
                  <td align="center">99.2/97.8</td>
                  <td align="center">99.5/-</td>
                  <td align="center">99.3/98</td>
                  <td align="center">98.6/-</td>
                  <td align="center">99.4/99.1</td>
                  <td align="center">99.6/-</td>
                  <td align="center">96.6/95.0</td>
                  <td align="center">99.5/99.3</td>
                </tr>
                <tr>
                  <td align="center">Tile</td>
                  <td align="center">59/89.7</td>
                  <td align="center">59/17.5</td>
                  <td align="center">50/17.7</td>
                  <td align="center">82.8/-</td>
                  <td align="center">87.4/75.9</td>
                  <td align="center">92.6/-</td>
                  <td align="center">94.1/86.0</td>
                  <td align="center">90.5/-</td>
                  <td align="center">97.4/92.1</td>
                  <td align="center">99.2/-</td>
                  <td align="center">95.6/90.6</td>
                  <td align="center">93.5/-</td>
                  <td align="center">92.8/93.4</td>
                  <td align="center">97.9/92.8</td>
                </tr>
                <tr>
                  <td align="center">Wood</td>
                  <td align="center">73/72.7</td>
                  <td align="center">30/60.5</td>
                  <td align="center">62/38.6</td>
                  <td align="center">84.8/-</td>
                  <td align="center">88.5/87.4</td>
                  <td align="center">96.2/-</td>
                  <td align="center">94.9/91.1</td>
                  <td align="center">95.5/-</td>
                  <td align="center">97.2/93.6</td>
                  <td align="center">96.4/-</td>
                  <td align="center">95.3/90.9</td>
                  <td align="center">94.7/-</td>
                  <td align="center">91.4/90.0</td>
                  <td align="center">96.6/92.9</td>
                </tr>
                <tr>
                  <td style="border-top: 1px solid black;" rowspan="10" align="center">Objects</td>
                  <td style="border-top: 1px solid black;" align="center">Bottle</td>
                  <td style="border-top: 1px solid black;" align="center">86/91.0</td>
                  <td style="border-top: 1px solid black;" align="center">93/83.4</td>
                  <td style="border-top: 1px solid black;" align="center">86/62.0</td>
                  <td style="border-top: 1px solid black;" align="center">96.3/-</td>
                  <td style="border-top: 1px solid black;" align="center">98.4/95.5</td>
                  <td style="border-top: 1px solid black;" align="center">97.6/-</td>
                  <td style="border-top: 1px solid black;" align="center">98.3/94.8</td>
                  <td style="border-top: 1px solid black;" align="center">97.6/-</td>
                  <td style="border-top: 1px solid black;" align="center">98.8/95.1</td>
                  <td style="border-top: 1px solid black;" align="center">99.1/-</td>
                  <td style="border-top: 1px solid black;" align="center">98.7/96.6</td>
                  <td style="border-top: 1px solid black;" align="center">94.3/-</td>
                  <td style="border-top: 1px solid black;" align="center">95.1/92.1</td>
                  <td style="border-top: 1px solid black;" align="center">98.9/97.8</td>
                </tr>
                <tr>
                  <td align="center">Cable</td>
                  <td align="center">86/82.5</td>
                  <td align="center">82/47.8</td>
                  <td align="center">78/38.3</td>
                  <td align="center">82.4/-</td>
                  <td align="center">97.2/90.9</td>
                  <td align="center">97.4/-</td>
                  <td align="center">96.7/88.8</td>
                  <td align="center">90.0/-</td>
                  <td align="center">95.5/87.8</td>
                  <td align="center">94.7/-</td>
                  <td align="center">97.4/91</td>
                  <td align="center">94.7/-</td>
                  <td align="center">92.6/91.4</td>
                  <td align="center">98.1/94.3</td>
                </tr>
                <tr>
                  <td align="center">Capsule</td>
                  <td align="center">88/86.2</td>
                  <td align="center">94/86.0</td>
                  <td align="center">84/30.6</td>
                  <td align="center">95.9/-</td>
                  <td align="center">99.0/93.7</td>
                  <td align="center">98.0/-</td>
                  <td align="center">98.5/93.5</td>
                  <td align="center">97.4/-</td>
                  <td align="center">98.3/92.2</td>
                  <td align="center">94.3/-</td>
                  <td align="center">98.7/95.8</td>
                  <td align="center">97.2/-</td>
                  <td align="center">93.1/92.2</td>
                  <td align="center">98.8/96.1</td>
                </tr>
                <tr>
                  <td align="center">Hazelnut</td>
                  <td align="center">95/91.7</td>
                  <td align="center">97/91.6</td>
                  <td align="center">87/69.8</td>
                  <td align="center">94.6/-</td>
                  <td align="center">99.1/95.4</td>
                  <td align="center">95.1/-</td>
                  <td align="center">98.2/92.6</td>
                  <td align="center">97.3/-</td>
                  <td align="center">98.5/94.3</td>
                  <td align="center">99.7/-</td>
                  <td align="center">98.9/95.5</td>
                  <td align="center">95.0/-</td>
                  <td align="center">98.2/99.0</td>
                  <td align="center">99.1/96.7</td>
                </tr>
                <tr>
                  <td align="center">Metal nut</td>
                  <td align="center">86/83.0</td>
                  <td align="center">89/60.3</td>
                  <td align="center">76/32.0</td>
                  <td align="center">86.4/-</td>
                  <td align="center">98.1/94.4</td>
                  <td align="center">95.7/-</td>
                  <td align="center">97.2/85.6</td>
                  <td align="center">93.1/-</td>
                  <td align="center">97.6/94.5</td>
                  <td align="center">99.5/-</td>
                  <td align="center">97.3/92.3</td>
                  <td align="center">92.7/-</td>
                  <td align="center">91.0/89.0</td>
                  <td align="center">98.4/93</td>
                </tr>
                <tr>
                  <td align="center">Pill</td>
                  <td align="center">91/89.3</td>
                  <td align="center">91/83.0</td>
                  <td align="center">87/77.6</td>
                  <td align="center">89.6/-</td>
                  <td align="center">96.5/94.6</td>
                  <td align="center">91.4/-</td>
                  <td align="center">95.7/92.7</td>
                  <td align="center">95.7/-</td>
                  <td align="center">97.8/96.5</td>
                  <td align="center">97.6/-</td>
                  <td align="center">98.2/96.4</td>
                  <td align="center">95.6/-</td>
                  <td align="center">92.6/93.2</td>
                  <td align="center">98.5/96.6</td>
                </tr>
                <tr>
                  <td align="center">Screw</td>
                  <td align="center">96/75.4</td>
                  <td align="center">96/88.7</td>
                  <td align="center">80/46.6</td>
                  <td align="center">96.0/-</td>
                  <td align="center">98.9/96.0</td>
                  <td align="center">98.1/-</td>
                  <td align="center">98.5/94.4</td>
                  <td align="center">96.7/-</td>
                  <td align="center">98.3/93</td>
                  <td align="center">97.6/-</td>
                  <td align="center">99.6/98.2</td>
                  <td align="center">93.9/-</td>
                  <td align="center">97.7/100</td>
                  <td align="center">99.6/98.5</td>
                </tr>
                <tr>
                  <td align="center">Toothbrush</td>
                  <td align="center">93/82.2</td>
                  <td align="center">92/78.4</td>
                  <td align="center">93/74.9</td>
                  <td align="center">96.1/-</td>
                  <td align="center">97.9/93.5</td>
                  <td align="center">97.0/-</td>
                  <td align="center">98.8/93.1</td>
                  <td align="center">98.1/-</td>
                  <td align="center">98.9/92.2</td>
                  <td align="center">98.1/-</td>
                  <td align="center">99.1/94.5</td>
                  <td align="center">97.2/-</td>
                  <td align="center">89.4/90.5</td>
                  <td align="center">99.2/96.3</td>
                </tr>
                <tr>
                  <td align="center">Transistor</td>
                  <td align="center">86/72.8</td>
                  <td align="center">90/72.5</td>
                  <td align="center">86/54.9</td>
                  <td align="center">76.5/-</td>
                  <td align="center">94.1/87.4</td>
                  <td align="center">90.8/-</td>
                  <td align="center">97.5/84.5</td>
                  <td align="center">93.0/-</td>
                  <td align="center">82.5/69.5</td>
                  <td align="center">90.9/-</td>
                  <td align="center">92.5/78</td>
                  <td align="center">86.4/-</td>
                  <td align="center">85.0/81.0</td>
                  <td align="center">96.1/93.2</td>
                </tr>
                <tr>
                  <td align="center">Zipper</td>
                  <td align="center">77/83.9</td>
                  <td align="center">88/66.5</td>
                  <td align="center">78/46.7</td>
                  <td align="center">93.9/-</td>
                  <td align="center">96.5/92.6</td>
                  <td align="center">95.1/-</td>
                  <td align="center">98.5/95.9</td>
                  <td align="center">99.3/-</td>
                  <td align="center">98.5/95.2</td>
                  <td align="center">98.8/-</td>
                  <td align="center">98.2/95.4</td>
                  <td align="center">88.5/-</td>
                  <td align="center">93.2/94.0</td>
                  <td align="center">99.2/96.5</td>
                </tr>
                <tr>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" colspan="2" align="center">Mean</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">82.7/79.0</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">87/69.4</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">74/44.3</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">90.7/-</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">96.0/91.7</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">95.7/-</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">97.5/92.1</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">96.0/-</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">96.5/92.1</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">97.3/-</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">97.8/93.9</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">94.6/-</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">93.0/92.3</td>
                  <td style="border-top: 1px solid black;border-bottom: 1px solid black;" align="center">98.5/95.9</td>
                </tr>
              </tbody>
            </table>
          </table-wrap>
        </p>
        <p id="S4.SS3.p4">Table <xref rid="T3" ref-type="table">3</xref> reports the anomaly localization results. The proposed approach achieves the highest average AUROC, 98.5%, and the AUROC of all 13 classes is more than 98%, which indicates robustness across diverse anomaly types. In terms of PRO, the improvement is notable on the transistor class, where the proposed method improves on RD by 15.2%. Although the image reconstruction-based DRAEM method performs well on some object categories, its dependence on the information inherited from the trained reconstruction network may lead to a decline in detection performance on others. In contrast, our method performs more consistently across categories.</p>
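        <p>As a reading aid for the AUROC values reported above, the following is a minimal sketch of how AUROC can be computed from anomaly scores and ground-truth labels. It is not the evaluation code used in this paper: the function name <monospace>auroc</monospace> and the pairwise (Mann-Whitney) formulation are assumptions chosen for clarity, and the quadratic loop is only practical for small inputs.</p>
        <p>
          <code language="python"><![CDATA[
```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pairwise statistic.

    scores: anomaly scores, where a higher score means more anomalous.
    labels: 1 for anomalous samples (or pixels), 0 for normal ones.
    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal sample")

    # AUROC equals the probability that a random anomalous sample
    # receives a higher score than a random normal one (ties count half).
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```
]]></code>
        </p>
        <p>For example, with scores 0.1, 0.4, 0.35, 0.8 and labels 0, 0, 1, 1, three of the four anomalous-normal pairs are ranked correctly, so the function returns 0.75. Image-level AUROC applies this to one score per image, while pixel-level AUROC applies it to the flattened anomaly maps and masks.</p>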
      </sec>
      <sec id="S4.SS4">
        <label>4.4</label>
        <title>Visualization Results</title>
        <p id="S4.SS4.p1">To illustrate how well our approach localizes anomalies, we compare it with STPM and MKD; the anomaly localization results are shown in Figure <xref ref-type="fig" rid="F7">7</xref>. The results show that our model localizes anomalies precisely. Regarding the backbone network, STPM [<xref rid="ref030" ref-type="bibr">30</xref>] employs ResNet18 and MKD [<xref rid="ref033" ref-type="bibr">33</xref>] uses VGG16, whereas our method uses WideResNet50, which has stronger feature extraction ability. In detection methods based on knowledge distillation, the feature extraction ability of the teacher network strongly affects the detection results. In addition, as described in Section 3.2, the reverse distillation architecture used in this paper mitigates, to a certain extent, the accuracy issues caused by insufficient representation of anomalous samples.</p>
        <p>
          <fig id="F7">
            <label>Figure 7.</label>
            <caption>
              <p>Visualization of anomaly localization on the MVTec AD dataset.</p>
            </caption>
            <graphic xlink:href="fig7.jpg"/>
          </fig>
        </p>
      </sec>
      <sec id="S4.SS5">
        <label>4.5</label>
        <title>Ablation Experiments</title>
        <sec id="S4.SS5.SSS1">
          <label>4.5.1</label>
          <title>Backbones</title>
          <p id="S4.SS5.SSS1.p1">To study the effect of the backbone, we used ResNet18, ResNet50, and WideResNet50 as backbone networks and evaluated their influence on model accuracy; the results are given in Table <xref rid="T4" ref-type="table">4</xref>. As the network becomes deeper and wider, it extracts richer semantic features, which improves the model's discriminative performance and anomaly detection capability. WideResNet50 achieves the highest accuracy on all three metrics. It is worth noting that even with the smaller ResNet18, the method still performs well.</p>
          <p>
            <table-wrap id="T4">
              <label>Table 4</label>
              <caption>
                <p>Ablation experiments for different backbone networks (Best in bold).</p>
              </caption>
              <table>
                <thead>
                  <tr>
                    <th style="border-top: 1px solid black;" align="center">Backbones</th>
                    <th style="border-top: 1px solid black;" align="center">AUROC<sub>AD</sub></th>
                    <th style="border-top: 1px solid black;" align="center">AUROC<sub>AL</sub></th>
                    <th style="border-top: 1px solid black;" align="center">PRO</th>
                  </tr>
                </thead>
                <tbody>
                  <tr>
                    <td style="border-top: 1px solid black;" align="center">ResNet18</td>
                    <td style="border-top: 1px solid black;" align="center">98.2</td>
                    <td style="border-top: 1px solid black;" align="center">97.3</td>
                    <td style="border-top: 1px solid black;" align="center">94.4</td>
                  </tr>
                  <tr>
                    <td align="center">ResNet50</td>
                    <td align="center">98.8</td>
                    <td align="center">98.2</td>
                    <td align="center">95.2</td>
                  </tr>
                  <tr>
                    <td style="border-bottom: 1px solid black;" align="center">WideResNet50</td>
                    <td style="border-bottom: 1px solid black;" align="center">99.1</td>
                    <td style="border-bottom: 1px solid black;" align="center">98.5</td>
                    <td style="border-bottom: 1px solid black;" align="center">95.9</td>
                  </tr>
                </tbody>
              </table>
            </table-wrap>
          </p>
        </sec>
        <sec id="S4.SS5.SSS2">
          <label>4.5.2</label>
          <title>FMM</title>
          <p id="S4.SS5.SSS2.p1">To verify the effectiveness of FMM, ablation experiments were carried out; the results are given in Table <xref rid="T5" ref-type="table">5</xref>. By restoring feature-level anomalies, FMM strengthens the student network's ability to learn feature representations, so that it captures the finer pixel-level details needed for feature restoration. This improves the expression of detailed information and ultimately the accuracy of the restored features. With FMM, image-level AUROC rises from 98.8 to 99.1, pixel-level AUROC from 98.1 to 98.5, and PRO from 95.3 to 95.9.</p>
          <p>
            <table-wrap id="T5">
              <label>Table 5</label>
              <caption>
                <p>Ablation experiments for FMM (Best in bold).</p>
              </caption>
              <table>
                <thead>
                  <tr>
                    <th style="border-top: 1px solid black;" align="center">Baseline</th>
                    <th style="border-top: 1px solid black;" align="center">FMM</th>
                    <th style="border-top: 1px solid black;" align="center">AUROC<sub>AD</sub></th>
                    <th style="border-top: 1px solid black;" align="center">AUROC<sub>AL</sub></th>
                    <th style="border-top: 1px solid black;" align="center">PRO</th>
                  </tr>
                </thead>
                <tbody>
                  <tr>
                    <th style="border-top: 1px solid black;" align="center">
                      <inline-formula>
                        <mml:math alttext="\checkmark" display="inline">
                          <mml:mi mathvariant="normal">✓</mml:mi>
                        </mml:math>
                      </inline-formula>
                    </th>
                    <th style="border-top: 1px solid black;"/>
                    <td style="border-top: 1px solid black;" align="center">98.8</td>
                    <td style="border-top: 1px solid black;" align="center">98.1</td>
                    <td style="border-top: 1px solid black;" align="center">95.3</td>
                  </tr>
                  <tr>
                    <th style="border-bottom: 1px solid black;" align="center">
                      <inline-formula>
                        <mml:math alttext="\checkmark" display="inline">
                          <mml:mi mathvariant="normal">✓</mml:mi>
                        </mml:math>
                      </inline-formula>
                    </th>
                    <th style="border-bottom: 1px solid black;" align="center">
                      <inline-formula>
                        <mml:math alttext="\checkmark" display="inline">
                          <mml:mi mathvariant="normal">✓</mml:mi>
                        </mml:math>
                      </inline-formula>
                    </th>
                    <td style="border-bottom: 1px solid black;" align="center">99.1</td>
                    <td style="border-bottom: 1px solid black;" align="center">98.5</td>
                    <td style="border-bottom: 1px solid black;" align="center">95.9</td>
                  </tr>
                </tbody>
              </table>
            </table-wrap>
          </p>
        </sec>
        <sec id="S4.SS5.SSS3">
          <label>4.5.3</label>
          <title>SCSAtt</title>
          <p id="S4.SS5.SSS3.p1">To verify the effectiveness of SCConv and SimAM in the bottleneck module, ablation experiments were carried out; the results are given in Table <xref rid="T6" ref-type="table">6</xref>. Adding either module alone improves the model's performance to some degree, and adding both modules yields a more significant improvement in anomaly detection results. Relative to the baseline, the AUROC values at the image and pixel levels increase by 0.6 and 0.7 percentage points, respectively. These results indicate that the SCSAtt mechanism effectively filters and selects input information, thereby improving the accuracy of anomaly detection.</p>
          <p>
            <table-wrap id="T6">
              <label>Table 6</label>
              <caption>
                <p>Ablation experiments for SCSAtt (Best in bold).</p>
              </caption>
              <table>
                <thead>
                  <tr>
                    <th style="border-top: 1px solid black;" align="center">Baseline</th>
                    <th style="border-top: 1px solid black;" align="center">SimAM</th>
                    <th style="border-top: 1px solid black;" align="center">SCConv</th>
                    <th style="border-top: 1px solid black;" align="center">AUROC<sub>AD</sub></th>
                    <th style="border-top: 1px solid black;" align="center">AUROC<sub>AL</sub></th>
                    <th style="border-top: 1px solid black;" align="center">PRO</th>
                  </tr>
                </thead>
                <tbody>
                  <tr>
                    <td style="border-top: 1px solid black;" align="center">
                      <inline-formula>
                        <mml:math alttext="\checkmark" display="inline">
                          <mml:mi mathvariant="normal">✓</mml:mi>
                        </mml:math>
                      </inline-formula>
                    </td>
                    <td style="border-top: 1px solid black;"/>
                    <td style="border-top: 1px solid black;"/>
                    <td style="border-top: 1px solid black;" align="center">98.5</td>
                    <td style="border-top: 1px solid black;" align="center">97.8</td>
                    <td style="border-top: 1px solid black;" align="center">93.9</td>
                  </tr>
                  <tr>
                    <td align="center">
                      <inline-formula>
                        <mml:math alttext="\checkmark" display="inline">
                          <mml:mi mathvariant="normal">✓</mml:mi>
                        </mml:math>
                      </inline-formula>
                    </td>
                    <td/>
                    <td align="center">
                      <inline-formula>
                        <mml:math alttext="\checkmark" display="inline">
                          <mml:mi mathvariant="normal">✓</mml:mi>
                        </mml:math>
                      </inline-formula>
                    </td>
                    <td align="center">98.6</td>
                    <td align="center">98.0</td>
                    <td align="center">95.5</td>
                  </tr>
                  <tr>
                    <td align="center">
                      <inline-formula>
                        <mml:math alttext="\checkmark" display="inline">
                          <mml:mi mathvariant="normal">✓</mml:mi>
                        </mml:math>
                      </inline-formula>
                    </td>
                    <td align="center">
                      <inline-formula>
                        <mml:math alttext="\checkmark" display="inline">
                          <mml:mi mathvariant="normal">✓</mml:mi>
                        </mml:math>
                      </inline-formula>
                    </td>
                    <td/>
                    <td align="center">99.1</td>
                    <td align="center">98.2</td>
                    <td align="center">95.3</td>
                  </tr>
                  <tr>
                    <td style="border-bottom: 1px solid black;" align="center">
                      <inline-formula>
                        <mml:math alttext="\checkmark" display="inline">
                          <mml:mi mathvariant="normal">✓</mml:mi>
                        </mml:math>
                      </inline-formula>
                    </td>
                    <td style="border-bottom: 1px solid black;" align="center">
                      <inline-formula>
                        <mml:math alttext="\checkmark" display="inline">
                          <mml:mi mathvariant="normal">✓</mml:mi>
                        </mml:math>
                      </inline-formula>
                    </td>
                    <td style="border-bottom: 1px solid black;" align="center">
                      <inline-formula>
                        <mml:math alttext="\checkmark" display="inline">
                          <mml:mi mathvariant="normal">✓</mml:mi>
                        </mml:math>
                      </inline-formula>
                    </td>
                    <td style="border-bottom: 1px solid black;" align="center">99.1</td>
                    <td style="border-bottom: 1px solid black;" align="center">98.5</td>
                    <td style="border-bottom: 1px solid black;" align="center">95.9</td>
                  </tr>
                </tbody>
              </table>
            </table-wrap>
          </p>
        </sec>
        <sec id="S4.SS5.SSS4">
          <label>4.5.4</label>
          <title>Multi-scale Feature Map</title>
          <p id="S4.SS5.SSS4.p1">For feature fusion, we examined how the features <inline-formula><mml:math alttext="F_{E}^{1}" display="inline"><mml:msubsup><mml:mi>F</mml:mi><mml:mi>E</mml:mi><mml:mn>1</mml:mn></mml:msubsup></mml:math></inline-formula>, <inline-formula><mml:math alttext="F_{E}^{2}" display="inline"><mml:msubsup><mml:mi>F</mml:mi><mml:mi>E</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:math></inline-formula>, and <inline-formula><mml:math alttext="F_{E}^{3}" display="inline"><mml:msubsup><mml:mi>F</mml:mi><mml:mi>E</mml:mi><mml:mn>3</mml:mn></mml:msubsup></mml:math></inline-formula> from different layers of the teacher network affect the accuracy of the model; the results are shown in Table <xref rid="T7" ref-type="table">7</xref>. Among the single-layer configurations, the second-layer features <inline-formula><mml:math alttext="F_{E}^{2}" display="inline"><mml:msubsup><mml:mi>F</mml:mi><mml:mi>E</mml:mi><mml:mn>2</mml:mn></mml:msubsup></mml:math></inline-formula> of the teacher model give the best pixel-level AUROC and PRO, primarily because they contain both local texture and overall structural details. Given the diversity of anomaly types in the dataset, a single layer of features cannot fully detect all types of anomalies. In contrast, every two-layer and three-layer fusion achieves higher image-level AUROC and PRO than any single layer, and fusing all three layers gives the best result on every metric, which underscores the importance of multi-scale feature integration for capturing diverse anomaly categories. These results indicate that merging high-level and low-level features enriches semantic content, improves detection accuracy, and gives the model a more complete representation of the input data.</p>
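          <p>To make the idea of multi-scale fusion concrete, the following is a minimal sketch of one common way to combine teacher-student discrepancies from several layers into a single anomaly map. It rests on assumptions not stated in this section: the per-location cosine distance, the summation across scales, and the nearest-neighbour upsampling (used here only to avoid external dependencies; bilinear interpolation is the more usual choice) are illustrative, and all function names are hypothetical. The scoring function actually used by the proposed method is the one defined in the method section.</p>
          <p>
            <code language="python"><![CDATA[
```python
import math


def cosine_distance_map(teacher, student):
    """Per-location 1 - cosine similarity between two H x W x C feature maps.

    Feature maps are nested lists: map[row][col] is a length-C vector.
    """
    out = []
    for t_row, s_row in zip(teacher, student):
        row = []
        for t_vec, s_vec in zip(t_row, s_row):
            dot = sum(a * b for a, b in zip(t_vec, s_vec))
            norm = (math.sqrt(sum(a * a for a in t_vec))
                    * math.sqrt(sum(b * b for b in s_vec)))
            # Zero vectors carry no direction; treat them as "no discrepancy".
            row.append(1.0 - dot / norm if norm > 0 else 0.0)
        out.append(row)
    return out


def upsample_nearest(m, size):
    """Nearest-neighbour resize of a 2-D map to size x size (assumed stand-in
    for bilinear interpolation)."""
    h, w = len(m), len(m[0])
    return [[m[i * h // size][j * w // size] for j in range(size)]
            for i in range(size)]


def fuse_anomaly_maps(teacher_feats, student_feats, size):
    """Sum the upsampled per-scale distance maps into one size x size map.

    teacher_feats / student_feats: lists of feature maps, one per layer,
    e.g. the three teacher features and the student's restored counterparts.
    """
    fused = [[0.0] * size for _ in range(size)]
    for t, s in zip(teacher_feats, student_feats):
        up = upsample_nearest(cosine_distance_map(t, s), size)
        for i in range(size):
            for j in range(size):
                fused[i][j] += up[i][j]
    return fused
```
]]></code>
          </p>
          <p>Under these assumptions, dropping a layer from <monospace>teacher_feats</monospace> and <monospace>student_feats</monospace> corresponds to one row of the ablation: the fused map then lacks that layer's receptive-field scale, which is one plausible reason single-layer variants miss some anomaly types.</p>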
          <p>
            <table-wrap id="T7">
              <label>Table 7</label>
              <caption>
                <p>Ablation experiments for multi-scale feature map fusion (Best in bold).</p>
              </caption>
              <table>
                <thead>
                  <tr>
                    <th style="border-top: 1px solid black;" colspan="3" align="center">Feature Map</th>
                    <th style="border-top: 1px solid black;" rowspan="2" align="center">AUROC<sub>AD</sub></th>
                    <th style="border-top: 1px solid black;" rowspan="2" align="center">AUROC<sub>AL</sub></th>
                    <th style="border-top: 1px solid black;" rowspan="2" align="center">PRO</th>
                  </tr>
                  <tr>
                    <th style="border-top: 1px solid black;" align="center">
                      <inline-formula>
                        <mml:math alttext="F_{E}^{1}" display="inline">
                          <mml:msubsup>
                            <mml:mi>F</mml:mi>
                            <mml:mi>E</mml:mi>
                            <mml:mn>1</mml:mn>
                          </mml:msubsup>
                        </mml:math>
                      </inline-formula>
                    </th>
                    <th style="border-top: 1px solid black;" align="center">
                      <inline-formula>
                        <mml:math alttext="F_{E}^{2}" display="inline">
                          <mml:msubsup>
                            <mml:mi>F</mml:mi>
                            <mml:mi>E</mml:mi>
                            <mml:mn>2</mml:mn>
                          </mml:msubsup>
                        </mml:math>
                      </inline-formula>
                    </th>
                    <th style="border-top: 1px solid black;" align="center">
                      <inline-formula>
                        <mml:math alttext="F_{E}^{3}" display="inline">
                          <mml:msubsup>
                            <mml:mi>F</mml:mi>
                            <mml:mi>E</mml:mi>
                            <mml:mn>3</mml:mn>
                          </mml:msubsup>
                        </mml:math>
                      </inline-formula>
                    </th>
                  </tr>
                </thead>
                <tbody>
                  <tr>
                    <th style="border-top: 1px solid black;"/>
                    <th style="border-top: 1px solid black;"/>
                    <th style="border-top: 1px solid black;" align="center">
                      <inline-formula>
                        <mml:math alttext="\checkmark" display="inline">
                          <mml:mi mathvariant="normal">✓</mml:mi>
                        </mml:math>
                      </inline-formula>
                    </th>
                    <td style="border-top: 1px solid black;" align="center">96.5</td>
                    <td style="border-top: 1px solid black;" align="center">97.1</td>
                    <td style="border-top: 1px solid black;" align="center">92.2</td>
                  </tr>
                  <tr>
                    <th/>
                    <th align="center">
                      <inline-formula>
                        <mml:math alttext="\checkmark" display="inline">
                          <mml:mi mathvariant="normal">✓</mml:mi>
                        </mml:math>
                      </inline-formula>
                    </th>
                    <th/>
                    <td align="center">96.3</td>
                    <td align="center">97.4</td>
                    <td align="center">92.8</td>
                  </tr>
                  <tr>
                    <th align="center">
                      <inline-formula>
                        <mml:math alttext="\checkmark" display="inline">
                          <mml:mi mathvariant="normal">✓</mml:mi>
                        </mml:math>
                      </inline-formula>
                    </th>
                    <th/>
                    <th/>
                    <td align="center">93.9</td>
                    <td align="center">94.2</td>
                    <td align="center">91.3</td>
                  </tr>
                  <tr>
                    <th/>
                    <th align="center">
                      <inline-formula>
                        <mml:math alttext="\checkmark" display="inline">
                          <mml:mi mathvariant="normal">✓</mml:mi>
                        </mml:math>
                      </inline-formula>
                    </th>
                    <th align="center">
                      <inline-formula>
                        <mml:math alttext="\checkmark" display="inline">
                          <mml:mi mathvariant="normal">✓</mml:mi>
                        </mml:math>
                      </inline-formula>
                    </th>
                    <td align="center">98.0</td>
                    <td align="center">97.4</td>
                    <td align="center">94.8</td>
                  </tr>
                  <tr>
                    <th align="center">
                      <inline-formula>
                        <mml:math alttext="\checkmark" display="inline">
                          <mml:mi mathvariant="normal">✓</mml:mi>
                        </mml:math>
                      </inline-formula>
                    </th>
                    <th align="center">
                      <inline-formula>
                        <mml:math alttext="\checkmark" display="inline">
                          <mml:mi mathvariant="normal">✓</mml:mi>
                        </mml:math>
                      </inline-formula>
                    </th>
                    <th/>
                    <td align="center">98.4</td>
                    <td align="center">97.5</td>
                    <td align="center">95.3</td>
                  </tr>
                  <tr>
                    <th align="center">
                      <inline-formula>
                        <mml:math alttext="\checkmark" display="inline">
                          <mml:mi mathvariant="normal">✓</mml:mi>
                        </mml:math>
                      </inline-formula>
                    </th>
                    <th/>
                    <th align="center">
                      <inline-formula>
                        <mml:math alttext="\checkmark" display="inline">
                          <mml:mi mathvariant="normal">✓</mml:mi>
                        </mml:math>
                      </inline-formula>
                    </th>
                    <td align="center">97.9</td>
                    <td align="center">97.1</td>
                    <td align="center">94.4</td>
                  </tr>
                  <tr>
                    <th style="border-bottom: 1px solid black;" align="center">
                      <inline-formula>
                        <mml:math alttext="\checkmark" display="inline">
                          <mml:mi mathvariant="normal">✓</mml:mi>
                        </mml:math>
                      </inline-formula>
                    </th>
                    <th style="border-bottom: 1px solid black;" align="center">
                      <inline-formula>
                        <mml:math alttext="\checkmark" display="inline">
                          <mml:mi mathvariant="normal">✓</mml:mi>
                        </mml:math>
                      </inline-formula>
                    </th>
                    <th style="border-bottom: 1px solid black;" align="center">
                      <inline-formula>
                        <mml:math alttext="\checkmark" display="inline">
                          <mml:mi mathvariant="normal">✓</mml:mi>
                        </mml:math>
                      </inline-formula>
                    </th>
                    <td style="border-bottom: 1px solid black;" align="center">99.1</td>
                    <td style="border-bottom: 1px solid black;" align="center">98.5</td>
                    <td style="border-bottom: 1px solid black;" align="center">95.9</td>
                  </tr>
                </tbody>
              </table>
            </table-wrap>
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="S5">
      <label>5.</label>
      <title>Conclusion</title>
      <p id="S5.p1">This study presents an unsupervised anomaly detection approach based on an enhanced reverse knowledge distillation framework. Traditional knowledge distillation methods face two challenges. First, because the model is trained exclusively on normal data, the student model can generalize too well and reconstruct abnormal features as if they were normal. Second, the convolutional compression process may distort the multi-scale features produced during training, which hinders full use of the information. The proposed method addresses these issues by designing different data flows for the T-S network, which enhances the model's expression of anomalies; at the same time, the SCSAtt and FMM modules are designed to strengthen the correlation between pixels, making the localization more accurate. On the MVTec AD dataset, the proposed method achieves better anomaly localization performance than the compared methods, with an average pixel-level AUROC of 98.5%.</p>
      <p id="S5.p2">Although this paper improves the student network's learning from the teacher network, the precision of the knowledge transfer process can still be improved. Future research could further refine the model architecture and integrate new data augmentation techniques to strengthen the framework's ability to detect anomalies.</p>
    </sec>
  </body>
  <back>
    <ack>
      <title>Acknowledgments</title>
      <p id="ack.p1">This work was supported by the National Natural Science Foundation of China under Grant 62373102.</p>
    </ack>
    <sec id="sec0100" sec-type="COI-statement">
      <title>Conflict of interest</title>
      <p>The authors declare no conflicts of interest.</p>
    </sec>
    <ref-list>
      <title>References</title>
      <ref id="ref001">
        <label>[1]</label>
        <mixed-citation> </mixed-citation>
      </ref>
      <ref id="ref002">
        <label>[2]</label>
        <mixed-citation> Ruff, L., Kauffmann, J. R., Vandermeulen, R. A., Montavon, G., Samek, W., Kloft, M., … &amp; Müller, K. R. (2021). A unifying review of deep and shallow anomaly detection. <italic>Proceedings of the IEEE, 109</italic>(5), 756-795. [<uri>https://doi.org/10.1109/JPROC.2021.3052449</uri>] </mixed-citation>
      </ref>
      <ref id="ref003">
        <label>[3]</label>
        <mixed-citation> Li, Z., Wang, C., Han, M., Xue, Y., Wei, W., Li, L. J., &amp; Fei-Fei, L. (2018). Thoracic disease identification and localization with limited supervision. In <italic>Proceedings of the IEEE conference on computer vision and pattern recognition</italic> (pp. 8290-8299). [<uri>https://doi.org/10.1109/CVPR.2018.00865</uri>] </mixed-citation>
      </ref>
      <ref id="ref004">
        <label>[4]</label>
        <mixed-citation> Lin, D., Li, Y., Prasad, S., Nwe, T. L., Dong, S., &amp; Oo, Z. M. (2021). CAM-guided Multi-Path Decoding U-Net with Triplet Feature Regularization for defect detection and segmentation. <italic>Knowledge-Based Systems, 228</italic>, 107272. [<uri>https://doi.org/10.1016/j.knosys.2021.107272</uri>] </mixed-citation>
      </ref>
      <ref id="ref005">
        <label>[5]</label>
        <mixed-citation> Luo, J., Yang, Z., Li, S., &amp; Wu, Y. (2021). FPCB surface defect detection: A decoupled two-stage object detection framework. <italic>IEEE Transactions on Instrumentation and Measurement, 70</italic>, 1-11. [<uri>https://doi.org/10.1109/TIM.2021.3092510</uri>] </mixed-citation>
      </ref>
      <ref id="ref006">
        <label>[6]</label>
        <mixed-citation> Chen, F., Wang, W., Yang, H., Pei, W., &amp; Lu, G. (2022). Multiscale feature fusion for surveillance video diagnosis. <italic>Knowledge-Based Systems, 240</italic>, 108103. [<uri>https://doi.org/10.1016/j.knosys.2021.108103</uri>] </mixed-citation>
      </ref>
      <ref id="ref007">
        <label>[7]</label>
        <mixed-citation> Niu, S., Li, B., Wang, X., &amp; Peng, Y. (2021). Region-and strength-controllable GAN for defect generation and segmentation in industrial images. <italic>IEEE Transactions on Industrial Informatics, 18</italic>(7), 4531-4541. [<uri>https://doi.org/10.1109/TII.2021.3127188</uri>] </mixed-citation>
      </ref>
      <ref id="ref008">
        <label>[8]</label>
        <mixed-citation> Bergmann, P., Fauser, M., Sattlegger, D., &amp; Steger, C. (2020). Uninformed students: Student-teacher anomaly detection with discriminative latent embeddings. In <italic>Proceedings of the IEEE/CVF conference on computer vision and pattern recognition</italic> (pp. 4183-4192). [<uri>https://doi.org/10.1109/CVPR42600.2020.00424</uri>] </mixed-citation>
      </ref>
      <ref id="ref009">
        <label>[9]</label>
        <mixed-citation> Atlason, H. E., Love, A., Sigurdsson, S., Gudnason, V., &amp; Ellingsen, L. M. (2019, March). Unsupervised brain lesion segmentation from MRI using a convolutional autoencoder. In <italic>Medical Imaging 2019: Image Processing</italic> (Vol. 10949, pp. 372-378). SPIE. [<uri>https://doi.org/10.1117/12.2512953</uri>] </mixed-citation>
      </ref>
      <ref id="ref010">
        <label>[10]</label>
        <mixed-citation> Zhao, R., Yan, R., Chen, Z., Mao, K., Wang, P., &amp; Gao, R. X. (2019). Deep learning and its applications to machine health monitoring. <italic>Mechanical Systems and Signal Processing, 115</italic>, 213-237. [<uri>https://doi.org/10.1016/j.ymssp.2018.05.050</uri>] </mixed-citation>
      </ref>
      <ref id="ref011">
        <label>[11]</label>
        <mixed-citation> Kingma, D. P., &amp; Welling, M. (2013, December). <italic>Auto-encoding variational bayes</italic>. </mixed-citation>
      </ref>
      <ref id="ref012">
        <label>[12]</label>
        <mixed-citation> Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … &amp; Bengio, Y. (2014). Generative adversarial nets. <italic>Advances in neural information processing systems, 27</italic>. </mixed-citation>
      </ref>
      <ref id="ref013">
        <label>[13]</label>
        <mixed-citation> Kwon, G., Prabhushankar, M., Temel, D., &amp; AlRegib, G. (2020, August). Backpropagated gradient representations for anomaly detection. In <italic>European conference on computer vision</italic> (pp. 206-226). Cham: Springer International Publishing. [<uri>https://doi.org/10.1007/978-3-030-58589-1_13</uri>] </mixed-citation>
      </ref>
      <ref id="ref014">
        <label>[14]</label>
        <mixed-citation> Chu, W. H., &amp; Kitani, K. M. (2020, August). Neural batch sampling with reinforcement learning for semi-supervised anomaly detection. In <italic>European conference on computer vision</italic> (pp. 751-766). Cham: Springer International Publishing. [<uri>https://doi.org/10.1007/978-3-030-58574-7_45</uri>] </mixed-citation>
      </ref>
      <ref id="ref015">
        <label>[15]</label>
        <mixed-citation> Kim, D., Jeong, D., Kim, H., Chong, K., Kim, S., &amp; Cho, H. (2022). Spatial contrastive learning for anomaly detection and localization. <italic>IEEE Access, 10</italic>, 17366-17376. [<uri>https://doi.org/10.1109/ACCESS.2022.3149130</uri>] </mixed-citation>
      </ref>
      <ref id="ref016">
        <label>[16]</label>
        <mixed-citation> Schlegl, T., Seeböck, P., Waldstein, S. M., Schmidt-Erfurth, U., &amp; Langs, G. (2017). Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In <italic>M. Niethammer, et al. (Eds.), Information processing in medical imaging: IPMI 2017</italic> (Lecture Notes in Computer Science, Vol. 10265, pp. 146–157). Springer. [<uri>https://doi.org/10.1007/978-3-319-59050-9_12</uri>] </mixed-citation>
      </ref>
      <ref id="ref017">
        <label>[17]</label>
        <mixed-citation> Akcay, S., Atapour-Abarghouei, A., &amp; Breckon, T. P. (2018, December). Ganomaly: Semi-supervised anomaly detection via adversarial training. In <italic>Asian conference on computer vision</italic> (pp. 622-637). Cham: Springer International Publishing. [<uri>https://doi.org/10.1007/978-3-030-20893-6_39</uri>] </mixed-citation>
      </ref>
      <ref id="ref018">
        <label>[18]</label>
        <mixed-citation> Schlegl, T., Seeböck, P., Waldstein, S. M., Langs, G., &amp; Schmidt-Erfurth, U. (2019). f-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks. <italic>Medical image analysis, 54</italic>, 30-44. [<uri>https://doi.org/10.1016/j.media.2019.01.010</uri>] </mixed-citation>
      </ref>
      <ref id="ref019">
        <label>[19]</label>
        <mixed-citation> Salehi, M., Sadjadi, N., Baselizadeh, S., Rohban, M. H., &amp; Rabiee, H. R. (2021). Multiresolution knowledge distillation for anomaly detection. In <italic>Proceedings of the IEEE/CVF conference on computer vision and pattern recognition</italic> (pp. 14902-14912). [<uri>https://doi.org/10.1109/CVPR46437.2021.01466</uri>] </mixed-citation>
      </ref>
      <ref id="ref020">
        <label>[20]</label>
        <mixed-citation> Deng, H., &amp; Li, X. (2022). Anomaly detection via reverse distillation from one-class embedding. In <italic>Proceedings of the IEEE/CVF conference on computer vision and pattern recognition</italic> (pp. 9737-9746). [<uri>https://doi.org/10.1109/CVPR52688.2022.00951</uri>] </mixed-citation>
      </ref>
      <ref id="ref021">
        <label>[21]</label>
        <mixed-citation> He, K., Zhang, X., Ren, S., &amp; Sun, J. (2016). Deep residual learning for image recognition. In <italic>Proceedings of the IEEE conference on computer vision and pattern recognition</italic> (pp. 770-778). </mixed-citation>
      </ref>
      <ref id="ref022">
        <label>[22]</label>
        <mixed-citation> Zagoruyko, S., &amp; Komodakis, N. (2016). Wide residual networks. <italic>arXiv preprint arXiv:1605.07146</italic>. </mixed-citation>
      </ref>
      <ref id="ref023">
        <label>[23]</label>
        <mixed-citation> Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., &amp; Fei-Fei, L. (2009, June). Imagenet: A large-scale hierarchical image database. In <italic>2009 IEEE conference on computer vision and pattern recognition</italic> (pp. 248-255). IEEE. [<uri>https://doi.org/10.1109/CVPR.2009.5206848</uri>] </mixed-citation>
      </ref>
      <ref id="ref024">
        <label>[24]</label>
        <mixed-citation> Li, J., Wen, Y., &amp; He, L. (2023). Scconv: Spatial and channel reconstruction convolution for feature redundancy. In <italic>Proceedings of the IEEE/CVF conference on computer vision and pattern recognition</italic> (pp. 6153-6162). [<uri>https://doi.org/10.1109/CVPR52729.2023.00596</uri>] </mixed-citation>
      </ref>
      <ref id="ref025">
        <label>[25]</label>
        <mixed-citation> Yang, L., Zhang, R. Y., Li, L., &amp; Xie, X. (2021, July). Simam: A simple, parameter-free attention module for convolutional neural networks. In <italic>International conference on machine learning</italic> (pp. 11863-11874). PMLR. </mixed-citation>
      </ref>
      <ref id="ref026">
        <label>[26]</label>
        <mixed-citation> Yang, Z., Li, Z., Shao, M., Shi, D., Yuan, Z., &amp; Yuan, C. (2022, October). Masked generative distillation. In <italic>European conference on computer vision</italic> (pp. 53-69). Cham: Springer Nature Switzerland. [<uri>https://doi.org/10.1007/978-3-031-20083-0_4</uri>] </mixed-citation>
      </ref>
      <ref id="ref027">
        <label>[27]</label>
        <mixed-citation> Bergmann, P., Fauser, M., Sattlegger, D., &amp; Steger, C. (2019). MVTec AD–A comprehensive real-world dataset for unsupervised anomaly detection. In <italic>Proceedings of the IEEE/CVF conference on computer vision and pattern recognition</italic> (pp. 9592-9600). [<uri>https://doi.org/10.1109/CVPR.2019.00982</uri>] </mixed-citation>
      </ref>
      <ref id="ref028">
        <label>[28]</label>
        <mixed-citation> Aytekin, C., Ni, X., Cricri, F., &amp; Aksu, E. (2018, July). Clustering and unsupervised anomaly detection with l2 normalized deep auto-encoder representations. In <italic>2018 International Joint Conference on Neural Networks (IJCNN)</italic> (pp. 1-6). IEEE. [<uri>https://doi.org/10.1109/IJCNN.2018.8489068</uri>] </mixed-citation>
      </ref>
      <ref id="ref029">
        <label>[29]</label>
        <mixed-citation> Cohen, N., &amp; Hoshen, Y. (2020). Sub-image anomaly detection with deep pyramid correspondences. <italic>arXiv preprint arXiv:2005.02357</italic>. [<uri>https://doi.org/10.48550/arXiv.2005.02357</uri>] </mixed-citation>
      </ref>
      <ref id="ref030">
        <label>[30]</label>
        <mixed-citation> Yi, J., &amp; Yoon, S. (2020). Patch svdd: Patch-level svdd for anomaly detection and segmentation. In <italic>Proceedings of the Asian conference on computer vision</italic>. [<uri>https://doi.org/10.1007/978-3-030-69544-6_23</uri>] </mixed-citation>
      </ref>
      <ref id="ref031">
        <label>[31]</label>
        <mixed-citation> Defard, T., Setkov, A., Loesch, A., &amp; Audigier, R. (2021, January). Padim: a patch distribution modeling framework for anomaly detection and localization. In <italic>International conference on pattern recognition</italic> (pp. 475-489). Cham: Springer International Publishing. [<uri>https://doi.org/10.1007/978-3-030-68799-1_35</uri>] </mixed-citation>
      </ref>
      <ref id="ref032">
        <label>[32]</label>
        <mixed-citation> Wang, G., Han, S., Ding, E., &amp; Huang, D. (2021). Student-teacher feature pyramid matching for anomaly detection. <italic>arXiv preprint arXiv:2103.04257</italic>. [<uri>https://doi.org/10.48550/arXiv.2103.04257</uri>] </mixed-citation>
      </ref>
      <ref id="ref033">
        <label>[33]</label>
        <mixed-citation> Li, C. L., Sohn, K., Yoon, J., &amp; Pfister, T. (2021). Cutpaste: Self-supervised learning for anomaly detection and localization. In <italic>Proceedings of the IEEE/CVF conference on computer vision and pattern recognition</italic> (pp. 9664-9674). </mixed-citation>
      </ref>
      <ref id="ref034">
        <label>[34]</label>
        <mixed-citation> Zavrtanik, V., Kristan, M., &amp; Skočaj, D. (2021). DRAEM - A discriminatively trained reconstruction embedding for surface anomaly detection. In <italic>Proceedings of the IEEE/CVF international conference on computer vision</italic> (pp. 8330-8339). </mixed-citation>
      </ref>
      <ref id="ref035">
        <label>[35]</label>
        <mixed-citation> Perera, P., Nallapati, R., &amp; Xiang, B. (2019). Ocgan: One-class novelty detection using gans with constrained latent representations. In <italic>Proceedings of the IEEE/CVF conference on computer vision and pattern recognition</italic> (pp. 2898-2906). </mixed-citation>
      </ref>
      <ref id="ref036">
        <label>[36]</label>
        <mixed-citation> Abati, D., Porrello, A., Calderara, S., &amp; Cucchiara, R. (2019). Latent space autoregression for novelty detection. In <italic>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</italic> (pp. 481-490). [<uri>https://doi.org/10.1109/CVPR.2019.00057</uri>] </mixed-citation>
      </ref>
      <ref id="ref037">
        <label>[37]</label>
        <mixed-citation> Venkataramanan, S., Peng, K. C., Singh, R. V., &amp; Mahalanobis, A. (2020, August). Attention guided anomaly localization in images. In <italic>European Conference on Computer Vision</italic> (pp. 485-503). Cham: Springer International Publishing. [<uri>https://doi.org/10.1007/978-3-030-58520-4_29</uri>] </mixed-citation>
      </ref>
      <ref id="ref038">
        <label>[38]</label>
        <mixed-citation> Golan, I., &amp; El-Yaniv, R. (2018). Deep anomaly detection using geometric transformations. <italic>Advances in neural information processing systems, 31</italic>. </mixed-citation>
      </ref>
      <ref id="ref039">
        <label>[39]</label>
        <mixed-citation> Xu, H., Xu, S., &amp; Yang, W. (2023). Unsupervised industrial anomaly detection with diffusion models. <italic>Journal of Visual Communication and Image Representation, 97</italic>, 103983. [<uri>https://doi.org/10.1016/j.jvcir.2023.103983</uri>] </mixed-citation>
      </ref>
      <ref id="ref040">
        <label>[40]</label>
        <mixed-citation> Yang, Q., &amp; Guo, R. (2024). An unsupervised method for industrial image anomaly detection with vision transformer-based autoencoder. <italic>Sensors, 24</italic>(8), 2440. [<uri>https://doi.org/10.3390/s24082440</uri>] </mixed-citation>
      </ref>
      <ref id="ref041">
        <label>[41]</label>
        <mixed-citation> Shen, H., Wei, B., Ma, Y., &amp; Gu, X. (2023). Unsupervised industrial image ensemble anomaly detection based on object pseudo-anomaly generation and normal image feature combination enhancement. <italic>Computers &amp; Industrial Engineering, 182</italic>, 109337. [<uri>https://doi.org/10.1016/j.cie.2023.109337</uri>] </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>
