-
Are We Done with MMLU?
Authors:
Aryo Pradipta Gema,
Joshua Ong Jun Leang,
Giwon Hong,
Alessio Devoto,
Alberto Carlo Maria Mancino,
Rohit Saxena,
Xuanli He,
Yu Zhao,
Xiaotang Du,
Mohammad Reza Ghasemi Madani,
Claire Barale,
Robert McHardy,
Joshua Harris,
Jean Kaddour,
Emile van Krieken,
Pasquale Minervini
Abstract:
Maybe not. We identify and analyse errors in the popular Massive Multitask Language Understanding (MMLU) benchmark. Even though MMLU is widely adopted, our analysis demonstrates numerous ground truth errors that obscure the true capabilities of LLMs. For example, we find that 57% of the analysed questions in the Virology subset contain errors. To address this issue, we introduce a comprehensive framework for identifying dataset errors using a novel error taxonomy. Then, we create MMLU-Redux, which is a subset of 3,000 manually re-annotated questions across 30 MMLU subjects. Using MMLU-Redux, we demonstrate significant discrepancies with the model performance metrics that were originally reported. Our results strongly advocate for revising MMLU's error-ridden questions to enhance its future utility and reliability as a benchmark. Therefore, we open up MMLU-Redux for additional annotation: https://huggingface.co/datasets/edinburgh-dawg/mmlu-redux.
Submitted 7 June, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
QGait: Toward Accurate Quantization for Gait Recognition with Binarized Input
Authors:
Senmao Tian,
Haoyu Gao,
Gangyi Hong,
Shuyun Wang,
JingJie Wang,
Xin Yu,
Shunli Zhang
Abstract:
Existing deep learning methods have made significant progress in gait recognition. Typically, appearance-based models binarize inputs into silhouette sequences. However, mainstream quantization methods prioritize minimizing task loss over quantization error, which is detrimental to gait recognition with binarized inputs. Minor variations in silhouette sequences can be diminished in the network's intermediate layers due to the accumulation of quantization errors. To address this, we propose a differentiable soft quantizer, which better simulates the gradient of the round function during backpropagation. This enables the network to learn from subtle input perturbations. However, our theoretical analysis and empirical studies reveal that directly applying the soft quantizer can hinder network convergence. We further refine the training strategy to ensure convergence while simulating quantization errors. Additionally, we visualize the distribution of outputs from different samples in the feature space and observe significant changes compared to the full precision network, which harms performance. Based on this, we propose an Inter-class Distance-guided Distillation (IDD) strategy to preserve the relative distance between the embeddings of samples with different labels. Extensive experiments validate the effectiveness of our approach, demonstrating state-of-the-art accuracy across various settings and datasets. The code will be made publicly available.
Submitted 22 May, 2024;
originally announced May 2024.
-
The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models
Authors:
Giwon Hong,
Aryo Pradipta Gema,
Rohit Saxena,
Xiaotang Du,
Ping Nie,
Yu Zhao,
Laura Perez-Beltrachini,
Max Ryabinin,
Xuanli He,
Clémentine Fourrier,
Pasquale Minervini
Abstract:
Large Language Models (LLMs) have transformed the Natural Language Processing (NLP) landscape with their remarkable ability to understand and generate human-like text. However, these models are prone to "hallucinations" -- outputs that do not align with factual reality or the input context. This paper introduces the Hallucinations Leaderboard, an open initiative to quantitatively measure and compare the tendency of each model to produce hallucinations. The leaderboard uses a comprehensive set of benchmarks focusing on different aspects of hallucinations, such as factuality and faithfulness, across various tasks, including question-answering, summarisation, and reading comprehension. Our analysis provides insights into the performance of different models, guiding researchers and practitioners in choosing the most reliable models for their applications.
Submitted 17 April, 2024; v1 submitted 8 April, 2024;
originally announced April 2024.
-
Edinburgh Clinical NLP at SemEval-2024 Task 2: Fine-tune your model unless you have access to GPT-4
Authors:
Aryo Pradipta Gema,
Giwon Hong,
Pasquale Minervini,
Luke Daines,
Beatrice Alex
Abstract:
The NLI4CT task assesses Natural Language Inference systems in predicting whether hypotheses entail or contradict evidence from Clinical Trial Reports. In this study, we evaluate various Large Language Models (LLMs) with multiple strategies, including Chain-of-Thought, In-Context Learning, and Parameter-Efficient Fine-Tuning (PEFT). We propose a PEFT method to improve the consistency of LLMs by merging adapters that were fine-tuned separately using triplet and language modelling objectives. We found that merging the two PEFT adapters improves the F1 score (+0.0346) and consistency (+0.152) of the LLMs. However, our novel methods did not produce more accurate results than GPT-4 in terms of faithfulness and consistency. Averaging the three metrics, GPT-4 ranks joint-first in the competition with 0.8328. Finally, our contamination analysis with GPT-4 indicates that there was no test data leakage.
Submitted 30 March, 2024;
originally announced April 2024.
-
Closing the AI generalization gap by adjusting for dermatology condition distribution differences across clinical settings
Authors:
Rajeev V. Rikhye,
Aaron Loh,
Grace Eunhae Hong,
Preeti Singh,
Margaret Ann Smith,
Vijaytha Muralidharan,
Doris Wong,
Rory Sayres,
Michelle Phung,
Nicolas Betancourt,
Bradley Fong,
Rachna Sahasrabudhe,
Khoban Nasim,
Alec Eschholz,
Basil Mustafa,
Jan Freyberg,
Terry Spitz,
Yossi Matias,
Greg S. Corrado,
Katherine Chou,
Dale R. Webster,
Peggy Bui,
Yuan Liu,
Yun Liu,
Justin Ko
, et al. (1 additional author not shown)
Abstract:
Recently, there has been great progress in the ability of artificial intelligence (AI) algorithms to classify dermatological conditions from clinical photographs. However, little is known about the robustness of these algorithms in real-world settings where several factors can lead to a loss of generalizability. Understanding and overcoming these limitations will permit the development of generalizable AI that can aid in the diagnosis of skin conditions across a variety of clinical settings. In this retrospective study, we demonstrate that differences in skin condition distribution, rather than in demographics or image capture mode are the main source of errors when an AI algorithm is evaluated on data from a previously unseen source. We demonstrate a series of steps to close this generalization gap, requiring progressively more information about the new source, ranging from the condition distribution to training data enriched for data less frequently seen during training. Our results also suggest comparable performance from end-to-end fine tuning versus fine tuning solely the classification layer on top of a frozen embedding model. Our approach can inform the adaptation of AI algorithms to new settings, based on the information and resources available.
Submitted 23 February, 2024;
originally announced February 2024.
-
Steady-State Analysis and Online Learning for Queues with Hawkes Arrivals
Authors:
Xinyun Chen,
Guiyu Hong
Abstract:
We investigate the long-run behavior of single-server queues with Hawkes arrivals and general service distributions, together with related optimization problems. In detail, utilizing novel coupling techniques, we establish finite moment bounds for the stationary distribution of the workload and busy period processes. In addition, we show that these queueing processes converge exponentially fast to their stationary distribution. Based on these theoretical results, we develop an efficient numerical algorithm to solve the optimal staffing problem for the Hawkes queues in a data-driven manner. Numerical results indicate a sharp difference in staffing for Hawkes queues, compared to the classic GI/GI/1 model, especially in the heavy-traffic regime.
Submitted 13 November, 2023; v1 submitted 5 November, 2023;
originally announced November 2023.
-
Disposable Transfer Learning for Selective Source Task Unlearning
Authors:
Seunghee Koh,
Hyounguk Shon,
Janghyeon Lee,
Hyeong Gwon Hong,
Junmo Kim
Abstract:
Transfer learning is widely used for training deep neural networks (DNN) for building a powerful representation. Even after the pre-trained model is adapted for the target task, the representation performance of the feature extractor is retained to some extent. As the performance of the pre-trained model can be considered the private property of the owner, it is natural to seek the exclusive right of the generalized performance of the pre-trained weight. To address this issue, we suggest a new paradigm of transfer learning called disposable transfer learning (DTL), which disposes of only the source task without degrading the performance of the target task. To achieve knowledge disposal, we propose a novel loss named Gradient Collision loss (GC loss). GC loss selectively unlearns the source knowledge by leading the gradient vectors of mini-batches in different directions. Whether the model successfully unlearns the source task is measured by piggyback learning accuracy (PL accuracy). PL accuracy estimates the vulnerability of knowledge leakage by retraining the scrubbed model on a subset of source data or new downstream data. We demonstrate that GC loss is an effective approach to the DTL problem by showing that the model trained with GC loss retains the performance on the target task with a significantly reduced PL accuracy.
Submitted 19 August, 2023;
originally announced August 2023.
-
Varying-coefficients for regional quantile via KNN-based LASSO with applications to health outcome study
Authors:
Seyoung Park,
Eun Ryung Lee,
Hyokyoung G. Hong
Abstract:
Health outcomes, such as body mass index and cholesterol levels, are known to be dependent on age and exhibit varying effects with their associated risk factors. In this paper, we propose a novel framework for dynamic modeling of the associations between health outcomes and risk factors using varying-coefficients (VC) regional quantile regression via K-nearest neighbors (KNN) fused Lasso, which captures the time-varying effects of age. The proposed method has strong theoretical properties, including a tight estimation error bound and the ability to detect exact clustered patterns under certain regularity conditions. To efficiently solve the resulting optimization problem, we develop an alternating direction method of multipliers (ADMM) algorithm. Our empirical results demonstrate the efficacy of the proposed method in capturing the complex age-dependent associations between health outcomes and their risk factors.
Submitted 8 August, 2023;
originally announced August 2023.
-
Localization using Multi-Focal Spatial Attention for Masked Face Recognition
Authors:
Yooshin Cho,
Hanbyel Cho,
Hyeong Gwon Hong,
Jaesung Ahn,
Dongmin Cho,
JungWoo Chang,
Junmo Kim
Abstract:
Since the beginning of the worldwide COVID-19 pandemic, facial masks have been recommended to limit the spread of the disease. However, these masks hide certain facial attributes. Hence, it has become difficult for existing face recognition systems to perform identity verification on masked faces. In this context, it is necessary to develop Masked Face Recognition (MFR) for contactless biometric recognition systems. Thus, in this paper, we propose Complementary Attention Learning and Multi-Focal Spatial Attention, which precisely removes the masked region by training complementary spatial attention to focus on two distinct regions: masked regions and backgrounds. In our method, standard spatial attention and networks focus on unmasked regions and extract mask-invariant features while minimizing the loss of conventional Face Recognition (FR) performance. For conventional FR, we evaluate the performance on the IJB-C, Age-DB, CALFW, and CPLFW datasets. We evaluate the MFR performance on the ICCV2021-MFR/Insightface track, and demonstrate improved performance on both the MFR and FR datasets. Additionally, we empirically verify that the spatial attention of the proposed method is more precisely activated in unmasked regions.
Submitted 7 September, 2023; v1 submitted 3 May, 2023;
originally announced May 2023.
-
Why So Gullible? Enhancing the Robustness of Retrieval-Augmented Models against Counterfactual Noise
Authors:
Giwon Hong,
Jeonghwan Kim,
Junmo Kang,
Sung-Hyon Myaeng,
Joyce Jiyoung Whang
Abstract:
Most existing retrieval-augmented language models (LMs) assume a naive dichotomy within a retrieved document set: query-relevance and irrelevance. Our work investigates a more challenging scenario in which even the "relevant" documents may contain misleading or incorrect information, causing conflict among the retrieved documents and thereby negatively influencing model decisions as noise. We observe that existing LMs are highly brittle to the presence of conflicting information in both the fine-tuning and in-context few-shot learning scenarios. We propose approaches for handling knowledge conflicts among retrieved documents by explicitly fine-tuning a discriminator or prompting GPT-3.5 to elicit its discriminative capability. Our empirical results on open-domain QA show that these approaches significantly enhance model robustness. We also provide our findings on incorporating the fine-tuned discriminator's decision into the in-context learning process, proposing a way to exploit the benefits of two disparate learning schemes. Alongside our findings, we provide MacNoise, a machine-generated, conflict-induced dataset to further encourage research in this direction.
Submitted 13 March, 2024; v1 submitted 2 May, 2023;
originally announced May 2023.
-
Towards Understanding the Effect of Pretraining Label Granularity
Authors:
Guan Zhe Hong,
Yin Cui,
Ariel Fuxman,
Stanley H. Chan,
Enming Luo
Abstract:
In this paper, we study how the granularity of pretraining labels affects the generalization of deep neural networks in image classification tasks. We focus on the "fine-to-coarse" transfer learning setting, where the pretraining label space is more fine-grained than that of the target problem. Empirically, we show that pretraining on the leaf labels of ImageNet21k produces better transfer results on ImageNet1k than pretraining on other coarser granularity levels, which supports the common practice used in the community. Theoretically, we explain the benefit of fine-grained pretraining by proving that, for a data distribution satisfying certain hierarchy conditions, 1) coarse-grained pretraining only allows a neural network to learn the "common" or "easy-to-learn" features well, while 2) fine-grained pretraining helps the network learn the "rarer" or "fine-grained" features in addition to the common ones, thus improving its accuracy on hard downstream test samples in which common features are missing or weak in strength. Furthermore, we perform comprehensive experiments using the label hierarchies of iNaturalist 2021 and observe that the following conditions, in addition to proper choice of label granularity, enable the transfer to work well in practice: 1) the pretraining dataset needs to have a meaningful label hierarchy, and 2) the pretraining and target label functions need to align well.
Submitted 5 October, 2023; v1 submitted 29 March, 2023;
originally announced March 2023.
-
Online Learning and Optimization for Queues with Unknown Demand Curve and Service Distribution
Authors:
Xinyun Chen,
Yunan Liu,
Guiyu Hong
Abstract:
We investigate an optimization problem in a queueing system where the service provider selects the optimal service fee p and service capacity μ to maximize the cumulative expected profit (the service revenue minus the capacity cost and delay penalty). The conventional predict-then-optimize (PTO) approach takes two steps: first, it estimates the model parameters (e.g., arrival rate and service-time distribution) from data; second, it optimizes a model based on the estimated parameters. A major drawback of PTO is that its solution accuracy can often be highly sensitive to the parameter estimation errors because PTO is unable to properly link these errors (step 1) to the quality of the optimized solutions (step 2). To remedy this issue, we develop an online learning framework that automatically incorporates the aforementioned parameter estimation errors in the solution prescription process; it is an integrated method that can "learn" the optimal solution without needing to set up the parameter estimation as a separate step as in PTO. Effectiveness of our online learning approach is substantiated by (i) theoretical results including the algorithm convergence and analysis of the regret ("cost" to pay over time for the algorithm to learn the optimal policy), and (ii) engineering confirmation via simulation experiments of a variety of representative examples. We also provide careful comparisons for PTO and the online learning method.
Submitted 6 March, 2023;
originally announced March 2023.
-
Data Poisoning Attack Aiming the Vulnerability of Continual Learning
Authors:
Gyojin Han,
Jaehyun Choi,
Hyeong Gwon Hong,
Junmo Kim
Abstract:
Generally, regularization-based continual learning models limit access to the previous task data to imitate the real-world constraints related to memory and privacy. However, this introduces a problem in these models by not being able to track the performance on each task. In essence, current continual learning methods are susceptible to attacks on previous tasks. We demonstrate the vulnerability of regularization-based continual learning methods by presenting a simple task-specific data poisoning attack that can be used in the learning process of a new task. Training data generated by the proposed attack causes performance degradation on a specific task targeted by the attacker. We experiment with the attack on the two representative regularization-based continual learning methods, Elastic Weight Consolidation (EWC) and Synaptic Intelligence (SI), trained with variants of MNIST dataset. The experiment results justify the vulnerability proposed in this paper and demonstrate the importance of developing continual learning models that are robust to adversarial attacks.
Submitted 3 July, 2023; v1 submitted 28 November, 2022;
originally announced November 2022.
-
The Kriston AI System for the VoxCeleb Speaker Recognition Challenge 2022
Authors:
Qutang Cai,
Guoqiang Hong,
Zhijian Ye,
Ximin Li,
Haizhou Li
Abstract:
This technical report describes our system for tracks 1, 2 and 4 of the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22). By combining several ResNet variants, our submission for track 1 attained a minDCF of 0.090 with EER 1.401%. By further incorporating three fine-tuned pre-trained models, our submission for track 2 achieved a minDCF of 0.072 with EER 1.119%. For track 4, our system consisted of voice activity detection (VAD), speaker embedding extraction, agglomerative hierarchical clustering (AHC) followed by a re-clustering step based on a Bayesian hidden Markov model, and overlapped speech detection and handling. Our submission for track 4 achieved a diarisation error rate (DER) of 4.86%. All submissions ranked second in their corresponding tracks.
Submitted 23 September, 2022;
originally announced September 2022.
-
Rethinking Efficacy of Softmax for Lightweight Non-Local Neural Networks
Authors:
Yooshin Cho,
Youngsoo Kim,
Hanbyel Cho,
Jaesung Ahn,
Hyeong Gwon Hong,
Junmo Kim
Abstract:
The non-local (NL) block is a popular module that demonstrates the capability to model global contexts. However, the NL block generally has heavy computation and memory costs, so it is impractical to apply the block to high-resolution feature maps. In this paper, to investigate the efficacy of the NL block, we empirically analyze whether the magnitude and direction of input feature vectors properly affect the attention between vectors. The results show the inefficacy of the softmax operation, which is generally used to normalize the attention map of the NL block. Attention maps normalized with the softmax operation rely heavily upon the magnitude of key vectors, and performance degrades if the magnitude information is removed. By replacing the softmax operation with a scaling factor, we demonstrate improved performance on CIFAR-10, CIFAR-100, and Tiny-ImageNet. In addition, our method shows robustness to embedding channel reduction and embedding weight initialization. Notably, our method makes multi-head attention employable without additional computational cost.
Submitted 27 July, 2022;
originally announced July 2022.
-
Ultra-High Dimensional Sparse Representations with Binarization for Efficient Text Retrieval
Authors:
Kyoung-Rok Jang,
Junmo Kang,
Giwon Hong,
Sung-Hyon Myaeng,
Joohee Park,
Taewon Yoon,
Heecheol Seo
Abstract:
The semantic matching capabilities of neural information retrieval can ameliorate synonymy and polysemy problems of symbolic approaches. However, neural models' dense representations are more suitable for re-ranking, due to their inefficiency. Sparse representations, either in symbolic or latent form, are more efficient with an inverted index. Taking the merits of the sparse and dense representations, we propose an ultra-high dimensional (UHD) representation scheme equipped with directly controllable sparsity. UHD's large capacity and minimal noise and interference among the dimensions allow for binarized representations, which are highly efficient for storage and search. Also proposed is a bucketing method, where the embeddings from multiple layers of BERT are selected/merged to represent diverse linguistic aspects. We test our models with MS MARCO and TREC CAR, showing that our models outperform other sparse models.
Submitted 15 October, 2021; v1 submitted 14 April, 2021;
originally announced April 2021.
-
Student-Teacher Learning from Clean Inputs to Noisy Inputs
Authors:
Guanzhe Hong,
Zhiyuan Mao,
Xiaojun Lin,
Stanley H. Chan
Abstract:
Feature-based student-teacher learning, a training method that encourages the student's hidden features to mimic those of the teacher network, is empirically successful in transferring the knowledge from a pre-trained teacher network to the student network. Furthermore, recent empirical results demonstrate that the teacher's features can boost the student network's generalization even when the student's input sample is corrupted by noise. However, there is a lack of theoretical insights into why and when this method of transferring knowledge can be successful between such heterogeneous tasks. We analyze this method theoretically using deep linear networks, and experimentally using nonlinear networks. We identify three vital factors to the success of the method: (1) whether the student is trained to zero training loss; (2) how knowledgeable the teacher is on the clean-input problem; (3) how the teacher decomposes its knowledge in its hidden features. Lack of proper control in any of the three factors leads to failure of the student-teacher learning method.
Submitted 12 March, 2021;
originally announced March 2021.
-
Stay Connected, Leave no Trace: Enhancing Security and Privacy in WiFi via Obfuscating Radiometric Fingerprints
Authors:
Luis F. Abanto-Leon,
Andreas Baeuml,
Gek Hong Sim,
Matthias Hollick,
Arash Asadi
Abstract:
The intrinsic hardware imperfection of WiFi chipsets manifests itself in the transmitted signal, leading to a unique radiometric fingerprint. This fingerprint can be used as an additional means of authentication to enhance security. In fact, recent works propose practical fingerprinting solutions that can be readily implemented in commercial-off-the-shelf devices. In this paper, we prove analytically and experimentally that these solutions are highly vulnerable to impersonation attacks. We also demonstrate that such a unique device-based signature can be abused to violate privacy by tracking the user device, and, as of today, users do not have any means to prevent such privacy attacks other than turning off the device.
We propose RF-Veil, a radiometric fingerprinting solution that not only is robust against impersonation attacks but also protects user privacy by obfuscating the radiometric fingerprint of the transmitter for non-legitimate receivers. Specifically, we introduce a randomized pattern of phase errors to the transmitted signal such that only the intended receiver can extract the original fingerprint of the transmitter. In a series of experiments and analyses, we expose the vulnerability of adopting naive randomization to statistical attacks and introduce countermeasures. Finally, we show the efficacy of RF-Veil experimentally in protecting user privacy and enhancing security. More importantly, our proposed solution allows communicating with other devices, which do not employ RF-Veil.
Submitted 27 November, 2020; v1 submitted 25 November, 2020;
originally announced November 2020.
-
Continual Learning with Extended Kronecker-factored Approximate Curvature
Authors:
Janghyeon Lee,
Hyeong Gwon Hong,
Donggyu Joo,
Junmo Kim
Abstract:
We propose a quadratic penalty method for continual learning of neural networks that contain batch normalization (BN) layers. The Hessian of a loss function represents the curvature of the quadratic penalty function, and a Kronecker-factored approximate curvature (K-FAC) is used widely to practically compute the Hessian of a neural network. However, the approximation is not valid if there is dependence between examples, typically caused by BN layers in deep network architectures. We extend the K-FAC method so that the inter-example relations are taken into account and the Hessian of deep neural networks can be properly approximated under practical assumptions. We also propose a method of weight merging and reparameterization to properly handle statistical parameters of BN, which plays a critical role for continual learning with BN, and a method that selects hyperparameters without source task data. Our method shows better performance than baselines in the permuted MNIST task with BN layers and in sequential learning from the ImageNet classification task to fine-grained classification tasks with ResNet-50, without any explicit or implicit use of source task data for hyperparameter selection.
Submitted 16 April, 2020;
originally announced April 2020.
-
Residual Continual Learning
Authors:
Janghyeon Lee,
Donggyu Joo,
Hyeong Gwon Hong,
Junmo Kim
Abstract:
We propose a novel continual learning method called Residual Continual Learning (ResCL). Our method can prevent the catastrophic forgetting phenomenon in sequential learning of multiple tasks, without any source task information except the original network. ResCL reparameterizes network parameters by linearly combining each layer of the original network and a fine-tuned network; therefore, the size of the network does not increase at all. To apply the proposed method to general convolutional neural networks, the effects of batch normalization layers are also considered. By utilizing residual-learning-like reparameterization and a special weight decay loss, the trade-off between source and target performance is effectively controlled. The proposed method exhibits state-of-the-art performance in various continual learning scenarios.
Submitted 17 February, 2020;
originally announced February 2020.
-
Fairness-Aware Hybrid Precoding for mmWave NOMA Unicast/Multicast Transmissions in Industrial IoT
Authors:
Luis F. Abanto-Leon,
Gek Hong Sim
Abstract:
This paper investigates dual-layer non-orthogonally superimposed transmissions for industrial internet of things (IoT) millimeter-wave communications. Essentially, the overlayer is a ubiquitous multicast signal devised to serve all the devices in coverage with a common message, i.e., critical control packet. The underlayer is a composite signal that consists of private unicast messages. Due to safety implications, it is critical that all devices can decode the multicast information. To ensure this requirement, we jointly optimize the hybrid precoder, analog combiners, power allocation, and fairness. Specifically, we incorporate a power splitting constraint between the two overlaid signals and enforce supplementary per-device constraints to guarantee multicast fairness. Performance is evaluated in terms of the spectral efficiency, multicast fairness, and bit error rate, thus corroborating the feasibility of our proposed scheme.
Submitted 27 February, 2020; v1 submitted 3 February, 2020;
originally announced February 2020.
-
Learning-based Max-Min Fair Hybrid Precoding for mmWave Multicasting
Authors:
Luis F. Abanto-Leon,
Gek Hong Sim
Abstract:
This paper investigates the joint design of a hybrid transmit precoder and analog receive combiners for single-group multicasting in millimeter-wave systems. We propose LB-GDM, a low-complexity learning-based approach that leverages gradient descent with momentum and alternating optimization to design (i) the digital and analog constituents of a hybrid transmitter and (ii) the analog combiners of each receiver. In addition, we extend our proposed approach to design fully-digital precoders. We show through numerical evaluation that implementing LB-GDM in either hybrid or digital precoders attains superior performance compared to competing designs based on semidefinite relaxation. Specifically, in terms of minimum signal-to-noise ratio, we report a remarkable improvement with gains of up to 105% and 101% for the fully-digital and hybrid precoders, respectively.
Submitted 27 February, 2020; v1 submitted 3 February, 2020;
originally announced February 2020.
-
EDAS: Efficient and Differentiable Architecture Search
Authors:
Hyeong Gwon Hong,
Pyunghwan Ahn,
Junmo Kim
Abstract:
Transferrable neural architecture search can be viewed as a binary optimization problem where a single optimal path should be selected among candidate paths in each edge within the repeated cell block of the directed acyclic graph form. Recently, the field of differentiable architecture search attempts to relax the search problem continuously using a one-shot network that combines all the candidate paths in the search space. However, when the one-shot network is pruned to the model in the discrete architecture space by the derivation algorithm, performance is significantly degraded to an almost random estimator. To reduce the quantization error from the heavy use of relaxation, we only sample a single edge to relax the corresponding variable and clamp variables in the other edges to zero or one. By this method, there is no performance drop after pruning the one-shot network by the derivation algorithm, due to the preservation of the discrete nature of optimization variables during the search. Furthermore, the minimization of relaxation degree allows searching in a deeper network to discover better performance with remarkable search cost reduction (0.125 GPU days) compared to previous methods. By adding several regularization methods that help explore within the search space, we could obtain a network with notable performance on CIFAR-10, CIFAR-100, and ImageNet.
Submitted 4 December, 2019; v1 submitted 3 December, 2019;
originally announced December 2019.
-
6G Massive Radio Access Networks: Key Issues, Technologies, and Future Challenges
Authors:
Ying Loong Lee,
Donghong Qin,
Li-Chun Wang,
Gek Hong Sim
Abstract:
Driven by the emerging use cases in massive access future networks, there is a need for technological advancements and evolutions for wireless communications beyond the fifth-generation (5G) networks. In particular, we envisage the upcoming sixth-generation (6G) networks to consist of numerous devices demanding extremely high-performance interconnections even under strenuous scenarios such as diverse mobility, extreme density, and dynamic environment. To cater for such a demand, investigation on flexible and sustainable radio access network (RAN) techniques capable of supporting highly diverse requirements and massive connectivity is of utmost importance. To this end, this paper first outlines the key driving applications for 6G, including smart city and factory, which trigger the transformation of existing RAN techniques. We then examine and provide in-depth discussions on several critical performance requirements (i.e., the level of flexibility, the support for massive interconnectivity, and energy efficiency), issues, enabling technologies, and challenges in designing 6G massive RANs. We conclude the article by providing several artificial-intelligence-based approaches to overcome future challenges.
Submitted 23 October, 2019;
originally announced October 2019.
-
Hybrid Precoding for Multi-Group Multicasting in mmWave Systems
Authors:
Luis F. Abanto-Leon,
Matthias Hollick,
Gek Hong Sim
Abstract:
Multicast beamforming is known to improve spectral efficiency. However, its benefits and challenges for hybrid precoder design in millimeter-wave (mmWave) systems remain understudied. To this end, this paper investigates the first joint design of hybrid transmit precoders (with an arbitrary number of finite-resolution phase shifts) and receive combiners for mmWave multi-group multicasting. Our proposed design leverages semidefinite relaxation (SDR), alternating optimization and Cholesky matrix factorization to sequentially optimize the digital/analog precoders at the transmitter and the combiners at each receiver. By considering receivers with multiple-antenna architecture, our design remarkably improves the overall system performance. Specifically, with only two receive antennas the average transmit power per received message improves by $16.8\%$ while the successful information reception is boosted by $60\%$. We demonstrate by means of extensive simulations that our hybrid precoder design performs very close to its fully-digital counterpart even under challenging scenarios (i.e., when co-located users belong to distinct multicast groups).
Submitted 3 February, 2020; v1 submitted 7 August, 2019;
originally announced August 2019.
-
Joint Relaying and Spatial Sharing Multicast Scheduling for mmWave Networks
Authors:
Gek Hong Sim,
Mahdi Mousavi,
Lin Wang,
Anja Klein,
Matthias Hollick
Abstract:
Millimeter-wave (mmWave) communication plays a vital role in efficiently disseminating large volumes of data in beyond-5G networks. Unfortunately, the directionality of mmWave communication significantly complicates efficient data dissemination, particularly in multicasting, which is gaining more and more importance in emerging applications (e.g., V2X, public safety). While multicasting for systems operating at lower frequencies (i.e., sub-6GHz) has been extensively studied, existing solutions are sub-optimal for mmWave systems, as mmWave has significantly different propagation characteristics, i.e., using directional transmission to compensate for the high path loss and thus promoting spectrum sharing. In this paper, we propose novel multicast scheduling algorithms by jointly exploiting relaying and spatial sharing gains while aiming to minimize the multicast completion time. We first characterize the min-time mmWave multicasting problem with a comprehensive model and formulate it as an integer linear program (ILP). We further design a practical and scalable distributed algorithm named mmDiMu, based on gradually maximizing the transmission throughput over time. Finally, we carry out validation through extensive simulations at different scales, and the results show that mmDiMu significantly outperforms conventional algorithms with around 95% reduction in multicast completion time.
Submitted 30 July, 2019;
originally announced July 2019.
-
Rethinking Atmospheric Turbulence Mitigation
Authors:
Nicholas Chimitt,
Zhiyuan Mao,
Guanzhe Hong,
Stanley H. Chan
Abstract:
State-of-the-art atmospheric turbulence image restoration methods utilize standard image processing tools such as optical flow, lucky region and blind deconvolution to restore the images. While promising results have been reported over the past decade, many of the methods are agnostic to the physical model that generates the distortion. In this paper, we revisit the turbulence restoration problem by analyzing the reference frame generation and the blind deconvolution steps in a typical restoration pipeline. By leveraging tools in large deviation theory, we rigorously prove the minimum number of frames required to generate a reliable reference for both static and dynamic scenes. We discuss how a turbulence agnostic model can lead to potential flaws, and how to configure a simple spatial-temporal non-local weighted averaging method to generate references. For blind deconvolution, we present a new data-driven prior by analyzing the distributions of the point spread functions. We demonstrate how a simple prior can outperform state-of-the-art blind deconvolution methods.
Submitted 17 May, 2019;
originally announced May 2019.
-
A Channel Measurement Campaign for mmWave Communication in Industrial Settings
Authors:
Adrian Loch,
Cristina Cano,
Gek Hong Sim,
Arash Asadi,
Xavier Vilajosana
Abstract:
Industry 4.0 relies heavily on wireless technologies. Energy efficiency and device cost have played a significant role in the initial design of such wireless systems for industry automation. However, high reliability, high throughput, and low latency are also key for certain sectors such as the manufacturing industry. In this sense, existing wireless solutions for industrial settings are limited. Emerging technologies such as millimeter-wave (mmWave) communication are highly promising to address this bottleneck. Still, the propagation characteristics at such high frequencies in harsh industrial settings are not well understood. Related work in this area is limited to isolated measurements in specific scenarios. In this work, we carry out an extensive measurement campaign in highly representative industrial environments. Most importantly, we derive the statistical distributions of the channel parameters of widely accepted mmWave channel models that fit these environments. This is a highly valuable contribution, since researchers in this field can use our empirical model to understand the performance of their mmWave systems in typical industrial settings. Beyond analyzing and discussing our insights, with this paper we also share our extensive dataset with the research community.
Submitted 25 March, 2019;
originally announced March 2019.
-
Covariance-Insured Screening
Authors:
Kevin He,
Jian Kang,
Hyokyoung Grace Hong,
Ji Zhu,
Yanming Li,
Huazhen Lin,
Han Xu,
Yi Li
Abstract:
Modern bio-technologies have produced a vast amount of high-throughput data with the number of predictors far greater than the sample size. In order to identify more novel biomarkers and understand biological mechanisms, it is vital to detect signals weakly associated with outcomes among ultrahigh-dimensional predictors. However, existing screening methods, which typically ignore correlation information, are likely to miss these weak signals. By incorporating the inter-feature dependence, we propose a covariance-insured screening methodology to identify predictors that are jointly informative but only marginally weakly associated with outcomes. The validity of the method is examined via extensive simulations and real data studies for selecting potential genetic factors related to the onset of cancer.
Submitted 16 May, 2018;
originally announced May 2018.
-
A Multi-State Diagnosis and Prognosis Framework with Feature Learning for Tool Condition Monitoring
Authors:
Chong Zhang,
Geok Soon Hong,
Jun-Hong Zhou,
Kay Chen Tan,
Haizhou Li,
Huan Xu,
Jihoon Hong,
Hian-Leng Chan
Abstract:
In this paper, a multi-state diagnosis and prognosis (MDP) framework is proposed for tool condition monitoring via a deep belief network based multi-state approach (DBNMS). For fault diagnosis, a cost-sensitive deep belief network (namely ECS-DBN) is applied to deal with the imbalanced data problem for tool state estimation. An appropriate prognostic degradation model is then applied for tool wear estimation based on the different tool states. The proposed framework has the advantage of automatic feature representation learning and shows better performance in accuracy and robustness. The effectiveness of the proposed DBNMS is validated using a real-world dataset obtained from the gun drilling process. This dataset contains a large amount of measured signals involving different tool geometries under various operating conditions. The DBNMS is examined for both the tool state estimation and tool wear estimation tasks. In the experimental studies, the prediction results are evaluated and compared with popular machine learning approaches, which show the superior performance of the proposed DBNMS approach.
Submitted 30 April, 2018;
originally announced May 2018.
-
A Cost-Sensitive Deep Belief Network for Imbalanced Classification
Authors:
Chong Zhang,
Kay Chen Tan,
Haizhou Li,
Geok Soon Hong
Abstract:
Imbalanced data with a skewed class distribution are common in many real-world applications. Deep Belief Network (DBN) is a machine learning technique that is effective in classification tasks. However, conventional DBN does not work well for imbalanced data classification because it assumes equal costs for each class. To deal with this problem, cost-sensitive approaches assign different misclassification costs for different classes without disrupting the true data sample distributions. However, due to lack of prior knowledge, the misclassification costs are usually unknown and hard to choose in practice. Moreover, it has not been well studied how cost-sensitive learning could improve DBN performance on imbalanced data problems. This paper proposes an evolutionary cost-sensitive deep belief network (ECS-DBN) for imbalanced classification. ECS-DBN uses adaptive differential evolution to optimize the misclassification costs based on training data, which presents an effective approach to incorporating the evaluation measure (i.e., G-mean) into the objective function. We first optimize the misclassification costs, then apply them to the deep belief network. Adaptive differential evolution is used as the optimization algorithm; it automatically updates its corresponding parameters without the need for prior domain knowledge. The experiments have shown that the proposed approach consistently outperforms the state-of-the-art on both benchmark datasets and a real-world dataset for fault diagnosis in tool condition monitoring.
Submitted 5 May, 2018; v1 submitted 28 April, 2018;
originally announced April 2018.
-
Analysis of the Game-Theoretic Modeling of Backscatter Wireless Sensor Networks under Smart Interference
Authors:
Seung Gwan Hong,
Yu Min Hwang,
Sun Yui Lee,
Yoan Shin,
Dong In Kim,
Jin Young Kim
Abstract:
In this paper, we study an interference avoidance scenario in the presence of a smart interferer which can rapidly observe the transmit power of a backscatter wireless sensor network (WSN) and effectively interrupt backscatter signals. We consider power control with sub-channel allocation to avoid interference attacks and a time-switching ratio for backscattering and RF energy harvesting in backscatter WSNs. We formulate the problem based on Stackelberg game theory and compute the optimal transmit power, time-switching ratio, and sub-channel allocation parameter to maximize a utility function against the smart interference. We propose two algorithms for the utility maximization using Lagrangian dual decomposition for the backscatter WSN and the smart interference to prove the existence of the Stackelberg equilibrium. Numerical results show that the proposed algorithms effectively maximize the utility, compared to that of the algorithm based on the Nash game, so as to overcome smart interference in backscatter communications.
Submitted 21 December, 2017;
originally announced December 2017.
-
A Multi-Bit Neuromorphic Weight Cell using Ferroelectric FETs, suitable for SoC Integration
Authors:
Borna Obradovic,
Titash Rakshit,
Ryan Hatcher,
Jorge Kittl,
Rwik Sengupta,
Joon Goo Hong,
Mark S. Rodder
Abstract:
A multi-bit digital weight cell for high-performance, inference-only non-GPU-like neuromorphic accelerators is presented. The cell is designed with simplicity of peripheral circuitry in mind. Non-volatile storage of weights, which eliminates the need for DRAM access, is based on FeFETs and is purely digital. The Multiply-and-Accumulate operation is performed using passive resistors, gated by FeFETs. The resulting weight cell offers a high degree of linearity and a large ON/OFF ratio. The key performance tradeoffs are investigated, and the device requirements are elucidated.
Submitted 22 October, 2017;
originally announced October 2017.