Measuring Neural Net Robustness with Constraints
[article]
2017
arXiv
pre-print
We propose metrics for measuring the robustness of a neural net and devise a novel algorithm for approximating these metrics based on an encoding of robustness as a linear program. ...
We show how our metrics can be used to evaluate the robustness of deep neural nets with experiments on the MNIST and CIFAR-10 datasets. ...
The aim of our paper is to provide metrics for evaluating robustness, and to demonstrate the importance of using such impartial measures to compare robustness. ...
arXiv:1605.07262v2
fatcat:fdx3sqki6reahmk25xdsu4vr3m
Investigating the Corruption Robustness of Image Classifiers with Random Lp-norm Corruptions
[article]
2024
arXiv
pre-print
We evaluate the model robustness against imperceptible random p-norm corruptions and propose a novel robustness metric. ...
We empirically investigate whether robustness transfers across different p-norms and derive conclusions on which p-norm corruptions a model should be trained and evaluated. ...
corruptions for calculating mCE Lp and iCE metrics at test time. ...
arXiv:2305.05400v4
fatcat:335enfgtdrhqrnntmbs6u4ltie
Scalable and Robust Self-Learning for Skill Routing in Large-Scale Conversational AI Systems
[article]
2022
arXiv
pre-print
To enable such robust frequent model updates, we suggest a simple and effective approach that ensures controlled policy updates for individual domains, followed by an off-policy evaluation for making deployment ...
decisions without any need for lengthy A/B experimentation. ...
In the pre-deployment evaluation, a set of expert-defined guard-rails is applied to the evaluation results to ensure robust model updates, especially for business-critical cases. ...
arXiv:2204.07135v1
fatcat:lz4r3xv5brhzzgqigkjtuqhfti
Robust Speech Recognition Using Warped Dft-Based Cepstral Features In Clean And Multistyle Training
2014
Zenodo
Word error rate (WER) is used as an evaluation metric. ...
Results and Discussion: Word error rate (WER) is used as an evaluation metric for performance evaluation and comparison of the warped DFT-based cepstral feature extraction methods. ...
doi:10.5281/zenodo.54513
fatcat:ujwo2mtr5nfclgplxxiuelyha4
LLM-based Frameworks for Power Engineering from Routine to Novel Tasks
[article]
2023
arXiv
pre-print
Bard in terms of success rate, consistency, and robustness. ...
Here, we propose LLM-based frameworks for different programming tasks in power systems. ...
A set of evaluation metrics is designed to assess LLMs on a multi-metric scale, encompassing pre-knowledge in prompt, model assessment metrics and code assessment metrics in terms of success rate, consistency ...
arXiv:2305.11202v3
fatcat:qhgdn6lsnbggnnv3qstxwtkswq
Robustness Verification of Semantic Segmentation Neural Networks Using Relaxed Reachability
[chapter]
2021
Lecture Notes in Computer Science
of intersection-over-union (IoU), the typical performance evaluation measure for segmentation tasks. ...
Abstract: This paper introduces robustness verification for semantic segmentation neural networks (in short, semantic segmentation networks [SSNs]), building on and extending recent approaches for robustness ...
Additionally, we define and evaluate several metrics for robustness, as the robustness evaluation is more sophisticated for segmentation. ...
doi:10.1007/978-3-030-81685-8_12
fatcat:3co6pfnvxbahzjrgyq6rjuthrm
Evaluating the Effectiveness of Margin Parameter when Learning Knowledge Embedding Representation for Domain-specific Multi-relational Categorized Data
[article]
2019
arXiv
pre-print
We evaluate the effects of distinct values for the margin parameter focused on translational embedding representation models for multi-relational categorized data. ...
Finally, the correlation between link prediction and classification accuracy shows traditional validation protocol for embedding models is a weak metric to represent the quality of embedding representation ...
traditional LP metrics within the embedding training and evaluation protocols. ...
arXiv:1912.10264v1
fatcat:i5kbtznn5fcsnmb5uwwsehkbcq
A Closer Look at Model Adaptation using Feature Distortion and Simplicity Bias
[article]
2023
arXiv
pre-print
However, in this paper, we find when adaptation protocols (LP, FT, LP+FT) are also evaluated on a variety of safety objectives (e.g., calibration, robustness, etc.), a complementary perspective to feature ...
Going beyond conventional linear probing (LP) and fine tuning (FT) strategies, protocols that can effectively control feature distortion, i.e., the failure to update features orthogonal to the in-distribution ...
Acknowledgments: We thank Ekdeep Singh Lubana for several helpful discussions during the course of this project. This work was performed under the auspices of the U.S. ...
arXiv:2303.13500v1
fatcat:l53glz2a7vbsxd2nr4lvwzoipm
AutoFT: Learning an Objective for Robust Fine-Tuning
[article]
2024
arXiv
pre-print
We propose AutoFT, a data-driven approach for robust fine-tuning. Given a task, AutoFT searches for a fine-tuning procedure that enhances out-of-distribution (OOD) generalization. ...
Specifically, AutoFT uses bi-level optimization to search for an objective function and hyperparameters that maximize post-adaptation performance on a small OOD validation set. ...
Acknowledgements: We thank Kyle Hsu, Lukas Haas, and other members of the IRIS lab for helpful feedback and discussions. We also thank Sachin Goyal for help with ImageNet experiments. ...
arXiv:2401.10220v2
fatcat:vrxnqn7tmza27bts5xmoi5cbbe
Joint Lp-Norm and L2,1-Norm Constrained Graph Laplacian PCA for Robust Tumor Sample Clustering and Gene Network Module Discovery
2021
Frontiers in Genetics
In this article, a novel method named Lp-norm and L2,1-norm constrained graph Laplacian principal component analysis (PL21GPCA) based on traditional principal component analysis (PCA) is proposed for robust ...
Third, to retain the geometric structure of the data, we introduce the graph Laplacian regularization item to the PL21GPCA optimization model. ...
Acknowledgments: Thanks to my co-tutor Yong Xu, who is now a professor at Harbin Institute of Technology, Shenzhen, China. ...
doi:10.3389/fgene.2021.621317
pmid:33708239
pmcid:PMC7940841
fatcat:rjqiv52dwfazzgzyt7yxfflszm
Optimizing Dynamic Trajectories for Robustness to Disturbances Using Polytopic Projections
[article]
2020
arXiv
pre-print
This paper focuses on robustness to disturbance forces and uncertain payloads. We present a novel formulation to optimize the robustness of dynamic trajectories. ...
The non-trivial transcription proposed allows trajectory optimization frameworks to converge to highly robust dynamic solutions. ...
We would also like to thank the anonymous reviewers for their constructive comments. ...
arXiv:2003.00609v2
fatcat:pnpwhyammvbntlb7gbnfnqlfbi
Quantifying Degrees of Controllability in Temporal Networks with Uncertainty
2019
International Conference on Automated Planning and Scheduling
We introduce new methods for predicting the degrees of strong and dynamic controllability for uncontrollable networks. ...
In addition, we show empirically that both metrics are good predictors of the actual dispatch success rate. ...
Finally, we thank Jordan Abrahams, Susan Martonosi, and Mohamed Omar for offering their expertise in temporal networks, optimization, and convex geometry respectively. ...
dblp:conf/aips/AkmalALB19
fatcat:5w6cojo2mrgfxahlgwolwn6d6q
Robust Validation of Network Designs under Uncertain Demands and Failures
2017
Symposium on Networked Systems Design and Implementation
Acknowledgements: We thank our shepherd Nate Foster, and the reviewers for their insightful feedback. ...
Beyond networking, the complexity status of robust optimization formulations has been investigated and tractable formulations derived for various special cases [12, 14] . ...
We show that these techniques lead to tighter bounds on the validation problem than existing state-of-the-art approaches in robust optimization, a finding that has applications beyond networking. ...
dblp:conf/nsdi/ChangRT17
fatcat:b6bogltymnbbpesab3wt2pvcuu
Smaller Language Models are capable of selecting Instruction-Tuning Training Data for Larger Language Models
[article]
2024
arXiv
pre-print
Utilizing open-sourced OPT and Llama-2 models up to 13B in size, two publicly available instruction-tuning training datasets and evaluated by both automatic metrics & humans, our paper introduces a novel ...
Our experiments span different-sized models, revealing that this characteristic holds for models ranging from 1B (small) to 13B (large) in size. ...
The authors thank Shang Data Lab, Palash Chauhan, Amulya Bangalore, Shreyas Rajesh, Gautham Reddy, Sanjana Garg, Ethan Thai, Queso Tran, Rahul Mistry, Sandy La, and Sophia Do for their valuable contributions ...
arXiv:2402.10430v1
fatcat:npxudxgqejhlfecre7l36pmnta
Beyond BLEU: Training Neural Machine Translation with Semantic Similarity
2019
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
While most neural machine translation (NMT) systems are still trained using maximum likelihood estimation, recent work has demonstrated that optimizing systems to directly improve evaluation metrics such ...
In this paper, we introduce an alternative reward function for optimizing NMT systems that is based on recent work in semantic similarity. ...
Cer et al. (2010) compared several metrics to optimize for SMT, finding BLEU to be robust as a training metric and finding that the most effective and most stable metrics for training are not necessarily ...
doi:10.18653/v1/p19-1427
dblp:conf/acl/WietingBGN19
fatcat:ckylq5pjtbfhhpswp2lkgpplwe
Showing results 1 — 15 out of 9,165 results