Measuring Neural Net Robustness with Constraints
[article]
2017
arXiv
pre-print
We propose metrics for measuring the robustness of a neural net and devise a novel algorithm for approximating these metrics based on an encoding of robustness as a linear program. ...
We show how our metrics can be used to evaluate the robustness of deep neural nets with experiments on the MNIST and CIFAR-10 datasets. ...
The aim of our paper is to provide metrics for evaluating robustness, and to demonstrate the importance of using such impartial measures to compare robustness. ...
arXiv:1605.07262v2
fatcat:fdx3sqki6reahmk25xdsu4vr3m
Investigating the Corruption Robustness of Image Classifiers with Random Lp-norm Corruptions
[article]
2024
arXiv
pre-print
We evaluate the model robustness against imperceptible random p-norm corruptions and propose a novel robustness metric. ...
We empirically investigate whether robustness transfers across different p-norms and derive conclusions on which p-norm corruptions a model should be trained and evaluated. ...
corruptions for calculating mCE Lp and iCE metrics at test time. ...
arXiv:2305.05400v4
fatcat:335enfgtdrhqrnntmbs6u4ltie
Scalable and Robust Self-Learning for Skill Routing in Large-Scale Conversational AI Systems
[article]
2022
arXiv
pre-print
To enable such robust frequent model updates, we suggest a simple and effective approach that ensures controlled policy updates for individual domains, followed by an off-policy evaluation for making deployment ...
decisions without any need for lengthy A/B experimentation. ...
In the pre-deployment evaluation, a set of expert-defined guard-rails is applied to the evaluation results to ensure robust model updates, especially for business-critical cases. ...
arXiv:2204.07135v1
fatcat:lz4r3xv5brhzzgqigkjtuqhfti
Robust Speech Recognition Using Warped Dft-Based Cepstral Features In Clean And Multistyle Training
2014
Zenodo
Word error rate (WER) is used as an evaluation metric. ...
Results and Discussion: Word error rate (WER) is used as an evaluation metric for performance evaluation and comparison of the warped DFT-based cepstral feature extraction methods. ...
doi:10.5281/zenodo.54513
fatcat:ujwo2mtr5nfclgplxxiuelyha4
LLM-based Frameworks for Power Engineering from Routine to Novel Tasks
[article]
2023
arXiv
pre-print
Bard in terms of success rate, consistency, and robustness. ...
Here, we propose LLM-based frameworks for different programming tasks in power systems. ...
A set of evaluation metrics is designed to assess LLMs on a multi-metric scale, encompassing pre-knowledge in prompt, model assessment metrics and code assessment metrics in terms of success rate, consistency ...
arXiv:2305.11202v3
fatcat:qhgdn6lsnbggnnv3qstxwtkswq
Robustness Verification of Semantic Segmentation Neural Networks Using Relaxed Reachability
[chapter]
2021
Lecture Notes in Computer Science
of intersection-over-union (IoU), the typical performance evaluation measure for segmentation tasks. ...
Abstract: This paper introduces robustness verification for semantic segmentation neural networks (in short, semantic segmentation networks [SSNs]), building on and extending recent approaches for robustness ...
Additionally, we define and evaluate several metrics for robustness, as the robustness evaluation is more sophisticated for segmentation. ...
doi:10.1007/978-3-030-81685-8_12
fatcat:3co6pfnvxbahzjrgyq6rjuthrm
Evaluating the Effectiveness of Margin Parameter when Learning Knowledge Embedding Representation for Domain-specific Multi-relational Categorized Data
[article]
2019
arXiv
pre-print
We evaluate the effects of distinct values for the margin parameter focused on translational embedding representation models for multi-relational categorized data. ...
Finally, the correlation between link prediction and classification accuracy shows traditional validation protocol for embedding models is a weak metric to represent the quality of embedding representation ...
traditional LP metrics within the embedding training and evaluation protocols. ...
arXiv:1912.10264v1
fatcat:i5kbtznn5fcsnmb5uwwsehkbcq
A Closer Look at Model Adaptation using Feature Distortion and Simplicity Bias
[article]
2023
arXiv
pre-print
However, in this paper, we find when adaptation protocols (LP, FT, LP+FT) are also evaluated on a variety of safety objectives (e.g., calibration, robustness, etc.), a complementary perspective to feature ...
Going beyond conventional linear probing (LP) and fine tuning (FT) strategies, protocols that can effectively control feature distortion, i.e., the failure to update features orthogonal to the in-distribution ...
Acknowledgments: We thank Ekdeep Singh Lubana for several helpful discussions during the course of this project. This work was performed under the auspices of the U.S. ...
arXiv:2303.13500v1
fatcat:l53glz2a7vbsxd2nr4lvwzoipm
AutoFT: Learning an Objective for Robust Fine-Tuning
[article]
2024
arXiv
pre-print
We propose AutoFT, a data-driven approach for robust fine-tuning. Given a task, AutoFT searches for a fine-tuning procedure that enhances out-of-distribution (OOD) generalization. ...
Specifically, AutoFT uses bi-level optimization to search for an objective function and hyperparameters that maximize post-adaptation performance on a small OOD validation set. ...
Acknowledgements: We thank Kyle Hsu, Lukas Haas, and other members of the IRIS lab for helpful feedback and discussions. We also thank Sachin Goyal for help with ImageNet experiments. ...
arXiv:2401.10220v2
fatcat:vrxnqn7tmza27bts5xmoi5cbbe
Joint Lp-Norm and L2,1-Norm Constrained Graph Laplacian PCA for Robust Tumor Sample Clustering and Gene Network Module Discovery
2021
Frontiers in Genetics
In this article, a novel method named Lp-norm and L2,1-norm constrained graph Laplacian principal component analysis (PL21GPCA) based on traditional principal component analysis (PCA) is proposed for robust ...
Third, to retain the geometric structure of the data, we introduce the graph Laplacian regularization item to the PL21GPCA optimization model. ...
Acknowledgments: Thanks to my co-tutor Yong Xu, who is now a professor at Harbin Institute of Technology, Shenzhen, China. ...
doi:10.3389/fgene.2021.621317
pmid:33708239
pmcid:PMC7940841
fatcat:rjqiv52dwfazzgzyt7yxfflszm
Optimizing Dynamic Trajectories for Robustness to Disturbances Using Polytopic Projections
[article]
2020
arXiv
pre-print
This paper focuses on robustness to disturbance forces and uncertain payloads. We present a novel formulation to optimize the robustness of dynamic trajectories. ...
The non-trivial transcription proposed allows trajectory optimization frameworks to converge to highly robust dynamic solutions. ...
We would also like to thank the anonymous reviewers for their constructive comments. ...
arXiv:2003.00609v2
fatcat:pnpwhyammvbntlb7gbnfnqlfbi
Quantifying Degrees of Controllability in Temporal Networks with Uncertainty
2019
International Conference on Automated Planning and Scheduling
We introduce new methods for predicting the degrees of strong and dynamic controllability for uncontrollable networks. ...
In addition, we show empirically that both metrics are good predictors of the actual dispatch success rate. ...
Finally, we thank Jordan Abrahams, Susan Martonosi, and Mohamed Omar for offering their expertise in temporal networks, optimization, and convex geometry respectively. ...
dblp:conf/aips/AkmalALB19
fatcat:5w6cojo2mrgfxahlgwolwn6d6q
Robust Validation of Network Designs under Uncertain Demands and Failures
2017
Symposium on Networked Systems Design and Implementation
Acknowledgements: We thank our shepherd Nate Foster, and the reviewers for their insightful feedback. ...
Beyond networking, the complexity status of robust optimization formulations has been investigated and tractable formulations derived for various special cases [12, 14] . ...
We show that these techniques lead to tighter bounds on the validation problem than existing state-of-the-art approaches in robust optimization, a finding that has applications beyond networking. ...
dblp:conf/nsdi/ChangRT17
fatcat:b6bogltymnbbpesab3wt2pvcuu
Smaller Language Models are capable of selecting Instruction-Tuning Training Data for Larger Language Models
[article]
2024
arXiv
pre-print
Utilizing open-sourced OPT and Llama-2 models up to 13B in size, two publicly available instruction-tuning training datasets and evaluated by both automatic metrics & humans, our paper introduces a novel ...
Our experiments span different-sized models, revealing that this characteristic holds for models ranging from 1B (small) to 13B (large) in size. ...
The authors thank Shang Data Lab, Palash Chauhan, Amulya Bangalore, Shreyas Rajesh, Gautham Reddy, Sanjana Garg, Ethan Thai, Queso Tran, Rahul Mistry, Sandy La, and Sophia Do for their valuable contributions ...
arXiv:2402.10430v1
fatcat:npxudxgqejhlfecre7l36pmnta
Beyond BLEU: Training Neural Machine Translation with Semantic Similarity
2019
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
While most neural machine translation (NMT) systems are still trained using maximum likelihood estimation, recent work has demonstrated that optimizing systems to directly improve evaluation metrics such ...
In this paper, we introduce an alternative reward function for optimizing NMT systems that is based on recent work in semantic similarity. ...
Cer et al. (2010) compared several metrics to optimize for SMT, finding BLEU to be robust as a training metric and finding that the most effective and most stable metrics for training are not necessarily ...
doi:10.18653/v1/p19-1427
dblp:conf/acl/WietingBGN19
fatcat:ckylq5pjtbfhhpswp2lkgpplwe
Showing results 1 — 15 out of 9,165 results