
French CrowS-Pairs: Extending a challenge dataset for measuring social bias in masked language models to a language other than English

Aurélie Névéol, Yoann Dupont, Julien Bezançon, Karën Fort
2022 Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)   unpublished
We seek to widen the scope of bias studies by creating material to measure social bias in language models (LMs) against specific demographic groups in France.  ...  We build on the US-centered CrowS-pairs dataset to create a multilingual stereotypes dataset that allows for comparability across languages while also characterizing biases that are specific to each country  ...  We thank James Fiumara and Christopher Cieri for their guidance in the use of the Language ARC platform.  ... 
doi:10.18653/v1/2022.acl-long.583 fatcat:i3fyb46ojbh3pc4iwgqkun6slu
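
The entry above builds on CrowS-Pairs, which scores a masked language model by comparing pseudo-log-likelihoods of a stereotyping sentence and a minimally edited counterpart. The following Python sketch illustrates that scoring idea with the Hugging Face transformers API; the multilingual model name and the example pair are illustrative assumptions, not the paper's setup.

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Any masked LM works; mBERT is used here only as an illustrative choice.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    # Sum of log-probabilities of each token when that token alone is masked.
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for pos in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[pos] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, pos]
        total += torch.log_softmax(logits, dim=-1)[ids[pos]].item()
    return total

# Illustrative sentence pair differing only in the demographic term.
s_more = "Women are bad at driving."
s_less = "Men are bad at driving."
print(pseudo_log_likelihood(s_more) > pseudo_log_likelihood(s_less))

A model that systematically assigns the higher score to the stereotyping member of each pair counts as biased under this family of metrics; the actual CrowS-Pairs metric masks only the tokens shared by both sentences, so this all-token version is a simplification.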

Investigating Bias in Multilingual Language Models: Cross-Lingual Transfer of Debiasing Techniques [article]

Manon Reusens, Philipp Borchert, Margot Mieskes, Jochen De Weerdt, Bart Baesens
2023 arXiv   pre-print
These novel insights contribute to a deeper understanding of bias mitigation in multilingual language models and provide practical guidance for debiasing techniques in different language contexts.  ...  Using translations of the CrowS-Pairs dataset, our analysis identifies SentenceDebias as the best technique across different languages, reducing bias in mBERT by an average of 13%.  ...  The resources and services used in this work were provided by the VSC (Flemish Supercomputer Center).  ... 
arXiv:2310.10310v1 fatcat:pc5eemthpfdc3jrwcen5s7jkj4
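
SentenceDebias, named in the entry above as the best-performing technique, estimates a bias subspace from counterfactual sentence pairs and subtracts each embedding's projection onto it. Below is a minimal single-direction sketch in Python with numpy; the array shapes and function names are assumptions for illustration, not the authors' implementation.

import numpy as np

def bias_direction(pair_embeddings: np.ndarray) -> np.ndarray:
    # pair_embeddings: (n_pairs, 2, dim) embeddings of counterfactual pairs.
    # The first principal component of the pairwise differences approximates
    # the bias direction (SentenceDebias proper keeps several components).
    diffs = pair_embeddings[:, 0, :] - pair_embeddings[:, 1, :]
    _, _, vt = np.linalg.svd(diffs - diffs.mean(axis=0), full_matrices=False)
    return vt[0]  # unit-norm right singular vector

def debias(embedding: np.ndarray, direction: np.ndarray) -> np.ndarray:
    # Remove the component of the embedding that lies along the bias direction.
    return embedding - np.dot(embedding, direction) * direction

The debiased representations are then used in place of the original ones when computing a bias metric such as the CrowS-Pairs comparison mentioned above.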

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model [article]

BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush (+379 others)
2023 arXiv   pre-print
BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total).  ...  Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions.  ...  Model training ran on the Jean-Zay supercomputer of GENCI at IDRIS, and we thank the IDRIS team for their responsive support throughout the project, in particular Rémi Lacroix.  ... 
arXiv:2211.05100v3 fatcat:3q5i7btwyzd7pdqlkb5zf4f3fu

Gender Bias in Masked Language Models for Multiple Languages [article]

Masahiro Kaneko, Aizhan Imankulova, Danushka Bollegala, Naoaki Okazaki
2022 arXiv   pre-print
Manual annotation of evaluation data for languages other than English has been challenging due to the cost and difficulty in recruiting annotators.  ...  Masked Language Models (MLMs) pre-trained by predicting masked tokens on large corpora have been used successfully in natural language processing tasks for a variety of languages.  ...  Acknowledgements This paper is based on results obtained from a project, JPNP18002, commissioned by the New Energy and Industrial Technology Development Organization (NEDO).  ... 
arXiv:2205.00551v3 fatcat:pr3hx54aszfxjg3t32qfelh5gy

SeeGULL Multilingual: a Dataset of Geo-Culturally Situated Stereotypes [article]

Mukul Bhutani, Kevin Robinson, Vinodkumar Prabhakaran, Shachi Dave, Sunipa Dev
2024 arXiv   pre-print
multilingual dataset of social stereotypes, containing over 25K stereotypes, spanning 20 languages, with human annotations across 23 regions, and demonstrate its utility in identifying gaps in model evaluations  ...  While generative multilingual models are rapidly being deployed, their safety and fairness evaluations are largely limited to resources collected in English.  ...  Crows-pairs: A challenge dataset for measuring social biases in masked language models. In Proceedings of the 2020 Conference...  ... 
arXiv:2403.05696v1 fatcat:ckcsjgpypbdw3b4o5sntiikqn4

Fairness in Language Models Beyond English: Gaps and Challenges [article]

Krithika Ramesh, Sunayana Sitaram, Monojit Choudhury
2023 arXiv   pre-print
This paper presents a survey of fairness in multilingual and non-English contexts, highlighting the shortcomings of current research and the difficulties faced by methods designed for English.  ...  Thus, the measurement and mitigation of biases must evolve beyond the current dataset-driven practices that are narrowly focused on specific dimensions and types of biases and, therefore, impossible to  ...  Kaneko et al. (2022) measures gender bias in masked language models and proposes a method to use parallel corpora to evaluate bias in languages shown to have high correlations with human bias annotations  ... 
arXiv:2302.12578v2 fatcat:l6vll6d2xjcndd2nh4mfgaghee

Bias and Fairness in Large Language Models: A Survey [article]

Isabel O. Gallegos, Ryan A. Rossi, Joe Barrow, Md Mehrab Tanjim, Sungchul Kim, Franck Dernoncourt, Tong Yu, Ruiyi Zhang, Nesreen K. Ahmed
2024 arXiv   pre-print
Despite this success, these models can learn, perpetuate, and amplify harmful social biases. In this paper, we present a comprehensive survey of bias evaluation and mitigation techniques for LLMs.  ...  Our first taxonomy of metrics for bias evaluation disambiguates the relationship between metrics and evaluation datasets, and organizes metrics by the different levels at which they operate in a model:  ...  than English.  ... 
arXiv:2309.00770v2 fatcat:idqaltzjdndnhnwcgd7dsqcejm

Few-shot Learning with Multilingual Language Models [article]

Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer (+9 others)
2022 arXiv   pre-print
Finally, we evaluate our models in social value tasks such as hate speech detection in five languages and find it has limitations similar to comparable sized GPT-3 models.  ...  While these models are known to be able to jointly represent many different languages, their training data is dominated by English, potentially limiting their cross-lingual generalization.  ...  Crows-pairs: A challenge dataset for measuring social biases in masked language models. arXiv preprint  ... 
arXiv:2112.10668v3 fatcat:lllzazhlv5at3mqbcc43dboq3a

On Measures of Biases and Harms in NLP [article]

Sunipa Dev, Emily Sheng, Jieyu Zhao, Aubrie Amstutz, Jiao Sun, Yu Hou, Mattie Sanseverino, Jiin Kim, Akihiro Nishi, Nanyun Peng, Kai-Wei Chang
2022 arXiv   pre-print
While existing works propose bias evaluation and mitigation methods for various tasks, there remains a need to cohesively understand the biases and the specific harms they measure, and how different measures  ...  As a validation of our framework and documentation questions, we also present several case studies of how existing bias measures in NLP -- both intrinsic measures of bias in representations and extrinsic  ...  to feasibly automatically label a large number of samples, there could also be biases from the classifier itself.  ... 
arXiv:2108.03362v2 fatcat:3e3ezwedyzc37hfm4nyblkbr7q

OPT: Open Pre-trained Transformer Language Models [article]

Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott (+7 others)
2022 arXiv   pre-print
Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning.  ...  We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.  ...  Acknowledgements We would like to thank Scott Jeschonek, Giri Anantharaman, Diego Sarina, Joaquin Colombo, Chris Bray, Stephen Roylance, Kalyan Saladi, Shubho Sengupta, and Brian O'Horo for helping to  ... 
arXiv:2205.01068v4 fatcat:arnzfthgmzfmzj5u6ut3bbct3q

Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model [article]

Ahmet Üstün, Viraat Aryabumi, Zheng-Xin Yong, Wei-Yin Ko, Daniel D'souza, Gbemileke Onilude, Neel Bhandari, Shivalika Singh, Hui-Lee Ooi, Amr Kayid, Freddie Vargus, Phil Blunsom (+5 others)
2024 arXiv   pre-print
Recent breakthroughs in large language models (LLMs) have centered around a handful of data-rich languages. What does it take to broaden access to breakthroughs beyond first-class citizen languages?  ...  Our work introduces Aya, a massively multilingual generative language model that follows instructions in 101 languages of which over 50% are considered as lower-resourced.  ...  We also thank the language experts who helped us understand the quality of model generations in their languages. We thank John Dang for helping to convert Aya T5x checkpoint to PyTorch.  ... 
arXiv:2402.07827v1 fatcat:dyu3fpzkgrfdjo54bu7yzx2bpm

LLaMA: Open and Efficient Foundation Language Models [article]

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin (+2 others)
2023 arXiv   pre-print
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters.  ...  We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible  ...  We thank Shubho Sengupta, Kalyan Saladi, and all the AI infra team for their support. We thank Jane Yu for her input on evaluation. We thank Yongyi Hu for his help on data collection.  ... 
arXiv:2302.13971v1 fatcat:ug5hspba4jggrospmn45jk6qpy

Multitask Prompted Training Enables Zero-Shot Task Generalization [article]

Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja, Manan Dey, M Saiful Bari (+29 others)
2022 arXiv   pre-print
These prompted datasets allow for benchmarking the ability of a model to perform completely held-out tasks.  ...  To test this question at scale, we develop a system for easily mapping any natural language tasks into a human-readable prompted form.  ...  The goal of the project is to research language models in a public environment outside large technology companies. The project has 600 researchers from 50 countries and more than 250 institutions.  ... 
arXiv:2110.08207v3 fatcat:vvacmc2phfg7dpmdqiebyvxvei

Investigating Bias in Multilingual Language Models: Cross-Lingual Transfer of Debiasing Techniques

Manon Reusens, Philipp Borchert, Margot Mieskes, Jochen De Weerdt, Bart Baesens
2023 Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing   unpublished
These novel insights contribute to a deeper understanding of bias mitigation in multilingual language models and provide practical guidance for debiasing techniques in different language contexts.  ...  Using translations of the CrowS-Pairs dataset, our analysis identifies SentenceDebias as the best technique across different languages, reducing bias in mBERT by an average of 13%.  ...  The resources and services used in this work were provided by the VSC (Flemish Supercomputer Center).  ... 
doi:10.18653/v1/2023.emnlp-main.175 fatcat:dpa2u65tazeyfmfblfmu6bvvt4

Measuring Fairness with Biased Rulers: A Comparative Study on Bias Metrics for Pre-trained Language Models

Pieter Delobelle, Ewoenam Tokpo, Toon Calders, Bettina Berendt
2022 Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies   unpublished
These results indicate that fairness or bias evaluation remains challenging for contextualized language models, among other reasons because these choices remain subjective.  ...  We survey the literature on fairness metrics for pre-trained language models and experimentally evaluate compatibility, including both biases in language models and in their downstream tasks.  ...  Acknowledgements We thank Luc De Raedt for his continued support, Jessa Bekker for her practical advice on writing a survey, and Eva Vanmassenhove for sharing her knowledge on gender bias.  ... 
doi:10.18653/v1/2022.naacl-main.122 fatcat:d6gqg2ozircbbagywexckkrn3u