[JAIH Vol. 15] Evaluation of Language Model Robustness Using Implicit Unethical Data_Yu Jin, Kim/ Ga Yeon, Jung/ Han Saem, Kim

Chung-Ang University
AI Humanities Research Institute
HK+ Artificial Intelligence Humanities

Print ISSN: 2635-4691 / Online ISSN: 2951-388X


Title	[JAIH Vol. 15] Evaluation of Language Model Robustness Using Implicit Unethical Data_Yu Jin, Kim/ Ga Yeon, Jung/ Han Saem, Kim	2024-01-26 15:00
Writer	aihadmin
Attachment	03.암시적 비윤리 데이터를 활용한 언어 모델의 강건성 평가.pdf (11.06MB)
Abstract Unlike explicit unethical expressions, implicit unethical expressions are difficult to select as training data, and the forms they will take in the future are difficult to predict. Improving a language model’s ability to detect implicit unethical expressions therefore requires studying the model’s weaknesses. In this paper, we altered the notation of implicit unethical expressions (YaminJeongeum, alien words) and inserted positive factors (vocabulary, emojis) to induce changes in the model’s predictions. We also designed additional experiments using YaminJeongeum, alien words, and emojis. We found that (1) emojis influence the language model’s detection more strongly than the text itself, and (2) the language model is vulnerable to certain input variations. We then constructed a fine-tuning dataset from the input variants on which the language model was weak and fine-tuned the model, which led to a noticeable performance improvement. We conclude that training on more diverse types of data is critical to improving the ability of language models to detect unethical expressions. We hope that this study will stimulate further research on detecting implicit unethical expressions with language models.
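
The perturbation idea described in the abstract (inserting a positive factor and checking whether the model's prediction changes) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: `toy_classifier` is a hand-written stand-in whose emoji weakness is built in on purpose, the sample sentences are placeholders, and `flip_rate` is just one plausible way to quantify prediction changes.

```python
from typing import Callable, Iterable

# Hypothetical label convention: 1 = unethical, 0 = not unethical.
Classifier = Callable[[str], int]


def append_positive_emoji(text: str, emoji: str = "😊") -> str:
    """Insert a positive factor by appending an emoji to the input."""
    return f"{text} {emoji}"


def append_positive_word(text: str, word: str = "love") -> str:
    """Insert a positive factor by appending a positive word to the input."""
    return f"{text} {word}"


def flip_rate(
    classify: Classifier,
    texts: Iterable[str],
    perturb: Callable[[str], str],
) -> float:
    """Fraction of inputs whose predicted label changes after perturbation."""
    items = list(texts)
    if not items:
        return 0.0
    flipped = sum(classify(t) != classify(perturb(t)) for t in items)
    return flipped / len(items)


def toy_classifier(text: str) -> int:
    """Stand-in model (assumption): flags one phrase, but a positive emoji
    overrides the decision. This weakness is hard-coded for illustration."""
    if "😊" in text:
        return 0
    return 1 if "people like you" in text.lower() else 0


# Placeholder inputs, not data from the study.
samples = [
    "People like you should stay home.",
    "People like you never learn.",
    "The weather is nice today.",
]

emoji_rate = flip_rate(toy_classifier, samples, append_positive_emoji)
word_rate = flip_rate(toy_classifier, samples, append_positive_word)
print(f"emoji flip rate: {emoji_rate:.2f}, word flip rate: {word_rate:.2f}")
```

In practice `toy_classifier` would be replaced by a real detection model, and the inputs whose labels flip would be candidates for the kind of fine-tuning dataset the abstract mentions.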
