Chung-Ang University
Humanities Research Institute
HK+ Artificial Intelligence Humanities


Print ISSN: 2635-4691 / Online ISSN: 2951-388X
Title: [JAIH Vol. 15] Evaluation of Language Model Robustness Using Implicit Unethical Data
Authors: Yu Jin Kim, Ga Yeon Jung, Han Saem Kim
Posted: 2024-01-26 15:00
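Attachment: 03.암시적 비윤리 데이터를 활용한 언어 모델의 강건성 평가.pdf (11.06 MB; Korean-language full text, "Evaluation of Language Model Robustness Using Implicit Unethical Data")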

Abstract


Unlike explicit unethical expressions, implicit unethical expressions are difficult to collect as training data, and the forms they will take in the future are difficult to predict. Investigating the weaknesses of language models is therefore essential for improving their ability to detect implicit unethical expressions. In this paper, we altered the notation of implicit unethical expressions (YaminJeongeum, alien words) and inserted positive elements (vocabulary, emojis) to induce changes in the model's predictions. We also designed additional experiments using YaminJeongeum, alien words, and emojis. We found that (1) emojis influence the model's detection decisions more strongly than the text itself, and (2) the language model is vulnerable to certain input variations. We then constructed a fine-tuning dataset from the input variants on which the model was weak and fine-tuned the model on it, which noticeably improved its performance. We conclude that training on more diverse types of data is critical for improving the ability of language models to detect unethical expressions. We hope that this study will stimulate further research on detecting implicit unethical expressions with language models.

Chung-Ang University, Humanities Research Institute
#828, 310 Hall, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, Korea | TEL +82-2-881-7354 | FAX +82-2-813-7353 | E-mail: aihumanities@cau.ac.kr
COPYRIGHT (C) 2017-2023 CAU HUMANITIES RESEARCH INSTITUTE. ALL RIGHTS RESERVED.