Abstract
In this article, we introduce a methodology for language research that uses deep learning language models for English, Korean, and other languages. Deep learning language models learn the stochastic patterns of sequences of linguistic expressions, so they are sensitive to expressions that deviate from the distributions they have learned. This sensitivity allows us to calculate surprisal, a measure familiar from psycholinguistics: the surprisal of a word is the negative log probability that the model assigns to it given its preceding context. Ideally, this methodological pipeline could be applied at all levels of language research, including morphology, syntax, and semantics, and it can also be used to analyze discourse and information structure. Furthermore, even judgments that draw on world knowledge and common sense, as manifested in language data, can be analyzed with surprisal metrics. We do not argue that deep learning-based methods are necessarily appropriate or accurate for linguistic studies. However, we maintain that deep learning techniques can serve as a viable method for the further study of human language. The library developed for this purpose will be made available online.
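To make the surprisal metric concrete, the following is a minimal sketch of the computation, surprisal(w) = -log2 P(w | context). It is not the authors' library and does not use a deep learning model: as a loud simplification, a toy bigram model with add-one smoothing stands in for the neural language model, and the function names `train_bigram` and `surprisal` and the three-sentence corpus are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over tokenized sentences.
    Hypothetical toy stand-in for a neural language model."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent  # sentence-start marker
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def surprisal(prev, word, unigrams, bigrams, vocab_size):
    """Surprisal in bits, -log2 P(word | prev), using add-one (Laplace) smoothing."""
    prob = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return -math.log2(prob)

if __name__ == "__main__":
    corpus = [["the", "dog", "barks"], ["the", "cat", "meows"], ["the", "dog", "sleeps"]]
    uni, bi = train_bigram(corpus)
    vocab = len(uni)  # 7 types, including "<s>"
    print(f"{surprisal('the', 'dog', uni, bi, vocab):.3f}")    # → 1.737
    print(f"{surprisal('the', 'barks', uni, bi, vocab):.3f}")  # → 3.322
```

The unattested continuation "the barks" receives higher surprisal than the frequent "the dog", which is the kind of sensitivity to abnormal distributions described above. With a real deep learning model, P(w | context) would instead come from the model's softmax output over its vocabulary, conditioned on the full preceding context rather than a single word.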