The Role of Domain Knowledge in Deep Learning-based Natural Language Processing
Jinho Park
In symbolic AI, domain knowledge was considered indispensable. Likewise, in rule-based NLP, linguistic knowledge played an important role. As probabilistic NLP and machine learning techniques developed, the role of domain knowledge shrank, and with the advent of deep learning, the role of feature engineering and domain knowledge has nearly vanished. To demonstrate that domain knowledge remains important even in the deep learning era, I built a part-of-speech tagger for Korean. This task is challenging in Korean because of morphophonological alternations, deletions, and contractions, which mean that the morphemes of a word cannot be recovered simply by splitting its surface string. I reformulated the segmentation subtask as a classification problem. For this purpose, I examined a large corpus and empirically identified 200 types of mapping between an input syllable and an output string. Using these types as classes, I built and trained an LSTM-based neural network. With this segmentation model in place, the part-of-speech tagging model is easily trained with a standard sequence tagging algorithm. By combining these two models with a few dictionaries, the system achieves an F1 score of 98.0%.
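The idea of treating segmentation as classification can be illustrated with a minimal sketch. Here, each input syllable receives one class label, and each label determines the output string that syllable maps to. The label names (`COPY`, `COPY+`) and the literal-output encoding for contractions such as 했 → 하+였 are hypothetical choices made for this illustration. They are not the 200 mapping types identified in the corpus study, and the classifier that would predict the labels (the LSTM) is omitted.

```python
# Hypothetical label inventory for illustration only; the actual system
# uses ~200 empirically derived syllable-to-output mapping types.
COPY = "COPY"     # output = input syllable, no morpheme boundary after it
COPY_B = "COPY+"  # output = input syllable, followed by a morpheme boundary


def apply_label(syllable: str, label: str) -> str:
    """Map one input syllable to its output string under a class label."""
    if label == COPY:
        return syllable
    if label == COPY_B:
        return syllable + "+"
    # Any other label is read as a literal output string, e.g. "하+였+"
    # for the contracted syllable "했" (hypothetical encoding).
    return label


def decode(word: str, labels: list[str]) -> list[str]:
    """Turn per-syllable class labels into a list of morphemes."""
    if len(word) != len(labels):
        raise ValueError("need exactly one label per syllable")
    # Each precomposed Hangul syllable is a single code point in Python.
    out = "".join(apply_label(s, lab) for s, lab in zip(word, labels))
    return [m for m in out.split("+") if m]


print(decode("했다", ["하+였+", COPY]))           # ['하', '였', '다']
print(decode("먹었다", [COPY_B, COPY_B, COPY]))  # ['먹', '었', '다']
```

Because the contraction is absorbed into the label of a single syllable, the model predicts exactly one label per input syllable. This keeps input and output aligned even when the output has more morphemes than the input has syllables.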
Keywords: domain knowledge, natural language processing, tagging, segmentation, deep learning