Sogang Students Win National Institute of Korean Language Director General Award at 2020 National Language Processing System Contest  

- For Development of Sentiment Analysis AI Software Using a Pre-trained Language Model -




(From the left) Chun Jaeyoon, Hwang Su-hyeon (alumnus), Kim Hyeon-jeong of Sogang University

Chun Jaeyoon (Dept. of Art & Technology, 15), Kim Hyeon-jeong (Dept. of European Languages and Cultures, 16), and Hwang Su-hyeon (graduate of the School of Communication, 15) won the National Institute of Korean Language Director General Award (Bronze Prize) at the 2020 National Language Processing System Contest, jointly hosted by the Ministry of Culture, Sports and Tourism and the National Institute of Korean Language (NIKL). The contest is held annually to promote the use of NIKL-developed Korean corpus resources and to advance the informatization of the Korean language.


The 2020 competition offered two categories for entries: “Designated Field” and “General Field.” The Designated Field called for software that analyzes the sentiment of review comments on movies, sports, and TV programs. Registration, which ran from September 1 to 15, was open to anyone regardless of qualifications. Over 200 teams applied, and twelve were chosen to compete. Among them, one team won the Grand Prize, one the Gold Prize, two the Silver Prize, three the Bronze Prize, and five the Special Prize.


Winner of the National Institute of Korean Language Director General Award, the Sogang team entered the Designated Field with a comment analysis model built on the pre-trained language model “BERT.” The model screens comments posted on movie reviews, sports news, and TV programs and classifies each as positive or negative.


Given the distinctive characteristics of comment data, the Sogang team used “KC BERT,” a model pre-trained on such data. To overcome the lack of data for comments on sports news and TV programs, the team proposed a “Domain Adaptation Methodology” that selects similar positive/negative data from existing comments. By applying this methodology, the Sogang team proposed an effective method for building sentiment analysis data using “Everyone's Corpus,” released by the NIKL.
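The idea of selecting similar data from existing comments can be illustrated with a minimal sketch. The article does not say how the team measured similarity, so the bag-of-words cosine similarity used here is only an assumed stand-in, and the function names (`tokenize`, `cosine`, `select_similar`) and the English sample comments are hypothetical, invented purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-word characters (stand-in for a real tokenizer)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_similar(labeled_source, unlabeled_target, top_k):
    """Return the top_k labeled source comments most similar to the target domain.

    labeled_source: list of (text, label) pairs from a domain that has labels.
    unlabeled_target: list of texts from the domain that lacks labels.
    """
    # Represent the whole target domain as one aggregated word-count vector.
    target_profile = Counter()
    for text in unlabeled_target:
        target_profile.update(tokenize(text))

    scored = [
        (cosine(Counter(tokenize(text)), target_profile), text, label)
        for text, label in labeled_source
    ]
    # Highest similarity first.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(text, label) for _, text, label in scored[:top_k]]


# Hypothetical data: labeled movie comments and unlabeled sports comments.
movie_reviews = [
    ("the acting was great and the ending was moving", "positive"),
    ("the game scenes were boring and the team played badly", "negative"),
    ("great match scenes the team played with heart", "positive"),
    ("the soundtrack was dull", "negative"),
]
sports_comments = [
    "the team played a great match tonight",
    "what a game the team was on fire",
]

selected = select_similar(movie_reviews, sports_comments, top_k=2)
for text, label in selected:
    print(label, "|", text)
```

In this sketch, the selected labeled comments would then serve as additional training data for the domain that lacks its own. A practical system would likely compare embeddings from the language model rather than raw word counts.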


Although their primary majors are in the humanities and social sciences, the Sogang team members each took a second or third major in science and engineering fields, including convergence software, convergence cognitive science, big data science, and computer science and engineering. The members believe that combining the different perspectives of their majors when approaching a problem is what led to such a good result.


Regarding the motivation to take on the challenge, Chun Jaeyoon said, "I have developed convergent thinking by taking multiple courses in art and technology together with convergence software. In addition, I realized the importance of language while in New Zealand as an exchange student. These experiences inspired me to take an interest in the convergence of language and technology, that is, natural language processing." Kim Hyeon-jeong added, "I knew from my own experience learning various foreign languages that foreign language learning is fundamentally similar to natural language processing. Thus, I was able to understand related themes and apply them more easily than others."


Hwang Su-hyeon, a graduate of Sogang, said, "Studying related technologies for this contest was a meaningful experience for me, one in which I was able to envision the future of convergence between the media industry and AI."