An Introduction to Two Platforms on Asian Language Processing: Workshop on Asian Translation and Asian Language Treebank Project

Dr. ChenChen Ding

National Institute of Information and Communications Technology (NICT), Japan

In this talk, I will introduce two platforms for Asian language processing: the Workshop on Asian Translation (WAT) and the Asian Language Treebank (ALT) project. Together they cover translation evaluation for mainstream Asian languages and corpus construction for low-resource Asian languages.

WAT is an open machine translation evaluation campaign focusing on Asian languages. Because the test data is fixed and open, the automatic evaluation of translation quality has no deadline: registered participants can submit translation results at any time. Further human evaluation for specific tasks is conducted at the annual workshop. WAT covers translation tasks between English and Chinese, Japanese, Korean, Hindi, and Indonesian in various domains.

The ALT project aims to promote natural language processing techniques for low-resource Asian languages through open collaboration with institutes and universities in ASEAN. ALT is a parallel treebank for English, Japanese, and official languages of ASEAN countries, including Burmese (Myanmar), Khmer (Cambodian), Laotian, Malay, Tagalog (Filipino), Thai, and Vietnamese. It provides word segmentation, part-of-speech tags, and syntactic analysis annotations, together with word alignment between English and the other languages, for 20,000 sentences.

Research keywords: Natural language processing, Asian and low-resource languages, Linguistically oriented approach