Can ‘Huge Amounts’ of Synthetic In-Domain Data Improve Machine Translation?

With the many noteworthy advances in machine translation (MT) and natural language processing (NLP), it is no wonder that large and small-scale users alike now expect each new MT iteration to measurably outperform its predecessor. From a functional perspective, MT does keep getting better, thanks in no small part to ongoing research and to the large datasets freely available for training equally large MT engines. However, domain-specific MT (check out a recent example) remains very much a work in progress. Researchers Yasmin Moslem, Rejwanul Haque, John D. Kelleher, and Andy Way, from the ADAPT Centre, Dublin City University, the National College of Ireland, and Technological University Dublin, set out to tackle this domain-specific problem with an experiment using three different setups. In a paper published in August 2022, this group of NLP specialists defined the problem as “in-domain data scarcity […] common in translation settings due to the lack of specialized datasets ...
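The snippet cuts off here, but the underlying premise, creating synthetic parallel data for a domain where authentic sentence pairs are scarce, is commonly approached with back-translation: authentic in-domain text in the target language is machine-translated back into the source language, and the resulting synthetic source sentences are paired with the authentic targets for training. Below is a minimal sketch of that idea using Hugging Face MarianMT checkpoints; the model names, language pair, and sample sentences are illustrative assumptions, not the authors' exact setup.

```python
# Minimal back-translation sketch for generating synthetic in-domain data.
# Assumptions: an English->French in-domain engine is being trained, and
# public Helsinki-NLP MarianMT models stand in for the reverse system.
from transformers import pipeline

# Authentic target-side (French) in-domain monolingual text.
target_sentences = [
    "Le patient présente une hypertension artérielle.",
    "Administrer le traitement deux fois par jour.",
]

# Reverse (French->English) model used to back-translate the target text.
back_translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")

# Pair each synthetic English source with its authentic French target,
# yielding (source, target) examples for fine-tuning the forward engine.
synthetic_pairs = [
    (back_translator(t)[0]["translation_text"], t) for t in target_sentences
]

for source, target in synthetic_pairs:
    print(f"{source}\t{target}")
```

Because the target side is authentic, the forward model learns from fluent in-domain references even though the source side is synthetic, which is why back-translation tends to be the workhorse wherever in-domain monolingual text is easier to find than parallel data.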