Showing posts with label machinetranslation. Show all posts

Friday, March 21, 2025

SlatorCon Remote March 2025 Offers Essential Insights on the Language Industry and AI

 A Pinch, a Twitch, and Everything in Between: Pinch’s Christian Safka and Twitch’s Susan Maria Howard were among the top language industry leaders who joined hundreds of attendees on March 18, 2025, for the first SlatorCon Remote conference of the year.

Kicking off the day’s events, Slator’s Head of Advisory, Esther Bond, welcomed attendees and invited Managing Director Florian Faes to share the latest findings and insights in his highly anticipated “industry health check.”

In his presentation, Faes began by reflecting on the challenges of 2024. He discussed data from Slator’s 2025 Language Service Provider Index (LSPI) and highlighted the growth of interpreting-focused companies, contrasted with the struggles faced by small, undifferentiated agencies and the rapid rise of language AI, driven by companies like ElevenLabs and DeepL.

Faes also highlighted key findings from Slator’s 2025 Localization Buyer Survey, including the challenges buyers face in implementing AI and the growing need for AI partners to address inefficiencies. He also noted the mixed outlook for the industry in the year ahead.

LLMs Are Just the Beginning

The first expert presentation was delivered by Sara Papi, a Postdoctoral Researcher at the Fondazione Bruno Kessler, who discussed the current state of research in simultaneous speech-to-text translation.

Papi highlighted discrepancies between the original definition and current practices in the speech translation field, identified through a review of expert literature. She specifically pointed out issues related to the use of pre-segmented speech and inconsistencies in terminology.

Slator’s Head of Research, Anna Wyndham, moderated the first panel of the day, featuring Simone Bohnenberger-Rich, Chief Product Officer at Phrase; Simon Koranter, Head of Global Production & Engineering at Compass Languages; and Matteo Nonne, Localization Program Manager at On.

The panelists discussed the evolving role of generative AI in localization, highlighting its shift from initial experimentation to scalable solutions that drive growth. They shared insights on how AI is transforming localization from a cost center into a strategic function by enabling customized, context-aware content adaptation and addressing challenges related to return on investment (ROI) and stakeholder expectations.

Slator’s Alex Edwards, Senior Research Analyst, moderated another panel discussion focused on the adoption of large language models (LLMs) for AI translation in enterprise workflows. Panelists Manuel Herranz, CEO of Pangeanic, and Bruno Bitter, CEO of Blackbird.io, explored whether LLMs truly represent the state of the art.

Herranz and Bitter emphasized that middleware and techniques like Retrieval-Augmented Generation (RAG) are more advanced, and highlighted the importance of fine-tuning smaller, domain-specific models. They also discussed the role of orchestration technology in effectively managing a range of AI tools.

In his presentation, Supertext’s CEO Samuel Läubli echoed insights shared by other speakers, emphasizing that LLMs generate fluent texts by considering broader context. He explored the implications of an AI-first era for translation, the rise of smaller competitive players, and the continued importance of human expertise.

Läubli highlighted that the new Supertext resulted from a 2024 merger between LSP Supertext and AI translation company Textshuttle. He remarked, “I’ve been working in this field for 10 years now, but I haven’t seen a system or AI agent that can guarantee a correct translation — and I’m quite sure I won’t see it in the next 10 years.”

Teresa Toronjo, Localization Manager at Malt, discussed collaboration within leaner localization teams, stressing the importance of diverse partnerships, scalable processes, and maintaining quality consistency with cost-effectiveness guided by experts.

If you missed SlatorCon Remote March 2025 in real time, recordings will be available soon through our Pro and Enterprise plans.

Tuesday, March 11, 2025

New Research Explores How to Boost Large Language Models’ Multilingual Performance

In a February 20, 2025 paper, researchers Danni Liu and Jan Niehues from the Karlsruhe Institute of Technology proposed a way to improve how large language models (LLMs) perform across different languages.


They explained that LLMs like Llama 3 and Qwen 2.5 show strong performance in tasks like machine translation (MT) but often struggle with low-resource languages due to limited available data. Current fine-tuning processes do not effectively bridge the performance gaps across diverse languages, making it difficult for models to generalize beyond high-resource settings.

The researchers focus on leveraging the middle layers of LLMs to enable better cross-lingual transfer across multiple tasks, including MT.

LLMs consist of multiple layers. The early (or bottom) layers handle basic patterns like individual words, while the final (or top) layers focus on producing a response. The middle layers play a key role in capturing the deeper meaning of sentences and how different words relate to each other.

Liu and Niehues found that these middle layers “exhibit the strongest potential for cross-lingual alignment,” meaning they help ensure that words and phrases with similar meanings are represented in a comparable way across languages. Strengthening this alignment helps the model transfer knowledge between languages more effectively.

By extracting embeddings (i.e., representations of text in vector form) from the model’s middle layers and adjusting them so that equivalent concepts are closer together across languages, the researchers aim to improve the model’s ability to understand and generate text in multiple languages.
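The idea of pulling equivalent sentences closer together can be sketched in a few lines of plain Python. This is a minimal illustration, not the researchers' code: the toy hidden states, the mean pooling, the contrastive loss, and the `temperature` value are all assumptions chosen for clarity.

```python
import math

def mean_pool(token_vectors):
    """Average per-token vectors into a single sentence embedding."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(tok[i] for tok in token_vectors) / count for i in range(dim)]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def alignment_loss(src_embs, tgt_embs, temperature=0.1):
    """Contrastive loss: src_embs[i] should be most similar to tgt_embs[i].

    The loss form and temperature are illustrative assumptions.
    """
    total = 0.0
    for i, src in enumerate(src_embs):
        logits = [cosine(src, tgt) / temperature for tgt in tgt_embs]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_sum - logits[i]
    return total / len(src_embs)

# Toy stand-ins for middle-layer hidden states: a list of token vectors per sentence.
english = [[[1.0, 0.0], [0.8, 0.2]], [[0.0, 1.0], [0.2, 0.8]]]
german_aligned = [[[0.9, 0.1]], [[0.1, 0.9]]]      # translations in matching order
german_misaligned = [[[0.1, 0.9]], [[0.9, 0.1]]]   # same vectors, wrong pairing

en = [mean_pool(sentence) for sentence in english]
loss_aligned = alignment_loss(en, [mean_pool(s) for s in german_aligned])
loss_misaligned = alignment_loss(en, [mean_pool(s) for s in german_misaligned])
print(loss_aligned < loss_misaligned)
```

A lower loss means that translations of the same sentence sit closer together than unrelated sentences, which is the property alignment training tries to strengthen.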

Alternating Training Strategy

Rather than relying solely on task-specific fine-tuning, they introduce an “alternating training strategy” that switches between task-specific fine-tuning (e.g., for translation) and alignment training. Specifically, an additional step — middle-layer alignment — is integrated into the fine-tuning process to ensure that the representations learned in one language are more transferable to others.
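A training loop that switches between the two objectives might be scheduled roughly as follows. The strict one-to-one alternation, the function name `alternating_schedule`, and the placeholder batch names are assumptions for illustration; the article only states that training switches between the two.

```python
from itertools import cycle

def alternating_schedule(task_batches, alignment_batches, num_steps):
    """Yield (objective, batch) pairs, switching objective at every step.

    Even steps use the task objective (e.g., translation); odd steps use
    the alignment objective. The 1:1 ratio is an illustrative assumption.
    """
    task_iter = cycle(task_batches)
    align_iter = cycle(alignment_batches)
    for step in range(num_steps):
        if step % 2 == 0:
            yield "task", next(task_iter)
        else:
            yield "alignment", next(align_iter)

# Placeholder batch identifiers; a real loop would hold tensors instead.
task_batches = ["mt_batch_0", "mt_batch_1", "mt_batch_2"]
alignment_batches = ["parallel_batch_0"]  # alignment data can be small

schedule = list(alternating_schedule(task_batches, alignment_batches, num_steps=6))
for objective, batch in schedule:
    print(objective, batch)
```

Because the alignment data is cycled, a small set of parallel sentences can be reused many times while the task data keeps advancing.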

Tests showed that this method improved translation accuracy and overall performance for both high-resource and low-resource languages. Liu and Niehues noted that the models were also able to generalize their performance to languages not included in the initial alignment training.

One significant advantage of this method is its modular nature: “task-specific and alignment modules trained separately can be combined post-hoc to improve transfer performance” without requiring full model retraining. This makes it possible to improve existing models with enhanced multilingual capabilities while avoiding the high computational costs of retraining from scratch.

Additionally, this approach is faster and more cost-effective since “a few hundreds of parallel sentences as alignment data are sufficient.”

The researchers have made the code available on GitHub, allowing others to implement and test their approach.

Monday, December 30, 2024

The Most Popular Language Industry Stories of 2024

As 2024 comes to a close, it is time to reflect on the most popular stories, trends, innovations, and themes that made the Slator headlines throughout the year, highlighting key developments in the language industry.

Here is a selection of stories that attracted the most attention and engagement from our readers around the world.


Will Large Language Models Edge Linguists Out of the Language Industry?

One of Slator’s most-read stories in 2024 detailed a May 2024 paper from the University of Zurich and Georgetown University that explored the role of linguists in the evolving field of machine translation (MT). The entrance of large language models (LLMs) has reduced the reliance on linguists for grammar and semantic coherence while designing a system. 

However, the authors concluded, there are a number of points in the process where linguistic expertise is still essential. These include building parallel corpora for MT; developing technology for low-resource languages; and identifying linguistic phenomena that may present challenges for a system. Linguists can be especially helpful as humans and machines interface, for example, by designing effective human evaluations and reliably assessing advancements in the field.

Google Translate Ditches Tool for Detailed Human Feedback

Google retired its longstanding human feedback tool, Contribute, which allowed users to press a button and submit an alternative translation. 

As Slator reported in April 2024, Google’s announcement acknowledged Contribute’s role in improving Google Translate and explained that since launching the tool in 2014, “our systems have significantly evolved, allowing us to phase out Contribute.”

Users can, however, still submit feedback by rating a given translation “good” or “poor,” and, for the latter, selecting a reason from a drop-down menu — a less involved process that speakers of low-resource languages worry might halt improvement of MT for their languages. 

Live Speech-to-Speech AI Translation Goes Commercial

Just one month into 2024, an increasing number of language AI researchers — from academia to private companies — had already begun to focus on live speech-to-speech translation (S2ST). 

This focus only accelerated the adoption of live S2ST across multiple commercial applications, a trend that kicked off in mid-2023 thanks to LLMs, with models such as Meta’s SeamlessM4T and Google’s AudioPaLM.

Slator’s rundown of real-world use cases included business meetings, where Microsoft Translator, integrated with the Teams meeting app, provides real-time speech translation in more than 30 languages through Azure AI services. KUDO and Interprefy specialize in real-time AI speech translation for live events and conferences.

Even the high-stakes world of healthcare presents an opportunity for expansion, especially for providers already offering voice technology for healthcare clients. Orion Labs, for instance, offers live speech translation via its Push-to-Talk 2.0 platform. 

Introducing Revamped New Translation Quality ISO Standard 5060

Published in February 2024, ISO 5060 applies not only to language services providers (LSPs), but also to in-house translation departments and individual translators. While it specifically provides guidance for human evaluation of translation output, it can be used for workflows involving human and machine translation, with or without subsequent post-editing. 

The International Organization for Standardization (ISO) established a framework based on “bilingual examination of target language content against source language content,” with the goal of standardizing evaluations so they do not differ significantly from rater to rater. 

There are seven main categories of errors, which can be classified as critical, major, or minor: terminology, accuracy, linguistic conventions, style, locale conventions, audience appropriateness, and design and markup. 
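As an illustration of how such annotations could be recorded and tallied, here is a minimal sketch. The category and severity names come from the paragraph above; the `PENALTY` weights, the `ErrorAnnotation` class, and the `summarize` helper are hypothetical and are not taken from the standard.

```python
from collections import Counter
from dataclasses import dataclass

CATEGORIES = {
    "terminology", "accuracy", "linguistic conventions", "style",
    "locale conventions", "audience appropriateness", "design and markup",
}
SEVERITIES = {"critical", "major", "minor"}

# Hypothetical penalty points per severity; not prescribed by ISO 5060.
PENALTY = {"critical": 10, "major": 5, "minor": 1}

@dataclass(frozen=True)
class ErrorAnnotation:
    category: str
    severity: str

    def __post_init__(self):
        # Reject labels outside the seven categories and three severities.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity}")

def summarize(annotations):
    """Return the error count per category and the total penalty."""
    by_category = Counter(a.category for a in annotations)
    penalty = sum(PENALTY[a.severity] for a in annotations)
    return by_category, penalty

annotations = [
    ErrorAnnotation("accuracy", "critical"),
    ErrorAnnotation("terminology", "minor"),
    ErrorAnnotation("style", "minor"),
    ErrorAnnotation("accuracy", "major"),
]
by_category, penalty = summarize(annotations)
print(by_category["accuracy"], penalty)
```

Recording every error with a fixed category and severity is what lets two raters' evaluations be compared directly, which is the consistency the standard aims for.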

Translation AI Agency Lengoo Files for Bankruptcy

In March 2024, Lengoo filed for bankruptcy in a Berlin court, with German news sources pegging Lengoo’s accumulated losses at between USD 8m and USD 16m.

Christopher Kränzler, Alexander Gigga, and Philipp Koch-Büttner founded Lengoo in 2014, originally as an online platform for automating project management and administrative tasks. 

Starting in 2018, investors such as Redalpine, Creathor Ventures, Piton Capital, Inkef Capital, Techstars, and Polipo Ventures expressed confidence in Lengoo’s developing proprietary translation system, with Lengoo raising USD 34m by February 2021, making it a long way for the LSP to fall.

Amazon Flags Risks of Training LLMs on Web-Scraped MT 

Training LLMs at scale relies on massive amounts of training data scraped from the web. A January 2024 research paper from Amazon investigating the prevalence and quality of MT on the web found that a “shocking amount of the web is machine translated” into many languages. 

And oftentimes, that MT output is low-quality, raising concerns about the quality of training data for LLMs. Researchers also noted a selection bias toward “shorter and more predictable sentences,” potentially from low-quality English content machine translated into many lower-resource languages.

The pervasiveness of low-quality MT in training data, the authors warned, could lead to less fluent models with more hallucinations, particularly for low-resource languages. 

Translators by Any Other Name

Slator’s January 2024 roundup of five polls from 2023 was crowned by the most voted-on — and perhaps introspective — question: Will the term “translator” disappear in the next five years? Close to half of respondents said no, with just over 30% saying it will “definitely” or “possibly” disappear in that time period. 

Inspired by a SlatorPod interview with ASAP-translation.com CEO Jakub Absolon, another poll asked whether readers agreed with Absolon, who suggested that the term “full post-editing” should not be used and that the work should be priced as human translation.

More than 65% of readers agree that the term should not be used, while 18.7% want to keep using it. The remaining 16% are happy to use whatever term the client prefers. 

Other polls touched on inflation, with nearly half of respondents reporting flat rates; ChatGPT, with 80% of readers reporting they do not use it for translation; and the beloved Microsoft Language Portal, used “often” by 46.5% of respondents.

Real-Time Speech Translation Stars in Biggest OpenAI Release Since ChatGPT

OpenAI has not slowed down since being credited with unleashing accessible AI to the masses. The company’s May 2024 release of GPT-4o offered a range of new or improved capabilities. The single new model was trained end-to-end across text, vision, and audio, with all inputs processed by the same neural network, reportedly with enhanced performance in around 50 languages. 

A demo of GPT-4o featured a brief conversation with OpenAI CTO Mira Murati asking the system a question in Italian, to which GPT-4o responds in English. Cue the hot takes of ‘RIP translators’ and shares in language learning resource Duolingo dipping 3%. OpenAI planned to launch support for GPT-4o’s new audio and video capabilities to a small group of trusted partners in an API within a few weeks.

EU Parliament Issues a New 2024 Call for Tenders for Translation Services

A February 2024 notice for translation services would cover translation of single and multiple source language documents in 24 languages for four European institutions: the European Parliament’s Directorate-General for Translation; the European Court of Auditors; the Committee of the Regions of the European Union; and the European Economic and Social Committee.

While the notice did not mention MT, it did specify output metrics for source and target languages, and contracts — with one lot per language, assigned to a primary contractor and up to four secondary contractors — are estimated to last up to 60 months. No specific budget was listed. Once awarded, the contract will become effective January 1, 2025.

Bankrupt Dutch LSP, WCS Group, Quickly Bought by France’s Powerling 

In a provisional January 2024 ruling, a Dutch judge suspended payments by LSP WCS Group to its creditors, appointing an administrator to negotiate until a hearing a few months later. Of 14 companies under the WCS Group, only one was listed as in “suspension of payment” status; all others were listed under “bankruptcy” status. At the time, WCS Group’s website listed 3,247 active freelancers, whose next steps were unclear.

Just a few days later, French LSP Powerling acquired WCS Group for an undisclosed amount. Powerling, which already had a presence in the Netherlands — plus France, Hong Kong, and the US — said the move was in line with the company’s goal of clearing EUR 25m in revenues by the end of 2024 through acquisitions in Powerling’s main markets.

Thursday, December 26, 2024

Does the Machine Translation Post-Editing Activity Require a Lot of Time and Effort?

For the language industry, 2024 will go down as a year of fast-paced developments and innovations. This growth came with some distinct technological trends, including translation as a feature (TaaF), the emergence of multimodal AI, retrieval-augmented generation (RAG), and applications enabled by large language models (LLMs).

The integration of AI tools and human skill was central to the deliberations of industry specialists, even as companies of different sizes had their own perspectives. The responses of readers and viewers to the weekly Slator polls offer snapshots of sentiments, preferences, and outlooks across the industry.

1. Is it Time for Language Service Providers to Change Their Mindset? 

The language service sector has survived difficult times in the past, but it was not business as usual for an industry that started 2024 on the wrong foot, as reports surfaced of some firms filing for bankruptcy. This is a pointer that adequate funding and access to the latest generation of AI tools do not guarantee permanence in the business. The German AI company Lengoo, for example, went bankrupt in March 2024, after the Netherlands-based WCS Group went bankrupt in December 2023 (and was later purchased by Powerling).

In a poll carried out by Slator, more than half of respondents (52.1%) said they believe there will be more LSP bankruptcies in 2024, while only 3.4% feel that this is not likely to happen. Of the remaining respondents, 31.1% gave a more qualified answer, while 13.4% said it was barely possible.

2. What is Post-editing of Machine Translation in the Market? 

The phrase-matching view of translation, however, is losing importance as MT broadens into more creative areas such as literature and marketing, and it stands to reason that the volume of MTPE work has now eclipsed that of human translation editing. This statement, made by the Société française des traducteurs (SFT), was retracted within hours of being issued, but not before it acknowledged that 70% of its members regard post-editing as unnecessary, owing to the low pay for the job and the boring nature of the work.

LawBuilder.ai supports this claim. In a July 2024 poll, for instance, 61.2% of participants affirmed that post-editing is, most of the time, just boring and dull work; 23.5% said they do it “once in a while” when they feel it is needed; 10.2% said anything related to MTPE was annoying because the tools were not up to their standard; and only 5.1% showed any interest in carrying out the work.

 

3. Would Shakespeare Grudge the Other Bard’s Translations?

William Shakespeare’s works reflect British culture, and they were not translated during his lifetime. Gemini, which was initially called Google Bard, is however capable of translating all of Shakespeare’s works into several foreign languages within minutes. Although the 2021 research of Burns and Swerve showed a bias toward human literary translation, the progress in LLM technology as of January 2024 points to a shift in competence among different translation techniques and styles.

AI translation is undeniably threatening to enter the creative fields of writers, and social media and publishers have begun to talk about its use and even promote it. However, readers were split on how quickly AI translation may be effectively deployed in creative works. Asked whether MT use will become widespread in literary translation in two to three years, about one-third (31.9%) of respondents viewed it as improbable. Around one-fifth (20.2%) held the contrary view, leaving the rest fairly evenly divided across likely (17.0%), uncertain (16.0%), and unlikely (14.9%).

4. Is Translation Quality Evaluation a Solved Problem?

Despite the advancement in MT as well as the development and availability of automated metrics like BLEU and COMET, human evaluation is essential to determine the quality of translations. The new ISO 5060 Standard fulfils this need by indicating how translation output may be evaluated by humans regardless of its source.

The standard consists of seven error categories, including terminology and style, and each error is also assigned a severity level. As much as ISO 5060 focuses on harmonizing approaches to evaluation, as of February 2024 only 6.8% of those polled believe that this is a solved issue, with 72.7% believing the area needs further research and 20.5% believing it depends on the language and the type of text.

5. Has ChatGPT Changed Google Search Behavior?

OpenAI launched the prototype of SearchGPT in July 2024, offering a direct AI-powered "answer experience" instead of traditional search results. It, however, raises several questions on the accuracy of answers and self-referencing as AI-generated content becomes a part of search results.

Has this new release changed how Slator readers search? According to our poll, nearly two-thirds of respondents (65.7%) primarily use the Google search engine. Less than a quarter (22.9%) of readers reported using Google Search a bit, and ChatGPT more often. The rest (11.4%) said they mostly use ChatGPT for questions.

6. Has AI (LLMs, etc.) altered your work life over the last 24 months?

In episode #221 of SlatorPod, Spence Green, Lilt CEO, discussed how the need for AI in localization has become increasingly imperative. This involves the potential of custom LLMs, RAG, and AI orchestration to automate tasks, customize content, process huge volumes of data, and increase ROI.

As companies like Reddit materially demonstrate their faith in AI localization, the language industry’s broader adoption of and adaptation to AI remain a challenge. According to a Slator poll, more than one in three localization professionals (37.5%) have not yet introduced AI into their daily work practices, while more than one in five (23.6%) have gone all-in. Two equal cohorts, at 19.4% each, use AI language tech "somewhat" or "a little," respectively.

7. Will AI increase or decrease demand for language learning in the long term?

OpenAI introduced the multimodal GPT-4o model and showed a live speech-to-speech translation (S2ST) demo with Italian and English. Reactions on X took the platform by storm, pronouncing an end to translators and language learning as we have known them. Social media jitters turned into quakes and caused some investors to sell shares of the language learning platform Duolingo.

 

The responses of the readers to the question of whether AI would increase or decrease demand for language learning in the long term were rather split, with more than one-third (36.9%) predicting that demand will increase and another (35.4%) that it will decrease. A little over a quarter (27.7%) believe demand will stay the same.

8. What role would prepare you best to lead an LSP?

The world's largest LSP, TransPerfect, has been growing continuously on the basis of acquisitions and diverse offerings, including technology. But the real engine in the company is, of course, people such as Jin Lee, appointed as co-CEO in January 2024.

At that point, Lee was a 20-year veteran of the company, having joined as a project manager and having been Senior VP for Global Production before his co-CEO appointment. In this light, we asked readers what roles best prepare them for running an LSP, and the largest cohort (46.0%) believes it is project management. Other readers selected sales and language experts (14.3% each), and finance/admin and language ops (11.1% each).

9. Are you experiencing a summer slowdown in business?

There was no traditional summer slowdown for the language industry in 2024 — at least, not in the northern hemisphere. July alone was busy with significant investment activity, from early-stage funding to major acquisitions.

 

Capital kept pouring into some sectors, including AI dubbing and captioning, plus language tech. Yet, when we asked readers if they were seeing a summer slowdown, 40.3% said they were definitely experiencing a cooling period. For the rest, the summer was either fairly stable (33.9%) or busier than ever (25.8%).

 10. How has your business year gone so far?

News of bankruptcies hit the language industry at the start of 2024 and may even have shocked some Slator readers into action to avoid a similar fate. The Language Service Provider Index (LSPI) showed indications of stability for some companies and actual growth for the Super Agencies.

Although the LSPI only includes about 300 companies which volunteer their data for the survey, the February 2024 edition seemed to foreshadow the mixed bag LSPs experienced for the balance of 2024. While some companies were indeed pushed out of business, acceleration in M&A was also clear. Readers self-reported that, as of February 2024, business had thus far been great (27.6%) or good (25.9%), flat (20.7%), not great (15.5%), or bad (10.3%).

Language Discordance Raises Risk of Hospital Readmissions, U.S. Study Finds

A June 2024 meta-analysis published in BMJ Quality & Safety was recently brought back into the spotlight by Dr. Lucy Shi, who disc...