
Tuesday, March 11, 2025

New Research Explores How to Boost Large Language Models’ Multilingual Performance

In a February 20, 2025 paper, researchers Danni Liu and Jan Niehues from the Karlsruhe Institute of Technology proposed a way to improve how large language models (LLMs) perform across different languages.

They explained that LLMs like Llama 3 and Qwen 2.5 show strong performance in tasks like machine translation (MT) but often struggle with low-resource languages due to limited available data. Current fine-tuning processes do not effectively bridge the performance gaps across diverse languages, making it difficult for models to generalize beyond high-resource settings.

The researchers focus on leveraging the middle layers of LLMs to enable better cross-lingual transfer across multiple tasks, including MT.

LLMs consist of multiple layers. The early (or bottom) layers handle basic patterns like individual words, while the final (or top) layers focus on producing a response. The middle layers play a key role in capturing the deeper meaning of sentences and how different words relate to each other.

Liu and Niehues found that these middle layers “exhibit the strongest potential for cross-lingual alignment,” meaning they help ensure that words and phrases with similar meanings are represented in a comparable way across languages. Strengthening this alignment helps the model transfer knowledge between languages more effectively.

By extracting embeddings (i.e., representations of text in vector form) from the model’s middle layers and adjusting them so that equivalent concepts are closer together across languages, the researchers aim to improve the model’s ability to understand and generate text in multiple languages.
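
To make this concrete, below is a minimal sketch of pulling a sentence embedding from a middle layer of a decoder-only LLM and checking how close two parallel sentences land. The checkpoint, pooling choice, and layer index are illustrative assumptions, not the authors' exact setup.

```python
# Hedged sketch: middle-layer sentence embeddings from a causal LLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B"  # assumption: any Llama 3 / Qwen 2.5 checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)

def middle_layer_embedding(text: str, layer: int | None = None) -> torch.Tensor:
    """Mean-pool one middle layer's hidden states into a sentence vector."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden_states = model(**inputs).hidden_states  # input embeddings + one tensor per layer
    layer = layer if layer is not None else len(hidden_states) // 2  # pick the middle
    return hidden_states[layer].mean(dim=1).squeeze(0)

# Parallel sentences with the same meaning should end up close together.
en = middle_layer_embedding("The weather is nice today.")
de = middle_layer_embedding("Das Wetter ist heute schön.")
print(f"cross-lingual cosine similarity: {torch.cosine_similarity(en, de, dim=0).item():.3f}")
```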

Alternating Training Strategy

Rather than relying solely on task-specific fine-tuning, they introduce an “alternating training strategy” that switches between task-specific fine-tuning (e.g., for translation) and alignment training. Specifically, an additional step — middle-layer alignment — is integrated into the fine-tuning process to ensure that the representations learned in one language are more transferable to others.
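
A rough sketch of what such a schedule could look like, assuming a simple cosine-based alignment objective on mean-pooled middle-layer states (the paper's exact losses, batching, and schedule may differ):

```python
# Hedged sketch of alternating task fine-tuning and middle-layer alignment.
import torch.nn.functional as F

def middle_layer_states(model, batch):
    # Mean-pooled middle-layer hidden states; gradients flow through.
    hidden = model(**batch, output_hidden_states=True).hidden_states
    return hidden[len(hidden) // 2].mean(dim=1)

def alignment_loss(model, src_batch, tgt_batch):
    # Pull embeddings of parallel source/target sentences toward each other.
    src = middle_layer_states(model, src_batch)
    tgt = middle_layer_states(model, tgt_batch)
    return (1 - F.cosine_similarity(src, tgt, dim=-1)).mean()

def train_alternating(model, optimizer, task_batches, parallel_batches, steps=1000):
    for step in range(steps):
        optimizer.zero_grad()
        if step % 2 == 0:  # task-specific step, e.g., translation cross-entropy
            loss = model(**next(task_batches)).loss
        else:              # middle-layer alignment step on parallel data
            loss = alignment_loss(model, *next(parallel_batches))
        loss.backward()
        optimizer.step()
```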

Tests showed that this method improved translation accuracy and overall task performance across both high-resource and low-resource languages. Liu and Niehues noted that the models were also able to generalize to languages not included in the initial alignment training.

One significant advantage of this method is its modular nature: “task-specific and alignment modules trained separately can be combined post-hoc to improve transfer performance” without requiring full model retraining. This makes it possible to improve existing models with enhanced multilingual capabilities while avoiding the high computational costs of retraining from scratch.
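
One plausible way to realize this modularity is with separately trained LoRA adapters merged after the fact. Whether the authors use the PEFT library exactly like this is an assumption, and the paths and adapter names below are placeholders:

```python
# Hedged sketch: combine separately trained task and alignment adapters post-hoc.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # placeholder
model = PeftModel.from_pretrained(base, "path/to/task_adapter", adapter_name="task")
model.load_adapter("path/to/alignment_adapter", adapter_name="align")

# Merge the two modules without retraining the base model.
model.add_weighted_adapter(
    adapters=["task", "align"],
    weights=[1.0, 1.0],
    adapter_name="task_plus_align",
    combination_type="linear",
)
model.set_adapter("task_plus_align")
```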

Additionally, this approach is faster and more cost-effective since “a few hundreds of parallel sentences as alignment data are sufficient.”

The researchers have made the code available on GitHub, allowing others to implement and test their approach.

Wednesday, March 5, 2025

CEOs React as Trump Declares English the Sole Official Language of the US

In response to President Trump’s executive order designating English as the official language of the US, SlatorPod gathered Dipak Patel, CEO of GLOBO, and Peter Argondizzo, CEO of Argo Translation, to discuss its implications for the US language industry.

The discussion highlighted that language access has long been a key part of US policy, particularly in healthcare, education, and legal services. Dipak pointed out that eliminating language services would create inefficiencies, making it harder for medical professionals to provide accurate care.

Peter emphasized the broader uncertainty the order creates as many organizations rely on federal funding for language services, and a lack of clear guidance could lead to reduced support in schools, courts, and public services.

Both CEOs acknowledged that while this order presents challenges, the language services industry has historically adapted to change. Dipak suggested that financial pressures may push the industry to innovate, potentially accelerating AI adoption in interpreting.

While the long-term impact remains unclear, the consensus is that language access will persist — driven by business needs and market demand.

Wednesday, January 8, 2025

Sony Aims to Improve AI Translation for Indian Language Entertainment Content

In a December 29, 2024 paper, Sony Research India researchers Pratik Rakesh Singh, Mohammadi Zaki, and Pankaj Wasnik presented a framework specifically designed to "improve entertainment content translations" in Indian languages.


They "believe it is the first of its kind," using an amalgamation of context awareness along with style adaptation to produce not only accurate translations but also entertaining for the targeted audience.

The researchers explained that traditional machine translation (MT) systems usually struggle with entertainment content because they mostly translate sentences in isolation. This leads to "disconnected" translations that fail to capture the emotional depth or cultural references behind the original dialogue. The effect is particularly pronounced in entertainment, where interconnected conversations and subtle narrative cues are vital.

"The challenge, in entertainment translation, lies in preserving the context, mood, and style of the original content while also including creativity and considerations of regional dialects, idioms, and other linguistic nuances," the researchers explained.

To tackle this challenge, the researchers developed CASAT (Context and Style Aware Translation), a framework that combines the two concepts during the translation process.

The CASAT framework starts with segmenting the input text — like dialogues from movies or series — into smaller sections known as "sessions." Sessions are dialogues that are consistent in their genre or mood, such as comedy or drama. This segmentation allows CASAT to focus on the specific emotional and narrative elements of each session.
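
As a toy illustration of the segmentation step, consecutive dialogue lines that share a mood can be grouped into one session; CASAT's actual segmentation is more sophisticated than this simple rule:

```python
# Toy session segmentation: maximal runs of lines with the same mood/genre.
from itertools import groupby

dialogue = [
    ("comedy", "Did you really wear that to the interview?"),
    ("comedy", "It was laundry day, okay?"),
    ("drama",  "We need to talk about what happened yesterday."),
]

sessions = [
    {"mood": mood, "lines": [line for _, line in group]}
    for mood, group in groupby(dialogue, key=lambda pair: pair[0])
]
print(sessions)  # two sessions: one comedy, one drama
```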

For every session, CASAT estimates two critical components: context and style. The former is the narrative framework that surrounds the dialogue, while the latter denotes the emotional tone and cultural nuances, like seriousness, excitement, or humor. Understanding both allows the framework to produce translations that genuinely resonate with the target audience.

To facilitate this, CASAT uses a context retrieval module that fetches relevant scenes or dialogues from a vector database, grounding each translation in an appropriate narrative framework, and a domain adaptation module that infers the intended emotional tone and intent from session- and sentence-level dialogue.
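
A minimal sketch of what such a context retrieval module could look like, using an off-the-shelf sentence encoder and in-memory cosine search as stand-ins for CASAT's actual components:

```python
# Hedged sketch of context retrieval over a small vector store.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative encoder choice
scenes = [
    "The hero discovers the letter that reveals the betrayal.",
    "A comic misunderstanding unfolds at the family wedding.",
]
scene_vecs = encoder.encode(scenes, normalize_embeddings=True)

def retrieve_context(session_text: str, k: int = 1) -> list[str]:
    """Return the k stored scenes most similar to the current session."""
    query = encoder.encode([session_text], normalize_embeddings=True)[0]
    scores = scene_vecs @ query  # cosine similarity, since vectors are normalized
    return [scenes[i] for i in np.argsort(scores)[::-1][:k]]

print(retrieve_context("He confronts his brother about the hidden letter."))
```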

Once the context and style are estimated, CASAT generates a customized prompt that combines these elements. The prompt is then passed to an LLM, which produces translations that are not only accurate but also carry the intended emotional tone and cultural nuances of the original content.
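
The exact prompt template is not given in this summary, but assembling one from the estimated context and style might look roughly like the following (the wording is an assumption):

```python
# Illustrative prompt assembly from estimated context and style.
def build_prompt(source: str, context: str, mood: str, target_lang: str) -> str:
    return (
        f"You are translating dialogue from an entertainment series into {target_lang}.\n"
        f"Narrative context: {context}\n"
        f"Emotional tone and style: {mood}\n"
        "Translate the following line so it is accurate and entertaining, "
        "preserving idioms, regional nuances, and the original intent:\n"
        f"{source}"
    )

prompt = build_prompt(
    source="You have got to be kidding me!",
    context="The hero has just discovered the betrayal.",
    mood="dramatic, tense",
    target_lang="Hindi",
)
# `prompt` would then be passed to any instruction-tuned LLM.
```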

Superior Performance

The researchers evaluated CASAT's effectiveness using metrics such as COMET scores and win ratios. CASAT surpassed baseline LLMs and MT systems like IndicTrans2 and NLLB, providing much better translations in terms of content and context.

"Our method exhibits superior performance by consistently incorporating plot and style information compared to directly prompting creativity in LLMs," the researchers said.

They found that context alone substantially improves translation quality, while style alone yields only minimal improvement. Combining the two improves quality the most.

The researchers noted that CASAT is language and model-agnostic. "Our method is both language and LLM-agnostic, making it a general-purpose tool," they concluded.

Sunday, December 22, 2024

The Year in Review and 2025 Predictions!

In their SlatorPod year-end 2024 episode, hosts Florian Faes and Esther Bond, joined by guest Anna Wyndham, review the key language industry trends, drivers, and developments of the past year and share predictions for 2025.


First, language industry news of the week: LXT acquired clickworker with the goal of doubling revenues by 2025 by expanding its AI data capabilities. Esther also shares how EzDubs, a speech translation startup, raised USD 4.2m in seed funding.

Florian comments that RWS reported stable revenues for 2024, including GBP 180m from AI-powered products and services. Additionally, YouTube announced the rollout of AI dubbing, enabling content creators to reach new language-speaking audiences, but admitted limitations at this point, including poor voice quality.

https://youtu.be/CtrVDikK7lE

In their discussion, the trio talked about the UK House of Lords inquiry into court interpreting and translation, highlighting pay issues for interpreters, quality issues, and how AI is being deployed for quality assurance.

Reflecting on 2024, Anna outlines three major trends: speech-to-speech translation, "translation as a feature," where translation capabilities are integrated into everyday software like project management tools, and the evolution of localization roles toward AI-driven skills.

Looking forward, Anna foresees rapid adoption of AI by the public sector given the cost constraints and the need for scalability, whereas Florian envisions further breakthroughs in machine translation quality estimation and, possibly, IPOs in the language tech industry. Esther predicts higher levels of M&A activity in the industry, where niche providers seek stability and scalability in a competitive market.

Friday, December 20, 2024

Stoquart Buys Peer Belgian LSP ETC Europe

Stoquart, a language services provider (LSP) based in Belgium, has acquired Brussels-based ETC Europe, a translation agency accredited by the European Union and other governmental and international organizations.


The transaction closed on October 24, 2024, following Stoquart's takeover of French competitor Version Internationale in 2023.

Dimitri Stoquart, founding Managing Director of Stoquart Translation Services, first came into contact with ETC Europe General Manager Angelina Janssen through meetings of the Belgian Association of Translation Companies (BQTA).

He stated that Janssen suggested Stoquart form a consortium with ETC Europe and another language service provider, VerbiVis, to respond to the European Commission's TRAD23 RFP. This resulted in Stoquart achieving second place for English-French translation.

In 2024, he mentioned that Janssen wanted to step back and suggested that Stoquart assume control of ETC Europe. Before the acquisition, shares of ETC Europe were divided among three shareholders; Stoquart has taken over all the shares.

"It was worth joining forces," Stoquart explained. "We have gained both institutional and private clients, along with an increasing number of multilingual projects."

The acquisition also creates new sources of income for Stoquart. The LSP, which now operates under both the ETC Europe and Stoquart brands, has recently entered three sizeable contracts with some of Europe's biggest institutions.

This bodes well for Stoquart, which saw a cumulative revenue decline of 30% across 2023 and 2024.

"With this acquisition and the revenues from the European Parliament contract, we will be able to regain our 2022 revenue levels," Stoquart stated. 

Strong In-House Resources and Powerful Brands

Stoquart now has around 50 people working for it globally: besides nearly 30 in-house linguists, the company engages between 150 and 180 freelancers monthly. Janssen will stay until the end of 2024 and will remain available as needed in the near future.

Similar to Version Internationale, ETC Europe holds a strong reputation in the institutional sector. The company will retain its brand identity and limit integration with Stoquart to the essentials required for seamless operations, focusing primarily on activities in the LSP's main office.

Given Stoquart's location, a large portion of its work involves all variants of French and Dutch, but the company also handles German, Italian, and Spanish. Stoquart is now branching out into other European languages for institutional work, too.

Most clients are found in the US, Ireland, Czechia, Spain, France, Belgium, the UK, Germany, and Denmark. Stoquart said the LSP specializes in fields where human expertise is required, such as IT, finance, legal, life sciences, and the defense industry.

Stoquart's technology approach combines off-the-shelf tools, such as Studio and Phrase, with proprietary tools, including an app that allows users to access several machine translation engines.

Sunday, September 8, 2024

Highlights from SlatorCon Silicon Valley 2024

On September 5, 2024, more than 150 language industry and technology leaders gathered at Hotel Nia in Menlo Park, Silicon Valley.

The event offered a friendly and relaxed environment, encouraging networking and reconnections among participants. Attendees from over a dozen countries and four continents emphasized the importance of in-person Slator events in addition to virtual ones. The expo hall was also buzzing with activity.

Esther Bond, Head of Advisory at Slator, kicked off the event with a warm welcome, outlining the day's presentations and panels, and encouraging delegates to network and engage with each other.

Florian Faes, Managing Director of Slator, opened the sessions by presenting key insights from Slator’s latest research on the language industry's current state. He discussed practical applications of large language models (LLMs) in localization workflows and shared predictions for the next few years.

RWS took the stage for the first presentation, with Vasagi Kothandapani and Mark Lawyer discussing the diversification of services into AI solutions. They emphasized the role of content as a driving force for digital transformation, business innovation, enhancing customer experience, corporate growth, global engagement, and market evolution.

The day's first panel, moderated by Esther Bond, focused on investment strategies.

Andrew Doane of K1 Investment Management and Aditya Govil from VSS Capital Partners explored the influence of AI on the language technology sector, with particular emphasis on the healthcare and B2B SaaS industries. They also discussed the role of private equity in the language tech space and shared insights on strategic considerations for investments and acquisitions.

Helena Batt, who oversees localization operations for the TED Conferences, took the podium next to provide unique insights on the organization’s implementation of AI dubbing for TED Talks. Among the technical challenges encountered, Batt mentioned preserving vocal characteristics and emotional nuance, and achieving seamless lip sync.

Betting on Technology

The Language AI Stack panel, moderated by Anna Wyndham, Slator's Head of Research, featured insights from Georg Ell of Phrase and Hameed Afssari of Uber. They discussed AI as a technology stack, focusing on the practical applications of large language models (LLMs) in localization, including machine translation (MT), workflow optimization, and managing linguistic assets.

A second technology panel, led by Florian Faes, explored the interpreting field. Oddmund Braaten from Interprefy, Fardad Zabetian from KUDO, and Jeremy Woan from CyraCom International shared their perspectives on how automation transforms interpreting services.

Another panel, moderated by Alex Edwards, Slator Senior Research Analyst, offered insights on localization systems integration, global 24/7 services, and enterprise program management. Panelists included Pavel Soukenik from Acolad, Nitin Singhal from SnapLogic, and Agustín Da Fieno Delucchi from Microsoft.

Silvio Picinini from eBay Localization delivered a thought-provoking presentation, exploring two scenarios: applying AI to existing localization processes or reimagining those processes entirely, and the potential outcomes of each approach.

Florian Faes concluded the event with closing remarks, inviting attendees to join SlatorCon Remote in November 2024 or meet in person again at SlatorCon London in 2025. More detailed follow-up coverage is forthcoming.

Thursday, June 13, 2024

Humanless LSP as a Fun Weekend Project

Florian and Esther discuss the language industry news of the week, giving a recap of SlatorCon London and exploring some use cases from the Slator Pro Guide: Language AI for Consumers.

Florian talks about Andrew Ng’s recent project on agentic machine translation, which involves using large language models (LLMs) to create a virtual language service provider (LSP).

The duo touched on Apple's recent Worldwide Developers Conference, where it was announced that the Apple Watch will get a translation widget; Apple also recently announced a new translation API.


Florian shares RWS’s half-year financial results, where despite declines in revenue, the company’s stock rose by 20%, likely due to investor perception of AI-enabled services and new product offerings like Evolve and HAI gaining traction.


Esther talks about DeepL’s USD 300m funding round, which valued the company at USD 2bn, a testament to the growing interest in AI models. She also covers Unbabel’s launch of TowerLLM, which claims to outperform competitors like Google Translate and DeepL.

In Esther’s M&A corner, Keywords Studios eyes a GBP 2.2bn deal from Swedish private equity firm EQT, Melbourne LSP Ethnolink buys Sydney-based competitor Language Professionals, and ZOO Digital acquires Italian dubbing partner LogoSound.

Esther gives a nod to the positive financial performances of companies like ZOO Digital and AMN’s language services division, with more mixed results for Straker.


Sunday, June 9, 2024

Here’s a New Dataset for Emotion-Aware Speech Translation

Imagine a world where translations don't just convert words but also capture the emotions behind them. This is the promise of MELD-ST, a new dataset introduced in May 2024 by researchers from the Technical University of Munich, Kyoto University, SenseTime, and Japan's National Institute of Informatics. This dataset is designed to revolutionize speech translation by ensuring that emotional context is preserved, enhancing both speech-to-text (S2TT) and speech-to-speech translation (S2ST) systems.

Background

Emotion plays a critical role in human conversation, yet most translation systems struggle to accurately convey the emotional tone of the original speech. While text-to-text translation (T2TT) has seen some progress in emotion-aware translation, speech translation remains largely uncharted territory. MELD-ST aims to fill this gap.

The Creation of MELD-ST

MELD-ST builds upon the existing Multimodal EmotionLines Dataset (MELD), which features dialogues rich in emotional content. By adding corresponding speech data from the TV series "Friends," MELD-ST offers audio and subtitles in English-to-Japanese and English-to-German language pairs. This dataset includes 10,000 utterances, each annotated with emotion labels, making it a valuable resource for studying emotion-aware translation.

Features of MELD-ST

What sets MELD-ST apart is its inclusion of emotion labels for each utterance, allowing researchers to conduct detailed experiments and analyses. The dataset features acted speech in an emotionally rich environment, providing a unique resource for initial studies on emotion-aware speech translation.

The Significance of Emotion in Translation

Consider the phrase "Oh my God!" Its translation can vary significantly based on the emotional context—surprise, shock, excitement. Accurately translating such phrases requires an understanding of the underlying emotions to ensure the intended intensity and sentiment are preserved, which can differ across cultures.

Technical Details of MELD-ST

MELD-ST comprises audio and subtitle data with English-to-Japanese and English-to-German translations. Each utterance is annotated with emotion labels, enabling researchers to explore the impact of emotional context on translation performance.
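
Based on that description, a single record plausibly bundles the source audio, the parallel subtitles, and an emotion label. The field names below are illustrative, not the dataset's actual schema:

```python
# Assumed shape of a MELD-ST record (field names are illustrative).
from dataclasses import dataclass

@dataclass
class MeldStUtterance:
    audio_path: str  # English source speech clip from "Friends"
    src_text: str    # English subtitle
    tgt_text: str    # Japanese or German subtitle
    emotion: str     # e.g., "joy", "anger", "surprise", "neutral"

sample = MeldStUtterance(
    audio_path="clips/s01e01_0001.wav",
    src_text="Oh my God!",
    tgt_text="なんてこと！",
    emotion="surprise",
)
```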

Research Methodology

The researchers tested MELD-ST using the SEAMLESSM4T model under various conditions: without fine-tuning, fine-tuning without emotion labels, and fine-tuning with emotion labels. Performance was evaluated using BLEURT scores for S2TT and ASR-BLEU for S2ST, along with metrics such as prosody, voice similarity, pauses, and speech rate.
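
A common way to implement the "fine-tuning with emotion labels" condition is to prepend the label as a tag to the source input; whether the researchers conditioned SEAMLESSM4T in exactly this way is an assumption:

```python
# Hedged sketch of the three experimental conditions via input tagging.
def with_emotion_tag(src_text: str, emotion: str | None) -> str:
    """Prepend an emotion tag when a label is available."""
    return f"<{emotion}> {src_text}" if emotion else src_text

baseline  = "Oh my God!"                                # no fine-tuning
ft_plain  = with_emotion_tag("Oh my God!", None)        # fine-tuned, no labels
ft_tagged = with_emotion_tag("Oh my God!", "surprise")  # fine-tuned with labels
```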

Findings on S2TT

Incorporating emotion labels led to slight improvements in S2TT tasks. The researchers observed that fine-tuning the model improved the quality of translations, with BLEURT scores indicating better alignment with the emotional context of the original speech.

Findings on S2ST

However, for S2ST tasks, fine-tuning with emotion labels did not significantly enhance results. While fine-tuning improved ASR-BLEU scores, the addition of emotion labels did not yield notable benefits. This highlights the complexity of accurately conveying emotions in speech translations.

Challenges and Limitations

The study faced several limitations. The use of acted speech, while useful, may not fully represent natural conversational nuances. Additionally, the dataset's focus on a specific TV series limits the diversity of speech contexts. Future research should address these limitations and explore more natural speech settings.

Future Directions

To advance emotion-aware translation, researchers propose several strategies. These include training multitask models that integrate speech emotion recognition with translation, leveraging dialogue context for improved performance, and refining datasets to encompass more varied and natural speech environments.

Access and Availability

MELD-ST is available on Hugging Face and is intended for research purposes only. Researchers and developers can utilize this dataset to explore and enhance emotion-aware translation systems.
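
Loading it with the Hugging Face `datasets` library would look roughly like this; the repository ID and config name are placeholders, so check the paper for the actual ones:

```python
# Hedged sketch; "org/MELD-ST" and "eng-jpn" are placeholder identifiers.
from datasets import load_dataset

meld_st = load_dataset("org/MELD-ST", "eng-jpn")
print(meld_st["train"][0])
```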

Conclusion

MELD-ST represents a significant step forward in the field of speech translation, offering a valuable resource for incorporating emotional context into translations. While initial results are promising, continued research and development are essential to fully realize the potential of emotion-aware translation systems.

