Showing posts with label Slator2025. Show all posts

Wednesday, February 12, 2025

Researchers Present DOLFIN, a New Test Set for AI Translation for Financial Content

On February 5, 2025, a team of researchers from Grenoble Alpes University and Lingua Custodia, a France-based company specializing in AI and natural language processing (NLP) for the finance sector, introduced DOLFIN, a new test set designed to evaluate document-level machine translation (MT) in the financial domain.


The researchers say that the financial domain presents unique challenges for MT due to its reliance on precise terminology and strict formatting rules. They describe it as “an interesting use-case for MT” since key terms often shift meaning depending on context.

For example, the French word “couverture” means “blanket” in a general setting but “hedge” in financial texts. Such distinctions are hard to capture unless the model translates units larger than a single sentence.

Despite strong research interest in document-level MT, specialized test sets remain scarce, the researchers note. Most datasets focus on general topics rather than domains such as legal and financial translation.

Given that many financial documents “contain an explicit definition of terms used for the mentioned entities that must be respected throughout the document,” they argue that document-level evaluation is essential. 

DOLFIN allows researchers to assess how well MT models translate longer texts while maintaining context. 

Unlike traditional test sets that rely on sentence-level alignment, DOLFIN structures data into aligned sections, enabling the evaluation of broader linguistic challenges, such as information reorganization, terminology consistency, and formatting accuracy.

Context-Sensitive

To build the dataset, the researchers sourced parallel documents from Fundinfo, a provider of investment fund data, then extracted and aligned financial sections rather than individual sentences. The dataset covers English-French, English-German, English-Spanish, English-Italian, and French-Spanish, with an average of 1,950 segments per language pair.

The goal, according to the researchers, was to develop “a test set rich in context-sensitive phenomena to challenge MT models.”

To assess the usefulness of DOLFIN, the researchers evaluated large language models (LLMs) including GPT-4o, Llama-3-70b, and their smaller counterparts. They tested these models in two settings: translating sentence by sentence versus translating full document sections. 
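The difference between the two settings can be illustrated with a minimal sketch. It is not the researchers' evaluation code: the `translate` callable is a hypothetical stand-in for any MT system or LLM call, and the regex-based sentence splitter is a deliberately naive assumption.

```python
import re
from typing import Callable, List


def split_sentences(section: str) -> List[str]:
    # Naive splitter (an assumption for illustration): break after ".", "!"
    # or "?" when followed by whitespace. Real pipelines use better tools.
    return [s for s in re.split(r"(?<=[.!?])\s+", section.strip()) if s]


def translate_sentence_level(section: str, translate: Callable[[str], str]) -> str:
    # Context-agnostic setting: each sentence is translated in isolation,
    # so the model never sees the rest of the section.
    return " ".join(translate(s) for s in split_sentences(section))


def translate_section_level(section: str, translate: Callable[[str], str]) -> str:
    # Context-aware setting: the whole section is passed in one call.
    return translate(section)


if __name__ == "__main__":
    # Hypothetical stand-in translator that just brackets its input,
    # which makes the translation unit boundaries visible.
    fake_translate = lambda text: f"[{text}]"
    section = "The fund hedges currency risk. The hedge is reviewed monthly."
    print(translate_sentence_level(section, fake_translate))
    print(translate_section_level(section, fake_translate))
```

With the stand-in translator, the first call produces two bracketed units and the second produces one, which is the only difference between the settings: how much text the model sees at once.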

They found that DOLFIN effectively distinguishes between context-aware and context-agnostic models, while also exposing model weaknesses in financial translation.

Larger models benefited from more context, producing more accurate and consistent translations, while smaller models often struggled. “For some segments, the generation enters a downhill, and with every token, the model’s predictions get worse,” the researchers observed, describing how smaller LLMs failed to maintain coherence over longer passages.

DOLFIN also reveals persistent weaknesses in financial MT, particularly in formatting and terminology consistency. Many models failed to properly localize currency formats, defaulting to English-style notation instead of adapting to European conventions.
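A rough way to spot this kind of formatting error is to scan the translated text for numbers that still use English-style separators. The sketch below is an illustration only, not a metric from the DOLFIN paper. The pattern and the function name `find_english_style_numbers` are assumptions, and the check can produce false positives, since a string such as "1,234" is also a valid decimal number under European conventions.

```python
import re
from typing import List

# English-style notation: comma as thousands separator, optional dot
# decimals, e.g. "1,234,567.89". This pattern is an illustrative assumption.
ENGLISH_STYLE = re.compile(r"\b\d{1,3}(?:,\d{3})+(?:\.\d+)?\b")


def find_english_style_numbers(text: str) -> List[str]:
    # Return every number in the text that looks English-formatted.
    return ENGLISH_STYLE.findall(text)


if __name__ == "__main__":
    not_localized = "Le fonds gère EUR 1,234,567.89 d'actifs."
    localized = "Le fonds gère 1 234 567,89 EUR d'actifs."
    print(find_english_style_numbers(not_localized))
    print(find_english_style_numbers(localized))
```

A fuller check would also need to verify that the expected target-language format is present, since the absence of English-style numbers does not prove correct localization.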

The dataset is publicly available on Hugging Face.

Authors: Mariam Nakhlé, Marco Dinarelli, Raheel Qader, Emmanuelle Esperança-Rodier, and Hervé Blanchon

Monday, February 10, 2025

Off-Screen Drama Pits AI Dubbing Against French Voice Actors

How do US actor Sylvester Stallone, France’s minister for gender equality Aurore Bergé, and a multibillion-dollar multilingual voice AI company collide in a tense drama? Since early January 2025, multiple online media sources have highlighted a clash that began with news that “Armor,” a film starring Stallone, would feature AI dubbing.

For 50 years, Alain Dorval was the familiar voice of Stallone in French-dubbed films, but he passed away in February 2024. Minister Bergé happens to be Dorval’s daughter. Enter ElevenLabs, which reached a USD 3bn valuation in January 2025 and found itself at the center of a weeks-long controversy over the cloning of Dorval’s voice.

Bergé publicly opposed (article in French) the use of her father’s digitally recreated voice, despite acknowledging a prior agreement to a test. “It was just a trial run, with an agreement strictly guaranteeing that my mother and I would have final approval before any use or publication. And that nothing could be done without our consent.”

According to Variety, which has followed the story since the partnership around “Armor” between Lumiere Ventures and ElevenLabs came to light, Bergé’s move galvanized the French actors’ guild (FIA, in French). 

FIA’s representative, Jimmy Shuman, called the voice cloning attempt a “provocation” in the Variety article. That is because the union is in the midst of “negotiating agreements on limits for artificial intelligence and dubbing.”

The controversy over Stallone’s French voice underscores the potential for AI to displace voice actors, often celebrities in their own right across Europe.

ElevenLabs CEO Mati Staniszewski told Variety: “Recreating Alain Dorval’s voice is a chance to show how technology can honor tradition while creating new possibilities in film production.”

Following notable actions by their US counterparts, voice-over artists in several European countries are taking a proactive stance through their unions. Measures include contract clauses that restrict AI voice use to specific projects and outright bans on working for studios that do not offer adequate protections.

Per the latest Variety article on the subject, voice actor Michel Vigné will be the voice of Stallone for the French release. According to IMDb, Vigné has already voiced Stallone in French in the past.

The larger issue remains. The film industry acknowledges that AI voice cloning technology is advancing rapidly, and the drama around the French dub of “Armor” is a sign of things to come in Europe and beyond.

Many voice actors may have to decide whether they want their voice immortalized with AI, or whether they would rather be replaced, either by AI or by another actor.

Trump Makes English the Only Official Language of the US, Revokes Clinton Language Access Order

A new Executive Order published on March 1, 2025, by the US White House designates English as the only official language of the United Stat...