Thursday, January 9, 2025

US Government RFP Seeks Translation Into Four Native American Languages

The United States government has issued an unusual RFP for translation services: The target languages are all indigenous to the US.


The contracting agency is the Office of Indian Economic Development (OIED), which falls under the Bureau of Indian Affairs, the body that governs programs concerning federally recognized American Indian Tribes. OIED is partnering with the US Department of Agriculture (USDA) on this contract, which will give various federal agencies a way to request translation into Native languages.

The RFP features a set-aside for Indian Small Business Economic Enterprises, meaning that only companies meeting certain revenue and ownership requirements may apply. OIED would prefer to award the work for all four languages to a single contractor.

"This is a one-year project that will respond to federal agency requests for ongoing and diverse Native Language translation that will be specific to the federal agency needs," the RFP states. The contract may be extended more than once, but only for an additional period of up to six months. The base period covers work from January 20, 2025, through January 19, 2026.

The ultimate goal is to make a range of content, from official documents and signage to websites, available to the "widest possible audience of the Tribal Nations."

There are 574 federally recognized Tribal Nations. Of those, 229 are located in the state of Alaska. The other 345 Tribal Nations are spread across 35 other states.

Accordingly, the RFP focuses on the "more prevalent native languages," most likely those with the largest numbers of speakers.

Stats and Translation Requests

The four target languages are Yup’ik (Central dialect), Cherokee (Western dialect), Ojibwe (Western dialect), and Navajo. The contract estimates that each language will require 610 hours of translation — a somewhat uncommon way of pricing translation — for a total of 2,440 hours.

According to the American Community Survey for 2009-2013, Navajo is the most widely spoken indigenous language in the US, with nearly 167,000 speakers, 35,250 of whom self-report speaking English less than "very well." These speakers would be considered individuals with limited English proficiency (LEP).

The other three languages have fewer speakers overall, and fewer individuals with LEP, including about 6,000 speakers of the Alaska Native language Yup’ik; 1,460 speakers of Cherokee; and 1,100 speakers of Ojibwe. 

Given the relatively small populations of people with LEP, the impetus for the RFP goes beyond numbers.

Indeed, on December 9, 2024, the outgoing Biden-Harris Administration issued a “10-year National Plan on Native Language Revitalization,” described as charting “a path to help address the United States government’s role in the loss of Native languages across the continental United States, Alaska, and Hawai’i.”

Some Tribal Nations have the resources to handle certain translations on their own. The Cherokee Nation Translation Department, for instance, offers free translations for nonprofit uses related to education, health, and legal services. But there are limits.

“Due to the large volume of requests, Cherokee Nation Translation does not accept unsolicited documents such as poetry, scripts, screenplays, and book manuscripts for translation,” its website states. Nor does it translate tattoos or “names in Cherokee for children, family members, [or] pets”. 



Wednesday, January 8, 2025

Sony Aims to Improve AI Translation for Indian Language Entertainment Content

In a December 29, 2024 paper, Sony Research India researchers Pratik Rakesh Singh, Mohammadi Zaki, and Pankaj Wasnik introduced a framework specifically designed to "improve entertainment content translations" in Indian languages.


They "believe it is the first of its kind," combining context awareness with style adaptation to produce translations that are not only accurate but also entertaining for the target audience.

The researchers explained that traditional machine translation (MT) systems often struggle with entertainment content because they mostly translate sentences in isolation. This leads to "disconnected" translations that fail to capture the emotional depth or cultural references behind the original dialogue. The effect is particularly pronounced in entertainment, where interconnected conversations and subtle narrative cues are vital.

"The challenge, in entertainment translation, lies in preserving the context, mood, and style of the original content while also including creativity and considerations of regional dialects, idioms, and other linguistic nuances," the researchers explained.

To tackle this challenge, the researchers developed CASAT (Context and Style Aware Translation), which integrates context and style during the translation process.

CASAT starts by segmenting the input text, such as dialogue from movies or series, into smaller sections known as "sessions": stretches of dialogue that are consistent in genre or mood, such as comedy or drama. This segmentation allows CASAT to focus on the specific emotional and narrative elements of each session.
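The paper's exact segmentation method is not described here. As a rough illustration only, assuming each line of dialogue already carries a genre or mood label (the `mood` field and the `segment_into_sessions` helper are hypothetical), grouping consecutive lines that share a label into sessions could look like this in Python:

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class DialogueLine:
    speaker: str
    text: str
    mood: str  # hypothetical pre-assigned label, e.g. "comedy" or "drama"


def segment_into_sessions(lines: list[DialogueLine]) -> list[list[DialogueLine]]:
    """Group consecutive dialogue lines that share a mood label into sessions."""
    # groupby only merges adjacent items, so a mood that recurs later in the
    # script starts a new session rather than joining an earlier one.
    return [list(group) for _, group in groupby(lines, key=lambda line: line.mood)]


script = [
    DialogueLine("A", "Did you just trip over the cat again?", "comedy"),
    DialogueLine("B", "The cat tripped over me.", "comedy"),
    DialogueLine("A", "We need to talk about Dad.", "drama"),
    DialogueLine("B", "I know. I got the call too.", "drama"),
    DialogueLine("A", "Anyway, who ate my samosa?", "comedy"),
]

sessions = segment_into_sessions(script)
for i, session in enumerate(sessions):
    # Prints the session index, its mood, and how many lines it contains.
    print(i, session[0].mood, len(session))
```

Keeping sessions contiguous mirrors the idea that a session is a stretch of dialogue with a consistent mood; a real system would also have to infer the mood labels rather than receive them.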

For every session, CASAT estimates two critical components: context and style. Context refers to the narrative framework surrounding the dialogue, while style denotes its emotional tone and cultural nuances, such as seriousness, excitement, or humor. By capturing both, the framework aims to produce translations that resonate with the target audience.

To do this, CASAT uses a context retrieval module that fetches relevant scenes or dialogues from a vector database, grounding each translation in the appropriate narrative framework. It also applies a domain adaptation module that draws on session- and sentence-level dialogue to infer the intended emotional tone and intent.
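The retrieval step can be illustrated with a toy in-memory store. This sketch is not CASAT's implementation: it swaps in bag-of-words vectors and cosine similarity in place of learned embeddings and a real vector database, and the names `embed`, `retrieve_context`, and the sample scenes are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a learned encoder)."""
    return Counter(word.strip(".,!?").lower() for word in text.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_context(query: str, scenes: list[str], k: int = 2) -> list[str]:
    """Return the k stored scene descriptions most similar to the query dialogue."""
    q = embed(query)
    ranked = sorted(scenes, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:k]


# Hypothetical scene summaries standing in for the vector database contents.
scene_store = [
    "Ravi and his father argue about the family shop.",
    "Priya wins the cricket match in the final over.",
    "Ravi's father falls ill and the shop is closed.",
]

top = retrieve_context("Ravi, your father's shop is closed today.", scene_store, k=2)
print(top)
```

Here the dialogue line pulls back the two scenes about Ravi's family and the shop while ignoring the unrelated cricket scene, which is the grounding effect the retrieval module is meant to provide.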

Once context and style are estimated, CASAT combines them into a customized prompt. This prompt is then passed to an LLM, which generates translations that are not only accurate but also carry the intended emotional tone and cultural nuances of the original content.
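The prompt format CASAT uses is not given here. Assuming a simple template (the `build_prompt` function and its wording are hypothetical), combining the estimated context and style with the source line might look like the following; the resulting string would then be sent to whichever LLM is being used.

```python
def build_prompt(source_line: str, context: list[str], style: str, target_lang: str) -> str:
    """Assemble a translation prompt from retrieved context and an estimated style."""
    # Each retrieved scene becomes a bullet so the LLM sees the surrounding plot.
    context_block = "\n".join(f"- {scene}" for scene in context)
    # Hypothetical template: context first, then style, then the line to translate.
    return (
        f"You are translating dialogue from a film into {target_lang}.\n"
        f"Plot context:\n{context_block}\n"
        f"Style: {style}\n"
        "Preserve the emotional tone, humor, and cultural references.\n"
        f"Line to translate: {source_line}"
    )


prompt = build_prompt(
    source_line="Anyway, who ate my samosa?",
    context=["The siblings have just finished a tense conversation about their father."],
    style="light-hearted, teasing comedy",
    target_lang="Hindi",
)
print(prompt)
```

Putting plot and style ahead of the line itself reflects the researchers' finding that supplying this information works better than simply asking an LLM to be creative.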

Superior Performance

The researchers evaluated CASAT using metrics such as COMET scores and win ratios. CASAT outperformed baseline LLMs and MT systems such as IndicTrans2 and NLLB, producing much better translations in terms of both content and context.

"Our method exhibits superior performance by consistently incorporating plot and style information compared to directly prompting creativity in LLMs," the researchers said.

They found that adding context alone substantially improves translation quality, while adding style alone yields only a minimal improvement. Combining the two produces the largest gain.

"Our method is both language and LLM-agnostic, making it a general-purpose tool," the researchers concluded.
