Thursday, May 22, 2025

IIT Bombay Explores Accent-Aware Speech Translation

 


In a May 4, 2025 paper, researchers at IIT Bombay introduced a new approach to speech-to-speech translation (S2ST) that not only translates speech into another language but also adapts the speaker’s accent.

This work aligns with growing industry interest in accent adaptation technologies. For example, Sanas, a California-based startup, has built a real-time AI accent modification tool that lets users change their accent without changing their voice. Similarly, Krisp offers AI Accent Conversion technology that neutralizes accents in real time, improving clarity in customer support and business settings.

While Sanas and Krisp focus on accent adaptation alone, the IIT Bombay researchers explore how accent and language translation can be combined in a single model.

“To establish effective communication, one must not only translate the language, but also adapt the accent,” the researchers noted. “Thus, our problem is to model an optimal model which can both translate and change the accent from a source speech to a target speech,” they added.

Scalable and Expressive Cross-Lingual Communication

To do this, they proposed a method based on diffusion models, a type of generative AI typically associated with image generation — DALL-E 2, which creates realistic images from a user’s text prompt, is one example — though their applications extend to other domains, including audio generation.

They implemented a three-step pipeline. First, an automatic speech recognition (ASR) system converts the input speech into text. Then, an AI translation model translates the text into the target language. Finally, a diffusion-based text-to-speech model generates speech in the target language with the target accent.

The core innovation lies in the third step, where the researchers used a diffusion model for speech synthesis. Instead of creating images, the model generates mel-spectrograms (i.e., visual representations of sound) from the translated text and target accent features, which are then converted into audio. For this, the researchers used GradTTS, a diffusion-based text-to-speech model, as the foundation of their system.
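The three-step cascade can be sketched as follows. This is a minimal illustration only: the function names and stub implementations are hypothetical placeholders, not the authors' code, and a real system would plug in trained ASR, translation, and GradTTS-based synthesis models.

```python
# Minimal sketch of a cascaded S2ST pipeline (ASR -> MT -> accent-aware TTS).
# All three components are stubs for illustration; real systems use trained models.

def transcribe(audio: bytes) -> str:
    """Step 1 (ASR): convert source-language speech to text. Stub."""
    return "hello, how are you?"

def translate(text: str, target_lang: str) -> str:
    """Step 2 (MT): translate the transcript into the target language. Stub."""
    return {"hi": "नमस्ते, आप कैसे हैं?"}.get(target_lang, text)

def synthesize(text: str, accent: str) -> bytes:
    """Step 3 (diffusion TTS): generate a mel-spectrogram conditioned on text
    and target-accent features, then vocode it to audio. Stub returns a tag."""
    return f"<audio:{accent}:{text}>".encode("utf-8")

def s2st(audio: bytes, target_lang: str, target_accent: str) -> bytes:
    """Full cascade: speech in, translated accent-adapted speech out."""
    transcript = transcribe(audio)
    translated = translate(transcript, target_lang)
    return synthesize(translated, target_accent)

out = s2st(b"...", target_lang="hi", target_accent="indian-english")
```

The cascade structure is what makes the third step swappable: accent conditioning lives entirely in the synthesis stage, leaving the ASR and translation stages unchanged.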

They tested their model on English and Hindi, evaluating its ability to generate speech that reflects both the correct translation and target accent. “Experimental results […] validate the effectiveness of our approach, highlighting its potential for scalable and expressive cross-lingual communication,” they said.

The researchers acknowledged several limitations, but they still see this as a promising starting point. “This work sets the stage for further exploration into unified, diffusion-based speech generation frameworks for real-world multilingual applications,” they concluded.

Authors: Abhishek Mishra, Ritesh Sur Chowdhury, Vartul Bahuguna, Isha Pandey, and Ganesh Ramakrishnan

Wednesday, May 21, 2025

AI Tech Consulting Firm Quansight Acquires Cobalt Speech and Language



On May 6, 2025, open source technology consulting firm Quansight announced that it had acquired Cobalt Speech and Language, a provider of automatic speech recognition (ASR), transcription, natural language understanding, and other voice technologies in multiple languages. The deal closed April 10, 2025. 

According to Quansight CEO Travis Oliphant, the purchase was for cash, earn-out, and equity in Quansight portfolio companies. 

“Quansight builds AI systems and has key developers who know how to build the tools behind AI (PyTorch, JAX, TensorFlow, and NumPy),” Oliphant told Slator. “Cobalt builds language systems that use these tools.”

Oliphant said that Quansight decided to acquire Cobalt, rather than build its own speech technologies in-house, based on the strength of Cobalt’s team, which could help maintain a certain speed of development. Of course, he acknowledged that the prospect of acquiring Cobalt’s customers was also attractive.

Massachusetts-based Cobalt was founded in 2014 by CEO Jeff Adams, known as the “father of Alexa” for his work on Amazon Echo. A press release on the acquisition quoted Adams as saying that Cobalt has “always focused on delivering highly customized speech and language tools that work in the real world, not just in the lab.”

Oliphant told Slator that Cobalt’s approximately 15 employees will join Quansight’s team of 80.

Cobalt currently offers several voice-enabled technologies, including Cobalt Transcribe for speech recognition and transcription. Its end-to-end speech recognition engines are powered by deep neural networks (DNNs), and clients can choose from two different DNN models based on their needs.

Hybrid models use separately tunable acoustic models, lexicons, and language models for maximum flexibility and customization for various use cases.

End-to-end models, meanwhile, directly convert sounds to words within the same DNN. This version works for general use and tends to produce more accurate transcriptions (based on word error rates) than the hybrid models.
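The two architectures can be contrasted schematically. The sketch below is purely illustrative — every name and stub is hypothetical and does not represent Cobalt's actual API or models:

```python
# Hypothetical sketch contrasting the two ASR architectures described above.
# Every function here is a stub standing in for a trained model component.

def acoustic_model(audio: bytes) -> list:
    """Maps audio frames to phoneme hypotheses (stub)."""
    return ["HH", "AY"]

def lexicon_lookup(phonemes: list) -> list:
    """Maps phoneme sequences to candidate words (stub)."""
    return ["hi", "high"]

def language_model_rescore(candidates: list) -> str:
    """Picks the most probable word sequence in context (stub)."""
    return candidates[0]

def hybrid_asr(audio: bytes) -> str:
    """Hybrid: separately tunable acoustic model, lexicon, and language model."""
    return language_model_rescore(lexicon_lookup(acoustic_model(audio)))

def single_dnn(audio: bytes) -> str:
    """End-to-end: one DNN maps sounds directly to words (stub)."""
    return "hi"

def end_to_end_asr(audio: bytes) -> str:
    return single_dnn(audio)
```

The practical trade-off: each hybrid component can be tuned or swapped independently (for example, a domain-specific lexicon), while the end-to-end model trades that flexibility for a single network that, per the word-error-rate comparison above, tends to transcribe more accurately in general use.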

Speech recognition is available in English, Spanish, French, German, Russian, Brazilian Portuguese, Korean, Japanese, Swahili, and Cambodian, though Cobalt is “always looking for partners to develop, sell, and/or market speech technology in other languages,” according to Cobalt Transcribe FAQs.

Other services include Cobalt Speech Intelligence, which analyzes audio to glean demographic information about speakers, such as age, gender, and regional accent, plus emotion.

Investments and Intersections

As a consulting firm, Quansight specializes in solving data-related problems with open-source software and services, including AI, data and machine learning engineering, RAG, and large language models (LLMs), among others. 

Quansight, founded in 2018, has previously invested in pre-seed rounds for two other companies: Savimbo, a certifier of fair-trade carbon, biodiversity, and water credits; and Mod Tech Labs, an AI platform for 3D content creation. 

Quansight Initiate, an early-stage VC firm also headed by Oliphant, has invested in five open source tech startups since its 2019 founding.

“Quansight recently completed a restructuring of subsidiary companies,” Oliphant explained to Slator. “Going forward, M&A activities will focus on OpenTeams (for AI growth), OS BIG, Inc. dba OpenTeams Incubator (investment and M&A), Cobalt Speech and Language (speech and language technology and services), and Quansight, PBC to continue with the community-driven open-source aspect of its business.”

“All of our companies now have either existing or prospective intersections with the language industry,” he added.

Monday, April 7, 2025

Language Discordance Raises Risk of Hospital Readmissions, U.S. Study Finds

 A June 2024 meta-analysis published in BMJ Quality & Safety was recently brought back into the spotlight by Dr. Lucy Shi, who discussed its findings in an article for The Hospitalist. The study, conducted by Chu et al., examined the link between language discordance and unplanned hospital or emergency department (ED) readmissions.


The researchers also evaluated whether interpretation services could help reduce disparities in these outcomes between patients who speak a non-dominant language and those who do not. Their analysis was based on a literature search of PubMed, Embase, and Google Scholar, initially conducted on January 21, 2021, and updated on October 27, 2022.

Extensive research has shown that patients and families with non-dominant language preferences often face challenges in communication, understanding medical information, and accessing care. Language discordance can contribute to adverse events and poorer outcomes during critical care transitions, such as hospital discharge.

The authors of the paper note that previous research on the effects of language discordance on hospital readmissions and emergency department (ED) revisits has produced mixed results — differences they partially attribute to variations in study criteria and methodologies.

The studies included in the meta-analysis were primarily conducted in Switzerland and English-speaking countries such as the US, Australia, and Canada. These studies reported data on patient or parental language skills or preferences and measured outcomes such as unplanned hospital readmissions or ED revisits.

To maintain consistency, the authors excluded non-English studies, those lacking primary data, and studies that did not stratify patient outcomes by language preference or use of interpretation services. Ultimately, the analysis included data from 18 adult studies focused on 28- or 30-day hospital readmissions, seven adult studies on 30-day ED revisits, and five pediatric studies examining 72-hour or seven-day ED revisits.

Findings
The meta-analysis revealed that adult patients with language discordance had higher odds of hospital readmission. Specifically, the data showed a statistically significant increase in 28- or 30-day readmission rates for adults with a non-dominant language preference (OR 1.11; 95% CI: 1.04 to 1.18).
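For context on how such figures are derived: an odds ratio (OR) and its 95% confidence interval can be computed from a 2x2 table of readmission counts. The counts below are invented for illustration only and are not from the study:

```python
import math

# Hypothetical 2x2 table (counts are illustrative, NOT from the meta-analysis):
#                 readmitted   not readmitted
# discordant      a = 2220     b = 17780
# concordant      c = 2000     d = 18000
a, b, c, d = 2220, 17780, 2000, 18000

# Odds ratio: odds of readmission for discordant vs. concordant patients
odds_ratio = (a * d) / (b * c)

# 95% CI via the standard error of the log odds ratio (Woolf's method)
se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
lo = math.exp(math.log(odds_ratio) - 1.96 * se)
hi = math.exp(math.log(odds_ratio) + 1.96 * se)

print(f"OR = {odds_ratio:.2f}, 95% CI: {lo:.2f} to {hi:.2f}")
```

A confidence interval that lies entirely above 1.0, as in the study's reported OR 1.11 (95% CI: 1.04 to 1.18), indicates a statistically significant increase in the odds of readmission.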

Importantly, the impact of interpretation services was notable. In the four studies that confirmed the use of interpretation services during patient-clinician interactions, there was no significant difference in readmission rates. In contrast, studies that did not specify whether interpretation services were provided showed higher odds of readmission for language-discordant patients.

Adult patients with a non-dominant language preference also faced higher odds of emergency department (ED) readmission compared to those who spoke the dominant language. Specifically, the meta-analysis found a statistically significant increase in unplanned ED visits within 30 days among language-discordant adults.

However, this trend was not observed in studies where the use of interpretation services was verified. The authors concluded that “providing interpretation services may mitigate the impact of language discordance and reduce hospital readmissions among adult patients.”

For pediatric patients, the analysis indicated that children whose parents were language-discordant with providers had higher odds of ED readmission within 72 hours and seven days, compared to children whose parents spoke the dominant language fluently.

That said, the authors noted that a meta-analysis for pediatric hospital readmissions was not conducted due to the limited number of studies and inconsistencies in study design. The individual pediatric studies reviewed did not yield statistically significant results.

The study highlights key limitations in the current evidence base — particularly regarding pediatric readmissions and the effectiveness of language access interventions on clinical outcomes. Variability in how language discordance is defined and measured across studies was also identified as a limitation.

The authors recommend developing a more standardized approach to identifying patients facing language-related barriers to care and determining whose language preferences — whether the patient’s or a parent’s — are most influential in shaping clinical outcomes.

Thursday, April 3, 2025

Dealmaker Stuns with $10M Donation for Translation Education

Florian and Esther discuss the language industry news of the week, breaking down Slator’s 2025 Language Service Provider Index (LSPI), which features nearly 300 LSPs and reports 6.6% combined growth in 2024 revenues, totaling USD 8.4bn.

Florian touches on a surprise USD 10m donation from private equity executive Mario Giannini to launch a new MA translation and interpreting program at California State University, Long Beach. The duo talks about McKinsey’s State of AI report, which continues to classify translators as AI-related roles and shows that hiring them has become slightly easier.

In Esther’s M&A corner, TransPerfect announced two acquisitions, Technicolor Games and Blu Digital Group, further expanding its presence in gaming and media localization. In Israel, BlueLion and GATS merged to form TransNarrative, and Brazilian providers Korn Translations and Zaum Langs joined forces under the Idlewild Burg group.

Meanwhile, in funding, Teleperformance invested USD 13m in Sanas, a startup offering real-time accent translation for call centers to improve global communication. Lingo.dev raised USD 4.2m, while Dubformer secured USD 3.6m to develop the ‘Photoshop of AI dubbing’.

Florian shares insights from Slator’s 2025 Localization Buyer Survey, which found that over half of buyers want strategic AI support from vendors and many cite inefficient automation as a key challenge.

Wednesday, March 26, 2025

Private Equity Executive Donates $10M to Launch New MA in Translation Program

Amid growing uncertainty about translation careers due to AI advancements and sensationalized headlines, one California university is celebrating a transformative donation.

On March 11, 2025, California State University, Long Beach (CSULB) announced that Mario Giannini, Executive Co-Chairman of private equity firm Hamilton Lane, has gifted $10 million to establish a Master of Arts program in Translation and Interpreting. The program is set to launch in the fall of 2026.

This isn’t Giannini’s first significant contribution to CSULB. In January 2017, he donated $1.75 million to establish The Clorinda Donato Center for Global Romance Languages and Translation Studies, named after its current director.

CSULB Center Expands Translation Studies with Additional $5.25M Gift

Housed within CSULB’s College of Liberal Arts, The Clorinda Donato Center for Global Romance Languages and Translation Studies serves as a hub for both pedagogical research and instruction in Romance languages.

Students can pursue a minor or graduate certificate in translation studies, while a major in translation is available through collaboration with the Department of Linguistics. The Center has also hosted internship programs, with growing demand from students across disciplines—ranging from speech-language therapy and the arts to politics—seeking on-campus experience.

In 2022, Giannini contributed an additional $5.25 million to The Center, marking the largest gift in the history of CSULB’s College of Liberal Arts. A write-up at the time highlighted The Center’s uniqueness, stating, “The Center is unique in the State of California, offering world-class training in translation studies at state university prices.”

Born in France to Italian-speaking parents, Giannini graduated from Cal State Northridge in 1973 with a BA in English and has credited the CSU system as “a huge influence” on his career. He currently serves as Executive Co-Chairman of Hamilton Lane and sits on the firm’s investment committees.

According to Director Clorinda Donato, who also teaches Italian and French, most of the new funds will be used to create scholarships for students admitted to the master’s program.

CSULB to Revive Translation Program with Advanced AI Integration

CSULB’s original translation and interpreting program, founded in the 1980s by renowned legal interpreter Alexander Rainof, was retired when he stepped down. When CSULB later proposed reviving the program, it presented the idea to Mario Giannini, a natural choice given his ties to the CSU system as an alumnus.

The current undergraduate program, along with the new two-track MA program, aims to train students in diverse areas such as audiovisual, community, educational, legal, literary, and medical translation and interpreting. According to Director Clorinda Donato, the curriculum will take an applied approach by integrating advanced AI, large language models (LLMs), and data science.

“We will proudly offer generous funding to each year’s cohort of prospective students to help reduce the costs of their graduate education,” Donato said. Annual tuition is $18,972 for undergraduates and $17,922 for graduate students, excluding room and board.

Friday, March 21, 2025

SlatorCon Remote March 2025 Offers Essential Insights on the Language Industry and AI

 A Pinch, a Twitch, and Everything in Between: Pinch’s Christian Safka and Twitch’s Susan Maria Howard were among the top language industry leaders who joined hundreds of attendees on March 18, 2025, for the first SlatorCon Remote conference of the year.

Kicking off the day’s events, Slator’s Head of Advisory, Esther Bond, welcomed attendees and invited Managing Director Florian Faes to share the latest findings and insights in his highly anticipated ‘industry health check’.

In his presentation, Faes began by reflecting on the challenges of 2024. Drawing on data from Slator’s 2025 Language Service Provider Index (LSPI), he highlighted the growth of interpreting-focused companies, the struggles of small, undifferentiated agencies, and the rapid rise of language AI, driven by companies like ElevenLabs and DeepL.

Faes also highlighted key findings from Slator’s 2025 Localization Buyer Survey, including the challenges buyers face in implementing AI and the growing need for AI partners to address inefficiencies. He also noted the mixed outlook for the industry in the year ahead.

LLMs Are Just the Beginning

The first expert presentation was delivered by Sara Papi, a Postdoctoral Researcher at the Fondazione Bruno Kessler, who discussed the current state of research in simultaneous speech-to-text translation.

Papi highlighted discrepancies between the original definition and current practices in the speech translation field, identified through a review of expert literature. She specifically pointed out issues related to the use of pre-segmented speech and inconsistencies in terminology.

Slator’s Head of Research, Anna Wyndham, moderated the first panel of the day, featuring Simone Bohnenberger-Rich, Chief Product Officer at Phrase; Simon Koranter, Head of Global Production & Engineering at Compass Languages; and Matteo Nonne, Localization Program Manager at On.

The panelists discussed the evolving role of generative AI in localization, highlighting its shift from initial experimentation to scalable solutions that drive growth. They shared insights on how AI is transforming localization from a cost center into a strategic function by enabling customized, context-aware content adaptation and addressing challenges related to return on investment (ROI) and stakeholder expectations.

Slator’s Alex Edwards, Senior Research Analyst, moderated another panel discussion focused on the adoption of large language models (LLMs) for AI translation in enterprise workflows. Panelists Manuel Herranz, CEO of Pangeanic, and Bruno Bitter, CEO of Blackbird.io, explored whether LLMs truly represent the state of the art.

Herranz and Bitter emphasized that middleware and techniques like Retrieval-Augmented Generation (RAG) are more advanced, and highlighted the importance of fine-tuning smaller, domain-specific models. They also discussed the role of orchestration technology in effectively managing a range of AI tools.

In his presentation, Supertext’s CEO Samuel Läubli echoed insights shared by other speakers, emphasizing that LLMs generate fluent texts by considering broader context. He explored the implications of an AI-first era for translation, the rise of smaller competitive players, and the continued importance of human expertise.

Läubli highlighted that the new Supertext resulted from a 2024 merger between LSP Supertext and AI translation company Textshuttle. He remarked, “I’ve been working in this field for 10 years now, but I haven’t seen a system or AI agent that can guarantee a correct translation — and I’m quite sure I won’t see it in the next 10 years.”

Teresa Toronjo, Localization Manager at Malt, discussed collaboration within leaner localization teams, stressing the importance of diverse partnerships, scalable processes, and maintaining quality consistency with cost-effectiveness guided by experts.

If you missed SlatorCon Remote March 2025 in real time, recordings will be available soon through our Pro and Enterprise plans.

Thursday, March 20, 2025

AI Enhances Multilingual Patient Care with Insights from Jaide Health CEO Joe Corkery, MD

Joe Corkery, MD, CEO and Co-Founder of Jaide Health, joins SlatorPod to discuss how Jaide Health is driving medical interpreting and translation with AI, bridging communication gaps for limited English proficiency (LEP) patients and improving healthcare accessibility.

With a background in computer science, medicine, and AI product leadership at Google, Joe co-founded Jaide Health with Julie Wilner, RN, in 2023 to address a long-standing need for real-time, interactive communication for the LEP patient population.

Unlike older machine translation models, which worked sentence by sentence without context, Joe shares how generative AI can maintain coherence, track gender references, and infer meaning from prior context — crucial in medical settings.

The CEO remains pragmatic about Trump’s executive order designating English as the US’s official language and revoking previous language access mandates. He argues that such policies will not change the healthcare industry’s commitment to multilingual patient care but may push hospitals to seek more cost-effective solutions — potentially accelerating AI adoption.

Looking ahead, Jaide Health is focusing on expanding into document translation, particularly for discharge instructions and patient portal messaging, areas where current solutions are slow or impractical.

EU Language Law with Professor Stefaan van der Jeught

Stefaan van der Jeught, Professor of EU Constitutional Law at Vrije Universiteit Brussel, and a Press Officer at the Court of Justice o...