Thursday, May 18, 2023

How Large Language Models Prove Chomsky Wrong with Steven Piantadosi

Joining SlatorPod this week is Steven Piantadosi, Associate Professor of Psychology at UC Berkeley. Steven also runs the computation and language lab (colala) at UC Berkeley, which studies the basic computational processes involved in human language and cognition.


Steven talks about the emergence of large language models (LLMs) and how it has reshaped our understanding of language processing and language acquisition.

Steven breaks down his March 2023 paper, “Modern language models refute Chomsky’s approach to language”. He argues that LLMs demonstrate a wide range of powerful language abilities and disprove foundational assumptions underpinning Noam Chomsky’s theories and, as a consequence, negate parts of modern linguistics.

Steven shares how he prompted ChatGPT to generate coherent and sensible responses that go beyond its training data, showcasing its ability to produce creative outputs. While critics argue that the models are merely predicting the next token in an endless sequence, Steven explains how this process allows them to discover insights about language and, potentially, the world itself.

Steven acknowledges that LLMs operate differently from humans, as models excel at language generation but lack certain human modes of reasoning when it comes to complex questions or scenarios. He also unpacks the BabyLM Challenge, which explores whether models trained on human-scale amounts of data can still learn syntax and other aspects of language effectively.

Despite industry advancements and the trillion-dollar market opportunity, Steven agrees with Chomsky’s ethical concerns, including the presence of harmful content, misinformation, and the potential for job displacement.

Steven remains enthusiastic about the potential of LLMs and believes the recent advancements are a step toward achieving artificial general intelligence, but he refrains from making any concrete predictions.

Thursday, May 11, 2023

Why Large Language Models Hallucinate When Machine Translating ‘in the Wild’

 Large language models (LLMs) have demonstrated impressive machine translation (MT) capabilities, but new research shows they can generate different types of hallucinations compared to traditional models when deployed in real-world settings. 

The findings, published in a paper on March 28, 2023, included evidence that hallucinations were more prevalent when translating into low-resource languages and out of English, and that they can introduce toxic text.

Hallucinations present a critical challenge in MT, as they may damage user trust and pose serious safety concerns, according to a 2022 research paper. However, studies on detecting and mitigating hallucinations in MT have so far been limited to small models trained on a single English-centric language pair.

This has left “a gap in our understanding of hallucinations […] across diverse translation scenarios,” explained Nuno M. Guerreiro and Duarte M. Alves from the University of Lisbon; Jonas Waldendorf, Barry Haddow, and Alexandra Birch from the University of Edinburgh; Pierre Colombo from the Université Paris-Saclay; and André F. T. Martins, Head of Research at Unbabel, in the newly published research paper.

Looking to fill that gap, the researchers conducted a comprehensive analysis of various massively multilingual translation models and LLMs, including ChatGPT. The study covered a broad spectrum of conditions, spanning over 100 translation directions across various resource levels and going beyond English-centric language pairs.

According to the authors, this research provides key insights into the prevalence, properties, and mitigation of hallucinations, “paving the way towards more responsible and reliable MT systems.”

Detach from the Source 

The authors found that hallucinations are more frequent when translating into low-resource languages and out of English, leading them to conclude that “models tend to detach more from the source text when translating out of English.”

In terms of hallucination types, oscillatory hallucinations — erroneous repetitions of words and phrases — are less prevalent in low-resource language pairs, while detached hallucinations — translations that bear minimal or no relation to the source — occur more frequently.

According to the authors, “this reveals that models tend to rely less on the source context when translating to or from low-resource languages.”
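To make the distinction concrete, an oscillatory hallucination is essentially a translation that gets stuck repeating itself. The snippet below is a minimal illustrative sketch of a common repeated-n-gram heuristic for flagging such outputs; the function name, n-gram length, and threshold are assumptions for illustration only, not the detectors used in the paper.

    from collections import Counter

    def looks_oscillatory(translation: str, n: int = 3, min_repeats: int = 3) -> bool:
        # Toy heuristic (not the paper's detector): flag a translation as a
        # possible oscillatory hallucination if any n-gram of words repeats
        # at least `min_repeats` times, i.e. a phrase looping over and over.
        words = translation.split()
        ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
        if not ngrams:
            return False
        _, count = Counter(ngrams).most_common(1)[0]
        return count >= min_repeats

    # A looping output is flagged; a normal sentence is not.
    print(looks_oscillatory("the cat sat on the mat"))                         # False
    print(looks_oscillatory("I want to go to go to go to go to go home now"))  # True

Detached hallucinations, by contrast, cannot be caught by looking at the output alone, since the problem is the missing relationship to the source sentence.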

The rate of hallucinations exceeded 10% in some language pairs, such as English-Pashto, Tamil-English, Azerbaijani-English, English-Azerbaijani, Welsh-English, English-Welsh, and English-Asturian. However, the authors suggest that hallucination rates can be reduced by increasing the size of the model (scaling up) or using smaller distilled models.

Hallucinations and Toxicity

The authors also found that hallucinations may contain toxic text, mainly when translating out of English and into low-resource languages, and that scaling up the model size may not reduce these toxic hallucinations.

This indicates that hallucinations might be attributed to toxic patterns in the training data and underlines the need to filter the training data rigorously to ensure the safe and responsible use of these models in real-world applications.

The authors emphasize that while massive multilingual models have significantly improved the translation quality for low-resource languages, the latest findings underscore potential safety concerns and the need for improvement.

To mitigate hallucinations and improve overall translation quality, they explored fallback systems, finding that hallucinations can be “sticky and difficult to reverse when using models that share the same training data and architecture.” 

However, external tools, such as NLLB, can be leveraged as fallback systems to improve translation quality and eliminate pathologies such as oscillatory hallucinations.
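In practice, a fallback setup of the kind described above can be as simple as re-translating flagged segments with a different system. The sketch below illustrates the general pattern under the assumption that the fallback model (for example, NLLB) does not share the primary system’s training data or architecture; the callables primary_translate, fallback_translate, and is_hallucination are placeholders, not the paper’s actual implementation.

    def translate_with_fallback(source, primary_translate, fallback_translate, is_hallucination):
        # Generic fallback pattern: translate with the deployed system first and,
        # only if the output is flagged as a hallucination, re-translate the same
        # source with an external model (e.g. NLLB) that does not share the
        # primary system's training data or architecture.
        candidate = primary_translate(source)
        if is_hallucination(source, candidate):
            return fallback_translate(source)
        return candidate

Any detector could be plugged in here, including a simple heuristic such as the repeated-n-gram check sketched earlier.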

ChatGPT Surprise

The authors also found that ChatGPT produces different hallucinations compared to traditional MT models. These errors may include off-target translations, overgeneration, or even failed attempts to translate. 

Furthermore, unlike traditional MT models, which frequently produce oscillatory hallucinations, ChatGPT does not generate any such hallucinations under perturbation. “This is further evidence that translation errors, even severely critical ones, obtained via prompting an LLM are different from those produced by traditional machine translation models,” explained the authors.

Moreover, the results revealed that ChatGPT generates more hallucinations for mid-resource languages than for low-resource languages, highlighting that “it surprisingly produces fewer hallucinations for low-resource languages than any other model.”

The authors note that the majority of these hallucinations can be reversed with further sampling from the model. This does not necessarily indicate a defect in the model’s ability to generate adequate translations, but may rather be the result of “bad luck” during generation, as Guerreiro, Martins, and Elena Voita, AI Research Scientist at Meta, wrote in a 2022 research paper.

To facilitate future research in this area, the authors have made their code openly available and released over a million translations and detection results across several models and language pairs.

US Government RFP Seeks Translation Into Four Native American Languages

The United States government has issued an unusual RFP for translation services: The target languages are all indigenous to the US. Th...