Google has announced that it is adding 110 new languages to Google Translate, its translation tool, in what the company calls its “biggest expansion ever”, which includes Portuguese from Portugal.
In 2022, Google added 24 new languages using ‘zero-shot’ machine translation, where a machine learning model learns to translate into another language without ever seeing an example, and announced “the 1,000 Languages Initiative, a commitment to build AI [artificial intelligence] models that will support the 1,000 most spoken languages in the world,” Google recalls.
“We’re now using AI to expand the range of languages supported” and, “thanks to our large PaLM 2 language model, we’re starting to implement 110 new languages in Google Translate, our biggest expansion ever, including Portuguese from Portugal”, the company says in an online publication.
In other words, Google Translate will now distinguish between Portuguese variants (Portugal versus Brazil).
“From Cantonese to Q’eqchi’, these new languages represent more than 614 million speakers, enabling translations for around 8% of the world’s population,” says Google.
Around a quarter of the new languages “are from Africa and represent our biggest expansion of African languages to date, including Fon, Kikongo, Luo, Ga, Swati, Venda and Wolof”, the company adds.
Among the languages now supported in Google Translate is Afar, a tonal language spoken in Djibouti, Eritrea and Ethiopia. “Of all the languages in this launch, Afar had the highest number of voluntary contributions from the community,” Google says.
Then there is Cantonese, which has long been “one of the most requested languages on Google Translate”, the company continues.
Other examples are Manx, the Celtic language of the Isle of Man, which almost became extinct with the death of its last native speaker in 1974, but “thanks to an island-wide revival movement, there are now thousands of speakers”, and Nko, a standardized form of the Manding languages of West Africa that unifies many dialects into a common language.
“Its unique alphabet was invented in 1949, and it has an active research community that today develops resources and technology for it,” says Google in its publication.
There is also Punjabi (Shahmukhi), a variety of Punjabi written in the Perso-Arabic script (Shahmukhi) that is the most widely spoken language in Pakistan; Tamazight, a Berber language spoken in North Africa; and Tok Pisin, a “creole of English origin and the lingua franca of Papua New Guinea”.
Languages “have immense variation: regional varieties, dialects, different spelling patterns” and, in fact, “many languages don’t have a standard format, so it’s impossible to choose the ‘right’ variety”.
But “our approach has been to prioritize the most commonly used varieties in each language,” the company says.
“PaLM 2 has been a key piece in this puzzle, helping Translate to learn closely related languages more efficiently, including languages close to Hindi, such as Awadhi and Marwadi, and French Creoles, such as Seychelles Creole and Mauritian Creole,” Google explains.
And as technology evolves “and we continue to partner with expert linguists and native speakers, we will support even more language varieties and spelling conventions over time”.