University of Helsinki researchers teach Comp



Computers usually understand Finnish only in its normative standard form, known as kirjakieli. Finnish dialects nevertheless cause many problems in interaction with computers, because it is impossible to speak a language without speaking a dialect. A research group has developed artificial intelligence (AI) models that can automatically recognize, normalize and generate Finnish dialects. The results were published at the 2021 Conference on Empirical Methods in Natural Language Processing.

Collecting data to help AI understand Finnish and Swedish dialects has been in the news lately. The methods developed by the research group of Mika Hämäläinen, Niko Partanen, Khalid Alnajjar and Jack Rueter from the University of Helsinki go further and enable an AI to speak Finnish dialects fluently.

In line with the computational creativity paradigm, they have developed a method for converting standard Finnish into any of the 23 Finnish subdialects. Computers should not only understand dialectal Finnish, but also be able to express themselves in a dialect.

“With our method, an intelligent system such as a robot can say akku on lopussa (the battery is almost empty) in, for example, the Etelä-Karjala dialect as akku o lopussa, in the Etelä-Satakunta dialect as akku ol lopus, or in the Länsi-Uusimaa dialect as akku o lopus,” says Hämäläinen.

For example, Google Translate’s widely used algorithm cannot translate the dialectal Finnish sentence Oisko sulla jotai esimerkei siit (Do you happen to have some examples of this). It produces the completely wrong “English” translation Oisko sulla something like that, simply because Google Translate was developed exclusively for standard Finnish. The same phenomenon can be observed with all AI tools that support Finnish, such as Apple Siri or dictation in macOS.

Dialects are recognized from both spoken audio and text

The research shows that recognizing dialects is a difficult task when relying on text alone. Dialect recognition is easier if the model also has access to audio, since many dialects are marked by distinctive phonetic features. Hence, the latest research published by the group deals with recognizing dialects in both spoken audio and text.

“The process of normalizing dialects to standard text has many advantages. It enables the analysis of dialect materials with tools built for standard Finnish, and we can also use the normalized version as a search term if we want to find something in the dialect materials,” says Khalid Alnajjar.
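The search use case Alnajjar describes can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the research group's method: a hand-written word lookup (the hypothetical `NORMALIZATION_TABLE`) stands in for a trained normalization model, and the only data are the akku on lopussa examples quoted above. Real normalization depends on context and cannot be done with a word-for-word table.

```python
# Toy illustration only: a hand-written lookup stands in for a trained
# normalization model. The word pairs are taken from the example
# sentences quoted in this article; the table itself is hypothetical.
NORMALIZATION_TABLE = {
    "o": "on",
    "ol": "on",
    "lopus": "lopussa",
}


def normalize(sentence: str) -> str:
    """Map each dialectal word to its standard form; keep unknown words as-is."""
    return " ".join(NORMALIZATION_TABLE.get(w, w) for w in sentence.lower().split())


def search(query: str, documents: list[str]) -> list[str]:
    """Return the documents whose normalized text contains the normalized query."""
    target = normalize(query)
    return [doc for doc in documents if target in normalize(doc)]


dialect_materials = [
    "akku o lopussa",  # Etelä-Karjala
    "akku ol lopus",   # Etelä-Satakunta
    "akku o lopus",    # Länsi-Uusimaa
]

# A single standard-Finnish query finds all three dialectal variants.
matches = search("akku on lopussa", dialect_materials)
print(matches)
```

Because both the query and the materials are normalized before comparison, one search term in standard Finnish retrieves every dialectal spelling of the same sentence.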

The researchers point out that understanding dialects is a complex problem and that no model can understand natural language the way humans do. However, the models they have created open up many other interesting research directions, such as measuring how far a dialect deviates from the norm and identifying which syntactic differences exist between language varieties.

“This will allow us to improve the current state of Finnish natural language processing solutions and create AI models tailored to individuals. For example, we have already achieved impressive results in recognizing a person’s language even in endangered languages,” says Niko Partanen.

The research group has also developed a similar normalization methodology for the dialects of Swedish spoken in Finland (Hämäläinen et al., 2020b) and for historical Finnish (Hämäläinen et al., 2021b).

The dialect generator can be tested online, and the code for the dialect normalizer and generator has been published openly on GitHub. The dialect identification code is also available on GitHub.



