New voice for laryngeal cancer patients: Researchers develop unique model for speech enhancement

“Electronic larynxes, boxes that are held against the throat, are uncomfortable, and the sound they generate is often not transmitted properly by phones, which dismiss it as noise. Algorithms adapted to other languages cannot enhance Lithuanian speech, so our task is to build speech enhancement from the ground up, which would allow patients to regain a voice similar to a normal one,” says Rytis Maskeliūnas, a professor at the Faculty of Informatics of the Kaunas University of Technology (KTU IF) and one of the leaders of this project.

Personal archive photo/Rytis Maskeliūnas

The LARYNGOSPEACH S-MIP-23-46 project is carried out by a team of scientists from Kaunas University of Technology (KTU) and Lithuanian University of Health Sciences (LSMU). KTU researchers are led by professor Rytis Maskeliūnas, and LSMU by professor Virgilijus Ulozas.

An unnatural-sounding voice

The KTU scientist says that the idea for the project came from cooperation with one of the best specialists in Lithuania, LSMU professor V. Ulozas, and his team, and from analyzing how the problems faced by patients with laryngeal cancer (a fairly common malignant tumor) could be alleviated.

When the tumor has spread, surgery is often performed to remove the larynx, which results in the loss of the ability to produce a natural voice. This loss not only affects communication, but also creates psychological and social barriers to the patient’s well-being and integration into society.

“Traditional voice synthesis methods and hardware sound unnatural (robot-like) and essentially do not solve these problems. For example, when talking on the phone, you can easily tell that something is wrong with the person. Some phones filter out the ‘damaged’ voice completely, so you cannot even understand what is being said,” says R. Maskeliūnas.

Moreover, there is currently no way to restore or accurately reproduce the unique characteristics of a person’s voice so that it is at least somewhat similar to what it was before the operation.

Personal archive photo/Effects of cancer

“A high-quality generated voice (the patient can no longer speak unaided, because the speech-forming organ has been removed) not only helps to restore communication skills, but also plays an important role in psychological recovery and social reintegration,” emphasizes the KTU professor.

A model adapted to the Lithuanian language

The KTU researcher states that the goal of this project is to create a model for enhancing alaryngeal speech that is adapted to the phonetics of the Lithuanian language, which has not been done so far. The model includes adaptations tailored to the nuances and characteristics of Lithuanian.

“Lithuanian inherited the old Indo-European word formation and phonological system, which is very different from those of other Indo-European languages such as English, German, Dutch and French, so we will not ask ChatGPT for help here,” he says.

For example, English, German, and Dutch are considered stress-timed languages, in which stressed syllables tend to occur at regular time intervals. French is a predominantly syllable-timed language, in which each syllable has almost the same duration.

R. Maskeliūnas explains that the Lithuanian language, by contrast, is characterized by a mixed rhythm, although it is much closer to stress timing. Although all these languages use a free stress (that is, the position of the stress in a word is unpredictable), Lithuanian accentuation is more complicated because of its pitch accent. In addition, palatalized and non-palatalized consonants function as separate phonemes in Lithuanian.
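The rhythm distinction described above is commonly quantified in phonetics with the normalized Pairwise Variability Index (nPVI), which measures how much the durations of successive intervals (for example, vowels) vary. The following is only a minimal sketch of that general metric: the function name `npvi` and the duration values are hypothetical illustrations, not data or methods from the project.

```python
def npvi(durations):
    """Normalized Pairwise Variability Index of successive interval durations."""
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    total = 0.0
    # Compare each interval with the next one, normalized by their mean.
    for a, b in zip(durations, durations[1:]):
        total += abs(a - b) / ((a + b) / 2)
    return 100 * total / (len(durations) - 1)


# Hypothetical vowel durations in milliseconds (illustrative only, not measured).
even_syllables = [100, 100, 100, 100]  # near-equal durations, syllable-timed-like
alternating = [60, 180, 60, 180]       # long/short alternation, stress-timed-like

print(npvi(even_syllables))
print(npvi(alternating))
```

A lower value indicates more uniform durations, as expected of syllable-timed speech, while a higher value indicates stronger alternation, as expected of stress-timed speech. A language with mixed rhythm would be expected to fall somewhere in between.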

“Language-specific adaptations are necessary to integrate the unique features of the Lithuanian language, such as its phonetic inventory, prosody, intonation patterns and phonotactics, which differ greatly from those of other languages for which alaryngeal speech synthesis methods exist. Therefore, intelligible Lithuanian speech cannot be obtained simply by, for example, retraining a speech generation model created for English or French,” he notes.

According to the KTU professor, this speech enhancement model addresses linguistic challenges specific to Lithuanian alaryngeal speech, such as preserving unique phonemic contrasts and handling complex phonological processes.

Electronic larynxes have drawbacks

LSMU professor V. Ulozas says that after the larynx is removed because of laryngeal cancer, a person faces unique challenges that exceed the capabilities of conventional voice generation technologies. According to him, the main tool is the electronic larynx: a box that is held against the neck, in the throat area, and generates a “robotic” voice when the person speaks.

“Laryngeal pathologies can significantly alter the acoustic properties of the voice, such as pitch, timbre, and rhythm, resulting in a range of voice disorders, from mild dysphonia to severe disorders that greatly affect speech intelligibility,” he says.

V. Ulozas says that one of the main challenges in synthesizing pathological voices is the variability and specificity of voice disorders. The complexity of these changes affects the ability of people with voice disorders to communicate naturally with others, especially at a distance. For example, we often would not even be able to tell who is calling us, because the robotic voice produced by the device sounds much the same from one person to the next.

“Speech reconstructed by artificial intelligence (AI) that reflects the identity of the patient’s voice would improve their quality of life and social interaction. Each pathology affects speech differently, so what is needed is an individualized approach to speech generation that can adapt to the unique characteristics of a person’s voice, and this is where AI can help,” he notes.

An invention accessible to everyone

The invention is being validated by a team of medical experts led by LSMU professor V. Ulozas, which conducted a clinical study with patients and used a set of voice recordings collected in medical practice.

Speech recordings were made at regular outpatient visits at least 6 months after surgical treatment. This period was intended to ensure adequate healing and rehabilitation. The phonetically balanced Lithuanian sentence “Turėjo senelė žilą oželį” (“Grandmother had a little gray goat”) was used for the recordings.

“The project is already halfway through, and the solution is currently being clinically validated in a study conducted by LSMU. Colleagues from LSMU are working on the necessary metrics, and we at KTU are working to further improve the quality and efficiency of the AI algorithms,” says R. Maskeliūnas.

According to him, in the future, the aim is to carry out full clinical validation, improve the naturalness of speech and create an app for ordinary users.

“We believe that everyone who wants to will be able to use this invention, because it will be a software solution – an app or a plug-in that cleans the voice,” says the KTU professor.


2024-08-07 01:54:58
