Developed by American researchers, the EchoSpeech glasses can read their wearer's lips and send the words to a smartphone. The device's originality lies in using sonar rather than a camera to analyze lip movements.
Researchers at Cornell University in the United States have developed a prototype pair of glasses that can read the wearer's lips. Called EchoSpeech, the interface can recognize around thirty silent commands from the movements of the mouth and lips and transmit them to the user's smartphone. Combining sonar and AI, the device reportedly works after only a few minutes of user training, according to the researchers.
The device's potential is considerable. EchoSpeech could notably drive a voice synthesizer and transcribe the words of people who cannot produce sound. The technology could also make it possible to communicate via smartphone in a library, where silence is expected, or, on the contrary, in a noisy restaurant.
Sonar rather than camera
EchoSpeech isn’t the first lip-reading technology. A year ago, for example, researchers at Meta (Facebook) unveiled a neural network that uses both audio and visual data to understand what a person is saying (> When an AI can read lips).
One of the novelties of EchoSpeech is that it “reads lips” not by looking at them but by capturing their movement with a sonar system combined with AI. Concretely, the device mounted on the glasses emits acoustic waves toward the face and captures their echo. A deep learning algorithm analyzes this echo in real time to infer lip movements with 95% accuracy, according to the researchers.
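To make the idea of "echo in, command out" more concrete, here is a minimal, purely illustrative sketch in Python. It assumes the sonar front end delivers a small 2D "echo profile" per utterance (time frames × range bins) and that the roughly thirty silent commands are the classification targets; the network shape, input resolution and names are assumptions, not the researchers' actual model.

```python
# Illustrative sketch only: the EchoSpeech paper does not publish this exact code.
# Assumes each utterance yields a 2D echo profile (time frames x range bins)
# and that ~30 silent commands are the classification targets.
import torch
import torch.nn as nn

NUM_COMMANDS = 30            # roughly thirty silent commands (per the article)
FRAMES, RANGE_BINS = 64, 32  # hypothetical echo-profile resolution

class EchoCommandNet(nn.Module):
    """Small CNN mapping an echo profile to one of the silent commands."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * (FRAMES // 4) * (RANGE_BINS // 4),
                                    NUM_COMMANDS)

    def forward(self, x):  # x: (batch, 1, FRAMES, RANGE_BINS)
        h = self.features(x)
        return self.classifier(h.flatten(1))

# Inference on one echo profile captured by the sonar front end
model = EchoCommandNet().eval()
profile = torch.randn(1, 1, FRAMES, RANGE_BINS)  # stand-in for real sensor data
command_id = model(profile).argmax(dim=1).item()
print(f"predicted silent command: {command_id}")
```

A few minutes of user training, as described in the article, would correspond to fine-tuning such a model on a handful of examples recorded by the wearer.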
Using acoustic sonar rather than a camera has many advantages. First, the device is more compact, consumes less energy and is more privacy-friendly. In terms of user experience, the wearer does not need to position themselves in front of a camera. In addition, because acoustic data is far less voluminous than video, it can be streamed to the smartphone over Bluetooth in real time with little bandwidth, as the rough comparison below illustrates. Finally, since the data is analyzed directly on the smartphone, it never needs to be sent to the cloud.
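The bandwidth argument can be made tangible with a back-of-envelope calculation. The figures below are assumptions chosen for illustration, not measurements from the EchoSpeech paper.

```python
# Rough, illustrative bandwidth comparison (all figures are assumptions).
AUDIO_SAMPLE_RATE_HZ = 50_000   # hypothetical ultrasonic sampling rate
AUDIO_BYTES_PER_SAMPLE = 2      # 16-bit samples
acoustic_bps = AUDIO_SAMPLE_RATE_HZ * AUDIO_BYTES_PER_SAMPLE * 8

VIDEO_W, VIDEO_H, FPS = 640, 480, 30  # modest uncompressed camera stream
video_bps = VIDEO_W * VIDEO_H * 3 * FPS * 8

print(f"acoustic stream  ~{acoustic_bps / 1e6:.1f} Mbit/s")
print(f"raw video stream ~{video_bps / 1e6:.1f} Mbit/s")
# Bluetooth Classic tops out at a few Mbit/s: enough for the acoustic stream,
# far below what an uncompressed video stream would require.
```

Under these assumptions the acoustic stream stays under 1 Mbit/s, while raw video runs into the hundreds of Mbit/s, which is why the acoustic approach fits comfortably within a real-time Bluetooth link.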
On the same topic:
> The Cornell researchers’ article: EchoSpeech: Continuous Silent Speech Recognition on Minimally-obtrusive Eyewear Powered by Acoustic Sensing
> TinyML: for a decentralized and less energy-consuming AI
> How to design quiet technologies?