Most people can navigate a foreign country with a few broken phrases and a reliance on nonverbal gestures, but as soon as they need to call for assistance, language barriers become hard to surmount. If we find ourselves stuck on the side of the road with a flat tire and the customer service agent doesn’t speak our language, how will they be able to help us?

Researchers from Nokia have addressed this problem with a real-time speech translation service. The solution is integrated into ordinary voice calls: once a language pair is selected, such as English-Russian, speech is translated between the parties, so each caller hears the original speech first, followed by its translation.

“Where many translation services can only operate between accepted contacts, and on certain devices, our solution works with any device capable of voice calls,” said Nokia researcher Máté Ákos Tündik.

To prove the concept, the Nokia researchers compared their real-time translation service against Over-the-Top (OTT) providers such as Google, Facebook and Skype, which combine or chain communication and translation natively. They found that many OTT solutions can only be used between contacts who already know each other, and that many require both parties to install the same application. OTT translation services also tend to have bandwidth requirements that, in most cases, call for a 4G connection.

Understanding the immediacy of customer service, the researchers recognized that their solution needed to work with any device and any bandwidth, in real time, without requiring an application download.

To achieve this, the Nokia Mobile Speech Translation (NMST) solution is invoked ‘on the fly’ when callers realize, after being connected, that the other party does not speak their language. The service is triggered by the recognition of a specific Dual-Tone Multi-Frequency (DTMF) tone.
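The article does not describe how the trigger tone is detected inside the network. As one rough illustration, a network element could spot DTMF digits in the call audio with the standard Goertzel algorithm, as in the Python sketch below; the trigger key, sample rate, power threshold and the invoke_translation_service hook are all hypothetical.

```python
import math

# Standard DTMF frequency grid (Hz): row and column tones map to the 16 keys.
ROW_FREQS = [697, 770, 852, 941]
COL_FREQS = [1209, 1336, 1477, 1633]
KEYS = [["1", "2", "3", "A"],
        ["4", "5", "6", "B"],
        ["7", "8", "9", "C"],
        ["*", "0", "#", "D"]]

def goertzel_power(samples, sample_rate, freq):
    """Power of a single frequency component, computed with the Goertzel algorithm."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

def detect_dtmf(samples, sample_rate=8000, threshold=1e4):
    """Return the DTMF key dominating this audio frame, or None.

    The threshold is a placeholder and would need tuning against real call audio.
    """
    row_powers = [goertzel_power(samples, sample_rate, f) for f in ROW_FREQS]
    col_powers = [goertzel_power(samples, sample_rate, f) for f in COL_FREQS]
    row = max(range(4), key=lambda i: row_powers[i])
    col = max(range(4), key=lambda i: col_powers[i])
    if row_powers[row] > threshold and col_powers[col] > threshold:
        return KEYS[row][col]
    return None

# Hypothetical trigger: a '#' pressed mid-call invokes the translation leg.
# if detect_dtmf(frame) == "#":
#     invoke_translation_service(call_id)   # placeholder hook, not part of NMST
```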

As many basic voice calls use the Circuit Switched (CS) access network, which, unlike 4G, does not provide data access, NMST relies on the Microsoft Azure Cognitive API embedded in the Nokia Data Center for translation. To reach the translation service in the data center, the main technical requirement is simply a voice call connection to the operator’s mobile network.
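The article does not show how NMST calls the Azure service. The following is a minimal sketch, assuming the publicly documented Azure Speech SDK for Python, with placeholder credentials, a WAV file standing in for the call-leg audio, and the English-Russian pair from the example above.

```python
import azure.cognitiveservices.speech as speechsdk  # pip install azure-cognitiveservices-speech

# Placeholder key and region; in NMST the service would be reached through the
# instance embedded in the Nokia Data Center rather than configured like this.
translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription="YOUR_KEY", region="YOUR_REGION")
translation_config.speech_recognition_language = "en-US"   # 'source' language
translation_config.add_target_language("ru")               # e.g. English-Russian pair

# A WAV file stands in for the audio of one call leg.
audio_config = speechsdk.audio.AudioConfig(filename="caller_leg.wav")
recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=translation_config, audio_config=audio_config)

result = recognizer.recognize_once()
if result.reason == speechsdk.ResultReason.TranslatedSpeech:
    print("Recognized:", result.text)
    print("Russian:   ", result.translations["ru"])
```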


Figure 1: Core Network level architecture

To translate the ‘source’ speech from one language to another, the invoked Microsoft speech translation service goes through four steps. The first is automatic speech recognition, which converts the speech to text using a deep neural network trained and optimized for everyday conversation. The next step applies TrueText, a technology that normalizes the text for translation by removing speech disfluencies. The text is then translated with models developed for spoken conversation. Finally, the translated text is converted back to speech, and the audio is provided to the callers.
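As a rough mapping of these four phases onto code, the sketch below extends the earlier Azure example by also requesting synthesized audio of the translation. The voice name and file name are assumptions, and the first three phases (recognition, TrueText normalization and translation) run inside the single recognize_once() call rather than as separately exposed stages.

```python
import azure.cognitiveservices.speech as speechsdk

translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription="YOUR_KEY", region="YOUR_REGION")
translation_config.speech_recognition_language = "en-US"  # step 1: speech recognition input
translation_config.add_target_language("ru")              # step 3: translation target
translation_config.voice_name = "ru-RU-DmitryNeural"      # step 4: synthesize the translation (assumed voice)

audio_config = speechsdk.audio.AudioConfig(filename="caller_leg.wav")
recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=translation_config, audio_config=audio_config)

translated_audio = bytearray()

def on_synthesizing(evt):
    # Chunks of synthesized speech arrive here; in a call setting they would be
    # played back to the parties after the original utterance.
    translated_audio.extend(evt.result.audio)

recognizer.synthesizing.connect(on_synthesizing)

# Steps 1-3 (recognition, text normalization, translation) happen server-side here.
result = recognizer.recognize_once()
print("Source text:     ", result.text)
print("Translated text: ", result.translations["ru"])
print("Synthesized audio bytes:", len(translated_audio))
```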


Figure 2: The speech translation flow

After a successful proof-of-concept demonstration, the service has been piloted in a Swedish mobile operator’s network. To enhance the solution further, the researchers would like to integrate additional translation engines, allowing the Nokia Data Center to pull in the translation sources that best suit users’ needs.

“As this technology is further developed, similar translation integrations could be used for multinational companies,” explained Máté Ákos Tündik. “This would allow co-workers in different countries to communicate with one another, bringing greater collaboration to the workplace.”

For more information on speech translation, visit the IEEE Xplore Digital Library.