Thanks to advances in speech and natural language processing, there is hope that one day you may be able to ask your virtual assistant what the best salad ingredients are. For now, it is possible to ask your home gadget to play a song or to open on voice command, a feature already available on many devices.
If you speak Moroccan, Algerian, Egyptian, Sudanese, or any other dialect of Arabic, which vary greatly from region to region and are in some cases mutually unintelligible, it is a different story. If your native language is Arabic, Finnish, Mongolian, Navajo, or any other language with a high level of morphological complexity, you may feel left out.
These complex constructs drew Ahmed Ali to look for a solution. He is a senior engineer in the Arabic Language Technologies group at the Qatar Computing Research Institute (QCRI), part of Qatar Foundation’s Hamad Bin Khalifa University, and founder of ArabicSpeech, a “community that exists for the benefit of Arabic speech science and speech technologies.”
Ali became fascinated by the idea of talking to cars, appliances, and gadgets many years ago while at IBM. “Can we build a machine capable of understanding different dialects: an Egyptian pediatrician automating a prescription, a Syrian teacher helping children get the core parts of their lesson, or a Moroccan chef describing the best couscous recipe?” he says. However, the algorithms that power those machines cannot sift through the roughly 30 varieties of Arabic, let alone make sense of them. Today, most speech recognition tools work only in English and a handful of other languages.
The coronavirus pandemic has further fueled reliance on voice technologies, as natural language processing technologies have helped people comply with stay-at-home guidelines and physical distancing measures. However, while we have been using voice commands to help with e-commerce purchases and to manage our households, the future holds many more applications.
Millions of people around the world use massive open online courses (MOOCs) for their open access and unlimited participation. Speech recognition is one of the key features of MOOCs: students can search within the spoken content of a course and get translations through subtitles. Speech technology also makes it possible to digitize lectures, displaying spoken words as text in university classrooms.
According to a recent article in Speech Technology magazine, the voice and speech recognition market is expected to reach $26.8 billion by 2025, as millions of consumers and companies around the world come to rely on voice bots not only to interact with their appliances or cars but also to improve customer service, drive health-care innovation, and improve accessibility and inclusion for people with hearing, speech, or motor impairments.
In a 2019 survey, Capgemini predicted that by 2022, more than two out of three consumers would choose voice assistants over visits to stores or bank branches, a share that could well rise sharply given the home-based, physically distanced life and commerce that the pandemic has forced on the world for more than a year and a half.
However, these devices fail to reach vast parts of the world. For those 30 varieties of Arabic and their millions of speakers, that is a substantial missed opportunity.
Arabic for machines
English- or French-speaking voice bots are far from perfect. However, teaching machines to understand Arabic is especially difficult for several reasons. Here are three well-known challenges:
- Lack of diacritics. Arabic dialects are vernacular, that is, primarily spoken. Most of the available text is non-diacritized, meaning that it lacks marks such as the acute (´) or grave (`) accents that indicate the sound values of letters. Therefore, it is difficult to determine where the vowels go.
- Lack of resources. There is a shortage of labeled data for the various Arabic dialects. Taken together, they lack standardized orthographic rules that dictate how to write a language, including norms for spelling, hyphenation, word breaks, and emphasis. These resources are essential for training computer models, and their scarcity has held back the development of Arabic speech recognition.
- Morphological complexity. Arabic speakers do a lot of code switching. For example, in areas colonized by the French (North Africa, including Morocco, Algeria, and Tunisia), the dialects feature many borrowed French words. As a result, there is a large number of so-called out-of-vocabulary words, which speech recognition technologies cannot understand because the words are not Arabic.
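Two of the challenges above can be illustrated with a minimal Python sketch. It rests on assumptions that are not from the article: the Arabic marks in question are taken to be the short-vowel marks, which Unicode classifies as nonspacing combining marks; the helper names `strip_diacritics` and `oov_rate` are hypothetical; and the tokens and vocabulary are toy data chosen only for illustration.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop nonspacing combining marks (Unicode category Mn),
    which include the Arabic short-vowel marks."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def oov_rate(tokens, vocabulary) -> float:
    """Fraction of tokens that are missing from the vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


# Two differently vowelled words built on the same three letters
# (kaf, ta, ba), written with escapes so the marks stay visible in code.
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # "he wrote"
kutiba = "\u0643\u064F\u062A\u0650\u0628\u064E"  # "it was written"

# Without the marks, the two words become the same string.
print(strip_diacritics(kataba) == strip_diacritics(kutiba))  # True

# Toy code-switched utterance: two borrowed French words, and a
# vocabulary that (by assumption) only knows the third token.
tokens = ["merci", "bonjour", "kitab"]
vocabulary = {"kitab"}
print(oov_rate(tokens, vocabulary))
```

Once the marks are removed, a model sees one written form for two spoken words and has to recover the vowels from context. The second function shows why borrowed words matter: every token outside the vocabulary is one the recognizer cannot output.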
“But the field is moving fast,” says Ali. It is a collaborative effort among many researchers to make it move even faster. Ali’s Arabic Language Technologies lab is leading the ArabicSpeech project to bring together Arabic translations with the dialects native to each region. For example, Arabic dialects can be divided into four regional groups: North African, Egyptian, Gulf, and Levantine. However, since dialects do not follow borders, this can go as fine-grained as one dialect per city; for example, a native Egyptian speaker can tell the Alexandrian dialect from that of a fellow citizen from Aswan (1,000 kilometers away on the map).
Building a tech-savvy future for all
At this point, machines are about as accurate as human transcribers, thanks in large part to advances in deep neural networks, a subfield of machine learning that relies on algorithms inspired by how the human brain works, biologically and functionally. Until recently, however, speech recognition has been somewhat cobbled together. The technology has a history of relying on separate modules for acoustic modeling, pronunciation lexicons, and language modeling, all of which need to be trained separately. More recently, researchers have been training models that convert acoustic features directly into text, potentially optimizing all components for the end task.
Despite this progress, Ali still cannot give voice commands to most devices in his native Arabic. “It’s 2021, and I still can’t speak to many machines in my dialect,” he says. “I mean, I now have a device that can understand my English, but machine recognition of multi-dialect Arabic speech hasn’t happened yet.”
Making this happen is the focus of Ali’s work, which has culminated in the first transformer for recognizing Arabic speech and its dialects, one that has performed well so far. Called the QCRI Advanced Transcription System, the technology is used by the broadcasters Al-Jazeera, DW, and the BBC to transcribe online content.
There are a few reasons why Ali and his team have succeeded in building these speech engines now. Primarily, he says, “There is a need to have resources across all of the dialects. We need to build up the resources to then be able to train the model.” He adds, “We have a great architecture, good modules, and we have data that represents reality.”
Researchers from QCRI and Kanari AI recently built models that can achieve human parity on Arabic broadcast news. The system demonstrates the impact of subtitling Al-Jazeera’s daily reports. While the English human error rate (HER) is about 5.6%, the research showed that the Arabic HER is significantly higher and can reach 10%, owing to the morphological complexity of the language and the lack of standard orthographic rules in dialectal Arabic. Thanks to recent advances in deep learning and end-to-end architectures, the Arabic speech recognition engine manages to outperform native speakers on broadcast news.
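To make error rates such as 5.6% or 10% concrete, here is a minimal Python sketch of word error rate, the metric commonly used to score transcripts. The article does not say how its figures were computed, so this is an assumption. The function name `word_error_rate` and the sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between a reference transcript and a
    hypothesis, divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word dropped
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match if cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of six gives an error rate of about 16.7%.
rate = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"{rate:.1%}")  # 16.7%
```

On this reading, an error rate of 10% means roughly one word in ten of a transcript differs from the reference. Without standard spelling rules, two correct transcribers can disagree on how to write the same spoken word, which pushes the human figure up.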
While speech recognition for Modern Standard Arabic seems to work well, researchers from QCRI and Kanari AI are busy testing the limits of dialectal processing and achieving strong results. Since no one speaks Modern Standard Arabic at home, attention to dialect is what we need for our voice assistants to understand us.
This content was written by Qatar Computing Research Institute, Hamad Bin Khalifa University, a member of Qatar Foundation. It was not written by MIT Technology Review’s editorial staff.