THE REPRESENTATION PROBLEM OF VOICE TECHNOLOGY
The future of human-machine interaction lies in voice control. According to Gartner, by 2020, 30% of web browsing sessions will be done by voice interaction. However, developers, researchers and startups around the globe working on voice-recognition technology all face the same problem: a lack of freely available voice data in their respective languages to train AI-powered speech-to-text engines.
Indeed, while there are approximately 7,100 living languages in the world today, more than 50% of Internet content is in English. Local languages are already severely underrepresented, and voice-enabled applications and services could further widen this gap in access and inclusion.