Silent Communication and AI Will Change Our Society: Silent Speech Enables Fusion of AI and Humans

Dr. Junichi Rekimoto (Professor, Interfaculty Initiative in Information Studies, The University of Tokyo, CSO/Research Director, Sony Computer Science Laboratories, Inc.) has pioneered the field of better interactions between humans and computers and the extension of human capabilities through computers. His research focuses on the integration of humans and AI. In this interview, he talks about his current research and how AI can be useful to humans.

Operating a computer with silent speech

We are working on the problem of extracting a person's speech intent (content), and one of the approaches is called silent speech. This approach reads the speech content by using sensors to measure only the mouth movements while the person produces no voice. Although voice recognition is widely used, noise and confidentiality are issues in public places. In addition, some people have difficulty speaking because their vocal cords have been removed or for other reasons. Even in such circumstances, silent speech provides a means of operating computers and communicating with others simply by mouthing words without voicing them.

Furthermore, if we could step up from silent speech to direct transmission of a person’s thoughts from the brain to a computer, it would truly be “heart-to-heart” communication. Therefore, as fundamental research, we are working with Professor Yanagisawa of Osaka University on a technology that uses AI (deep learning) to decode the intent (content) of speech from brain information obtained by electrodes implanted in the brain.

From the Osaka Expo ’70 to the future interface

What triggered my interest in my current field of research was the Osaka Expo in 1970, which I visited as a child. I was extremely impressed when I saw the light pen (a system operated by touching a cathode-ray tube screen with a pen-type device) at the IBM pavilion. It made me want a career making such things. So, from a rather early stage, I was interested in the theme of interaction between computers and humans.

I also liked the anime “Cyborg 009” and thought it was very cool that a human could become a cyborg.

I am not a brain specialist myself, but when I ask what the “ultimate interface” would be, the “brain interface” is always mentioned. The brain research in IoB began with the idea of working together with brain researchers to consider whether such an ultimate interface is possible and what would make us happy if it were possible.

Converting various data into phoneme data using AI

The basic process of lip reading (reading mouth movements) involves shooting a video with an ordinary video camera, applying facial recognition, cutting out the area of the lips, and feeding the image into a so-called AI (deep learning) neural network. Once the data is converted into phoneme data (the smallest unit of speech), it can be used for speech recognition and speech synthesis. In addition to video, mouth movements can be measured by ultrasonic and acceleration sensors and converted into phoneme data by almost the same process.
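The steps described above can be sketched roughly in Python. This is only an illustration under stated assumptions, not the laboratory's implementation: all function names are hypothetical, the `classify` argument is a stand-in for the trained neural network, the lip bounding box is assumed to come from a separate face-detection step, and the final step uses a common CTC-style greedy collapse (merge repeated labels, drop blanks) that the interview itself does not mention.

```python
from itertools import groupby

BLANK = "_"  # hypothetical label meaning "no phoneme in this frame"


def crop_lip_region(frame, box):
    """Cut the lip area out of one frame.

    frame is a list of pixel rows; box = (top, left, height, width) is
    assumed to be supplied by a face-detection step not shown here.
    """
    top, left, height, width = box
    return [row[left:left + width] for row in frame[top:top + height]]


def collapse_frame_labels(frame_labels):
    """Turn per-frame labels into a phoneme sequence.

    Consecutive repeats are merged first, then blank labels are dropped
    (an assumed CTC-style greedy decoding step).
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != BLANK]


def decode_utterance(frames, box, classify):
    """Run the whole sketch: crop each frame, classify it, collapse labels.

    classify is a placeholder for the neural network: it takes a cropped
    lip image and returns one phoneme label (or BLANK).
    """
    labels = [classify(crop_lip_region(frame, box)) for frame in frames]
    return collapse_frame_labels(labels)
```

Because the classifier is passed in as a plain function, the same collapse step could in principle be reused when the per-frame input comes from an ultrasonic or acceleration sensor instead of video.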

On the other hand, it is difficult to collect the data needed for decoding brain waves. We can only get data from specific subjects, such as those wearing BMIs (brain-machine interfaces), which is different from fields such as video, where big data is available. So, we are thinking of researching models that can learn from small amounts of data.

Feedback and the sense of speaking

I think that whether there truly is an intention or a “content to be conveyed” before speaking is an issue to be addressed scientifically. Lip-reading experiments have shown that speaking in a very small voice, like whispering, is easier than producing no voice at all. When you produce no voice at all, there is no feedback, so you do not know how you are moving your mouth. The presence of such feedback is the key point. This suggests that it is debatable whether we already have a complete idea of what we want to convey before we speak.

Speech is complete when the motor cortex gives the command and the mouth and the throat move, but it is also important to listen to it again by yourself. I believe that the sense of speaking is generated only when one realizes, through listening, that they are indeed saying exactly what they want to say.

Perhaps some kind of feedback is necessary in the same way when decoding brain waves. I think it is important for so-called “heart-to-heart” computing to have a series of feedback loops, in which the decoded information is returned to the human being, rather than simply completing the decoding of brain waves.

I want to see a society where AI and humans are integrated

We assume that silent speech will be very widespread by 2050 and that its connection to AI will be very significant. For example, if you don’t know something, you can mumble it in your head and have AI answer you, and it will not be much different from you having known it all along. In other words, we believe that human capabilities will be extended (human augmentation) through the integration of AI and humans.

As for decoding brain waves, I feel that it will take a little more time. More comprehensive research and development is needed, including decoding techniques and solutions to the ethical issues of implanting electrodes in the brain. Whether such devices will be in use by 2050 depends on the societal situation, but as for fundamental research, I think we will be able to find out what people think through decoding techniques.

I want to see how society will change through the fusion of AI and humans. I also want to create and contribute to it. On the other hand, I am aware that however powerful a technology is, there is always the danger of it being used in a negative way. It is up to human wisdom to use it in the right way. My stance is to try, despite my fears, and correct any problems encountered.

Use technology to the fullest and challenge the limits of your imagination!

I believe that the advancement of technology, including AI, will drastically change society and humankind, and young people will be the ones who witness and create such changes. I strongly encourage you to keep pursuing what you find interesting. Technology should then stimulate your imagination. I would like you to use all the tools and technologies available to you, create what does not yet exist, and challenge your imagination to the limit. I think it would be a lot of fun if such a society became a reality.

Interview, Writing, and Video Editing by Space-Time Inc.