Using Inaudible Voice Commands to Control Siri and Alexa

Researchers have developed a method for sending ultrasonic voice commands, inaudible to humans, to voice-enabled assistants such as Alexa, Siri, and Google Assistant. The technique could be used to force the assistants to visit attacker-controlled websites or take control of other connected smart devices.

The technique is known as DolphinAttack and was developed by academic researchers at Zhejiang University in China. It takes advantage of the way that the microphones in devices such as the Amazon Echo or the iPhone work, and the researchers were able to execute it using commodity hardware that’s easily available to anyone. The attack setup included an Android phone, an external battery, an amplifier, and an ultrasonic transducer.

“DolphinAttack utilizes inaudible voice injection to control VCSs silently. Since attackers have little control of the VCSs, the key of a successful attack is to generate inaudible voice commands at the attacking transmitter. In particular, DolphinAttack has to generate the baseband signals of voice commands for both activation and recognition phases of the VCSs, modulate the baseband signals such that they can be demodulated at the VCSs efficiently, and design a portable transmitter that can launch DolphinAttack anywhere,” the research paper says.
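
The modulation step the paper describes can be illustrated with a small, self-contained sketch. This is not the researchers' code: it assumes plain amplitude modulation, uses a 1 kHz test tone as a stand-in for a voice command, and models the microphone as a toy quadratic nonlinearity. The sample rate, the 25 kHz carrier, the coefficients, and all function names are assumptions chosen for illustration.

```python
import math

SAMPLE_RATE = 192_000  # Hz; assumed, high enough to represent the carrier
CARRIER_HZ = 25_000    # Hz; assumed ultrasonic carrier, above ~20 kHz hearing limit
TONE_HZ = 1_000        # Hz; stand-in for the baseband of a voice command
DURATION_S = 0.02      # 20 ms, so every frequency here falls on an exact DFT bin


def tone_amplitude(samples, freq_hz, sample_rate):
    """Estimate the amplitude of one frequency component (single-bin DFT)."""
    n = len(samples)
    step = 2 * math.pi * freq_hz / sample_rate
    re = sum(s * math.cos(step * i) for i, s in enumerate(samples))
    im = sum(s * math.sin(step * i) for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n


def amplitude_modulate(baseband, carrier_hz, sample_rate, depth=1.0):
    """Shift the baseband up to the carrier: (1 + depth*m[i]) * cos(2*pi*fc*t)."""
    step = 2 * math.pi * carrier_hz / sample_rate
    return [(1 + depth * m) * math.cos(step * i) for i, m in enumerate(baseband)]


def nonlinear_microphone(signal, a1=1.0, a2=0.5):
    """Toy microphone model (assumption): out = a1*x + a2*x^2."""
    return [a1 * x + a2 * x * x for x in signal]


n = int(SAMPLE_RATE * DURATION_S)
baseband = [math.sin(2 * math.pi * TONE_HZ * i / SAMPLE_RATE) for i in range(n)]

transmitted = amplitude_modulate(baseband, CARRIER_HZ, SAMPLE_RATE)
received = nonlinear_microphone(transmitted)

# In the air, the signal has energy only around the carrier (24, 25, 26 kHz).
audible_before = tone_amplitude(transmitted, TONE_HZ, SAMPLE_RATE)
# After the quadratic term, the original 1 kHz tone reappears in the audible band.
audible_after = tone_amplitude(received, TONE_HZ, SAMPLE_RATE)

print(f"1 kHz amplitude before microphone: {audible_before:.3f}")
print(f"1 kHz amplitude after microphone:  {audible_after:.3f}")
```

In this toy model the transmitted signal contains nothing at 1 kHz, while the squared term in the microphone model recreates the tone at an amplitude of `a2 * depth`. Real microphones, amplifiers, and low-pass filters behave in more complicated ways, so the numbers here show only the principle.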

There are a number of caveats to this technique, the most important being the need to wake up or activate the voice-controlled device. The Amazon Echo, for example, activates when a user says “Alexa” or whatever other wake word they have chosen. Siri has a similar feature that causes it to activate when an authorized user says “Hey Siri”. To execute their attack, the researchers had to find a way to activate a target device, so they developed a couple of methods, including assembling a set of voices, generated with text-to-speech (TTS) systems, that can serve as a kind of brute-force tool.

“In DolphinAttack, we prepare a set of activation commands with various tone and timbre with the help of existing TTS systems, which include Selvy Speech, Baidu, Google, etc. In total, we obtain 90 types of TTS voices. We choose the Google TTS voice to train Siri and the rest for attacking,” the paper says.

For the technique to work, the attacker needs to be close to the target device, within a few feet. The researchers said that the attack can be mitigated either by modifying the microphone in a voice-controlled device to suppress inaudible voice commands or by using a machine-learning approach in software to detect a recorded voice command.
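
As a rough illustration of the first idea, the sketch below flags input whose energy sits mostly above the audible range. It is neither the researchers' defense nor the machine-learning detector they mention. It assumes the capture path samples fast enough to observe ultrasound at all, which ordinary devices may not, and the 20 kHz boundary, the threshold, and the function names are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 192_000   # Hz; assumed capture rate, high enough to see ultrasound
ULTRASONIC_HZ = 20_000  # Hz; assumed upper limit of human hearing
ENERGY_THRESHOLD = 0.5  # hypothetical cutoff; a real system would need tuning


def ultrasonic_energy_fraction(samples, sample_rate):
    """Return the share of non-DC spectral energy at or above ULTRASONIC_HZ."""
    n = len(samples)
    low = high = 0.0
    for k in range(1, n // 2):  # skip DC, stop below Nyquist
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples))
        energy = abs(coeff) ** 2
        if k * sample_rate / n >= ULTRASONIC_HZ:
            high += energy
        else:
            low += energy
    total = low + high
    return high / total if total else 0.0


def looks_ultrasonic(samples, sample_rate=SAMPLE_RATE):
    """Flag a capture when most of its energy lies above the audible range."""
    return ultrasonic_energy_fraction(samples, sample_rate) > ENERGY_THRESHOLD


n = 960  # 5 ms of audio; kept short because this naive DFT is O(n^2)
t = [i / SAMPLE_RATE for i in range(n)]

# Ordinary audible input: a 1 kHz tone standing in for normal speech.
audible = [math.sin(2 * math.pi * 1_000 * x) for x in t]
# Suspicious input: the same tone amplitude-modulated onto a 25 kHz carrier.
modulated = [(1 + a) * math.cos(2 * math.pi * 25_000 * x)
             for a, x in zip(audible, t)]

print(looks_ultrasonic(audible))    # False
print(looks_ultrasonic(modulated))  # True
```

A check like this only makes sense before any filtering or demodulation has removed the carrier, which is why the researchers frame this mitigation as a change to the microphone hardware itself.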

Ultrasonic signals have been used in a number of other security- and privacy-related applications recently. Most notably, some mobile app developers have taken to embedding code in their apps that emits ultrasonic signals that can be picked up by other smart devices and used to track users across devices.