We are moving into an era defined by a conversational economy, an ecosystem in which voice, not touch, is the main technology interface for customers. Connected through the IoT, voice recognition allows consumers to control various aspects of their daily lives through voice-activated digital assistants like Google Home and Amazon Alexa. As voice takes over in the form of digital assistants leveraged by the IoT, enterprises are expected to support this growing popularity among consumers.
Derived from a long history of speech research, voice biometrics is a method used to authenticate a speaker by the characteristics of the speaker’s voice. Speaker identification (SI) deals with the problem of identifying a speaker from a given set based on an audio sample. Audio from the unknown speaker is compared against a group of selected speakers. If a match is found, the speaker’s identity is returned. This becomes very difficult very quickly once the number of records rises above several hundred. In contrast, speaker verification (SV) has the simpler task of verifying the claimed identity of a speaker from his or her voice. Thus, voice authentication is really focused on the SV problem.
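To make the difference between the two problems concrete, here is a minimal sketch in Python. It assumes each voice has already been reduced to a fixed-length feature vector (a "voiceprint") and that two vectors are compared by cosine similarity; the vectors, the threshold of 0.8, and the function names are all hypothetical and chosen only for illustration, not taken from any particular product.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two equal-length vectors, in the range [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(sample, enrolled, threshold):
    """Speaker identification (1:N): score the sample against every
    enrolled voiceprint and return the best match, or None if even
    the best score falls below the threshold."""
    best_id, best_score = None, -1.0
    for speaker_id, voiceprint in enrolled.items():
        score = cosine_similarity(sample, voiceprint)
        if score > best_score:
            best_id, best_score = speaker_id, score
    return best_id if best_score >= threshold else None

def verify(sample, claimed_id, enrolled, threshold):
    """Speaker verification (1:1): score the sample against the single
    voiceprint of the claimed identity."""
    voiceprint = enrolled.get(claimed_id)
    if voiceprint is None:
        return False
    return cosine_similarity(sample, voiceprint) >= threshold

# Hypothetical toy voiceprints; real ones have far more dimensions.
enrolled = {
    "alice": [0.9, 0.1, 0.2],
    "bob": [0.1, 0.8, 0.3],
}
sample = [0.85, 0.15, 0.25]

print(identify(sample, enrolled, threshold=0.8))
print(verify(sample, "alice", enrolled, threshold=0.8))
print(verify(sample, "bob", enrolled, threshold=0.8))
```

Identification performs one comparison per enrolled speaker, so both the work and the chance of a wrong match grow with the size of the set. Verification always performs a single comparison, which is one way to see why it is the simpler task.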
When it comes to voice security, fraudsters may use a variety of AI-derived technologies to create synthetic speech, or human-like speech generation, which can then be used to skip steps of authentication. For example, Lyrebird allows you to create realistic artificial voices from as little as one minute of audio, with the potential to create higher-quality synthetic voices with more data. Adobe Voco, dubbed the “photoshop for voice,” is another tool that can be used to create synthetic speech. Given approximately 20 minutes of audio, Voco can generate sound-alike speech, with the potential to be used by a fraudster.
Additionally, at Google I/O 2018, Google unveiled Duplex, an AI system for the Google Assistant that can make real-world calls and converse with a human being on behalf of a consumer to make restaurant reservations and haircut appointments. The Duplex demo during Google’s keynote showed incredible improvements in speech recognition (speech-to-text technology), domain-specific natural language processing, and of course the text-to-speech capabilities whose synthetic speech generation wowed the audience.
Excitement about these technologies, from Lyrebird to Google Duplex, has sparked an ethical debate and a race between AI and AI. What’s concerning is how malicious actors can now use human-like speech generation for fraudulent purposes: either to mask their own identity or, worse, to run “hack your throat” breaches, simulating another person by synthesizing a victim’s voice, vocal tract, nasal cavities, and more from hours of openly available recorded audio.
Voice security can be defined as protection against, and detection of, voice spoofing attacks, including record-and-replay attacks and synthetic speech, whereas voice authentication is the ability to recognize a speaker and can be used as a security function. The accuracy of voice biometrics can be broken down into false acceptance rates, false rejection rates, and equal error rates.
False Acceptance Rate (FAR) is the probability that the system incorrectly authenticates a non-authorized person by incorrectly matching a caller’s voice against a voiceprint for an account. The FAR is the percentage of invalid callers (e.g., fraudsters) who are incorrectly authenticated. Conversely, False Rejection Rate (FRR) is the probability that the system incorrectly denies access to an authorized caller by failing to match the biometric input against a voiceprint for an account. The FRR is the percentage of valid callers who are incorrectly rejected.
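As a rough illustration of these two definitions, the sketch below computes FAR and FRR from two lists of match scores: one from valid callers and one from invalid callers. The scores, the 0.7 threshold, and the convention that a caller is accepted when the score is at or above the threshold are assumptions made for the example.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) as fractions for a given decision threshold.

    A caller is accepted when the match score is >= threshold.
    FAR: share of invalid callers who were accepted.
    FRR: share of valid callers who were rejected.
    """
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    far = false_accepts / len(impostor_scores)
    frr = false_rejects / len(genuine_scores)
    return far, frr

# Hypothetical match scores, for illustration only.
genuine = [0.91, 0.88, 0.79, 0.95, 0.62]   # valid callers
impostor = [0.30, 0.45, 0.71, 0.20, 0.55]  # invalid callers (e.g., fraudsters)

far, frr = far_frr(genuine, impostor, threshold=0.7)
print(f"FAR = {far:.0%}, FRR = {frr:.0%}")
```

Raising the threshold makes false acceptances rarer and false rejections more common; lowering it does the opposite. The two rates therefore always describe one specific threshold setting.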
Lastly, Equal Error Rate (EER) is the error rate at the operating point where the proportion of false acceptances equals the proportion of false rejections. The lower the EER, the higher the accuracy of the biometric system. While the EER is the ultimate accuracy score for a biometric system, it’s misleading to rely on only the EER when evaluating a voice biometric solution. For example, running a high-accuracy biometric system (~2% EER) at its EER operating point literally means that about 2 out of every 100 fraudulent callers will be falsely accepted and about 2 out of every 100 legitimate callers will be falsely rejected. No institution in their right mind would deploy a biometric system that provided those stats. Most banks typically set FAR in the 0.1% range (1 in 1000), accepting a higher FRR in exchange.
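The sketch below shows, under simplifying assumptions, how an EER can be approximated by sweeping the decision threshold, and what happens to the FRR when the threshold is instead chosen to meet a stricter FAR target. The score lists are hypothetical and far too small for a real evaluation (a 0.1% FAR target would need many thousands of impostor trials), so the example uses a 5% target purely to show the mechanics.

```python
def rates(genuine, impostor, threshold):
    """Return (FAR, FRR) when scores >= threshold are accepted."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr

def approximate_eer(genuine, impostor):
    """Try every observed score as a threshold and return the
    (threshold, FAR, FRR) where FAR and FRR are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, far, frr)
    _, t, far, frr = best
    return t, far, frr

def threshold_for_target_far(genuine, impostor, target_far):
    """Return (threshold, FAR, FRR) for the lowest observed score whose
    FAR is at or below target_far, or None if no candidate qualifies."""
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = rates(genuine, impostor, t)
        if far <= target_far:
            return t, far, frr
    return None

# Hypothetical match scores, for illustration only.
genuine = [0.62, 0.70, 0.79, 0.88, 0.91, 0.95, 0.97, 0.83, 0.76, 0.90]
impostor = [0.20, 0.30, 0.45, 0.55, 0.71, 0.35, 0.40, 0.25, 0.50, 0.65]

print("EER point:        ", approximate_eer(genuine, impostor))
print("FAR <= 5% setting:", threshold_for_target_far(genuine, impostor, 0.05))
```

On this toy data the stricter FAR setting requires a higher threshold than the EER point and rejects more legitimate callers. That is the trade-off a single EER figure hides.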
With phone fraud rates rising, voice authentication and security have become an important focus for many enterprises working to ensure they have the technology to distinguish a legitimate customer from a fraudster. Breaking away from traditional authentication solutions, voice verification builds on recent technology and the movement towards a conversational economy, and it is the future of authentication and of keeping your customers safe.
Over the years, the need for and methods of authenticating customers have changed dramatically. Beginning with face-to-face interactions as a system for authentication, solutions expanded to include PINs, passwords, and knowledge-based authentication questions, or KBAs. Even though these legacy solutions have kept authentication fairly easy for consumers, they add a layer of friction and leave enterprises open to fraud attacks, ultimately affecting the enterprises’ reputation with customers.
With help from social media profiles, social engineering, and the dark web, fraudsters are able to obtain detailed customer information, allowing them to manipulate their way past KBAs and into customer accounts via the phone channel. With a 113% rise in phone fraud year over year, enterprises are turning to voice biometrics to provide their customers with higher-quality protection against fraudsters.
Beyond the phone channel, voice is also central to the future of authentication. In January 2018, Apple announced the release date of its HomePod, and in that same announcement said that Siri is actively used on over half a billion devices. Gartner conducted consumer research between early November and late December 2017 and found that more than one-third (36%) of all qualifying survey respondents (3,000 surveyed) reported having used virtual personal assistant apps on their smartphones within the last three months, with 41% using such apps several times a week. The growth in interest in voice assistants can also be seen in the explosive success of Amazon Alexa. Amazon saw 266% growth in the number of Alexa skills developed in 2017. Between the close of 2017 and now, the growth has been substantial: according to Amazon’s last quarterly earnings report in April 2018, there are now over 40,000 skills in its ecosystem.
It took 13 years for televisions to reach 50 million viewers, 4 years for internet access to reach 50 million people, and 2 years for smart speakers to do the same, paving the way to the future of voice and authentication.
Consumers became accustomed to voice technology starting with voice-enabled smartphones and have recently adopted widely popular voice-activated speakers and assistants like Amazon Alexa and Google Home. These assistants are simple speakers combined with a voice recognition system, allowing consumers to interact with them in their daily lives to change volume preferences in their cars, order movies on demand, or even schedule appointments.
Aside from the widely known voice-activated speakers, voice has become integrated with automobiles. This integration was driven by the need to keep the public safe and to make multitasking while driving safer. For example, drivers are able to adjust the settings of their home security systems, communicate with their navigation system, and tune their radio without taking their hands off the wheel.
With each advancement in voice-to-machine communication, the interaction becomes more human, expanding the types of opportunities for voice as an interface. Consequently, a new battlefield is emerging, one that expands the potential for fraudster attacks and security breaches.
When it comes to security, the question isn’t as much about storing sensitive information or securing internet connections as it is about which types of commands your speakers will accept. Some vendors don’t allow commands like “unlock the door” or “disable the alarm,” offering only the option to lock; in this way, the lack of security limits utility.