In a research paper [1] co-authored with David Looney, to be presented at the International Conference on Acoustics, Speech and Signal Processing (ICASSP) in Seoul, South Korea, on April 18, 2024, we answer the following question:

When does the acoustic environment provide helpful information for presentation attack detection (PAD)?

In this blog, we begin by exploring the concept of a presentation attack, its detection, and the factors that make it discernible. Armed with these fundamental insights, we proceed to summarize the key findings of our research. 

What is a presentation attack?

A presentation attack consists of a voice being captured by a recording device r1 in room A, then played back and captured by a second recording device r2 in room B, as shown in Fig. 1. We only distinguish between rooms A and B to facilitate our further analysis. In practice, A and B could be the same room. It is further assumed that device r2 is used for automatic speaker verification (ASV), and that the objective of the replay is to make the ASV system behind r2 believe that it is the live speaker talking.

 

 

Figure 1. Presentation attack: a live talker in room A is recorded by microphone r1 and replayed in room B, where the replay is captured by microphone r2.

 

Why can you detect a presentation attack?

There are four key factors that make it possible for live speech to be separated from replayed speech:

  1. Characteristics (imperfections) of the recording device's microphone
  2. Recorded speech format – sampling rate, compression, transmission artifacts
  3. Loudspeaker characteristics (imperfections) of the replay device
  4. Room acoustics of the recording room and the replay room

Does room acoustics help detection?

Now that we know what a presentation attack is and why it is possible to detect it, we look closer at the room acoustics component. Whenever speech is recorded at a distance from a microphone, it is affected by the room’s acoustics, which typically consist of noise and reverberation (due to reflections off walls and objects present in the room). In this part of the research, we focus on reverberation. 

Interestingly, the problem of replay and reverberation has been studied quite extensively in the seemingly unrelated fields of music and speech reproduction. Indeed, a recent paper [2] on this topic was published in the Journal of the Acoustical Society of America and inspired our work.

One way to compare a presentation attack with live speech is that live speech undergoes the reverberation of one room. In contrast, the replayed speech passes through two reverberant settings (the recording room and the replay room). We exploit the fact that the reverberation of a room and a particular speaker-microphone configuration may be represented by its acoustic impulse response (AIR), which shows the evolution of the reflected sound over time. In this way, we could study the separability of live versus replayed speech using properties of acoustic impulse responses that are completely independent of the speech signals.

From that point of view, we derived both temporal and spectral properties of AIRs theoretically from various established results in room acoustics. We showed that the spectral metric known as the spectral standard deviation (SSD) of the AIR provides the best separation of live and replayed speech, but only when there are ‘sufficient’ distances between the audio source and the microphone at the time of recording and playback. What is ‘sufficient’ depends on the size of the rooms and on the reverberation of these rooms; all of these parameters are captured in what is known in room acoustics as the critical distance.
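To make the one-room versus two-room picture concrete, here is a minimal sketch that is not taken from the paper. It assumes a toy AIR model (exponentially decaying Gaussian noise with no direct path, which only loosely mimics a source far from the microphone). The function names `synthetic_air` and `spectral_std_db`, the sampling rate, and the reverberation times are illustrative assumptions. The SSD is computed here as the standard deviation of the log-magnitude spectrum in dB, which may differ in detail from the definition used in the paper.

```python
import numpy as np

def synthetic_air(rt60_s, fs=16000, seed=0):
    """Toy AIR (assumption): exponentially decaying Gaussian noise."""
    rng = np.random.default_rng(seed)
    n = int(rt60_s * fs)
    t = np.arange(n) / fs
    # Amplitude falls by 60 dB (a factor of 1000) after rt60_s seconds.
    envelope = np.exp(-np.log(1000.0) * t / rt60_s)
    return rng.standard_normal(n) * envelope

def spectral_std_db(air):
    """Standard deviation of the log-magnitude spectrum, in dB."""
    magnitude = np.abs(np.fft.rfft(air))
    level_db = 20.0 * np.log10(magnitude + 1e-12)
    return float(np.std(level_db))

air_a = synthetic_air(0.5, seed=1)  # hypothetical recording room A
air_b = synthetic_air(0.4, seed=2)  # hypothetical replay room B

live_air = air_a                          # live speech: one room
replay_air = np.convolve(air_a, air_b)    # replay: room A followed by room B

ssd_live = spectral_std_db(live_air)
ssd_replay = spectral_std_db(replay_air)
print(f"SSD live:   {ssd_live:.2f} dB")
print(f"SSD replay: {ssd_replay:.2f} dB")
```

Under these assumptions the replayed response should show a larger SSD than the live one, because cascading two rooms multiplies their frequency responses and therefore adds their log-magnitude fluctuations.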

These new theoretical insights deepen our understanding of room acoustics in PAD and pave the way for practical applications. For instance, we’ve developed a novel zero-shot convolutional neural network-based PAD approach that outperforms several baseline methods from the ASVspoof challenges. This approach is particularly powerful because it allows machine learning models to be trained without actual examples of replay attacks, and it can be combined with other methods in PAD scenarios.

Conclusion

In conclusion, the role of room acoustics in PAD is not simple. It depends on various factors, such as the distances between the live talker and the microphone, the loudspeaker and the microphone in the replay, and the volumes and reverberation of the rooms. However, if all recording and playback occur above the so-called critical distance, reverberation alone can be a powerful tool for detecting a replay with high accuracy.  
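As a rough illustration of the critical distance, the sketch below uses the common textbook approximation d_c ≈ 0.057·sqrt(Q·V / RT60), with room volume V in cubic meters, reverberation time RT60 in seconds, and source directivity factor Q (1 for an omnidirectional source). The paper may define or estimate the critical distance differently, and the room dimensions and function names here are hypothetical.

```python
import math

def critical_distance_m(volume_m3, rt60_s, directivity=1.0):
    """Textbook approximation of the critical distance in meters."""
    return 0.057 * math.sqrt(directivity * volume_m3 / rt60_s)

def is_beyond_critical_distance(distance_m, volume_m3, rt60_s, directivity=1.0):
    """True when the reverberant field dominates the direct sound."""
    return distance_m > critical_distance_m(volume_m3, rt60_s, directivity)

# Hypothetical office: 5 m x 4 m x 3 m with RT60 = 0.5 s.
room_volume = 5.0 * 4.0 * 3.0
d_c = critical_distance_m(room_volume, rt60_s=0.5)
print(f"critical distance: {d_c:.2f} m")
print(is_beyond_critical_distance(1.5, room_volume, 0.5))
```

Larger or less reverberant rooms give a larger critical distance, so the same talker-to-microphone distance can be ‘sufficient’ in one room and not in another.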

Most importantly, the combination of data-driven machine learning development with domain-specific theoretical insights allows us to develop deepfake and replay attack detection methods that push the boundaries of the state-of-the-art.

Learn more about our liveness detection technology here.


1. N. D. Gaubitch and D. Looney, “On the role of room acoustics in audio presentation attack detection,” in Proc. ICASSP, pp. 906–910, Apr. 2024.
2. A. Haeussler and S. van de Par, “Crispness, speech intelligibility, and coloration of reverberant recordings played back in another reverberant room (Room-in-Room),” J. Acoust. Soc. Am., vol. 145, no. 2, pp. 931–942, Feb. 2019.

Biometric authentication was once considered the most robust security measure because it relies on hard-to-impersonate biological traits, such as fingerprints, to verify identity. Today, however, biometric authentication is not entirely secure: fraudsters can bypass it with 3D-printed masks, fake fingerprints, and eye replicas.

Biometric liveness detection helps seal the vulnerability of traditional biometric authentication. This security algorithm uses biometric data coupled with physiological responses like blinking to catch fraudsters.

What is Biometric Liveness Detection?

Biometric liveness detection verifies whether a biometric sample is from a live human. This security system has one principal purpose — it prevents the use of fake biometrics to impersonate or commit fraud.

Biometric liveness detection uses several cues to establish liveness; these include physiological responses like blinking and smiling. Moreover, this security system can use other cues like voice and skin texture to catch fraudsters. 

Like any security system, biometric liveness detection can be used across various sectors. Airports, financial institutions, and insurance providers can use it to enhance security. Moreover, border control and e-commerce facilities can use biometric liveness.

How Liveness Detection Works and Prevents Fraud

Biometric liveness detection uses a combination of technologies to unequivocally verify that the biometric data being presented originates from a live human being. Some of these technologies include:

1. Motion Analysis

Motion analysis is one of the technologies used in biometric liveness detection. This security algorithm analyzes various natural movements to establish liveness. For instance, in facial recognition, biometric liveness technology checks for various facial movements. 

These include blinking, smiling, or nodding the head. The advanced security measure can also use eye trackers to observe gaze direction. Images or videos used in impersonation cannot replicate these facial movements.

2. 3D Depth Sensing

3D depth sensing is another technology used in biometric liveness detection. 3D depth sensing uses technologies like time-of-flight cameras and laser scanners to determine whether a face is alive.

In particular, the 3D depth sensing technology uses facial shape to determine liveness. Moreover, 3D depth sensing can use the distance between the eyes and nose or lip curvature to establish if an individual is alive. 

3. Texture Analysis

Texture analysis is another method used to verify the liveness of individuals. In this technique, a biometric system scrutinizes the unique patterns of an iris, face scan, or fingerprint. The patterns include fingerprint ridges and valleys and iris crypts.

The characteristics mentioned above are inherent to a living person and absent in impersonation replicas. The biometric liveness detection system compares the results with expected properties to establish liveness.

4. Challenge-Response Tests

Some biometric liveness detection systems incorporate challenge responses to check for liveness. In this approach, the system prompts the user to perform specific requests that require real-time human reactions.

For instance, during facial verification, the biometric security systems might ask the subject to blink. Moreover, these advanced security algorithms can request an individual to nod or smile. Non-human entities cannot outmaneuver random requests.

Besides the actions, a biometric liveness detection system can request an answer to a question. The biometric detection systems use voice authentication to ascertain if the voice is from a live person.

5. Machine Learning

Machine learning is another technology that plays a pivotal role in biometric liveness. Security experts train ML models to spot signs of liveness in biometric samples. Some cues machine learning uses to authenticate a live sample include:

  • Blinking
  • Eyebrow movements
  • Pulse rate
  • Skin elasticity
  • Skin temperature
  • Voice

For instance, in facial detection, ML algorithms use color, texture, or blinking to determine liveness. Likewise, in the case of fingerprints, ML systems can analyze things like ridge quality and sweat pores to assess if a sample is live.

Machine learning algorithms can combine various authentication modalities to thwart spoofing. Advanced models can use voice, fingerprint, iris, and facial detection to cut the chances of impersonations. 

Types of Liveness Detection

Typically, biometric systems use two types of liveness detection — passive and active. Each type of liveness detection uses a different approach to catch fraudsters. The following is an overview of how each liveness detection method operates:

Passive Liveness Detection

Passive liveness detection determines liveness without prompting any action from the subject. These biometric systems use AI to look for common signs of biometric spoofing, including photos, videos, or masks.

Besides, passive liveness detection systems can check for signs of liveness to authenticate biometrics. For instance, the biometric system can use skin texture to verify if a sample is live. In addition to texture, passive liveness detection checks for liveness using the following aspects:

  • Color. Passive liveness detection compares the subject’s color to a reference to spot inconsistencies.
  • Depth. Besides color, passive liveness detection can assess the contours of the eyes, mouth, and nose to establish liveness.
  • Motion. Passive liveness detection systems can monitor natural facial motion patterns. These motion patterns occur when breathing, blinking, or talking.

Active Liveness Detection

Unlike passive detection, active liveness detection prompts users to perform specific actions during identity verification. In particular, the system issues random instructions to make it harder for fraudsters to bypass.

Some of the most common requests used in active liveness detection include:

  • Blinking. An active liveness detection system instructs users to blink when prompted. Afterward, the biometric system monitors for real-time eye movement to confirm liveness.
  • Facial gestures. Besides blinking, an active biometric system can prompt users to smile or nod during verification. Again, the biometric system monitors real-time facial expressions to verify liveness.
  • Voice commands. Sometimes, the active liveness biometric systems can ask the user to say specific phrases. Afterward, the system analyzes the voice inflections to ascertain if the voice is natural.
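The active flow described above (a random instruction, then a real-time check) can be sketched as follows. This is a minimal illustration, not any vendor's implementation: the challenge list, the 10-second lifetime, and the function names are assumptions, and `observed_action` stands in for the output of a hypothetical gesture or speech detector.

```python
import secrets
import time

CHALLENGES = ("blink", "smile", "nod")  # actions named in the text
CHALLENGE_TTL_S = 10.0                  # hypothetical time limit

def issue_challenge(now=None):
    """Pick an unpredictable action and bind it to a one-time nonce."""
    now = time.monotonic() if now is None else now
    return {
        "action": secrets.choice(CHALLENGES),
        "nonce": secrets.token_hex(8),
        "expires_at": now + CHALLENGE_TTL_S,
    }

def verify_response(challenge, observed_action, nonce, now=None):
    """Accept only a timely response that matches the issued challenge."""
    now = time.monotonic() if now is None else now
    if now > challenge["expires_at"]:
        return False  # too slow: possibly a prepared recording
    if not secrets.compare_digest(nonce, challenge["nonce"]):
        return False  # response belongs to a different session
    return observed_action == challenge["action"]

challenge = issue_challenge()
print("Please", challenge["action"])
```

Drawing the action from a cryptographically secure source and expiring it quickly is what makes a pre-recorded photo or video unlikely to match the request.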

The Benefits of Liveness Detection for Contact Centers

Biometric liveness detection can be used across multiple industries. However, this security advancement has proven especially valuable in contact center security. It ensures that only live and authorized individuals have access to sensitive customer information.

Apart from preventing unauthorized access, biometric liveness detection can offer the following benefits:

1. Reduced Risk of Data Breaches

Data breaches are rampant in contact centers using less sophisticated security measures. The breaches occur when fraudsters impersonate legitimate customers. In that event, agents disclose sensitive information unknowingly.

Biometric liveness detection can help eliminate these data breaches. The security algorithms use advanced authentication modalities like voice commands to stop impersonation. With this security system, contact centers can keep violations to a minimum.

2. Secured Self-Service

Self-service is a growing trend in customer support. This model allows customers to resolve issues independently, leading to shorter wait times and improved satisfaction. Furthermore, self-service frees up the hands of support staff.

Biometric liveness helps make self-service more secure. The algorithms verify the liveness of users, ensuring that only authorized people get access to a customer account. This advanced security protects customers from fraudulent activities.

3. Lower Operational Costs

Investing in biometric liveness detection can help reduce a contact center’s running costs in many ways. For one, biometric liveness, especially voice authentication, reduces the need for manual identity validation. The automatic verification minimizes the need for live agents to verify identity.

Furthermore, biometric liveness reduces operational costs by blocking fraudulent activities. With the reduced exposure to fraud, entities won’t spend on reputation repair, compensation, and legal expenses.

4. Improved Customer Trust

Biometric liveness doesn’t just keep fraudsters out of a contact center. This security measure can also help foster customer trust, because it assures customers that a service provider treats their data with the utmost care.

As a result, the customers will trust the organization with their sensitive information. Moreover, biometric liveness reduces data breaches, an issue that could erode trust. Beyond trust, biometric liveness enhances loyalty, reduces churn, and boosts reputation.

5. Improved Compliance

Biometric liveness detection is a valuable tool in compliance. It enables contact centers to adhere to strict industry regulations set by various authorities like the Data Protection Act.

This advanced security measure helps verify the identity of clients before disclosing sensitive information. As a result, it protects support agents from revealing private information to impersonators.

The enhanced compliance doesn’t just save organizations from costly fines and legal expenses. It also helps the entities maintain a positive public image, which is crucial in the competitive business sphere.

6. Expedited Customer Verification

Biometric liveness not only provides a higher level of security but also expedites the verification process. The algorithms used in this security technology can verify liveness in just a few seconds.

The expedited customer verification comes along with many benefits. It eliminates cumbersome and time-consuming knowledge-based questions, helping save time. Support agents can use the saved time on other profit-making business processes.

Use Pindrop to Improve Contact Center Security

Enhancing call center security doesn’t end with learning about new security measures. You also need the support of a service provider with a deep understanding of the intricate security requirements in this domain.

Our company, Pindrop, is one such partner in matters of contact center security. We use voice to fortify the authentication process so you can keep fraudulent access at bay.

Request a demo to learn how we can help improve your contact center security. 

With the rise of deepfakes and sophisticated identity spoofing, it’s becoming increasingly difficult for companies to ensure that the person on the other end is who they claim to be. Identity verification has always been of paramount importance in sensitive industries like finance and security, but as fraud becomes more convoluted, it’s imperative that organizations take action quickly. That’s where biometric liveness detection comes in. 

What is Biometric Liveness Detection?

Biometric liveness detection ensures that the biometric features being presented to a system—whether it’s a fingerprint, facial scan, voiceprint, or even an iris pattern—are genuinely from the living individual and not from a fraudulent representation. 

For instance, in facial recognition systems, liveness detection could help differentiate between a real face and a high-quality photograph or a 3D mask of a face.

Biometric liveness detection is also becoming increasingly popular in contact centers. It provides an advanced layer of security to ensure the authenticity of user interactions. It is typically integrated into a multifactor authentication (MFA) process or added as an enhancement to existing security protocols.

Using attack detection vectors, voice biometrics software can distinguish between live speakers and audio recordings, or even audio deepfakes. This helps protect against identity spoofing, among other things.

How Does Biometric Liveness Detection Work?

Depending on the biometric modality, there are various liveness detection techniques that can be used. Here’s a brief overview of how biometric liveness detection works:

Voice Recognition

Voice biometric liveness detection is specifically tailored to ensure that the voice data presented during an authentication process is genuine and not a recording or synthetic voice. Recognizing and verifying a live voice, as opposed to a playback or synthesized voice, involves a combination of sophisticated techniques. These include:

Detecting background noise: Features like the Signal-to-Noise Ratio (SNR) can be utilized to differentiate between genuine and spoofed samples when analyzing consistency in background noises.

Analyzing natural variability: A genuine voice has slight variations, even when someone tries to repeat the same phrase identically. By analyzing these natural fluctuations in pitch, tone, cadence, and other voice characteristics, the system can determine the authenticity of the voice.

Spectral analysis: A voice signal can be decomposed into its constituent frequencies using a Fourier transform or a wavelet transform. Live voices and recordings might exhibit differences in their spectral characteristics, especially if the recording is played back through a secondary medium, like a speaker or a phone. Spectrograms can be used to detect anomalies in recorded voices.

Temporal features: Time-domain features like Zero-Crossing Rate (ZCR) and energy contours can provide valuable information. For instance, the ZCR in a recorded voice played back through speakers might differ from a live voice.

Cepstral analysis: Mel-Frequency Cepstral Coefficients (MFCCs) are commonly used in speech and audio processing. Differences in MFCC values between genuine and spoofed voices can be indicators of non-liveness.

Deep learning and neural networks: Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) can be trained on large datasets containing both genuine and spoofed voice samples. Once trained, these networks can predict with high accuracy whether a given voice sample is live or spoofed.
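A few of the features listed above can be computed with very little code. The sketch below is only an illustration using NumPy: the frame length, hop size, and function names are assumptions, and a synthetic 440 Hz tone stands in for a voice recording. It shows the zero-crossing rate, short-time energy, and a magnitude spectrogram; it does not classify anything as live or spoofed.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose sign differs, per frame."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def short_time_energy(frames):
    """Mean squared amplitude per frame (the energy contour)."""
    return np.mean(frames ** 2, axis=1)

def magnitude_spectrogram(frames):
    """Windowed Fourier magnitude per frame."""
    window = np.hanning(frames.shape[1])
    return np.abs(np.fft.rfft(frames * window, axis=1))

fs = 16000                                # assumed sampling rate in Hz
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440.0 * t)      # stand-in for a voice sample

frames = frame_signal(tone, frame_len=400, hop=160)  # 25 ms frames, 10 ms hop
zcr = zero_crossing_rate(frames)
energy = short_time_energy(frames)
spec = magnitude_spectrogram(frames)
print(frames.shape, spec.shape)
```

In a real system these per-frame values would be fed to a classifier trained on genuine and spoofed samples, as the deep learning item above describes.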

Facial Recognition

Biometric liveness detection is also used for facial recognition. It can analyze patterns related to natural eye blinking, depth perception, or even the natural response to external stimuli, such as light, to determine the presence of a live person. 

For companies looking to shore up their defense strategy against deepfakes, using liveness detection techniques is extremely important.

The Benefits of Biometric Liveness Detection for Call Centers

Biometric liveness detection offers many advantages for call centers, including enhanced security and user experience, and greater operational efficiency for banks and other businesses. 

Improved Security and Fraud Detection

We’ve already discussed how bad actors and cybercriminals are becoming increasingly sophisticated. In most cases, traditional security measures just don’t cut it anymore. 

Biometric liveness detection ensures that the voice or facial data presented is from a genuine, live person and not a recording or synthetic reproduction. 

By adding this layer of authentication, call centers can drastically reduce the potential for unauthorized access or fraudulent activities, ensuring the security of both the company and its clients.

Faster, More Efficient Authentication Process

Contact centers today are all about efficiency. Agents are trained to process and handle as many calls as possible, with many using specific KPIs to track productivity. 

But, if you’ve ever called your bank, you probably know just how arduous the authentication process can be. Traditional authentication methods included asking for a T-PIN or requiring customers to answer a series of verification questions. 

As you can imagine, requiring customers to remember multiple passwords or answer an array of security questions can be cumbersome and time-consuming. Biometric liveness detection offers a faster and more intuitive authentication process. 

All a person has to do is speak, and the software uses voice profiling and recognition to determine whether the person on the other end is genuine. Banking fraud detection technologies are quite sophisticated, which is why they require such extensive authentication.

Improved Customer Experience

The benefits outlined above ultimately culminate in a significantly improved customer experience. In today’s world, people value quick and hassle-free interactions. 

By integrating liveness detection, call centers can provide an authentication method that’s not only secure but also user-friendly. Eliminating the need for customers to recall passwords or PINs, or answer multiple security questions, results in a smoother and more positive customer experience.

Operational Cost Savings

While the initial investment in biometric liveness detection technology might seem substantial, the long-term cost savings can be significant. 

By reducing call times through quicker authentication and cutting down on fraud-related losses, call centers can achieve substantial operational cost reductions. Additionally, the reduction in fraud also means fewer resources spent on fraud investigation and resolution.

Fostering Customer Trust

Data breaches or different types of fraud attempts can ruin a company’s goodwill and have a significant impact on their bottom line. 

Companies that demonstrate a commitment to advanced security measures can strengthen their reputation and trustworthiness in the eyes of customers. By adopting biometric liveness detection, call centers convey that they prioritize customer security, which can bolster brand loyalty and trust.

Pindrop Helps Companies Improve Security and Efficiency

Pindrop’s Deep Voice™ Biometric Engine is the leader in voice recognition biometrics. The platform makes it easy to detect fraudsters and ensure maximum security with neural network-based voice recognition. 

It allows companies to match a caller’s voice to an established voiceprint, confirming their identity immediately. Its advanced anomaly detection system ensures that any playback tricks or voice alterations are detected, thus ensuring interactions with actual humans. Request a demo today to learn more.

Biometrics is the automated recognition of individuals based on their unique characteristics. The most common spoofing attacks occur over email, but there are many others as fraudsters become more adept at replicating a person’s identity. According to a recent study, 80% of hacking-related breaches still involve compromised and weak credentials.

So what can individuals and companies do to protect themselves better when extortion of over 33 million records is expected to occur by 2023, and ransomware or phishing attacks occur every 11 seconds? The answer could be biometric liveness detection.

What is Biometric Liveness Detection?

Biometric liveness detection pairs an individual’s biometric characteristics, which can themselves be spoofed, with additional layers of recognition that make facial and voice detection more accurate. These extra layers make it harder for spoofing to succeed.

How Biometrics Liveness Detection Helps in Identity Proofing

Liveness detection prevents biometric spoofing by using an authentication process that verifies whether the user is a live person. As Pindrop has found in many of its technologies, like deepfake detection, technology must evolve quickly to ensure that machines are much better at biometric fraud detection than humans. 

Here are five steps to understanding how biometrics liveness detection prevents spoofing.

Step 1: Learn About Liveness Detection in Biometrics Basics

Liveness detection catches a spoof attempt by determining in real time whether a sample comes from an actual human or a fake. Biometrics is the automated recognition of individuals using unique physical characteristics. Here’s how the two work together to add security, using voice biometrics as an example.

How Liveness Detection Helps in Voice Recognition Biometrics

One in 857 calls analyzed by Pindrop was identified as fraudulent. This represented a 40% increase in fraudulent activity in just 12 months and should alarm any financial or other institution looking to protect its assets. But what is voice biometrics exactly? It’s a technology that verifies the identity of the speaker. Liveness detection determines in real time whether a call is legitimate through voice authentication.

Liveness Detection and Facial Recognition Together

Liveness detection is becoming extremely efficient and powerful at detecting and preventing spoofing. Machines proved more effective than humans in tests of all five types of images, scoring 0% error rates across all 175,000 images. Computers were also roughly ten times quicker at telling a photo of a live person from a spoof.

Humans took 4.8 seconds per image to determine liveness, whereas computers took only 0.5 seconds. This provides strong evidence for organizations to trust automation to prevent fraud while keeping company efficiency high. Employees can then focus on more severe or unusual fraud attempts instead.

Step 2: Understand Biometric Liveness Detection Methods

The second step in understanding how biometric liveness works to prevent spoofing is understanding the active versus passive liveness detection categories. The fundamental difference between the two is that active liveness performs a series of ‘challenge-response’ actions. In contrast, passive liveness conducts a series of checks without any awareness from the user. 

What is Active Liveness Detection?

Active liveness detection determines whether the face or voice presented belongs to a real person by requiring the user to provide more information or by challenging them in some way. It prompts the user to perform actions that cannot easily be spoofed. Multifactor authentication, for instance, requires the user to satisfy a series of factors before access is granted.

What is Passive Liveness Detection?

Passive liveness detection runs in the background without any user input. It can use algorithms that examine an image for cues such as skin and border textures to determine whether it is a spoof. Machines can also pick up crucial indicators of a false representation quickly, in ways that human reviewers could not.

Step 3: Realize the Benefits of Liveness Detection for Contact Centers

Previous data shows that the rate of phone fraud in corporate call centers can jump up to 45 percent in just a few years. And if one in every 1,700 calls comes from a fraudster, those calls can cost organizations as much as $27M annually.

4 Benefits of Liveness Detection Within Call Centers

  • Preventing Spoofing Attacks in Contact Centers

Before 2020, call centers typically saw fraud rates of one out of every 770 calls, but in 2020 the ratio shifted to one out of every 1,074. That is a lower rate per call, and the shift is nuanced: it begins with how call center activity has changed in the past two years. For instance, some call centers saw calls increase by 800% and last 14% longer than pre-pandemic rates. Some argue this happened because the protocols that came with a nationwide pandemic made it impossible to interact in person. Today, this requires more layers of security to maintain efficiency as call centers get flooded with higher call volumes.

  • Improving Multifactor Authentication

One way to create this added layer is through multifactor authentication. It means utilizing voice biometric authentication, which includes various data points to ensure the caller is genuine. This could entail voice, device, and behavior as three common data points. Machine learning is also adding extra layers as security gets more personalized.

  • Saving Time and Money

Liveness detection in call centers also keeps cost per call low by reducing the time agents are on the phone, and it improves customer experience through greater personalization. The more machines can do to detect spoofing before it happens, the higher the likelihood that personnel can focus on other areas of higher importance to the business.

  • Productivity Gained Due to Faster Call Handling Times

The more seamless your contact center, the better the customer experience and overall satisfaction. Voice biometrics can make a big difference here.
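The multifactor idea in the list above (voice, device, and behavior as three data points) can be illustrated with a simple weighted score fusion. This is a generic sketch and not a description of Pindrop's products: the scores, weights, threshold, and function names are all hypothetical, and real systems typically learn how to combine factors from data.

```python
def fuse_scores(scores, weights):
    """Weighted average of per-factor scores, each assumed to lie in [0, 1]."""
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight

def decide(fused_score, accept_threshold=0.7):
    """Hypothetical policy: accept, or fall back to extra verification."""
    return "accept" if fused_score >= accept_threshold else "step-up"

# Hypothetical per-factor scores for one call (higher means more trusted).
scores = {"voice": 0.9, "device": 0.8, "behavior": 0.4}
weights = {"voice": 0.5, "device": 0.3, "behavior": 0.2}

fused = fuse_scores(scores, weights)
print(round(fused, 2), decide(fused))
```

Combining factors this way means a single weak signal, such as unusual behavior, lowers confidence without blocking a caller whose voice and device both look genuine.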

Step 4: Implement AI and Machine Learning to Improve Liveness Detection for Your Business

Various options within Pindrop greatly help prevent fraudsters from getting through and spoofing one’s identity. One example is through call verification scores. This eliminates any spoof risk through validation data and a PIN score to provide your team with a green, red, or grey assessment. Another is analyzing data from call history, telcos, proprietary research, and intelligence derived from over 5 billion calls. 

Ensure you have a solution that can prevent any fraud before it happens. And in the meantime, educate across the business on the latest solutions in anti-fraud techniques to ensure all of your employees are up to date.

Amidst the advancements in voice biometrics technology, recent strides in generative AI have raised concerns about the performance of voice authentication. Deepfakes, capable of mimicking anyone’s voice with remarkable realism, have emerged as a prevailing threat to speaker verification systems. At Pindrop, our unwavering commitment to combating voice fraud sets us apart as industry-leading experts. In this article, we’ll explore and answer the questions raised against voice biometrics by the University of Waterloo study. Continue reading to understand how Pindrop’s cutting-edge Liveness Detection system surpasses all others, effectively mitigating the risks posed by signal-modified deepfakes.

Questions raised against voice biometrics

Voice anti-spoofing detection systems, also known as countermeasures (CM), have been developed to detect and thwart deepfake attempts. Recent industry developments have posed two questions regarding the ability of CM systems to address emerging challenges. First, do CM systems struggle to identify synthetic content from new Text-To-Speech (TTS) systems, making zero-day attacks harder to detect? Second, can the tell-tale signs left by TTS systems in synthetic audio be masked through signal modifications, rendering synthetic content virtually undetectable by CM systems?

Pindrop answered the first question by showing that its system is effective at detecting zero-day attacks created using Meta’s new Voicebox system [link]. The University of Waterloo published a study [link] on the second question, which we address below.

About University of Waterloo’s study

Researchers at the University of Waterloo undertook a study to address the impact of signal modifications applied to synthetic audio, aimed at bypassing countermeasures. According to the study, TTS systems leave behind tell-tale signs in the synthetic audio they generate. CM systems identify whether the audio is synthetic or live depending on these tell-tale signs. 

The Waterloo team’s thesis is that malicious actors can remove these tell-tale signs by applying certain signal-masking modifications. They conducted experiments applying seven signal modifications to machine speech, aiming to erase the distinctions between genuine and machine-generated speech and thereby bypass countermeasures. These signal modifications included:

  1. Replacing leading and trailing silences with silence from genuine audio
  2. Removing inter-word redundant silences in the machine speech utterance
  3. Spectral modification to boost the center of the speech spectrum
  4. Adding an echo effect
  5. Applying pre-emphasis
  6. Noise reduction to eliminate unnatural noise in machine audio
  7. Adversarial speaker regularization
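To make two of the simpler modifications in the list above concrete, here is a minimal NumPy sketch of pre-emphasis (item 5) and a single-tap echo (item 4). It is an illustration only, not the Waterloo team's implementation: the function names, the 0.97 pre-emphasis coefficient, and the echo delay and gain are assumptions chosen for the example, since the article does not give the study's parameters.

```python
import numpy as np

def pre_emphasis(signal, coeff=0.97):
    """First-order high-pass filter: y[n] = x[n] - coeff * x[n-1].

    coeff=0.97 is a commonly used value, assumed here for illustration.
    """
    signal = np.asarray(signal, dtype=float)
    # The first sample has no predecessor, so it passes through unchanged.
    return np.append(signal[:1], signal[1:] - coeff * signal[:-1])

def add_echo(signal, sample_rate, delay_s=0.1, gain=0.5):
    """Single echo: y[n] = x[n] + gain * x[n - delay].

    delay_s and gain are hypothetical values, not taken from the study.
    """
    signal = np.asarray(signal, dtype=float)
    delay = int(round(delay_s * sample_rate))
    # Pad the output so the delayed copy is not cut off at the end.
    out = np.concatenate([signal, np.zeros(delay)])
    out[delay:] += gain * signal
    return out

if __name__ == "__main__":
    sample_rate = 16000
    t = np.arange(sample_rate) / sample_rate      # one second of audio
    tone = np.sin(2 * np.pi * 440.0 * t)          # stand-in for a speech signal
    modified = add_echo(pre_emphasis(tone), sample_rate)
    print(len(tone), len(modified))
```

Both operations are cheap, standard signal processing steps, which is part of why the study treats such modifications as a realistic way for an attacker to alter synthetic audio.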

 

The resulting signal-modified deepfakes are difficult to detect for CM systems that rely on identifying these tell-tale signs in the first place. The results of the study indicated that certain signal modifications could deceive specific combinations of Automatic Speaker Verification (ASV) and CM systems, with success rates ranging from 9.55% to 99%.

The research conducted by the team at the University of Waterloo shed light on the potential challenges that countermeasures face in detecting these modified synthetic utterances. It underscored the need for advanced and resilient solutions like Pindrop’s Liveness Detection system as highlighted below.

Pindrop’s response and test results

At Pindrop, we recognize the potential risks associated with signal-modified deepfakes. To assess these risks, we reproduced the signal modifications used in the Waterloo study and rigorously tested our system against them. The results were significant: our system successfully detected the deepfakes, outperforming even the best ASV+CM system used by the Waterloo team.

Our Liveness Detection system demonstrated remarkable performance against adversarially modified spoofed utterances. Compared with the best systems from the Waterloo paper, our system achieved higher detection accuracy on every modification. Additionally, when combined with voice authentication, our accuracy on full attacks (F1-F7) rose from 98.3% to 99.2%. This accuracy showcases the effectiveness and reliability of Pindrop’s solution in mitigating the risks posed by signal-modified deepfakes.

This table shows Pindrop’s Liveness Detection accuracy compared to the worst and best systems.

| Attack type | Worst reported FAR in the Waterloo paper | Best reported system in the Waterloo paper | Pindrop’s system |
| --- | --- | --- | --- |
| F1 | 84.4% | 95.2% | 99.2% |
| F1-F2 | 58.4% | 96.6% | 99.2% |
| F1-F3 | 56.5% | 95.4% | 99.6% |
| F1-F4 | 53.0% | 94.5% | 99.8% |
| F1-F5 | 46.0% | 92.0% | 99.8% |
| F1-F6 | 42.2% | 92.0% | 97.3% |
| F1-F7 (All Attacks) | 38.2% | 88.0% | 98.3% |
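The size of the gap in the table above can be worked out directly. The short Python sketch below computes, for each attack type, the percentage-point margin between Pindrop's system and the best reported system, along with the share of attacks each one misses. It assumes, as the table caption states, that the values are detection accuracies; the helper names are ours and purely illustrative.

```python
# Values copied from the table: (worst reported, best reported, Pindrop), in percent.
RESULTS = {
    "F1": (84.4, 95.2, 99.2),
    "F1-F2": (58.4, 96.6, 99.2),
    "F1-F3": (56.5, 95.4, 99.6),
    "F1-F4": (53.0, 94.5, 99.8),
    "F1-F5": (46.0, 92.0, 99.8),
    "F1-F6": (42.2, 92.0, 97.3),
    "F1-F7": (38.2, 88.0, 98.3),
}

def margin_over_best(best, pindrop):
    """Percentage-point gap between Pindrop's accuracy and the best reported system."""
    return round(pindrop - best, 1)

def miss_rate(accuracy):
    """Share of attacks that go undetected, assuming the value is detection accuracy."""
    return round(100.0 - accuracy, 1)

if __name__ == "__main__":
    for attack, (worst, best, pindrop) in RESULTS.items():
        print(
            f"{attack}: margin {margin_over_best(best, pindrop)} points, "
            f"missed {miss_rate(best)}% (best reported) vs {miss_rate(pindrop)}% (Pindrop)"
        )
```

Under that assumption, the margin is widest for the full F1-F7 stack (10.3 points), where the undetected share drops from 12.0% for the best reported system to 1.7% for Pindrop's system.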

What does this mean for call center teams?

In the pursuit of enhanced security measures, Pindrop’s Liveness Detection system emerges as a powerful ally for call center teams. Our system’s strength lies in its sophisticated technology, extensive training on diverse datasets, and advanced signal processing capabilities. 

How Pindrop’s Liveness Detection can build you an impenetrable defense

We take pride in our system’s performance, as it achieved a remarkable 99.2% detection success rate against the attacks replicated from the study. Moreover, our sophisticated approach, trained on diverse spoofed audio with extensive data augmentation, empowers our system to perform exceptionally well in real-world scenarios, even against zero-day attacks.

The Waterloo study is important because it demonstrates the feasibility of new attacks that eliminate differences between genuine and machine speech. It underscores the need for constant innovation to outpace malicious actors. The results above show how Pindrop’s research expertise can help mitigate current and future voice authentication attacks. In addition, Pindrop’s emphasis on a multi-factor authentication system that combines voice biometric authentication with deepfake liveness detection provides heightened security for our customers. By leveraging acoustic cues, behavioral cues, and other metadata, our system becomes more robust and reliable in detecting voice fraud.

As we continue testing against the Waterloo data set and collaborate closely with research teams, we remain committed to staying vigilant against emerging threats. At Pindrop, we’re dedicated to delivering innovative solutions and protecting valuable resources, contributing to our position as leaders in voice authentication fraud protection.

We’d like to thank the Waterloo research team for their insight and assistance in replicating the attacks from their study for our testing purposes. Pindrop welcomes the opportunity to collaborate with research teams across academia and industry to further improve voice authentication and deepfake detection. 

3 Key Takeaways

  • The Waterloo study highlights the threat of new attacks that remove differences between genuine and machine speech and demonstrates the need for constant innovation to stay ahead of malicious actors. 
  • Pindrop detected these deepfakes with 99.2% accuracy, surpassing all other solutions tested and demonstrating our effectiveness at detecting signal-modified deepfakes.
  • Pindrop’s multi-factor system, including voice biometrics, liveness detection, behavior analysis, and device authentication, effectively defends against deepfake fraud in call centers and beyond.

 
