Returns are a standard part of retail, but they’re not without risks. Fraudulent returns cost businesses significant losses every year. While restricting returns might seem like the only way to fight retail fraud, there are better ways to reduce fraud losses that don’t sacrifice the customer experience.

Leveraging an advanced voice biometrics analysis solution can help protect customer accounts, spot fraudulent returns, and streamline the call experience. This article explores the types of return fraud and how to combat them with advanced voice security.

Understanding return fraud

Return fraud involves customers exploiting return policies for personal gain. It comes in various forms, from returning stolen items to abusing liberal return policies. 

According to the National Retail Federation, return fraud costs billions annually and contributes to operational inefficiencies. Retailers often face challenges balancing customer satisfaction with fraud detection.

The most common types of fraud in retail include:

  • Receipt fraud: Customers use fake receipts or receipts from other items to return merchandise
  • Wardrobing: Buying an item, using it briefly, and returning it as “new”
  • Stolen goods returns: Returning stolen goods for refunds or store credits
  • Refund fraud: Manipulating the system to receive more than the value of the returned item

What is voice biometrics in retail?

Voice biometrics is a technology that identifies individuals based on unique vocal characteristics. It analyzes various features of a person’s voice, such as pitch, tone, and rhythm.

This technology can help protect retail contact centers from refund fraud, offering a secure and efficient means of verifying customer voices during transactions, including returns.

Unlike traditional authentication methods, such as passwords, voice biometrics provide an additional layer of security by leveraging something inherently unique to each individual—their voice. When used in tandem with other authentication factors, this advanced technology can assist retailers in combating fraudulent returns while helping create a faster and simpler returns process.

How voice biometrics can detect return fraud

Voice biometric analysis brings multiple benefits to retailers, helping to reduce fraud and improve operational efficiency. 

Real-time authentication

With voice biometrics, you can authenticate customers in real time, helping to ensure that the person initiating a return is the purchaser. This technology can be particularly useful in contact centers, where authenticating customers through traditional methods is more challenging.

By using multifactor authentication, stores can drastically reduce fraudulent return attempts. This process also minimizes disruptions for genuine customers, maintaining a smooth and efficient return experience.

Fraud detection

Voice biometric analysis can also help flag suspicious behavior patterns from the individual attempting the return.

Multifactor authentication 

You can use voice biometrics as part of a multifactor authentication (MFA) approach, combining content-agnostic voice verification with other verification methods like PINs or SMS codes. 

With this approach, even if one method fails, or if some credentials are lost or stolen, you still have a method to detect fraudulent activity.
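
To make the layered approach concrete, here is a minimal sketch of how that decision logic might be wired together. The function names, threshold, and outcomes are hypothetical illustrations, not a description of any particular vendor’s system.

```python
# Hypothetical sketch of layering voice verification with a second factor for a
# return request. verify_voice() and verify_otp() stand in for whatever
# voice-biometric and one-time-passcode services a retailer actually uses;
# the 0.90 threshold is illustrative, not a recommended setting.

def verify_voice(call_audio: bytes) -> float:
    """Return a 0-1 confidence that the caller's voice matches the account holder."""
    return 0.0  # placeholder; in practice supplied by a voice-biometrics service

def verify_otp(submitted_code: str, expected_code: str) -> bool:
    """Second factor: a one-time code sent by SMS or email."""
    return submitted_code == expected_code

def decide_return(call_audio: bytes, submitted_code: str, expected_code: str) -> str:
    voice_score = verify_voice(call_audio)
    otp_ok = verify_otp(submitted_code, expected_code)

    if voice_score >= 0.90 and otp_ok:
        return "approve"        # both factors agree: process the return
    if voice_score >= 0.90 or otp_ok:
        return "manual_review"  # one factor missing or failed: escalate to an agent
    return "decline"            # neither factor checks out: likely fraud
```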

Secure transactions

Voice biometrics can help create a secure environment for customers during their transactions. Once the system receives authentication information on the customer, it can securely process the return, significantly reducing the chances of refund fraud. This helps protect the retailer from loss and can provide customers with peace of mind, knowing their information is securely handled.

Accelerating return transactions

When using traditional authentication methods, customers often find the process tedious. Voice biometrics help speed up return transactions, as customers can skip lengthier verification procedures.

This helps create a faster, hassle-free return process, contributing to a better overall customer experience.

Data protection

Retailers can use voice biometrics to enhance data protection protocols, maintaining their consumers’ trust.

Implementing voice biometrics in your retail system

Integrating voice biometrics into your retail system in a way that’s effective and user-friendly requires careful planning.

Evaluate current systems 

Start by evaluating your existing return processes and fraud detection strategies. Understanding where current vulnerabilities lie will help identify how voice biometric analysis can fill those gaps.

Select a reliable voice biometrics solution provider

Partnering with a reliable voice biometrics provider is crucial. Look for vendors with experience in retail security, a track record of success, and robust data protection measures.

Integrate voice biometrics seamlessly into retail systems

Ensure that voice biometrics integrate smoothly with your existing retail systems. This will reduce disruption during the implementation phase and allow both customers and staff to adapt quickly to the new system.

Train staff on using voice biometrics system 

Training your staff members on how to use the voice biometrics system effectively is critical. Otherwise, no matter how good the technology is, there’s an increased risk of human error that could eventually lead to return fraud. 

Training should include knowing when and how to use the technology and troubleshooting potential issues to prevent delays in the returns process.

Monitor system performance and optimize processes 

After implementation, regularly monitor the system’s performance to ensure it functions as expected. Make necessary adjustments to optimize the system’s capabilities and improve its accuracy and efficiency in supporting fraud prevention efforts. 

Additional benefits of voice biometrics in retail

Beyond helping prevent return fraud, voice biometrics offer additional advantages that enhance the overall retail experience.

  • Reduced fraud costs: By minimizing fraudulent returns, retailers can significantly reduce the financial losses associated with them. This helps merchants optimize their operations, improve profitability, and focus resources on serving genuine customers.
  • Convenience: Voice biometrics streamline the return process by eliminating the need for physical IDs or receipts. Customers can complete their returns quickly and easily, leading to a better shopping experience.
  • Trust and loyalty: Implementing voice biometrics builds trust with customers, as they feel confident that their identities and transactions are secure. This increased level of trust enhances customer loyalty and encourages repeat business.
  • Transparency: Maintaining transparency with customers about the use of voice biometrics for fraud detection can foster confidence. Clear communication regarding how voice analysis is used will help consumers understand the purpose and benefits of this technology.

Adopt a voice biometrics solution to help prevent return fraud

Return fraud is a serious issue affecting retailers worldwide, leading to losses of billions of dollars each year. While strict return policies may be somewhat helpful, retailers need to find better, customer-friendly alternatives. One such approach is voice biometrics, which offers additional defenses against fraudulent returns while improving the customer experience.

Voice biometric solutions can help merchants secure their return processes, reduce fraud costs, and build stronger relationships with customers. Adopting such a technology may seem like a significant shift, but its long-term benefits, both in fraud detection and customer trust, make it a strong choice for small and large retailers alike.

More and more incidents involving deepfakes have been making their way into the media, like the one mimicking Kamala Harris’ voice in July 2024. Although AI-generated audio can offer entertainment value, it carries significant risks for cybersecurity, fraud, misinformation, and disinformation.

Governments and organizations are taking action to regulate deepfake AI through legislation, detection technologies, and digital literacy initiatives. Studies reveal that humans aren’t great at differentiating between a real and a synthetic voice. Security methods like liveness detection, multifactor authentication, and fraud detection are needed to combat this and the undeniable rise of deepfake AI. 

While deep learning algorithms can manipulate visual content with relative ease, accurately replicating the unique characteristics of a person’s voice poses a greater challenge. Advanced voice security can help distinguish real voices from synthetic ones, providing a stronger defense against AI-generated fraud and impersonation.

What is deepfake AI?

Deepfake AI is synthetic media generated using artificial intelligence techniques, typically deep learning, to create highly realistic but fake audio, video, or images. It works by training neural networks on large datasets to mimic the behavior and features of real people, often employing methods such as GANs (generative adversarial networks) to improve authenticity.

The term “deepfake” combines “deep learning” and “fake,” reflecting the use of deep learning algorithms to create authentic-looking synthetic content. These AI-generated deepfakes can range from video impersonations of celebrities to fabricated voice recordings that sound almost identical to the actual person.
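
For technically minded readers, here is a minimal sketch of the GAN training loop described above, written in PyTorch with toy one-dimensional data standing in for real audio or images. The network sizes, data, and training settings are illustrative only.

```python
# Minimal GAN sketch: a generator learns to produce samples that a discriminator
# cannot tell apart from "real" data. Toy 1-D vectors stand in for audio/images.
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64

generator = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

real_data = torch.randn(256, data_dim)  # stand-in for a dataset of genuine samples

for step in range(200):
    batch = real_data[torch.randint(0, len(real_data), (32,))]
    noise = torch.randn(32, latent_dim)
    fake = generator(noise)

    # Discriminator step: score real samples as 1 and generated samples as 0.
    d_loss = bce(discriminator(batch), torch.ones(32, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(32, 1))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator step: produce samples the discriminator scores as real.
    g_loss = bce(discriminator(fake), torch.ones(32, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```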

What are the threats of deepfake AI for organizations?

Deepfake AI poses serious threats to organizations across industries because of its potential for misuse. From cybersecurity to fraud and misinformation, deepfakes can lead to data breaches, financial losses, and reputational damage and may even alter the public’s perception of a person or issue.

Cybersecurity 

Attackers can use deepfake videos and voice recordings to impersonate executives or employees in phishing attacks. 

For instance, a deepfake voice of a company’s IT administrator could convince employees to disclose their login credentials or install malicious software. Since humans have difficulty spotting the difference between a genuine and an AI-generated voice, the chances of a successful attack are high.

Voice security could help by detecting liveness and using multiple factors to authenticate calls. 

Fraud 

AI voice deepfakes can trick authentication systems in banking, healthcare, and other industries that rely on voice verification. This can lead to unauthorized transactions, identity theft, and financial losses.

A famous deepfake incident led to $25 million in losses for a multinational company. The fraudsters recreated the voice and image of the company’s CFO and several other employees. 

They then proceeded to invite an employee to an online call. The victim was initially suspicious, but seeing and hearing his boss and colleagues “live” on the call reassured him. Consequently, he transferred $25 million into another bank account as instructed by the “CFO.”

Misinformation

Deepfake technology contributes to the spread of fake news, especially on social media platforms. For instance, in 2022, a few months after the Ukraine-Russia conflict began, a disturbing incident took place. 

A video of Ukraine’s President Zelenskyy circulated online, in which he appeared to be telling his soldiers to surrender. Despite the gross misinformation, the video stayed online and was shared by thousands of people, and even some news outlets, before finally being taken down and labeled as fake.

With AI-generated content that appears credible, it becomes harder for the public to distinguish between real and fake, leading to confusion and distrust.

Other industry-specific threats

The entertainment industry, for example, has already seen the rise of deepfake videos in which celebrities are impersonated for malicious purposes. But it doesn’t stop there: education and even everyday business operations are vulnerable to deepfake attacks. For instance, in South Korea, attackers distributed deepfakes targeting underage victims in an attack that many labeled a real “deepfake crisis.”

The ability of deepfake AI to create fake content with near-perfect quality is why robust security systems, particularly liveness detection, voice authentication, and fraud detection, are important.

Why voice security is essential for combating deepfake AI

Voice security can be a key defense mechanism against AI deepfake threats. While you can manipulate images and videos to a high degree, replicating a person’s voice with perfect accuracy remains more challenging.

Unique marker

Voice is a unique marker. The subtle but significant variations in pitch, tone, and cadence are extremely difficult for deepfake AI to replicate accurately. Even the most advanced AI deepfake technologies struggle to capture the complexity of a person’s vocal identity. 

This inherent uniqueness makes voice authentication a highly reliable method for verifying a person’s identity, offering an extra layer of security that is hard to spoof. 

Resistant to impersonation

Even though deepfake technology has advanced, there are still subtle nuances in real human voices that deepfakes can’t perfectly mimic. That’s why you can detect AI voice deepfake attempts by analyzing the micro-details specific to genuine vocal patterns.

Enhanced fraud detection

Integrating voice authentication and liveness detection with other security measures can improve fraud detection. By combining voice verification with existing fraud detection tools, businesses can significantly reduce the risks associated with AI deepfakes.

For instance, voice security systems analyze various vocal characteristics that are difficult for deepfake AI to replicate, such as intonation patterns and micro-pauses in speech, and flag these indications of synthetic manipulation.
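
As a concrete, deliberately simplified illustration, the sketch below pulls two of those cues, the pitch contour and the pauses between speech segments, out of a recording with the open-source librosa library. The file name and thresholds are placeholders; production detection systems rely on far richer models than this.

```python
# Rough sketch: extracting two of the cues mentioned above (pitch contour and pauses)
# from an audio file with librosa. The file name and thresholds are illustrative.
import numpy as np
import librosa

y, sr = librosa.load("caller_audio.wav", sr=16000)  # hypothetical recording

# Intonation: estimate the fundamental-frequency (pitch) contour of the voiced frames.
f0, _, _ = librosa.pyin(y, fmin=65.0, fmax=400.0, sr=sr)
pitch_variability = np.nanstd(f0)  # unusually flat pitch is one cue worth inspecting

# Micro-pauses: measure the gaps between non-silent segments.
segments = librosa.effects.split(y, top_db=35)
gaps = [(segments[i + 1][0] - segments[i][1]) / sr for i in range(len(segments) - 1)]

print(f"pitch standard deviation (Hz): {pitch_variability:.1f}")
print(f"pauses longer than 50 ms: {sum(gap > 0.05 for gap in gaps)}")
```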

How voice authentication mitigates deepfake AI risks

Voice authentication does more than just help verify identity—it actively helps reduce the risks posed by deepfake AI. Here’s how:

Distinct voice characteristics

A person’s voice has distinct characteristics that deepfake AI struggles to replicate with 100% accuracy. By focusing on these unique aspects, voice authentication systems can differentiate between real human voices and AI-generated fakes.

Real-time authentication

Voice authentication provides real-time verification, meaning that security systems can detect a deepfake voice as soon as an impersonator tries to use it. This is crucial for stopping fraud attempts as they happen.

Multifactor authentication

Voice authentication can also serve as a layer in a multifactor authentication system. In addition to passwords, device analysis, and other factors, voice adds an extra layer of security, making it harder for AI deepfakes to succeed.

Enhanced security measures

When combined with other security technologies, such as AI models trained to detect deepfakes, voice authentication becomes part of a broader strategy to protect against synthetic media attacks and fake content.

Implementing voice authentication as a backup strategy

For many industries—ranging from finance to healthcare—the use of synthetic media, such as AI-generated voices, has increased the risk of fraud and cybersecurity attacks. To combat these threats, businesses need to implement robust voice authentication systems that can detect and help them mitigate deepfake attempts.

Pindrop, a recognized leader in voice security technology, can offer tremendous help. Our products include advanced capabilities for detecting deepfake AI, helping companies safeguard their operations from external and internal threats.

Pindrop® Passport is a robust multifactor authentication solution that allows seamless authentication with voice analysis. The system analyzes various vocal characteristics to verify a caller. 

In real-time interactions, such as phone calls with customer service agents or in financial transactions, Pindrop® Passport continuously analyzes the caller’s voice, providing a secure and seamless user experience.

Pindrop® Pulse™ Tech goes beyond basic authentication, using AI and deep learning to detect suspicious voice patterns and potential deepfake attacks. It analyzes content-agnostic voice characteristics and behavioral cues to flag anomalies, helping organizations catch fraud before it happens. 

Pindrop® Pulse™ Tech provides an enhanced layer of security and improves operational efficiency by spotting fraudsters early in the process. For companies that regularly interact with clients or partners over the phone, this is an essential tool for detecting threats in real time. 

For those in the media, nonprofits, governments, and social media companies, deepfake AI can pose even more problems, as the risk of spreading false information can be high. Pindrop® Pulse™ Inspect offers a powerful solution to this problem by providing rapid analysis of audio files to detect synthetic speech. 

The tool helps verify that content is genuine and reliable by analyzing audio for liveness and identifying segments likely affected by deepfake manipulation. 

The future of voice security and deepfake AI

As deepfake AI technologies evolve, we need appropriate defense mechanisms.

Voice authentication is already proving to be a key factor in the fight against deepfakes, but the future may see even more advanced AI models capable of detecting subtle nuances in synthetic media. With them, organizations can create security systems that remain resilient against emerging deepfake threats.

Adopt a voice authentication solution today

Given the rise of deepfake AI and its growing threats, now is the time to consider implementing voice security in your organization’s security strategy. 

Whether you’re concerned about fraud or the spread of misinformation, voice authentication provides a reliable, effective way to mitigate the risks posed by deepfakes.

Often, technological advances in the healthcare industry are viewed in a positive light. Faster, more accurate diagnoses, non-invasive procedures, and better treatment support this view. More recently, artificial intelligence (AI) has improved diagnostics and patient care by assisting in the early detection of diseases like diabetic retinopathy. But these same technologies made room for a new, alarming threat: deepfakes.

As GenAI becomes more accessible, deepfakes in healthcare are increasingly prevalent, posing a threat to patient safety, data security, and the overall integrity of healthcare systems.

What are deepfakes in the healthcare industry? 

“Deepfakes in healthcare” refers to the application of AI technology to create highly realistic synthetic data in the form of images, audio recordings, or video clips within the healthcare industry.

Audio deepfakes that reproduce someone’s voice are emerging as a specific threat to healthcare because of the industry’s dependence on phone calls and verbal communication. Whether used to steal patient data or disrupt operations, audio deepfakes represent a real and growing danger.

AI deepfakes are a growing threat to healthcare

Deepfake technology being used to steal sensitive patient data is one of the biggest fears at the moment, but it is not the only risk present. Tampering with medical results, which can lead to incorrect diagnoses and subsequent incorrect treatment, is another issue heightened by the difficulty humans have spotting deepfakes.

A 2019 study generated deepfake images of CT scans, showing tumors that were not there or removing tumors when these were present. Radiologists were then shown the scans and asked to diagnose patients.

Of the scans with added tumors, 99% were deemed malignant. Of those without tumors, 94% were diagnosed as healthy. To double-check, researchers then told radiologists that the CT scans contained an unspecified number of manipulated images. Even with this knowledge in mind, doctors misdiagnosed 60% of the added tumors and 87% of the removed ones.

Attackers can also use GenAI to mimic the voices of doctors, nurses, or administrators—and potentially convince victims to take actions that could compromise sensitive information.

Why healthcare is vulnerable to deepfakes

While no one is safe from deepfakes, healthcare is a particularly vulnerable sector because of its operations and the importance of the data it works with.

Highly sensitive data is at the core of healthcare units and is highly valuable on the black market. This makes it a prime target for cybercriminals who may use deepfake technology to access systems or extract data from unwitting staff.

The healthcare industry relies heavily on verbal communication, including phone calls, verbal orders, and voice-driven technology. Most people consider verbal interactions trustworthy, which sets the perfect stage for audio deepfakes to exploit this trust.

Plus, both healthcare workers and patients have a deep trust in medical professionals. Synthetic audio can convincingly imitate the voice of a doctor, potentially deceiving patients, caregivers, or administrative staff into taking harmful actions.

How deepfakes can threaten healthcare systems

Deepfakes, especially audio-based ones, pose various risks to healthcare systems. Here are four major ways these sophisticated AI fabrications can threaten healthcare.

1. Stealing patient data

Healthcare institutions store sensitive personal data, including medical histories, social security numbers, and insurance details. Cybercriminals can use audio deepfakes to impersonate doctors or administrators and gain unauthorized access to these data repositories. 

For example, a deepfake of a doctor’s voice could trick a nurse or staff member into releasing confidential patient information over the phone, paving the way for identity theft or medical fraud.

2. Disrupting operations

Deepfakes have the potential to cause massive disruptions in healthcare operations. Imagine a fraudster circulating a deepfake of a hospital director instructing staff to delay treatment or change a protocol.

Staff might question the order, but that can cause a disruption—and when dealing with emergencies, slight hesitations can lead to severe delays in care.

3. Extortion

Scams using deepfake audio are sadly no longer uncommon. Someone could create a fraudulent audio recording, making it sound like a healthcare professional is involved in unethical or illegal activities.

They can then use the audio file to blackmail the professionals or organizations into paying large sums of money to prevent the release of the fake recordings.

4. Hindered communication and trust

Healthcare relies on the accurate and timely exchange of information between doctors, nurses, and administrators. Deepfakes that impersonate these key figures can compromise this communication, leading to a breakdown of trust. 

When you can’t be sure the voice you’re hearing is genuine or the results you’re looking at are real, it compromises the efficiency of the medical system. Some patients might hesitate to follow medical advice, while doctors might struggle to distinguish between legitimate communications and deepfakes.

Protecting healthcare systems from deepfakes

Healthcare deepfakes are a threat to both patients and healthcare professionals. So, how can we protect healthcare systems? Here are a few important steps.

Taking proactive measures

Catching a deepfake early is better than dealing with the consequences of a deepfake scam, so taking proactive measures should be your first line of defense. One of the most useful tools in combating deepfakes is voice authentication technology like Pindrop® Passport, which can analyze vocal characteristics like pitch, tone, and cadence to help verify a caller.

Investing in AI-powered deepfake detection software is another effective mitigation option. Systems like Pindrop® Pulse™ Tech can analyze audio content to identify pattern inconsistencies, such as unnatural shifts in voice modulation. AI-powered tools learn from newly developed deepfake patterns, so they can help protect you against both older and newer technologies.

Remember to train your staff. While humans are not great at detecting synthetic voices or images, when people are aware of the risks deepfakes pose, they can better spot potential red flags. 

These include unusual delays in voice interactions, irregular visual cues during telemedicine appointments, or discrepancies in communication. You can also conduct regular phishing simulations to help staff identify and respond to suspicious communications.

Implementing data security best practices

Proactive measures are the first lines of defense, but you shouldn’t forget about data protection.

Multifactor authentication (MFA) is a simple but strong data protection mechanism that can help confirm that only authorized individuals can access sensitive healthcare systems. With it, a person will need more than one form of verification, so if someone steals one set of credentials or impersonates someone’s voice, there will be a second line of defense.

Encrypting communication channels and even stored data is another vital aspect of data security. In healthcare, sending voice, video, and data across networks is common, so encrypting communication is a must. Protecting stored data adds an extra layer of security, as even if a third party gains access, they would still need a key to unlock it.

Remember to update and monitor your data security practices regularly.

Safeguard your healthcare organization from deepfakes today

When artificial intelligence first came to the public’s attention, its uses were primarily positive. In healthcare, for instance, synthetic media was, and still is, helpful in researching, training, and developing new technologies.

Sadly, the same technology can also take a darker turn, with fraudsters using it to impersonate doctors, gain access to sensitive patient data, or disrupt operations. Solutions like Pindrop® Passport and the Pindrop® Pulse™ Tech add-on offer a powerful way to authenticate voices and detect audio deepfakes before they can infiltrate healthcare communication channels.

By combining proactive detection tools with strong data security practices, healthcare providers can better protect themselves, their patients, and their operations from the devastating consequences of deepfakes.

After this bank, which manages more than $73 billion in assets, leveraged Pindrop for authentication, the next phase of its customer service delivered deeper fraud prevention.

The Challenge

This bank wanted to understand more about the risky authentication attempts that were targeting its contact center. The bank also wanted to quantify the various types of fraud bad actors were applying.

The Solution

Building on its existing Pindrop® Passport authentication deployment, the bank enabled Pindrop® Protect to surface actionable fraud patterns in the phone channel, quantify the types of attacks it was facing, and act quickly on risk across channels.

  • Fraud prevention: Risk alerts arrive quickly
  • Insightful intelligence: Rapid response foils attacks
  • Quantify attack types: Understand and prevent the risk
  • Better together: Results are amplified when two solutions work in tandem

Curiosity spawns new way to deliver business impact

Customers are at the center of every conversation at this bank. In fact, customer-centricity has been the top company value for over 170 years, and that focus on customers is why this bank chose Pindrop’s authentication solution for its contact center. With Pindrop® Passport in place, the bank grew curious about the calls that carried higher risk scores, and contact center leadership wanted to define exactly how much fraud was occurring in the phone channel. The bank already provides a significant amount of anti-fraud education to its consumer and commercial clients, yet it was inspired to pursue even more ways to protect assets and deliver new value by enabling Pindrop® Protect.

Pindrop® Protect quickly surfaced fraud patterns and volumes that were highly actionable. The user interface is easy to use: the bank’s fraud analysts can see the details of each call and which factors drove the highest risk scores. With the insights from the phone channel that Pindrop® Protect provides, analysts can immediately block transactions and requests in other channels, such as in-app and wire transfers as well as IVR services. In less than one year, the bank measured its ability to prevent fraud attacks on over $56M in assets with Pindrop.

Safeguarding never stops

Due to the post-pandemic rise in fraudulent activity, it is typical for a bad actor to attack multiple times and with a variety of methods. With Pindrop® Protect, approximately one in five calls into the contact center is flagged for risk. The forms of fraud run the gamut from counterfeit checks and unauthorized transfers to fraudulent check orders, new debit card requests, web banking attacks, and online account transfer fraud.

Thanks to Pindrop’s Advisory Services consulting, the bank’s fraud prevention team is always at the forefront of the latest insights. Together, Pindrop and the bank are constantly innovating and adapting to stay well ahead of fraudsters.

What’s next

The bank and Pindrop will continue to evolve both the authentication and anti-fraud programs, applying the latest best practices to drive even higher business results. With both Pindrop® Passport and Pindrop® Protect in place, the two solutions amplify the effect of one another. As for the next horizon, the bank will synthesize more of its internal and Pindrop data intelligence to deliver new ways to support the ease of experience for its genuine callers and detect risk across its portfolios.

Fortify Your Business Against Deepfakes

Deepfakes, capable of mimicking anyone’s voice with remarkable realism, have emerged as a prevailing threat to businesses and consumers. Bad actors can use deepfakes to impersonate real customers, leading to increases in fraud losses, breaches of sensitive customer data, damage to brand reputation and more.

With these attacks on the rise, cutting-edge fraud detection software with security measures like multi-factor authentication and real-time liveness detection is your key to combating the impacts of deepfakes.

Join Elie Khoury and Amit Gupta of Pindrop for an executive fireside chat to discuss how deepfakes are impacting your business and what you can do to safeguard your contact center operations and customers.

The Deepfake Dilemma: Navigating AI Deception in Your Organization

The diversity of deepfakes that the Pindrop team encounters is both fascinating and alarming, from AI-generated phone calls into call centers to update account information to deepfake scams using celebrities’ likenesses to endorse phony products on social media. The availability, effectiveness, and increasing use of online text-to-speech tools underscore the critical need for robust detection mechanisms.

In this webinar, Pindrop brings together our Chief Product Officer Rahul Sood and the Founder of TrueMedia.org Dr. Oren Etzioni to discuss the rapidly evolving landscape of deepfake technology and its far-reaching implications for enterprises and media, as well as the spread of misinformation.

During the discussion, we also provided a sneak peek into exciting new product developments for Pindrop® Pulse™ Inspect, the latest addition to the cutting-edge Pindrop® Pulse™ deepfake detection product family.

Voice Deepfakes in the Contact Center: Ask Us Anything

The manipulation of audio and generative AI technology have become increasingly sophisticated, raising concerns for businesses and consumers around digital deception, fraud attempts and disinformation. Last month, we launched our industry-leading solution, Pindrop® Pulse, which delivers real-time audio deepfake detection in the contact center.

Whether you’re a seasoned IT professional or curious to learn more about the mechanics behind deepfake detection, join our panel of Pindrop voice security experts for a live Ask Me Anything (AMA) session where you’ll have the opportunity to ask questions and gain invaluable knowledge around how your business can navigate the complexities of this evolving digital landscape.

Human or Machine? Safeguarding Contact Centers Against Deepfakes

Generative Artificial Intelligence (AI) that can create new content is a major technological breakthrough. ChatGPT, introduced by OpenAI, is one of the most famous examples of this trend. However, the underlying idea of leveraging fast-learning AI models to create synthetic audio and content is already having far-reaching consequences in the world of fraud.

In Part 3 of the VISR webinar series, our experts will dive into the dark side of AI and its profound impact on our digital landscape. This session will unravel the unsettling consequences of leveraging fast-learning models for the creation of synthetic audio and content, leading to a surge in fraudulent activities.

Fake President Fraud: The Deepfake Threat You Should Prepare For

Deepfakes went viral in 2019 when Steve Buscemi’s face was superimposed on Jennifer Lawrence’s body. As a presidential election approaches, the threat of this sophisticated technology becomes more serious. An emerging category called Fake President Fraud is targeting high-profile figures. This presentation will explain how fraudsters are creating synthetic voices, the implications, and future threats.

The Real Threat of Deepfakes to Your Contact Center

Deepfake attacks are constantly evolving. Attend our upcoming webinar and arm yourself with knowledge on the latest deepfake trends and the tools that can help catch them.

From Detection to Prevention: Advanced Strategies to Prevent Contact Center Fraud in 2024

In 2023, data breaches reached an all-time high of 3,205, 78% higher than the previous year. Leveraging generative AI technology, fraudsters are employing advanced tactics, including bots and deepfakes, and exploiting vulnerabilities in outdated systems. In the face of rising contact center fraud, it’s crucial to implement robust and modern fraud prevention strategies.

Watch Pindrop fraud and authentication experts in this comprehensive webinar, where we dig into essential fraud-fighting techniques that can protect your contact center from becoming a fraudster’s playground.

Webinar

Voice Theft: How Audio Deepfakes Are Compromising Security

As generative AI advances, Pindrop Pulse® provides a groundbreaking solution to combat audio deepfakes, restoring customer trust and enhancing Pindrop’s product suite. CPO Rahul Sood and VP Amit Gupta share insights on its impact and capabilities in this informative session.

 

Hear from Pindrop’s CPO and VP of Product, Research & Engineering as they share their research and insights on how Pindrop Pulse® is leading the battle against deepfake audio deception.

Discover Amit and Rahul’s insights on the rise of voice deepfakes and their impact across different industries.

Gain actionable strategies to mitigate risks in 2024 and how Pindrop provides protection.

Learn about recent high-profile deepfake incidents, including the deceptive Biden robocalls.

Meet the Experts

Amit Gupta

VP, Product, Research & Engineering

Rahul Sood

Chief Product Officer, Pindrop

WEBINAR

Detecting Deepfakes: How to Combat Fraud

Join ethical hacker Samy Kamkar and Pindrop VP Amit Gupta in an exclusive live Q&A session to discuss the rise of cyber fraud, the impact of deepfakes, and evolving security trends, followed by a discussion with Yves Boudreau on how cyber fraud affects businesses.

Learn about cyber fraud and security trends with Samy Kamkar and Amit Gupta in a live Q&A.

Find out how fraud impacts businesses through a fireside chat with Amit Gupta and Yves Boudreau.

Discover how to strengthen your company’s fraud security measures from this webinar.

Meet the Experts:

Amit Gupta

VP Product Management, Research, and Engineering

Yves Boudreau

Head of Customer Engineering, Google Cloud

Samy Kamkar

Security Researcher and Co-Founder, Openpath Security

WEBINAR

Deepfake + Voice Clone Deep Dive
with Voicebot.ai

Deepfakes are one of the most controversial applications of GenAI technology. While some view them as a harmless tool, others recognize the greater threat. Pindrop and Voicebot.ai surveyed more than 2,000 U.S. consumers to uncover their perceptions of deepfake and voice clone technology. This collaboration led to an extensive and insightful report, which we’ll break down in this webinar.

Consumer sentiment around deepfakes and voice clones

Which industries have the highest consumer concern around deepfake risks

How pop culture plays a role in AI technology sentiment

Strategies to combat the threat of deepfakes and voice clones

Meet the Speakers

Amit Gupta

VP, Product Management, Research & Engineering

Bret Kinsella

Founder, Editor, CEO & Research Director

Deepfakes and the Escalation of Political Conflict

Imagine watching a political candidate deliver a controversial speech or a world leader engaging in an incendiary conversation only to discover none of it was real. This is the unsettling reality of political deepfakes, which are becoming increasingly common.

We were already living in an era where facts are under constant siege, but with the rise of deepfake attacks, this is becoming even more of an urgent threat.

Political deepfakes are fueling misinformation, undermining democratic institutions, and contributing to political polarization. 

In this article, we’ll explore the potential impact of deepfakes on the 2024 U.S. elections and deepfake incidents worldwide. We’ll also highlight strategies, including deepfake detection, to combat them.

What are political deepfakes?

Political deepfakes are AI-generated synthetic media that manipulate video, audio, or images of political figures to deceive viewers, often for malicious reasons.

Deepfakes can fabricate speeches or alter footage of election debates. They allow hyper-realistic alterations that can go undetected by the untrained eye, becoming a potent tool for those looking to deceive voters and undermine election integrity.

The multifaceted impact of deepfakes on political landscapes

Weaponizing misinformation: Propaganda in the digital age

There’s no denying that deepfakes are becoming propaganda tools, allowing malicious actors to manipulate political narratives. Their use in spreading falsehoods and undermining democratic processes has escalated globally.

Some recent examples of political deepfakes worldwide include*:

  • Moldova: Pro-Western President Maia Sandu has been the subject of repeated deepfake attacks. One particularly damaging video surfaced just before local elections, falsely showing Sandu endorsing a pro-Russian party and announcing her resignation.
  • Taiwan: A deepfake video circulated on TikTok earlier this year showed U.S. Representative Rob Wittman promising stronger U.S. military support for Taiwan, stoking fears of U.S. interference in the region.
  • Slovakia: Just before the parliamentary elections, audio clips surfaced online, featuring fabricated discussions about raising beer prices and rigging votes, potentially influencing public opinion right before the critical election.
  • Bangladesh: Opposition lawmaker Rumeen Farhana, a vocal critic of the ruling party, was falsely depicted wearing a bikini, sparking outrage in the conservative, majority-Muslim nation. “They trust whatever they see on Facebook,” Farhana said.

*Source: Election disinformation takes a big leap with AI being used to deceive worldwide.

Undermining trust in democratic institutions and media

Trust in the media plays a pivotal role in upholding democracy, ensuring that individuals are informed and engaged in civic discourse.  This year in the U.S., the FCC outlawed AI-generated robocalls aimed at discouraging voters.

When voters cannot discern real from fake, an environment of skepticism takes hold. Eventually, this lack of trust in reliable information leads to disengagement and disillusionment with government.

The polarization of trust is particularly evident in the U.S. According to recent data, trust levels in mass media vary significantly based on political affiliation. Such disparities highlight the gap in how different groups perceive the media’s credibility, further intensifying political polarization.

Influencing electoral processes and voter manipulation

Voter manipulation is a particularly concerning consequence of political deepfakes. Deepfakes can confuse voters by spreading disinformation, ultimately influencing electoral outcomes. They also target groups with less awareness about the emerging technology.

The widespread availability of such content raises ethical questions about election integrity and highlights the need for robust deepfake detection software to mitigate these risks.

Examples of political deepfakes in the 2024 election

Political deepfakes have already played a prominent role in the lead-up to the 2024 elections. 

For instance, deepfake detection software uncovered manipulated media of political figures, such as the deepfake of VP Kamala Harris and the Biden robocall.

Learn how the Pindrop deepfake detection technology revealed the text-to-speech (TTS) engine behind Biden’s AI robocall.

Strategies to combat political deepfakes

Deepfake detection software

Cutting-edge deepfake detection software like Pindrop® Pulse Inspect is essential in the fight against political deepfakes. These tools can detect inconsistencies in audio and video, helping to verify the authenticity of media before it reaches the public.

The real-time capabilities of voice liveness detection are critical in preventing manipulated media from spreading unchecked.

Integrating multiple detection methods

A comprehensive approach to combating deepfakes involves integrating multiple detection methods, including:

  • Real-time detection
  • Continuous assessment
  • Resilience to noise, reverberation, and compression
  • Explainable analysis
  • Zero-day attack coverage

The layered protection offered by the Pindrop® Pulse™ technology supports more accurate detection and faster response times.
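
As a generic illustration of how layered checks can feed one decision, the sketch below fuses scores from several stand-in detectors into a single risk value. The detector stubs, weights, and threshold are invented for the example and do not describe how Pindrop’s products compute their scores.

```python
# Illustrative fusion of several independent checks into one risk decision.
# Each detector is a stub returning a 0-1 "looks genuine" score; the weights
# and the 0.5 threshold are arbitrary, chosen only to make the example run.

def liveness_score(audio: bytes) -> float:      # live speech vs. synthetic/recorded
    return 0.8  # placeholder

def noise_robust_score(audio: bytes) -> float:  # score computed after handling noise/compression
    return 0.7  # placeholder

def novelty_score(audio: bytes) -> float:       # anomaly-style check aimed at unseen attacks
    return 0.9  # placeholder

WEIGHTS = {"liveness": 0.5, "noise_robust": 0.2, "novelty": 0.3}

def risk_of_manipulation(audio: bytes) -> float:
    scores = {
        "liveness": liveness_score(audio),
        "noise_robust": noise_robust_score(audio),
        "novelty": novelty_score(audio),
    }
    genuine = sum(WEIGHTS[name] * value for name, value in scores.items())
    return 1.0 - genuine  # higher means more likely manipulated

if __name__ == "__main__":
    risk = risk_of_manipulation(b"")
    print("flag for review" if risk > 0.5 else "looks genuine", f"(risk={risk:.2f})")
```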

Cross-sector collaboration in addressing deepfake threats

Tackling the challenges posed by political deepfakes requires collaboration across various sectors. Governments, media companies, and cybersecurity experts must work together to create a unified front against deepfake threats. 

Developing universal standards for synthetic media authentication and media verification protocols can help safeguard democratic processes.

Mitigate the threat of political deepfakes today

As we look toward the future, governments, media companies, and individuals must take proactive steps to defend against the rise of political deepfakes.

By leveraging the right technology, we can mitigate the impact of political deepfakes. Learn more about how Pindrop® Pulse Inspect can help you verify the authenticity and trustworthiness of content.

Misinformation is not a new issue, but the rise of AI-generated content, in written, video, audio, and image formats, has cast it in a new light. A few decades ago, misinformation in journalism meant someone had to craft a well-thought-out lie, publish it in a journal, and disseminate it to the target audience. Now, all it takes is good AI software and a carefully planned prompt, and you’ll get a video, image, or audio clip representing someone in a situation they’ve never been in.

Identifying the difference between reality and a deepfake is becoming increasingly difficult, so many are wondering—how can we trust journalism and the media in general?

The rise of deepfakes in journalism and media

Deepfake technology made its first public appearance in November 2017, when a Reddit user shared an algorithm that could create realistic fake videos. Not long after, in early 2018, Jordan Peele shared a deepfake video of Barack Obama. It was the first signal that the new AI technology could pose a serious threat by blurring the line between fact and fiction at a never-before-seen level.

While some of the first instances were created for entertainment purposes only and likely caused no real harm to those involved, it didn’t take long until things took a turn for the worse. Some started using deepfake videos to create elaborate scams, making people believe they were receiving messages from family members in distress.

Then, videos of politicians began emerging in an attempt to alter public perception. As a 2024 systematic review briefly summarizes, “Deepfakes can cause political and religious strains between countries, deceive people, and impact elections.” A good example is the recent incident with a deepfake of Kamala Harris, made to look like a presidential campaign video, which raised serious questions about future election integrity.

Notable deepfake incidents in journalism

Since its rise, deepfake technology has made way for a series of worrying incidents in the media. Recently, South Korea experienced a real deepfake crisis that saw hundreds of schools and universities targeted. 

The perpetrators created and distributed fake videos targeting underaged victims. This incident drew attention to the broader implications of deepfake technology in terms of privacy and safety, particularly for vulnerable populations.

The U.S. hasn’t been sheltered either, with fake images of Donald Trump being arrested circulating online just a few months ago. 

Last year, CNN news anchor Anderson Cooper was the victim of a deepfake video in which he appeared to be talking about Donald Trump in a less-than-flattering manner. Another AI-generated video portrayed Gayle King, a CBS Mornings co-host, presenting a product she’d never used.

Challenges deepfakes pose to journalism

With deepfake technologies improving each day, fact-checking and ensuring content authenticity are becoming harder for journalists worldwide. We’re at a point where nothing is off-limits, and the risk to security is increasing.

Abbas et al. state in their review that “satellite imagery can even be manipulated to incorporate things that do not exist: for instance, a fake bridge over a river, which can confuse military experts. In this way, deepfakes might mislead troops to target or defend a non-existent bridge.” Journalists must be more aware than ever of the challenges deepfakes pose.

Threat to media credibility

Journalism relies on trust, and the introduction of highly realistic but fake content makes it difficult for both journalists and audiences to confidently determine the truth. 

After seeing even one manipulated video presented as authentic by media outlets, the public will have serious doubts about their integrity, even when their reports are accurate.

Spread of misinformation & disinformation

Deepfakes make it increasingly easy to spread false information. With written content, journalists and readers are more likely to take everything with a grain of salt, looking for confirmation. But when looking at a video or an image, people are likely to believe what they see and they’re less likely to assume they’re looking at AI-generated content.

Erosion of public trust

“Fake news” is a common accusation nowadays, and deepfakes only compound the problem, leading to skepticism toward the press. When audiences are unsure whether they can trust what they see or hear, journalism’s foundation of delivering accurate and reliable information is severely compromised.

Impact of deepfakes on journalistic practices

Deepfakes are without a doubt a challenge for journalists. They affect not only how the public perceives the media and how much it trusts news outlets, but also journalistic practices as a whole.

Verification challenges

The first step to combat synthetic media and misinformation is to fact-check everything again and again. Traditional tools like checking sources, cross-referencing reports, or analyzing video metadata can prove insufficient. Journalists will need to use more advanced techniques and fact-checking tools to ensure they publish genuine information.

Legal and ethical considerations

The legal and ethical ramifications of deepfakes in journalism are profound. For one, media outlets have an ethical obligation to prevent disseminating manipulated materials. Additionally, publishing fake content could expose journalists to legal liability for defamation or other legal claims.

News organizations must develop rigorous standards for detecting and reporting deepfakes, finding a balance between the need for speed in the 24-hour news cycle and ensuring accuracy.

Need for new skills and tools

Traditional journalistic knowledge might not be enough in the landscape created by AI and deepfakes. Advanced fact-checking tools are needed to detect fake content.

Ethical considerations in the era of synthetic media

In 2018, a video went viral in India, showing a man on a motorbike kidnapping a child in the street. The video created panic and anger and led to 8 weeks of violence, during which 9 innocent people lost their lives. In the end, the video was proven to be fake.

AI has brought on what some call information warfare, with many trying to manipulate content to their advantage. Media outlets and journalists have a responsibility to prevent the spread of deepfakes, both for themselves and their reputation, but also for the public’s safety.

Maintaining integrity in journalism

Journalists must adhere to strict ethical standards to ensure that their reporting remains accurate and trustworthy. This includes having thorough verification processes, being transparent about potential uncertainties, and correcting errors swiftly when they occur. 

Media outlet responsibilities

If the incidents that took place so far showed us anything, it’s that media outlets have a significant responsibility in combating the rise of deepfakes. 

There is an increasing need for internal protocols and tools that can identify and prevent the spread of fake content. Media organizations also need to educate their employees and audiences about deepfakes and how to recognize them.

Strategies to combat deepfakes in journalism

We need to help stop the spread of misinformation and fake content, but how can we do that? Here are a few strategies.

Technological solutions for deepfake detection

To combat deepfakes effectively, you need to prioritize technological solutions to detect forgeries. AI-powered tools, such as deepfake detectors, analyze subtle inconsistencies in videos and audio that are imperceptible to the human eye or ear. 

These tools use algorithms to scan for signs of manipulation, such as unnatural facial movements or discrepancies in lighting, which can indicate that a video is not real.

AI-powered detection tools

AI technologies help create deepfakes, but they can also help fight them. They can analyze massive amounts of data in real time, identifying patterns that suggest manipulation.

For instance, Pindrop® Pulse™ Inspect can analyze audio files and detect synthetic voices.

Content authentication

Another strategy for combating deepfakes is content authentication, which involves verifying the source and integrity of digital media at the moment of creation. 

An example would be blockchain technology, which you can use to record a digital imprint of videos and images so that any later tampering can be detected. Implementing these methods can help journalists prove their content is authentic, improving the public’s trust.
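
To illustrate the “digital imprint” idea in its simplest form, a newsroom could record a cryptographic hash of each file at publication time and recompute it later to spot tampering; whether that record lives on a blockchain, in a signed log, or in a plain database is a separate design choice. The sketch below uses only Python’s standard library, and the file names are placeholders.

```python
# Minimal sketch of content fingerprinting: hash a media file at publication time,
# then re-hash it later to verify it has not been altered. File paths are placeholders.
import hashlib
import json
from pathlib import Path

REGISTRY = Path("published_fingerprints.json")  # stand-in for a tamper-evident store

def fingerprint(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def register(path: str) -> None:
    """Record a file's fingerprint at publication time."""
    records = json.loads(REGISTRY.read_text()) if REGISTRY.exists() else {}
    records[path] = fingerprint(path)
    REGISTRY.write_text(json.dumps(records, indent=2))

def verify(path: str) -> bool:
    """Return True if the file still matches its registered fingerprint."""
    records = json.loads(REGISTRY.read_text()) if REGISTRY.exists() else {}
    return records.get(path) == fingerprint(path)
```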

Media literacy & public education

No matter how much journalists try, some deepfakes might still make their way to the public. That’s where media literacy and public education come in. By helping people understand the risks of deepfakes and how to recognize them, we can diminish the impact these have and reduce the instances when they cause real harm.

Collaboration between tech companies & news organizations

A lot of the responsibility for stopping deepfakes falls on news organizations, as they must prevent their spread. But on their own, they might lack the resources to fight this phenomenon. Tech firms are leaders in AI development and can support journalists against deepfake creators.

A collaboration between these two sectors can help create more advanced detection tools and shared protocols to verify the authenticity of media content.

The future of journalism in the age of deepfakes

Since synthetic media, algorithmic content creation, and fake news have become so prevalent, many wonder what the future of journalism will look like. Can journalism continue to exist as we know it? Will it have to change or will it disappear altogether? 

News outlets will need to invest in continuous training and education for their staff, equipping them with the skills necessary to identify and combat deepfakes. Emerging technologies, such as AI-driven verification systems and blockchain-based content authentication, will play a central role in this effort.

In the long term, the future of journalism will likely depend on its ability to preserve trust while embracing new tools and strategies to counter disinformation. Despite the challenges, journalists can stay one step ahead of the curve, helping ensure the public receives reliable information.

Combat deepfakes in journalism today

The threat posed by deepfakes is undeniable, but journalists, media organizations, and technology companies have the tools and expertise to help combat it. Simple yet high-quality solutions are already within reach.

One such example is Pindrop® Pulse™ Inspect, a tool that helps you analyze your audio data and determine if a voice is genuine or synthetic. Pindrop® Pulse™ Inspect offers real-time detection of synthetic voices, so you can use it even when you’re racing against the clock and be one step closer to delivering accurate information.

ON-DEMAND Webinar

The Deepfake Dilemma: Navigating AI Deception in Your Organization

The range of deepfakes encountered by the Pindrop team is both intriguing and concerning, from AI-generated phone calls to call centers to celebrity impersonations used in scams on social media. The growing availability and effectiveness of online text-to-speech tools highlight the urgent need for strong detection systems to combat these threats.

Understand the current state of deepfake technology and its rapid evolution

Uncover the potential impacts of deepfakes on businesses and democratic institutions

Gain insights into the specific disinformation tactics used in recent and ongoing elections

Discover best practices for maintaining public trust in your brand in the face of sophisticated AI-generated content

Preview upcoming Pindrop technologies and methodologies for deepfake detection and prevention

Your expert panel

Join us for this timely and important discussion as we navigate the complex landscape of AI-generated deception and its impact on the future of business.

Rahul Sood

Chief Product Officer, Pindrop

Dr. Oren Etzioni

Founder, TrueMedia.org

Written in collaboration with Bennett Borofka, Partner Solutions Architect at Amazon Web Services.

Is your contact center vulnerable to fraud attacks?  

In Pindrop’s 2024 Voice Intelligence and Security Report, we project that more than 1 in every 730 calls into call centers will be fraudulent by the end of 2024. The rise of accessible generative AI tools is also making way for a rise in deepfake attacks, or the fraudulent use of synthetic media to replicate a person’s likeness, including their voice.

Now is the time to help protect your business with technological solutions designed to help detect fraud, authenticate callers, and spot deepfakes.


Amazon Connect is the cloud contact center service from Amazon Web Services (AWS). It provides a seamless, omnichannel experience that is easy to deploy and scales quickly. As an AWS Partner, Pindrop has successfully launched Pindrop® Solutions with several Amazon Connect customers, and many Pindrop customers are running on AWS. Check out this customer case study to better understand the benefits of Pindrop solutions in your contact center.

About Pindrop Solutions’ integration with Amazon Connect

Pindrop Solutions’ integration with Amazon Connect brings authentication, fraud detection and deepfake detection right to agents’ desktops during voice interactions. This allows Amazon Connect agents to receive near real-time caller verification and fraud detection during each live call. Some of the specific enhancements Pindrop solutions bring to Amazon Connect are:

  • Caller enrollment method: Callers can be enrolled with the Pindrop solution while they are in IVR and with the agent.
  • Caller enrollment time: Callers can be passively enrolled with the Pindrop solution in approximately 10 seconds, and they don’t have to enroll again.
  • Caller authentication time: Callers are authenticated in approximately 2 seconds. Authentication status is presented via APIs in near real-time to Amazon Connect.
  • Fraudster detection: A deep, cumulative set of sources and methods is used to detect fraudulent callers: voice, device, behavior, metadata, and connections. Fraud risk data is available throughout the call.
  • Fraud case management: A suite of tools for identifying and tracking fraud cases.
  • Liveness detection: A liveness score, showing a real-time assessment of potential synthetic or recorded voices versus genuine, live human speech.  Liveness data is available throughout the call.

Pindrop Solutions are fully SaaS-based and available on AWS Marketplace

The platform deploys easily to new and existing instances of Amazon Connect, making it possible for customers to realize quick time-to-value in adding robust security capabilities to their contact center. Customers deploying Pindrop Solutions integrate the platform’s authentication, fraud and deepfake data directly into their agent desktops for live analysis.

At AWS, security is “job zero” and our number one priority for customers. Pindrop undergoes the AWS Foundational Technical Review process to verify that its software meets specific guidelines for security, reliability, and operational excellence. Pindrop Solutions integrate with Amazon Connect using a collection of services within customers’ AWS accounts. Pindrop offers guidance to Amazon Connect customers through the deployment and configuration of this integration during implementation:

  • AWS Lambda: A collection of AWS Lambda functions are configured within Amazon Connect Flows to securely query the Pindrop API during initial voice contact. This provides the Pindrop solution with details about the contact and returns authentication and risk scores, which are presented back to Amazon Connect in the API response. These authentication and risk scores are presented live in the agent’s desktop.
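For orientation, here is a minimal sketch of what such a Lambda function could look like. The Pindrop endpoint, request fields, and response fields below are illustrative placeholders rather than Pindrop’s published API; only the Amazon Connect event shape and the flat string map returned to the flow follow the standard Connect Lambda integration.

```python
# Minimal sketch: an AWS Lambda function invoked from an Amazon Connect flow that
# queries a (placeholder) Pindrop endpoint and returns scores to the flow.
import json
import os
import urllib.request

# Placeholder endpoint and key; not Pindrop's actual API.
PINDROP_API_URL = os.environ.get("PINDROP_API_URL", "https://pindrop-api.example.invalid/v1/calls")
PINDROP_API_KEY = os.environ.get("PINDROP_API_KEY", "")

def lambda_handler(event, context):
    # Amazon Connect passes contact details under Details.ContactData.
    contact = event["Details"]["ContactData"]
    payload = {
        "contact_id": contact["ContactId"],
        "ani": contact["CustomerEndpoint"]["Address"],  # caller's number
        "dnis": contact["SystemEndpoint"]["Address"],   # dialed number
    }
    request = urllib.request.Request(
        PINDROP_API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {PINDROP_API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=3) as response:
        scores = json.loads(response.read())

    # Amazon Connect expects a flat map of string values; these keys can then be
    # referenced in the flow and surfaced on the agent desktop.
    return {
        "authScore": str(scores.get("auth_score", "")),
        "riskScore": str(scores.get("risk_score", "")),
        "livenessScore": str(scores.get("liveness_score", "")),
    }
```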


Deepfakes are a rising fraud threat for contact centers. That’s why it’s imperative to deploy a comprehensive solution that can detect fraud at various points in the contact center experience, authenticate callers, and analyze audio for synthetic voice. Pindrop Solutions offer all of this within your existing Amazon Connect environment.

Our offerings

If you already use Amazon Connect to manage your contact center experience, you can add Pindrop Solutions with ease. Here’s an overview of our solutions:

Pindrop® Pulse Technology

Fortify trust and integrity between you and your customers with our industry-leading deepfake detection technology, independently tested at a 96.4% accuracy rate in an NPR study on audio deepfake detection.

Pindrop® Protect

With near real-time fraud alerts and risk assessments for inbound calls, you can reduce fraud losses by detecting repeat, known fraudsters as well as new fraud through anomaly detection.

Pindrop® Passport

Legacy authentication systems are time-consuming for your agents and customers. With cutting-edge multi-factor authentication that can passively authenticate in the IVR or at the agent, you can fortify your contact center with effective, seamless safety measures.

Why Pindrop?

Pindrop Solutions are industry-leading voice security tools with proven results. From fraud detection to spotting deepfakes to authenticating callers, our technology is helping stop fraudsters in their tracks.

With our robust integration with Amazon Connect, you can implement these tools seamlessly, bringing important, thorough call analysis to your agents’ screens.

To learn more about our product integration and solutions, request a demo with a member of our team. 

Audio deepfakes created by advanced text-to-speech (TTS) and voice conversion (VC) systems are increasingly prevalent as more easy-to-use commercial and open-source tools become available. On the Pindrop research team, much of our focus has been on mitigating the threat of fraud, disinformation, and misinformation through reliable detection of synthetic speech. However, recently, we’ve also started investigating ways to identify the engine behind a given deepfake.

In January 2024, we released a research blog detailing how Pindrop uncovered the text-to-speech (TTS) engine behind an election interference robocall imitating President Joe Biden. Our results indicated that the call was most likely created by a popular commercial TTS tool, allowing for subsequent investigations by journalists, fact check organizations, and law enforcement.

We also recently published a blog about a deepfake of Elon Musk. In that case, we were able to identify ElevenLabs as the voice cloning vendor that was used to create the deepfake. We informed them so that they could complete further investigation.

Since then, our research on this topic has progressed, and a paper with our findings was recently accepted at the Interspeech 2024 conference. Keep reading for a brief overview of this research.

Research: Source Tracing of Audio Deepfake Systems [link to paper]

By Nicholas Klein, Tianxiang Chen, Hemlata Tak, Ricardo Casal, and Elie Khoury. To be presented at Interspeech 2024 in Kos, Greece, on September 3, 2024.

Key contributions

  • We leverage open source deepfake detection systems for source tracing, predicting the acoustic model and vocoder with state-of-the-art accuracy, while additionally predicting the input type (text or speech) with near perfect accuracy.
  • We devise and publish a new source tracing benchmark for more robust evaluation of source tracing systems composed of a large number of recent TTS systems.

Component-based source tracing

As covered in our previous blog on robustness against zero-day attacks, voice cloning systems are typically built from common generative AI building blocks. TTS and VC systems commonly have a conversion model that is responsible for producing output acoustic features and a vocoder that transforms those acoustic features to output waveforms. As many novel deepfake systems tend to reuse existing building blocks, we adopt a generalizable approach of predicting the conversion model and vocoder. Additionally, we propose identifying the input type (text or speech).

Source tracing methods

In our paper, we experiment with two strategies for leveraging existing state-of-the-art deepfake detection systems for the task of component classification. Here, we present the method that we refer to as two-stage source tracing.

The two-stage approach splits training into two steps. First, a front-end model is trained for the standard binary deepfake detection task.

Next, the front-end weights are frozen and lightweight classification heads are trained on the embeddings for each separate component classification task.

For the classification heads, we use the simple feed forward architecture from the back-end model of the ResNet deepfake detection system.1 

While the two-stage approach is limited to the information that the binary-trained deepfake detection learns, it is very attractive in practice: in addition to the reduction in computational costs, existing binary systems can be trained on significantly more data than we have component labels for.
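As a rough illustration of this two-stage recipe (not the paper’s exact architecture, dimensions, or training setup), the sketch below freezes a stand-in front-end and attaches a small feed-forward head per component task to its embeddings.

```python
# Illustrative PyTorch sketch of the two-stage idea: a (stand-in) deepfake detection
# front-end is frozen, and lightweight heads are trained on its embeddings for each
# component-classification task. All module choices and sizes are assumptions.
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """Lightweight feed-forward head trained on frozen front-end embeddings."""
    def __init__(self, embed_dim: int, num_classes: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        return self.net(embeddings)

# Stage 1 (done elsewhere): train the front-end for binary deepfake detection.
front_end = nn.Sequential(nn.Linear(400, 256), nn.ReLU())  # stand-in encoder
for param in front_end.parameters():
    param.requires_grad = False  # freeze front-end weights for stage 2

# Stage 2: one head per component task (acoustic model, vocoder, input type).
heads = {
    "acoustic_model": ClassificationHead(256, num_classes=8),
    "vocoder": ClassificationHead(256, num_classes=6),
    "input_type": ClassificationHead(256, num_classes=2),  # text vs. speech
}

features = torch.randn(4, 400)          # dummy batch of acoustic features
with torch.no_grad():
    embeddings = front_end(features)    # frozen embeddings
logits = {task: head(embeddings) for task, head in heads.items()}
```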

Results on an existing benchmark

As an anchor to previous work, we evaluate our methods on the ASVspoof2019-based protocol designed by Zhu et al.2 Utterances in this protocol are from the ASVspoof 2019 LA dataset which contains synthetic examples generated using a set of different TTS and VC systems.3 We use the same categories as Zhu et al. for the acoustic model and vocoder classification tasks. Additionally, we create a new “Input type” task which is helpful to separate between TTS and VC systems.

Our key takeaways

  • We reach 99.9% accuracy on the newly-proposed input type prediction task for the ASVspoof protocol.
  • We achieve state-of-the-art performance on both the acoustic model and vocoder classification tasks with the SSL (E2E) method, with accuracies of 99.4% and 84.6%, respectively. We attribute this improvement over previous work to the fact that their method used a multi-task objective, which forced their models to trade off performance across tasks.
  • The more restricted but practical two-stage approach yields competitive accuracy.

Ongoing work

As of today, our source tracing solutions have expanded on our Interspeech 2024 work to enable the discrimination of a much larger number of deepfake engines, including popular commercial and open-source systems. We are planning to share more updates about this in the near future.

This work is a good step toward providing an additional level of intelligence and understanding on top of deepfake detection. We believe it will be very useful to journalists, fact-checkers, and law enforcement for pointed follow-up investigations when synthetic speech is detected.

References

1T. Chen, A. Kumar, P. Nagarsheth, G. Sivaraman, and E. Khoury, “Generalization of audio deepfake detection,” in Proc. Odyssey 2020 The Speaker and Language Recognition Workshop, 2020.

2T. Zhu, X. Wang, X. Qin, and M. Li, “Source tracing: Detecting voice spoofing,” in Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2022.

3X. Wang, J. Yamagishi, M. Todisco, H. Delgado, A. Nautsch, N. Evans, M. Sahidullah, V. Vestman, T. Kinnunen, K. A. Lee et al., “ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech,” Computer Speech & Language, vol. 64, p. 101114, 2020.

Nearly six months ago, we launched our Pindrop® Pulse™ solution, a cutting-edge deepfake detection technology for our enterprise customers to help detect AI-generated voices in their call centers. Since then, we have collaborated with news organizations, governments, the music and entertainment industry, and corporate security teams to assess hundreds of suspected deepfakes. From AI-generated robocalls aimed at voter suppression to sophisticated smear campaigns, and from general misinformation in conflicts worldwide to attempts to distort public perception—each case underscores the critical need for robust deepfake detection mechanisms.

The implications of these deepfakes are profound: they threaten the integrity of news organizations, social media platforms, and elections worldwide. The potential for misinformation to sway public opinion and disrupt social order is a stark reality that we now face. 

In response to these grave threats, we’re thrilled to announce Pindrop® Pulse™ Inspect in Preview, an audio deepfake detection solution to assist fact-checkers, misinformation experts, security departments, trust and safety teams, and social media platforms. As a forensics tool, Pindrop Pulse is designed to detect AI-generated speech in audio or video media, including both digital media (e.g., deepfakes on social media) and phone call media (e.g., voicemails). Users log into the web application, upload their media files, and within seconds receive a determination on whether the content contains AI-generated speech. Additionally, users can integrate Pindrop Pulse’s award-winning deepfake detection technology programmatically into their own workflows via our simple-to-use APIs.
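For teams exploring the programmatic route, the sketch below shows what such a workflow might look like. The endpoint, field names, and response schema are illustrative placeholders, not Pindrop’s published API.

```python
# Hypothetical sketch of submitting a media file for synthetic-speech analysis.
# The URL, headers, and response fields are placeholders for illustration only.
import requests

API_URL = "https://api.example.invalid/pulse/inspect"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"

def check_media(path: str) -> dict:
    """Upload an audio or video file and return the (assumed) analysis result."""
    with open(path, "rb") as media:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"media": media},
            timeout=60,
        )
    response.raise_for_status()
    # Assumed shape, e.g. {"verdict": "synthetic", "score": 87, "segments": [...]}
    return response.json()

if __name__ == "__main__":
    result = check_media("suspect_clip.mp3")
    print(result.get("verdict"), result.get("score"))
```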

A Rapidly Growing Problem

Simply stated, ‘deepfakes’ are AI-altered images, text, video, and audio files.

Specifically for speech, this means creating highly realistic audio clips that can convincingly mimic someone’s voice by training an AI model on their publicly available speech.

This problem is growing for several reasons. First, the technology has advanced so significantly that the quality of synthetic speech is remarkably high. Second, commercial platforms offering these services have become incredibly affordable. Third, the number of available tools for deepfake creation, i.e., text-to-speech (TTS) and speech-to-speech (STS), has exploded over the past two years; there are now close to 2,000 open-source text-to-speech tools on Hugging Face alone.

Humans are notoriously bad at detecting deepfakes. In a study, humans were only able to detect fake audio 54.5% of the time, and in the real world, distinguishing between genuine and fake audio is even more challenging. Scammers who are creating these deepfakes are becoming increasingly sophisticated, often adding background noise or music, or using very short clips of speech to make detection more challenging. These fraudsters are continuously evolving their techniques, making it imperative for us to stay one step ahead in the fight against misinformation.

Over the past 13 years, Pindrop has built a platform based on real-time analysis of more than 5 billion audio interactions. We hold over 270 patents on voice and security, including 25 patents on audio deepfake detection alone. Today, we’re proud to package our experience and technology into a tool that helps combat the most deceptive audio deepfakes, particularly for news media or organizations that rely on the accuracy of their content to maintain customer trust and the credibility of their organization.

Good AI to Fight Bad AI in the Media

Pindrop has partnered with some of the market and technology leaders fighting misinformation online. For example, TrueMedia.org was among the first adopters to test our solution in their workflows and reported that the Pindrop Pulse audio deepfake detection had better accuracy than other alternatives in detecting synthetic speech. 

According to Oren Etzioni, CEO of TrueMedia.org, “TrueMedia.org is a non-profit, non-partisan AI project to fight disinformation in political campaigns by identifying manipulated media. Our comprehensive evaluation found Pindrop’s audio deepfake detection has better accuracy than other alternatives in detecting synthetic speech. We are excited to partner with Pindrop in this mission, and add Pindrop’s deepfake detection technology in the solution for our customers and users across the world.”

Pulse Inspect offers trust and safety teams a forensics tool to enhance their disinformation detection workflows.

  • Best-in-class Performance: Pindrop has trained its deepfake detection model on over 370 deepfake generation tools with over 20M statements (both genuine and synthetic), enabling us to achieve over 99% accuracy against previously seen deepfake models and 90% accuracy against “zero-day” attacks that use new or previously unseen tools. We’ve also had third parties confirm that our solution had over 40 percentage points higher accuracy on audio than competing solutions.
  • Resilience: News and social media are global businesses and need support to detect deepfakes across various languages. Pindrop® Pulse™ Inspect is language agnostic, and its underlying training models have been tested and validated on over 40 languages that cover over 90% of the internet’s spoken languages. This technology offers resilience to adversarial attacks such as the addition of noise, reverberation, or speech changes.
  • Breadth of Audio: The same Pindrop Pulse technology that identifies over a million social engineering attempts in the call center has now expanded to digital media. Pulse Inspect supports both phone call audio (8kHz) and high-fidelity social media audio (44.1kHz). It also provides detection capabilities irrespective of whether synthetic speech is created using text-to-speech, speech-to-speech, or voice conversion techniques.
  • Video Support: Pulse Inspect supports audio deepfake detection in videos. The platform analyzes video files for AI-generated speech by extracting audio content out of video media types. 

  • Explainability: Pulse Inspect offers segmental analysis of uploaded media to aid in the detection of partial deepfakes. This feature provides a visual indicator to users to help determine which segments in a long-form media file are synthetically generated versus segments that most likely do not contain synthetic speech.

Free trial

With Pulse Inspect in Preview, we invite those who are responsible for identifying and reporting on deepfakes to evaluate our technology, at no cost.

Request access to a free trial here.

1. https://www.pindrop.com/blog/pindrop-named-a-winner-in-the-ftc-voice-cloning-challenge
2. https://synthical.com/article/c51439ac-a6ad-4b8d-82ed-13cf98040c7e
3. https://www.pindrop.com/blog/exposing-the-truth-about-zero-day-deepfake-attacks-metas-voicebox-case-study
4. In the NPR study, Pindrop detected 81 out of possible 84 (96.4%) voice samples correctly, compared to the nearest competitor who detected 47 out of 84 (56% – excludes samples identified as inconclusive).
5. Statista: Languages most frequently used for web content as of January 2024
6. Terms and conditions apply.

WEBINAR

Advanced Strategies to Prevent Contact Center Fraud in 2024

In 2023, data breaches reached an all-time high of 3,205, 78% higher than the previous year. Leveraging generative AI technology, fraudsters are employing advanced tactics, including bots and deepfakes, and exploiting vulnerabilities in outdated systems. In the face of rising contact center fraud, it’s crucial to implement robust and modern fraud prevention strategies.

In this webinar, we cover key strategies, including:

  • Liveness Detection: Ensure that your interactions are with genuine humans, not sophisticated bots or recorded messages.
  • Multi-Factor Fraud Prevention and Authentication: Pair liveness detection with device recognition, behavior analysis, and more to increase your fraud detection capabilities.
  • Early Risk Detection: Address potential fraud threats before they escalate.
  • Negative Voice Matching: Identify fraudsters when tactics are used to change or mask the calling phone number.
  • Continuous Fraud Detection: Automate your comprehensive fraud risk profiles and increase the accuracy of fraud prediction.

Don’t miss this opportunity to enhance your fraud protection strategies and safeguard your organization!

Your expert panel

Tara Garnett

Sr. Product Manager, Authentication Products, Pindrop

Timothy Mohan

Senior Director, Fraud Prevention & Authentication Operations, Pindrop

WEBINAR

The Real Threat of Deepfakes to Your Contact Center

Deepfake attacks are constantly evolving. Attend our upcoming webinar and arm yourself with knowledge on the latest deepfake trends and the tools that can help catch them.

From the conversation with Amit Gupta, VP of Product at Pindrop, and Bennett Borofka, Partner Solutions Architect at Amazon Web Services, you can expect to:

  • Learn about the rise of deepfakes and the threats they pose to your contact center
  • Gain an understanding of Pindrop’s fraud and deepfake detection solutions and how they can help mitigate fraud losses
  • Discover how Amazon Connect and Pindrop have teamed up to make the integration process efficient and worthwhile

Meet the Experts

Amit Gupta

VP Product, Pindrop

Bennett Borofka

Partner Solutions Architect, Amazon Web Services

News consumption is changing, especially during election cycles

Scrolling on social media for hours on end has yet another unforeseen consequence: it’s altered the way that the American public consumes the news—and, by extension, statements from political leaders. According to the Pew Research Center, “half of US adults [are getting] news at least sometimes from social media.” When we consume our news on social media, we may assume that the information we’re seeing is honest and credible. Yet, as a recent parody that uses AI-generated voice cloning of VP Kamala Harris implies, we can’t always believe what we’re hearing.

As AI evolves, one troubling fact is emerging: global leaders and average citizens alike can fall victim to voice cloning without their consent. Though the industry is looking towards safety measures like watermarking and consent systems, those tactics may not be enough.  

How it started

At 7:11 PM ET on July 26, 2024, Elon Musk reposted a video on X from the account @MrReaganUSA. In a follow-up video, @MrReaganUSA acknowledged that “the controversy is partially fueled by my use of AI to generate Kamala’s voice.” Our research was able to determine more precisely that the audio is a partial deepfake, with AI-generated speech intended to replicate VP Harris’s vocal likeness alongside audio clips from previous remarks by the VP.

As of July 31, 2024, Musk’s post was still live and had over 133M views, 245K reposts, and 936K likes. Another parody video of VP Harris was posted to X by @MrReaganUSA on July 31, 2024.

Our analysis of the deepfake

When our research team discovered Musk’s post, they immediately ran an analysis using our award-winning Pindrop Pulse technology to determine which parts of the audio were manipulated by AI. Pulse is a tool designed for continuous assessment, producing a segment-by-segment breakdown and analyzing for synthetic audio every 4 seconds. This is especially useful in identifying AI manipulation in specific parts of an audio file—helping to spot partial deepfakes.

Graph showing detection of synthetic voice in the VP Harris parody. Analysis results show that the audio is likely to contain synthetic speech.

Synthetic vs. non-synthetic audio

After denoising the audio to reduce the background music, Pulse detected fifteen 4-second segments as “synthetic” and six 4-second segments that were not synthetic, which leads us to believe that this is likely a partial deepfake. 
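To make the windowed approach concrete, here is a simplified sketch of how per-segment labels over 4-second windows can be rolled up into a partial-deepfake verdict. The scoring function is a placeholder rather than Pindrop’s model, and the denoising step is omitted.

```python
# Simplified sketch: label fixed 4-second windows and roll the labels up into an
# overall verdict. The per-segment scorer is a placeholder, not a real detector.
import numpy as np

def score_segment(samples: np.ndarray, sample_rate: int) -> float:
    # Placeholder: a production system would run a trained liveness model here
    # and return a synthetic-speech score in [0, 1].
    return 0.0

def label_segments(samples: np.ndarray, sample_rate: int,
                   window_s: float = 4.0, threshold: float = 0.5) -> dict:
    window = int(window_s * sample_rate)
    labels = []
    for start in range(0, len(samples) - window + 1, window):
        score = score_segment(samples[start:start + window], sample_rate)
        labels.append("synthetic" if score >= threshold else "not synthetic")
    synthetic = labels.count("synthetic")
    genuine = labels.count("not synthetic")
    if synthetic and genuine:
        verdict = "likely partial deepfake"
    elif synthetic:
        verdict = "likely deepfake"
    else:
        verdict = "no synthetic speech detected"
    return {"segments": labels, "verdict": verdict}
```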

With Pulse’s liveness detection capability, our research team found three clips of VP Harris’s previous remarks in the parody video. Each clip, however, was removed from its original context:

  • Clip 1: This audio was taken from a real speech but altered to repeat in a loop.
  • Clip 2: VP Harris misspoke in this speech, and that audio was used here.
  • Clip 3: This audio is also taken from a real speech.

Tracing the source and identifying inadequate AI safety measures

Our research team went one step beyond this breakdown: they identified the voice cloning AI system that was used to create the synthetic voice. Our source attribution system identified a popular open-source text-to-speech (TTS) system, TorToise, as the source. TorToise exists on GitHub, HuggingFace, and in frameworks like Coqui. It’s possible that a commercial vendor could be reusing TorToise in their system. It’s also possible that a user employed the open source version. 

This incident demonstrates the challenges with watermarking to identify deepfakes and their sources, an issue Pindrop has raised previously. While several of the top commercial vendors are adopting watermarking, numerous open-source AI systems have not adopted watermarking. Several of these systems have been developed outside the US, making enforcement difficult.

Pindrop’s technology doesn’t rely on watermarking. Instead, Pulse detects the “signature” of the AI generating system. Every voice cloning system leaves a unique trace, including the type of input (“text” vs “voice”), the “acoustic model” used, and the “vocoder” used. Pulse analyzes and maps these unique traces against 350+ AI systems to determine the provenance of the audio. Pindrop used this same approach in previous incidents, including the Biden Robocall deepfake in January, which Pulse determined was created by ElevenLabs, a popular commercial TTS system. 
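As a toy illustration of this “signature” idea, attribution can be thought of as matching predicted components against a catalog of known systems. The catalog entries and component names below are invented examples, not Pindrop’s actual mappings.

```python
# Toy illustration of component-based attribution: predict the input type, acoustic
# model, and vocoder for a clip, then look that combination up in a catalog of known
# generation systems. The entries below are invented examples, not real mappings.
KNOWN_SYSTEMS = {
    ("text", "tacotron2", "hifigan"): "Example commercial TTS vendor A",
    ("text", "xtts", "hifigan"): "Open-source XTTS-based system",
    ("speech", "diffusion", "vocos"): "Example voice-conversion tool B",
}

def attribute(input_type: str, acoustic_model: str, vocoder: str) -> str:
    """Return the catalog entry matching the predicted components, if any."""
    return KNOWN_SYSTEMS.get((input_type, acoustic_model, vocoder), "unknown system")

print(attribute("text", "xtts", "hifigan"))  # -> "Open-source XTTS-based system"
```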

Through additional research, we identified three platforms that offer AI-generated speech that mimics VP Harris’s voice. Those include TryParrotAI, 101soundboard, and jammable. We also found that 101soundboard seems to be using the TorToise system.

Some commercial vendors are considering adopting measures, like consent systems, to mitigate the misuse of voice cloning; however, with open-source AI systems, these measures are difficult to enforce. While implementing consent systems is a step in the right direction, there isn’t a consistent standard or third-party validation of these measures. 

Why information integrity must be top-of-mind

While this audio was labeled as a “parody” in the original post, now that it’s available online, it can be reshared or reposted without that context. In other situations online, like accepting cookies on a website or verifying your identity with your online bank, governments have established laws to protect consumers. However, AI and deepfakes are a new and rising threat, with few guardrails in place to prevent misuse.

That’s why maintaining the integrity and authenticity of information that’s shared online—especially as we near the 2024 election—should be a top priority. Not doing so can be damaging to public trust and the belief in our most important and foundational systems. 

Putting up protections to help preserve truth

Good AI is sorely needed to mitigate the societal effects of bad AI. As a leader in the voice security space for over a decade, Pindrop is leading the fight against deepfakes and misinformation, with the goal of helping to restore and promote trust in the institutions that are the bedrock of our daily lives. Our Pulse solution offers a way to independently analyze audio files and empower organizations with information to determine if what they’re hearing is real. Read more here about our deepfake detection technology and how we’re leading the way in fighting bad AI.

 

Disclaimer

This is an actively developing investigation. The information presented in this article is based on our current findings and analysis as of August 1, 2024. Our team is actively staying alert, investigating and uncovering new trends in deepfake and voice security-related incidents. Follow us on LinkedIn or X for any new insights.

Truth in the age of AI

With generative AI, bad actors can spoof audio and video of anyone—especially of global leaders. As deepfakes rise, we at Pindrop have been focused on answering the question: is this human or AI? 

When an online post has indicators of authenticity, like credible account activity, it can be nearly impossible to tell what’s real and what’s not. A recent deepfake of Elon Musk appears to be a straightforward cryptocurrency scam, but the aftereffects demonstrate the complexities of deception and force us to evaluate the importance of information validity. We aim to help solve that problem—and get to the truth—using Pindrop Pulse, our audio deepfake detection technology.

How it started

On Tuesday, July 23, 2024 at 10:30 pm ET, members of Pindrop’s research team discovered what appeared to be a live stream of Elon Musk on YouTube. We quickly determined that the live stream was actually a 6-minute 42-second AI-generated audio loop that mimics Elon Musk’s voice, discusses current U.S. politics and the 2024 election, and its potential effects on the future of cryptocurrency. The deepfake then urges the audience to scan a QR code, go to a “secure site,” and “effortlessly double” their cryptocurrency. 

At one point, the stream attracted 140K viewers and was live for at least 17 hours.  

Why did the scam appear credible?

We’re used to looking for clues that signal authenticity in the media we consume, like verification badges, account activity, and more. But those signals are easily spoofed and can’t always be trusted. Here’s how this scam made itself appear real: 

  • Account legitimacy: The fraudulent account had a complete profile with a verification badge, 162K subscribers, over 34M total views, and was active since 2022. The account closely resembled Tesla’s official account, so at first glance, viewers may have struggled to spot that the account was fraudulent. 
  • Reputable speaker: By choosing Elon Musk, a vocal leader in the cryptocurrency space, the fraudsters added a sense of legitimacy to their scam, helping them better trick viewers.
  • Staying close to the truth: The statements in the video are similar to previous remarks made by Musk. By repeating or slightly adjusting Musk’s previous remarks, the video likely raised fewer red flags for viewers. 

Leveraging liveness detection to spot the deepfake

Our audio deepfake detection technology, Pindrop Pulse, generated a segment-by-segment breakdown of a portion of the audio, with real-time results every 4 seconds. Pulse detected segments of synthetic voice in the audio—and concluded that it was likely a deepfake. We reached out to YouTube to report the video and, as of July 24, 2024 at 1:30 pm ET, the account had been taken down.

Graph showing the detection of synthetic voice in an audio clip.

With our new source attribution technology, Pulse was also able to identify ElevenLabs as the voice cloning vendor that was used to create the deepfake, as we had done previously with the President Biden Robocall incident earlier this year. We reached out to ElevenLabs for them to investigate further.

Defending against deepfakes

Generative AI is powerful—and can be a force for good—but it can also be weaponized to deceive us. When our senses aren’t enough to validate the truth, we need to turn to technology that can assist us. Pindrop Pulse, our advanced audio deepfake detection technology, integrates liveness detection to help distinguish between live and synthetic audio. This technology empowers you with information to assess if what you’re hearing is real—and helps bring trust to the forefront of the media we consume.

Pindrop is at the forefront of voice security innovation, especially in combating the sophisticated threats introduced by deepfake technology. We’re proud to present the Pindrop® Pulse Deepfake Warranty, a first-of-its-kind warranty to support trust and security in voice communications. This pioneering warranty is part of Pindrop’s commitment to innovation and its customers’ safety, providing reimbursement in the event of certain losses due to synthetic voice fraud (terms and conditions apply).

Detect synthetic voice fraud

Integrating the full Pindrop® Product Suite into your operations for eligible calls is a key component to unlock access to the Pulse Deepfake Warranty1. This warranty reimburses Pindrop customers for certain losses from synthetic voice fraud2, with reimbursement levels correlated to your organization’s baseline annual subscription call volume on the date of the synthetic voice fraud. The Pindrop® Product Suite is comprised of:

  • Pindrop® Protect: A sophisticated fraud detection system that scrutinizes a wide range of indicators for suspicious behavior throughout the fraud event lifecycle, using voice biometrics, device analysis, and behavioral patterns to flag potential fraud.
  • Pindrop® Passport: This multi-factor authentication solution leverages voice biometrics, Phoneprinting® technology, and behavioral analysis to accurately verify callers to help ensure secure and user-friendly access for genuine customers.
  • Pindrop® Pulse: At the forefront of combating deepfake and synthetic voice threats, Pulse employs advanced liveness detection and voice analysis to identify synthetic voice attacks in real time, combating the latest deepfake threats.

The Pindrop® Product Suite, supported by the Pulse Deepfake Warranty, delivers a robust framework for protecting voice interactions against the sophisticated landscape of fraud and reinforces your organization’s security posture. 

Key aspects:

  • Reimburses against synthetic voice fraud losses on eligible calls that occur in the IVR or in the contact center when the Pindrop Scores do not alert to that risk.  
  • Offers reimbursement up to $1 million, with reimbursement caps tied to annual subscription call volumes.
  • Available at no additional cost to Pindrop customers who have a 3-year subscription to the entire Pindrop Product Suite.

Why Choose Pindrop?

Since its inception in 2011, Pindrop has pioneered voice security technology, serving global leaders across various industries. Our comprehensive suite of products is designed to authenticate legitimate customers and detect fraudulent activities. By doing so, Pindrop solutions help fortify your defenses against fraud and enhance the overall customer experience through secure, seamless interactions. This dual focus on security and user experience sets Pindrop apart, making it a trusted partner in the ongoing battle against voice fraud and the emerging challenges of synthetic voice technologies.

As deepfake technology advances, posing new challenges in cybersecurity, Pindrop is dedicated to helping its customers with effective detection strategies. The Pindrop Pulse technology is an integral part of the Pindrop® Product Suite, and it boasts a 99% deepfake detection rate with minimal false positives. The Pulse Deepfake Warranty embodies our confidence in the Pindrop Product Suite’s ability to detect synthetic voice fraud.3

Elevate your security strategy

The Pulse Deepfake Warranty allows you to confidently approach the fight against synthetic voice fraud. Supported by Pindrop’s sophisticated detection technology and up to $1 million in reimbursement4, your organization can better face the challenges posed by deepfake attacks.

Discover how the Pulse Deepfake Warranty backstops our award-winning technology. Contact us to schedule a consultation with one of our experts today.

 

1. Additional terms and conditions apply.
2. Additional conditions apply. Eligible reimbursement amounts vary depending on subscription call volume. See Warranty terms for details.
3. https://www.pindrop.com/blog/unmatched-performance-pindrops-liveness-detection-and-the-waterloo-study
4. Eligible reimbursement amounts vary depending on subscription call volume. See Warranty terms for details.

Singapore plans to increase security around deepfake scams. Dr. Tan Wu Meng (Jurong GRC) said the issue of deepfakes is a serious matter for all democracies and called on the government to look at ways of electronically watermarking content as proof of its accuracy.

Emphasizing the critical role of authenticity for the functioning of any democracy, Dr. Tan thinks, “If we can no longer easily discern what is real and what is not, a functioning democracy becomes impossible. No government, irrespective of political affiliation, can effectively govern without this essential foundation for democratic discourse.”

A recent incident involving Prime Minister Lee Hsien Loong, whose likeness was used to promote investment products, is another example of how realistic and believable deepfakes have become.

Here’s what you need to know about recent changes and how Singapore will increase security around handling and protecting against false news and scams.

  1. Under OCHA, designated online service providers could be required to implement further measures. The Online Criminal Harms Act (OCHA) was passed in July 2023, allowing the government to issue directions to online platforms to prevent potential scam-related accounts or content from reaching Singapore users. The country is working with partners to beef up security and provide other measures to improve the accuracy of online media.

    These measures also require that public education programs on digital media and information literacy, cybersecurity, and anti-scam efforts be adequate to protect and maintain a safe digital society. Some of these added programs include the National Library Board’s (NLB) signature S.U.R.E. (Source. Understand. Research. Evaluate.) campaign, the Cyber Security Agency of Singapore (CSA)’s national cybersecurity campaign “Unseen Enemy,” and the Singapore Police Force and the National Crime Prevention Council’s anti-scam campaign “I can ACT against scams.” The Scam Public Education Office (SPEO) was also set up in 2023 to drive anti-scam public education efforts and expand outreach.

  2. A $20M initiative passed in the Singapore Parliament on January 10, 2024. The main goal of this new initiative is to grow new capabilities to keep pace with deepfakes and prevent misuse. It will bring together companies and technologies to increase the ability to create a safer internet.

    The goal is to detect fake claims and dangerous deepfakes. These research efforts will also help develop new technologies needed to protect against harmful information and misuse on the internet.

    In a year of record elections, with at least 40 countries and territories heading to the polls, concerns have grown about how such manipulated content can influence voters.

  3. Funds will be used to boost the research capabilities of groups like CATOS. The Centre for Advanced Technologies in Online Safety (CATOS), launched in the first half of this year, aims to enhance industry collaboration and knowledge exchanges in deepfake detection. The Agency for Science, Technology, and Research (A*STAR) will host the group, focusing on building and customizing tools to detect harmful content, including deepfakes and non-factual claims, and test technologies like watermarking and content authentication.

    The CATOS annual event, the Online Trust and Safety (OTS) Forum, will be held on 15th May and will feature international experts and showcase the first version of technology solutions for trial and adoption. Over 100 academics and 30 professionals participated in CATOS’s work. In the future, they’ll draw on capabilities from local and global research institutes to strengthen collaborations on deepfake technology.

    How is Pindrop involved? Pindrop will showcase its expertise at the OTS Forum on 15th May at the Ritz Carlton on the topic of “Real-time audio deepfake detection.” As an IMDA-accredited company, Pindrop is uniquely positioned to address the multifaceted challenges posed by deepfakes at scale. Visit our booth to connect with us and explore our innovative solutions.

Raising awareness around the role of deepfakes and understanding the regulatory framework

Pindrop is honored to be a part of the IMDA Accreditation, which provides Singapore-based ICM companies with a streamlined procurement process for government ICT projects. Technological advancements, ongoing research in deepfake detection and attribution, and collaboration between industry stakeholders and policymakers are some of the steps Singapore is taking to prevent deepfakes from going mainstream and disrupting elections, media trust, and more.

Learn more about Pindrop Pulse and how we’re helping countries take action to address the multifaceted challenges posed by deepfakes.

Discussions surrounding the authenticity of content and claims of AI-generated media often occur with a very low bar of scientific analysis. To help establish a minimum standard of analysis and explainability, Pindrop is sharing its analysis of a recent deepfake incident at Pikesville High School in Baltimore. Our analysis, unlike the claims of other deepfake detection vendors, shows that this audio is not AI-generated but has been doctored.

A Case Study in Objective Analysis: The Pikesville High School Incident

On January 16, 2024, a recording surfaced on Instagram purportedly featuring the principal at Pikesville High School in Baltimore, Maryland. The audio contained disparaging remarks about Black students and teachers, igniting a firestorm of public outcry and serious concern.

Given the gravity of the accusations and the potential repercussions for the principal’s career and the community of Pikesville High School, it was critical to rigorously verify the authenticity of the audio. The situation was further complicated as several deepfake detection vendors and media outlets quickly declared the recording a deepfake, often without providing substantial evidence or detailed analysis, based on subjective evaluations of tonal quality or delivery style (prosody). Such rushed conclusions without detailed scientific explanations risked serious misjudgments and could unjustly sway public opinion and impact legal proceedings. Our intent is not to substantiate or refute any law enforcement conclusions. We’re primarily focused on the liveness detection of the specific audio shared publicly.  

In light of these developments, Pindrop undertook a comprehensive investigation, conducting three independent analyses to uncover the truth:

  • Deepfake (Liveness) Detection of the January Audio Sample: We sought to determine if the audio displayed characteristics typically associated with synthetic speech.
  • Deepfake (Liveness) Detection of a Public Speech Sample from November 2018: We established a baseline for Mr. Eiswert’s voice characteristics by analyzing a verified live speech.
  • Comparison of Both Audio Samples for Voice Similarity: The critical final analysis compared the controversial January recording with the 2018 speech to assess whether both could originate from the same person.

 

The results of our thorough investigation led to a nuanced conclusion: although the January audio had been altered, it lacked the definitive features of AI-generated synthetic speech. We are 97% confident in this determination, based on our analysis metrics. This pivotal finding underscores the importance of conducting detailed and objective analyses before making public declarations about the nature of potentially manipulated media.

How We Came to Our Conclusion


Deepfake Analysis

Pindrop analyzed the audio using Pindrop® Pulse, our deepfake detection engine. As you can see from an image of the UI of our solution, Pindrop® Pulse broke down the 46-second audio segment into 11 segments of 4 seconds each. Eight of the 11 segments were scored as live, and three as indeterminate. Pindrop® Pulse’s analysis shows an overall deepfake score of 20. The deepfake score for an audio file can range from 0 to 100. A score of 0 means that Pindrop® Pulse is almost certain the audio is not synthesized, while a score of 100 means that Pindrop® Pulse is almost certain the audio is synthesized. At the threshold of 20, the confidence of the system in determining that this is not synthesized is 97% (based on a similar cohort).

 


For comparison, we conducted a similar deepfake analysis on a verified genuine recording of the school principal from November 2018. The resulting deepfake score is 12. At this threshold, the system’s confidence in determining that the audio is not synthesized is 99%.

 


Voice Similarity Analysis

Having concluded that the audio was not AI-generated, we next tested it for human impersonation. We conducted a voice similarity analysis comparing the verified genuine recording with the contentious audio sample. After applying dereverberation and noise reduction techniques, the comparison revealed nearly identical voice characteristics in both samples, with a likelihood of approximately 99%. 
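For readers who want to experiment with this kind of comparison, the sketch below uses an open-source speaker-verification model as a stand-in; Pindrop’s production pipeline, including its dereverberation and noise-reduction steps, is not public, and the file names are placeholders.

```python
# Minimal sketch of a voice-similarity check: extract speaker embeddings from two
# recordings and compare them. Uses the open-source SpeechBrain ECAPA-TDNN speaker
# model as a stand-in for a production system; file names are placeholders.
from speechbrain.pretrained import SpeakerRecognition

verification = SpeakerRecognition.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)

# `score` is a cosine-similarity value; `prediction` is True when the model judges
# both recordings to come from the same speaker.
score, prediction = verification.verify_files(
    "verified_2018_speech.wav",   # baseline recording
    "january_2024_audio.wav",     # contested recording
)
print(f"similarity={float(score):.3f}, same_speaker={bool(prediction)}")
```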


Spectral Analysis

Having concluded that the audio was based on the genuine voice of the school principal, we tested it for other manipulations.  We employed spectral analysis to examine audio files visually. In a spectrogram, the vertical axis represents frequency (in Hertz), the horizontal axis represents time, and the brightness indicates amplitude.

 


Upon review, we detected six short segments of complete silence interspersed with noisy speech—each silent segment about 100 milliseconds long. Such patterns are not typical in natural speech, suggesting tampering and digital splicing.
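A rough sketch of this kind of check is shown below: scan a recording for short runs of near-total silence, on the order of 100 milliseconds, that sit between noisy speech. The thresholds and file name are illustrative assumptions, not the exact parameters used in our analysis.

```python
# Rough sketch: flag suspiciously short near-silent gaps that may indicate splicing.
# Thresholds and the input file name are illustrative assumptions.
import librosa
import numpy as np

def find_silent_gaps(path: str, frame_ms: float = 10.0, silence_db: float = -60.0,
                     min_gap_ms: float = 80.0, max_gap_ms: float = 150.0):
    y, sr = librosa.load(path, sr=None)
    hop = int(sr * frame_ms / 1000)
    # Frame-level RMS energy, converted to decibels relative to full scale.
    rms = librosa.feature.rms(y=y, frame_length=hop * 2, hop_length=hop)[0]
    db = librosa.amplitude_to_db(rms, ref=1.0)
    silent = db < silence_db

    gaps, start = [], None
    for i, is_silent in enumerate(silent):
        if is_silent and start is None:
            start = i
        elif not is_silent and start is not None:
            length_ms = (i - start) * frame_ms
            if min_gap_ms <= length_ms <= max_gap_ms:
                gaps.append((start * frame_ms / 1000.0, length_ms))
            start = None
    return gaps  # list of (gap start in seconds, gap length in milliseconds)

for start_s, length_ms in find_silent_gaps("contested_recording.wav"):
    print(f"silent gap at {start_s:.2f}s lasting {length_ms:.0f} ms")
```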

 

Embracing Feedback and the Pursuit of Continuous Improvement

At Pindrop, we value transparency and are committed to continuous improvement. We actively seek and incorporate feedback from diverse fields—academia, industry, and government—to enhance our techniques and refine our analyses. Our confidence in the liveness of the Jan 2024 audio sample is 97%, indicating a 3% chance of our assessment being wrong. Our motivation for sharing this analysis is to invite technical review and to understand the arguments as to why our assessment could be wrong so we can continue to make our detection technology more robust.  

Conclusion

In an age where digital authenticity is constantly challenged, Pindrop remains a steadfast ally to media companies, offering robust tools and analyses to help ensure the content they distribute is verified and truthful. By sticking firmly to a data-centric approach, we can help illuminate the truth and foster a culture of accountability and precision in media reporting.

Interested in how our deepfake detection capabilities can help your organization? Chat with an expert today.

Several of our customers have asked Pindrop whether our recently launched deepfake detection product, Pindrop® Pulse, can detect synthetic content generated by OpenAI’s new Voice Engine. Pindrop has responded by becoming the industry’s first deepfake detection engine to announce 98.23% accuracy on OpenAI’s Voice Engine 2022, using a dataset of 10,000 samples generated with OpenAI’s available text-to-speech engine. This demonstrates that deepfake detection, combined with Pindrop’s voice analysis and a multi-factor approach, continues to be a robust authentication strategy. We encourage OpenAI to release a representative dataset of its Voice Engine 2024 to allow Pindrop to extend our benchmarking to the latest release and provide clarity to the industry, which has invested hundreds of millions of dollars in building robust multi-factor authentication systems using voice biometrics.

Understanding OpenAI’s Voice Engine Announcement 

On March 29, 2024, OpenAI published a blog post on Navigating the Challenges and Opportunities of Synthetic Voices. In this write-up, OpenAI included 17 audio samples of their text-to-speech (TTS) model called Voice Engine. Their recommendation is to phase out voice-based authentication as a security measure for accessing bank accounts and other sensitive information.

As researchers, when we run an experiment or analyze data, we need a large set of data to support conclusive (i.e., statistically “significant”) findings. No matter what you’re studying, the process for evaluating significance is the same: you start by stating a null hypothesis (i.e., the thing that you’re trying to disprove) and collect a sample set of data to test the hypothesis. Since OpenAI only released 17 audio samples, the 2024 dataset is not representative enough to make a statistically conclusive determination. Therefore, for our research we focused on OpenAI’s first TTS engine, released in 2022. This release included two TTS engines, tts-1 and tts-1-hd, and allowed users to generate synthetic content from six voices pre-created by OpenAI. On March 29, 2024, OpenAI announced the ability to generate synthetic content by cloning the voice sample provided by the end user. OpenAI did not clarify whether the 2024 release uses the same TTS engine as the 2022 release or whether it reuses some of the underlying components. If the 2024 release uses the same TTS engine as the 2022 release or reuses its components, then Pindrop expects that its accuracy for the 2024 release will be similar to its accuracy for the 2022 release, without any additional adaptation.

Pindrop® Pulse’s detection accuracy of OpenAI’s Voice Engine  

Pindrop’s researchers generated 10,000 audio samples using the six pre-made voices of OpenAI’s 2022 TTS Voice Engine with both models provided by OpenAI: the standard “tts-1” model and the “tts-1-hd” model. Pindrop’s dataset of 10,000 generated audio samples has an average duration of 4.74 seconds of net speech per sample. The figure below plots the distribution of the net speech.
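As an illustration of how such a corpus could be produced, the sketch below iterates over both 2022-era models and the six pre-made voices using the OpenAI Python SDK; the prompt list, file naming, and sample counts are assumptions for illustration rather than our exact generation protocol.

```python
# Illustrative sketch of generating a synthetic-speech corpus with the OpenAI Python
# SDK. Prompt text, file naming, and counts are placeholders, not our exact protocol.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODELS = ["tts-1", "tts-1-hd"]
VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
PROMPTS = ["The quick brown fox jumps over the lazy dog."]  # placeholder sentences

for model in MODELS:
    for voice in VOICES:
        for i, text in enumerate(PROMPTS):
            response = client.audio.speech.create(model=model, voice=voice, input=text)
            response.write_to_file(f"{model}_{voice}_{i:05d}.mp3")  # save the audio
```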

We analyzed this dataset using the Pindrop® Pulse deepfake detection system. Using the default thresholds, Pulse correctly detected 98.23% of the deepfakes and mistakenly labeled only 0.14% of audio samples as “not synthetic” (i.e., false negatives), while the remaining 1.63% were deemed inconclusive. We also tested the 17 samples from OpenAI’s Voice Engine 2024, and Pulse performed at a similar level of accuracy. The OpenAI dataset included utterances in multiple languages, and our product’s accuracy was robust across these languages. It’s important to note that these results were obtained without the Pulse model ever being trained on OpenAI’s 2022 or 2024 Voice Engines. We expect our product’s accuracy to improve beyond 99% once we adapt our model to a larger dataset from these engines, based on our experience with other new models (e.g., Meta’s Voicebox).
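For readers keeping score, the percentages above map back onto the 10,000-sample dataset as follows (a quick back-of-the-envelope check, not additional results):

```python
# Back-of-the-envelope check of the reported breakdown on the 10,000-sample dataset.
total = 10_000
detected = round(0.9823 * total)        # 9,823 samples correctly flagged as synthetic
false_negatives = round(0.0014 * total) # 14 samples labeled "not synthetic"
inconclusive = total - detected - false_negatives  # 163 samples (~1.63%) inconclusive
print(detected, false_negatives, inconclusive)  # 9823 14 163
```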

We went one step further and analyzed the audio from the OpenAI dataset to try to deduce the possible subcomponents used in OpenAI’s TTS system. Our analysis suggests that it is likely composed of an XTTS variant (e.g., XTTS v1 or XTTS v2) and a GAN-based non-autoregressive neural vocoder (e.g., HiFi-GAN or Multiband Mel-GAN), with a likelihood of 75%. While OpenAI does not provide details about what TTS architecture they used, this additional analysis can provide insights into further improving the Pindrop Pulse liveness detection system.

Needless to say, we were expecting these results. The reason is that Pindrop® Pulse has been tested against 350+ TTS engines to date. These include several of the leading voice cloning systems, some of which arguably generate far more realistic output than OpenAI’s Voice Engine. In fact, Pindrop has partnered with Respeecher, an Emmy award-winning AI voice cloning technology, to help ensure Pindrop Pulse is continuously tested against the industry’s most sophisticated voice clones. We are expanding these partnerships with several other leading TTS engines as well.

Recommending an even better approach to Voice Engine safety 

In its March 29 blog post, OpenAI includes a section on “building Voice Engine safely,” emphasizing its AI safety principles. We appreciate OpenAI’s commitment to AI safety by not releasing its new text-to-speech engine. Additionally, OpenAI has adopted recommended usage policies (e.g., explicit informed consent for cloning, explicit disclosure of synthetically generated media, etc.).

Pindrop has 2 suggestions for OpenAI to consider to further improve its AI safety stance: 

  1. We encourage OpenAI to release a 2024 Voice Engine dataset for Pindrop’s technology to analyze, helping the industry make its own data-based decisions on how to fortify defenses against deepfakes from OpenAI’s Voice Engine.
  2. We applaud OpenAI’s emphasis on watermarking. But it’s important to remember that watermarking is neither sufficient nor robust; the research community has demonstrated its vulnerability. Pindrop believes that it is preferable to combine watermarks with liveness detection and invites OpenAI to partner with Pindrop to strengthen the ability of Good AI to fight the bad actors using generative AI.

What does this mean for customers using or exploring voice biometrics?

Audio generated by high-end TTS systems can fool some voice authentication systems. Therefore, authentication systems need the capability to detect all types of AI-generated audio: audio modulation, synthetic voices, voice clones, and replayed voices. This was our main motivation to launch Pindrop® Pulse as an integrated capability of our multi-factor authentication platform, Pindrop® Passport. If you have doubts about whether your authentication system has strong deepfake detection, contact Pindrop and test Pindrop® Passport with Pindrop® Pulse against the most sophisticated TTS systems.

Pindrop continues to stand behind our multi-factor authentication platform Pindrop® Passport, which includes the new add-on Pindrop® Pulse, trained on a dataset from 350+ TTS systems. Our deepfake engine has demonstrated >95% accuracy against leading TTS engines, including the major new ones launched in the last 12 months, such as Meta’s Voicebox, ElevenLabs, PlayHT, Resemble.AI, and VALL-E.

How can OpenAI help put this speculation to bed?

The OpenAI blog has caused unnecessary fear and panic in the industry. Decisions made in an environment of speculation are usually suboptimal. We encourage OpenAI to help the industry arrive at its own fact-based decisions by sharing a representative dataset of synthetic voices. Pindrop has shown the path toward better collaboration between deepfake detection technology providers and voice cloning solution providers. Pindrop’s continued research investment in deepfake detection will ensure that Pindrop® Pulse continues to safeguard our customers’ investments in voice biometrics and multi-layered authentication systems.

 

Digital security is a constantly evolving arms race between fraudsters and security technology providers. In this race, fraudsters have now acquired the weapon of artificial intelligence (AI), which poses an unprecedented challenge to solution providers, businesses, and consumers. Several technology providers, including Pindrop, have claimed to detect audio deepfakes consistently. NPR, a leading independent news organization, put these claims to the test. NPR recently ran an experiment under its special series “Untangling Disinformation” to assess whether current technology solutions are capable of detecting AI-generated audio deepfakes on a consistent basis.

While various providers participated in the experiment, Pindrop® Pulse emerged as the clear leader, boasting a 96.4% accuracy rate in identifying AI-generated audio. The NPR study included 84 clips of five to eight seconds each. About half of them were cloned voices of NPR reporters, and the rest were snippets of real radio stories from those same reporters.

Pindrop Pulse liveness detection technology correctly classified 81 out of the 84 audio samples, translating to a 96.4% accuracy rate. In addition, Pindrop Pulse identified 100% of the deepfake samples as deepfakes. While other providers were also evaluated in the study, Pindrop emerged as the leader by demonstrating that its technology can reliably and accurately detect both deepfake and genuine audio.

A few additional notes on these results

  • The voice samples evaluated in the study were relatively short utterances of about 6.24 seconds. With slightly longer audio samples, the accuracy would increase even further.
  • Pindrop Pulse was not previously trained on the PlayHT voice cloning software that was used to generate the audio deepfakes in this study. This is the zero-day attack or “unseen” model scenario that we highlighted in a previous study, and it showcases Pindrop® Pulse’s unmatched accuracy, one of the main tenets of our technology. On known voice cloning systems, our accuracy is 99%. In fact, Pulse is constantly evolving and is being trained on new deepfake models, which ensures that its detection accuracy continues to increase.
  • The audio samples used in this study are very difficult for humans to detect. But Pindrop was still able to detect these deepfakes with 96.4% accuracy.
  • Pindrop Pulse is a liveness detection solution that identifies whether an audio clip was created using a real human voice or a synthetic one. If liveness detection is combined with multiple factors such as voice analysis, behavior pattern analysis, device profiles, and carrier metadata, the deepfake detection rate would be even higher.
  • The three audio samples that Pindrop missed do not present a security threat, since those were genuine voices. In typical authentication applications, individuals would have a second chance to authenticate into the system using other factors.

The study also put a spotlight on several tenets that security technology providers should follow to improve their deepfake detection accuracy, such as training artificial intelligence models with datasets of real audio and fake audio, making their systems resilient to background noise and audio degradations and training their detectors on every new AI audio generator on the market. 

Pindrop® Pulse is built on these core tenets and is committed to keeping our solutions ahead in the race to stop audio deepfakes and fraud. Pindrop provides peace of mind for businesses in an era of uncertainty. We’re grateful for the trust and support from our team, customers, and partners, propelling us forward in security innovation.

ON-DEMAND WEBINAR

Voice Deepfakes in the Contact Center: Ask Us Anything

The rise of advanced audio manipulation and generative AI technology has sparked concerns over digital deception, fraud, and disinformation. In response, we launched Pindrop® Pulse last month, offering real-time deepfake detection for contact centers.

 

Whether you’re a seasoned IT professional or curious about deepfake detection, join our Pindrop voice security experts for a live Ask Me Anything (AMA) session. It’s an opportunity to ask questions and gain valuable insights on how your business can navigate the complexities of this rapidly evolving digital landscape.

  • Real-world examples of deepfake attacks on businesses, and how Pindrop Pulse can help
  • How Pindrop's liveness detection technology works to help protect organizations and their consumers
  • The biggest security threats of 2024 and how you can stay ahead

Your expert panel

Amit Gupta

VP, Product, Research & Engineering, Pindrop

Elie Khoury

VP, Research, Pindrop

Aarti Dongargaonkar

Sr. Manager, Software Engineering, Pindrop

Sarosh Shahbuddin

Director, Product & Engineering, Pindrop

Pindrop recently hosted a webinar, Voice Theft and How Audio Deepfakes Are Compromising Security, on Tuesday, March 5th. Top C-suite execs from Pindrop shared their research and insights on how Pindrop Pulse is leading the battle against deepfakes with its latest technology.

As we've seen in the media recently, generative AI continues to advance at an unprecedented pace. Pulse is an actionable product designed to help you build customer trust by detecting deepfakes before they infiltrate your business and workflows. Pulse is a new add-on module, a fully integrated engine that works with Pindrop Protect and Pindrop Passport as a sophisticated deepfake detection solution.

Let’s explore the common mysteries surrounding deepfakes and the new technology discussed in the webinar. We’ll also discuss what your company can do to leverage technology to protect itself better.

What is a Deepfake?

Deepfake is a term that first emerged in 2017 and describes synthetic media that replaces one person’s likeness and voice with another using artificial intelligence (AI) technologies. What changed in 2023 was the emergence of generative AI and the ability for anyone to create deepfakes at minimal cost with public voice data.

How Does Liveness Detection Work?

Liveness detection identifies deepfake audio by detecting patterns that differ from natural human speech, like frequency changes and spectral distortions. Pindrop's technology uses AI to produce a "fakeprint" — a unit-vector, low-rank mathematical representation preserving the artifacts that distinguish machine-generated speech from genuine speech. The results, even with the largest banks, were astounding.


This article also details how Pulse analyzes emerging themes across various beta partners.

1) Clearing Up Why and How Deepfake Production Is on the Rise

Deepfakes are on the rise across every industry and pose real threats to call center security, but companies with the right technology to detect and protect against them can rest easier. In January, Pindrop released the Pulse liveness detection add-on module as a beta to customers; the beta uncovered the patterns of deepfake attacks and how bad actors are adopting the technology to attack call centers.

When fraudsters can record a voice and create deepfakes within minutes, it becomes imperative to have technology to combat and protect IVRs against the risks.

2) Uncovering How Fraudsters Are Infiltrating Top Security Systems Using Deepfakes

The most significant areas of consumer concern relate to areas where sensitive personally identifiable information (PII) is at risk and where false public information could have negative consequences. Notably, most concerns tend to center on healthcare and insurance, though banking warrants just as much attention, according to research done by Voicebot.ai and Pindrop. Government and media are also expressing concern and are subject to deepfake-laced misinformation.

 

Types of attacks Pindrop has caught firsthand include IVR navigation, targeted account discovery, profile changes, and voice bots being trained to mimic the IVRs at banks. The webinar highlighted two recent research articles that break this down further.

The learnings noted in each article explain how most biometric authentication systems were neither designed nor operationalized to protect against the sophisticated deepfakes attacking call centers today, and how important it is to implement a solution that can keep out the bad actors using Gen AI.

3) A Multi-Pronged Approach Will Be Required to Catch Deepfakes in the Future

What became apparent during the webinar and in reading the articles mentioned was that companies will need to invest in multi-pronged analysis to protect against deepfakes in the future. Humans won’t be able to do it alone, and the technology is advancing quickly. 

This entails examining metrics in specific scenarios or variable combinations, such as the False Acceptance Rate (FAR) for synthetic voice injection across different net speech intervals, with more detailed analysis used to tune a system's sensitivity so that fraud is unlikely to get through while fraud detection remains robust. Getting the right system to check all these variables also helps preserve the customer experience.

CONCLUSION: How to Stop Deepfake Risks in the Future

As deepfake creation rises, it’s important to review and implement the best technology that combats risks to your business and customer experience. Pindrop Pulse is the only technology to safeguard your business against fraud and trust erosion caused by deepfakes. 

Pindrop's technology uses seven key features to detect deepfakes in their tracks:

  1. It's comprehensive: with liveness detection, it can spot text-to-speech, voicebot, voice modulation, and voice conversion attacks, as well as replayed and modulated voices.
  2. It's a proprietary AI model developed over eight years without relying on watermarking or specific voice generation tools.
  3. It uses a unique dataset of over 20M unique audio utterances.
  4. It’s language and phrase agnostic.
  5. Pindrop is leading the industry with over 18 patents granted in deepfake and synthetic voice detection and over 130 in voice security.
  6. It's academically benchmarked and has been validated in the ASVspoof challenge since 2017.
  7. It’s continually enhancing its technology to keep up with deepfake evolution.

Contact a rep today to schedule a demo and see how this new product works.

Voice biometric authentication systems were neither designed nor operationalized to protect against the sophisticated deepfakes attacking call centers today. It's paramount that every call center using a voice biometric authentication solution test that its authentication processes can keep out the bad actors using Gen AI. 3Qi, a leading software QA solution provider, helped a Tier 1 US bank test its call centers against the full variety of deepfakes. In this blog, we share their learnings and best practices from that exercise for everyone to learn from. – The Pindrop Team

3Qi Labs has always prided itself on its ability to break software. Founded as a software QA solutions provider, 3Qi Labs’ mandate has extended beyond executing predefined test plans to proactively seek out the corner cases that could precipitate defects. In software testing, constraints of time and resources dictate the scope, requiring prioritization of features critical to user experience and areas with the highest risk profiles. For Authentication systems, this translates to a balanced mix of Positive* and Negative* tests, with a focus on fraud detection capabilities due to the dramatic increase in biometric fraud incidents. 

Among various types of vulnerabilities, the challenge of testing and detecting Synthetic Speech Injection stands out. Although other fraud methods like voice recordings and impersonation remain relevant, the swift spread of deep fakes powered by advancements in Generative AI has made sophisticated synthetic voice technologies easily accessible to a wide audience at minimal cost. This reality places synthetic voice generation and detection at the forefront of our testing efforts for Authentication systems like the Pindrop solution.

The current state of synthetic speech generation technology

In a recent evaluation of Authentication platforms for one of 3Qi’s banking customers, we evaluated multiple AI-driven speech generation platforms. Some of the insights are highlighted below:

  1. Explosion in Number of Gen AI Systems: Today there are well over 120 Gen AI systems, spanning text-to-speech and speech-to-speech. Among the technologies assessed were Eleven Labs, Descript, Podcastle, PlayHT, and Speechify. These entities, fueled by significant venture capital investments, are positioned to accelerate advancements in this space.
  2. Easy Accessibility to Sophisticated Attacks: Minimal effort is required to create convincing synthetic voice samples — we used 60 seconds of speech per tester. Not long ago you needed a 30-minute sample to generate an equivalent result, and Microsoft claims its VALL-E model can clone a voice from a 3-second audio clip.
  3. Potential for Misuse: The efficacy of these technologies was demonstrated by the successful spoofing of multiple authentication systems using our synthetic samples. It's no surprise that the Federal Communications Commission (FCC) just outlawed AI-generated robocalls, underscoring the need to protect citizens against misuse of these technologies, especially considering the vast amount of voice data accessible via social media.
  4. Affordability: These technologies are accessible to a broad audience due to their low cost. Our testing encompassed all the aforementioned platforms for as little as $1 to clone a voice. It's no wonder that the low barrier to entry for these advanced technologies has contributed to deepfake identity fraud doubling from 2022 to Q1 2023.

Best practices for testing systems against synthetic voice

As part of an Authentication platform evaluation, we employ a holistic testing approach covering a broad spectrum of demographic categories, various technologies, a range of input/environmental factors, and synthetic voice injection. 

Below is the outline of our approach for a recent evaluation for a Tier-1 US Bank:

  1. Proctored Sessions: Direct supervision of interactions for thorough scenario coverage and real-time result capture.
  2. Diverse Scenarios: Employing a mix of positive and negative tests with randomized scenario execution and Net Speech* variation (more below) across testers.
  3. Enrollment & Verification: Assessing user onboarding and verification efficiency, considering variables such as presence of background noise and speech clarity.
  4. Security & Fraud Detection: Validating systems against synthetic and replayed voice attacks, as well as distinguishing between different live voices.
  5. Demographic Representation: A broad participant demographic across age, gender, linguistic, and ethnic backgrounds.
  6. Technical Infrastructure: Utilizing a variety of mobile devices and networks, with synthetic voice generation facilitated through tools like Eleven Labs.


Central to our testing is the concept of Net Speech*, a critical variable that directly influences the accuracy and reliability of Authentication systems. The amount of net speech provided is positively correlated with the system's ability to generate a precise voice enrollment and, consequently, its capability to authenticate a caller or detect a fraudulent one. By examining synthetic voice samples of varying lengths, we can identify the specific net speech duration at which synthetic or cloned voices begin to significantly affect false acceptance rates, a key factor in maintaining system integrity and user trust. Thus, net speech serves as a crucial variable in our evaluations, leading us to test across various intervals to determine the optimal net speech requirement per platform. This is vital for minimizing the risk of fraud while promoting a superior customer experience.

The art and science of measuring performance

The cornerstone of our analysis is the evaluation of False Acceptance Rate* (FAR) and False Rejection Rate* (FRR) across diverse data segments. This entails examining these metrics in specific scenarios or variable combinations, such as FAR for Synthetic Voice Injection across different net speech intervals, or even more detailed analyses, like verification FAR for synthetic voice injection among Spanish-speaking females with net speech < 6 seconds. While achieving statistical significance in niche scenarios can be challenging, the juxtaposition of FAR and FRR is critical. It increases the likelihood that the system’s sensitivity is finely tuned to balance robust fraud detection (low FAR) against user convenience (low FRR), essential for optimizing both security and customer experience.
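To make the FAR/FRR analysis concrete, here is a minimal sketch of how these metrics could be computed per net-speech bucket from proctored test results. The record layout, field names, and bucket edges are illustrative assumptions for this article, not 3Qi's actual test harness.

```python
# Minimal sketch: computing FAR/FRR per net-speech bucket from proctored test results.
# The record layout, field names, and bucket edges are hypothetical, for illustration only.
from dataclasses import dataclass
from typing import List

@dataclass
class TestResult:
    is_authorized: bool      # True = positive test (genuine caller), False = negative/spoof test
    accepted: bool           # the system's accept/reject decision
    net_speech_sec: float    # net speech captured during the attempt

def far_frr(results: List[TestResult]):
    negatives = [r for r in results if not r.is_authorized]
    positives = [r for r in results if r.is_authorized]
    far = sum(r.accepted for r in negatives) / len(negatives) if negatives else 0.0
    frr = sum(not r.accepted for r in positives) / len(positives) if positives else 0.0
    return far, frr

def report_by_net_speech(results: List[TestResult], edges=(0, 3, 6, 10, 20)):
    """Report FAR/FRR for each net-speech interval, e.g. [0-3s), [3-6s), ..."""
    for lo, hi in zip(edges, edges[1:]):
        bucket = [r for r in results if lo <= r.net_speech_sec < hi]
        if bucket:
            far, frr = far_frr(bucket)
            print(f"{lo:>2}-{hi:<2}s  n={len(bucket):3d}  FAR={far:.1%}  FRR={frr:.1%}")
```

Grouping the results this way makes it easy to spot the net-speech interval at which synthetic voices start slipping past the system (FAR rising) without unduly inconveniencing genuine callers (FRR rising).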

Ultimately, our testing methodologies and processes are designed to arm decision makers with the data they need to objectively evaluate different Authentication platforms. The goal is to not only challenge and evaluate the efficacy of biometric authentication in Authentication systems, but also to ensure that enterprises can confidently integrate these technologies, bolstering both security and user satisfaction.

Glossary:

  • Positive Tests: Test cases where the system correctly identifies an authorized user’s voice.  They test the system’s reliability and effectiveness in recognizing and verifying authorized users, enhancing user trust and system integrity.
  • Negative Tests (aka Spoof Tests): Test cases where the system correctly rejects an unauthorized user’s voice.  These tests are essential for assessing the system’s security measures and its capability to safeguard against unauthorized access attempts.
  • Net Speech: Net Speech refers to the actual amount of speech content within a voice interaction, excluding any periods of silence or non-speech elements. Optimizing the Net Speech threshold is essential for efficient and secure user authentication. It impacts system responsiveness, user experience, and the ability to accurately authenticate users under various conditions. 
  • False Acceptance Rate (FAR): The percentage of Negative Tests where unauthorized users are incorrectly verified/recognized as authorized users. This is a key metric for maximizing the security of the system.
  • False Rejection Rate (FRR): The percentage of Positive Tests where authorized users are wrongly denied access by the biometric system, mistaking them for unauthorized users. Minimizing FRR is essential for user experience and satisfaction and overall efficiency.

Deepfakes are no longer a future threat in call centers. Bad actors actively use deepfakes to break call center authentication systems and conduct fraud. Our new Pindrop® Pulse liveness detection module, released to beta customers in January, has discovered the different patterns of deepfake attacks bad actors are adopting in call centers today. 

A select number of Pindrop’s customers in financial services opted to incorporate the beta version of Pulse into their Pindrop® Passport authentication subscription. Within days of enabling, Pulse started to detect suspicious calls with low liveness scores, indicating the use of synthetic voice. Pindrop’s research team further analyzed the calls to validate that the voices were synthetically generated. Ultimately, multiple attack paths were uncovered across the different customers participating in the early access program, highlighting that the use of synthetic voice is already more prevalent than the earlier lack of evidence might have indicated.

The following four themes emerged from our analysis across multiple Pulse beta customers:

    1. Synthetic voice was used to bypass authentication in the IVR: We also observed fraudsters using machine-generated voice to bypass IVR authentication for targeted accounts, providing the right answers for the security questions and, in one case, even passing one-time passwords (OTP). Bots that successfully authenticated in the IVR identified accounts worth targeting via basic balance inquiries. Subsequent calls into these accounts were from a real human to perpetrate the fraud. IVR reconnaissance is not new, but automating this process dramatically scales the number of accounts a fraudster can target.
    2. Synthetic voice requested profile changes with an agent: Several calls were observed using synthetic voice to ask an agent to change user profile information like an email or mailing address. In the world of fraud, this is usually a step before a fraudster either prepares to receive an OTP from an online transaction or requests a new card be sent to the updated address. The experience for agents on these calls was uncomfortable at best, and on one call, the agent successfully updated the address at the request of the fraudulent synthetic voice.
    3. Fraudsters are training their own voicebots to mimic bank IVRs: In what sounded like a bizarre first call, a voicebot called into the bank's IVR not to do account reconnaissance but to repeat the IVR prompts. Multiple calls came into different branches of the IVR conversation tree, and every two seconds, the bot would restate what it heard. A week later, more calls were observed doing the same, but this time, the voicebot repeated the phrases in precisely the same voice and mannerisms as the bank's IVR. We believe a fraudster was training a voicebot to mirror the bank's IVR as a starting point of a smishing attack.
    4. Synthetic voice was not always for duping authentication: Most calls were from fraudsters using a basic synthetic voice to figure out IVR navigation and gather basic account information. Once mapped, a fraudster called in themselves to social engineer the contact center agent.

There are 4 main takeaways for Call Centers: 

  1. Deepfakes are no longer an emerging threat; they are a current attack method: Bad actors are actively using deepfakes to break the authentication systems in call centers and conduct fraud. Every call center needs to validate the defensibility of its authentication system against deepfakes. Review a professional testing agency's best practices on how to test your authentication system against such attacks.
  2. Liveness evaluation is needed independently and alongside authentication: Catching and blocking pre-authentication reconnaissance calls can prevent fraudsters from gathering intel to launch more informed attacks.
  3. Liveness detection is most impactful when integrated into a multi-factor authentication (MFA) platform: Few fraudsters can dupe multiple factors, making MFA platforms a no-brainer choice for companies concerned about deepfakes. The Pindrop® Passport solution uses seven factors to determine authentication eligibility and returned high-risk and low voice-match scores on many synthetic beta calls. In contrast, solutions relying on voice alone put customers at greater risk by depending on the single factor most fraudsters are focused on getting past.
  4. Call centers need continuous monitoring for liveness: Different attacks target different call segments. Monitoring both the IVR and agent legs of a call helps protect against both reconnaissance and account access attacks.

Many companies are considering the future impact of Deepfakes, but it’s already here. A Fall 2023 survey of Pindrop customers showed that while 86% were concerned about the risk posed by deepfakes in 2024, 66% were not confident in their organization’s ability to identify them. Meanwhile, consumers expect to be protected, with about 40% expressing at least “Somewhat High” confidence that “Banks, Insurance & Healthcare” have already taken steps to protect them against risks in our Deepfake and Voice Clone Consumer Sentiment Report. While it may take time for attackers to move downstream from the largest targets, it’s clear that the threat of deepfake attacks is already here. It’s time to fortify the defenses. 

Learn more about the Pindrop® Pulse product here.

In a world where threats from Generative Artificial Intelligence continue to advance at an unprecedented pace, the need for robust cybersecurity solutions has never been more critical. At Pindrop, we pride ourselves on staying at the forefront of innovation, and today, we are thrilled to introduce a groundbreaking addition to our portfolio – the Audio Deepfake Detection Solution, Pindrop® Pulse.

The new solution is designed to fortify existing authentication and fraud detection products. This pioneering solution is engineered to detect audio deepfakes and voice clones in contact centers in real time, setting a new standard for contact center security. With just 2 seconds of net speech, it has achieved an impressive 99% detection rate for known deepfake engines and over 90% detection for new or unseen deepfake generation engines, while maintaining a false positive rate of less than 1%. A liveness score, seamlessly integrated into a multi-factor fraud detection and authentication platform, enables automated real-time operationalization and post-call analysis.

Every call center needs protection against deepfakes

With advancements in generative artificial intelligence (Gen AI), voice cloning has become a powerful tool that creates believable replicas of human voices. These artificially generated voices capture the mannerisms, cadences, and imperfections of human speech so well that human ears are ineffective at validating their veracity. This technology presents a new challenge for call centers: identifying not only whether it is the right human, but whether it is a real human at all. Similar to how captchas have become commonplace in online channels, deepfake detection leveraging AI detection technologies is now needed by every call center to stay one step ahead of fraudsters.

AI that protects your call center against the advanced threat of Deepfakes

The threat of deepfakes is no longer in the future; it is already here. Pulse employs liveness detection, a sophisticated technology that discerns deepfake audio by identifying patterns that come naturally to humans but are hard for machines to replicate at scale over sustained periods. Examples of such patterns include frequency distortions, voice variance, unnatural pauses, and temporal anomalies. Leveraging deep learning breakthroughs, Pulse analyzes these patterns to generate a "fakeprint" – a unit-vector, low-rank mathematical representation preserving the artifacts distinguishing machine-generated speech from genuine human speech.
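For illustration only, here is a rough sketch of what a unit-vector representation and a similarity-based liveness-style score could look like. The embedding model, reference centroids, and scoring formula below are placeholders and do not describe Pindrop's proprietary fakeprint model.

```python
# Illustrative sketch of a unit-vector embedding and a cosine-based liveness-style score.
# The embedding, reference centroids, and squashing function are placeholders, not the
# Pindrop fakeprint model.
import numpy as np

def unit_vector(embedding: np.ndarray) -> np.ndarray:
    """L2-normalize an embedding so comparisons reduce to cosine similarity."""
    norm = np.linalg.norm(embedding)
    return embedding / norm if norm > 0 else embedding

def liveness_score(call_embedding: np.ndarray,
                   genuine_centroid: np.ndarray,
                   synthetic_centroid: np.ndarray) -> float:
    """Return a score in [0, 1]; higher means the call looks closer to genuine speech."""
    v = unit_vector(call_embedding)
    sim_genuine = float(np.dot(v, unit_vector(genuine_centroid)))
    sim_synth = float(np.dot(v, unit_vector(synthetic_centroid)))
    # Map the similarity margin to [0, 1] with a logistic squash.
    return 1.0 / (1.0 + np.exp(-(sim_genuine - sim_synth) * 10.0))
```

The key idea the sketch captures is that a normalized, fixed-length representation lets every call be compared against references of genuine and machine-generated speech in a consistent, score-based way.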

We've had the privilege of partnering with some of the leaders in the banking and insurance industries as early adopters of this technology. First National Bank of Omaha (FNBO) was among the first cohort of customers to deploy this solution in their contact centers and saw remarkable accuracy from the technology in identifying synthetic and recorded voices, augmenting their existing fraud prevention controls.

"In an era where AI advancements bring both innovation and new threats, FNBO remains committed to protecting our customers and their information. In an effort to proactively combat the emerging threat of deepfakes, our partnership with Pindrop provides us with cutting-edge solutions that safeguard our customers' information with precision. After rigorous testing, we're very happy with the results – Pindrop's technology ensures our defense mechanism is robust against advanced threats. Their commitment to excellence and innovation makes them an invaluable ally in our mission to protect our customers." – Steve Furlong, Director of Fraud Management at First National Bank of Omaha

As human ears are typically unable to differentiate between a real human voice and an AI-generated voice, Pindrop Pulse enables organizations to bolster the security of their contact centers and confidently serve the needs of their customers. With best-in-class performance, Pindrop Pulse integrates seamlessly with Pindrop Passport and Pindrop Protect, uses models developed through sustained leadership in research, and exhibits the core tenets of an effective deepfake detection solution:

  • Best-in-class Performance: Pindrop Pulse’s ability to detect deepfakes provides organizations and their customers protection against a variety of voice attacks, including recorded voice replay, synthetic voice, automated voice chatbot, voice modulation, and voice conversion. Pindrop’s technology has detected 99% of attacks that use previously seen deepfake tools and 90% of “zero-day” attacks that use new or previously unseen tools.
  • Integrated Customer Experience for Authentication and Fraud Protection: Pindrop Pulse, in combination with Pindrop Protect and Pindrop Passport, provides a strong line of defense against AI-generated, synthetic voice attacks that can perform large-scale account reconnaissance and takeovers. With seamless integration with existing authentication and fraud protection tools, Pulse helps keep customers safe from large-scale contact center fraud.
  • Sustained leadership in research for deepfake detection: With 18 patents filed or granted for audio deepfake detection, in addition to 250+ patents on voice security, Pindrop brings a solution that not only detects deepfakes created by engines from today but provides protection against zero-day attacks as well. Pindrop’s deepfake technology has been tested against a proprietary dataset of 20M audio samples that represent well over 120 AI systems that generate voices. Additionally, Pindrop has been among the top performers in the ASVspoof challenge since 2017. 
  • Designed around the core tenets of deepfake detection: With real-time detection, Pindrop Pulse provides continuous assessment during calls to protect against evolving threats. The solution is resilient to noise, reverberation, and adversarial attacks, and another core tenet is "explainability" (providing the reasons and attributes behind a deepfake detection) to enhance the understanding and intelligence of security processes. This fraud feedback and intelligence can be used by security analysts to maximize the accuracy of fraud detection and case investigation processes.

Stay Ahead, Stay Secure

As we launch this revolutionary module, we invite you to explore the future of contact center security with us.

Learn more about how our Audio Deepfake Detection solution can empower your organization here.

At Pindrop, we remain dedicated to providing innovative voice security solutions that meet and exceed the evolving demands of the digital landscape. Join us on this journey toward a more secure future.

Humans are unable to detect over a quarter of deepfake speech samples. New research from UCL found that humans could only detect artificially generated speech 73% of the time. This was also consistent with our recent webinar, Are You Smarter Than a 5th Generation Deepfake Engine?, where we tested our audience's ability to identify whether an audio sample was genuine or a deepfake. Advanced liveness detection technology is needed to support your business as generative AI continues to evolve.

But first, what are deepfakes, and how have they evolved?

Deepfakes use a form of artificial intelligence called deep learning to make images of fake events or videos appear natural. These can entail creating fake photos from scratch that seem authentic, using voice skins, or even voice clones of public figures. Joe Biden and Taylor Swift have both been recent victims of deepfake fraud. Using generative AI technology, bad actors can inject voice into real-time streams, leading to significant fraud loss, the spread of misinformation, and damaged brand reputation.

Here are three reasons why Pindrop’s liveness detection feature, Pindrop Pulse, can detect deepfakes better than humans. 

  1. Our deepfake detection engine is constantly evolving and adapting to new types of attacks.

A quick look at AI performance on benchmark charts shows that it has surpassed humans at several tasks, and the rate at which humans are being surpassed at new tasks is increasing. Even as fraudsters find ways to develop new attack methods, detection technology is maturing to get in front of those changes and remain operational.

In our lab testing with an 11-million-sample test dataset, Pindrop Pulse can detect a deepfake 99% of the time. We can also detect 90% of zero-day attacks, meaning that even before we can train our technology on a new attack, 9 times out of 10 we can detect it based on one of the multiple methodologies we have in place today. Our solutions feature innovative technology that evolves and learns with your business, tracking trends and fraud detection rates so it can iterate on its strategy to catch and detect fraud each time.

When Pindrop Pulse is added to Pindrop Passport, it can be set up via simple policies to conduct either active or passive voice authentication or even both simultaneously.

2. Pindrop Pulse decreases human error.

Implementing a solution with behavioral and multi-factor authentication practices reduces the high chance of human error in detecting fraud. Pindrop has found that fraudsters can pass agent verification 40% to 60% of the time. When a call is received, Pindrop's technology has already processed the call's metadata. Once the caller interacts with the IVR, Pindrop provides predictions and a full risk score.

At the same time, Pindrop Passport will also provide an ANI Validation score to determine if the caller ID is genuine or being spoofed.  Passport can quickly take in multiple inputs from the caller to decide whether the call is genuine or a deepfake. Look at the image below to see how Pindrop Pulse works in the background during a call.

3. Pindrop has the only authentication platform with the best-performing, fully integrated deepfake detection. 

Lastly, Pindrop Passport in conjunction with Pulse deepfake detection provides out-of-the-box flexibility to create custom policies and alerts. Our liveness score can be seen in the UI and consumed in real time as part of our risk APIs. More importantly, the authentication policies you create in Passport can use this score to prevent any deepfake calls to your organization from being enrolled or authenticated. 
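As a hypothetical sketch of how such a policy might gate enrollment and authentication on a liveness score alongside other factors (the field names and thresholds below are invented for illustration and are not the Pindrop Passport API):

```python
# Hypothetical policy sketch: gating enrollment/authentication on a liveness score
# alongside other signals. Field names and thresholds are illustrative only and do
# not represent the Pindrop Passport API or its defaults.
def authentication_decision(liveness: float, voice_match: float, ani_validation: float,
                            liveness_floor: float = 0.5,
                            match_floor: float = 0.8,
                            ani_floor: float = 0.7) -> str:
    if liveness < liveness_floor:
        return "deny"            # likely synthetic voice: block enrollment and authentication
    if voice_match >= match_floor and ani_validation >= ani_floor:
        return "authenticate"    # multiple factors agree the caller is genuine
    return "step_up"             # route to additional verification (e.g., OTP or agent questions)
```

The design point is simply that the liveness score acts as a hard gate before any other factor can authenticate or enroll the caller, which is how a policy can stop deepfake calls from ever entering the voiceprint population.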

Contact our team today to learn how Pindrop Pulse, integrated with Passport, can strengthen your call center’s authentication approach while minimizing friction for legitimate callers! 

WEBINAR

Are You Smarter Than a 5th Generation Deepfake Engine? – Pindrop

Are your ears better at detecting synthetic audio than our authentication and deepfake solution, Pindrop Passport?

In this live webinar, our authentication and deepfake experts will share what’s top of mind in the world of voice security, while putting you to the test to see who performs best: human or machine.

  • Test your knowledge about deepfakes
  • See live demonstrations of Pindrop's new liveness detection feature
  • Get your burning authentication and deepfake questions answered

So… are you smarter than a 5th generation deepfake engine?

Meet the Experts

Bryce McWhorter

Sr. Director, Product, Research & Engineering, Pindrop

Tara Garnett

Sr. Product Manager, Authentication Products, Pindrop

Darren Baldwin

Sr. Director and Account Executive, Pindrop

Deepfakes have already disrupted the consumption of mass media as we know it. Scammers are creating deepfakes of popular celebrities and famous figures in a bid to defraud innocent individuals on multiple platforms. 

And, as we head into 2024, an election year, the looming threat of deepfakes is only going to get worse. It's imperative for organizations and individuals to come up with a strategy to combat misinformation and fraud and to invest in technology that neutralizes this attack vector in 2024.

But, as we bring down the curtains on this year and head into 2024, here are some interesting things that you should know about deepfakes. 

More Investments in AI Expected in 2024

According to a report by McKinsey, 40% of business respondents are expected to make further investments into AI and cybersecurity, and 28% have already made it a hot topic for their business agendas going into the new year. 

As generative AI becomes a household term, cybersecurity professionals also expect to see a rise in different types of phone scams.

We already know just how ruthless scammers can be in their attempts to gain access to someone's financial information. With the threat of deepfake attacks becoming clearer, now is the time for companies to invest in AI and protect themselves.

In fact, the government is already taking notice of the evolving landscape. The Deep FAKES Accountability Act, commonly known as the "Deepfake Accountability Act," is a legislative proposal introduced in 2023 to address the challenges posed by deepfake technology.

The Deepfake Accountability Act aims to regulate the creation and distribution of deepfakes, which are sophisticated artificial intelligence-generated images, videos, or audio recordings that make it appear as though someone is doing or saying something they did not.

As we see it, the Deepfake Accountability Act represents a significant step towards addressing the complex challenges posed by deepfake technology. However, it also raises questions about the balance between regulation and freedom of expression, the technical feasibility of enforcing such regulations, and the potential for global enforcement given the borderless nature of the internet. 

With the nature of phone scams changing and becoming increasingly advanced, it's imperative that companies start investing in combating misinformation and fraud and take steps to protect themselves from such attacks.

Deepfake Identity Fraud Doubled from 2022 to Q1 2023

As the year comes to a close, we’ve seen some startling facts come to light. An independent research study conducted by Sumsub showed that deepfake identity fraud scams doubled from 2022 to just the first quarter of 2023. 

As new technologies become more accessible, this is only going to rise further (keep in mind we still don’t have all the stats from 2023 yet!). 

In September 2023, the NSA, FBI, and CISA jointly released a Cybersecurity Information Sheet entitled Contextualizing Deepfake Threats to Organizations.

The report was created to help organizations better understand just how powerful deepfakes and the threat of generative AI can be, and it also recommended the use of passive authentication technologies.

For instance, in the contact center space, companies can use passive voice authentication to seamlessly identify callers. This not only saves time, but also helps reduce operational costs in the contact center. 

Deepfake Audio Samples are Increasingly Hard to Detect 

90% of consumers have raised concerns about deepfake attacks, as revealed in our Deepfake and Voice Clone Consumer Report. But, did you know that most people already have a hard time identifying deepfake audio samples?

According to a study recently published in PLOS ONE, one in four people can't distinguish a deepfake from an actual audio sample.

This is a serious concern, especially in industries where data security is of paramount importance. As deepfakes are likely to become more realistic in the near future, companies have to step up and take appropriate measures to protect consumer data. 

What Threats Do Deepfakes Pose in 2024?

Deepfakes have escalated the spread of misinformation and propaganda. By creating realistic videos or audio clips, malicious actors can easily fabricate statements or actions of public figures, leading to false information being rapidly spread. 

This can have serious implications for politics, where deepfakes could be used to damage reputations, influence public opinion, or interfere with elections. 

Rising Cybersecurity Threats

Deepfakes aren't just hyper-realistic digital manipulations of video content; scammers are also able to create audio content and use it to gain access to a person's bank accounts.

The use of deepfakes in cybercrime has risen significantly. Cybercriminals can create deepfake videos or audio of key personnel to gain unauthorized access to secure environments. 

This includes using deepfakes for social engineering attacks, where individuals are tricked into revealing sensitive information or transferring funds to fraudulent accounts.

Impact on Personal Privacy

The proliferation of deepfakes presents numerous legal and ethical challenges. Determining the authenticity of digital content has become more complex, complicating legal proceedings and journalistic integrity. 

Additionally, the creation and distribution of deepfakes raise questions about the right to privacy, consent, and the ethical implications of manipulating digital content.

Erosion of Trust

If a contact center is compromised due to deepfakes, one of the biggest challenges it faces is rebuilding consumer trust. In high-security industries like finance and banking, trust is everything.

There have been many cases where hacks have led to businesses being heavily fined and losing long-term relationships with their clients.

Deepfakes contribute to the erosion of trust in media and institutions. As it becomes increasingly difficult to distinguish between real and fake content, public trust in media sources and digital content declines.

Pindrop’s Deep Voice biometric engine can be used to combat deepfake threats in contact centers. It helps companies prevent cybersecurity attacks and can be used to simplify voice authentication in call centers. 

Protect Your Organization from Deepfake Attacks in 2024

Pindrop offers advanced deepfake detection, with a 99% detection rate. This can help contact centers in combating the threats posed by generative AI and help minimize the risk of losses. Request a demo today to see how it works!

Biometric spoofing is a common tactic used by scammers to manipulate biometric traits in order to impersonate innocent targets. And with deepfakes becoming more prevalent (we're already seeing scammers use them to impersonate popular celebrities), it's becoming increasingly difficult to protect against biometric spoofing.

However, while biometric spoofing poses a serious issue, deepfake detection tools can be used to combat such threats. Here’s what you need to know about preventing biometric spoofing with deepfake detection.

How Deepfake Detection Tools Prevent Biometric Spoofing

As deepfake technology becomes more sophisticated, the potential for its use in biometric spoofing grows, posing a significant threat to the security of these systems. However, advanced deepfake detection methods provide a line of defense against such threats.

For voice recognition systems, deepfake detection technologies are similarly vital. These systems analyze speech patterns, looking for inconsistencies in pitch, tone, and rhythm that are indicative of synthetic audio. 

By identifying these anomalies, deepfake detection can prevent the use of AI-generated voice replicas in spoofing voice biometric systems. This is particularly important in sectors like banking and customer service, where voice authentication is increasingly common. 

Moreover, deepfake detection contributes to the ongoing development of more secure biometric systems. As detection algorithms evolve in response to more advanced deepfake techniques, they drive improvements in biometric technology, ensuring that these systems remain a step ahead of potential spoofing attempts. 

This includes the development of more sophisticated liveness detection features and the integration of multi-factor authentication processes, which combine biometric data with other forms of verification.

Deepfake detection tools are also being used in facial recognition systems. Deepfake detection algorithms focus on identifying subtle discrepancies and anomalies that are not present in authentic human faces. 

These systems analyze details such as eye blinking patterns, skin texture, and facial expressions, which are often imperfectly replicated by deepfake algorithms. 

By integrating deepfake detection into facial recognition systems, it becomes possible to flag and block attempts at spoofing using synthetic images or videos, thereby enhancing the overall security of the biometric authentication process.

What is Biometric Spoofing?

Biometric spoofing refers to the process of artificially replicating biometric characteristics to deceive a biometric system into granting unauthorized access or verifying a false identity. 

This practice exploits the vulnerabilities in biometric security systems, which are designed to authenticate individuals based on unique biological traits like fingerprints, facial recognition, iris scans, or voice recognition. 

Biometric systems, while advanced, are not infallible. They work by comparing the presented biometric data with the stored data. If the resemblance is close enough, access is granted. Spoofing occurs when an impostor uses fake biometric traits that are sufficiently similar to those of a legitimate user. 

For instance, a fingerprint system can be spoofed using a fake fingerprint molded from a user’s fingerprint left on a surface, or a facial recognition system might be tricked with a high-quality photograph or a 3D model of the authorized user’s face.

The implications of biometric spoofing are significant, especially in areas requiring high security like border control, banking, and access to personal devices. 

As biometric systems become more prevalent, the techniques for spoofing these systems have also evolved, prompting a continuous cycle of advancements in biometric technology and spoofing methods.

Understanding Deepfake Detection

Deepfake voice detection is an intricate technical process that identifies AI-generated audio or video aimed at replicating human speech or mannerisms.

This field leverages a combination of signal processing, machine learning, and anomaly detection techniques to discern the authenticity of audio samples.

Machine learning models are central to this process. These models are trained on vast datasets containing both genuine and AI-generated speech. 

By learning the nuances of human speech patterns and their AI-generated counterparts, these models become adept at identifying discrepancies indicative of deepfakes. 

Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are commonly employed in this context, offering high efficacy in pattern recognition within audio data.

Signal analysis plays a pivotal role in deepfake voice detection. Here, advanced algorithms are used to scrutinize the spectral features of the audio, including frequency and amplitude characteristics. 

Deepfake algorithms, while advanced, often leave behind anomalous spectral signatures that are not typically present in natural human speech. These can manifest as irregularities in formant frequencies, unexpected noise patterns, or inconsistencies in harmonics.

Deepfake detection algorithms also rely on temporal analysis, which involves examining the continuity and consistency of speech over time. 

Deepfake audio may exhibit temporal irregularities, such as inconsistent pacing or abnormal speech rhythm, which can be detected through careful analysis. This technique often involves examining the audio waveform for unexpected breaks or changes in speech flow.
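As a rough, hedged illustration of the kinds of spectral and temporal descriptors such detectors examine (not any vendor's actual pipeline), one could extract simple features like this:

```python
# Illustrative feature extraction for deepfake audio analysis (not a production detector).
# Requires librosa and numpy; the feature choices are simplified examples of the spectral
# and temporal cues described above.
import librosa
import numpy as np

def describe_audio(path: str, sr: int = 16000) -> dict:
    y, sr = librosa.load(path, sr=sr)

    # Spectral cues: flatness and MFCC statistics capture tonal vs. noise-like structure.
    flatness = librosa.feature.spectral_flatness(y=y)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)

    # Temporal cues: non-silent segments reveal pacing and pause structure.
    voiced = librosa.effects.split(y, top_db=30)
    durations = [(end - start) / sr for start, end in voiced]

    return {
        "mean_spectral_flatness": float(flatness.mean()),
        "mfcc_std": float(mfcc.std()),
        "num_voiced_segments": len(durations),
        "mean_segment_sec": float(np.mean(durations)) if durations else 0.0,
    }
```

In practice, descriptors like these would feed a trained classifier rather than be judged by hand; the sketch only shows where spectral and temporal irregularities become measurable numbers.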

How do Deepfakes Work?

The word deepfake is a portmanteau of "deep learning" and "fake." Deepfakes are highly realistic manipulations of audio and video elements, created using advanced artificial intelligence (AI) and machine learning (ML) techniques.

Understanding how deepfakes work involves diving into the complex interplay of technology, AI algorithms, and data manipulation.

Deepfakes are primarily generated using a type of neural network known as a Generative Adversarial Network (GAN). This involves two neural networks: a generator and a discriminator. 

The generator creates images or sounds that mimic the real ones, while the discriminator evaluates their authenticity. Through iterative training, where the generator continuously improves its output based on feedback from the discriminator, the system eventually produces highly realistic fakes.
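For readers who want a concrete picture of that adversarial loop, here is a heavily simplified GAN sketch in PyTorch. Real deepfake systems use far larger networks trained on audio or video; the toy vectors and layer sizes below are purely illustrative.

```python
# Toy GAN sketch (PyTorch): a generator learns to fool a discriminator.
# Real deepfake systems use much larger networks and audio/video data; this only
# illustrates the adversarial training loop described above.
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64
generator = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1))

loss_fn = nn.BCEWithLogitsLoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real_batch: torch.Tensor):
    batch = real_batch.size(0)
    noise = torch.randn(batch, latent_dim)
    fake = generator(noise)

    # Discriminator step: label real samples 1 and generated samples 0.
    d_opt.zero_grad()
    d_loss = loss_fn(discriminator(real_batch), torch.ones(batch, 1)) + \
             loss_fn(discriminator(fake.detach()), torch.zeros(batch, 1))
    d_loss.backward()
    d_opt.step()

    # Generator step: try to make the discriminator label fakes as real.
    g_opt.zero_grad()
    g_loss = loss_fn(discriminator(fake), torch.ones(batch, 1))
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```

Each iteration the discriminator gets better at spotting fakes and the generator gets better at evading it, which is exactly why deepfake quality improves so quickly with training data and compute.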

To create a deepfake, a substantial amount of source data is needed. For example, to generate a deepfake video of a person, one would need many images or video clips of the target individual. 

These are fed into the GAN, enabling the AI to learn and replicate the person’s facial features, expressions, and voice (if audio is involved). The quality and realism of a deepfake are directly proportional to the quantity and quality of the training data.

In video deepfakes, the AI alters facial expressions and movements to match those of another person. 

This is done frame by frame, ensuring that the facial features align convincingly with the movements and expressions in the source video. 

For audio deepfakes, the AI analyzes the voice patterns, including tone, pitch, and rhythm, to create a synthetic voice that closely resembles the target individual.

Once a preliminary deepfake is created, it goes through refinement processes to enhance realism. 

This can include smoothing out discrepancies in lighting, refining edge details, and ensuring consistent skin tones. The final rendering involves compiling these adjusted frames into a seamless video or audio clip.

Protect Your Business with Pindrop’s PhonePrinting Technology

Pindrop offers advanced deepfake detection solutions to prevent biometric spoofing. Using voice printing technology, Pindrop can detect subtle anomalies in acoustic features, helping identify fraud and pinpoint calls by device type or even carrier. Curious to know how it works? Request a demo today!

In our recent report with Voicebot.ai, over 2,000 U.S. consumers were surveyed about their knowledge and feelings around deepfake and voice clone technology. Amit Gupta, Pindrop’s VP of Product, Research & Engineering and Bret Kinsella, Voicebot.ai’s Founder and CEO sat down in a webinar to discuss the key findings in the report. 

Below you’ll find answers to burning questions around consumer deepfake and voice clone sentiment. 

1 – Where are consumers learning about deepfakes?

Unsurprisingly, social media is leading the charge with channels like YouTube, TikTok, Instagram, and Facebook — to name a few. Then, it is followed by movies, documentaries, and the news media, where these technologies are used and covered when relevant headlines pop up. Awareness of deepfakes across these channels was slightly higher than voice clones. 

2 – What do consumers like about deepfakes and voice clones?

Consumers have a positive sentiment around deepfakes and voice clones because, in some cases, they’re funny and entertaining. In video games, deepfakes are seen to improve realism and add creativity. But 90% of the population had some sort of concern around the technology.

3 – Is sentiment more positive or more negative about deepfakes and voice clones?

The report showed that sentiment skewed toward the extremes rather than the middle, like an inverted bell curve. For deepfakes, the largest groups were extremely positive and extremely negative (22.3% each), with fewer people in the neutral middle (11.8%). Voice clones looked slightly more pessimistic, with 21.6% in the extremely negative category and only 18.8% in the extremely positive category, perhaps because awareness is slightly lower.

4 – What are some examples of deepfakes or voice clones that consumers like? 

Very famously on America's Got Talent, one of the contestants used a Simon Cowell face swap. For the new Indiana Jones movie, filmmakers used AI to make the lead actor look 20 years younger. And in The Andy Warhol Diaries, the creators used a voice clone to copy Andy's voice and read the journal aloud. These examples make it much easier to comprehend why 38% of the consumers surveyed in the study have more positive feelings about these use cases.

5 – How does consumer sentiment vary by industry?

Two-thirds of those surveyed knew about voice clones taking place in the banking industry. Voice clone awareness was also high in the politics and government and media industries. Consumers also feel that many industries such as banking, insurance, and healthcare are not doing enough to protect against the risks of deepfake technology.

6 – What other factors weighed into consumer concern?

Awareness of deepfakes and voice clones declines with age, with those over 60 dropping in awareness quite dramatically. Concern around deepfakes and voice clones rises with income. The report also showed that consumers were very interested in voice authentication as a tool to protect against larger deepfake threats.

7 – How will AI help consumers and businesses detect deepfakes and voice clones in the future?

Deepfake and liveness detection is becoming more important as generative AI evolves. Liveness detection helps businesses and consumers have high confidence that who they're speaking to is a real, live person. Learn more about the benefits of biometric liveness detection and how it prevents fraud here.

In conclusion

As AI technology evolves, so will fraudsters’ ability to leverage deepfakes and voice clones for dangerous tactics, including spreading misinformation and attempting fraud. Businesses can enhance their call center security and protect their consumers with a robust security strategy, including multi-factor authentication, advanced identity verification and real-time liveness detection. 

Interested in watching the full webinar discussing the report’s findings? View it on-demand here. Or request a demo of Pindrop’s liveness detection.

AI-manipulated media can come in multiple forms. Pindrop’s newly released Deepfake and Voice Clone Consumer Sentiment Report (October 2023) categorizes deepfake technology in various ways, from static (i.e., text, printed, written) to dynamic, which can have audio and even video elements.

Regardless of the technology and how it’s delivered, deepfakes are AI-manipulated digital media in text, image, audio, and video formats that replicate something real or alter critical characteristics that risk manipulating how that media is interpreted. As technology rapidly advances, deepfakes pose a risk to consumers and businesses across industries.

The report dives into consumer knowledge of the various subcategories within deepfake technology, highlighting where consumers are most concerned regarding recent trends and advances.

Results showed that 60% of respondents were highly concerned about deepfakes and voice clones (a subcategory of deepfakes), and over 90% expressed at least some concern.

Some key insights from the report include:

  • Social media drives more exposure — The top four channels for deepfake exposure are social media platforms: YouTube, TikTok, Instagram, and Facebook, with 49.0%, 42.8%, 37.9%, and 36.2% reported encounters, respectively.
  • There is slightly more awareness around voice clones than deepfakes — The term voice clone registered higher consumer awareness than deepfake, at 63.6% to 54.6%, respectively. This may be surprising considering how much higher deepfake registers in Google trends data. However, an explanation may be that the meaning of voice cloning is evident because most people are well aware of cloning, while “deepfake” is just now gaining usage and broader popularity.
  • Awareness is driven by income — Three-quarters of U.S. adults with over $125,000 in annual income know about voice clones, and over two-thirds know about deepfakes. However, for consumers with income below $50,000, those figures are just 56.5% and 43.6%, respectively.
  • There is slightly more concern around deepfakes — U.S. adults report they are “extremely” or “very” concerned about the negative impacts of deepfakes at a higher rate than voice clones. The total for deepfakes is 60.4% and 57.9% for voice clones. The critical difference is in the “extremely” category, where deepfake outpaced voice clones by three percentage points.

Let’s dive into some areas that we found interesting in the report.

1. The Growing Concern in Men and Women Varied

More men than women reported awareness of deepfake technology – over 66% of men compared to 42% of women. Men are also 50% more likely to have created a deepfake than women. This could have to do with how they are consuming information.

While men were more likely to have encountered a deepfake on YouTube (52.4% to 46.7%), women had the edge over men on TikTok (43.1% to 38.4%). Men also report significantly more concern than women about deepfakes and voice clones causing adverse impacts. And the gap grew wider for voice clones, with 60.8% for men and 54.5% for women.

2. Pop Culture Plays a Role in AI Technology Sentiment

The polarization of deepfake sentiment varies across media types. Some entertainment content fosters positive views of evolving AI technology. For example, the report showed that viewers of The Andy Warhol Diaries documentary held a 71% positive sentiment toward the use of deepfakes in the film.

Viewers of the most recent Indiana Jones movies also reported a 56.7% positive sentiment. This analysis suggests that viewers of those media saw tangible value in the technology.

What consumers appreciate other than the creativity (39.4%) of deepfakes is that they are funny or entertaining (40%). But that comes with a paradox of being concerned with the risks. Over 90% of consumers are concerned about the threat of deepfakes, though 40% expressed positive sentiment toward generative AI technology.

3. Concerns Vary By Industry and Income

67.5% of U.S. adults expressed concern about the risks posed by deepfakes and voice clones related to banking. Politics and government (54.9%) followed, then media and healthcare (50.1%). One reason for the variation in concern could be attributed to income.

Consumers with less than a $50,000 annual salary were the least likely to express concern in every category. The exception is the insurance sector, where concern – regardless of income – was nearly equal to that of the $125,000 salary group.

4. There are Split Views Around Media and Social Media Readiness

Consumers have less confidence that news media organizations are taking proactive measures to combat deepfakes. While 40% of those surveyed reportedly feel optimistic that banks, insurance, and healthcare providers have taken steps to combat deepfake fraud, they have lower confidence in the news and social media sectors.

These results are significant because news and media platforms, especially YouTube and TikTok, drive the highest number of deepfake encounters.

Final Thoughts: Deepfake, Voice Clone, and Consumer Sentiment Report 2023

The concern around deepfakes and voice-cloned media continues to grow as technology evolves. As awareness builds, strongly driven by income and varying by industry, companies must build trust with their customer base by developing action plans with holistic security tactics, such as multifactor authentication and voice biometric technology, to protect against AI fraud.

Nearly two-thirds of U.S. adults expressed concern about the inappropriate and often dangerous use of deepfakes and voice clones in the workplace, leading to fraud attacks, a halt in business operations, brand damage, and more.

Overall, 90% of respondents are concerned about generative AI technology, and that number will continue to increase as both awareness of deepfakes and the capabilities of technology grow.

If you want to understand how to better protect your organization against deepfake and voice clone technology, download the report to learn more.

Report

2023 Deepfake and Voice Clone Consumer Report

Read Pindrop’s new report on consumer sentiment surrounding deepfakes and voice clones.

  • Learn more about consumer concerns regarding deepfakes and synthetic voice, with 90% of respondents expressing worry.
  • Discover which industries face the highest levels of concern about deepfake risks.
  • Explore how pop culture influences AI sentiment and strategies to combat deepfakes effectively.

Deepfakes use a form of artificial intelligence called deep learning to make images of fake events or videos appear natural. These can entail creating fake photos from scratch that seem authentic, using voice skins, or even voice clones of public figures. Deepfakes are becoming a severe issue in many industries. However, with deepfake detection, companies can easily detect fraud and protect themselves from unprecedented damage to their brand reputation, customer data and financial loss. 

How Deepfake Detection Works

In the US, there was a 1200% increase in deepfakes among all fraud types in the first three months of 2023. Deepfake detection is now essential for all businesses to protect against these scams, especially attacks that occur in the call center. Although Pindrop’s research indicates that synthetic content is already present in call centers, it is not yet rampant — making it an excellent time to get in front of a growing problem. 

The technology works within IVR (Interactive Voice Response) flows, creating a traffic light that signals prescriptive next steps to the agent if and when fraud is detected. Every call can then flow through a liveness score, allowing your contact center to operate as usual without fraudsters slipping through.
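To make the idea concrete, here is a minimal sketch of how a per-call liveness score might be mapped to a traffic-light signal for an agent. The thresholds, field names, and return shape are illustrative assumptions, not Pindrop's actual interface.

```python
# Illustrative sketch only: thresholds, field names, and the return shape are
# assumptions for demonstration, not Pindrop's actual API.

def traffic_light(liveness_score: float) -> dict:
    """Map a liveness score (0.0 = likely synthetic, 1.0 = likely live)
    to a traffic-light signal an agent desktop could display."""
    if liveness_score >= 0.80:
        return {"signal": "green", "action": "proceed as usual"}
    if liveness_score >= 0.50:
        return {"signal": "yellow", "action": "apply step-up authentication"}
    return {"signal": "red", "action": "route to fraud review"}

print(traffic_light(0.92))  # {'signal': 'green', 'action': 'proceed as usual'}
print(traffic_light(0.31))  # {'signal': 'red', 'action': 'route to fraud review'}
```

In practice, the thresholds would be tuned to each contact center's tolerance for false alarms versus missed fraud.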

The Importance of Deepfake Detection

According to a recent study, humans can detect deepfake speech only 73% of the time. This study, out of the University of London with 529 participants, was one of the first to assess humans' ability to detect artificially generated speech in a language other than English. By contrast, Pindrop's deepfake detection technology has a 99% success rate.

Verification algorithms can also be more successful than humans in detecting deepfake images (like passport photos or mugshots), achieving accuracy scores as high as 99.97% on standard assessments like NIST's Facial Recognition Vendor Test (FRVT). According to the Center for Strategic and International Studies (CSIS), "Facial recognition systems are powered by deep learning, a form of AI that operates by passing inputs through multiple stacked layers of simulated neurons to process information." Humans simply cannot replicate this level of accuracy.

The Dangers of Deepfakes

Organizations and individuals alike are at risk from deepfakes, which fuel social engineering attempts by manufacturing fraudulent texts, voice messages, and fake videos to spread misinformation.

According to the US Department of Defense, deepfakes are AI-generated, highly realistic content that can be used to: 

  • Threaten an organization’s brand.
  • Impersonate leaders and financial officers.
  • Enable access to networks, communications, and other sensitive information.

In this sense, any company that houses business and customer data could be at risk from these attacks.

Deepfakes and Cybercrime

According to PwC’s 25th CEO Survey, 58% of CEOs consider cyber attacks a significant threat to business operations; climate change (33%) and health risks (26%) ranked much lower. The report explains that the greater impact could be on businesses’ ability to sell and develop products in the future. PwC expert Gerwin Naber says, “CEOs face the challenge of properly preparing their organization for a cyber attack.”

According to a recent report released by the Department of Homeland Security, cybercrime, attacks on infrastructure, misinformation, and election interference enabled by emerging technologies could be the most significant cyber threats in 2024. It’s no wonder cyber risk topped PwC’s 2024 Global Digital Trust Insights as the area where business leaders plan to focus investment over the next 12 months.

Deepfakes and Fake News

The same Department of Homeland Security (DHS) report predicts that “Financially motivated criminal cyber actors will likely impose high financial costs on the US economy in the coming year.” It also explains, “Nation-state adversaries likely will continue to spread [misinformation] aimed at undermining trust in government institutions, our social cohesion, and democratic processes.”

Understanding the Technology Behind Deepfake Detection

Ninety-two percent of respondents in a recent Pindrop survey said their leadership was interested in learning more about deepfakes. Even for teams that are already overloaded, the integration burden can be kept small: “The only development work that we want our customers to do is the one they need to operationalize this new intelligence,” says Amit Gupta, VP of Product Management, Research and Engineering at Pindrop. The following explains how deepfake detection works in a call center using Pindrop’s liveness detection technology.

As calls come in, the IVR (Interactive Voice Response) is set up to create a traffic light for the agent, signaling prescriptive next steps if and when fraud is detected. Pindrop Protect augments its risk scoring with a liveness score to create organizational alerts on fraudulent and potentially fraudulent calls. When humans can spot a deepfake only about 70% of the time on average, while our technology is 99%+ accurate and does not create more work for a company once implemented, it can make a big difference. Meta’s Voicebox case study is a great example of how deepfake detection technology works in practice.

Deep Learning and Neural Networks 

Deep learning is a subfield of machine learning, while neural networks are the backbone of deep learning algorithms. “Deep learning is just an extensive neural network, appropriately called a deep neural network. It’s called deep learning because the deep neural networks have many hidden layers, much larger than normal neural networks, that can store and work with more information,” an article from Western Governors University explains. Wikipedia defines deep learning as the broader family of machine learning methods based on artificial neural networks; a neural network with multiple hidden layers is known as a deep learning system.

One (deep learning) teaches AI how to process data, while the other (the neural network) is its underlying technology. In this sense, organizations can use simpler neural networks for machine learning at lower cost, but deep learning systems have a more comprehensive range of practical uses. With the latter, you can leverage models to assist with language processing, autonomous driving, speech recognition, and more.
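As a rough illustration of the “many hidden layers” idea described above, here is a toy forward pass through a small deep neural network in Python (NumPy). The layer sizes and activation choice are arbitrary assumptions for demonstration, not a description of any production model.

```python
import numpy as np

# Toy forward pass through a deep neural network: the "depth" comes from stacking
# several hidden layers between the input and the output. Layer sizes are arbitrary.
rng = np.random.default_rng(0)
layer_sizes = [16, 32, 32, 32, 8, 1]  # input, four hidden layers, output

weights = [rng.normal(0.0, 0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x: np.ndarray) -> np.ndarray:
    """Pass an input through every layer: affine transform + ReLU on the hidden
    layers, with a linear output on the final layer."""
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(0.0, x @ w + b)
    return x @ weights[-1] + biases[-1]

print(forward(rng.normal(size=16)))  # a single output value (untrained, so arbitrary)
```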

Artificial Intelligence

AI is the intelligence of machines or software, and machine learning is the umbrella term for solving problems that would be cost-prohibitive to tackle with human programmers alone. Microsoft’s Low-Code Signals 2023 report says 87% of Chief Innovation Officers and IT professionals believe “increased AI and automation embedded into low-code platforms would help them better use the full set of capabilities.” This helps to explain why so many companies are leveraging this technology to improve their security posture and protect against deepfakes.

The Future of Deepfake Detection

With emerging technology comes many risks to businesses. It’s becoming so easy to create AI voice deepfakes that it’s increasingly critical for companies to know how to detect them. Deepfake detection tools, like Pindrop’s liveness detection, could be one solution that helps companies protect themselves with minimal effort needed to achieve successful outcomes. If you’d like to learn more about how Pindrop approaches deepfake detection, request a demo to talk to one of our reps or visit Pindrop’s deepfake resource site to learn more.

Recent leaps in the field of generative AI, in combination with a plethora of available data, have resulted in numerous tools that can generate highly convincing audio and video deepfakes. This includes fully synthetic identities as well as synthetic voices and voice cloning.

While much work at Pindrop Research has gone into developing tools for accurate deepfake detection, we believe that more can be done to protect users from malicious or misleading use of deepfakes. One such path is to use digital audio watermarking to aid the distinction between live and synthetically generated speech. The vision here is that all synthetically generated speech is watermarked, but like any other opportunity, it does not come without its own challenges. Most watermarking technology to date has been applied to images, and it is already used for AI-generated images.1

Here we introduce the basics of audio watermarking and discuss the particular challenges that arise if it were used for speech at call centers. In summary, watermarking would be a good start, but it will not by itself solve the potential threats posed by deepfake speech, for two reasons. First, there is an implicit assumption that all deepfakes will be watermarked, which will be difficult to enforce. Second, acoustic and phone channel degradations make watermarking more vulnerable to attacks. It’s not surprising that researchers at the University of Maryland found it easy to evade current watermarking methods, and multiple academic institutions shared their skepticism about the efficacy of watermarking in the same WIRED article2 that outlined the University of Maryland findings. Therefore, at Pindrop we believe that watermarking, especially in the context of audio for contact centers, is not foolproof and should be considered in combination with other advanced deepfake protection tools.

What is digital audio watermarking?

Audio watermarking describes the insertion (or encoding) of a signature signal (watermark) to an audio recording. Much of the watermarking development targets copyright protection of music. A related field is steganography, where the audio signal is used to carry hidden information. 

A watermark must be added in a strategically subtle way so as to be both imperceptible to the human ear (otherwise it will impact the quality of the host audio) and detectable by a decoding algorithm.

The decoding algorithm exploits knowledge of how and where the watermark has been encoded in the signal to reveal its presence; it may also require a secret digital key if one has been used as part of the encoding process. 

The process must be robust to degradations of the watermarked signal. For instance, a music decoding algorithm must be capable of revealing a watermark even if the audio has been compressed or altered (e.g., uploading to YouTube). Finally, depending on how much information we wish to store via watermarking, the capacity can also become a constraint. A watermark must achieve a balance between robustness and imperceptibility (i.e., increasing the watermark intensity will make it more robust, but it can also make it perceptible). 
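To make these ideas concrete, here is a minimal, spread-spectrum-style sketch in Python (NumPy): a key-derived pseudo-random sequence is added to the audio at low amplitude, and detection correlates the signal against the same sequence. The strength parameter governs the robustness/imperceptibility tradeoff discussed above; all values and thresholds are toy assumptions, not a production watermarking scheme.

```python
import numpy as np

# Minimal spread-spectrum-style watermarking sketch (illustrative only, not a
# production scheme): a key-derived pseudo-random sequence is added at low
# amplitude, and detection correlates the audio against the same sequence.

def make_watermark(length: int, key: int) -> np.ndarray:
    rng = np.random.default_rng(key)            # the secret key seeds the sequence
    return rng.choice([-1.0, 1.0], size=length)

def embed(audio: np.ndarray, key: int, strength: float = 0.02) -> np.ndarray:
    # Higher strength = more robust but more audible (the tradeoff discussed above).
    return audio + strength * make_watermark(len(audio), key)

def detect(audio: np.ndarray, key: int, threshold: float = 3.0) -> bool:
    wm = make_watermark(len(audio), key)
    # Normalized correlation behaves like a z-score: near 0 for unmarked audio,
    # well above the threshold when the key-matched watermark is present.
    score = float(np.dot(audio, wm) / np.linalg.norm(audio))
    return score > threshold

fs = 8000
t = np.arange(fs) / fs
host = 0.3 * np.sin(2 * np.pi * 220.0 * t)      # stand-in for a host (speech) signal
marked = embed(host, key=42)

print(detect(marked, key=42))   # should print True: the correct key finds the mark
print(detect(host, key=42))     # should print False: unmarked audio
noisy = marked + 0.01 * np.random.default_rng(1).normal(size=fs)
print(detect(noisy, key=42))    # should print True: the mark survives mild noise
```

Real systems embed in a perceptually shaped transform domain rather than raw samples, precisely to keep the mark inaudible while surviving the degradations described below.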

The challenges of watermarking speech for call centers
Figure 1. A typical call path for a speech signal, highlighting many of the potential sources of degradation the signal may encounter in transmission between a talker and a listener.
  
Watermarking of speech is inherently a much harder problem than watermarking of music or images due to speech’s narrow bandwidth, sparse spectral content, and predictability. Put simply, in music there is more “signal” within which to hide the watermark (i.e., music has greater spectral richness).

The level of robustness required for music-watermarking is not high; music compression causes relatively low quality loss (e.g., MP3). By comparison, if we consider synthetic speech at the contact center, the degradations are significantly more challenging.

The contact center use case is important because different attacks using deepfake speech can be anticipated. For example, it may be used to impersonate a victim to bypass a voice authentication system or to mislead a contact center agent.

Text-to-speech (TTS) engines typically operate at high sampling rates (44.1kHz or 48kHz). As the typical legitimate use case of the synthesized content involves minimal degradations and little or no compression, the watermark perceptibility must be low even at high sampling rates.

Then, there is, of course, the challenge posed by deliberate attacks whereby a malicious attacker applies operations to the watermarked signal to remove or attenuate the watermark.

But even if we were to disregard intentional attacks, the anticipated degradations of the speech signal can be significant. A speech signal may pass through a loudspeaker (ambient noise and delays) and be captured by a microphone (filtering), and the telephony channel itself adds degradations (downsampling, packet loss, echoes, compression); see Figure 1.

Figure 2 illustrates the balance between imperceptibility, quantified by the perceptual evaluation of speech quality (PESQ), and watermark detection performance, quantified by the equal error rate (EER), for two commonly used watermarking methods: spread-spectrum (SS) and time-spread echo (TSE). The EER in watermark detection is the point where the false alarm rate (detecting a watermark when one is not present) equals the false rejection rate (not detecting a watermark when one is present). For an imperceptible watermark with PESQ greater than 4, the typical performance under call-path degradations is in the range of 15-30% EER. Note that both methods achieve close to 0% EER under deliberate attacks such as additive noise and resampling, as shown in Figure 3. In both cases, the parameter that is varied is the watermark strength, which governs the tradeoff between quality and performance.
 
Figure 2. Quality measured in terms of PESQ vs. performance quantified by the EER of watermark detection for two commonly used watermarking techniques, spread-spectrum and time-spread echo, in the presence of call-path signal modifications (downsampling, an 8 kbps codec, and noise gating).

Figure 3. Quality measured in terms of PESQ vs. performance quantified by the EER of watermark detection for the same two techniques in the presence of deliberate signal modifications (additive white Gaussian noise and resampling).
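For readers who want to see how an EER like those in Figures 2 and 3 is computed, here is a small, self-contained sketch in Python (NumPy). The score distributions are synthetic stand-ins chosen only so the resulting EER lands near the range discussed above; they are not Pindrop measurements.

```python
import numpy as np

# Illustrative EER computation: sweep a decision threshold over detection scores
# from watermarked ("target") and unmarked ("non-target") audio and find the point
# where the false-alarm rate equals the false-rejection rate.

rng = np.random.default_rng(0)
target_scores = rng.normal(loc=2.0, scale=1.0, size=5000)      # watermark present
nontarget_scores = rng.normal(loc=0.0, scale=1.0, size=5000)   # watermark absent

def equal_error_rate(target: np.ndarray, nontarget: np.ndarray) -> float:
    best_gap, eer = np.inf, 1.0
    for thr in np.sort(np.concatenate([target, nontarget])):
        frr = np.mean(target < thr)       # missed watermarks (false rejections)
        far = np.mean(nontarget >= thr)   # false detections (false alarms)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

print(f"EER ≈ {equal_error_rate(target_scores, nontarget_scores):.1%}")
# Two unit-variance score distributions separated by 2.0 give an EER of roughly 16%,
# in the same ballpark as the 15-30% range reported above for call-path degradations.
```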
 

Concluding comments

Deepfake speech poses a risk to contact centers, where it may be used to bypass voice-based authentication or to impersonate customers in account takeover attempts. Watermarking could be a good step toward protection from the malicious use of deepfake speech. However, as opposed to previous applications such as music, watermarking that is robust to acoustic and phone channel degradations is a challenge that must be overcome. Moreover, there is an underlying assumption that watermarking can be enforced for all synthetic content, which may not be realistic beyond commercial TTS providers.

Therefore, the sole use of watermarking will likely not be enough, and it is preferable to combine it with other sophisticated tools for synthetic speech detection.

 

1. MIT Technology Review: Google DeepMind has launched a watermarking tool for AI-generated images, Aug 2023 

2. Wired.com: Researchers Tested AI Watermarks—and Broke All of Them, Oct 2023 

Deepfakes, capable of mimicking anyone’s voice with remarkable realism, have emerged as a real threat to businesses and consumers. Fraudsters can now use technology to impersonate others with shocking accuracy, leading to brand damage, financial losses, and more. But how can businesses protect themselves and their customers against these threats? We tuned in to Pindrop’s fireside chat with executives Elie Khoury, VP of Research, and Amit Gupta, VP of Product Management, Research and Engineering, to get answers to your top questions about deepfakes.

1) What are the different types of deepfakes, and how do they work?

According to Elie, many types of voice attacks have emerged in recent years. The top four types of deepfake attacks are recorded voice play, speech synthesis, automated voice chatbots, and voice conversion. Here’s how each is orchestrated:

  • Recorded voice play: The fraudster uses a device and a voice recording to attempt to fool a voice biometric solution, replaying the recording or concatenating words from different recordings to formulate phrases.
  • Speech synthesis: The fraudster creates a voice model and uses text to generate spoken words that sound like a real person.
  • Automated voice chatbots: The fraudster uses an automated chatbot and a voice model to sound and interact like a real person.
  • Voice conversion: The fraudster speaks into a device that changes their voice to sound like another person.

It’s becoming more difficult for humans to confidently detect a deepfake. The evolution of AI has made it possible to replicate a voice in under 30 minutes, raising concerns among businesses and individuals alike.

2) What are some real-world examples of deepfakes?

With a quick Google search, you can find many examples of deepfakes making recent headlines. For example, a deepfake of Sir Keir Starmer was released during the UK Labour Party conference. Even Tom Hanks had to make a statement after unauthorized AI-generated content of him promoting a dental insurance plan was posted. AI has enabled malicious ads and content created to undermine authenticity and sow doubt around leaders. One recent example is Senator Blumenthal’s opening remarks at the Senate hearing on AI: the Senator began by speaking with his own voice and eventually switched to a deepfake impersonation of it. We used our liveness detection engine to verify the integrity of the voice. Learn more about this example here. “One thing is sure: we are seeing more and more deepfakes emerge in the media,” says Amit.

3) What are some measures to protect against imposters using deepfake technology?

Leveraging technology that detects synthetic voices and uses multifactor authentication is essential. “We are finding that humans cannot pick up the audio difference, but technology can and at much faster rates,” says Amit. Learn more about the top 4 factors to prioritize when building your deepfake defense strategy. 

It’s also important to note that fewer than 10% of attendees at this webinar were confident in their organization’s ability to prevent deepfakes.

4) What examples show that technology is better than humans at detecting deepfakes?

Meta’s Voicebox case study was referenced in the webinar as a great example of how far technology has come in detecting deepfakes. According to the article, Meta introduced Voicebox on June 16, 2023. The new system achieved state-of-the-art performance on various TTS applications, including editing, denoising, and cross-lingual TTS. With Pindrop’s deepfake detection, we were able to detect 90% of the Voicebox samples and have since closed the gap to an accuracy of over 99%.

5) What are some ways voice biometrics work?

Experts at Pindrop have discovered some commonalities around voice biometrics: 

  1. Text-to-speech (TTS) systems are built to use existing open-source components, making zero-day deepfakes much more challenging to execute.
  2. Zero-day deepfake attacks are less likely to fool voice authentication systems with liveness detection capabilities, like Pindrop. 
  3. Voice authentication systems, especially as part of a multifactor authentication strategy, are a highly effective way to authenticate real users.

6) How do you see liveness detection helping in real-time on a call? If a caller gets to an agent without being detected in the IVR (Interactive Voice Response), how can you notify an agent during a call?

“Most of our customers leverage real-time intelligence through APIs or a policy engine,” says Elie. He continues, “Behind the scenes in their IVR flows, business rules dictate what the agent will see.” Agents are already overloaded in the contact center, so most call centers just need to show individual liveness detection scores to the agent. “This creates a ‘traffic light’ for the agent, signaling prescriptive next steps if fraud is detected,” says Elie.

7) How does Pindrop liveness detection work at detecting deepfakes?

In the liveness detection module, the authentication policy helps decide the level of trust to put in a user and whether further validation is needed. Those policies are augmented with a liveness score, which can be combined with the existing scores already available in the tool. “They can also create ‘enrollment,’ ‘do not enroll,’ ‘authenticate,’ or ‘do not authenticate’ policies and then receive that information back in real time,” says Amit.

Pindrop’s Protect product does something similar on the fraud mitigation side: the risk API is augmented with an additional liveness score, and case policies are used to create alerts on fraudulent and potentially fraudulent calls. “The goal for us was to minimize the integration overhead for our customers,” says Amit. He continues: “The only development work that we want our customers to do is the one they need to operationalize this new intelligence.”
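As a rough sketch of what such a policy might look like when an existing risk score is augmented with a liveness score, consider the following. The field names, thresholds, and policy labels are illustrative assumptions, not Pindrop’s actual API or policy schema.

```python
# Hedged sketch of a policy decision combining an existing risk score with a
# liveness score. All names and thresholds are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class CallScores:
    risk: float       # existing fraud-risk score, 0.0 (low risk) .. 1.0 (high risk)
    liveness: float   # liveness score, 0.0 (likely synthetic) .. 1.0 (likely live)

def authentication_policy(scores: CallScores) -> str:
    """Return a policy decision the contact center platform could act on in real time."""
    if scores.liveness < 0.40:
        return "do_not_authenticate"      # likely synthetic voice: block and alert
    if scores.risk > 0.70:
        return "step_up_verification"     # live voice but risky call: ask for more proof
    if scores.liveness > 0.85 and scores.risk < 0.30:
        return "authenticate_and_enroll"  # trusted call: authenticate and enroll
    return "authenticate"

print(authentication_policy(CallScores(risk=0.10, liveness=0.95)))  # authenticate_and_enroll
print(authentication_policy(CallScores(risk=0.20, liveness=0.25)))  # do_not_authenticate
```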

Final Thoughts: Your Top Q’s Answered on Deepfake Prevention

“Our customers see synthetic identity as the top trend and threat to any individual and company in the future,” says Elie. It’s important to know that the person on the other end of the line is exactly who they say they are.

If you are interested in learning more about how Pindrop works to detect deepfakes in real time, request a demo with one of our reps for more information.

WEBINAR

Fortify your Business Against Deepfakes

Deepfakes, capable of imitating anyone’s voice with startling accuracy, have become a significant threat to businesses and consumers. Malicious actors use deepfakes to impersonate real customers, leading to increased fraud, data breaches, and damage to brand reputation.

As these attacks grow in frequency, advanced fraud detection technologies, including multi-factor authentication and real-time liveness detection, are essential to protect against the damaging effects of deepfakes. Join Elie Khoury and Amit Gupta of Pindrop for an executive fireside chat on how deepfakes are impacting your business and how to protect your contact center and customers.

  • The top five ways deepfakes are a threat to your business
  • How you can stay ahead of deepfake attacks with Pindrop’s liveness detection
  • How you can create more trust with your customer base

Your expert panel

Elie Khoury, Ph.D.

VP, Research, Pindrop

Amit Gupta

VP, Product Management, Research & Engineering, Pindrop

In the rapidly evolving landscape of cyber threats, the escalating prominence of deepfake attacks demands a comprehensive technical response. As organizations grapple with the complexities of this digital deception, a proactive approach becomes paramount.

1,200%. That’s the increase in deepfakes among all fraud types in the US in the first quarter of 2023. We’re seeing this play out in highly publicized examples, such as scammers using AI to target individuals or a fraudster stealing over $600k by impersonating the target’s friend. Senator Brown’s email to the top six US banks put deepfakes on the agenda of Chief Risk Officers and the boards of all financial institutions. Our research indicates that while synthetic content is already present in call centers, it is not yet rampant.

Deepfakes are a new topic for most risk, compliance, and security teams, as well as for call center operations and technology teams. The topic was added to their agendas in early 2023, and these teams still have more questions than answers about the technology behind deepfakes, its real risk, and the right approach to addressing it. To help our customers navigate internal discussions on deepfakes, Pindrop recently published an ebook, “What Executives Need to Know and Do about Deepfakes.” As a follow-up, we’re sharing learnings from a survey of 100+ executives and other meetings with executives responsible for risk, cybersecurity, and call center technology at 40 top financial institutions, retailers, and healthcare companies in the US.

1. Help your leadership team understand deepfake technology and its risks

Deepfake detection is a growing concern: 92% of respondents expressed interest in learning more about deepfakes. Banks, credit unions, and insurance firms are the most concerned, while interest among retailers and other industries is not yet as high. Deepfake detection has become a buying criterion for Pindrop customers who are evaluating new authentication platforms or planning to upgrade their existing authentication solutions. Customers using older legacy solutions, especially on-premises deployments, are exploring adding deepfake detection or migrating to cloud-based solutions like Pindrop that are designed to support it.

Source: Pindrop Client Forum Exchange, June 2023 survey of 100 executives across 40 US financial institutions, health care providers and retailers

The interest in deepfakes is often driven by the Chief Risk Officer or the CISO. Call center technology teams are often asked by risk teams to prepare briefings and perform initial assessments of the readiness of their authentication and fraud systems. While deepfake protection was not part of the 2023 operating plan, the interest from leadership is forcing call center product teams to respond quickly.

2. Aim to fortify defenses now, when the risk is still emerging

Generative AI has opened a Pandora’s box of impending deepfake threats. Presently, enterprise contact center authentication systems are not optimized for deepfake protection, specifically for distinguishing real human voices from synthetic or recorded ones. Pindrop incorporated deepfake detection into its Voice API solution in 2022, initially focusing on digital channels. However, the historically low prevalence of deepfakes in call center interactions meant that companies didn’t prioritize them. This scenario shifted dramatically with the advent of generative AI, voice cloning advancements, and increasingly sophisticated Text-to-Speech (TTS) systems. Consequently, companies are now acutely aware of the heightened deepfake risk, compelling them to enact risk mitigation strategies.

Source: Pindrop Client Forum Exchange, June 2023 survey of 100 executives across 40 US financial institutions, health care providers and retailers

The rapid evolution of generative AI, especially Text-to-Speech (TTS) systems, and the increased frequency of media reporting on deepfakes have changed that. Preparedness is top of mind for customers as they plan their deepfake defenses; they realize that the best time to address a vulnerability is before being attacked.

3. Assess deepfake risks holistically

Deepfakes need to be fought on multiple fronts. Account fraud and unauthorized access were the top two risks in customers’ minds.