Sarosh: As with any security system, the more information I have, the better positioned I am to fight the bad guys. One of our advantages is that we don't just have classifiers to detect malicious activity; we also have a voice biometric engine that knows who you are.
I now have a whole ton of data about who Elie is and what his voice sounds like, because he goes through our system and we positively identify him. The more calls and audio of Elie I have, the more data I have about him than a fraudster could ever gather.
Now I can ask not only whether the incoming voice is Elie, from a positive perspective, but also whether there are artifacts within the audio telling me that this is not Elie and that it's synthesized. It's about using both the good signals and the bad signals.
Elie: Basically, those tools, with their deep neural network systems, look at the spectro-temporal information in the audio (both high and low frequencies) that the human ear cannot perceive but an automated machine learning system can detect.
Detecting those artifacts in the high frequencies, along with the dynamics of the speech and its temporal changes, gives the machine learning system the cues it needs to say, "This is a deepfake" or "This is genuine speech."
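To make the idea concrete, here is a minimal, purely illustrative sketch of one crude spectral cue of the kind a detector might compute: the fraction of a signal's energy sitting in high frequencies. Real deepfake detectors feed full spectro-temporal representations into deep neural networks; the function name, cutoff, and toy signal below are all hypothetical, not the system described in the conversation.

```python
import numpy as np

def high_freq_energy_ratio(signal: np.ndarray, sample_rate: int,
                           cutoff_hz: float = 4000.0) -> float:
    """Fraction of spectral energy at or above cutoff_hz (illustrative cue only)."""
    # Power spectrum of the (real-valued) signal.
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = spectrum.sum()
    if total == 0:
        return 0.0
    return float(spectrum[freqs >= cutoff_hz].sum() / total)

# Toy usage: a pure 440 Hz tone has essentially no energy above 4 kHz,
# so its high-frequency ratio is near zero.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
ratio = high_freq_energy_ratio(tone, sr)
```

In practice a single scalar like this is far too weak on its own; it only gestures at the kind of high-frequency evidence that a trained model would weigh alongside many temporal and spectral features.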