OAKLAND–WhatsApp has enjoyed tremendous growth in the last couple of years, a trend that accelerated even further when the company announced it was implementing end-to-end encryption on its messaging service. But that rollout also raised a serious issue for the company: how to identify spammers without access to the contents of users’ messages.
Like most messaging services, WhatsApp has a function that allows users to report spammers directly in the app. That’s the main mechanism the company uses to identify abusive accounts. But it’s a manual process that relies on users to initiate it.
“We catch most spammers through reports. We’re not chasing after something that’s completely invisible. But how can we catch them sooner?” Matt Jones, a software engineer at WhatsApp, said during a talk at the Enigma conference here Wednesday.
To identify spammers quickly, WhatsApp needed a system that could do so without decrypting users’ messages. There are a few relatively fast checks, such as whether a potentially abusive account has a password set up. If it doesn’t, then the account wasn’t the target of an account takeover, Jones said. Likewise, if an account is using a script to send messages or is using a client other than the official WhatsApp client, those are strong signals that it’s likely a spam account.
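The fast checks Jones describes can be sketched roughly as follows. This is a minimal illustration, not WhatsApp's implementation; the field names (`has_password`, `client`, `scripted_sending`) are hypothetical.

```python
def quick_spam_signals(account: dict) -> list[str]:
    """Return the fast, cheap spam signals present on an account record."""
    signals = []
    # No password set means the account was not taken over, so any spam
    # is coming from whoever registered it in the first place.
    if not account.get("has_password"):
        signals.append("no_password_set")
    # Sending through a script is a strong indicator of a spam account.
    if account.get("scripted_sending"):
        signals.append("scripted_sending")
    # So is connecting with anything other than the official client.
    if account.get("client") != "official":
        signals.append("unofficial_client")
    return signals
```

A clean account (password set, official client, no scripted sending) yields no signals; a freshly registered bot account typically trips all three.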
“These are fast approaches, but they’re less effective as you go down the list,” Jones said.
So the company built a classifier that looks at a number of different attributes and decides whether an account is legitimate. For example, it takes into consideration the age of the account and how many messages it has sent in the last 30 seconds. A new account sending a large volume of messages is clearly suspicious.
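A toy rule in the spirit of those two features might look like the sketch below. The thresholds are invented for illustration; the real classifier weighs many attributes, not just these two.

```python
from datetime import datetime, timedelta

def looks_like_spam(created_at: datetime, send_times: list[datetime],
                    now: datetime,
                    max_burst: int = 50,
                    new_account_age: timedelta = timedelta(days=1)) -> bool:
    """Flag a brand-new account that blasts messages in a short window."""
    # Count messages sent in the last 30 seconds.
    recent = sum(1 for t in send_times if now - t <= timedelta(seconds=30))
    is_new = now - created_at < new_account_age
    # A day-old account sending dozens of messages in 30 seconds is
    # clearly suspicious; an established account doing the same may not be.
    return is_new and recent > max_burst
```

In a production system a decision like this would be one feature among many, combined by a learned model rather than hard-coded thresholds.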
“We either allow the account to continue or ban it completely. People ask why we don’t have some middle ground there. If you think you’re going to trick spammers by not delivering their messages, you’re wrong,” Jones said. “The people who get to this level are already sophisticated.”
WhatsApp also uses historical data to identify spammers’ behavior. After banning a spammer, Jones and his team look at all of the account’s past actions, label them as spam, and add them to the model. The company also looks at how an account sends messages and whether recipients respond to them. Everything WhatsApp does in this process is designed to avoid giving information to the spammers, so they don’t know how to counter the company’s moves.
“If we detect that you’re not running our app, we record that fact but we don’t action in order to avoid providing a feedback loop to the attackers,” Jones said. “We use it for machine learning.”
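That feedback loop can be sketched as below. The record shapes and function names are hypothetical; the point is that a ban retroactively relabels the account's history as training data, and that some detections are logged silently rather than acted on.

```python
def relabel_banned_history(events: list[dict], banned_ids: set,
                           training_set: list[dict]) -> list[dict]:
    """After a ban, fold the account's past actions into the training data."""
    for event in events:
        if event["account_id"] in banned_ids:
            # Everything the banned account did becomes a positive
            # (spam) training example for the model.
            training_set.append({"features": event["features"],
                                 "label": "spam"})
    return training_set

def record_unofficial_client(account_id: int, signal_log: list[dict]) -> None:
    """Record the detection without actioning it, denying attackers feedback."""
    signal_log.append({"account_id": account_id,
                       "signal": "unofficial_client"})
```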
The WhatsApp model also looks at the network infrastructure an account is using, such as the ASN it’s on, and then checks how many other accounts it has seen from that same ASN. If a high percentage of accounts from a given ASN have been banned, that’s an indicator that a new account from that ASN is a spammer.
“We look at the reputation of the things an attacker is using. It forces them to buy more things, so it raises the cost for them,” Jones said.
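The reputation check described above can be sketched like this. The `asn_stats` store, mapping an ASN to `(accounts_seen, accounts_banned)` counts, and the 50 percent threshold are both assumptions made for illustration.

```python
def asn_ban_rate(asn: int, asn_stats: dict) -> float:
    """Fraction of accounts seen from this ASN that were later banned."""
    seen, banned = asn_stats.get(asn, (0, 0))
    return banned / seen if seen else 0.0

def asn_is_risky(asn: int, asn_stats: dict, threshold: float = 0.5) -> bool:
    # A high ban rate for an ASN makes any new account arriving from it
    # more suspicious, forcing attackers to keep buying fresh
    # infrastructure and raising their costs.
    return asn_ban_rate(asn, asn_stats) >= threshold
```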
The model helped WhatsApp reduce the amount of spam on its service by 75 percent in the three months after launching end-to-end encryption.