Low Entropy

Fraud, Abuse, Fingerprinting, Privacy, and Openness

Fraud and abuse online are pretty serious problems. How sites manage fraud is something of a mystery to most people. Indeed, as this post will show, that’s deliberate.

This post provides an outline of how fraud management operates. It looks at the basic techniques that are used and the challenges involved. In doing so, it explores the tension between fraud management and privacy.

Hopefully this post helps you understand why fingerprinting is bad for privacy; why you should nevertheless be happy that your bank is fingerprinting you; and, why efforts to replace fingerprinting are unlikely to change anything.

Fraud and abuse are a consequence of the way the Web works. Recognizing that these are a part of the cost of a Web that values privacy, openness, and equity is hard, but I can’t see a better option.

What sorts of fraud and abuse?

This post concentrates on fraud or abuse conducted using online services: mostly Web-based services, though mobile apps and similar services raise much the same concerns.

The sorts of fraud and abuse of most interest are those that operate at scale. One-off theft needs different treatment. Click fraud in advertising is a good example: a site seeks to convince advertisers that ads have been shown to people in order to get more money. Click fraud is a constant companion to the advertising industry, and one that is unlikely to ever go away. Managing it is an important part of participating in advertising, and something that affects everyone who uses online services.

Outside of advertising, fraud management techniques[1] are also used to manage the risk of fake accounts that are created for fraud or abuse purposes. Online stores and banks also use fraud management as part of an overall strategy for managing the risk of payment fraud or theft.

This is a very high-level overview, so most of this document applies equally to lots of different fraud and abuse scenarios. Obviously, each situation will be different, but I’m glossing over the details.

Understanding online fraud and abuse

Let’s say that you have a site that makes some information or service available. This site will attract clients, which we can split into two basic groups: clients that the site wants to serve, and clients that the site does not want to serve.

The attacker in this model seeks to access the service for some reason. In order to do so, the attacker attempts to convince sites that they are a real client.

For click fraud, a site might seek to convince its advertising partners that ads were shown to real people. The goal is to convince the advertiser to pay the fraudulent site more money. Sophisticated click fraud can also involve faking clicks or ad conversions in an effort to falsely convince the advertiser that the ads on the fraudulent site are more valuable because they appear to drive sales.

An adversary rarely gains much from a single instance of fraud. They will often seek to automate fraud, accessing the service as many times as possible. Fraud at scale can be very damaging, but operating at scale also makes the fraud easier to detect.

Automation allows fraud to be conducted at scale, but it also creates telltales: signals that allow an attack to be recognized.

Detection

Detection is the first stage for anyone looking to defeat fraud or abuse. To do that, site operators will look for anomalies of any sort. Maybe the attack will appear as an increase in incoming requests or a repetitive pattern of accesses.

Repetition might be a key to detecting fraud. An attacker might try to have their attacks blend in with real humans that are also accessing the system. An attacker’s ability to mimic human behaviour is usually limited, as they often hope to execute many fraudulent transactions. Attackers have to balance the risk that they are detected against the desire to complete multiple actions before they are detected.
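
To make that concrete, here is a minimal sketch of the simplest kind of detection: counting requests per client and flagging outliers. The request log, threshold, and client identifiers are all invented for illustration; real detection pipelines draw on far richer signals.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical request log: (client_id, path) pairs.
requests = [
    ("a", "/buy"), ("b", "/home"), ("c", "/buy"), ("a", "/home"),
    ("d", "/buy"), ("d", "/buy"), ("d", "/buy"), ("d", "/buy"),
    ("d", "/buy"), ("d", "/buy"), ("e", "/home"), ("b", "/buy"),
]

# Count requests per client, then flag clients whose volume stands out.
counts = Counter(client for client, _ in requests)
mu, sigma = mean(counts.values()), stdev(counts.values())

def is_anomalous(client: str, threshold: float = 1.5) -> bool:
    """Flag clients whose request count sits well above the average."""
    return sigma > 0 and (counts[client] - mu) / sigma > threshold

print([c for c in counts if is_anomalous(c)])  # ['d']: one client repeating an action
```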

Detecting fraud and abuse relies on a range of techniques. Anti-fraud people generally keep details of their methods secret, but we know that they use both automated and manual techniques.

Of course, attackers are also constantly trying to adapt their techniques to evade detection. Detecting an attack can take time.

Identification/classification

It is not enough to know that fraud is occurring. Once recognized, the pattern of fraudulent behaviour needs to be classified, so that future attacks can be recognized.

As noted, most fraud is automated in some way. Even when humans are involved, operating at any significant scale means working to a script. Whether executed by machines or humans, that script will be designed to evade existing defenses. Careful scripting of this kind tends to produce patterns, and if a pattern can be found, attempts at fraud can be distinguished from genuine attempts by people to visit the site.

Patterns in abuse manifest in one of two ways:

  1. Common software. If attackers only use a specific piece of hardware or software, then any common characteristics might be revealed by fingerprinting. Even if the attacker varies some characteristics (like the User-Agent header or similar obvious things), other characteristics might stay the same, which can be used to recognize the attack. This is why browser fingerprinting is a valuable tool for managing fraud.

  2. Common practices. Software or scripted interaction can produce fixed patterns of behaviour that can be used to recognize an attempted attack. Clues might exist in the timing of actions or the consistency of interaction patterns. For instance, automated fraud might not exhibit the sorts of variance in mouse movements that a diverse set of people could.

The script that is followed by an attacker might try to vary some of these things. However, unless the attack script can simulate the sort of diversity that real people exhibit – which is unlikely – any remaining common patterns can be used to identify likely attempts at fraud.
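
As a rough illustration of both kinds of pattern, the sketch below hashes a handful of stable client characteristics into a fingerprint and checks whether the timing between actions is suspiciously uniform. The attribute names, jitter threshold, and sample values are all made up; real systems use many more signals and far more careful statistics.

```python
import hashlib
from statistics import pstdev

STABLE_KEYS = ("screen", "timezone", "fonts")  # illustrative; real lists are much longer

def fingerprint(attrs: dict) -> str:
    """Hash the characteristics an attacker is least likely to vary."""
    material = "|".join(f"{k}={attrs.get(k)}" for k in STABLE_KEYS)
    return hashlib.sha256(material.encode()).hexdigest()[:16]

def looks_scripted(delays_ms: list[float], min_jitter_ms: float = 20.0) -> bool:
    """Unnaturally uniform timing between actions is one telltale of automation."""
    return len(delays_ms) > 3 and pstdev(delays_ms) < min_jitter_ms

# The attacker rotates the obvious User-Agent header, but everything else matches.
session_a = {"user_agent": "Browser/1.0", "screen": "1920x1080", "timezone": "UTC+2", "fonts": 312}
session_b = {"user_agent": "Browser/2.0", "screen": "1920x1080", "timezone": "UTC+2", "fonts": 312}
print(fingerprint(session_a) == fingerprint(session_b))  # True: the rotation did not help

# Scripted clicks arrive with almost no jitter; people are messier.
print(looks_scripted([250.0, 252.0, 249.0, 251.0, 250.0]))  # True
print(looks_scripted([180.0, 420.0, 260.0, 530.0, 310.0]))  # False
```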

Once a pattern is established, future attempts can be recognized. Also, if enough information has been recorded from past interactions, previously undetected fraud might now be identifiable.

Learned patterns can sometimes be used on multiple sites. If an attack is detected and thwarted on one site, similar attacks on other sites might be easier to identify. Fraud and abuse detection services that operate across many sites can therefore be very effective at detecting and mitigating attacks on multiple sites.

Fingerprinting and privacy

Browser makers generally regard browser fingerprinting as an attack on user privacy. The fingerprint of a browser is consistent across sites in ways that are hard to control. Browsers can have unique or nearly-unique fingerprints, which means that people can be effectively identified and tracked using the fingerprint of their browser, against their wishes or expectations.

Fingerprinting used this way undermines controls that browsers use to maintain contextual integrity. Circumventing these controls is unfortunately widespread. Services exist that offer “cookie-less tracking” capabilities, which can include linking cross-site activity using browser fingerprinting or “primary identifiers”[4].

Fingerprinting options in browsers continue to evolve in two directions:

  1. New Web capabilities expand the set of characteristics that a site might observe and fold into a fingerprint.

  2. At the same time, browsers work to reduce what fingerprinting can learn, by making observable behaviour more uniform and by gating access to the most revealing features.

Though these efforts often pull in different directions, the general trend is toward reduced effectiveness of fingerprinting. Browsers are gradually becoming more homogenous in their observable behaviour despite the introduction of new capabilities. New features that might be used for fingerprinting tend not to be accessible without active user intervention, making them far less reliable as a means of identification. Existing rich sources of fingerprinting information – like plugin or font enumeration – will eventually be far more limited.

Reductions in the effectiveness of fingerprinting are unlikely to ever result in every browser looking identical. More homogenous browser fingerprints make the set of people who share a fingerprint larger, which in turn reduces the odds that a site can successfully reidentify someone using their fingerprint.

Reduced effectiveness of fingerprinting might limit the ability of sites to distinguish between real and abusive activity. This places stronger reliance on other signals, like behavioural cues. It might also mean that additional checks are needed to discriminate between suspicious and wanted activity, though this comes with its own hazards.

Even when fingerprinting is less useful, fingerprints can still help in managing fraud. Though many users might share the same fingerprint, additional scrutiny can be reserved for those browsers that share a fingerprint with the attacker.
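
A toy example of what that looks like in practice: group clients by fingerprint into anonymity sets, note that a shared fingerprint only gives 1-in-k odds of a correct match, and reserve extra checks for the clients in the attacker's set. The identifiers and fingerprints here are invented.

```python
from collections import defaultdict

# Hypothetical mapping of client id -> observed fingerprint.
observed = {
    "u1": "fp-a", "u2": "fp-a", "u3": "fp-a", "u4": "fp-b",
    "u5": "fp-b", "u6": "fp-c", "attacker": "fp-b",
}

# Group clients into anonymity sets: everyone who shares a fingerprint.
sets = defaultdict(set)
for client, fp in observed.items():
    sets[fp].add(client)

# A shared fingerprint only says a client *might* be the attacker.
# The larger the set, the weaker the evidence (1-in-k odds of a correct match).
suspect_fp = observed["attacker"]
candidates = sets[suspect_fp]
print(len(candidates), 1 / len(candidates))  # 3 candidates, 1-in-3 odds

def needs_extra_checks(client: str) -> bool:
    """Reserve additional scrutiny for clients that share the attacker's fingerprint."""
    return observed.get(client) == suspect_fp

print(needs_extra_checks("u4"), needs_extra_checks("u6"))  # True False
```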

Mitigation strategies

Once a particular instance of fraud is detected and a pattern has been established, it becomes possible to mitigate the effects of the attack. This can involve some difficult choices.

Because detecting fraud is difficult, sites often tolerate extensive fraud before they are able to start mitigating it. Classification takes time and can be error-prone. Furthermore, sites don’t want to annoy their customers by falsely accusing them of fraud.

Stringing attackers along

Tolerance of apparent abuse can have other positive effects. A change in how a site reacts to attempted abuse might tip an attacker off that their method is no longer viable. To that end, a site might allow abuse to continue, without any obvious reaction[5].

A site that reacts to fraud in obvious ways will also reveal when fraud has escaped detection. This can be worse, as it allows an attacker to learn when their attack was successful. Tolerating fraud attempts deprives the attacker of immediate feedback.

Delaying the obvious effects of mitigation allows abuse detection to remain effective for longer. Similarly, providing feedback about abuse in the aggregate might prevent an attacker from learning when specific tactics were successful. Attackers that receive less feedback or late feedback cannot adapt as quickly and so are able to evade detection for a smaller proportion of the overall time.

Addressing past abuse

A delayed response depends on being able to somehow negate or mitigate the effect of fraud from the past. This is also helpful where instances of fraud or abuse previously escaped detection.

For something like click fraud, the effect of fraud is often payment, which is not immediate. The cost of fraud can be effectively managed if it can be detected before payment comes due. The advertiser can refuse to pay for fraudulent ad placements and disqualify any conversions that are attributed to them. The same applies to credit card fraud, where settlement of payments can be delayed to allow time for fraudulent patterns to be detected.
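
In code, the settlement-time step might look something like the sketch below: impressions that were later flagged by fraud review are simply excluded before any payout is calculated. The record structure, flag, and price are illustrative assumptions, not any particular ad platform's API.

```python
from dataclasses import dataclass

@dataclass
class Impression:
    site: str
    flagged_as_fraud: bool  # result of classification that ran after the fact

# Because payment is settled later, classifications made days after the traffic
# arrived can still be applied before any money changes hands.
impressions = [
    Impression("honest.example", False),
    Impression("honest.example", False),
    Impression("shady.example", True),
    Impression("shady.example", True),
    Impression("shady.example", False),
]

PRICE_PER_IMPRESSION = 0.002  # illustrative rate

def payable(batch: list[Impression]) -> float:
    """Only impressions that survived fraud review are paid out at settlement."""
    return sum(PRICE_PER_IMPRESSION for i in batch if not i.flagged_as_fraud)

print(round(payable(impressions), 4))  # 0.006: the two flagged impressions are never paid
```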

It is not always possible to retroactively mitigate fraud or delay its effect. Sites can instead require additional checks or delays. These might not deprive an attacker of feedback on whether their evasive methods were successful, but changes in response could thwart or slow attacks.

Security by obscurity

As someone who works in other areas of security, this overall approach to managing fraud seems very … brittle.

Kerckhoffs’s principle – which guides the design of most security systems – says that a system should depend only on keeping its key secret, not on keeping the details of its design secret. A design that is public knowledge can be analysed and improved upon by many. Relying on keeping the details of the system secret, known as security by obscurity, is considered bad form and usually indicative of a weak design.

Here, security assurances rely very much on security by obscurity. Detecting fraud depends on spotting patterns, then building ways of recognizing those patterns. An attacker that can avoid detection might be able to conduct fraud with impunity. That is, the system of defense relies on techniques so fragile that knowledge of their details would render them ineffectual.

Is there hope for new tools?

There are some technologies that offer some hope of helping manage fraud and abuse risk. However, my expectation is that these will only support existing methods.

Any improvements these might provide are unlikely to result in changes in behaviour. Anything that helps attackers avoid detection will be exploited to the maximum extent possible; anything that helps defenders detect fraud or abuse will just be used to supplement existing information sources.

Privacy Pass

Privacy Pass offers a way for sites to exchange information about the trustworthiness of their visitors. If one site decides that someone is trustworthy, it can give the browser an anonymous token. The browser can then present that token to other sites as a signal of that trust.

Ostensibly, Privacy Pass tokens cannot carry information: only the presence (or absence) of a token means anything. A browser might be told that the token means “trustworthy”, but it could mean anything[6]. That means that the token issuer needs to be trusted.
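
The sketch below shows only the shape of that exchange: one site issues a token, another verifies it and learns nothing beyond “the issuer vouched for this client”. It deliberately omits the blinded-token cryptography that real Privacy Pass deployments use to keep issuance and redemption unlinkable, so treat it as a diagram in code rather than a faithful implementation.

```python
# Structural sketch only: real Privacy Pass uses blinded tokens (VOPRF or blind RSA)
# so the issuer cannot link issuance to redemption. That is elided here, so this
# toy is NOT unlinkable; it only shows who talks to whom.
import hmac, hashlib, secrets

ISSUER_KEY = secrets.token_bytes(32)  # held by the issuer and trusted verifiers

def issue_token() -> tuple[bytes, bytes]:
    """The issuer decides the client is trustworthy and hands back a token."""
    nonce = secrets.token_bytes(16)
    tag = hmac.new(ISSUER_KEY, nonce, hashlib.sha256).digest()
    return nonce, tag

def redeem_token(nonce: bytes, tag: bytes) -> bool:
    """A different site checks the token. It learns only that the issuer vouched
    for this client; the token itself carries no attributes."""
    expected = hmac.new(ISSUER_KEY, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

token = issue_token()        # site A decides the visitor looks legitimate
print(redeem_token(*token))  # site B accepts the issuer's word: True
```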

How a site determines whether to provide a token also has consequences. Take Apple’s Private Access Tokens, which are supposed to signal that the browser is trustworthy, but really carry a cryptographically backed assertion that the holder has an Apple device. For sites looking to find a lucrative advertising audience, this is a strong indicator that a visitor is wealthy enough to afford Apple hardware. That is bankable information.

This is an example of how the method used to decide whether to provide a token can itself leak information. To protect this information, a decent proportion of tokens needs to be issued using alternative methods.

We also need to ensure that sites do not become overly reliant on tokens. Otherwise, people who are unable to produce a token could find themselves unable to access services. People routinely fail to convince computers of their status as a human for many reasons[7]. Clients might be able to withhold some proportion of tokens so that sites might learn not to become dependent on them.
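
One way a client could do that is to withhold tokens at random, so that the absence of a token never becomes proof of bad intent. The rate below is an arbitrary assumption, purely for illustration.

```python
import random

WITHHOLD_RATE = 0.1  # arbitrary; enough that sites cannot assume every honest client has a token

def maybe_present_token(has_token: bool) -> bool:
    """Occasionally decline to present a token even when one is available,
    so that a missing token is never strong evidence of bad intent."""
    if not has_token:
        return False
    return random.random() >= WITHHOLD_RATE

# Over many requests, roughly 10% of honest clients show up without a token.
sample = sum(maybe_present_token(True) for _ in range(10_000))
print(sample)  # usually close to 9,000
```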

If these shortcomings are addressed somehow, it is possible that Privacy Pass could help sites detect or identify fraud or abuse. However, implementing the safeguards necessary to protect privacy and equitable access is not easy. It might not even be worth it.

Questionable options

Google have proposed an extension to Privacy Pass that carries secret information. The goal here is to allow sites to rely on an assessment of trust that is made by another site, but not reveal the decision to the client. All clients would be expected to retrieve a token and proffer one in order to access the service. Suspicious clients would be given a token that secretly identifies them as such.

This would avoid revealing to clients that they have been identified as potentially fraudulent, but it comes with two problems:

  1. Any determination would only be based on information available to the site that provides the token. The marking would be less reliable as a result, being based only on the client identity or browser fingerprint[8]. Consequently, any such marking would not be directly usable; it would need to be combined with other indicators, like how the client behaves.

  2. Clients that might be secretly classified as dishonest have far less incentive to carry a token that might label them as such.

The secret bit also carries information, which – again – could mean anything. Anything like this would need safeguards against privacy abuse by token providers.

Google have also proposed Web Environment Integrity, which seeks to suppress diversity of client software. Eric Rescorla has a good explanation of how this sort of approach is problematic. Without proper safeguards, the same concerns apply to Apple’s Private Access Tokens.

The key insight for me is that all of these technologies risk placing restrictions on how people access the Web. Some more than others. But openness is worth protecting, even if it does make some things harder. Fraud and abuse management are in some ways a product of that openness, but so is user empowerment, equity of access, and privacy.

Summary

It seems unlikely that anything is going to change. Those who want to commit fraud will continue to try to evade detection and those who are trying to stop them will try increasingly invasive methods, including fingerprinting.

Fraud and abuse are something that many sites contend with. There are no easy or assured methods for managing fraud or abuse risk. Defenders look for patterns, in both client characteristics and behaviour. Fingerprinting browsers this way can have poor privacy consequences. Concealing how attacks are classified is the only way to ensure that attackers do not adapt their methods to avoid protections. New methods for classification might help, but they create new challenges that will need to be managed.

Fraud is here to stay. Fingerprinting too. I wish that I had a better story to tell, but this is one of the prices we pay for an open Web.


  1. I’m not comfortable using the more widely used “anti-fraud” term here. It sounds too definite, as if to imply that fraud can be prevented perfectly. Fraud and abuse can be managed, but not so absolutely. ↩︎

  2. This story has been widely misreported (see Schneier, The Register, and Slashdot). These articles cite a recent study from UC Irvine, which cites a study from 2014 that applies to a largely defunct CAPTCHA method. CAPTCHA fans might hold out some hope, though maybe the rest of us would be happy to never see another inane test. ↩︎

  3. There is a whole industry around the scalping of limited run sneakers, to the point that there are specialist cloud services that boast extra low latency access to the sites for major sneaker vendors. ↩︎

  4. Think email addresses or phone numbers. Services that collect these like to pretend that the practice is privacy respecting, but collecting primary identifiers often involves deceptive practices, such as making access to a service conditional on providing a phone number. ↩︎

  5. It is widely believed that, during the Second World War, the British chose not to act on intelligence gained from breaking Enigma codes. No doubt the Admiralty did exercise discretion in how it used the information it gained, but the famous case of the bombing of Coventry in November 1940 was not one of these instances. ↩︎

  6. It could be bad if tokens had something to say about the colour of a person’s skin or their gender identity. There are more bad uses than good ones for these tokens. ↩︎

  7. Finally, a good reason to cite the study mentioned previously. ↩︎

  8. A fingerprint could be re-evaluated on the other site without using a token, so that isn’t much help. ↩︎