<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Low Entropy</title>
  <subtitle>Standard Considerations</subtitle>
  
  <link href="https://lowentropy.net/feed/feed.xml" rel="self"/>
  <link href="https://lowentropy.net/"/>
  <updated>2026-02-04T00:00:00Z</updated>
  <id>https://lowentropy.net/</id>
  <author>
    <name>Martin Thomson</name>
    <email>mt@lowentropy.net</email>
  </author>
  
  <entry>
    <title>Versioning JSON for APIs</title>
    <link href="https://lowentropy.net/posts/versioning-json/"/>
    <updated>2026-02-04T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/versioning-json/</id>
    <content type="html">&lt;p&gt;I see this often.
Someone comes out with a new protocol.
Almost invariably, the first examples look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;{
  &amp;quot;version&amp;quot;: &amp;quot;1.0&amp;quot;,
  &amp;quot;some&amp;quot;: [&amp;quot;protocol&amp;quot;, &amp;quot;stuff&amp;quot;, &amp;quot;...&amp;quot;]
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I’m sorry to say that my first reaction is 🤦.&lt;/p&gt;
&lt;p&gt;It’s that version field.
Version fields like that are close to useless for versioning.&lt;/p&gt;
&lt;p&gt;This post explains why.&lt;/p&gt;
&lt;h2 id=&quot;what-can-you-do-with-a-version-field%3F&quot;&gt;What can you do with a version field?&lt;/h2&gt;
&lt;p&gt;What you concretely do with a version field
is rarely documented well in specifications for new protocols.&lt;/p&gt;
&lt;p&gt;The semantic versioning model has the recipient of the document
check that it “understands” the version
according to &lt;a href=&quot;https://semver.org/&quot;&gt;semver rules&lt;/a&gt;.
In those rules, you might have something like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Each major version will have dramatically different handling,
so any major version you aren’t prepared to handle
is an error.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Minor versions can add features,
so you might have some minimum value for minor version,
but only if there are features in that version you depend on.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Patch versions are rarely signaled in protocols,
because they shouldn’t affect compatibility.
If patch version information is available,
its only real use is to work around specific implementation bugs.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In general then,
the recipient of the version field checks it.
If it checks out, they proceed;
if it is not supported, they abort.&lt;/p&gt;
&lt;p&gt;Aborting is safe, but not &lt;em&gt;useful&lt;/em&gt;.&lt;/p&gt;
&lt;h3 id=&quot;disagree-and-abort&quot;&gt;Disagree and abort&lt;/h3&gt;
&lt;p&gt;Fundamentally, these sorts of version checks are a safeguard
against a disagreement about which protocol you are speaking.&lt;/p&gt;
&lt;p&gt;A disagreement about the protocol is quite bad.
Any disagreement is highly likely to lead to bugs.
There’s a good chance those are going to be security-relevant bugs.&lt;/p&gt;
&lt;p&gt;If you are especially unlucky,
things will continue to work for a lot of people.
That might hide the problem for a while.&lt;/p&gt;
&lt;p&gt;If you are managing the evolution of a protocol
having a peer abort when there is a disagreement
about which protocol is in use
is sort of the minimum-viable protection.&lt;/p&gt;
&lt;p&gt;The possibility of an abort is all that a version field like this can deliver.
It does not help you evolve the protocol.&lt;/p&gt;
&lt;h3 id=&quot;aborting-is-not-a-migration-plan&quot;&gt;Aborting is not a migration plan&lt;/h3&gt;
&lt;p&gt;Fast-forward to the point where you have a new version to roll out.
Version 2.0 adds a bunch of shiny, new, and incompatible features.&lt;/p&gt;
&lt;p&gt;So you tell your server to start talking the new version:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;{
  &amp;quot;version&amp;quot;: &amp;quot;2.0&amp;quot;,
  &amp;quot;some&amp;quot;: {&amp;quot;new&amp;quot;: &amp;quot;protocol&amp;quot;, &amp;quot;stuff&amp;quot;: [&amp;quot;...&amp;quot;]}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Congratulations, you avoided the bugs and security nightmare.
Also, all your existing clients have stopped working.&lt;/p&gt;
&lt;p&gt;That’s not a migration plan,
that’s a plan for future headaches.&lt;/p&gt;
&lt;p&gt;The only way to avoid those headaches
is to design a migration strategy into the initial version.
That means having a way to get off that initial version
onto the next version.&lt;/p&gt;
&lt;h2 id=&quot;incremental-additions-don%E2%80%99t-need-versions&quot;&gt;Incremental additions don’t need versions&lt;/h2&gt;
&lt;p&gt;So the first step is acknowledging that –
especially with JSON –
you probably already have a simple way to add features.&lt;/p&gt;
&lt;p&gt;One of the greatest JSON features
was never formalized.
It is the ability to add new members to objects/structs/dictionaries.
It’s rarely written down,
but the fact that most software ignores anything it doesn’t understand
is amazingly powerful&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/versioning-json/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;This is the best and easiest way to evolve a JSON format.
This doesn’t require any version signaling.
You don’t need to update the minor version number for new features,
you just add the new things you need.
As long as old software that ignores the new stuff continues to work,
you can add as much as you like.&lt;/p&gt;
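&lt;p&gt;For example, a newer document might carry an extra member
alongside the original content.
(The &amp;quot;shiny&amp;quot; member here is purely illustrative.)
Old software that ignores unknown members keeps working,
while new software can act on the addition when it is present:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;{
  &amp;quot;some&amp;quot;: [&amp;quot;protocol&amp;quot;, &amp;quot;stuff&amp;quot;, &amp;quot;...&amp;quot;],
  &amp;quot;shiny&amp;quot;: {&amp;quot;new&amp;quot;: &amp;quot;feature&amp;quot;}
}
&lt;/code&gt;&lt;/pre&gt;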
&lt;p&gt;That approach is exactly like minor versioning,
except that you don’t need to signal the minor version.
Better still, implementations can use the presence or absence of new members,
rather than the version number, to decide if things are OK,
which can lead to things working more often than otherwise.&lt;/p&gt;
&lt;p&gt;Signaling of a minor version number therefore becomes utterly pointless.&lt;/p&gt;
&lt;h2 id=&quot;big-changes-don%E2%80%99t-need-versions-either&quot;&gt;Big changes don’t need versions either&lt;/h2&gt;
&lt;p&gt;Major changes that might need to be rejected by old software
are best avoided.
Still, there can come a time when incremental feature additions
have stretched the format –
and the code that handles it –
so much that you need a clean break.&lt;/p&gt;
&lt;p&gt;At that point,
you might consider leaving older software behind
and starting over with a completely redesigned format.&lt;/p&gt;
&lt;p&gt;A version field inside the format
can stop the old software from trying to use your new stuff.
Well, that assumes that old software bothers
to check the version field that was lying around unused for years.
Some won’t and that will be fun.&lt;/p&gt;
&lt;p&gt;Either way, the best case for that
is a future where you invent increasingly complex methods
for limiting how many interactions abort.&lt;/p&gt;
&lt;p&gt;A better approach is pretty straightforward:
make the version switch at a higher layer.
In many cases, the ability to switch
is already part of the systems you are using.&lt;/p&gt;
&lt;h2 id=&quot;higher-level-switches-work&quot;&gt;Higher-level switches work&lt;/h2&gt;
&lt;p&gt;In a lot of cases, putting the new format at a new URL is the best option.
It’s easy, cheap, and gives you a bunch of really interesting options
for managing the evolution of implementations and deployments.&lt;/p&gt;
&lt;p&gt;If new clients can be configured with the new URL,
that’s going to be much easier for all involved.&lt;/p&gt;
&lt;p&gt;In cases where the location of endpoints is part of the protocol,
a new field can be added to include alternative URLs.
For example:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;{
  &amp;quot;url&amp;quot;: &amp;quot;https://example.com/the/old/location&amp;quot;,
  &amp;quot;urlv2&amp;quot;: &amp;quot;https://example.com/the/new/location&amp;quot;,
  &amp;quot;...&amp;quot;: {}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This moves the version migration problem
so that it uses a well-established method for adding features.
What was a breaking change for the format
is now a minor feature addition in a different part of the system.
The hard problem has transformed into an easy one.&lt;/p&gt;
&lt;p&gt;Prefer extension points that are already in use for other purposes.
Making use of fewer, well-tested extension points
is a major lesson of &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc9170#section-4.1&quot;&gt;RFC 9170&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;in-http&quot;&gt;In HTTP&lt;/h3&gt;
&lt;p&gt;Just for completeness,
here are some ways you can do higher-level switching with HTTP.
After all, a lot of these cases involve HTTP at some level.&lt;/p&gt;
&lt;p&gt;The high-level switching pattern can be used in HTTP header fields:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-http&quot;&gt;My-App: &amp;quot;https://example.com/the/old/location&amp;quot;
My-App-v2: &amp;quot;https://example.com/the/new/location&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Or maybe:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-http&quot;&gt;My-App: url=&amp;quot;...old/location&amp;quot;, url-v2=&amp;quot;...new/location&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The same applies to anywhere you can make that switch,
but it especially applies to places
where you have easy and well-used extension points already.&lt;/p&gt;
&lt;p&gt;HTTP also offers content negotiation,
which enjoys uneven recognition among practitioners.
Still, it can be a place to use that higher-level switching practice.&lt;/p&gt;
&lt;p&gt;The advantage of content negotiation
is that you can use the same URL as before.&lt;/p&gt;
&lt;p&gt;To use content negotiation, your format is given a media type.
Your new format is given a new and different media type.
The HTTP &lt;code&gt;Accept&lt;/code&gt; header field is populated by clients
and the server chooses the format it prefers from that set.
The choice of format is conveyed using &lt;code&gt;Content-Type&lt;/code&gt; in the response.&lt;/p&gt;
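&lt;p&gt;As a sketch,
with &lt;code&gt;application/example+json&lt;/code&gt; and &lt;code&gt;application/example-v2+json&lt;/code&gt;
standing in as hypothetical media types,
the exchange might look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-http&quot;&gt;GET /resource HTTP/1.1
Accept: application/example+json, application/example-v2+json

HTTP/1.1 200 OK
Content-Type: application/example-v2+json
&lt;/code&gt;&lt;/pre&gt;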
&lt;p&gt;Requests can also use content negotiation,
though this costs a round trip if you guess wrong.
Switching URLs is a better way to manage migrating the formats that clients produce.&lt;/p&gt;
&lt;p&gt;Don’t be tempted to put a version attribute on the media type;
just define an entirely new one.
It’s far easier that way.
Content negotiation works best by selecting from a list;
attributes require special handling
that won’t be automatically managed by servers.
Also, attributes are often stripped or ignored,
which will cause them to fail when you need them.&lt;/p&gt;
&lt;h3 id=&quot;fun-times-with-ipv6&quot;&gt;Fun times with IPv6&lt;/h3&gt;
&lt;p&gt;The IPv6 migration is a great object lesson here.
IP uses an in-band version indicator:
the first four bits of the IP packet.&lt;/p&gt;
&lt;p&gt;The hope during IPv6 development was that this version indication would be enough.
Routers would drop IPv6 packets until they were taught IPv6.&lt;/p&gt;
&lt;p&gt;In practice, that failed.
Ethernet now has a distinct code (or EtherType) for IPv6&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/versioning-json/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;.
That transformed a hard migration –
teaching routers not to choke on IPv6 packets –
into one they already managed gracefully.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The &lt;a href=&quot;https://w3ctag.github.io/design-principles/#dictionaries-for-configuration&quot;&gt;web platform design principles&lt;/a&gt; do say something.
We recently updated this language.
“Dictionaries, because of how they are treated by user agents, are also relatively future-proof.
Dictionary members that are not understood by an implementation are ignored.
New members therefore can be added without breaking older code.” &lt;a href=&quot;https://lowentropy.net/posts/versioning-json/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I realize that I’m potentially getting higher and lower confused
when talking about higher layers conceptually
or in networking stacks. &lt;a href=&quot;https://lowentropy.net/posts/versioning-json/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>The Hacklore Letter and Privacy</title>
    <link href="https://lowentropy.net/posts/hacklore-privacy/"/>
    <updated>2025-12-11T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/hacklore-privacy/</id>
    <content type="html">&lt;p&gt;Before I start, go and read &lt;a href=&quot;https://www.hacklore.org/letter&quot;&gt;https://www.hacklore.org/letter&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;When it comes to endpoint security,
unless you are operating in the &lt;a href=&quot;https://www.usenix.org/system/files/1401_08-12_mickens.pdf&quot;&gt;“Mossad”&lt;/a&gt; threat model&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;,
this is solid advice.
The letter is absolutely right that the advice we used to give people about operational security practices has not aged well.&lt;/p&gt;
&lt;p&gt;However, completely rejecting some of the defunct advice might come with privacy costs.&lt;/p&gt;
&lt;p&gt;The letter’s authors seem to have given up on online privacy, which disappoints me greatly.
Privacy nihilism isn’t really a healthy attitude and it has tainted the advice.&lt;/p&gt;
&lt;h2 id=&quot;the-good-parts&quot;&gt;The Good Parts&lt;/h2&gt;
&lt;p&gt;Let’s discharge the obviously good stuff.
Items 1 (Avoid public WiFi),
3 (Never charge devices from public USB ports),
4 (Turn off Bluetooth and NFC),
and 6 (Regularly change passwords) are all very bad advice today.&lt;/p&gt;
&lt;p&gt;The only reservations I have are minor.
The advice on USB &lt;em&gt;devices&lt;/em&gt; holds for phones and devices on the smarter end (watches, tablets, e-readers, etc…).
Less so for peripherals and other USB whatsits&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;The updated advice on security practices is also pretty good.
Updates, multi-factor authentication, and password managers are the best security advice you can give people today&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h2 id=&quot;privacy-nihilism&quot;&gt;Privacy Nihilism&lt;/h2&gt;
&lt;p&gt;Unfortunately, privacy is a different story.
We exist in a world where – if they could – many companies would collect and analyze everything you do.&lt;/p&gt;
&lt;p&gt;In terms of the letter, item 5 (Regularly “clear cookies”) is basically pure nihilism.
The implication is that you can be tracked no matter what you do.&lt;/p&gt;
&lt;p&gt;I don’t subscribe to that perspective.
Fingerprinting &lt;em&gt;is&lt;/em&gt; pretty effective, but not as good as this implies.
Not everyone is uniquely identifiable through their fingerprint.
Also, browsers are making meaningful progress at making fingerprints less useful for many people.&lt;/p&gt;
&lt;p&gt;You do have to stop giving websites your email and phone number though.
It’s absolutely true that sites are using that.
Use temporary email addresses when you can&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;That said, I don’t clear cookies.
The resulting inconvenience is just not worth it.
There is absolutely no security advantage from purging cookies.
Instead, I recommend targeted use of private browsing modes, profiles, or &lt;a href=&quot;https://support.mozilla.org/en-US/kb/how-use-firefox-containers&quot;&gt;containers&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;scanning-qr-codes-is-following-a-link&quot;&gt;Scanning QR Codes is Following a Link&lt;/h2&gt;
&lt;p&gt;Item 2 in the letter is “Never scan QR codes”.
The claim is that this is bad advice.&lt;/p&gt;
&lt;p&gt;Security-wise, this is mostly true.
Sticker attacks&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt; are probably the main reason that the security situation is not perfect.
But that’s because of a more general phishing problem&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;From a pure security perspective, the letter is absolutely correct.
Opening any link in a browser is so overwhelmingly likely to be fine
that it’s not worth worrying about.
You won’t get pwned by even the most malicious link.&lt;/p&gt;
&lt;p&gt;Browser security has gotten pretty good lately.
Browsers aren’t 100% there, but you should not worry about the gap
unless you are someone who operates in that “Mossad” threat model.&lt;/p&gt;
&lt;p&gt;It’s also a bit worse if an app –
rather than your browser –
handles the link&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;.
Either way, the risks to security are pretty remote.
I don’t worry about getting poisoned by the food I buy at the supermarket;
in the same way, you should not worry about following links.&lt;/p&gt;
&lt;p&gt;The phishing problem is that you really need to trust whatever provides you with a link
if you are going to enter information at the other end&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn6&quot; id=&quot;fnref6:1&quot;&gt;[6:1]&lt;/a&gt;&lt;/sup&gt;.
Otherwise, they could send you to some place that will steal your information&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;.
That is the case though, no matter where you find the link.&lt;/p&gt;
&lt;h2 id=&quot;scanning-qr-codes-is-not-great-for-privacy&quot;&gt;Scanning QR Codes is Not Great for Privacy&lt;/h2&gt;
&lt;p&gt;Privacy-wise, QR codes are not as straightforward as the letter makes out.
If you care about privacy, sadly the old advice holds some wisdom.&lt;/p&gt;
&lt;p&gt;The privacy risk for QR codes is related to &lt;a href=&quot;https://privacycg.github.io/nav-tracking-mitigations/&quot;&gt;navigation tracking&lt;/a&gt;.
If scanning a QR code is just following a link,
then it carries the privacy cost that comes with following links in any context&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;There are small differences between links in QR codes, email&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;, or on ordinary websites, but there’s one common factor:
the site that you go to can learn everything about the place you found the link&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;
and add that to your profile.&lt;/p&gt;
&lt;p&gt;Every time you follow a link you are adding to the information
that the destination website (or app) has about your activities.&lt;/p&gt;
&lt;p&gt;QR codes are generally only placed in one physical location,
so visiting the site almost always means that you are at that location.&lt;/p&gt;
&lt;p&gt;That is, unlike links you find online,
following a QR code can take information about where you are physically located
and add that to tracking databases.&lt;/p&gt;
&lt;p&gt;Take the QR codes that restaurants use for menus and ordering.
Many restaurants outsource all the online stuff to external services.
This is fair: restaurants would probably much rather focus on making and selling food,
which is more than difficult enough.&lt;/p&gt;
&lt;p&gt;Outsourcing means that there’s a good chance that you will end up on the same site
as you visit different restaurants.
That website now has a log of the places you visited,
including details of
when you visited,
what you ate,
the size of the bill,
and whatever else the restaurant shares with them about you.
You can almost guarantee that the information they collect is for sale,
unless the terms and conditions promise otherwise&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn13&quot; id=&quot;fnref13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h2 id=&quot;avoiding-qr-code-tracking&quot;&gt;Avoiding QR Code Tracking&lt;/h2&gt;
&lt;p&gt;So if you would rather not help people build profiles about you
every time you scan a QR code,
what can you do?&lt;/p&gt;
&lt;p&gt;Personally, I only open QR codes in a private browsing window.
That way, at least the tracking sites can’t use cookies to connect your QR code visits into a single profile.
They just get isolated visits from what might be different people.&lt;/p&gt;
&lt;p&gt;To help with that, you can maybe set your default browser to one that doesn’t keep cookies,
like &lt;a href=&quot;https://www.firefox.com/en-US/browsers/mobile/focus/&quot;&gt;Firefox Focus&lt;/a&gt;,
&lt;a href=&quot;https://duckduckgo.com/app&quot;&gt;DuckDuckGo’s Browser&lt;/a&gt;,
or any browser that you set up to not keep cookies.&lt;/p&gt;
&lt;p&gt;Products could be better in this regard.
As far as I’m aware, you can’t set a different browser for QR codes on most devices&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn14&quot; id=&quot;fnref14&quot;&gt;[14]&lt;/a&gt;&lt;/sup&gt;.
For my sins, I use an iPhone&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn15&quot; id=&quot;fnref15&quot;&gt;[15]&lt;/a&gt;&lt;/sup&gt;.
&lt;a href=&quot;https://www.firefox.com/en-US/browsers/mobile/ios/&quot;&gt;Firefox iOS&lt;/a&gt; used to have a QR code scanning button,
which made it easy to switch to private browsing and open those links in a cookie- and tracking-free tab.
A recent change made scanning QR codes much more annoying&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn16&quot; id=&quot;fnref16&quot;&gt;[16]&lt;/a&gt;&lt;/sup&gt;,
so I’m still looking for a better option there.&lt;/p&gt;
&lt;p&gt;In the end, it’s easy to see why the authors of the letter have adopted a nihilistic attitude toward privacy.
Personally, I don’t accept that outcome, even if it means a little more work on my part.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;If you are, you know already. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Those devices can be vulnerable in ways your phone isn’t.
Some will allow firmware to be updated by anything they attach to.
That means they will become a risk to any machine that they are subsequently plugged in to. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I will take the opportunity to quibble about the way they present their advice on passphrases.
My advice is to let your password manager suggest a high entropy password
and only use passwords for those things that separate you from your password manager.
That’s usually just operating system login and unlocking the password manager.
Given how few of these passwords are likely needed,
suggesting passphrases over strong passwords seems largely academic.
The usability difference between a passphrase and a strong password is tiny;
the passphrase might be more memorable, but the password might be quicker to type. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://relay.firefox.com/&quot;&gt;Firefox Relay&lt;/a&gt;,
&lt;a href=&quot;https://support.apple.com/en-au/guide/icloud/mm9d9012c9e8/icloud&quot;&gt;iCloud Hide My Email&lt;/a&gt;,
and &lt;a href=&quot;https://www.fastmail.com/blog/10-things-only-privacy-conscious-people-know-about-email-aliases/&quot;&gt;Fastmail Email Aliases&lt;/a&gt;
are examples I’m aware of, but many mail providers have similar features. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This is where an original QR code is covered with a sticker directing someone to a different site.
A QR code on a parking meter for payments is a great example.
An attacker can collect parking payments – at inflated prices – for a while before the attack is noticed. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt; &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref5:1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The golden rule of the web is:
If you are going to enter information into a site,
especially when money is involved,
type its address in to get to the site&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn8&quot; id=&quot;fnref8:1&quot;&gt;[8:1]&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt; &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref6:1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt; &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref6:2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Links can also target any app that registers interest in handling URIs.
A little more so on phones than desktop computers.
Apps generally aren’t as well hardened against attack as browsers,
but they are also generally easier to defend,
because they have less functionality.
The best advice I can give there is to be careful about what apps you install.
I liken visiting a web site to a casual encounter; installing an app is much more personal.
Either way, the extent to which you are exposed to infection increases with intimacy. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Passwords especially.
You should never type passwords into a website.
That is what a password manager is for.
You should only type passwords to get to your password manager. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt; &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref8:1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Yes, this is a straight cost, not a risk.
There’s no probability involved. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;There is a very different reason not to click on links in email&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn12&quot; id=&quot;fnref12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;.
A scammer might attempt to convince you that they are someone you trust and get you to send them something you might regret.
Like your banking password or money.
This is much like the QR code sticker attack&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn5&quot; id=&quot;fnref5:1&quot;&gt;[5:1]&lt;/a&gt;&lt;/sup&gt;,
except that the attacker only has to send you mail that passes mail filters and looks legit. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;On the web, the place that shows you a link also learns that you clicked it.
This is not true for email and QR codes, but that makes very little difference privacy-wise. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn12&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Clicking on a link in email isn’t always a bad idea.
Clicking the link lets the site know that you received their message.
That’s the whole point of emails asking you to confirm that you own an email address,
so go ahead and click those.
Just make sure to close the tab immediately.
At least before you put any other information into the site&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn6&quot; id=&quot;fnref6:2&quot;&gt;[6:2]&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref12&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn13&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Not like you could have read terms and conditions before scanning the QR code.
Or that anyone has time to read them. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref13&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn14&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I’d love to know if there are any operating systems that let you set a different app for QR code links,
that seems like it would be a useful feature. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref14&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn15&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The 13 mini remains the only phone in a reasonable form factor that is still relatively current.
All other phones are too big.
It’s a shame that most web experiences a) run on Safari and b) are awful.
The latter is the fault of sites, not so much the device. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref15&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn16&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;OK, here goes: Unlock your phone, go to the home screen.
Open Firefox, go to the tabs view, hit the Private option, open a new tab.
Switch to the camera, scan the code, tap the option to open the link.
You need to open the tab, because Firefox will use the browsing mode that was last used. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref16&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>Status Update: AI Preferences</title>
    <link href="https://lowentropy.net/posts/aipref-update/"/>
    <updated>2025-10-30T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/aipref-update/</id>
    <content type="html">&lt;p&gt;AI preferences exist to give those who create and distribute content online a way to express how they would like to see that content used.&lt;/p&gt;
&lt;p&gt;This post is a detailed technical update on the present state of the work, focusing on recent developments. I attempt to explore some of the issues that arise from those discussions.&lt;/p&gt;
&lt;h2 id=&quot;background&quot;&gt;Background&lt;/h2&gt;
&lt;p&gt;This is for those who weren’t following along; others can &lt;a href=&quot;https://lowentropy.net/posts/aipref-update/#req&quot;&gt;skip to the substance&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Websites are created for many reasons, but many sites have the same motivations: to reach an audience of humans. This might be to inform, entertain, or to provide services. In many cases, gaining the attention of an audience is a complementary goal, with that attention being used to show advertisements. Those advertisements can be a significant source of revenue for site operators, supporting their ability to continue to provide other services.&lt;/p&gt;
&lt;p&gt;Automated systems, or bots, have long been a part of the web. Web crawlers in particular are those bots that seek to explore the entirety of the web. This is done for many reasons, including archival and research, but a particularly important use of crawling is in the development of web search engines.&lt;/p&gt;
&lt;p&gt;Web search engines operate an important class of crawler. These crawlers are part of providing a valuable service to sites, as the search engines they support drive traffic – and attention – to those sites. In exchange, search engines receive their own highly valuable form of attention. Search advertising is a highly lucrative business.&lt;/p&gt;
&lt;h3 id=&quot;enter-artificial-intelligence&quot;&gt;Enter Artificial Intelligence&lt;/h3&gt;
&lt;p&gt;AI has disrupted this equilibrium significantly for two reasons.&lt;/p&gt;
&lt;p&gt;AI products have displaced some of the functions of search. If the purpose of a given search is to answer a question, a chatbot or AI-generated answer is far more convenient than search. The effect of this has been a reduction in attention for certain classes of sites.&lt;/p&gt;
&lt;p&gt;The second reason is that the sites that are the source of the knowledge that AI uses need to be crawled to gain that information. This has caused a large increase in the volume of queries from AI, both to train models and to provide grounding in the use of those models. The significant increase in costs for site operators in answering requests does not result in the valuable human attention that might have been the reason for deploying a website in the first place.&lt;/p&gt;
&lt;p&gt;In other words, AI can provide a substitute for the work of websites, depriving them of support, while also adding operational costs.&lt;/p&gt;
&lt;p&gt;This is far from being a wholly bad outcome. There are good reasons for this change: AI can provide a vastly superior experience. Accurate and specific answers to questions are just one of the potential improvements. High quality sites will continue to engage audiences in meaningful ways. And many sites exist for reasons other than to attract attention.&lt;/p&gt;
&lt;p&gt;Nonetheless, change is disruptive and there are risks that need to be managed. AI is potentially subject to the biases of its creators, which can distort messages, intentionally or accidentally. AI hallucination can mean that people or organizations are misrepresented, propagating misinformation in ways that can be hard to trace. And the potential for AI to act as a substitute for the work of human writers and artists is a serious concern.&lt;/p&gt;
&lt;h3 id=&quot;existing-tools&quot;&gt;Existing Tools&lt;/h3&gt;
&lt;p&gt;Presently, there are two tools that site operators can deploy to influence how AI interacts with the content on their sites: access control and &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc9309&quot;&gt;&lt;code&gt;robots.txt&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Access control can outright block requests from crawlers. Well-behaved crawlers make requests from IP address ranges that their operators publish and use distinctive identifiers in HTTP &lt;code&gt;User-Agent&lt;/code&gt; headers. Requests that are identified in either way can be blocked. Even a malicious crawler would struggle to use IP addresses for long without being identified and blocked.&lt;/p&gt;
&lt;p&gt;Most web crawler operators also respect the &lt;code&gt;robots.txt&lt;/code&gt; file. Site operators can publish a &lt;code&gt;robots.txt&lt;/code&gt; file that describes which parts of their site are off-limits. This is little more than a polite request to crawlers, rather than an access control mechanism. A site deploying &lt;code&gt;robots.txt&lt;/code&gt; depends somewhat more on trusting the AI crawler to act responsibly.&lt;/p&gt;
&lt;p&gt;Most crawlers do respect these requests in &lt;code&gt;robots.txt&lt;/code&gt;. The risk to the reputation of an AI company that did not respect &lt;code&gt;robots.txt&lt;/code&gt; seems sufficient to ensure compliance. After all, other AI companies would not look favorably on a competitor who gave sites cause to deploy access control instead, because they might be next.&lt;/p&gt;
&lt;p&gt;However, these are both crude tools. By convention, crawlers use their crawler name&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/aipref-update/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; to identify the purpose of their crawling. If a company crawls for several reasons, they need to crawl multiple times, each time with a different crawler name. Sites can then use &lt;code&gt;robots.txt&lt;/code&gt; to target the specific crawlers they approve of. Effective use of &lt;code&gt;robots.txt&lt;/code&gt; therefore requires that sites identify the many different crawlers that exist and understand the purpose of each before they can set the scope that each can crawl.&lt;/p&gt;
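&lt;p&gt;To illustrate, a site that welcomes a search crawler but asks an AI training crawler to stay away might publish rules like the following (the crawler names here are invented for illustration):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-text&quot;&gt;# Welcome a hypothetical search crawler everywhere.
User-Agent: ExampleSearchBot
Allow: /

# Ask a hypothetical AI training crawler to stay out.
User-Agent: ExampleAIBot
Disallow: /

# Everything else gets the default rules.
User-Agent: *
Allow: /
&lt;/code&gt;&lt;/pre&gt;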
&lt;p&gt;New crawlers, which are added all the time, only get default instructions until the site operator learns about them. This can overly constrain use for novel purposes by restricting more than necessary. Alternatively, setting more permissive defaults could endorse uses that the site might not intend.&lt;/p&gt;
&lt;h3 id=&quot;how-ai-preferences-might-help&quot;&gt;How AI Preferences Might Help&lt;/h3&gt;
&lt;p&gt;The AI preferences work aims to address this by giving site operators a means to directly express preferences about how the content they serve might be used.&lt;/p&gt;
&lt;p&gt;AI preferences are a collection of statements about different categories of use that are associated with a given piece of content (or asset). Sites can express a positive preference that indicates permission to use the content for that purpose, they can express a negative preference that requests that the usage not be applied to that content, or make no information about their preferences known.&lt;/p&gt;
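&lt;p&gt;One way to picture this tri-state model is as a mapping from categories of use to preferences, where saying nothing is distinct from granting permission. A minimal sketch in Python (the category labels are placeholders, not the normative vocabulary):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from enum import Enum

class Preference(Enum):
    ALLOW = &amp;quot;allow&amp;quot;        # positive: this use is welcome
    DISALLOW = &amp;quot;disallow&amp;quot;  # negative: please do not use the content this way
    UNSTATED = &amp;quot;unstated&amp;quot;  # no preference expressed either way

# Hypothetical category labels, for illustration only.
asset_prefs = {
    &amp;quot;search&amp;quot;: Preference.ALLOW,
    &amp;quot;train-ai&amp;quot;: Preference.DISALLOW,
}

def preference_for(prefs, category):
    # Absence of a statement is not permission; it is simply unstated.
    return prefs.get(category, Preference.UNSTATED)
&lt;/code&gt;&lt;/pre&gt;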
&lt;p&gt;Sites &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-aipref-attach-04&quot;&gt;associate AI preferences with content&lt;/a&gt; using content metadata, HTTP headers, or their &lt;code&gt;robots.txt&lt;/code&gt; file.&lt;/p&gt;
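&lt;p&gt;Concretely, this allows a preference to travel either with crawling rules or with the asset itself. The sketch below is illustrative only: the vocabulary tokens are placeholders, and the exact syntax remains subject to change as the drafts evolve:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-text&quot;&gt;# In robots.txt, alongside crawling rules:
User-Agent: *
Content-Usage: train-ai=n, search=y
Allow: /

# Or as an HTTP response header on the asset itself:
Content-Usage: train-ai=n, search=y
&lt;/code&gt;&lt;/pre&gt;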
&lt;h4 id=&quot;standard-terms&quot;&gt;Standard Terms&lt;/h4&gt;
&lt;p&gt;A set of &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-aipref-vocab-04&quot;&gt;standard definitions for content use&lt;/a&gt; ensures that site operators do not need a comprehensive understanding of what every web crawler is used for. Preferences can be expressed in common terms that crawlers – both existing and newly introduced – can understand.&lt;/p&gt;
&lt;p&gt;Companies that crawl the web for multiple reasons can unify their crawling. Content can be acquired once and different applications can limit their use to the content that has compatible preferences.&lt;/p&gt;
&lt;p&gt;The cost of this is that preferences must draw from a limited vocabulary of terms. A limited vocabulary is an asset in making the terms understandable to those who will express their preferences. Developers are also better able to write software that respects preferences.&lt;/p&gt;
&lt;p&gt;Importantly, common definitions make it more likely that different entities can agree on what each term means, even as new applications are developed.&lt;/p&gt;
&lt;p&gt;Around a common vocabulary, the goal is to allow preferences to be expressed in several ways. That way, the preferences mean the same thing, no matter how they are expressed. Content metadata can use forms of expression that fit the metadata idioms of a format, while the meaning remains consistent with the core vocabulary.&lt;/p&gt;
&lt;h4 id=&quot;just-preferences&quot;&gt;Just Preferences&lt;/h4&gt;
&lt;p&gt;A key aspect of the design of AI preferences is their discretionary nature. Preferences are not an access control system. They have no means to force crawlers into compliance. Just like &lt;code&gt;robots.txt&lt;/code&gt;, AI preferences rely on AI crawlers choosing to respect their requests.&lt;/p&gt;
&lt;p&gt;In choosing a design that includes preferences, the work recognizes that sites expressing preferences cannot perfectly anticipate the conditions where content might be used. Sometimes there are overriding interests that will mean that the right choice is to ignore the preference that was expressed. In developing the standard, a diverse set of reasons that might justify ignoring preferences was discussed, including accommodating accessibility needs, public interest, research, identifying illegal or abusive content, archival, and more.&lt;/p&gt;
&lt;p&gt;In choosing to use a design that expresses preferences, rather than a stricter prohibition, sites do rely more on trust than is even necessary with &lt;code&gt;robots.txt&lt;/code&gt;. With &lt;code&gt;robots.txt&lt;/code&gt;, a crawler might be caught out when it requests content after it was asked not to. In comparison, AI preferences say nothing about whether content is fetched or not, only about how it might be used, once obtained.&lt;/p&gt;
&lt;p&gt;Site operators therefore have no obvious way of checking that their preferences are respected. For AI applications, models can leak their inputs, but AI developers already seek to prevent that, less to avoid being caught out than to ensure that their models are capable and flexible. That means that using AI preferences is also an expression of trust in those who seek to use content. Sites are trusting AI companies to respect their preferences and to exercise good judgment in determining whether to override those preferences.&lt;/p&gt;
&lt;p&gt;Finally, choosing preferences also acknowledges that there is no widely agreed legal protection involved. Different jurisdictions are working through the implications of AI on their copyright laws, but we cannot predict the outcome of those processes. It’s possible that some laws will have something to say about how expressions of preference need to be treated, but nothing is settled.&lt;/p&gt;
&lt;p&gt;&lt;a name=&quot;req&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;what-site-operators-(seem-to)-want&quot;&gt;What Site Operators (Seem To) Want&lt;/h2&gt;
&lt;p&gt;Though we have no formally agreed requirements, there has been a lot of discussion about these.  A recap is probably in order.&lt;/p&gt;
&lt;aside&gt;
Note that I&#39;m going to talk about &lt;em&gt;websites&lt;/em&gt; as the primary source of these requirements. This is because our primary focus is on mechanisms that involve the acquisition of content from sites. You should be able to replace “site” with other labels. Those familiar with copyright law should probably be a little cautious about the use of “rightsholder” though; none of the things we&#39;re building are able to establish that the entity expressing preferences holds any rights.
&lt;/aside&gt;
&lt;p&gt;Sites are not all the same in their goals with respect to preferences. However, several key themes have emerged through the process of developing the standard.&lt;/p&gt;
&lt;h3 id=&quot;preferences-about-all-uses&quot;&gt;Preferences About All Uses&lt;/h3&gt;
&lt;p&gt;A blanket approval or disapproval preference is contested, but a number of parties have indicated that it would be a useful, albeit insufficient, capability.&lt;/p&gt;
&lt;p&gt;The most common criticism of this is that it is “too broad” or an imprecise instrument. That’s a poor argument: we cannot tell someone that they cannot hold this preference. Moreover, this is a preference that is trivial to define.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/aipref-update/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;But there is a difficult question to answer at the heart of this objection: saying “no” unconditionally precludes unexpected and unforeseen uses. That’s a concern that cuts two ways.&lt;/p&gt;
&lt;p&gt;AI was an unanticipated use for content before it became wildly successful. Developing foundational AI models would have been harder to achieve if a system of preferences existed that constrained the use of content for AI.&lt;/p&gt;
&lt;p&gt;If you regard AI as a good thing, the potential for a lot of content to be withheld from models as a result of a broad preference could be bad. While there is a lot of content that would remain available, including public domain content, having some content excluded might reduce the quality of models.&lt;/p&gt;
&lt;p&gt;Others might view the use of content by AI companies as extractive, taking content without compensation to those who created it. More so because it might be seen to enable the creation of work that competes in the market for the content that was taken. Had there been a way to express preferences about this unexpected use, perhaps that could be a starting point from which to negotiate for fair compensation.&lt;/p&gt;
&lt;p&gt;For those who might outright oppose AI, it would be unrealistic to expect that preferences could have prevented or significantly delayed the creation of AI. There is simply too much content available for training.&lt;/p&gt;
&lt;p&gt;The question of whether to include the option of stating a preference about all conceivable uses remains contested. Perhaps we need to accept that technical standards are not well-suited to addressing this conflict and leave the resolution to legislative or regulatory bodies. That might argue for including a broad category of use, leaving it to the law to determine how preferences about that category might take effect.&lt;/p&gt;
&lt;h3 id=&quot;model-training&quot;&gt;Model Training&lt;/h3&gt;
&lt;p&gt;The ability of AI models to reproduce their inputs, in whole or part, is a major concern for sites, artists, writers, and others.&lt;/p&gt;
&lt;p&gt;One concern is direct reproduction of content by models. This could directly violate copyright, which is another reason that AI companies seek to avoid having this happen.&lt;/p&gt;
&lt;p&gt;There are also aspects of works that are not protected by copyright, such as style. Seeking to protect style from reproduction might motivate the withholding of content from training.&lt;/p&gt;
&lt;p&gt;Finally, there is a view that AI companies profit from the models they create using content that was published to the web for human consumption. Some actors might withhold content in the hopes that those performing model training might be willing to pay for its use.&lt;/p&gt;
&lt;p&gt;To that end, a common request is to have a way to request that work not be included in model training.&lt;/p&gt;
&lt;h3 id=&quot;appearing-in-search-only&quot;&gt;Appearing in Search Only&lt;/h3&gt;
&lt;p&gt;The most common request is that sites be able to express a preference not to be used for model training or to have their content otherwise processed by automated systems, with the exception of search engines. Having content be discoverable through search applications is valued by many site operators.&lt;/p&gt;
&lt;p&gt;In expressing a preference for only search, site operators seem to have two major concerns:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The substitutive effect of things like AI generated overviews and content, whether that be part of search applications, chatbots, or other modalities. For sites that depend on attention, their concern is that generated answers to questions cause a reduction in people visiting their site to seek answers. For artists, they might seek to avoid the generation of content that acts as a substitute for their own creative efforts.&lt;/li&gt;
&lt;li&gt;The reputation risk that comes from misrepresentation, either due to misinformation in model training sets affecting outputs or due to the propensity of AI systems to hallucinate.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For the first, the question regarding any substitutive effect is partly a legal question. Of course, the scope of copyright law is definitely not a question for technical standards to resolve. The effort to define preferences for AI usage cannot answer that question; at best, a standard can only provide regulators with more options.&lt;/p&gt;
&lt;p&gt;The real question for standards is whether this is a coherent preference to express. In answering that question, the challenge is to construct a definition that is clear, comprehensible, and implementable.&lt;/p&gt;
&lt;p&gt;Making something implementable turns out to be especially difficult, because search engines use AI in multiple ways. While the details are intentionally obscure&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/aipref-update/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;, the operation of a search product involves both training and use of models. If content is excluded from any aspect of processing – training, inference, or otherwise – search providers could not guarantee that the content would be appropriately ranked.&lt;/p&gt;
&lt;p&gt;In discussion, it was also pointed out that there is no meaningful distinction between what people think of as “traditional” search and chatbots. These applications exist on a continuum and we can expect points of difference to continue to be less distinct as providers experiment with new features.&lt;/p&gt;
&lt;p&gt;That complication rules out simple definitions that focus on either the training of a model (that is, the process of producing model weights) or the use of trained models (or what is called inference) as part of a system.&lt;/p&gt;
&lt;h2 id=&quot;foundation-model-production-preference&quot;&gt;Foundation Model Production Preference&lt;/h2&gt;
&lt;p&gt;This is a relatively simple change. The idea is to collapse the two existing model training categories into one. That one would only address the creation of a foundation model.&lt;/p&gt;
&lt;p&gt;Like the other categories being proposed, the goal is to focus on outputs. The output in this case is a foundation model.&lt;/p&gt;
&lt;p&gt;Aside from a shift in emphasis, this change addresses some definitional challenges that were identified with the existing AI training category. It has become clear that there is no crisp delineation between what might be regarded as simple statistical techniques – such as logistic regression or ordinary least squares – and AI. Even questions of scale are not helpful when you consider that “classic” statistical models can still be enormous, such as the meteorological models used to predict weather.&lt;/p&gt;
&lt;p&gt;In comparison, there are many well-established definitions for foundation models that all broadly agree. Producing a category of use where the output is a foundation model – or fine-tuned foundation model – seems like an approach that could get support.&lt;/p&gt;
&lt;p&gt;There’s an open question about whether various fine-tuning techniques are included in the definition. One interpretation says that the output would have to be a foundation model, because that is what has the general purpose capabilities. A fine-tuned model might be specialized to a single purpose.&lt;/p&gt;
&lt;p&gt;The proposed definition includes fine tuning, mostly because a fine-tuned model is likely to inherit most of the capabilities of the tuned foundation model. Also, this expansion to the definition closes a loophole, where applying a small amount of fine tuning could be used to avoid an obligation to respect a negative preference.&lt;/p&gt;
&lt;p&gt;This still leaves several questions unresolved, such as the question about whether techniques like low-rank adaptation (LoRA), which don’t alter the parameters of a foundation model, would – or should – fit this definition.&lt;/p&gt;
&lt;h2 id=&quot;addressing-the-search-preference&quot;&gt;Addressing the Search Preference&lt;/h2&gt;
&lt;p&gt;The goal is to enable more granular expressions of preference, such that different entities can express a preference to either:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;exclude all uses except search; or&lt;/li&gt;
&lt;li&gt;exclude model training but allow search.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In doing so, we need to account for the fact that search applications use AI.&lt;/p&gt;
&lt;p&gt;This effort might consider other statements of preference, but these are the two primary use cases to address.&lt;/p&gt;
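&lt;p&gt;Assuming a blanket category exists alongside narrower ones, the two cases differ in what is left unstated. Roughly, and with purely hypothetical tokens:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-text&quot;&gt;# Exclude all uses except search:
Content-Usage: all=n, search=y

# Exclude model training, allow search, leave other uses unstated:
Content-Usage: train-ai=n, search=y
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first expression denies anything not explicitly permitted; the second denies only training and says nothing about other uses.&lt;/p&gt;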
&lt;h3 id=&quot;the-original-search-definition&quot;&gt;The Original Search Definition&lt;/h3&gt;
&lt;p&gt;Rather than approach the problem from a procedural perspective, the first attempt at a definition for “search” looked at outcomes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Using one or more assets in a search application that directs users to the location from which the assets were retrieved.&lt;/p&gt;
&lt;p&gt;Search applications can be complex and may serve multiple purposes. Only those parts of applications that direct users to the location of an asset are included in this category of use. This includes the use of titles or excerpts from assets that are used to help users select between multiple candidate options.&lt;/p&gt;
&lt;p&gt;Preferences for the Search category apply to those parts of applications that provide search capabilities, regardless of what other preferences are stated.&lt;/p&gt;
&lt;p&gt;Parts of applications that do not direct users to the location of assets, such as summaries, are not covered by this category of use.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The strength of this definition is that it says nothing about how the identified outcomes are achieved. However, there are several shortcomings:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What it means to “direct users” to a location is unclear. The fairly obvious question is whether providing a link suffices or, if not, what would be sufficient to meet this condition.&lt;/li&gt;
&lt;li&gt;There are also questions about the practicality of dividing search applications in the imagined manner. To some extent, the content presented in search applications could be linked together in non-obvious ways. For instance, content in AI overviews might be generated based on the content of pages that are linked from other parts of the page.&lt;/li&gt;
&lt;li&gt;This category was defined to be a subset of an “AI Use” category, which excluded model training. What we heard from search providers is that this does not reflect common practice in search applications, where model training is an integral part of the application.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This might not be a complete enumeration of the reservations. These specific concerns might be overcome with more work. In practice, it became clear from discussions that an alternative approach was more appealing to a number of participants.&lt;/p&gt;
&lt;h3 id=&quot;ai-output&quot;&gt;AI Output&lt;/h3&gt;
&lt;p&gt;That approach is based on &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-madhavan-aipref-displaybasedpref-01&quot;&gt;a proposal&lt;/a&gt; from Microsoft that suggested a focus on “display-based” preferences. This proposal suggested that generative AI training remain in the vocabulary, but the bulk of the proposal was focused on system outputs.&lt;/p&gt;
&lt;p&gt;This proposal identified a number of controls that existing search engines respect with respect to how content is handled in the presentation of search results. These are largely just &lt;a href=&quot;https://www.bing.com/webmasters/help/which-robots-metatags-does-bing-support-5198d240&quot;&gt;existing preference signals&lt;/a&gt; specific to search, but the proposal introduced some new ideas.&lt;/p&gt;
&lt;p&gt;These ideas aren’t intended to be limited to search applications; search is just where the concepts originate. The goal is to apply the concepts to any AI application; or, if we just view AI as a new way of building software, any system that performs computation.&lt;/p&gt;
&lt;p&gt;The key idea behind this preference is to avoid constraining use that is not observable outside of the system. These preferences do not care how the system is implemented; they only relate to what the system produces in its output.&lt;/p&gt;
&lt;p&gt;If the preferences do not say anything about the internal processing that systems perform, search applications could perform ranking on content that won’t be displayed.  AI applications would be able to process content internally as long as the content is not used in producing outputs.&lt;/p&gt;
&lt;p&gt;What that means precisely is an important question that is discussed below, but the general shape is approximately: if the preference is to allow this new category of use, then the system can use the content in its output; if the preference is to disallow this usage, the content isn’t used in producing outputs.&lt;/p&gt;
&lt;h4 id=&quot;necessary-training&quot;&gt;Necessary Training&lt;/h4&gt;
&lt;p&gt;A key component of this approach is that it explicitly allows model training. This recognizes that the training of models is an integral part of the operation of this class of application. In effect, the preference would not distinguish between training a model, using a trained model, or even non-AI uses of content.&lt;/p&gt;
&lt;h4 id=&quot;exact-text-match-or-%E2%80%9Csearch%E2%80%9D&quot;&gt;Exact Text Match or “Search”&lt;/h4&gt;
&lt;p&gt;The concept of “exact text match” is the main concession to “traditional” search in the proposal. When that option is chosen, either excerpts of content are presented in outputs or the content is not presented at all.&lt;/p&gt;
&lt;p&gt;Microsoft’s proposal for “exact text match” included an extra stipulation: in addition to the usage being limited to excerpts from the content, the output needed to include a link back to the content.&lt;/p&gt;
&lt;p&gt;The combination of verbatim excerpts and links is intended to reproduce the concept of “traditional” search, while allowing all of the internal processing that search applications rely on.&lt;/p&gt;
&lt;p&gt;Without this condition, a preference to allow AI output would result in content being used by any AI system to produce outputs. This would include everything from chatbots to search. This extra condition ensures that output does not reinterpret content, but presents it verbatim.&lt;/p&gt;
&lt;h3 id=&quot;challenges-to-resolve-with-ai-output&quot;&gt;Challenges to Resolve with AI Output&lt;/h3&gt;
&lt;p&gt;As with any change, the positives that motivate the change come with a number of new problems to resolve.&lt;/p&gt;
&lt;p&gt;My goal here is to identify the issues that are central to disagreements. That is, the problems we need to address in order to make progress. No doubt there are many other issues that I haven’t identified.&lt;/p&gt;
&lt;h4 id=&quot;%E2%80%9Csearch%E2%80%9D-category-naming&quot;&gt;“Search” Category Naming&lt;/h4&gt;
&lt;p&gt;In the current proposed vocabulary, the “exact text match” category has been tentatively labeled “search”.  This is because it is intended to address this very narrow concept of “traditional” search where the output of the system is limited to links and context.&lt;/p&gt;
&lt;p&gt;Having labels that are widely understood is a very important goal. However, that also requires that the label is a good match to the thing it applies to.&lt;/p&gt;
&lt;p&gt;This choice of label has received some criticism, largely on the grounds that it misrepresents the value that modern search products provide. In practice, those services we think of as providing search have evolved to provide so much more than this basic means of discovering content.&lt;/p&gt;
&lt;p&gt;Resolving this means dealing with this tension.&lt;/p&gt;
&lt;h4 id=&quot;category-nesting-and-spelling&quot;&gt;Category Nesting and Spelling&lt;/h4&gt;
&lt;p&gt;The “exact text match” category is presently defined as a subcategory of a more general AI output category. This makes sense in that the behavior described is a strict subset.&lt;/p&gt;
&lt;p&gt;This differs subtly from some of the nesting in other parts of the vocabulary. As a result, this nesting might not be the best technical fit.&lt;/p&gt;
&lt;p&gt;Consider the relationship between an overarching category and a foundation model training category. It is clear that there are uses within the broader category that are not foundation model training, such that it makes sense to express preferences in any of the four possible combinations.&lt;/p&gt;
&lt;p&gt;For AI output, allowing the broader category leaves no part of the subcategory excluded. It makes almost no sense to indicate a preference to allow AI output while disallowing the narrower category. The result is almost nonsensical: would it be forbidden to link to content? or, would this require reinterpretation rather than including snippets?&lt;/p&gt;
&lt;p&gt;This is the only nonsensical combination of preferences that might arise from this arrangement. This could be managed by saying that the exclusion has no effect.&lt;/p&gt;
&lt;p&gt;We might instead seek a different way of spelling these preferences. For instance, the first design discussed for this included placing conditions on a preference to allow AI output. Two conditions could be attached: “link” and “quote”, each representing a distinct restriction on use. This approach is moderately more complex in its design&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/aipref-update/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;, but the option remains.&lt;/p&gt;
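&lt;p&gt;To make the difference in spelling concrete, the two designs might look roughly like this; both spellings are hypothetical, not agreed syntax:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-text&quot;&gt;# Nested subcategory spelling: the narrow category is its own term.
Content-Usage: ai-output=n, search=y

# Condition-based spelling: one category, with conditions attached.
Content-Usage: ai-output=y;link;quote
&lt;/code&gt;&lt;/pre&gt;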
&lt;p&gt;Questions of spelling are normally relatively easy to resolve. The challenge here is in determining whether disagreements on the superficial matter mask real disagreements in principle.&lt;/p&gt;
&lt;h4 id=&quot;isolating-outputs&quot;&gt;Isolating Outputs&lt;/h4&gt;
&lt;p&gt;The core advantage of the proposal is that it doesn’t matter how the internals of the system are implemented, but it is still necessary to know that it is possible to respect preferences in practice.&lt;/p&gt;
&lt;p&gt;If the preference only applies to output, an important question arises: how separable is any internal processing from the production of output? That is, if a preference requests linking and excerpts, how does a system that uses content for internal processing ensure that the content does not affect outputs, except as requested?&lt;/p&gt;
&lt;p&gt;For a search application that seeks to produce a list of links, it might be possible to keep the process of ranking items separate from any presentation of those items. It seems reasonable to assume that content with preferences to disallow AI output could be presented using non-AI methods.&lt;/p&gt;
&lt;p&gt;But how would a chat bot ensure similar isolation? Even if separate models are used for internal processing and output, what mechanism would ensure that the output stage is isolated from the internal stage? The internal processing stage needs to communicate its conclusions and instructions to the output stage, which risks carrying information about the content it processes.&lt;/p&gt;
&lt;p&gt;The same applies whether the internal model is trained on the affected content, or whether that content is only provided as a reference at inference time.&lt;/p&gt;
&lt;p&gt;An alternative interpretation is that this internal processing and training is only permitted if either category – AI output or search – is permitted.  This is a much narrower interpretation, but it avoids questions about isolation.&lt;/p&gt;
&lt;p&gt;The answer to this question was not clear in Microsoft’s proposal. It is perhaps the most important question to resolve with respect to this proposed path.&lt;/p&gt;
&lt;h4 id=&quot;model-separation&quot;&gt;Model Separation&lt;/h4&gt;
&lt;p&gt;One potential consequence of allowing model training as part of an AI output category is that models will be trained (or fine-tuned) on content that might otherwise have an associated preference to disallow the production of models.&lt;/p&gt;
&lt;p&gt;The consequence is that the models produced cannot be made available for other purposes if the content they use does not permit the production of models. This seems manageable for applications that are deployed today, where the internals of their operation are often closed. However, the impact on more open systems is unclear.&lt;/p&gt;
&lt;p&gt;For instance, if multiple actors cooperate to deliver a service, do these output-based preferences apply at the boundary of each system? Or do they only apply to the overall system? How might preferences need to be propagated in either case?&lt;/p&gt;
&lt;h4 id=&quot;the-role-of-human-users&quot;&gt;The Role of Human Users&lt;/h4&gt;
&lt;p&gt;This question of system boundaries is most relevant when asking whether “output” is defined in relation to what is presented to a human user. We need to determine whether preferences can relate only to the output of a given system, rather than requiring human interaction. In one model, the responsibility for respecting a preference ends at the boundary of the system that the preference-respecting entity controls. Including humans in a definition is more challenging.&lt;/p&gt;
&lt;p&gt;Having definitions that seek to apply only at the point that something is presented to a human is appealing. However, that depends on having some expectation of humanity, where identifying clients as human is increasingly a contested problem. In addition to bots, agentic browsing will fundamentally change how sites are interacted with. This is all directly on behalf of users, but with the possibility that humans are not responsible for inputs and do not see outputs.&lt;/p&gt;
&lt;p&gt;Is it sufficient to present outputs on an expectation that the recipients are human, even when you know that is not assured? Similarly, is it reasonable to allow sites to assume that their inputs come from human users?&lt;/p&gt;
&lt;h2 id=&quot;no-easy-path-to-success&quot;&gt;No Easy Path to Success&lt;/h2&gt;
&lt;p&gt;Given the discussion thus far, it is clear that there is a lot more discussion needed before we can declare success. What encourages me is that there seems to be active engagement on the central problem: finding ways to address the expressed requirements on preferences in a way that can be reliably implemented in AI systems.&lt;/p&gt;
&lt;p&gt;That is made harder by our insistence on reaching consensus. Discussions thus far have shown that there are gaps in understanding, communication, and trust to be crossed. From here, we try to reach an understanding on the principles, then build on that to develop a solution.&lt;/p&gt;
&lt;p&gt;Success could provide people seeking to balance competing interests with more options. Though there’s no guarantee that this works out, it is worth the effort.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Thanks to Paul Keller and Mark Nottingham for their feedback on drafts of this post.&lt;/em&gt;&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This is not the thing that is sent in the HTTP &lt;code&gt;User-Agent&lt;/code&gt; header, but the one that the bot self-identifies with and the one that is listed in &lt;code&gt;robots.txt&lt;/code&gt;. The inconsistency is maddening, but like many things on the internet, it’s not worth getting too worked up about, because it isn’t going to change. &lt;a href=&quot;https://lowentropy.net/posts/aipref-update/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;We currently haven’t discussed the principles that might apply to defining categories of use, but this response suggests some basic guidelines: 1. Is there a reason to express a preference? 2. Can the principle be defined? 3. Can the distinction be made in real systems?  (or: can it be implemented?) &lt;a href=&quot;https://lowentropy.net/posts/aipref-update/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This is in part so that it isn’t trivial to attack the system to affect search rankings. &lt;a href=&quot;https://lowentropy.net/posts/aipref-update/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;In addition to defining how to manage parameters, we’d have to resolve what it means to attach conditions to a preference to disallow AI output. &lt;a href=&quot;https://lowentropy.net/posts/aipref-update/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>Expanding what HTTPS means</title>
    <link href="https://lowentropy.net/posts/local-https/"/>
    <updated>2024-12-24T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/local-https/</id>
    <content type="html">&lt;p&gt;So you have a device, maybe IoT, or just something that sits in a home somewhere.
You want to be able to talk to it with HTTPS.&lt;/p&gt;
&lt;p&gt;Recall &lt;a href=&quot;https://en.wikipedia.org/wiki/Zooko&#39;s_triangle&quot;&gt;Zooko’s “meaningful, unique, decentralized” naming trichotomy&lt;/a&gt;.
HTTPS chooses to drop “decentralized”,
relying on DNS as central control.&lt;/p&gt;
&lt;p&gt;In effect, HTTPS follows a pretty narrow definition.
To offer a server that works,
you need to offer a &lt;acronym title=&quot;Transport Layer Security&quot;&gt;TLS&lt;/acronym&gt; endpoint
that has a certificate that meets
a pretty &lt;a href=&quot;https://cabforum.org/working-groups/server/baseline-requirements/documents/&quot;&gt;extensive set of requirements&lt;/a&gt;.
To get that certificate,
you need a name that is uniquely yours,
according to the DNS&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/local-https/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h2 id=&quot;unique-names&quot;&gt;Unique names&lt;/h2&gt;
&lt;p&gt;It is entirely possible to assign unique names to devices.
There’s an awful lot of IoT thingamabobs out there,
but there are far more names than we could ever use.
Allocation can even be somewhat decentralized
by having manufacturers manage the assignment&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/local-https/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;The problem with unique names for IoT devices
is that they are probably not going to be memorable (thanks Zooko).
I don’t know about you,
but &lt;code&gt;printer.&amp;lt;somehash&amp;gt;.service-provider-cloud.example&lt;/code&gt; isn’t exactly convenient.
Still, this is a system that is proven to work in real deployments.&lt;/p&gt;
&lt;p&gt;If we want to make this approach work, maybe it just needs adapting.
Following this approach, the problems we’d be seeking to solve are approximately:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;How to make the names more manageable.
For instance, how you manage to securely distribute search suffixes is a significant problem.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;How to distribute certificates.
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc8555&quot;&gt;ACME&lt;/a&gt; is an obvious choice,
but what does the device talk to?
Obviously, there is some need for something to connect to the big bad Internet,
but how and how often?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Whether rules about certificates that apply to big bad Internet services fit in these contexts.
Is it OK that you need to get fresh certificates every 45 days?
How do Certificate Transparency requirements fit in this model?
Does adding lots of devices to the system lead to scaling problems?&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These problems all largely look like operational challenges.
Any protocol engineering toward this end would be aimed at smoothing over the bumps.
Many of the questions even seem to have fairly straightforward answers.&lt;/p&gt;
&lt;p&gt;I don’t want to completely dismiss this approach as infeasible,
but it seems clear that there are some pretty serious impediments.
After all,
nothing has really prevented someone from deploying systems this way.
Many have tried.
That few have succeeded&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/local-https/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
is perhaps evidence in support of it being too hard.&lt;/p&gt;
&lt;h3 id=&quot;.onion-names&quot;&gt;.onion names&lt;/h3&gt;
&lt;p&gt;Tor’s solution to this problem is making names self-authenticating.
You take a public key
(something for which no one else can produce a valid signature)
and that becomes your identity.
Your server name becomes a hash of that public key.
Of course, “&amp;lt;somelongstring&amp;gt;.onion” as a name is definitely not user-friendly.
You won’t want to be typing that name into an address bar&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/local-https/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
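&lt;p&gt;As a rough sketch of the idea – simplified, because modern “.onion” addresses actually encode the key itself, along with a checksum and version, rather than a bare hash – a self-authenticating name can be derived like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import hashlib

# Sketch only: derive a name from a public key.
# Anyone can recompute the hash to check that a name
# matches the key a server presents; only the key holder
# can produce signatures that match the name.
def self_authenticating_name(public_key):
    return hashlib.sha256(public_key).hexdigest()[:32] + &amp;quot;.onion&amp;quot;
&lt;/code&gt;&lt;/pre&gt;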
&lt;p&gt;That use of a name that is bound to a key
recognizes that the identity of the service is bound to its name.
In the world of DNS names,
that binding is extrinsic and validated by a CA.
In Tor, that binding is intrinsic:
the name itself carries the binding.&lt;/p&gt;
&lt;p&gt;Tor requires that endpoints follow different rules to the rest of the uniquely-named servers.
Those rules include a particular protocol and deployment.
Because those rules are a bit onerous,
only a few systems are able to resolve “.onion” names.
However, this approach does suggest
that maybe there is an expansion to the definition of HTTPS
that can be made to work.&lt;/p&gt;
&lt;h3 id=&quot;.local-with-cryptographically-bound-names&quot;&gt;.local with cryptographically bound names&lt;/h3&gt;
&lt;p&gt;The same concept as Tor could be taken to local names.
Using “&amp;lt;somehash&amp;gt;.local” could be an option&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/local-https/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;.
The idea being that the name is verified differently, but still unique.&lt;/p&gt;
&lt;p&gt;A name that is cryptographically verified
means that you could maybe drop some of the requirements
you might otherwise apply to “normal” names.&lt;/p&gt;
&lt;p&gt;The trick here is that you are asking clients to change a fair bit.
Maybe less than Tor demands,
but they still need to recognize the difference.
Servers also need to understand that their name has changed.&lt;/p&gt;
&lt;p&gt;The biggest problem with relying on unique names remains:
these aren’t going to be easy to remember and type.&lt;/p&gt;
&lt;h3 id=&quot;nicknames&quot;&gt;Nicknames&lt;/h3&gt;
&lt;p&gt;One approach for dealing with ugly names is to add nicknames.
In a browser, you might have a bookmark labeled “printer”,
which navigates to your printer at “&amp;lt;somehash&amp;gt;.local”.
Or maybe you edit &lt;code&gt;/etc/hosts&lt;/code&gt; to add a name alias.&lt;/p&gt;
&lt;p&gt;Either way, usability depends on the creation
of a mapping from the friendly name to the unfriendly one.
From a security perspective,
the mapping becomes a critical component.&lt;/p&gt;
&lt;p&gt;The idea that you might receive this critical information
from the network –
for example, the &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc3397&quot;&gt;DHCP Domain Search Option&lt;/a&gt; –
is no good.
We have to assume that the network is hostile&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/local-https/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;The real challenge here is that everyone will have their own nicknames;
there can be no canonical mapping.
My printer and your printer are (probably) different devices,
but we might want to use the same nickname.&lt;/p&gt;
&lt;h3 id=&quot;tofu-and-nicknames&quot;&gt;TOFU and nicknames&lt;/h3&gt;
&lt;p&gt;Of course, in most of these cases,
what you get from a system like this
is effectively &lt;a href=&quot;https://en.wikipedia.org/wiki/Trust_on_first_use&quot;&gt;&lt;acronym title=&quot;Trust On First Use&quot;&gt;TOFU&lt;/acronym&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;That is,
you visit the server the first time
and give it a friendly name.
If that first visit was to the correct server,
you can use the nickname securely thereafter.
If not,
and an attacker was present for your first visit,
then you could be visiting them forever after.&lt;/p&gt;
&lt;p&gt;This model works pretty well for SSH.
It can also be hardened further if you care to do the extra work.&lt;/p&gt;
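&lt;p&gt;The mechanics are simple enough to sketch (hypothetical helper names; this is not SSH’s actual known-hosts logic):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Trust on first use: remember the key fingerprint seen on
# first contact; reject any later connection that differs.
pins = {}  # maps nickname to key fingerprint

def check_tofu(nickname, fingerprint):
    if nickname not in pins:
        pins[nickname] = fingerprint  # first use: trust and remember
        return True
    return pins[nickname] == fingerprint
&lt;/code&gt;&lt;/pre&gt;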
&lt;p&gt;It’s a bit rough if the server key changes,
which leads to some &lt;a href=&quot;https://www.agwa.name/blog/post/why_tofu_doesnt_work&quot;&gt;fair criticism&lt;/a&gt;.
For use in the home,
it might be good enough.&lt;/p&gt;
&lt;h2 id=&quot;non-unique-names%2C-unique-identities&quot;&gt;Non-unique names, unique identities&lt;/h2&gt;
&lt;p&gt;Recognizing that nicknames plus cryptographically-bound names
already provide unique identities in practice,
the logical next step is to just do away with the funny name entirely.&lt;/p&gt;
&lt;p&gt;The reason we want the long and awkward label is twofold:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Firstly, we need to be able to find the thing and talk to it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Then, we need to ensure that it has a unique identity,
distinct from all other servers,
so that it cannot be impersonated.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Those two things don’t need to be so tightly coupled.&lt;/p&gt;
&lt;p&gt;Finding the thing works perfectly well without a ridiculous name.
I would argue that
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc6762&quot;&gt;mDNS&lt;/a&gt; works better for people
if it uses names that make sense to them.&lt;/p&gt;
&lt;p&gt;We could use the friendly name where it makes sense
and an elaborate name –
or identifier –
everywhere that impersonation matters.&lt;/p&gt;
&lt;h3 id=&quot;managing-impersonation-risk&quot;&gt;Managing impersonation risk&lt;/h3&gt;
&lt;p&gt;If there are potentially many printers that can use “printer.local”,
how do we prevent each from impersonating any other?
The basic answer is that each needs to be presented distinctly.&lt;/p&gt;
&lt;h4 id=&quot;in-the-browser&quot;&gt;In the browser&lt;/h4&gt;
&lt;p&gt;On the web at least, this could be relatively simple.
There are two concepts that are relevant to all interactions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;An origin.
An origin is a tuple of values that are combined to form an unambiguous identifier.
Origins are the basis for all web interactions.
For ordinary HTTPS,
this is a tuple that combines
the scheme or protocol (“https”),
the hostname (“&lt;a href=&quot;http://www.example.com/&quot;&gt;www.example.com&lt;/a&gt;”),
and the server port number (443).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A site.
Certain features combine multiple origins
for reasons that are convoluted and embarrassing.
A site is defined as a test,
rather than a tuple of values.
Two origins can be &lt;a href=&quot;https://html.spec.whatwg.org/multipage/browsers.html#same-site&quot;&gt;same site&lt;/a&gt;
or &lt;a href=&quot;https://html.spec.whatwg.org/multipage/browsers.html#schemelessly-same-site&quot;&gt;schemelessly same site&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Neither of these relies on having flat names for servers,
which makes extending them a real possibility.
For instance,
“&lt;a href=&quot;https://printer.local/&quot;&gt;https://printer.local&lt;/a&gt;” might be recognized as non-unique
and therefore be assigned a tuple
that includes the server public key,
thereby ensuring that it is distinct
from all other “&lt;a href=&quot;https://printer.local/&quot;&gt;https://printer.local&lt;/a&gt;” instances.&lt;/p&gt;
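&lt;p&gt;Concretely – as a sketch of the idea, not any browser’s actual internal representation – the tuples might look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# An ordinary origin: (scheme, host, port).
origin = (&amp;quot;https&amp;quot;, &amp;quot;www.example.com&amp;quot;, 443)

# Hypothetical extension for a non-unique name: adding a hash
# of the server public key keeps two instances distinct.
printer_a = (&amp;quot;https&amp;quot;, &amp;quot;printer.local&amp;quot;, 443, &amp;quot;keyhash-a&amp;quot;)
printer_b = (&amp;quot;https&amp;quot;, &amp;quot;printer.local&amp;quot;, 443, &amp;quot;keyhash-b&amp;quot;)
assert printer_a != printer_b
&lt;/code&gt;&lt;/pre&gt;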
&lt;p&gt;From there,
many of the reasons for impersonation can be managed.
Passkeys, cookies, and any other state
that a browser associates with a given “&lt;a href=&quot;https://printer.local/&quot;&gt;https://printer.local&lt;/a&gt;”
are only presented to that instance,
not any other.
That’s a big chunk of the impersonation risk handled.&lt;/p&gt;
&lt;p&gt;Passwords and phishing remain a challenge&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/local-https/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;.
Outside of the use of a password manager,
it won’t be hard to convince people to enter a password
into the wrong instance.
That might be something that can be managed with UX changes,
but that’s unlikely to be perfect.&lt;/p&gt;
&lt;h4 id=&quot;elsewhere&quot;&gt;Elsewhere&lt;/h4&gt;
&lt;p&gt;Outside of the browser,
there are a lot of systems that do not update
in quite the same fashion as browsers.
Their definition of server identity is likely to be less precise
than the origin/site model browsers use.&lt;/p&gt;
&lt;p&gt;For these,
it might be easier to formulate a name
that includes a cryptographic binding to the public key.
That name could be used in place of the short, friendly name.
There are reserved names that can be used for this purpose.&lt;/p&gt;
&lt;p&gt;Working out how to separate out places where names need to be unique
and where they can be user-friendly isn’t that straightforward.
A starting point might be to use an ugly name everywhere,
with substitution of nicer names being done surgically.&lt;/p&gt;
&lt;p&gt;One place that might need the friendly name first is protocol interactions.
A printer might easily handle being known as “printer.local”,
but it might be less able to handle being known as “&amp;lt;somehash&amp;gt;.whatever.example”.
Using the friendly name there would keep the changes for servers to a minimum.&lt;/p&gt;
&lt;h3 id=&quot;key-rotation-and-other-problems&quot;&gt;Key rotation and other problems&lt;/h3&gt;
&lt;p&gt;One reasonable criticism of this approach is
that no mechanisms exist to support servers changing their keys.&lt;/p&gt;
&lt;p&gt;That’s mostly OK.
Key rotation will mean a new identity,
which resets existing state.
Losing state is likely tolerable for cookies and passkeys.
The phishing risk of having to enter a password to restore state,
on the other hand,
is pretty bad.&lt;/p&gt;
&lt;p&gt;That’s a genuine problem that would need work.
Of course, if the alternative is no HTTPS,
it might be a good trade.&lt;/p&gt;
&lt;p&gt;Servers in these environments probably shouldn’t be rotating keys anyway.
Things like expiration of certificates
largely only serve to ensure that servers are equipped
to deal with change.
A server at a non-unique name doesn’t have to deal with
its name disappearing or having to renew it periodically.
Those that want to deal with all of that can get a real name.&lt;/p&gt;
&lt;p&gt;Of course, this highlights how this
would require a distinct set of rules
for non-unique names.
Working out what those differences need to be is the hard part.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Extending the definition of HTTPS to include non-unique names
is potentially a big step.
However, it might mean that we can do away
with the bizarre exceptions we have for
unsecured HTTP in certain environments.&lt;/p&gt;
&lt;p&gt;This post sketched out a model
that requires very little of servers.
Servers only need to present a certificate over TLS,
with a unique key.
The model doesn’t care much what that certificate contains&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/local-https/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;.
Changes are focused on clients and what they expect from devices.&lt;/p&gt;
&lt;p&gt;Allowing a system that is obviously lesser
to share the “HTTPS” scheme with the system we know
(and love/hate/respect/loathe/dread)
might seem dishonest or misleading.
I maintain that –
as long as the servers with real names are unaffected,
as they would be –
no harm comes from a more inclusive definition.&lt;/p&gt;
&lt;p&gt;Expanding what it means to be an HTTPS server
might help eliminate unsecured local services.
After all,
cleartext HTTP is not fit for deployment to the Internet.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Or, maybe, a globally unique IP address.
Really, you don’t want that though. &lt;a href=&quot;https://lowentropy.net/posts/local-https/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Let’s pretend that the manufacturer
isn’t going to go out of business during the lifetime of the widget.
OK, I can’t pretend: this is unrealistic.
Even if they stay in business,
there is no guarantee that they will maintain the necessary services. &lt;a href=&quot;https://lowentropy.net/posts/local-https/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;With some &lt;a href=&quot;https://words.filippo.io/how-plex-is-doing-https-for-all-its-users/&quot;&gt;notable&lt;/a&gt; exceptions. &lt;a href=&quot;https://lowentropy.net/posts/local-https/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;And good luck noticing the phishing attack that replaces the name.
It’s not that hard for an attacker to replace the name
with one that matches a few characters at the start and end.
How do you think Facebook got “facebookcorewwwi.onion”? &lt;a href=&quot;https://lowentropy.net/posts/local-https/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;You might use &lt;code&gt;xx--&amp;lt;somehash&amp;gt;.local&lt;/code&gt;
or some other reserved label to eliminate the risk,
however remote,
of collisions with existing names. &lt;a href=&quot;https://lowentropy.net/posts/local-https/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc3552#section-3&quot;&gt;You hand your packets to the attacker to forward&lt;/a&gt;. &lt;a href=&quot;https://lowentropy.net/posts/local-https/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I should be recommending the use of passkeys here,
pointing to &lt;a href=&quot;https://www.imperialviolet.org/tourofwebauthn/tourofwebauthn.html&quot;&gt;Adam Langley’s nice book&lt;/a&gt;,
but – to be perfectly frank – the user experience still sucks.
Besides, denying that people use passwords is silly. &lt;a href=&quot;https://lowentropy.net/posts/local-https/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;It might not be that simple.
You probably want the server to include its name,
if only to avoid unknown key share attacks.
That might rule out the use of &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc7250&quot;&gt;raw public keys&lt;/a&gt;. &lt;a href=&quot;https://lowentropy.net/posts/local-https/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>C2PA Is Not Going To Fix Our Misinformation Problem</title>
    <link href="https://lowentropy.net/posts/c2pa/"/>
    <updated>2024-12-12T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/c2pa/</id>
    <content type="html">&lt;p&gt;A lot of people are deeply concerned about misinformation.&lt;/p&gt;
&lt;p&gt;People often come to believe in falsehoods as part of how they identify with a social group.
Once established, false beliefs are &lt;a href=&quot;https://en.wikipedia.org/wiki/Confirmation_bias&quot;&gt;hard to overcome&lt;/a&gt;.
Beliefs are a shorthand we use in trying to &lt;a href=&quot;https://theconversation.com/what-delusions-can-tell-us-about-the-cognitive-nature-of-belief-243627&quot;&gt;make sense of the world&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Misinformation is often propagated in order to engender delusion,
or a firmly-held belief that does not correspond with reality.
Prominent examples of delusions include belief in
&lt;a href=&quot;https://en.wikipedia.org/wiki/Flat_Earth&quot;&gt;a flat earth&lt;/a&gt;,
the risk of &lt;a href=&quot;https://www.cdc.gov/vaccine-safety/about/autism.html&quot;&gt;vaccines causing autism&lt;/a&gt;,
or that &lt;a href=&quot;https://www.bbc.co.uk/newsround/48774080&quot;&gt;the moon landing was staged&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Delusions –
if sufficiently widespread
or if &lt;a href=&quot;https://medium.com/incerto/the-most-intolerant-wins-the-dictatorship-of-the-small-minority-3f1f83ce4e15&quot;&gt;promoted aggressively enough&lt;/a&gt; –
can have a &lt;a href=&quot;https://www.nbcnews.com/specials/russian-disinformation-2024-election-storm-1516/index.html&quot;&gt;significant effect&lt;/a&gt;
on the operation of our society,
particularly when it comes to involvement in democratic processes.&lt;/p&gt;
&lt;p&gt;Misinformation campaigns seek to drive these effects.
For instance,
promoting a false belief
that &lt;a href=&quot;https://www.bbc.com/news/articles/c3wp6q132p2o&quot;&gt;immigrants are eating household pets&lt;/a&gt;
might motivate the implementation of laws
that lead to unjustifiable treatment of immigrants.&lt;/p&gt;
&lt;p&gt;For some, the idea that technology might help with this sort of problem
is appealing.
If misinformation is the cause of harmful delusions,
maybe having less misinformation would help.&lt;/p&gt;
&lt;p&gt;The explosion in popularity and efficacy of generative AI
has made the creation of content that carries misinformation far easier.
This has sharpened a desire to build tools to help separate truth and falsehood.&lt;/p&gt;
&lt;h2 id=&quot;a-security-mechanism&quot;&gt;A Security Mechanism&lt;/h2&gt;
&lt;p&gt;Preventing the promotion of misinformation can be formulated as a security goal.
We might set out one of two complementary goals:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It must be possible to identify fake content as fake.&lt;/li&gt;
&lt;li&gt;It must be possible to distinguish genuine content.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Our adversary might seek to pass off fake content as genuine.
However, a weaker goal might be easier to achieve:
the adversary only needs to
avoid having their fake content
identified as a fabrication.&lt;/p&gt;
&lt;p&gt;Note that we assume that once a story is established as fake,
most people will cease to believe it.
That’s a big assumption,
but we can at least pretend that this will happen
for the purposes of this analysis.&lt;/p&gt;
&lt;p&gt;In terms of capabilities,
any adversary can be assumed to be capable of
using generative AI and other tools
to produce fake content.
We also allow the adversary access to any mechanism
used to distinguish between real and fake content&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h2 id=&quot;technical-options&quot;&gt;Technical Options&lt;/h2&gt;
&lt;p&gt;Determining what is – or is not – truthful is not easy.
Given an arbitrary piece of content,
it is not trivial to determine whether it contains fact or fabrication.
After all, if it were that simple,
misinformation would not be that big a problem.&lt;/p&gt;
&lt;p&gt;Technical proposals in this space generally aim for a less ambitious goal.
One of two approaches is typically considered:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Mark fake content as fake.&lt;/li&gt;
&lt;li&gt;Mark genuine content as genuine.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Both rely on the system that creates content knowing which of the two applies.
The creator can therefore apply the requisite mark.
As long as that mark survives to be read by the consumer of the content,
what the creator knew about whether the content was “true” can be conveyed.&lt;/p&gt;
&lt;p&gt;Evaluating these options
against the goals of our adversary –
who seeks to pass off fake content as “real” –
is interesting.
Each approach requires high levels of adoption
to be successful:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If an adversary seeks to pass off fake content as real,
virtually all &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fake&quot;&gt;fake content&lt;/a&gt; needs to be marked as such.
Otherwise, people seeking to promote fake content
can simply use any means of production
that don’t add markings.
Markings also need to be very hard to remove.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In comparison,
genuine content markings might still need to be universally applied,
but it might be possible to realize benefits
when limited to &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#expectations&quot;&gt;specific outlets&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That makes markings on genuine content more appealing
as a way to help counteract misinformation.&lt;/p&gt;
&lt;h2 id=&quot;fake&quot;&gt;Attesting to Fakeness&lt;/h2&gt;
&lt;p&gt;If content (text, image, audio, or video) is produced with generative AI,
it can maybe include some way to check that it is fake.
The output of many popular generative AI tools often includes
both metadata and a small watermark.&lt;/p&gt;
&lt;p&gt;These indications are pretty useless if someone is seeking to promote a falsehood.
It is trivial to edit content to remove metadata.
Similarly, visible watermarks can be edited out of images.&lt;/p&gt;
&lt;p&gt;The response to that is a form of watermarking
that is supposed to be impossible to remove.
Either the generator embeds markings in the content as it is generated,
or the marking is applied to the output content by a specialized process.&lt;/p&gt;
&lt;p&gt;A separate system is then provided that can take any content
and determine whether it was marked.&lt;/p&gt;
&lt;p&gt;The question then becomes whether it is possible
to generate a watermark that cannot be removed.
&lt;a href=&quot;https://arxiv.org/abs/2311.04378&quot;&gt;This paper&lt;/a&gt; makes a strong case for the negative,
demonstrating that the removal –
and re-application –
of arbitrary watermarks is possible,
requiring only access to the system that rules on whether the watermark is present.&lt;/p&gt;
&lt;p&gt;Various generative AI companies
have implemented systems of markings,
including
&lt;a href=&quot;https://help.openai.com/en/articles/8912793-c2pa-in-dall-e-3&quot;&gt;metadata&lt;/a&gt;,
&lt;a href=&quot;https://help.openai.com/en/articles/6468065-dall-e-2-faq#h_75522bc940&quot;&gt;removable watermarks&lt;/a&gt;, and
&lt;a href=&quot;https://deepmind.google/technologies/synthid/&quot;&gt;watermarking that is supposed to be resistant to removal&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Furthermore,
generative AI models have to be controlled so that people
can’t generate their own content without markings.
That is clearly &lt;a href=&quot;https://huggingface.co/&quot;&gt;not feasible&lt;/a&gt;,
as much as some would like to retain control.&lt;/p&gt;
&lt;p&gt;Even if model access could be controlled,
it seems likely that watermarks will be removable.
At best, this places the systems that apply markings
in an escalating competition with adversaries
that seek to remove (or falsify) markings.&lt;/p&gt;
&lt;h2 id=&quot;content-provenance&quot;&gt;Content Provenance&lt;/h2&gt;
&lt;p&gt;There’s a case to be made for the use of metadata
in establishing where content came from,
namely &lt;em&gt;provenance&lt;/em&gt;.
If the goal is to positively show that content was generated in a particular way,
then metadata might be sufficient.&lt;/p&gt;
&lt;p&gt;Provenance could work to label content as either fake or real.
However, it is most interesting as a means of tracing real content to its source
because that might be more feasible.&lt;/p&gt;
&lt;p&gt;The most widely adopted system is &lt;a href=&quot;https://c2pa.org/&quot;&gt;C2PA&lt;/a&gt;.
This system has received a lot of attention
and is often presented as &lt;em&gt;the&lt;/em&gt; answer to online misinformation.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/criticism&quot;&gt;An unpublished opinion piece that I wrote in 2023 about C2PA&lt;/a&gt;
is highly critical.
This blog is a longer examination
of what C2PA might offer
and its shortcomings.&lt;/p&gt;
&lt;h2 id=&quot;how-c2pa-works&quot;&gt;How C2PA Works&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&quot;https://c2pa.org/specifications/specifications/2.1/specs/C2PA_Specification.html&quot;&gt;C2PA specification&lt;/a&gt;
is long and somewhat complicated&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;,
but the basics are pretty simple:&lt;/p&gt;
&lt;p&gt;Content is digitally signed by the entity that produced it.
C2PA defines a bunch of claims
that all relate to how the content was created.&lt;/p&gt;
&lt;aside&gt;
&lt;p&gt;Digital signatures are often used
to establish a claim about something.
If you trust an entity to make claims of a particular form,
a valid signature allows you to believe those claims
as they apply to a specific piece of content.&lt;/p&gt;
&lt;p&gt;For example, web browsers trust &lt;a href=&quot;https://letsencrypt.org/&quot;&gt;Let’s Encrypt&lt;/a&gt;
to make claims about the identity of websites.
They digitally sign certificates that are presented to browsers
when establishing a connection.
The identity of the site is accepted
only if the signature on the certificate is valid.&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;C2PA binds attributes to content in one of two ways.
A “hard” binding uses a cryptographic hash,
which ensures that any modification to the content invalidates the signature.
A “soft” binding binds to
a &lt;a href=&quot;https://en.wikipedia.org/wiki/Perceptual_hashing&quot;&gt;perceptual hash&lt;/a&gt;
or a watermark (more on that &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#soft&quot;&gt;below&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The C2PA metadata includes a bunch of attributes,
including a means of binding to the content,
all of which are digitally signed.&lt;/p&gt;
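&lt;p&gt;The hard-binding idea can be sketched in a few lines.
This is a toy illustration, not the C2PA wire format:
real manifests are CBOR/COSE structures signed with public keys,
and the key and field names here are invented.&lt;/p&gt;

```python
import hashlib
import hmac
import json

# Toy "hard" binding: the manifest carries a cryptographic hash of the
# content, and the manifest itself is signed. HMAC stands in for the
# public-key signature a real system would use; this key is made up.
KEY = b"demo-signing-key"

def make_manifest(content):
    manifest = {"content_hash": hashlib.sha256(content).hexdigest()}
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(KEY, payload, hashlib.sha256).hexdigest()
    return manifest

def verify(content, manifest):
    claimed = {"content_hash": manifest["content_hash"]}
    payload = json.dumps(claimed, sort_keys=True).encode()
    sig_ok = hmac.compare_digest(
        manifest["signature"],
        hmac.new(KEY, payload, hashlib.sha256).hexdigest(),
    )
    # The binding: the hash in the signed manifest must match the bytes.
    hash_ok = manifest["content_hash"] == hashlib.sha256(content).hexdigest()
    return sig_ok and hash_ok

image = b"original pixels"
m = make_manifest(image)
assert verify(image, m)             # untouched content verifies
assert not verify(image + b"!", m)  # any single-byte edit invalidates it
```

&lt;p&gt;The final assertion is the defining property of a hard binding:
any modification whatsoever breaks the link between manifest and content,
which is exactly why soft bindings exist for content that is expected to be edited.&lt;/p&gt;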
&lt;p&gt;An important type of attribute in C2PA
is one that points to source material
used in producing derivative content.
For instance, if an image is edited,
an attribute might refer to the original image.
This is supposed to enable the tracing of:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the original work, when the present work contains edits, or&lt;/li&gt;
&lt;li&gt;the components that comprise a derivative work.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;ok&quot;&gt;What Might Work in C2PA&lt;/h2&gt;
&lt;p&gt;Cryptographic assertions
that come from secured hardware
might be able to help identify “real” content.&lt;/p&gt;
&lt;p&gt;A camera or similar capture device could use C2PA
to sign the content it captures.
Provided that the keys used cannot be extracted from the hardware&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;,
an assertion by the manufacturer
might make a good case for the image being genuine.&lt;/p&gt;
&lt;p&gt;The inclusion of metadata that includes URLs for source material –
“ingredients” in C2PA-speak&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt; –
might also be useful in finding content
that contains a manufacturer signature.
That depends on the metadata including accessible URLs.
As any assertion in C2PA is optional,
this is not guaranteed.&lt;/p&gt;
&lt;h2 id=&quot;where-c2pa-does-not-deliver&quot;&gt;Where C2PA Does Not Deliver&lt;/h2&gt;
&lt;p&gt;The weaknesses in C2PA are somewhat more numerous.&lt;/p&gt;
&lt;p&gt;This section looks in more detail at some aspects of C2PA
that require greater skepticism.
These are the high-level items only;
there are other aspects of the design
that seem poorly specified or problematic&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;,
but the goal of this post is to focus on the primary problem.&lt;/p&gt;
&lt;h3 id=&quot;soft&quot;&gt;C2PA Soft Bindings&lt;/h3&gt;
&lt;p&gt;A soft binding in C2PA allows for modifications of the content.
The idea is that the content might be edited,
but the assertions would still apply.&lt;/p&gt;
&lt;p&gt;As mentioned, two options are considered in the specification:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Perceptual_hashing&quot;&gt;Perceptual hashes&lt;/a&gt;,
which are non-cryptographic digests of content
that are intended to remain stable when content is edited.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Watermarking, which binds to a watermark
that is embedded in the content.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In an adversarial setting,
the use of perceptual hashes is well-studied,
with numerous results that show exploitable weaknesses.&lt;/p&gt;
&lt;p&gt;Perceptual hashes lack the security properties of cryptographic hashes,
so they are often vulnerable to attack.
Collision and second preimage attacks are most relevant here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Collision attacks –
such as &lt;a href=&quot;https://eprint.iacr.org/2024/1869&quot;&gt;this one&lt;/a&gt; –
give an adversary the ability
to generate two pieces of content with the same fingerprint.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Second preimage attacks –
such as implemented with &lt;a href=&quot;https://github.com/anishathalye/neural-hash-collider&quot;&gt;this code&lt;/a&gt; –
allow an adversary
to take content that produces one output
and then modify completely different content
so that it results in the same fingerprint.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Either attack allows an adversary to substitute
one piece of content for another,
though the preimage attack is more flexible.&lt;/p&gt;
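&lt;p&gt;The gap between the two families of hash can be seen with a toy example.
The &quot;perceptual hash&quot; below –
one brightness bit per pixel –
is vastly simpler than real schemes like NeuralHash,
but it shows the shape of the attack:
an adversary who knows the fingerprint function
can nudge unrelated content until the fingerprints match.&lt;/p&gt;

```python
import hashlib

def toy_phash(pixels):
    # One bit per pixel: set when the pixel is brighter than the mean.
    # Real perceptual hashes are far more sophisticated, but share the
    # weakness demonstrated below.
    mean = sum(pixels) / len(pixels)
    return tuple(int(p > mean) for p in pixels)

real = [200, 40, 180, 30, 220, 50]   # stands in for genuine content
fingerprint = toy_phash(real)

fake = [90, 80, 95, 70, 100, 85]     # unrelated content
# Second preimage: push each pixel of the fake just above or below its
# own mean so that its bit pattern reproduces the real fingerprint.
mean = sum(fake) / len(fake)
forged = [mean + 10 if bit else mean - 10 for bit in fingerprint]
assert toy_phash(forged) == fingerprint  # fingerprints now collide

# A cryptographic hash offers no such wiggle room: changing one pixel
# value by one changes the digest completely (the avalanche effect).
assert hashlib.sha256(bytes(real)).hexdigest() != hashlib.sha256(
    bytes([201] + real[1:])).hexdigest()
```

&lt;p&gt;Real attacks on real perceptual hashes use gradient descent rather than arithmetic,
but the conclusion is the same:
the fingerprint constrains the adversary far less than a cryptographic hash would.&lt;/p&gt;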
&lt;p&gt;Binding to a watermark appears to be even easier to exploit.
Watermarks can often be removed –
for example, with the TrustMark-RM mode of &lt;a href=&quot;https://arxiv.org/abs/2311.18297&quot;&gt;TrustMark&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt; –
and re-applied.
That makes it possible to extract a watermark
from one piece of content
and copy it –
along with any C2PA assertions –
to entirely different content.&lt;/p&gt;
&lt;h3 id=&quot;c2pa-traceability-and-provenance&quot;&gt;C2PA Traceability and Provenance&lt;/h3&gt;
&lt;p&gt;One idea that C2PA promotes
is that source material might be traced.
When content is edited in a tool that supports C2PA,
the tool embeds information about the edits,
especially any source material.
In theory, this makes it possible to trace the provenance
of C2PA-annotated content.&lt;/p&gt;
&lt;p&gt;In practice, tracing provenance is unlikely
to be a casual process.
Some publisher sites might aid the discovery of source material,
but content that is redistributed
in other places
could be quite hard to trace&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Consider photographs that are published online.
Professional images are captured in formats
like &lt;a href=&quot;https://www.adobe.com/creativecloud/file-types/image/raw.html&quot;&gt;RAW&lt;/a&gt;
that are unsuitable for publication.
Most images are transcoded and edited before publication.&lt;/p&gt;
&lt;p&gt;To trace provenance,
editing software needs to
embed its own metadata about changes&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;,
including a means of locating the original&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Any connection between the published and original content
cannot be verified automatically in a reliable fashion.
A hard, or cryptographic, binding
is immediately invalidated
by any edit.&lt;/p&gt;
&lt;p&gt;The relationship between edited and original content
therefore cannot be validated by a machine.
Something like a perceptual hash might be used
to automate this connection.
However, as we’ve already established,
perceptual hashes are vulnerable to attack.
Any automated process based on a perceptual hash
is therefore unreliable.&lt;/p&gt;
&lt;p&gt;At best,
a human might be able to look at images
and reach their own conclusions.
That supports the view that provenance information
is unlikely to be able to take advantage
of the scaling
that might come from machine validation.&lt;/p&gt;
&lt;h3 id=&quot;drm&quot;&gt;C2PA and DRM&lt;/h3&gt;
&lt;p&gt;With a published specification,
anyone can generate a valid assertion.
That means that
C2PA verifiers need some means of deciding
which assertions to believe.&lt;/p&gt;
&lt;p&gt;For hardware capture of content (images, audio, and video),
there are relatively few manufacturers.
For the claims of a hardware manufacturer
to be credible,
they have to ensure that
the keys they use to sign assertions
can only be used with unmodified versions of their hardware.&lt;/p&gt;
&lt;p&gt;That depends on having a degree of control.
Control over access to secret keys in specialized hardware modules
means that it might be possible
to maintain the integrity of this part of the system.&lt;/p&gt;
&lt;p&gt;There is some risk
of this motivating anti-consumer actions
on the part of manufacturers.
For example, cameras could refuse to produce assertions
when used with aftermarket lenses,
or stop producing assertions
after they are repaired.&lt;/p&gt;
&lt;p&gt;As long as modifying hardware
only results in a loss of assertions,
that seems unlikely to be a serious concern
for many people.
Very few people seek to modify hardware&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;The need to restrict editing software
is far more serious.
In order for edits to be considered trustworthy,
strict controls are necessary.&lt;/p&gt;
&lt;p&gt;The need for controls
would make it impossible for open source software
to generate trustworthy assertions.
Assertions could only be generated by cloud-based –
or maybe DRM-laden –
software.&lt;/p&gt;
&lt;h2 id=&quot;completely-new-trust-infrastructure&quot;&gt;Completely New Trust Infrastructure&lt;/h2&gt;
&lt;p&gt;The idea of creating trust infrastructure
for authenticating capture device manufacturers
and editing software vendors
is somewhat daunting.&lt;/p&gt;
&lt;p&gt;Experience with the Web &lt;acronym title=&quot;Public Key Infrastructure&quot;&gt;PKI&lt;/acronym&gt;
shows that this is a non-trivial undertaking.
A governance structure needs to be put in place
to set rules for how inclusions –
and exclusions –
are decided.
Systems need to be put in place
for distributing keys
and for managing revocation.&lt;/p&gt;
&lt;p&gt;This is not a small undertaking.
However, for this particular structure,
it is not unreasonable to expect this to work out.
With a smaller set of participants than the Web PKI,
along with somewhat lower stakes,
this seems possible.&lt;/p&gt;
&lt;h3 id=&quot;alternative-trust-infrastructure-options&quot;&gt;Alternative Trust Infrastructure Options&lt;/h3&gt;
&lt;p&gt;In discussions about C2PA,
when I raised concerns about DRM,
&lt;a href=&quot;https://jeffrey.yasskin.info/&quot;&gt;Jeffrey Yasskin&lt;/a&gt;
mentioned a possible alternative direction.&lt;/p&gt;
&lt;p&gt;In that alternative,
attestations are not made by device or software vendors.
Content authors (or editors or a publisher)
would be the ones to make any assertions.
Assertions might be tied to an existing identity,
such as a website domain name,
avoiding any need to build an entirely new PKI.&lt;/p&gt;
&lt;p&gt;A simple method would be to have content signed&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;
by a site that claims it.
That immediately helps with the problem
of people attempting to pass fake information
as coming from a particular source.&lt;/p&gt;
&lt;p&gt;The most intriguing version of this idea
relies on building a reputation system for content.
If content can then be traced to its source,
the reputation associated with that source
can in some way be built up over time.&lt;/p&gt;
&lt;p&gt;The key challenge is that this latter form changes from
a definitive sort of statement –
under C2PA, content is either real or not –
to a more subjective one.
That’s potentially valuable
in that it encourages more active engagement
with the material.&lt;/p&gt;
&lt;p&gt;The idea of building new reputational systems
is fascinating
but a lot more work is needed
before anything more could be said.&lt;/p&gt;
&lt;h2 id=&quot;simpler&quot;&gt;A Simpler Provenance&lt;/h2&gt;
&lt;p&gt;The difficulty of tracing,
along with the problems associated with editing,
suggests a simpler approach.&lt;/p&gt;
&lt;p&gt;The benefits of C2PA
might be realized by a combination of
hardware-backed cryptographic assertions
and simple pointers
(that is, without digital signatures)
from edited content
to original content.&lt;/p&gt;
&lt;p&gt;Even then,
an adversary still has a few options.&lt;/p&gt;
&lt;h3 id=&quot;trickery&quot;&gt;Trickery&lt;/h3&gt;
&lt;p&gt;When facial recognition systems were originally built,
researchers found that some of these could be defeated
by &lt;a href=&quot;https://www.which.co.uk/news/article/face-recognition-mobile-phones-axNDM2P9VvyO&quot;&gt;showing the camera a photo&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn12&quot; id=&quot;fnref12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Generating a fake image with a valid assertion
could be as simple as showing a C2PA camera a photograph&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn13&quot; id=&quot;fnref13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt;.
The use of trick photography to create a false impression
is also possible.&lt;/p&gt;
&lt;h3 id=&quot;expectations&quot;&gt;No Expectations&lt;/h3&gt;
&lt;p&gt;It is probably fair to say that –
despite some uptake of C2PA –
most content in existence
does not include C2PA assertions.&lt;/p&gt;
&lt;p&gt;Limited availability
seriously undermines the value
of any provenance system
in countering misinformation.
An attacker can remove metadata
if people do not expect it to be present.&lt;/p&gt;
&lt;p&gt;This might be different for media outlets
that implement policies
that result in universal –
or at least near-universal –
use of something like C2PA.
Then, people can expect that content
produced by that outlet
will contain provenance information.&lt;/p&gt;
&lt;p&gt;Articles on social media
can still claim to be from that outlet.
However, it might become easier
to refute that sort of false claim.&lt;/p&gt;
&lt;p&gt;That might be reason enough for a media outlet
to insist on implementing something like C2PA.
After all,
the primary currency in which journalistic institutions trade
is their reputation.
Having a technical mechanism
that can support refutation of falsified articles
has some value
in terms of being able to defend their reputation.&lt;/p&gt;
&lt;p&gt;The cost might be significant,
if the benefits are not realized
until nearly all content is traceable.
That might entail replacing every camera used by journalists
and outside contributors.
Given the interconnected nature of news media,
with many outlets publishing content that is sourced from partners,
that’s likely a big ask.&lt;/p&gt;
&lt;h3 id=&quot;a-lack-of-respect-for-the-truth&quot;&gt;A Lack of Respect for the Truth&lt;/h3&gt;
&lt;p&gt;For any system like this to be effective,
people need to care
about whether something is real or not.&lt;/p&gt;
&lt;p&gt;It is not just about &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#expectations&quot;&gt;expectations&lt;/a&gt;,
people have to be motivated to interrogate claims
and seek the truth.
That’s not a problem that can be solved by technical means.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#ok&quot;&gt;narrow applicability&lt;/a&gt; of the assertions
for capture hardware
suggests that a &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#simpler&quot;&gt;simpler approach&lt;/a&gt;
might be better and more feasible.
Some applications –
such as in marking generated content –
are probably ineffectual
as a means of countering misinformation.
The &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#drm&quot;&gt;DRM aspect&lt;/a&gt; is pretty ugly,
while not really adding any value.&lt;/p&gt;
&lt;p&gt;All of which is to say that
the technical aspects of provenance systems
like C2PA
are not particularly compelling.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;We have to assume that
people will need to be able to ask whether content is real or fake
for the system to work. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;And – it pains me to say – it is not very good.
I write specifications for a living,
so I appreciate how hard it is to produce something on this scale.
Unfortunately, this specification needs far more rigor.
I suspect that the only way to implement C2PA successfully
would be to look at one of the implementations. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;That’s a big “if”, though not implausible.
Though &lt;a href=&quot;https://freedom-to-tinker.com/2010/09/16/understanding-hdcp-master-key-leak/&quot;&gt;hardware keys used in consumer hardware have been
extracted&lt;/a&gt;,
the techniques used for protecting secrets require considerable resources.
That would only invalidate the signatures from a single manufacturer
or limited product lines.
C2PA might not be worth the effort. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;C2PA can also indicate generative AI ingredients
such as the text prompt used
and the details of the generative model.
That’s not much use
in terms of protecting against use of content for misinformation,
but it might have other uses. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;For instance,
the method by which assertions can be redacted
is pretty questionable.
See &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure&quot;&gt;my post on selective disclosure&lt;/a&gt;
for more on what that sort of system
might need to do. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt; &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref5:1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;TrustMark is one of the soft binding mechanisms
that C2PA recognizes.
It’s also the first one I looked into.
I have no reason to believe that other systems are better. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;C2PA does not use standard locators
(such as &lt;code&gt;https://&lt;/code&gt;),
instead defining a new URI scheme.
That suggests that the means of locating source material
is likely not straightforward. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I did not look into how much detail about edits is recorded.
Some of the supporting material for C2PA
suggests that this could be quite detailed,
but that seems impractical
and the specification only includes
&lt;a href=&quot;https://c2pa.org/specifications/specifications/2.1/specs/C2PA_Specification.html#_actions&quot;&gt;a limited set of edit attributes&lt;/a&gt;. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;C2PA also defines metadata
for an image thumbnail.
Nothing prevents this from including a false representation. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This might be more feasible for images and video
than for audio.
Image and video capture equipment
is often integrated into a single unit.
Audio often features analog interconnections
between components,
which makes it harder to detect falsified inputs. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Yes, &lt;a href=&quot;https://lowentropy.net/posts/bundles&quot;&gt;we’ve been here before&lt;/a&gt;.
Sort of. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn12&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Modern systems use infrared or depth cameras
that are harder to spoof so trivially,
though not impossible:
&lt;a href=&quot;https://www.cyberark.com/resources/threat-research-blog/bypassing-windows-hello-without-masks-or-plastic-surgery&quot;&gt;hardware spoofing&lt;/a&gt; and
&lt;a href=&quot;https://ieeexplore.ieee.org/document/10179429&quot;&gt;depth spoofing&lt;/a&gt;
both appear to be feasible. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref12&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn13&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;C2PA has the means to attest to depth information,
but who would expect that?
Especially when you can redact any clues
that might lead someone to expect it to be present&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn5&quot; id=&quot;fnref5:1&quot;&gt;[5:1]&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref13&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>Everything you need to know about selective disclosure</title>
    <link href="https://lowentropy.net/posts/selective-disclosure/"/>
    <updated>2024-11-21T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/selective-disclosure/</id>
    <content type="html">&lt;h2 id=&quot;why-does-this-matter%3F&quot;&gt;Why does this matter?&lt;/h2&gt;
&lt;p&gt;A lot of governments are engaging with projects to build “Digital Public Infrastructure”. That term covers a range of projects, but one of the common and integral pieces relates to government-backed identity services. While some places have had some form of digital identity system for years — hi Estonia! — there are many more governments looking to roll out some sort of digital identity wallet for their citizens. Notably, the European Union recently passed a major update to their &lt;a href=&quot;https://digital-strategy.ec.europa.eu/en/policies/eudi-regulation&quot;&gt;European Digital Identity Regulation&lt;/a&gt;, which seeks to have a union-wide digital identity system for all European citizens. India’s &lt;a href=&quot;https://en.wikipedia.org/wiki/Aadhaar&quot;&gt;Aadhaar&lt;/a&gt; is still the largest such project with well over a billion people enrolled.&lt;/p&gt;
&lt;p&gt;There are a few ways that these systems end up being implemented, but most take the same basic shape. A government agency will be charged with issuing people with credentials. That might be tied to driver licensing, medical services, passports, or it could be a new identity agency. That agency issues digital credentials that are destined for wallets in phones. Then, services can request that people present these credentials at certain points, as necessary.&lt;/p&gt;
&lt;p&gt;The basic model that is generally used looks something like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/3party.svg&quot; eleventy:width=&quot;640&quot; alt=&quot;Three boxes with arrows between each in series, in turn labeled: Issuer, Holder, Verifier&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The government agency is the “issuer”, your wallet app is a “holder”, and the service that wants your identity information is a “verifier”.&lt;/p&gt;
&lt;p&gt;This is a model for digital credentials that is useful in describing a lot of different interactions. A key piece of that model is the difference between a &lt;em&gt;credential&lt;/em&gt;, which is the thing that ends up in a wallet, and a &lt;em&gt;presentation&lt;/em&gt;, which is what you show a verifier.&lt;/p&gt;
&lt;p&gt;This document focuses on online use cases. That is, where you might be asked to present information about your identity to a website. Though there are many other uses for identity systems, online presentation of identity is becoming more common. How we use identity online is likely to shape how identity is used more broadly.&lt;/p&gt;
&lt;p&gt;The goal of this post is to provide information and maybe a fresh perspective on the topic. This piece also has a &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#the-limitations-of-technical-solutions&quot;&gt;conclusion&lt;/a&gt; that suggests that the truly hard problems in online identity are not technical in nature, so do not necessarily benefit from the use of selective disclosure. As much as selective disclosure is useful in some contexts, there are significant challenges in deploying it on the Web.&lt;/p&gt;
&lt;h2 id=&quot;what-is-selective-disclosure%3F&quot;&gt;What is selective disclosure?&lt;/h2&gt;
&lt;p&gt;A presentation might be a reduced form of the credential. Let’s say that you have a driver license, like the following:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/mclovin.png&quot; alt=&quot;A photo of a (fake) Hawaii driver license&quot; /&gt;&lt;/p&gt;
&lt;p&gt;One way of thinking about selective disclosure is to think of it as redacting those parts of the credential that you don’t want to share.&lt;/p&gt;
&lt;p&gt;Let’s say that you want to show that you are old enough to buy alcohol. You might imagine doing something like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/redacted.png&quot; alt=&quot;A photo of a (fake) Hawaii driver license with some fields covered with black boxes&quot; /&gt;&lt;/p&gt;
&lt;p&gt;That is, if you were presenting that credential to a store in person, you would want to show that the card truly belongs to you and that you are old enough.&lt;/p&gt;
&lt;p&gt;If you aren’t turning up in person, the photo and physical description are not that helpful, so you might cover those as well.&lt;/p&gt;
&lt;p&gt;You don’t need to share your exact birth date to show that you are old enough. You might be able to cover the month and day of those too. That is still too much information, but the best you can easily manage with a &lt;a href=&quot;https://theonion.com/cia-realizes-its-been-using-black-highlighters-all-thes-1819568147/&quot;&gt;black highlighter&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If there was a “can buy alcohol” field on the license, that might be even better. But &lt;a href=&quot;https://en.wikipedia.org/wiki/Legal_drinking_age&quot;&gt;the age at which you can legally buy alcohol&lt;/a&gt; varies quite a bit across the world. And laws apply to the location, not the person. A 19 year old from Canada can’t buy alcohol in the US just because they can buy alcohol at home&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;. Most digital credential systems have special fields to allow for this sort of rule, so that a US&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt; liquor store could use an “over_21” property, whereas a purchase in Canada might check for “over_18” or “over_19” depending on the province.&lt;/p&gt;
&lt;h2 id=&quot;simple-digital-credentials&quot;&gt;Simple digital credentials&lt;/h2&gt;
&lt;p&gt;The simplest form of digital credential is a bag of attributes, covered by a digital signature from a recognized authority. For instance, this might be a JSON Web Token, which is basically just a digitally-signed chunk of JSON.&lt;/p&gt;
&lt;p&gt;For our purposes, let’s run with the example, which we’d form into something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;{
  &amp;quot;number&amp;quot;: &amp;quot;01-47-87441&amp;quot;,
  &amp;quot;name&amp;quot;: &amp;quot;McLOVIN&amp;quot;,
  &amp;quot;address&amp;quot;: &amp;quot;892 MOMONA ST, HONOLULU, HI 96820&amp;quot;,
  &amp;quot;iss&amp;quot;: &amp;quot;1998-06-18&amp;quot;,
  &amp;quot;exp&amp;quot;: &amp;quot;2008-06-03&amp;quot;,
  &amp;quot;dob&amp;quot;: &amp;quot;1981-06-03&amp;quot;,
  &amp;quot;over_18&amp;quot;: true,
  &amp;quot;over_21&amp;quot;: true,
  &amp;quot;over_55&amp;quot;: false,
  &amp;quot;ht&amp;quot;: &amp;quot;5&#39;10&amp;quot;,
  ...
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That could then be wrapped up and signed by whatever Hawaiian DMV issues the license. Something like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/credential.svg&quot; alt=&quot;Two nested boxes, the inner containing text &amp;quot;McLOVIN&#39;s Details&amp;quot;; the outer containing text &amp;quot;Digital Signature&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;That isn’t perfect, because a blob of bytes like that can just be copied around. Anyone who received the credential could “impersonate” our poor friend.&lt;/p&gt;
&lt;p&gt;The way that problem is addressed is through the use of a digital wallet. The issuer requires that the wallet hold a second signing key. The wallet provides the issuer with an attestation, which is just evidence from the wallet maker (which is often the maker of your phone) that they are holding a private key in a place where it can’t be moved or copied&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;. That attestation includes the public key that matches that private key.&lt;/p&gt;
&lt;p&gt;Once the issuer is sure that the private key is tied to the device, the issuer produces a credential that lists the public key from the wallet.&lt;/p&gt;
&lt;p&gt;In order to use the credential, the wallet signs the credential along with some other stuff, like the current time and maybe the identity of the verifier&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;, as follows:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/presentation.svg&quot; alt=&quot;Nested boxes, the outer containing text &amp;quot;Digital signature using the Private Key from McLOVIN&#39;s Wallet&amp;quot;; two at the next level the first containing text &amp;quot;Verifier Identity, Date and Time, etc...&amp;quot;, the other containing text &amp;quot;Digital Signature using the Private Key of the Hawaii DMV&amp;quot;; the latter box contains two further boxes containing text &amp;quot;McLOVIN&#39;s Details&amp;quot; and &amp;quot;McLOVIN&#39;s Wallet Public Key&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;With something like this, unless someone is able to use the signing key that is in the wallet, they can’t generate a presentation that a verifier will accept. It also ensures that the wallet can use a biometric or password check to ensure that a presentation is only created when the person allows it.&lt;/p&gt;
&lt;p&gt;That is a basic presentation that includes all the information that the issuer knows about. The problem is that this is probably more than you might be comfortable with sharing with a liquor store. After all, while you might be able to rely on the fact that the cashier in a store isn’t copying down your license details, you just &lt;em&gt;know&lt;/em&gt; that any digital information you present is going to be saved, stored, and sold. That’s where selective disclosure is supposed to help.&lt;/p&gt;
&lt;h2 id=&quot;salted-hash-selective-disclosure&quot;&gt;Salted hash selective disclosure&lt;/h2&gt;
&lt;p&gt;One basic idea behind selective disclosure is to replace all of the data elements in a credential — or at least the ones that someone might want to keep to themselves — with placeholders. Those placeholders are replaced with a commitment to the actual values. Any values that someone wants to reveal are then included in the presentation. A verifier can validate that the revealed value matches the commitment.&lt;/p&gt;
&lt;p&gt;The most basic sort of commitment is a hash commitment. That uses a hash function, which is really anything where it is hard to produce two inputs that result in the same output. The commitment to a value of X is H(X).&lt;/p&gt;
&lt;p&gt;That is, you might replace the (“name”, “McLOVIN”) pair with a commitment like H(“name” || “McLOVIN”). The hash function ensures that it is easy to validate that the underlying values match the commitment, because the verifier can compute the hash for themselves. But it is basically impossible to recover the original values from the hash. And it is similarly difficult to find another set of values that hash to the same value, so you can’t easily substitute false information.&lt;/p&gt;
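&lt;p&gt;In code, the idea looks something like this. This is only a sketch, using SHA-256 from Node’s built-in crypto module; the ‘|’ separator here stands in for the careful, unambiguous encoding that real formats define:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;// Illustrative only: commit to a (name, value) pair with a plain hash.
const { createHash } = require('crypto');

function commit(name, value) {
  // '|' stands in for an agreed, unambiguous encoding
  return createHash('sha256').update(name + '|' + value).digest('hex');
}

const c = commit('name', 'McLOVIN');
// A verifier can recompute commit('name', 'McLOVIN') and compare it to c,
// but cannot feasibly invert c to recover the hidden value.
&lt;/code&gt;&lt;/pre&gt;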
&lt;p&gt;A key problem is that a simple hash commitment only protects the value of the input if that input is hard to guess in the first place. But most of the stuff on a license is pretty easy to guess in one way or another. For simple stuff like “over_21”, there are just two values: “true” or “false”. If you want to know the original value, you can just check each of the values and see which matches.&lt;/p&gt;
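&lt;p&gt;To see just how cheap that is, here is a sketch of the attack on an unsalted boolean commitment; it takes exactly two guesses:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;const { createHash } = require('crypto');

function H(input) {
  return createHash('sha256').update(input).digest('hex');
}

// An unsalted commitment to a boolean field...
const commitment = H('over_21' + '|' + 'true');

// ...falls to a two-entry dictionary attack.
let recovered;
for (const guess of ['true', 'false']) {
  if (H('over_21' + '|' + guess) === commitment) {
    recovered = guess; // recovered is now 'true'
  }
}
&lt;/code&gt;&lt;/pre&gt;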
&lt;p&gt;Even for fields that have more values, it is possible to build a big table of hash values for every possible (or likely) value. This is called a “rainbow table”&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/rainbow.svg&quot; alt=&quot;A diagram showing mappings from hashes to values&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Rainbow tables don’t work if the committed value is very hard to guess. So, in addition to the value of the field, a large random number is added to the hidden value. This number is called “salt” and a different value needs to be generated for every field that can be hidden, with different values for every new credential. As long as there are many more values for the salt than can reasonably be stored in a rainbow table, there is no easy way to work out which commitment corresponds to which value.&lt;/p&gt;
&lt;p&gt;So for each field, the issuer generates a random number and replaces all fields in the credential with H(salt || name || value), using some agreed encoding. The issuer then signs over those commitments and provides the wallet with a credential that is full of commitments, plus the full set of values that were committed to, including the associated salt.&lt;/p&gt;
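&lt;p&gt;Sketched in code (the function names here are made up; real formats pin down the hash and the encoding precisely):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;const { createHash, randomBytes } = require('crypto');

function H(input) {
  return createHash('sha256').update(input).digest('hex');
}

// Issuer side: replace a field with H(salt || name || value), and keep the
// (salt, name, value) tuple to hand to the wallet alongside the credential.
function sealField(name, value) {
  const salt = randomBytes(16).toString('hex'); // fresh for every field, every credential
  return {
    commitment: H(salt + '|' + name + '|' + String(value)),
    disclosure: { salt, name, value },
  };
}

const field = sealField('over_21', true);
// field.commitment goes into the credential that the issuer signs;
// field.disclosure stays with the wallet, revealed only when the holder chooses.
&lt;/code&gt;&lt;/pre&gt;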
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/sh-credential.svg&quot; alt=&quot;A credential containing commitments to values, with the value and associated salt alongside&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The wallet can then use the salt and the credential to reveal a value and prove that it was included in the credential, creating a presentation something like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/sh-presentation.svg&quot; alt=&quot;A presentation using the credential, with selected values and their salt alongside&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The verifier then gets a bunch of fields with the key information replaced with commitments. All of the commitments are then signed by the issuer. The verifier also gets some number of unsigned tuples of (salt, name, value). The verifier can then check that H(salt || name || value) matches one of the commitments.&lt;/p&gt;
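&lt;p&gt;The verifier’s side of that check is just a recomputation. As a sketch, with the same caveats about the encoding:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;const { createHash } = require('crypto');

function H(input) {
  return createHash('sha256').update(input).digest('hex');
}

// Verifier side: a disclosed (salt, name, value) tuple is accepted only if its
// recomputed commitment appears among the issuer-signed commitments.
function isDisclosed(signedCommitments, disclosure) {
  const recomputed = H(disclosure.salt + '|' + disclosure.name + '|' + String(disclosure.value));
  return signedCommitments.includes(recomputed);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Any tuple that doesn’t hash to one of the signed commitments is rejected, so a holder can’t substitute values that the issuer never signed over.&lt;/p&gt;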
&lt;p&gt;This is the basic design that underpins a number of selective disclosure designs. Salted hash selective disclosure is pretty simple to build because it doesn’t require any fancy cryptography. However, salted hash designs have some limitations that can be a little surprising.&lt;/p&gt;
&lt;h3 id=&quot;other-selective-disclosure-approaches&quot;&gt;Other selective disclosure approaches&lt;/h3&gt;
&lt;p&gt;There are other approaches that might be used to solve this problem. Imagine that you had a set of credentials, each of which contained a single attribute. You might imagine sharing each of those credentials separately, choosing which ones you show based on what the situation demanded.&lt;/p&gt;
&lt;p&gt;That might look something like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/sd-bls-credential.svg&quot; alt=&quot;A presentation that includes multiple separate credentials, each with a single attribute&quot; /&gt;&lt;/p&gt;
&lt;p&gt;This basic idea is approximately sound&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;, but all of those signatures would make a presentation pretty unwieldy if there were lots of properties. There are digital signature schemes that make this more efficient though, like the &lt;a href=&quot;https://en.wikipedia.org/wiki/BLS_digital_signature&quot;&gt;BLS&lt;/a&gt; scheme, which allows multiple signatures to be folded into one.&lt;/p&gt;
&lt;p&gt;That is the basic idea behind &lt;a href=&quot;https://arxiv.org/abs/2406.19035&quot;&gt;SD-BLS&lt;/a&gt;. SD-BLS doesn’t make it cheaper for an issuer. An issuer still needs to sign a whole bunch of separate attributes. But combining signatures means that it can make presentations smaller and easier to verify. SD-BLS has some privacy advantages over salted hashes, but the primary problem that the SD-BLS proposal aims to solve is revocation, which is covered in more detail &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#credential-revocation&quot;&gt;below&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;problems-with-salted-hashes&quot;&gt;Problems with salted hashes&lt;/h3&gt;
&lt;p&gt;Going back to the original example, the effect of the salted hash is that you probably get something like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/sd-redacted.png&quot; alt=&quot;A Hawaii driver license with all the fields covered with gray rectangles, except the expiry date&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Imagine that every field on the license is covered with the gray stuff you get on scratch lottery tickets. You can choose which to scratch off before you hand it to someone else&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;. Here’s what they learn:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;That this is a valid Hawaii driver license. That is, they learn who issued the credential.&lt;/li&gt;
&lt;li&gt;When the license expires.&lt;/li&gt;
&lt;li&gt;The value of the fields that you decided to reveal.&lt;/li&gt;
&lt;li&gt;How many fields you decided not to reveal.&lt;/li&gt;
&lt;li&gt;Any other places that you present that same credential, as discussed &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#linkability-and-selective-disclosure&quot;&gt;below&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;On the plus side, and contrary to what is shown for a physical credential, the size and position of fields is not revealed for a digital credential.&lt;/p&gt;
&lt;p&gt;Still, that is likely a bit more information than might be expected. If you only wanted to reveal the “over_21” field so that you could buy some booze, having to reveal all those other things isn’t exactly ideal.&lt;/p&gt;
&lt;p&gt;Revealing who issued the credential seems like it might be harmless, but for a digital credential, that’s revealing a lot more than your eligibility to obtain liquor. Potentially a lot more. Maybe in Hawaii, holding a Hawaii driver license isn’t notable, but it might be distinguishing — or even disqualifying — in other places. A Hawaii driver license reveals that you likely live in Hawaii, which is not exactly relevant to your alcohol purchase. It might not even be recognized as valid in some places.&lt;/p&gt;
&lt;p&gt;If the Hawaiian DMV uses multiple keys to issue credentials, you’ll also reveal which of those keys was used. That’s unlikely to be a big deal, but worth keeping in mind as we look at alternative approaches.&lt;/p&gt;
&lt;p&gt;Revealing the number of fields is a relatively minor information leak. This constrains the design a little, but not in a serious way. Basically, it means that you should probably have the same set of fields for everyone.&lt;/p&gt;
&lt;p&gt;For instance, you can’t include only the “over_XX” age fields that are true; you have to include the false ones as well or the number of fields would reveal an approximate age. That is, avoid:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;{ ..., &amp;quot;older_than&amp;quot;: [16, 18], ... }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note: Some formats allow individual items in lists like this to be committed separately. The name of the list is generally revealed in that case, but the specific values are hidden. These usually just use H(salt || value) as the commitment.&lt;/p&gt;
&lt;p&gt;And instead use:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;{ ..., &amp;quot;over_16&amp;quot;: true, &amp;quot;over_18&amp;quot;: true, &amp;quot;over_21&amp;quot;: false, &amp;quot;over_55&amp;quot;: false, ... }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Expiration dates are tricky. For some purposes, like verifying that someone is allowed to drive, the verifier will need to know that the credential has not expired.&lt;/p&gt;
&lt;p&gt;On the other hand, expiry is probably not very useful for something like age verification. After all, it’s not like you get younger once your license expires.&lt;/p&gt;
&lt;p&gt;The exact choice of expiration date might also carry surprising information. Imagine that only one person was able to get a license one day because the office had to close or the machine broke down. If the expiry date is a fixed time after issuance, the expiry date on their license would then be unique to them, which means that revealing that expiration date would effectively be identifying them.&lt;/p&gt;
&lt;p&gt;The final challenge here is the least obvious and most serious shortcoming of this approach: linkability.&lt;/p&gt;
&lt;h2 id=&quot;linkability-and-selective-disclosure&quot;&gt;Linkability and selective disclosure&lt;/h2&gt;
&lt;p&gt;A salted hash credential carries several things that make the credential itself identifiable. This includes the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The value of each commitment is unique and distinctive.&lt;/li&gt;
&lt;li&gt;The public key for the wallet.&lt;/li&gt;
&lt;li&gt;The signature that the issuer attaches to the credential.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each of these is unique, so if the same credential is used in two places, it will clearly indicate that this is the same person, even if the information that is revealed is very limited.&lt;/p&gt;
&lt;p&gt;For example, you might present an “over_21” to purchase alcohol in one place, then use the full credential somewhere else. If those two presentations use the same credential, those two sites will be able to match up the presentations. The entity that obtains the full credential can then share all that knowledge with the one that only knows you are over 21, without your involvement.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/vv-linkability.svg&quot; alt=&quot;A version of the issuer-holder-verifier diagram with multiple verifiers&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Even if the two sites only receive limited information, they can still combine the information they obtain — that you are over 21 and what you did on each site — into a profile. The building of that sort of profile online is known as &lt;a href=&quot;https://www.w3.org/2001/tag/doc/unsanctioned-tracking/&quot;&gt;unsanctioned tracking&lt;/a&gt; and generally regarded as a bad thing.&lt;/p&gt;
&lt;p&gt;This sort of matching is technically called &lt;strong&gt;verifier-verifier linkability&lt;/strong&gt;. The way that it can be prevented is to ensure that a completely fresh credential is used for every presentation. That includes a fresh set of commitments, a new public key from the wallet, and a new signature from the issuer (naturally, the thing that is being signed is new). At the same time, ensuring that the presentation doesn’t include any extraneous information, like expiry dates, helps.&lt;/p&gt;
&lt;p&gt;A system like this means that wallets need to be able to handle a whole lot of credentials, including fresh public keys for each. The wallet also needs to be able to handle cases where its store of credentials runs out, especially when the wallet is unable to contact the issuer.&lt;/p&gt;
&lt;p&gt;Issuers generally need to issue larger batches of credentials to avoid that happening, which involves a lot of computationally intensive work. Batching makes wallets quite a bit more complex. It also increases the cost of running issuance services, because they need better availability, not just more issuance capacity.&lt;/p&gt;
&lt;p&gt;In this case, SD-BLS has a small advantage over salted hashes because its “unregroupability” property means that presentations with differing sets of attributes are not linkable by verifiers. That’s a weaker guarantee than verifier-verifier unlinkability, because presentations with the same set of attributes can still be linked by a verifier; for that, fresh credentials are necessary.&lt;/p&gt;
&lt;p&gt;Using a completely fresh credential is a fairly effective way to protect against linkability for different verifiers, but it does nothing to prevent &lt;strong&gt;verifier-issuer linkability&lt;/strong&gt;. An issuer can remember the values they saw when they issued the credential. A verifier can take any one of the values from a presentation they receive (commitments, public key, or signature) and ask the issuer to fill in the blanks. The issuer and verifier can then share anything that they know about the person, not limited to what is included in the credential.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/iv-linkability.svg&quot; alt=&quot;A version of the issuer-holder-verifier diagram with a bidirectional arrow between issuer and verifier&quot; /&gt;&lt;/p&gt;
&lt;p&gt;What the issuer and verifier can share isn’t limited to the credential.  They can share anything they know, not just the stuff that was included in the credential. Maybe McLovin needed to show a passport and a utility bill in order to get a license and the DMV kept a copy. The issuer could give that information to the verifier. The verifier can also share what they have learned about the person, like what sort of alcohol they purchased.&lt;/p&gt;
&lt;h3 id=&quot;useful-linkability&quot;&gt;Useful linkability&lt;/h3&gt;
&lt;p&gt;In some cases, linkability might be a useful or essential feature. Imagine that selective disclosure is used to authorize access to a system that might be misused. Selective disclosure avoids exposing the system to information that is not essential. Maybe the system is not well suited to safeguarding private information. The system only logs access attempts and the presentation that was used.&lt;/p&gt;
&lt;p&gt;In the event that the access results in some abuse, the abuse could be investigated using verifier-issuer linkability. For example, the access could be matched to information available to the issuer to find out who was responsible for the abuse.&lt;/p&gt;
&lt;p&gt;The IETF is developing a couple of salted hash formats (in &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-oauth-selective-disclosure-jwt&quot;&gt;JSON&lt;/a&gt; and &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-spice-sd-cwt&quot;&gt;CBOR&lt;/a&gt;) that should be well suited to a number of applications where linkability is a desirable property.&lt;/p&gt;
&lt;p&gt;All of this is a pretty serious problem for something like online age verification. Putting issuers, which are often government agencies, in a position to trace activity might have an undesirable &lt;a href=&quot;https://en.wikipedia.org/wiki/Chilling_effect&quot;&gt;chilling effect&lt;/a&gt;. This is something that legislators generally recognize, and laws often include provisions that require unlinkability&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;In short, salted hash based systems only work if you trust the issuer.&lt;/p&gt;
&lt;h3 id=&quot;linkable-attributes&quot;&gt;Linkable attributes&lt;/h3&gt;
&lt;p&gt;There is not much point in avoiding linkability when the disclosed information is directly linkable. For instance, if you selectively disclose your name and date of birth, that information is probably unique or highly identifying. Revealing identifying information to a verifier makes verifier-issuer linkability easy; just like revealing the same information to two verifiers makes verifier-verifier linkability simple.&lt;/p&gt;
&lt;p&gt;This makes linkability for selective disclosure less concerning when it comes to revealing information that might be identifying.&lt;/p&gt;
&lt;p&gt;Unlinkability therefore tends to be most useful for non-identifying attributes. Simple attributes — like whether someone meets a minimum age requirement, holds a particular qualification, or has authorization — are less likely to be inherently linkable, so are best suited to being selectively disclosed.&lt;/p&gt;
&lt;h2 id=&quot;privacy-pass&quot;&gt;Privacy Pass&lt;/h2&gt;
&lt;p&gt;If the goal is to provide a simple signal, such as whether a person is older than a target age, &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc9576&quot;&gt;Privacy Pass&lt;/a&gt; is specifically designed to prevent verifier-issuer linkability.&lt;/p&gt;
&lt;p&gt;Privacy Pass also includes options that split the issuer into two separate functions — an issuer and an attester — where the attester is responsible for determining if a holder (or client) has the traits required for token issuance and the issuer only creates the tokens. This might be used to provide additional privacy protection.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/privacy-pass.svg&quot; alt=&quot;The four entities of the Privacy Pass architecture: Issuer, Attester, Holder/Client, and Verifier/Service&quot; /&gt;&lt;/p&gt;
&lt;p&gt;A Privacy Pass issuer could produce a token that signifies possession of a given trait. Only those with the trait would receive the token. For age verification, the token might signify that a person is at a selected age or older.&lt;/p&gt;
&lt;p&gt;Token formats for Privacy Pass that include limited &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-privacypass-public-metadata-issuance&quot;&gt;public information&lt;/a&gt; are also defined, which might be used to support selective disclosure. This is far less flexible than the salted hash approach as a fresh token needs to be minted with the set of traits that will be public. That requires that the issuer is more actively involved or that the different sets of public traits are known ahead of time.&lt;/p&gt;
&lt;p&gt;Privacy Pass does not naturally provide verifier-verifier unlinkability, but a fresh token could be used for each usage, just like for the salted hash design. Some of the Privacy Pass modes can &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-privacypass-batched-tokens&quot;&gt;issue a batch of tokens&lt;/a&gt; for this reason.&lt;/p&gt;
&lt;p&gt;In order to provide tokens for different age thresholds or traits, an issuer would need to use different public keys, each corresponding to a different trait.&lt;/p&gt;
&lt;p&gt;Privacy Pass is therefore a credible alternative to the use of salted hash selective disclosure for very narrow cases. It is somewhat inflexible in terms of what can be expressed, but that could mean more deliberate additions of capabilities. The strong verifier-issuer unlinkability is definitely a plus, but it isn’t without shortcomings.&lt;/p&gt;
&lt;h3 id=&quot;key-consistency&quot;&gt;Key consistency&lt;/h3&gt;
&lt;p&gt;One weakness of Privacy Pass is that it depends on the issuer using the same key for everyone. The ideal privacy is provided when there is a single issuer with just one key for each trait. With more keys or more issuers, the key that is used to generate a token carries information, revealing who issued the token. This is just like the salted hash example where the verifier needs to learn that the Hawaiian DMV issued the credential.&lt;/p&gt;
&lt;p&gt;The privacy of the system breaks down if every person receives tokens that are generated using a key that is unique to them. This risk can be limited through the use of &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-privacypass-key-consistency&quot;&gt;key consistency&lt;/a&gt; schemes. This makes the system a little bit harder to deploy and operate.&lt;/p&gt;
&lt;p&gt;As foreshadowed earlier, the same key switching concern also applies to a salted hash design if you don’t trust the issuer. Of course, we’ve already established that a salted hash design basically only works if you trust the issuer. Salted hash presentations are linkable based on commitments, keys, or signatures, so there is no real need to play games with keys.&lt;/p&gt;
&lt;h2 id=&quot;anonymous-credentials&quot;&gt;Anonymous credentials&lt;/h2&gt;
&lt;p&gt;A zero knowledge proof enables the construction of evidence that a prover knows something, without revealing that information. For an identity system, it allows a holder to make assertions about a credential without revealing that credential. That creates what is called an anonymous credential.&lt;/p&gt;
&lt;p&gt;Anonymous credentials are appealing as the basis for a credential system because the proofs themselves contain no information that might link them to the original credential.&lt;/p&gt;
&lt;p&gt;Verifier-issuer unlinkability is a natural consequence of using a zero knowledge proof. Verifier-verifier unlinkability would be guaranteed by providing a fresh proof for each verifier, which is possible without obtaining a fresh credential. The result is that anonymous credentials provide excellent privacy characteristics.&lt;/p&gt;
&lt;p&gt;Zero knowledge proofs trace back to systems of provable computation, which means that they are potentially very flexible. A proof can be used to prove any property that can be computed. The primary cost is in the amount of computation it takes to produce and validate the proof&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;. If the underlying credential can be adjusted to support the zero knowledge system, these costs can be reduced, which is what the BBS signature scheme does. Unmodified credentials can be used if necessary.&lt;/p&gt;
&lt;p&gt;Thus, a proof statement for use in age verification might be a machine translation of the following compound statement:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;this holder has a credential signed by the Hawaiian DMV;&lt;/li&gt;
&lt;li&gt;the expiration date on the credential is later than the current date;&lt;/li&gt;
&lt;li&gt;the person is 21 or older (or the date of birth plus 21 years is earlier than the current date);&lt;/li&gt;
&lt;li&gt;the holder knows the secret key associated with the public key mentioned in the credential; and,&lt;/li&gt;
&lt;li&gt;the credential has not been used with the current verifier more than once on this day&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
&lt;/ul&gt;
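&lt;p&gt;Loosely speaking, the proof convinces the verifier that a predicate like the following returns true, without the verifier ever seeing its inputs. This is a hypothetical sketch: the signature and reuse conditions are reduced to simple flags here, where the real statements are cryptographic checks:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;// Hypothetical: the predicate that the proof attests to. The witness
// (credential contents, wallet key) never leaves the holder.
function plusYears(isoDate, years) {
  return String(Number(isoDate.slice(0, 4)) + years) + isoDate.slice(4);
}

function statementHolds(witness, context) {
  return witness.issuer === 'Hawaii DMV'
      &amp;&amp; witness.expiry &gt; context.today               // ISO dates compare as strings
      &amp;&amp; plusYears(witness.dob, 21) &lt;= context.today  // 21 or older
      &amp;&amp; witness.walletKeyProven                      // stands in for the key check
      &amp;&amp; witness.usesWithVerifierToday &lt; 1;           // stands in for the reuse check
}
&lt;/code&gt;&lt;/pre&gt;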
&lt;p&gt;A statement in that form should be sufficient to establish that someone is old enough to purchase alcohol, while providing assurances that the credential was not stolen or reused. The only information that is revealed is that this is a valid Hawaiian license. We’ll see below how hiding that last bit is also possible and probably a good idea.&lt;/p&gt;
&lt;h3 id=&quot;reuse-protections&quot;&gt;Reuse protections&lt;/h3&gt;
&lt;p&gt;The last statement from the set of statements above provides evidence that the credential has not been shared with others. This condition, or something like it, is a necessary piece of building a zero-knowledge system. Otherwise, the same credential can be used and reused many times by multiple people.&lt;/p&gt;
&lt;p&gt;Limiting the number of uses doesn’t guarantee that a credential isn’t shared, but it limits the number of times that it can be reused. If the credential can only be used once per day, then that is how many times the credential can be misused by someone other than the person it was issued to.&lt;/p&gt;
&lt;p&gt;Choosing how many times a credential might be used will depend on the exact circumstances. For instance, it might not be necessary for the same person to present proof of age to an alcohol vendor multiple times per day. Maybe it would be reasonable for the store to remember them if they come back to make multiple purchases on any given day. One use per day might be reasonable on that assumption.&lt;/p&gt;
&lt;p&gt;In practice, multiple rate limits might be used. This can make the system more flexible over short periods (to allow for people making multiple alcohol purchases in a day) but also stricter over the long term (because people rarely need to make multiple purchases every day). For example, age checks for the purchase of alcohol might combine a three per day limit with a weekly limit of seven. Multiple conditions can be easily added to the proof, with a modest cost.&lt;/p&gt;
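&lt;p&gt;The holder-side bookkeeping for combined limits is simple. A sketch of the three-per-day, seven-per-week example (the names here are made up):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;// Hypothetical holder-side check before creating a presentation.
// timestamps: past presentation times (in milliseconds) for this verifier.
function canPresent(timestamps, now) {
  const DAY = 24 * 60 * 60 * 1000;
  const lastDay = timestamps.filter(function (t) { return now - t &lt; DAY; });
  const lastWeek = timestamps.filter(function (t) { return now - t &lt; 7 * DAY; });
  return lastDay.length &lt; 3 &amp;&amp; lastWeek.length &lt; 7;
}
&lt;/code&gt;&lt;/pre&gt;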
&lt;p&gt;It is also possible for each verifier to specify their own rate limits according to their own conditions. A single holder would then limit the use of credentials according to those limits.&lt;/p&gt;
&lt;p&gt;Tracking usage is easy for a single holder. An actor looking to abuse credentials by sharing and reusing them has more difficulty. A bad actor would need to carefully coordinate their reuse of a credential so that any rate limits were not exceeded.&lt;/p&gt;
&lt;h3 id=&quot;hiding-the-issuer-of-credentials&quot;&gt;Hiding the issuer of credentials&lt;/h3&gt;
&lt;p&gt;People often do not get to choose who issues them a credential. Revealing the identity of an issuer might be more identifying than is ideal. This is especially true for people who have credentials issued by an atypical issuer.&lt;/p&gt;
&lt;p&gt;Consider that Europe is building a union-wide system of identity. That means that verifiers will be required to accept credentials from any country in the EU. Someone accessing a service in Portugal with an Estonian credential might be unusual if most people use a Portuguese credential. Even if the presentation is limited to something like age verification, the choice of issuer becomes identifying.&lt;/p&gt;
&lt;p&gt;This could also mean that a credential that should be valid is not recognized as such by a verifier, simply because the verifier chose not to accept that issuer. Businesses in Greece might be required by law to recognize other EU credentials, but what about a credential issued by Türkiye?&lt;/p&gt;
&lt;p&gt;Zero knowledge proofs can also hide the issuer, only revealing that a credential was issued by one of a set of issuers. This means that a verifier is unable to discriminate on the basis of issuer. For a system that operates at scale, that creates positive outcomes for those who hold credentials from atypical issuers.&lt;/p&gt;
&lt;h2 id=&quot;credential-revocation&quot;&gt;Credential revocation&lt;/h2&gt;
&lt;p&gt;Perhaps the hardest problem in any system that involves the issuance of credentials is what to do when the credential suddenly becomes invalid. For instance, if a holder is a phone, what do you do if the phone is lost or stolen?&lt;/p&gt;
&lt;p&gt;That is the role of revocation. On the Web, certificate authorities are required to have revocation systems to deal with lost keys, attacks, change of ownership, and a range of other problems. For wallets, the risk of loss or compromise of wallets might also be addressed with revocation.&lt;/p&gt;
&lt;p&gt;Revocation typically involves the verifier confirming with the issuer that the credential issued to the holder (or the holder itself) has not been revoked. That produces a tweak to our original three-entity system as follows:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/revocation.svg&quot; alt=&quot;Issuer-holder-verifier model with an arrow looping back from verifier to issuer&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Revocation is often the most operationally challenging aspect of running identity infrastructure. While issuance might have real-time components — particularly if the issuer needs to ensure a constant supply of credentials to maintain unlinkability — credentials might be issued ahead of time.  However, revocation often requires a real-time response or something close to it. That makes a system with revocation much more difficult to design and operate.&lt;/p&gt;
&lt;h3 id=&quot;revoking-full-presentations&quot;&gt;Revoking full presentations&lt;/h3&gt;
&lt;p&gt;When a full credential or more substantive information is compromised, lack of revocation creates a serious impersonation risk. The inability to validate biometrics online means that a wallet might be exploited to perform identity theft or similarly serious crimes. Being able to revoke a wallet could be a necessary component of such a system.&lt;/p&gt;
&lt;p&gt;The situation with a complete credential presentation, or presentations that include identifying information, is therefore fairly simple. When the presentation contains identifying information, like names and addresses, preventing linkability provides no benefit. So providing a direct means of revocation checking is easy.&lt;/p&gt;
&lt;p&gt;With verifier-issuer linkability, the verifier can just directly ask the issuer whether the credential was revoked. This is not possible if there is a need to perform offline verification, but it might be possible to postpone such checks or rely on batched revocations (&lt;a href=&quot;https://blog.mozilla.org/security/2020/01/09/crlite-part-1-all-web-pki-revocations-compressed/&quot;&gt;CRLite&lt;/a&gt; is a great example of a batched revocation system). Straightforward or not, providing adequate scale and availability makes the implementation of a reliable revocation system a difficult task.&lt;/p&gt;
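&lt;p&gt;To make the batched approach concrete, here is a minimal sketch of a CRLite-style check: the verifier periodically downloads a compact snapshot of revoked credential identifiers and consults it locally, with no per-presentation call to the issuer. The class and method names are invented for illustration, and a real deployment would use a far more compact structure (CRLite uses a filter cascade) rather than a plain set of digests.&lt;/p&gt;

```python
import hashlib

class BatchedRevocationSnapshot:
    """Hypothetical verifier-side revocation check in the style of CRLite:
    revocations are distributed in periodic batches, so checks work offline
    at the cost of some staleness between snapshots."""

    def __init__(self, revoked_ids, snapshot_time):
        # Store digests of revoked identifiers; a real system would compress
        # this, for example into a Bloom filter cascade.
        self.revoked = {hashlib.sha256(i.encode()).digest() for i in revoked_ids}
        self.snapshot_time = snapshot_time

    def is_revoked(self, credential_id):
        return hashlib.sha256(credential_id.encode()).digest() in self.revoked

    def is_stale(self, now, max_age):
        """Revocations newer than the snapshot are invisible until refresh."""
        return now - self.snapshot_time > max_age
```

&lt;p&gt;The trade-off is visible in the last method: the batch makes offline checking possible, but any revocation that happens after the snapshot is invisible until the next download.&lt;/p&gt;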
&lt;h3 id=&quot;revoking-anonymous-credentials&quot;&gt;Revoking anonymous credentials&lt;/h3&gt;
&lt;p&gt;When you have anonymous credentials, which protect against verifier-issuer linkability, revocation is very challenging. A zero-knowledge assertion that the credential has not been revoked is theoretically possible, but there are a number of serious challenges. One issue is that proof of non-revocation depends on providing real-time or near-real-time information about the underlying credential. Research into solving the problem is still active.&lt;/p&gt;
&lt;p&gt;It is possible that revocation is unnecessary for some selective disclosure cases, especially those where zero-knowledge proofs are used. We have already accepted some baseline amount of abuse of credentials, by virtue of permitting non-identifying and unlinkable presentations. Access to a stolen credential is roughly equivalent to sharing or borrowing a credential. So, as long as the overall availability of stolen credentials is not too high relative to the availability of borrowed credentials, the value of revocation is low. In other words, if we accept some risk that credentials will be borrowed, then we can also tolerate some use of stolen credentials.&lt;/p&gt;
&lt;h3 id=&quot;revocation-complications&quot;&gt;Revocation complications&lt;/h3&gt;
&lt;p&gt;Even with linkability, revocation is not entirely trivial. Revocation effectively creates a remote kill switch for every credential that exists. The safeguards around that switch are therefore crucial in determining how the system behaves.&lt;/p&gt;
&lt;p&gt;For example, if any person can ask for revocation, that might be used to deny a person the use of a perfectly valid credential. There are well documented cases where organized crime has deprived people of access to identification documents in order to limit their ability to travel or access services.&lt;/p&gt;
&lt;p&gt;These problems are more tied to the processes that are used, rather than the technical design. However, technical measures might be used to improve the situation. For instance, SD-BLS suggests that threshold revocation be used, where multiple actors need to agree before a credential can be revoked.&lt;/p&gt;
&lt;p&gt;All told, if dealing with revocation on the Web has taught us anything, it might not be worth the effort to add revocation at all. It might be easier — and no less safe — to frequently update credentials.&lt;/p&gt;
&lt;h2 id=&quot;authorizing-verifiers&quot;&gt;Authorizing Verifiers&lt;/h2&gt;
&lt;p&gt;Selective disclosure systems can fail to achieve their goals if there is a power imbalance between verifiers and holders. For instance, a verifier might withhold services unless a person agrees to provide more information than the verifier genuinely requires.  That is, the verifier might effectively extort people to provide non-essential information. A system that allows information to be withheld to improve privacy is pointless unless people can actually exercise that choice.&lt;/p&gt;
&lt;p&gt;One way to work around this is to require that verifiers be certified before they can request certain information.  For instance, EU digital identity laws require that it be possible to restrict who can request a presentation. This might involve the certification of verifiers, so that verifiers would be required to provide holders with evidence that they are authorized to receive certain attributes.&lt;/p&gt;
&lt;p&gt;A system of verifier authorization could limit overreach, but it might also render credentials ineffective in unanticipated situations, including for interactions in foreign jurisdictions.&lt;/p&gt;
&lt;p&gt;Authorizations also need monitoring for compliance. Businesses — particularly larger businesses that engage in many activities — might gain authorization for many different purposes.  Abuse might occur if a broad authorization is used where a narrower authorization is needed. That requires more than a system of authorization; it means creating a way to ensure that businesses or agencies are accountable for their use of credentials.&lt;/p&gt;
&lt;h2 id=&quot;quantum-computers&quot;&gt;Quantum computers&lt;/h2&gt;
&lt;p&gt;Some of these systems depend on cryptography that is only classically secure. That is, a sufficiently powerful quantum computer might be able to attack the system.&lt;/p&gt;
&lt;p&gt;Salted hash selective disclosure relies only on digital signatures and hash functions, which makes it the most resilient to attacks that use a quantum computer. However, many of the other systems described rely on some version of the discrete logarithm problem being difficult, which can make them vulnerable. Predicting when a cryptographically relevant quantum computer might be created is as hard as any other attempt to look into the future, but we can understand some of the risks.&lt;/p&gt;
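&lt;p&gt;The salted-hash construction is simple enough to sketch. Each attribute is committed to as hash(salt || value); the issuer signs the list of digests, and disclosing an attribute means revealing its (salt, value) pair so that the verifier can recompute the digest. This is a minimal illustration of the general idea, not any particular specification; the function names are invented.&lt;/p&gt;

```python
import hashlib
import hmac
import os

def commit(value):
    """Commit to one attribute value with a fresh random salt.
    The issuer signs the resulting digests; only hashes and signatures
    are involved, which is the source of the quantum resilience."""
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + value.encode()).hexdigest()
    return salt, digest

def verify_disclosure(value, salt, digest):
    """The verifier recomputes the digest from a disclosed (salt, value)
    pair and compares it with the signed commitment."""
    expected = hashlib.sha256(salt + value.encode()).hexdigest()
    return hmac.compare_digest(expected, digest)
```

&lt;p&gt;Because each commitment uses a fresh salt, two issuances of the same value produce unrelated digests, which is also why guessing attacks only work when a field has few possible values and the salt is exposed or weak.&lt;/p&gt;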
&lt;p&gt;Quantum computers present two potential threats to any system that relies on classical cryptographic algorithms: forgery and linkability.&lt;/p&gt;
&lt;p&gt;A sufficiently powerful quantum computer might use something like &lt;a href=&quot;https://en.wikipedia.org/wiki/Shor%27s_algorithm&quot;&gt;Shor’s algorithm&lt;/a&gt; to recover the secret key used to issue credentials. Once that key has been obtained, new credentials could be easily forged. Of course, forgeries are only a threat after the key is recovered.&lt;/p&gt;
&lt;p&gt;Some schemes that rely on classical algorithms could be vulnerable to linking by a quantum computer, which could present a very serious privacy risk. This sort of linkability is a serious problem because it potentially affects presentations that are made before the quantum computer exists. Presentations that were saved by verifiers could later be linked.&lt;/p&gt;
&lt;p&gt;Some of the potential mechanisms, such as the &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-bbs-signatures&quot;&gt;BBS algorithm&lt;/a&gt;, are still able to provide privacy, even if the underlying cryptography is broken by a quantum computer. The quantum computer would be able to create forgeries, but not break privacy by linking presentations.&lt;/p&gt;
&lt;p&gt;If we don’t need to worry about forgery until a quantum computer exists and privacy is maintained even then, we are largely concerned with how long we might be able to use these systems. That gets back to the problem of predictions and balancing the cost of deploying a system against how long the system is going to remain secure. Credential systems take a long time to deploy, so — while they are not vulnerable to a future advance in the same way as encryption — planning for that future is likely necessary.&lt;/p&gt;
&lt;h2 id=&quot;the-limitations-of-technical-solutions&quot;&gt;The limitations of technical solutions&lt;/h2&gt;
&lt;p&gt;If there is a single conclusion to this article, it is that the problems that exist in identity systems are not primarily technical. There are several very difficult problems to consider when establishing a system. Those problems only start with the selection of technology.&lt;/p&gt;
&lt;p&gt;Any technological choice presents its own problems. Selective disclosure is a powerful tool, but with limited applicability. Properties like linkability need to be understood or managed. Otherwise, the actual privacy properties of the system might not meet expectations. The same goes for any rate limits or revocation that might be integrated.&lt;/p&gt;
&lt;p&gt;How different actors might participate in the system needs further consideration. Decisions about who might act as an issuer in the system need a governance structure. Otherwise, some people might be unjustly denied the ability to participate.&lt;/p&gt;
&lt;p&gt;For verifiers, their incentives need to be examined. A selective disclosure system might be built to be flexible, which might seem to empower people with choice about what they disclose. However, that flexibility might be abused by powerful verifiers to extort additional information from people.&lt;/p&gt;
&lt;p&gt;All of which is to say: better technology does not always help as much as you might hope. Many of the problems are people problems, social problems, and governance problems, not technical problems. Technical mechanisms tend to only change the shape of non-technical problems. That is only helpful if the new shape of the problem is something that people are better able to deal with.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This is different from licensing to drive, where most countries recognize driving permits from other jurisdictions. That’s probably because buying alcohol is a simple check based on an objective measure, whereas driving a car is somewhat more involved. &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Well, most of the US.  It has to do with &lt;a href=&quot;https://en.wikipedia.org/wiki/National_Minimum_Drinking_Age_Act&quot;&gt;highways&lt;/a&gt;. &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The issuer might want some additional assurances, like controls over how the credential can be accessed and over what happens if a device is lost, stolen, or sold, but they all reduce to the same basic idea. &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;If the presentation didn’t include information about the verifier and time of use, one verifier could copy the presentation they receive and impersonate the person. &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Rainbow tables can handle relatively large numbers of values without too much difficulty. Even some of the richer fields can probably be put in a rainbow table. For example, there are about 1.4 million people in Hawaii. All the values for some fields are known, such as the complete set of possible addresses. Even if every person has a unique value, a very simple rainbow table for a field would take a few seconds to build and around 100Mb to store, likely a lot less. A century of birthdays would take much less storage&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;In practice, a century of birthdays (40k values) will have no collisions with even a short hash. You don’t need much more than 32 bits for that many values. Furthermore, if you are willing to have a small number of values associated with each hash, you can save even more space. 40k values can be indexed with a 16-bit value and a 32-bit hash will produce very few collisions. A small number of collisions are easy to resolve by hashing a few times, so maybe this could be stored in about 320kB with no real loss of utility. &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;There are a few things that need care, like whether different attributes can be bound to a different wallet key and whether the attributes need to show common provenance. With different keys, the holder might mix and match attributes from different people into a single presentation. &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;To continue the tortured analogy, imagine that you take a photo of the credential to present, so that the recipient can’t just scratch off the stuff that you didn’t. Or maybe you add a clear coat of enamel. &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;For example, Article 5a, 16 of the &lt;a href=&quot;https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A32024R1183&quot;&gt;EU Digital Identity Framework&lt;/a&gt; requires that wallets “not allow providers of electronic attestations of attributes or any other party, after the issuance of the attestation of attributes, to obtain data that allows transactions or user behaviour to be tracked, linked or correlated, or knowledge of transactions or user behaviour to be otherwise obtained, unless explicitly authorised by the user”. &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;A proof can be arbitrarily complex, so this isn’t always cheap, but most of the things we imagine here are probably very manageable. &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This isn’t quite accurate.  The typical approach involves the use of tokens that repeat if the credential is reused too often. That makes it possible to catch reuse, not prevent it. &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>Thoughts on TAG Design Reviews</title>
    <link href="https://lowentropy.net/posts/tag2023/"/>
    <updated>2023-11-21T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/tag2023/</id>
    <content type="html">&lt;p&gt;Before I start on my thoughts, if you work for a W3C member organization, please
head to &lt;a href=&quot;https://www.w3.org/2023/10/tag-nominations&quot;&gt;the 2023 TAG Election
page&lt;/a&gt;.  Voting is open until
2023-12-14.&lt;/p&gt;
&lt;p&gt;If you are considering how you might like to rank me when voting, read on.  I
can’t promise that this post will provide much additional context, but it might.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;The W3C &lt;acronym title=&quot;Technical Architecture Group&quot;&gt;TAG&lt;/acronym&gt; is a bit of
a strange institution.  The TAG occupies a position of some privilege due to its
standing within the W3C and the long-standing participation and sponsorship of
Sir Tim Berners-Lee.&lt;/p&gt;
&lt;p&gt;The TAG also has a history marked by notable documents produced under its
letterhead.  The TAG, through its &lt;a href=&quot;https://tag.w3.org/findings/&quot;&gt;findings&lt;/a&gt;, has
been responsible for recognizing and analyzing certain key trends in the
evolution of the Web, providing some key pieces of architectural guidance.  The
TAG also publishes documents with general guidance for people seeking to improve
the Web, like &lt;a href=&quot;https://w3ctag.github.io/design-principles/&quot;&gt;design principles&lt;/a&gt;
and a &lt;a href=&quot;https://www.w3.org/TR/security-privacy-questionnaire/&quot;&gt;security and privacy
questionnaire&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;On a day-to-day basis, however, the TAG provides hands-on guidance to people
looking to add new capabilities to the Web, primarily through &lt;a href=&quot;https://github.com/w3ctag/design-reviews&quot;&gt;design
reviews&lt;/a&gt;.  Records of early reviews
trace back to 2013 in the TAG repository, but the practice has deeper roots.&lt;/p&gt;
&lt;p&gt;The modern review record starts with a meager 5 reviews in the latter half of
2013. More recently, the TAG closed a total of 85 design reviews in 2022&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/tag2023/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;.  Already, in 2023, there have been 106 design review
requests opened.&lt;/p&gt;
&lt;p&gt;The function of the TAG as a body primarily focused on reviewing new Web APIs is
one that took a while to settle.  A key driver of this increase in volume has
clearly been the inclusion of TAG review as a formal precondition for shipping
Web-visible changes in the &lt;a href=&quot;https://www.chromium.org/blink/launching-features/&quot;&gt;Chromium
project&lt;/a&gt;.  Chromium
consequently drives a lot of this review load with 73 of the 106 new requests
that arrived in 2023 clearly marked as originating from “Google”, “Chromium”, or
“Microsoft” as a primary driver or funder of the work&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/tag2023/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;.
That is nearly 70% of the total review load attributed to Chromium.  This is in
addition to those design reviews that were initiated on behalf of a W3C group in
which Chromium contributors were instrumental in the work.&lt;/p&gt;
&lt;p&gt;Obviously, at a rate of more than 2 reviews a week, that’s a fairly major outlay
in terms of time for the TAG.  Proposals vary in size, but some of them are
quite substantial.  A good review requires reading lengthy explainers and
specifications, filling gaps in understanding by talking to different people,
considering alternative options, and building an understanding of the broader
context.  A proper review for a more substantial proposal can take weeks or even
months to research, discuss, and write up.&lt;/p&gt;
&lt;p&gt;The TAG is &lt;a href=&quot;https://github.com/w3c/AB-memberonly/issues/171&quot;&gt;expanding in size&lt;/a&gt;
this year. An increase to 12 members (8 elected, 4 appointed) does give the TAG
more capacity, albeit with added coordination costs reducing efficiency.  This
is predicated on the idea that reviews are the most important function of the
TAG.  If that is the case, adding more capacity seems like a reasonable
reaction.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;That an action is superficially reasonable is not the standard to apply when
making such a decision.  As with a design review, an examination of the
alternatives is generally illuminating.  Once those alternatives are understood,
we might again conclude that the proposal on the table is the best possible
path, but we do so with a more complete understanding of what opportunities are
lost or foreclosed as a result.  The &lt;a href=&quot;https://www.w3.org/2023/05/12-ab-minutes.html#t09&quot;&gt;AB
minutes&lt;/a&gt; of the decision do
not reflect that process, but then they are only responding to a request from
the TAG.&lt;/p&gt;
&lt;p&gt;There are several other equally reasonable ways of dealing with increased
workload.  If reviews are taking too long, it might be possible to find ways to
make reviewing easier or faster.  Perhaps the TAG has exhausted their options in
that area already.  Maybe they have looked at selectively rejecting more design
review requests.  Maybe they have considered finding ways to offload review work
onto other bodies, like &lt;a href=&quot;https://www.w3.org/Privacy/IG/&quot;&gt;PING&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;From my limited perspective, it is not clear that these avenues of investigation
have been fully explored.  For instance, I have good experience with the effective
directorate system that the &lt;acronym title=&quot;Internet Engineering Steering
Group&quot;&gt;IESG&lt;/acronym&gt; uses to greatly alleviate their workload, but I see no
evidence of an effort to delegate in a similar fashion.&lt;/p&gt;
&lt;p&gt;TAG members each volunteer time from their ordinarily busy day jobs, so any
excess load spent on reviewing is time that is not available for higher
functions.  In addition to review load, the TAG has a role in &lt;a href=&quot;https://www.w3.org/2023/Process-20231103/#council&quot;&gt;W3C
Councils&lt;/a&gt; and other critical
procedural functions in the W3C process.  Those tasks are generally not easily
delegated or dealt with by a subset of TAG members.&lt;/p&gt;
&lt;p&gt;I am supportive of efforts to better use the TAG for key procedural
functions, like the W3C Council. Those functions make the TAG more important in
a good way.  The W3C needs people in the role who have good judgment and the
experience to inform that judgment.&lt;/p&gt;
&lt;p&gt;Along with that, it is important to reserve some space for the TAG to provide
technical leadership for the W3C and the Web community as a whole.  After time
spent on the procedural functions demanded by the process, design reviews have
the potential to completely drain any time TAG members have to dedicate to the
role, leaving no spare capacity.  Ideally, there needs to be some remaining
space for the careful and thoughtful work that leadership demands.&lt;/p&gt;
&lt;p&gt;Effective technical leadership depends somewhat on the TAG being exposed to how
the platform is evolving.  Reviews are a great way to gain some of that
exposure, but that does not mean that the TAG needs to review every single
proposal.&lt;/p&gt;
&lt;p&gt;I don’t have a specific plan yet.  If appointed, I will need some time to
understand what the role is and what options are available.  I consider myself
quite capable of performing that sort of review and I expect it would be easy to
settle into that function.  But I have no intent of letting design reviews
dominate my time; the TAG – and the Web – deserves better.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;A note
on the numbers here: The TAG has a template that they use for design reviews and
I have only selected reviews that include the string “requesting a TAG review”,
as present in that template.  There were other issues closed in this period,
some of which are probably also pre-template design reviews, but I haven’t
carefully reviewed those. &lt;a href=&quot;https://lowentropy.net/posts/tag2023/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;For posterity, this is
the search I used: &lt;code&gt;opened_since(2023-01-01) not(opened_since(2024-01-01)) body(&amp;quot;requesting a TAG review&amp;quot;) body(&amp;quot;(?:driving the (?:design|specification)|funded by):&#92;&#92;s+&#92;&#92;[?(?:Microsoft|Google|Chromium)&amp;quot;))&lt;/code&gt;,
using &lt;a href=&quot;https://github.com/MikeBishop/archive-repo/tree/main/post-processing/viewer&quot;&gt;a tool I
built&lt;/a&gt;
in combination with the excellent &lt;a href=&quot;https://github.com/MikeBishop/archive-repo&quot;&gt;GitHub issue archival
tool&lt;/a&gt; that Mike Bishop wrote. &lt;a href=&quot;https://lowentropy.net/posts/tag2023/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>Fraud, Abuse, Fingerprinting, Privacy, and Openness</title>
    <link href="https://lowentropy.net/posts/fraud/"/>
    <updated>2023-08-23T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/fraud/</id>
    <content type="html">&lt;p&gt;Fraud and abuse online are pretty serious problems.  How sites manage fraud is
something of a mystery to most people.  Indeed, as this post will show, that’s
deliberate.&lt;/p&gt;
&lt;p&gt;This post provides an outline of how fraud management operates.  It looks at the
basic techniques that are used and the challenges involved.  In doing so, it
explores the tension between fraud management and privacy.&lt;/p&gt;
&lt;p&gt;Hopefully this post helps you understand why fingerprinting is bad for privacy;
why you should nevertheless be happy that your bank is fingerprinting you; and,
why efforts to replace fingerprinting are unlikely to change anything.&lt;/p&gt;
&lt;p&gt;Fraud and abuse are a consequence of the way the Web works.  Recognizing that
these are a part of the cost of a Web that values privacy, openness, and equity
is hard, but I can’t see a better option.&lt;/p&gt;
&lt;h2 id=&quot;what-sorts-of-fraud-and-abuse%3F&quot;&gt;What sorts of fraud and abuse?&lt;/h2&gt;
&lt;p&gt;This post concentrates on the conduct of fraud or abuse using online services.
Web-based services mostly, but mobile apps and similar services have similar
concerns.&lt;/p&gt;
&lt;p&gt;The sorts of fraud and abuse of most interest are those that operate at scale.
One-off theft needs different treatment.  Click fraud in advertising is a good
example.  Click fraud is where a site seeks to convince advertisers that ads
have been shown to people in order to get more money.  Click fraud is a constant
companion to the advertising industry, and one that is unlikely to ever go away.
Managing click fraud is an important part of participating in advertising, and
something that affects everyone that uses online services.&lt;/p&gt;
&lt;p&gt;Outside of advertising, fraud management techniques&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/fraud/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; are also used to manage the risk of fake accounts that
are created for fraud or abuse purposes.  Online stores and banks also use fraud
management as part of an overall strategy for managing the risk of payment fraud
or theft.&lt;/p&gt;
&lt;p&gt;This is a very high-level overview, so most of this document applies equally to
lots of different fraud and abuse scenarios.  Obviously, each situation will be
different, but I’m glossing over the details.&lt;/p&gt;
&lt;aside&gt;
&lt;p&gt;I find the parallels between fraud management and denial of service mitigation
interesting.  I’ve called out some of those similarities and differences
throughout.&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id=&quot;understanding-online-fraud-and-abuse&quot;&gt;Understanding online fraud and abuse&lt;/h2&gt;
&lt;p&gt;Let’s say that you have a site that makes some information or service available.
This site will attract clients, which we can split into two basic groups:
clients that the site wants to serve, and clients that the site does not want to
serve.&lt;/p&gt;
&lt;aside&gt;
&lt;p&gt;Why the site does not want to serve the clients from the latter group does not
matter that much, but there are some common themes we tend to see.
Distinguishing between humans and bots is a very common goal.
&lt;a href=&quot;https://en.wikipedia.org/wiki/CAPTCHA&quot;&gt;CAPTCHAs&lt;/a&gt; are supposed to be able to
distinguish this.  Of course, CAPTCHAs have always had very poor accessibility
properties and, increasingly, computers are better at solving CAPTCHAs than
humans&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/fraud/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;That doesn’t stop sites from wanting to be able to pick out a bot.  For
advertising cases, sites will want to serve humans – after all, bots are
unlikely to change their purchasing habits as a result of “viewing” an ad.
Similarly, sites that provide goods that are limited in quantity – such as theatre
tickets or limited run goods like sneakers&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/fraud/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt; – might prefer to ensure that their
inventory is only sold to people.&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;The attacker in this model seeks to access the service for some reason.  In
order to do so, the attacker attempts to convince sites that they are a real
client.&lt;/p&gt;
&lt;p&gt;For click fraud, a site might seek to convince its advertising partners that ads
were shown to real people.  The goal is to convince the advertiser to pay the
fraudulent site more money.  Sophisticated click fraud can also involve faking
clicks or &lt;a href=&quot;https://support.google.com/google-ads/answer/6365?hl=en&quot;&gt;ad
conversions&lt;/a&gt; in an
effort to falsely convince the advertiser that the ads on the fraudulent site
are more valuable because they appear to drive sales.&lt;/p&gt;
&lt;p&gt;An adversary rarely gains much by performing a single instance of fraud.  They
will often seek to automate fraud, accessing the service as many times as
possible.  Fraud at scale can be very damaging, but it also means that it is
easier to detect.&lt;/p&gt;
&lt;p&gt;Automation allows fraud to be conducted at scale, but it also creates telltales:
signals that allow an attack to be recognized.&lt;/p&gt;
&lt;h3 id=&quot;detection&quot;&gt;Detection&lt;/h3&gt;
&lt;p&gt;Detection is the first stage for anyone looking to defeat fraud or abuse.  To do
that, site operators will look for anomalies of any sort.  Maybe the attack will
appear as an increase in incoming requests or a repetitive pattern of accesses.&lt;/p&gt;
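&lt;p&gt;To make this concrete, here is a deliberately crude sketch of how a volume
anomaly might be flagged.  The numbers and the z-score test are invented for
illustration; real systems use much richer models:&lt;/p&gt;

```python
import statistics

def is_anomalous(history, current, threshold=3.0):
    """Flag a request count that sits well outside recent history.
    A deliberately simple sketch; real detectors are far richer."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return current != mean
    return abs(current - mean) / stdev > threshold

# Hourly request counts for a quiet site, then a sudden spike.
baseline = [102, 97, 110, 105, 99, 101, 95, 108]
print(is_anomalous(baseline, 104))  # ordinary hour
print(is_anomalous(baseline, 900))  # likely an attack
```

&lt;p&gt;A spike like this is the easy case; repetitive patterns of access, discussed
next, take more work to find.&lt;/p&gt;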
&lt;p&gt;Repetition might be a key to detecting fraud.  An attacker might try to have
their attacks blend in with real humans that are also accessing the system.  An
attacker’s ability to mimic human behaviour is usually limited, as they often
hope to execute many fraudulent transactions.  Attackers have to balance the
risk that they are detected against the desire to complete multiple actions
before they are detected.&lt;/p&gt;
&lt;p&gt;Detecting fraud and abuse relies on a range of techniques.  Anti-fraud people
generally keep details of their methods secret, but we know that they use both
automated and manual techniques.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Automated systems generally use machine learning that is trained on the
details of past attacks.  This scales really well and allows for repeat
attacks to be detected quickly and efficiently.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Human experts can be better at recognizing new forms of attack.  Attacks that
are detected by automated systems can be confirmed by humans before deploying
interventions.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of course, attackers are also constantly trying to adapt their techniques to
evade detection.  Detecting an attack can take time.&lt;/p&gt;
&lt;h3 id=&quot;identification%2Fclassification&quot;&gt;Identification/classification&lt;/h3&gt;
&lt;p&gt;It is not enough to know that fraud is occurring.  Once recognized, the pattern
of fraudulent behaviour needs to be classified, so that future attacks can be
recognized.&lt;/p&gt;
&lt;p&gt;As noted, most fraud is automated in some way.  Even when humans are involved,
operating at any significant scale means working to a script.  Whether executed
by machines or humans, the script will be designed to evade existing defenses.
That careful scripting can itself produce patterns.  If a pattern can be found,
attempts at fraud can be distinguished from genuine attempts by people to visit
the site.&lt;/p&gt;
&lt;p&gt;Patterns in abuse manifest in one of two ways:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Common software&lt;/em&gt;.  If attackers only use a specific piece of hardware or
software, then any common characteristics might be revealed by
fingerprinting.  Even if the attacker varies some characteristics (like the
&lt;code&gt;User-Agent&lt;/code&gt; header or similar obvious things), other characteristics might
stay the same, which can be used to recognize the attack.  This is why
browser fingerprinting is a valuable tool for managing fraud.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Common practices&lt;/em&gt;.  Software or scripted interaction can produce fixed
patterns of behaviour that can be used to recognize an attempted attack.
Clues might exist in the timing of actions or the consistency of interaction
patterns.  For instance, automated fraud might not exhibit the sorts of
variance in mouse movements that a diverse set of people could.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
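&lt;p&gt;A rough sketch of the first approach might look like the following.  The
attribute names are invented for illustration, not a real fingerprinting
schema:&lt;/p&gt;

```python
import hashlib
from collections import Counter

def fingerprint(attrs):
    """Hash a set of observable client characteristics into a stable key.
    The attribute names used below are illustrative only."""
    material = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.sha256(material.encode()).hexdigest()[:16]

seen = Counter()
requests = [
    {"tls_ciphers": "c1,c2,c3", "accept_language": "en-US", "canvas_hash": "abc"},
    {"tls_ciphers": "c1,c2,c3", "accept_language": "en-US", "canvas_hash": "abc"},
    {"tls_ciphers": "c9,c2", "accept_language": "de-DE", "canvas_hash": "xyz"},
]
for req in requests:
    seen[fingerprint(req)] += 1

# Heavily repeated fingerprints become candidates for extra scrutiny.
print(seen.most_common(1))
```

&lt;p&gt;The point is that varying the obvious attributes does not help an attacker if
any of the hashed characteristics stays constant across their requests.&lt;/p&gt;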
&lt;p&gt;The script that is followed by an attacker might try to vary some of these
things.  However, unless the attack script can simulate the sort of diversity
that real people naturally exhibit – which is unlikely – any resulting common
patterns can be used to identify likely attempts at fraud.&lt;/p&gt;
&lt;p&gt;Once a pattern is established, future attempts can be recognized.  Also, if
enough information has been recorded from past interactions, previously
undetected fraud might now be identifiable.&lt;/p&gt;
&lt;p&gt;Learned patterns can sometimes be used on multiple sites.  If an attack is
detected and thwarted on one site, similar attacks on other sites might be
easier to identify.  Fraud and abuse detection services that operate across many
sites can therefore be very effective at detecting and mitigating attacks on
multiple sites.&lt;/p&gt;
&lt;h4 id=&quot;fingerprinting-and-privacy&quot;&gt;Fingerprinting and privacy&lt;/h4&gt;
&lt;p&gt;Browser makers generally regard browser fingerprinting as an attack on user
privacy.  The &lt;a href=&quot;https://amiunique.org/&quot;&gt;fingerprint of a browser&lt;/a&gt; is consistent
across sites in ways that are hard to control.  Browsers can have unique or
nearly-unique fingerprints, which means that people can be effectively
identified and &lt;a href=&quot;https://www.w3.org/2001/tag/doc/unsanctioned-tracking/&quot;&gt;tracked&lt;/a&gt;
using the fingerprint of their browser, against their wishes or expectations.&lt;/p&gt;
&lt;p&gt;Fingerprinting used this way undermines controls that browsers use to maintain
&lt;a href=&quot;https://digitalcommons.law.uw.edu/wlr/vol79/iss1/10/&quot;&gt;contextual integrity&lt;/a&gt;.
Circumventing these controls is unfortunately widespread.  Services exist that
offer “cookie-less tracking” capabilities, which can include linking
cross-site activity using browser fingerprinting or “primary identifiers”&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/fraud/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Fingerprinting options in browsers continue to evolve in two directions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;New browser features, especially those with personalization or hardware
interactions, can expand the ways in which browsers might become more
identifiable through fingerprinting.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Browser privacy engineers are constantly reducing the ways in which browsers
can be fingerprinted.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Though these efforts often pull in different directions, the general trend is
toward reduced effectiveness of fingerprinting.  Browsers are gradually becoming
more homogenous in their observable behaviour despite the introduction of new
capabilities.  New features that might be used for fingerprinting tend not to be
accessible without active user intervention, making them far less reliable as a
means of identification.  Existing rich sources of fingerprinting information –
like plugin or font enumeration – will eventually be far more limited.&lt;/p&gt;
&lt;aside&gt;
&lt;p&gt;I’m deliberately ignoring the use of IP addresses for client identification.  IP
addresses are still a very effective tool for managing fraud and abuse, just as
they are crucial for managing denial of service risk.  IP addresses provide
information that can increase the effectiveness of fingerprinting, sometimes
dramatically.  Like other fingerprinting options, IP addresses might become
less useful for fingerprinting as time goes by.  We hope.&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;Reductions in the effectiveness of fingerprinting are unlikely to ever result in
every browser looking identical.  More homogenous browser fingerprints make the
set of people who share any given fingerprint larger.  In turn, this reduces,
without eliminating, the odds that a site can successfully reidentify someone
using a fingerprint.&lt;/p&gt;
&lt;p&gt;Reduced effectiveness of fingerprinting might limit the ability of sites to
distinguish between real and abusive activity.  This places stronger reliance
on other signals, like behavioural cues.  It might also mean that additional
checks are needed to discriminate between suspicious and wanted activity, though
this comes with its own hazards.&lt;/p&gt;
&lt;p&gt;Even when fingerprinting is less useful, fingerprints can still help in managing
fraud.  Though many users might share the same fingerprint, additional scrutiny
can be reserved for those browsers that share a fingerprint with the attacker.&lt;/p&gt;
&lt;h3 id=&quot;mitigation-strategies&quot;&gt;Mitigation strategies&lt;/h3&gt;
&lt;p&gt;Once a particular instance of fraud is detected and a pattern has been
established, it becomes possible to mitigate the effects of the attack.  This
can involve some difficult choices.&lt;/p&gt;
&lt;p&gt;Because detecting fraud is difficult, sites often tolerate extensive fraud
before they can begin mitigation.  Classification takes time and can be error
prone.  Furthermore, sites don’t want to annoy their customers by falsely
accusing them of fraud.&lt;/p&gt;
&lt;h4 id=&quot;stringing-attackers-along&quot;&gt;Stringing attackers along&lt;/h4&gt;
&lt;p&gt;Tolerance of apparent abuse can have other positive effects.  A change in how a
site reacts to attempted abuse might tip an attacker off that their method is no
longer viable.  To that end, a site might allow abuse to continue, without any
obvious reaction&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/fraud/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;A site that reacts to fraud in obvious ways will also reveal when fraud has
escaped detection.  This can be worse, as it allows an attacker to learn when
their attack was successful.  Tolerating fraud attempts deprives the attacker of
immediate feedback.&lt;/p&gt;
&lt;aside&gt;
&lt;p&gt;This is where fraud mitigation differs from something like denial of service
attacks.  In a denial of service attack, the attacker exhausts resources.  Here,
attacks are often swift, as attackers attempt to surprise defenders or catch
them off guard.  Detection, identification, and mitigation need to occur
rapidly to blunt this sort of attack.&lt;/p&gt;
&lt;p&gt;Denial of service mitigation generally involves finding the cheapest possible
way to block attacks.  Leading an attacker to believe that they have not been
detected does little good when the resource that should be protected is actively
being expended.&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;Delaying the obvious effects of mitigation allows abuse detection to remain
effective for longer. Similarly, providing feedback about abuse in the aggregate
might prevent an attacker from learning when specific tactics were successful.
Attackers that receive less feedback or late feedback cannot adapt as quickly
and so are able to evade detection for a smaller proportion of the overall time.&lt;/p&gt;
&lt;h4 id=&quot;addressing-past-abuse&quot;&gt;Addressing past abuse&lt;/h4&gt;
&lt;p&gt;A delayed response depends on being able to somehow negate or mitigate the
effect of fraud from the past.  This is also helpful where instances of fraud or
abuse previously escaped detection.&lt;/p&gt;
&lt;p&gt;For something like click fraud, the effect of fraud is often payment, which is
not immediate.  The cost of fraud can be effectively managed if it can be
detected before payment comes due.  The advertiser can refuse to pay for
fraudulent ad placements and disqualify any conversions that are attributed to
them.  The same applies to credit card fraud, where settlement of payments can
be delayed to allow time for fraudulent patterns to be detected.&lt;/p&gt;
&lt;p&gt;It is not always possible to retroactively mitigate fraud or delay its effect.
Sites can instead require additional checks or delays. These might not deprive
an attacker of feedback on whether their evasive methods were successful, but
changes in response could thwart or slow attacks.&lt;/p&gt;
&lt;h3 id=&quot;security-by-obscurity&quot;&gt;Security by obscurity&lt;/h3&gt;
&lt;p&gt;As someone who works in other areas of security, this overall approach to
managing fraud seems very … brittle.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Kerckhoffs%27s_principle&quot;&gt;Kerckhoffs’s principle&lt;/a&gt;
– which guides the design of most security systems – says that you design
systems that depend only on protecting the key and not keeping the details of
how a system is built secret.  A system design that is public knowledge can be
analysed and improved upon by many.  Keeping the details of the system secret,
known as security by obscurity, is considered bad form and usually indicative
of a weak system design.&lt;/p&gt;
&lt;p&gt;Here, security assurances rely very much on security by obscurity.  Detecting
fraud depends on spotting patterns, then building ways of recognizing those
patterns.  An attacker that can avoid detection might be able to conduct fraud
with impunity.  That is, the system of defense relies on techniques so fragile
that knowledge of their details would render them ineffectual.&lt;/p&gt;
&lt;h2 id=&quot;is-there-hope-for-new-tools%3F&quot;&gt;Is there hope for new tools?&lt;/h2&gt;
&lt;p&gt;There are some technologies that offer some hope of helping manage fraud and
abuse risk.  However, my expectation is that these will only support existing
methods.&lt;/p&gt;
&lt;p&gt;Any improvements these might provide are unlikely to result in changes in
behaviour.  Anything that helps attackers avoid detection will be exploited to
the maximum extent possible; anything that helps defenders detect fraud or abuse
will just be used to supplement existing information sources.&lt;/p&gt;
&lt;h3 id=&quot;privacy-pass&quot;&gt;Privacy Pass&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-privacypass-architecture&quot;&gt;Privacy
Pass&lt;/a&gt;
offers a way for sites to exchange information about the trustworthiness of
their visitors.  If one site decides that someone is trustworthy, it can give
the browser an anonymous token.  Other sites can be told that someone is
trustworthy by passing them this token.&lt;/p&gt;
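&lt;p&gt;A toy sketch might make the unlinkability property concrete.  This uses
textbook RSA blind signatures with tiny illustrative numbers – my own
simplification, not the actual Privacy Pass construction, which standardizes
stronger variants of this idea:&lt;/p&gt;

```python
# Toy sketch of the blind-signature idea behind Privacy Pass style tokens.
# Tiny textbook RSA parameters; real deployments use standardized
# constructions (blind RSA or VOPRFs), never code like this.

n, e, d = 3233, 17, 2753  # issuer key: n = 61 * 53

def issue(blinded):
    """Issuer signs a blinded token; it never sees the token itself."""
    return pow(blinded, d, n)

# Client side: blind a token value, have it signed, then unblind.
token = 1234
r = 7  # blinding factor, coprime with n; would be random in practice
blinded = (token * pow(r, e, n)) % n
blind_sig = issue(blinded)
signature = (blind_sig * pow(r, -1, n)) % n

# Any site can verify the signature against the issuer's public key,
# but the issuer cannot link this token back to its issuance.
print(pow(signature, e, n) == token)
```

&lt;p&gt;The issuer only ever sees the blinded value, so even if it colludes with the
site that later receives the token, it cannot connect the two events.&lt;/p&gt;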
&lt;p&gt;Ostensibly, Privacy Pass tokens cannot carry information; only the presence (or
absence) of a token carries any information.  A browser might be told that the
token means “trustworthy”, but it could mean anything&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/fraud/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;.  That
means that the token issuer needs to be trusted.&lt;/p&gt;
&lt;p&gt;How a site determines whether to provide a token also has consequences.  Take
Apple’s &lt;a href=&quot;https://developer.apple.com/news/?id=huqjyh7k&quot;&gt;Private Access Tokens&lt;/a&gt;,
which are supposed to mean that the browser is trustworthy, but they really
carry a cryptographically-backed assertion that the holder has an Apple device.
For sites looking to find a lucrative advertising audience, this provides a
strong indicator that a visitor is rich enough to be able to afford Apple
hardware.  That is bankable information.&lt;/p&gt;
&lt;p&gt;This is an example of how the method used to decide whether to provide a token
can itself leak information.  In order to protect this information, a decent
proportion of tokens would need to be issued using alternative methods.&lt;/p&gt;
&lt;p&gt;We also need to ensure that sites do not become overly reliant on tokens.
Otherwise, people who are unable to produce a token could find themselves unable
to access services.  People routinely fail to convince computers of their status
as a human for many reasons&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/fraud/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;.  Clients might be able to
withhold some proportion of tokens so that sites might learn not to become
dependent on them.&lt;/p&gt;
&lt;p&gt;If these shortcomings are addressed somehow, it is possible that Privacy Pass
could help sites detect or identify fraud or abuse.  However, implementing the
safeguards necessary to protect privacy and equitable access is not easy.  It
might not even be worth it.&lt;/p&gt;
&lt;h3 id=&quot;questionable-options&quot;&gt;Questionable options&lt;/h3&gt;
&lt;p&gt;Google have proposed an extension to Privacy Pass that carries &lt;a href=&quot;https://eprint.iacr.org/2020/072&quot;&gt;secret
information&lt;/a&gt;.  The goal here is to allow sites
to rely on an assessment of trust that is made by another site, but not reveal
the decision to the client.  All clients would be expected to retrieve a token
and proffer one in order to access the service.  Suspicious clients would be
given a token that secretly identifies them as such.&lt;/p&gt;
&lt;p&gt;This would avoid revealing to clients that they have been identified as
potentially fraudulent, but it comes with two problems:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Any determination would only be based on information available to the site
that provides the token.  The marking would be less reliable as a result, being
based only on the client identity or browser fingerprint&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/fraud/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;.  Consequently, any such marking would not be directly usable; it would
need to be combined with other indicators, like how the client behaves.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Clients that might be secretly classified as dishonest have far less
incentive to carry a token that might label them as such.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The secret bit also carries information, which – again – could mean anything.
Anything like this would need safeguards against privacy abuse by token
providers.&lt;/p&gt;
&lt;p&gt;Google have also proposed &lt;a href=&quot;https://github.com/RupertBenWiser/Web-Environment-Integrity/blob/main/explainer.md&quot;&gt;Web Environment
Integrity&lt;/a&gt;,
which seeks to suppress diversity of client software.  Eric Rescorla has a good
explanation of &lt;a href=&quot;https://educatedguesswork.org/posts/wei/&quot;&gt;how this sort of approach is
problematic&lt;/a&gt;.  Without proper
safeguards, the same concerns apply to Apple’s Private Access Tokens.&lt;/p&gt;
&lt;p&gt;The key insight for me is that all of these technologies risk placing
restrictions on how people access the Web.  Some more than others.  But openness
is worth protecting, even if it does make some things harder.  Fraud and abuse
management are in some ways a product of that openness, but so is user
empowerment, equity of access, and privacy.&lt;/p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;
&lt;p&gt;It seems unlikely that anything is going to change.  Those who want to commit
fraud will continue to try to evade detection and those who are trying to stop
them will try increasingly invasive methods, including fingerprinting.&lt;/p&gt;
&lt;p&gt;Fraud and abuse are something that many sites contend with. There are no easy or
assured methods for managing fraud or abuse risk.  Defenders look for patterns,
both in client characteristics and their behaviour.  Fingerprinting browsers
this way can have poor privacy consequences.  Concealing how attacks are
classified is the only way to ensure that attackers do not adapt their methods
to avoid protections.  New methods for classification might help, but they
create new challenges that will need to be managed.&lt;/p&gt;
&lt;p&gt;Fraud is here to stay.  Fingerprinting too.  I wish that I had a better story to
tell, but this is one of the prices we pay for an open Web.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I’m not comfortable using
the more widely used “anti-fraud” term here.  It sounds too definite, as if to
imply that fraud can be prevented perfectly.  Fraud and abuse can be managed,
but not so absolutely. &lt;a href=&quot;https://lowentropy.net/posts/fraud/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This story has been widely misreported, see
(&lt;a href=&quot;https://www.schneier.com/blog/archives/2023/08/bots-are-better-than-humans-at-solving-captchas.html&quot;&gt;Schneier&lt;/a&gt;,
&lt;a href=&quot;https://www.theregister.com/2023/08/15/so_much_for_captcha_then/&quot;&gt;The
Register&lt;/a&gt;, and
&lt;a href=&quot;https://hardware.slashdot.org/story/23/08/10/0439241/bots-are-better-than-humans-at-cracking-are-you-a-robot-captcha-tests-study-finds&quot;&gt;Slashdot&lt;/a&gt;). These
articles cite a recent &lt;a href=&quot;https://arxiv.org/abs/2307.12108&quot;&gt;study&lt;/a&gt; from UC Irvine,
which cites &lt;a href=&quot;https://arxiv.org/abs/2307.12108&quot;&gt;a study from 2014&lt;/a&gt; that applies
to a largely defunct CAPTCHA method.  CAPTCHA fans might hold out
&lt;a href=&quot;https://arxiv.org/abs/2209.06293&quot;&gt;some&lt;/a&gt;
&lt;a href=&quot;https://www.nature.com/articles/d41586-023-02361-7&quot;&gt;hope&lt;/a&gt;, though maybe the
rest of us would be happy to never see another inane test. &lt;a href=&quot;https://lowentropy.net/posts/fraud/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;There is a whole industry around the
scalping of limited run sneakers, to the point that there are specialist cloud
services that &lt;a href=&quot;https://sneakerserver.com/&quot;&gt;boast extra low latency access&lt;/a&gt; to
the sites for major sneaker vendors. &lt;a href=&quot;https://lowentropy.net/posts/fraud/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Think
email addresses or phone numbers.  These sites like to pretend that these
practices are privacy respecting, but collecting primary identifiers often
involves deceptive practices. For example, making access to a service
conditional on providing a phone number. &lt;a href=&quot;https://lowentropy.net/posts/fraud/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;It is widely believed that, during the Second World War,
the British chose not to act on intelligence gained from their breaking of
Enigma codes.  No doubt the Admiralty did exercise discretion in how it used the
information it gained, but &lt;a href=&quot;https://drenigma.org/2021/09/21/who-spilt-the-beans-how-the-enigma-secret-was-revealed/&quot;&gt;the famous case of the bombing of Coventry in
November 1940 was not one of these
instances&lt;/a&gt;. &lt;a href=&quot;https://lowentropy.net/posts/fraud/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;It could be bad if tokens
had something to say about the colour of a person’s skin or their gender
identity.  There are more bad uses than good ones for these tokens. &lt;a href=&quot;https://lowentropy.net/posts/fraud/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Finally, a good reason to cite &lt;a href=&quot;https://arxiv.org/abs/2307.12108&quot;&gt;the study mentioned
previously&lt;/a&gt;. &lt;a href=&quot;https://lowentropy.net/posts/fraud/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;A fingerprint could
be re-evaluated on the other site without using a token, so that isn’t much
help. &lt;a href=&quot;https://lowentropy.net/posts/fraud/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>Entropy and Privacy Analysis</title>
    <link href="https://lowentropy.net/posts/entropy-privacy/"/>
    <updated>2022-05-27T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/entropy-privacy/</id>
    <content type="html">&lt;p&gt;Aggregation is a powerful tool when it comes to providing privacy for users. But
analysis that relies on aggregate statistics for privacy loss hides some of the
worst effects of designs.&lt;/p&gt;
&lt;h2 id=&quot;background&quot;&gt;Background&lt;/h2&gt;
&lt;p&gt;A lot of my time recently has been spent looking at various proposals for
improving online advertising.  A lot of this work is centred on the &lt;a href=&quot;https://patcg.github.io/&quot;&gt;Private
Advertising Technology Community Group&lt;/a&gt; in the W3C
where the goal is to find designs that improve advertising while maintaining
strong technical protections for privacy.&lt;/p&gt;
&lt;p&gt;Deciding whether a design does in fact provide strong privacy
protections first requires understanding what that means.  That is a large
topic on which the conversation is continuing.  In this post, my goal is to look
at some aspects of how we might critically evaluate the privacy characteristics
of proposals.&lt;/p&gt;
&lt;h2 id=&quot;limitations-of-differential-privacy&quot;&gt;Limitations of Differential Privacy&lt;/h2&gt;
&lt;p&gt;A number of designs have been proposed in this space with supporting analysis
that is based on differential privacy. Providing differential privacy involves
adding noise to measurements using a tunable parameter (usually called
$&#92;varepsilon$) that hides individual contributions under a random distribution.&lt;/p&gt;
&lt;p&gt;I’m a big fan of differential privacy.  But while it provides a good basis for
understanding the impact of a proposal, a long-running system needs to release
information continuously in order to maintain basic utility.&lt;/p&gt;
&lt;p&gt;Continuous release of data potentially leads to the protections offered by
differential privacy noise being less effective over time.  It is prudent
therefore to understand the operation of the system without the protection
afforded by noise.  This is particularly relevant where the noise uses a large
$&#92;varepsilon$ value or is applied to unaggregated outputs, where it can be
easier to cancel the effect of noise by looking at multiple output values.&lt;/p&gt;
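&lt;p&gt;A small sketch might make this concrete.  It uses the Laplace mechanism for a
simple counting query – a standard construction, though the parameters and
counts here are invented for illustration – and shows how repeated releases of
the same statistic let an observer average the noise away:&lt;/p&gt;

```python
import random
import statistics

def dp_count(true_count, epsilon):
    """Laplace mechanism for a counting query (sensitivity 1).
    Noise scale is 1/epsilon; a Laplace sample is the difference
    of two exponential samples."""
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

random.seed(1)
true_count = 42

# One release hides the true value reasonably well...
print(dp_count(true_count, epsilon=0.1))

# ...but averaging many releases of the same statistic recovers it.
releases = [dp_count(true_count, epsilon=0.1) for _ in range(10_000)]
print(round(statistics.mean(releases)))
```

&lt;p&gt;Formally, each release spends privacy budget, which is exactly why
long-running systems cannot rely on noise alone.&lt;/p&gt;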
&lt;p&gt;Information exposure is often expressed using information theoretic statistics
like entropy.  This note explores how entropy — or any single statistic
— is a poor basis for privacy analysis and suggests options for more
rigorous analysis.&lt;/p&gt;
&lt;h2 id=&quot;information-theory-and-privacy&quot;&gt;Information Theory and Privacy&lt;/h2&gt;
&lt;p&gt;Analysis of Web privacy features often looks at the number of bits of
information that a system releases to an adversary. Analyses of this type use
the distribution of probabilities of all events as a way of estimating the
amount of information that might be provided by a specific event.&lt;/p&gt;
&lt;p&gt;In information theory, each event provides information or surprisal, defined by
a relationship with the probability of the event:&lt;/p&gt;
&lt;p&gt;$$I(x)=-&#92;log_2(P(x))$$&lt;/p&gt;
&lt;p&gt;The reason we might use information is that if a feature releases too much
information, then people might be individually identified.  They might no longer
be anonymous.  Their activities might be linked to them specifically.  The
information can be used to form a profile based on their actions or further
joined to their identity or identities.&lt;/p&gt;
&lt;p&gt;Generally, we consider it a problem when information enables identification of
individuals.  We might express concern if:&lt;/p&gt;
&lt;p&gt;$$2^{I(x)} &#92;ge &#92;text{size of population}$$&lt;/p&gt;
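&lt;p&gt;This check is simple to compute.  A small sketch, using an invented population
size and event probabilities:&lt;/p&gt;

```python
import math

def surprisal_bits(p):
    """Information revealed by an event with probability p, in bits."""
    return -math.log2(p)

population = 10_000_000

# A characteristic shared by 1 in 100 people reveals about 6.6 bits;
# far too little, on its own, to single anyone out of 10 million.
common = surprisal_bits(1 / 100)
print(round(common, 2), 2**common >= population)

# A characteristic shared by 1 in 20 million reveals about 24 bits,
# enough to uniquely identify a person in this population.
rare = surprisal_bits(1 / 20_000_000)
print(round(rare, 2), 2**rare >= population)
```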
&lt;p&gt;Because surprisal is about specific events, it can be a little unwieldy.
Surprisal is not useful for reaching a holistic understanding of the system.  A
statistic that summarizes all potential outcomes is more useful in gaining
insight into how the system operates as a whole.  A common statistic used in
this context is entropy, which provides a mean or expected surprisal across a
sampled population:&lt;/p&gt;
&lt;p&gt;$$H(X)=&#92;sum_{x&#92;in X}P(x)I(x)=-&#92;sum_{x&#92;in X}P(x)&#92;log_2(P(x))=-&#92;frac{1}{N}&#92;sum_{i=1}^N&#92;log_2(P(x_i))$$&lt;/p&gt;
&lt;p&gt;Entropy has a number of applications. For instance, it can be used to determine
an optimal encoding of the information from many events, using entropy coding
(such as Huffman or Arithmetic coding).&lt;/p&gt;
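&lt;p&gt;The definition translates directly into code.  A minimal sketch:&lt;/p&gt;

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy of a distribution: the mean surprisal, in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A fair coin carries exactly one bit per flip...
print(entropy_bits([0.5, 0.5]))
# ...while a heavily biased coin carries much less.
print(entropy_bits([0.99, 0.01]))
```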
&lt;h2 id=&quot;using-entropy-in-privacy-analysis&quot;&gt;Using Entropy in Privacy Analysis&lt;/h2&gt;
&lt;p&gt;Specific statistics are useful in privacy analysis to the extent that
they provide an understanding of the overall shape of the system. However,
simple statistics tend to lose information about exceptional circumstances.&lt;/p&gt;
&lt;p&gt;Entropy has real trouble with rare events. Low probability events have high
surprisal, but as entropy scales their contribution by their probability, they
contribute less to the total entropy than higher probability events.&lt;/p&gt;
&lt;p&gt;In general, revealing more information is undesirable from a privacy
perspective. Toward that end, it might seem obvious that minimizing entropy is
desirable. However, this can be shown to be counterproductive for individual
privacy, even if a single statistic is improved.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;p&gt;An example might help prime intuition. A cohort of 100 people is arbitrarily
allocated into two groups. If people are evenly distributed into groups of 50,
revealing the group that a person has been allocated provides just a single bit
of information, that is, surprisal is 1 bit. The total entropy of the system is
1 bit.&lt;/p&gt;
&lt;p&gt;An asymmetric allocation can produce a different result. If 99 people are
allocated to one group and a single person to the other, revealing that someone
is in the first group provides almost no information at 0.0145 bits. On the
contrary, revealing the allocation for the lone person in the second group —
which uniquely identifies that person — produces a much larger surprisal of
6.64 bits. Though this is clearly a privacy problem for that person, their
privacy loss is not reflected in the total entropy of the system, which at
0.0808 bits is close to zero.&lt;/p&gt;
&lt;p&gt;Entropy tells us that the average information revealed for all users is very
small.  That conclusion about the aggregate is reflected in the entropy
statistic, but it hides the disproportionately large impact on the single user
who loses the most.&lt;/p&gt;
&lt;/div&gt;
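&lt;p&gt;The numbers in the example are easy to reproduce:&lt;/p&gt;

```python
import math

def surprisal(p):
    """Surprisal of an event with probability p, in bits."""
    return -math.log2(p)

# The cohort example: 99 people in one group, 1 in the other.
p_big, p_small = 0.99, 0.01

# Almost no information is revealed about members of the large group...
print(round(surprisal(p_big), 4))    # 0.0145 bits
# ...but the lone member of the small group is uniquely identified.
print(round(surprisal(p_small), 2))  # 6.64 bits

# The entropy (mean surprisal) stays close to zero regardless.
entropy = p_big * surprisal(p_big) + p_small * surprisal(p_small)
print(round(entropy, 4))             # 0.0808 bits
```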
&lt;p&gt;The more asymmetric the information contributed by individuals, the lower the
entropy of the overall system.&lt;/p&gt;
&lt;p&gt;Limiting analysis to simple statistics, and entropy in particular, can hide
privacy problems. Somewhat counterintuitively, the lower the entropy of a
system, the more heavily the adverse consequences of a design fall on a
minority of participants.&lt;/p&gt;
&lt;p&gt;This is not a revelatory insight.  It is well known that &lt;a href=&quot;https://janhove.github.io/teaching/2016/11/21/what-correlations-look-like&quot;&gt;a single metric is
often a poor means of understanding
data&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Entropy can provide a misleading intuitive understanding of privacy as it
relates to the experience of individual users.&lt;/p&gt;
&lt;h2 id=&quot;recommendations&quot;&gt;Recommendations&lt;/h2&gt;
&lt;p&gt;Information entropy remains useful as a means of understanding the overall
utility of the information that a system provides. Understanding key statistics
as part of a design is valuable.  However, for entropy measures in particular,
this is only useful from a perspective that seeks to reduce overall utility;
entropy provides almost no information about the experience of individuals.&lt;/p&gt;
&lt;h3 id=&quot;understand-the-surprisal-distribution&quot;&gt;Understand the Surprisal Distribution&lt;/h3&gt;
&lt;p&gt;Examining only the mean surprisal offers very little insight into a
system. Statistical analysis rarely considers a mean value in isolation. Most
statistical treatment takes the shape of the underlying distribution into
account.&lt;/p&gt;
&lt;p&gt;For privacy analysis, understanding the distribution of surprisal values is
useful.  Even just looking at percentiles might offer greater insight into the
nature of the privacy loss for those who are most adversely affected.&lt;/p&gt;
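&lt;p&gt;As a sketch of what that might look like, with entirely made-up numbers:
take a population spread unevenly across configuration groups, compute
per-user surprisal, then look at quartiles and the worst case rather than the
mean alone:&lt;/p&gt;

```python
import math
import statistics

# Hypothetical population of 1000 users in groups of different sizes.
group_sizes = [500, 300, 150, 40, 9, 1]
total = sum(group_sizes)

# Per-user surprisal: users in small groups reveal more bits.
surprisals = []
for size in group_sizes:
    surprisals.extend([-math.log2(size / total)] * size)

print('mean (entropy):', round(statistics.fmean(surprisals), 3))
print('quartiles:', [round(q, 3) for q in statistics.quantiles(surprisals, n=4)])
print('worst case:', round(max(surprisals), 3))  # the user in a group of one
```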
&lt;p&gt;Shortcomings of entropy are shared by related statistics, like &lt;a href=&quot;https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence&quot;&gt;Kullback–Leibler
divergence&lt;/a&gt;
or mutual information, which estimate information gain relative to a known
distribution.  Considering percentiles and other statistics can improve
understanding.&lt;/p&gt;
&lt;p&gt;Knowing the distribution of surprisal also makes it possible to combine
privacy loss metrics.  As privacy is affected by multiple concurrent efforts to change the
way people use the Web, the interaction of features can be hard to understand.
Richer expressions of the effect of changes might allow for joint analysis to be
performed.  Though it requires assumptions about the extent to which different
surprisal distributions might be correlated, analyses of surprisal that assume
either complete independence or perfect correlation could provide insights into
the potential extent of privacy loss from combining features.&lt;/p&gt;
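&lt;p&gt;A minimal sketch of such a joint analysis, with hypothetical per-user
surprisal values for two features: under complete independence the bits simply
add, while perfect correlation (modeled crudely here as the combination
revealing no more than the more revealing of the two features) bounds the loss
from below:&lt;/p&gt;

```python
# Hypothetical surprisal values (in bits) for the same five users,
# as revealed by two different features.
feature_a = [0.5, 0.5, 1.0, 2.0, 6.0]
feature_b = [0.3, 1.5, 1.5, 0.2, 5.0]

# Complete independence: information adds, so surprisals sum.
independent = [x + y for x, y in zip(feature_a, feature_b)]

# Perfect correlation: one crude model is that the combination reveals
# no more than the more revealing of the two features.
correlated = [max(x, y) for x, y in zip(feature_a, feature_b)]

print(independent)  # upper bound on combined surprisal per user
print(correlated)   # lower bound per user
```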
&lt;p&gt;For example, it might be useful to consider the interaction of a proposal with
extant browser fingerprinting.  The users who reveal the most information using
the proposal might not be the same users who reveal the most fingerprinting
information.  Analysis could show that there are no problems or it might help
guide further research that would provide solutions.&lt;/p&gt;
&lt;p&gt;More relevant to privacy might be understanding the proportion of individuals
that are potentially identifiable using a system. A common privacy goal is to
maintain a minimum anonymity set size.  It might be possible to apply
knowledge of a surprisal distribution to estimating the size of a population
where the anonymity set becomes too small for some users.  This information
might then guide the creation of safeguards.&lt;/p&gt;
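&lt;p&gt;One rough way to sketch that estimate: a user whose observable
configuration carries &lt;em&gt;s&lt;/em&gt; bits of surprisal sits, to a first
approximation, in an anonymity set of about &lt;em&gt;N&lt;/em&gt; · 2&lt;sup&gt;-&lt;em&gt;s&lt;/em&gt;&lt;/sup&gt;
people in a population of &lt;em&gt;N&lt;/em&gt;. The numbers and threshold below are
invented for illustration:&lt;/p&gt;

```python
# Rough sketch: estimate anonymity-set sizes from per-user surprisal.
def anonymity_set(population, surprisal_bits):
    # A user revealing s bits is roughly one of population / 2**s people.
    return population * 2 ** (-surprisal_bits)

def at_risk(population, surprisals, floor=20):
    # Count users whose estimated anonymity set falls below the floor.
    return sum(1 for s in surprisals if floor > anonymity_set(population, s))

# Invented numbers: 10,000 users; three of them reveal 3, 8, and 12 bits.
print(anonymity_set(10_000, 3))     # 1250.0
print(at_risk(10_000, [3, 8, 12]))  # 1: only the 12-bit user is at risk
```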
&lt;h3 id=&quot;consider-the-worst-case&quot;&gt;Consider the Worst Case&lt;/h3&gt;
&lt;p&gt;A worst-case analysis is worth considering from the perspective of understanding
how the system treats the privacy of all those who might be affected. That is,
consider the implications for users on the tail of any distribution.  Small user
populations will effectively guarantee that any result is drawn from the tail of
a larger distribution.&lt;/p&gt;
&lt;p&gt;Concentrating on cases where information might be attributed to individuals
can miss privacy problems that arise from people being identifiable in small
groups. Understand how likely it is that smaller groups might be affected.&lt;/p&gt;
&lt;p&gt;The potential for targeting of individuals or small groups might justify
disqualification of — or at least adjustments to — a proposal.  The Web is
for everyone, not just most people.&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <title>Bundling for the Web</title>
    <link href="https://lowentropy.net/posts/bundles/"/>
    <updated>2021-02-26T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/bundles/</id>
    <content type="html">&lt;p&gt;The idea of bundling is deceptively simple. Take a bunch of stuff and glom them
into a single package. So why is it so difficult to teach the web how to
bundle?&lt;/p&gt;
&lt;aside&gt;
&lt;p&gt;First a note: this is my personal opinion and an incomplete one at that. Though
I work for Mozilla, this post is part of my process of grappling with a complex
problem. Mozilla has made &lt;a href=&quot;https://mozilla.github.io/standards-positions/#bundled-exchanges&quot;&gt;a
statement&lt;/a&gt;
regarding Google’s bundled exchanges, which still applies. That position might
be revised and I’ll have some say in that if it does, but any process will
involve a discussion with a group of my colleagues who each add their own
perspectives and experience.&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id=&quot;the-web-already-does-bundling&quot;&gt;The Web already does bundling&lt;/h2&gt;
&lt;p&gt;A bundled resource is a resource that composes multiple pieces of content.
Bundles can consist of content only of a single type or mixed types.&lt;/p&gt;
&lt;p&gt;Take something like JavaScript&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;. A very large proportion of the JavaScript content on the web is
bundled today. If you haven’t bundled, minified, and compressed your
JavaScript, you have left easy performance wins unrealized.&lt;/p&gt;
&lt;p&gt;HTML is a bundling format in its own right, with inline JavaScript and CSS.
Bundling other content is also possible with &lt;code&gt;data:&lt;/code&gt; URIs, even if this has
some drawbacks.&lt;/p&gt;
&lt;p&gt;Then there are &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Glossary/CSS_preprocessor&quot;&gt;CSS
preprocessors&lt;/a&gt;,
which provide bundling options, &lt;a href=&quot;https://developers.google.com/web/fundamentals/design-and-ux/responsive/images#use_image_sprites&quot;&gt;image
spriting&lt;/a&gt;,
and myriad other hacks.&lt;/p&gt;
&lt;p&gt;And that leaves aside the whole mess of zipfiles, tarballs, and self-extracting
executables that are used for a variety of Web-adjacent purposes. Those matter
too, but they are generally not Web-visible.&lt;/p&gt;
&lt;h2 id=&quot;why-we-might-want-bundles&quot;&gt;Why we might want bundles&lt;/h2&gt;
&lt;p&gt;What is immediately clear from this brief review of available Web bundling
options is that they are all terrible in varying degrees. The reasons are
varied, and a close examination of them is probably not worthwhile.&lt;/p&gt;
&lt;p&gt;It might be best just to view this as the legacy of a system that evolved in
piecemeal fashion; an evolutionary artifact along a dimension that nature did
not regard as critical to success.&lt;/p&gt;
&lt;p&gt;I’m more interested in the balance of different pressures, both for and against
bundling. There are good reasons in support of bundling, and quite a few
reasons to be cautious, but it looks like the time has come to consider
bundling seriously.&lt;/p&gt;
&lt;p&gt;I doubt that introducing native support for bundling technology will
fundamentally change the way Web content is delivered. I see it more as an
opportunity to expand the toolkit to allow for more use cases and more flexible
deployment options.&lt;/p&gt;
&lt;p&gt;In researching this, I was reminded of work that Jonas Sicking did to identify
&lt;a href=&quot;https://gist.github.com/sicking/6dcd3b771612b2f6f1bb&quot;&gt;use cases&lt;/a&gt;. There are
lots of reasons and requirements that are worth looking at. Some of the
reasoning is dated, but there is a lot of relevant material, even five years
on.&lt;/p&gt;
&lt;h3 id=&quot;efficiency&quot;&gt;Efficiency&lt;/h3&gt;
&lt;p&gt;One set of touted advantages for bundling relate to performance and efficiency.
Today, we have a better understanding of the ways in which performance is
affected by resource composition, so this has been narrowed down to two primary
features: compression efficiency and reduced overheads.&lt;/p&gt;
&lt;p&gt;I also want to address another reason that is often cited: providing content
that a client will need, but doesn’t yet know about.&lt;/p&gt;
&lt;h4 id=&quot;shared-compression&quot;&gt;Shared compression&lt;/h4&gt;
&lt;p&gt;Compression efficiency can be dramatically improved if similar resources are
bundled together. This is because the larger shared context results in more
repetition and gives a compressor more opportunities to find and exploit
similarities.&lt;/p&gt;
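&lt;p&gt;The effect is easy to demonstrate with any general-purpose compressor. This
sketch uses zlib and two invented scripts that share most of their text:&lt;/p&gt;

```python
import zlib

# Two hypothetical scripts that share a lot of boilerplate.
script_a = b'function greet(name) { console.log(name); }' * 20
script_b = b'function leave(name) { console.log(name); }' * 20

separate = len(zlib.compress(script_a)) + len(zlib.compress(script_b))
together = len(zlib.compress(script_a + script_b))

# Compressing the concatenation lets the second script reuse the first
# as shared context, so the bundle compresses smaller than the parts.
print(separate, together)
```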
&lt;p&gt;Bundling is not the only way to achieve this. Alternative methods of attaining
compression gains have been explored, such as
&lt;a href=&quot;https://docs.google.com/document/d/1REMkwjXY5yFOkJwtJPjCMwZ4Shx3D9vfdAytV_KQCUo/edit?usp=sharing&quot;&gt;SDCH&lt;/a&gt;
and &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-vkrasnov-h2-compression-dictionaries-03&quot;&gt;cross-stream compression contexts for
HTTP/2&lt;/a&gt;.
Prototypes of the latter showed immense improvements in compression efficiency
and corresponding performance gains.&lt;/p&gt;
&lt;p&gt;General solutions like these have not been successful in finding ways to manage
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-handte-httpbis-dict-sec-00&quot;&gt;operational security
concerns&lt;/a&gt;.
The hope with bundles is that bundling can occur as a build process. As the
build occurs before deploying content to servers, no sensitive or user-specific
data will be involved. This is somewhat at odds with some of the dynamic
features involved, but that sort of separation could be an effective strategy
for managing this security risk.&lt;/p&gt;
&lt;h4 id=&quot;reduced-overheads&quot;&gt;Reduced overheads&lt;/h4&gt;
&lt;p&gt;Bundling could also reduce overheads. While HTTP/2 and HTTP/3 reduce the cost
of making requests, those costs still compound when multiple resources are
involved. The claim here is that internal handling of individual requests in
browsers has inefficiencies that are hard to eliminate without some form of
bundling.&lt;/p&gt;
&lt;p&gt;I find it curious that protocol-level inefficiencies are not blamed here, but
rather inter-process communication between internal browser processes. Not
having examined this closely&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;, I can’t really speak to these
claims.&lt;/p&gt;
&lt;p&gt;What I do know is that performance in this space is subtle. When we were
building HTTP/2, we found that performance was highly sensitive to the number
of requests that could be made by clients in the first few round trips of a
connection. The way that networking protocols work means that there is very
limited space for sending anything early in a connection&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;. The main motivation for &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc7541&quot;&gt;HTTP
header compression&lt;/a&gt; was that it
allowed significantly more requests to be made early in a connection. By
reducing request counts, bundling might do the same.&lt;/p&gt;
&lt;h4 id=&quot;eliminating-round-trips&quot;&gt;Eliminating round trips&lt;/h4&gt;
&lt;p&gt;One of the other potential benefits of bundling is in eliminating additional
round trips. For content that is requested, a bundle might provide resources
that a client does not know that it needs yet. Without bundling, a resource
that references another resource adds an additional round trip as the first
resource needs to be fetched before the second one is even known to the client.&lt;/p&gt;
&lt;p&gt;Again, experience with HTTP/2 suggests that performance gains from sending
extra resources are not easy to obtain. This is exactly what HTTP/2 server push
promised to provide. However, as we have learned with server push, the wins
here are not easy to realize. Attempts to improve performance with server push
often produced mixed results and sometimes large
regressions&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;.
The problem is that servers are unable to accurately predict when to push
content, so they push data that is not needed. To date, no studies have shown
reliable strategies that servers can use to improve performance with server
push.&lt;/p&gt;
&lt;h4 id=&quot;realizing-performance-improvements&quot;&gt;Realizing performance improvements&lt;/h4&gt;
&lt;p&gt;For bundles to realize performance gains from eliminating round trips, the
compression gains would need to be enough to counteract any potential waste. This
is more challenging if bundles are built statically.&lt;/p&gt;
&lt;p&gt;I personally remain lukewarm on using bundling as a performance tool.
Shortcomings in protocols – or implementations – seem like they could be
addressed at that level.&lt;/p&gt;
&lt;h3 id=&quot;ergonomics&quot;&gt;Ergonomics&lt;/h3&gt;
&lt;p&gt;The use of bundlers is an established practice in Web development. Being able
to outsource some of the responsibility for managing the complexities of
content delivery is no doubt part of the appeal.&lt;/p&gt;
&lt;p&gt;The value of being able to compose complex content into a single package
should not be underestimated.&lt;/p&gt;
&lt;p&gt;Bundling of content into a single file is a property common to many systems.
Providing a single item to manage with a single identity simplifies
interactions. This is how most people expect content of all kinds to be
delivered, whether it is
&lt;a href=&quot;https://en.wikipedia.org/wiki/Self-extracting_archive&quot;&gt;applications&lt;/a&gt;,
&lt;a href=&quot;https://en.wikipedia.org/wiki/Comparison_of_e-book_formats&quot;&gt;books&lt;/a&gt;,
&lt;a href=&quot;https://en.wikipedia.org/wiki/Library_(computing)&quot;&gt;libraries&lt;/a&gt;, or any other
sort of digital artifact. The Web here is something of an aberration in that it
resists the idea that parts of it can be roped off into a discrete unit with a
finite size and name.&lt;/p&gt;
&lt;p&gt;Though this usage pattern might be partly attributed to path dependence, the
usability benefits of individual files cannot be so readily dismissed. Being
able to manage bundles as a single unit where necessary, but identify the
component pieces is likely to be a fairly large gain for developers.&lt;/p&gt;
&lt;p&gt;For me, this reason might be enough to justify using bundles, even over some of
their drawbacks.&lt;/p&gt;
&lt;h2 id=&quot;why-we-might-not-want-bundles&quot;&gt;Why we might not want bundles&lt;/h2&gt;
&lt;p&gt;The act of bundling subsumes the identity of each piece of bundled content
into the identity of the bundle that is formed. This produces a number of effects,
some of them desirable (as discussed), some of them less so.&lt;/p&gt;
&lt;p&gt;As far as effects go, whether they are valuable or harmful might depend on
context and perspective. Some of these effects might simply be managed as
trade-offs, with site or server developers being able to choose how content is
composed in order to balance various factors like total bytes transferred or
latency.&lt;/p&gt;
&lt;p&gt;If bundling only represented trade-offs that affected the operation of servers,
then we might be able to resolve whether the feature is worth pursuing on the
grounds of simple cost-benefit. Where things get more interesting is where
choices might involve depriving others of their own choices. Balancing the
needs of clients and servers is occasionally necessary. Determining the effect
of server choices on clients – and the people they might act for – is
therefore an important part of any analysis we might perform.&lt;/p&gt;
&lt;h3 id=&quot;cache-efficiency-and-bundle-composition&quot;&gt;Cache efficiency and bundle composition&lt;/h3&gt;
&lt;p&gt;Content construction and serving infrastructure generally operates with
imperfect knowledge of the state of caches. Not knowing what a client might
need can make it hard to know what content to serve at any given point in time.&lt;/p&gt;
&lt;p&gt;Optimizing the composition of the bundles used on a site for clients with a
variety of cache states can be particularly challenging if caches operate at
the granularity of resources. Clients that have no prior state might benefit
from maximal bundling, which allows better realization of the aforementioned
efficiency gains.&lt;/p&gt;
&lt;p&gt;On the other hand, clients that have previously received an older version of
the same content might only need to receive updates for those things that have
changed. Similarly, clients that have previously received content for other
pages might already hold some of the same content. In both cases, receiving
copies of content that was already transferred might negate any efficiency
gains.&lt;/p&gt;
&lt;p&gt;This is a problem that JavaScript bundlers have to deal with today. As an
optimization problem it is made difficult by the combination of poor
information about client state with the complexity of code dependency graphs
and the potential for clients to follow different paths through sites.&lt;/p&gt;
&lt;p&gt;For example, consider the code that is used on an article page on a
hypothetical news site and the code used on the home page of the same site.
Some of that code will be common, if we make the assumption that site
developers use common tools. Bundlers might deal with this by making three
bundles: one of common code, plus one each of article and home page code. For a
very simple site like this, that allows all the code to be delivered in just
two bundles on either type of page, plus an extra bundle when navigating from
an article to the home page or vice versa.&lt;/p&gt;
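&lt;p&gt;That split can be sketched with simple set operations over module names
(the names here are invented):&lt;/p&gt;

```python
# Modules needed by each type of page on the hypothetical news site.
article = {'framework', 'layout', 'comments', 'paywall'}
home = {'framework', 'layout', 'carousel', 'headlines'}

common = article.intersection(home)        # shipped with every page
article_only = article.difference(common)  # extra bundle for articles
home_only = home.difference(common)        # extra bundle for the home page

print(sorted(common))        # ['framework', 'layout']
print(sorted(article_only))  # ['comments', 'paywall']
print(sorted(home_only))     # ['carousel', 'headlines']
```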
&lt;p&gt;As the number of different types of page increases, splitting code into
multiple bundles breaks down. The number of bundle permutations can increase
much faster than the number of discrete uses. In the extreme, the number of
bundles could grow combinatorially with the number of types of page, limited
only by the number of resources that might be bundled. Of course, well before
that point is reached, the complexity cost of bundling likely exceeds any
benefits it might provide.&lt;/p&gt;
&lt;p&gt;To deal with this, bundlers have a bunch of heuristics that balance the costs
of providing too much data in a bundle for a particular purpose, against the
costs of potentially providing bundled data that is already present. Some sites
take this a little further and use service workers to enhance browser caching
logic&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;It is at this point that you might recognize an opportunity. If clients
understood the structure of bundles, then maybe they could do something to
avoid fetching redundant data. Maybe providing a way to selectively request
pieces of bundles could reduce the cost of fetching bundles when parts of the
bundle are already present. That would allow the bundlers to skew their
heuristics more toward putting stuff in bundles. It might even be possible to
tune first-time queries this way.&lt;/p&gt;
&lt;p&gt;The thing is, we’ve already tried that.&lt;/p&gt;
&lt;h4 id=&quot;a-standard-for-inefficient-caching&quot;&gt;A standard for inefficient caching&lt;/h4&gt;
&lt;p&gt;There is a long history in HTTP of failed innovation when it comes to
standardizing improvements for cache efficiency. Though cache invalidation is
recognized as one of the &lt;a href=&quot;https://www.karlton.org/2017/12/naming-things-hard/&quot;&gt;hard
problems&lt;/a&gt; in computer
science, there are quite a few examples of successful deployments of
proprietary solutions in server and CDN infrastructure.&lt;/p&gt;
&lt;p&gt;A few caching innovations have made it into HTTP over time, such as the recent
&lt;a href=&quot;https://datatracker.ietf.org/doc//html/rfc8246&quot;&gt;immutable Cache-Control
directive&lt;/a&gt;. That particular
solution is quite relevant in this context due to the way that it supports
content-based URI construction, but it is still narrower in applicability than
a good solution in this space might need.&lt;/p&gt;
&lt;p&gt;If we view bundling as a process that happens as part of site construction,
bundles might be treated as opaque blobs by servers. Servers that aren’t aware
of bundle structure are likely to end up sending more bits than the client
needs. To avoid this, servers and clients both need to be aware of the contents
of bundles.&lt;/p&gt;
&lt;h4 id=&quot;cache-digests&quot;&gt;Cache digests&lt;/h4&gt;
&lt;p&gt;Once both the client and server are aware of individual resources within
bundles, this problem starts to look very much like server push.&lt;/p&gt;
&lt;p&gt;Previous attempts to solve the problem of knowing what to push aimed to improve
the information available to servers. &lt;a href=&quot;https://tools.ietf.org/html/draft-ietf-httpbis-cache-digest-05&quot;&gt;Cache
digests&lt;/a&gt; is the
most notable attempt here. It got several revisions into the IETF working group
process. It still failed.&lt;/p&gt;
&lt;p&gt;If the goal of failing is to learn, then this too was a failure largely for the
most ignominious of reasons: no deployment. Claims from clients that cache
digests are too expensive to implement seem reasonable, but not entirely
satisfactory in light of the change to use &lt;a href=&quot;https://en.wikipedia.org/wiki/Cuckoo_filter&quot;&gt;Cuckoo
filters&lt;/a&gt; in later versions. More
so with recent storage partitioning work.&lt;/p&gt;
&lt;p&gt;The point of this little digression is to highlight the inherent difficulties
in trying to fix this problem by layering in enhancements to the caching model.
More so when that requires replicating the infrastructure we have for
individual resources at the level of bundled content.&lt;/p&gt;
&lt;p&gt;My view is that it would be unwise to attempt to tackle a problem like this as
part of trying to introduce a new feature. If the success of bundling depends
on finding a solution to this problem, then I would be surprised, but it might
suggest that the marginal benefit of bundling – at least for performance – is
not sufficient to justify the effort&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h3 id=&quot;prioritization-is-harder&quot;&gt;Prioritization is harder&lt;/h3&gt;
&lt;p&gt;Mark Nottingham reminded me that even if servers and clients are modified so
that they are aware of individual resources, there are still limitations.
Bundles might contain resources with different priorities. It might be
impossible to avoid performance regressions.&lt;/p&gt;
&lt;p&gt;It is certainly possible to invent a new system for ensuring that bundles are
properly prioritized, but that requires good knowledge of relative priority at
the time that bundles are constructed.&lt;/p&gt;
&lt;p&gt;Putting important stuff first is likely a good strategy, but that has drawbacks
too. Servers need to know where to apply priority changes when serving bundles
or the low-priority pieces will be served at the same priority as high-priority
pieces. The relative priority of resources will need to be known at bundling
time. Bundling content that might change in priority in response to &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-priority&quot;&gt;client
signals&lt;/a&gt;
might result in priority inversions and performance regressions.&lt;/p&gt;
&lt;p&gt;Just like with caching, addressing prioritization shortcomings could require
replicating a lot of the machinery we have for prioritizing individual
resources within bundles.&lt;/p&gt;
&lt;h3 id=&quot;erasing-resource-identity&quot;&gt;Erasing resource identity&lt;/h3&gt;
&lt;p&gt;An &lt;a href=&quot;https://github.com/WICG/webpackage/issues/551&quot;&gt;issue&lt;/a&gt; that was first&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt; raised by Brave is that the use of bundles creates
opportunities for sites to obfuscate the identity of resources. The thesis
being that bundling could &lt;a href=&quot;https://brave.com/webbundles-harmful-to-content-blocking-security-tools-and-the-open-web/&quot;&gt;confound content blocking
techniques&lt;/a&gt;
as it would make rewriting of identifiers easier.&lt;/p&gt;
&lt;p&gt;For those who rely on the identity of resources to understand the semantics and
intent of the identified resource, there are some ways in which bundling might
affect their decision-making. The primary concern is that references between
resources in the same bundle are fundamentally more malleable than other
references. As the reference and reference target are in the same place, it is
trivial – at least in theory – to change the identifier.&lt;/p&gt;
&lt;p&gt;Brave and several others are therefore concerned that bundling will make it
easier to prevent URI-based classification of resources. In the extreme,
identifiers could be rewritten for every request, negating any attempt to use
those identifiers for classification.&lt;/p&gt;
&lt;p&gt;One of the most interesting properties of the Web is the way that it insinuates
a browser – and user agency – into the process. The way that happens is that
the Web&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt; is structurally biased toward functioning better when sites expose
semantic information to browsers. This property, sometimes called semantic
transparency, is what allows browsers to be opinionated about content rather
than acting as a dumb pipe&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h4 id=&quot;yes%2C-it%E2%80%99s-about-ad-blockers&quot;&gt;Yes, it’s about ad blockers&lt;/h4&gt;
&lt;p&gt;Just so that this is clear, this is mostly about blocking advertising.&lt;/p&gt;
&lt;p&gt;While more advanced ad blocking techniques also draw on contextual clues about
resources, those methods are more costly. Most ad blocking decisions are made
based on the URI of resources. Using the resource identity allows the ad
blocker to prevent the load, which not only means that the ad is not displayed,
but the resources needed to retrieve it are not spent&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;While &lt;a href=&quot;https://www.statista.com/statistics/804008/ad-blocking-reach-usage-us/&quot;&gt;many people might choose to block
ads&lt;/a&gt;,
sites don’t like being denied the revenue that advertising provides. Some sites
use techniques that are designed to show advertising to users of ad blockers,
so it is not unreasonable to expect tools to be used to prevent classification.&lt;/p&gt;
&lt;p&gt;It is important to note that this is not a situation that requires an absolute
certainty. The sorry state of Web privacy means that we have a lot of places
where various forces are in tension or transition. The point of Brave’s
complaint here is not that bundling outright prevents the sort of
classification they seek, but that it changes the balance of system dynamics by
giving sites another tool that they might employ to avoid classification.&lt;/p&gt;
&lt;p&gt;Of course, when it is a question of degree, we need to discuss and agree how
much the introduction of such a tool affects the existing system. That’s where
this gets hard.&lt;/p&gt;
&lt;h4 id=&quot;coordination-artifacts&quot;&gt;Coordination artifacts&lt;/h4&gt;
&lt;p&gt;As much as these concerns are serious, I tend to think that Jeffrey Yasskin’s
&lt;a href=&quot;https://medium.com/@jyasskin/why-do-url-based-ad-blockers-work-3a13b08a1167&quot;&gt;analysis of the
problem&lt;/a&gt;
is broadly correct. That analysis essentially concludes that the reason we have
URIs is to facilitate coordination between different entities. As long as there
is a need to coordinate between the different entities that provide the
resources that might be composed into a web page, that coordination will expose
information that can be used for classification.&lt;/p&gt;
&lt;p&gt;That is, to the extent to which bundles enable obfuscation of identifiers, that
obfuscation needs to be coordinated. Any coordination that would enable
obfuscation with bundling is equally effective and easy to apply without
bundling.&lt;/p&gt;
&lt;h4 id=&quot;single-page-coordination&quot;&gt;Single-page coordination&lt;/h4&gt;
&lt;p&gt;Take a single Web page. Pretend for a moment that the web page exists in a
vacuum, with no relationship to other pages at all. You could take all the
resources that comprise that page and form them into a single bundle. As all
resources are in the one place, it would be trivial to rewrite the references
between those resources. Or, the identity of resources could be erased entirely
by inlining everything. If every request for that page produced a bundle with a
different set of resource identifiers, it would be impossible to infer anything
about the contents of resources based on their identity alone.&lt;/p&gt;
&lt;p&gt;A unitary bundle for every page is an extreme that is almost certainly
impractical. If sites were delivered this way, there would be no caching, which
means no reuse of common components. Using the Web would be terribly slow.&lt;/p&gt;
&lt;p&gt;Providing strong incentive to deploy pages as discrete bundles – something
Google Search has done to &lt;a href=&quot;https://developers.google.com/search/docs/guides/about-amp#about-signed-exchange&quot;&gt;enable preloading search results for cooperating
sites&lt;/a&gt;
– could effectively force sites to bundle in this way. Erasing or obfuscating
internal links in these bundles does seem natural at this point, if only to try
to reclaim some of the lost performance, but that assumes an unnatural pressure
toward bundling&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Absent perverse incentives, sites are often built from components developed by
multiple groups, even if that is just different teams working at the same
company. To the extent that teams operate independently, they need to agree on
how they interface. The closer the teams work together, and the more tightly
they are able to coordinate, the more flexible those interfaces can be.&lt;/p&gt;
&lt;p&gt;There are several natural interface points on the Web. Of these the URI remains
a key interface point&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn12&quot; id=&quot;fnref12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;. A simple string
that provides a handle for a whole bundle&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn13&quot; id=&quot;fnref13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt; of
collected concepts is a powerful abstraction.&lt;/p&gt;
&lt;h4 id=&quot;cross-site-coordination&quot;&gt;Cross-site coordination&lt;/h4&gt;
&lt;p&gt;Interfaces between components therefore often use URIs, especially once
cross-origin content is involved. For widely-used components that enable
communication between sites, URIs are almost always involved. If you want to
use &lt;a href=&quot;https://reactjs.org/&quot;&gt;React&lt;/a&gt;, the primary interface is a URI:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-html&quot;&gt;&amp;lt;script src=&amp;quot;https://unpkg.com/react@17/umd/react.production.min.js&amp;quot; crossorigin&amp;gt;&amp;lt;/script&amp;gt;
&amp;lt;script src=&amp;quot;https://unpkg.com/react-dom@17/umd/react-dom.production.min.js&amp;quot; crossorigin&amp;gt;&amp;lt;/script&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you want to add &lt;a href=&quot;https://developers.google.com/analytics/devguides/collection/gtagjs&quot;&gt;Google
analytics&lt;/a&gt;,
there is a bit of JavaScript&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn14&quot; id=&quot;fnref14&quot;&gt;[14]&lt;/a&gt;&lt;/sup&gt; as well, but the URI is still key:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-html&quot;&gt;&amp;lt;script async src=&amp;quot;https://www.googletagmanager.com/gtag/js?id=$XXX&amp;quot;&amp;gt;&amp;lt;/script&amp;gt;
&amp;lt;script&amp;gt;
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag(&#39;js&#39;, new Date());
  gtag(&#39;config&#39;, &#39;$XXX&#39;);
&amp;lt;/script&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The same applies to &lt;a href=&quot;https://support.google.com/adsense/answer/9274634&quot;&gt;advertising&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The scale of coordination required to change these URIs is such that changes
cannot be effected on a per-request basis; they need months, if not years&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn15&quot; id=&quot;fnref15&quot;&gt;[15]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Even for resources on the same site, a version of the same coordination problem
exists. Content that might be used by multiple pages will be requested at
different times. At a minimum, changing the identity of resources would mean
forgoing any reuse of cached resources. Caching provides such a large
performance advantage that I can’t imagine sites giving that up.&lt;/p&gt;
&lt;p&gt;Even if caching were not incentive enough, I suggest that the benefits of
reference stability are enough to ensure that identifiers don’t change
arbitrarily.&lt;/p&gt;
&lt;h4 id=&quot;loose-coupling&quot;&gt;Loose coupling&lt;/h4&gt;
&lt;p&gt;As long as loose coupling is a feature of Web development, the way that
resources are identified will remain a key part of how the interfaces between
components are managed. Those identifiers will therefore tend to be stable. That
stability will allow the semantics of those resources to be learned.&lt;/p&gt;
&lt;p&gt;Bundles do not change these dynamics in any meaningful way, except to the
extent that they might enable better atomicity. That is, it becomes easier to
coordinate changes to references and content if the content is distributed in a
single indivisible unit. That’s not nothing, but – as the case of selective
fetches and cache optimization highlights – content from bundles needs to be
reused in a different context, so the application of indivisible units is
severely limited.&lt;/p&gt;
&lt;p&gt;Of course, there are ways of enabling coordination that might allow for
constructing identifiers that are less semantically meaningful. To draw on the
earlier point about the Web already having bundling options, advertising code
could be inlined with other JavaScript or in HTML, rather than having it load
directly from the advertiser&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn16&quot; id=&quot;fnref16&quot;&gt;[16]&lt;/a&gt;&lt;/sup&gt;. In the extreme,
servers could rewrite all content and encrypt all URIs with a per-user key. None
of this depends on the deployment of new Web bundling technology, but it does
require close coordination.&lt;/p&gt;
&lt;h3 id=&quot;all-or-nothing-bundles&quot;&gt;All or nothing bundles&lt;/h3&gt;
&lt;p&gt;Even if it were possible to identify unwanted content, opponents of bundling
point out that placing that content in the same bundle as critical resources
makes it difficult to avoid loading the unwanted content. Some of the
performance gains from content blockers are the result of not fetching
content&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn17&quot; id=&quot;fnref17&quot;&gt;[17]&lt;/a&gt;&lt;/sup&gt;. Bundling unwanted content might eliminate the cost and
performance benefits of content blocking.&lt;/p&gt;
&lt;p&gt;This is another important criticism that ties in with the earlier concerns
regarding bundle composition and reuse. And, similar to previous problems, the
concern is not that native, generic bundling capabilities enable this sort of
bundling, but that they make it more readily accessible.&lt;/p&gt;
&lt;p&gt;This problem, more so than the caching one, might motivate designs for
selective acquisition of bundled content.&lt;/p&gt;
&lt;p&gt;Existing techniques for selective content fetching, like &lt;a href=&quot;https://httpwg.org/http-core/draft-ietf-httpbis-semantics-latest.html#range.requests&quot;&gt;HTTP range
requests&lt;/a&gt;,
don’t reliably work here, as compression can render byte ranges useless. That
leads to inventing new systems for selective acquisition of bundles. Selective
removal of content from compressed bundles does seem to be possible &lt;a href=&quot;https://dev.to/riknelix/fast-and-efficient-recompression-using-previous-compression-artifacts-47g5&quot;&gt;at some
levels&lt;/a&gt;,
but this leads to a complex system, and the effects on other protocol
participants are non-trivial.&lt;/p&gt;
&lt;p&gt;At some level, clients might want to say “just send me all the code, without
the advertising”, but that might not work so well. Asking for bundle manifests
so that content might be selectively fetched adds an additional round trip.
Moving bundle manifests out of the bundles and into content&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn18&quot; id=&quot;fnref18&quot;&gt;[18]&lt;/a&gt;&lt;/sup&gt; gives clients the information
they need to be selective about which resources they want, but it requires
moving information about the composition of resources into the content that
references it. That too requires coordination.&lt;/p&gt;
&lt;p&gt;For caches, this can add an extra burden. Using the Vary HTTP header field
would be necessary to ensure that caches would not break when content from
bundles is fetched selectively&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn19&quot; id=&quot;fnref19&quot;&gt;[19]&lt;/a&gt;&lt;/sup&gt;. But a cache needs full awareness of these requests and how they are
applied if it is to avoid a combinatorial explosion of different
bundles. Without updating caches to understand selectors, the
effect is that caches end up bearing the load for the myriad permutations of
bundles that might be needed.&lt;/p&gt;
&lt;h3 id=&quot;supplanting-resource-identity&quot;&gt;Supplanting resource identity&lt;/h3&gt;
&lt;p&gt;A final concern is the ability – at least in active proposals – for
bundled content to be identified with URIs from the same origin as the bundle
itself. For example, a bundle at &lt;code&gt;https://example.com/foo/bundle&lt;/code&gt; might contain
content that is identified as &lt;code&gt;https://example.com/foo/script.js&lt;/code&gt;. This is a
&lt;a href=&quot;https://github.com/w3ctag/packaging-on-the-web/issues/10&quot;&gt;long-standing
concern&lt;/a&gt; that applies
to many previous attempts at bundling or packaging.&lt;/p&gt;
&lt;p&gt;This ability is constrained, but the intent is to have content in a bundle act
as a valid substitute for other resources. The reasoning is that a fallback is
needed for those cases where bundles aren’t optimal or aren’t available. This
has implications for anyone deploying a server, who now needs to ensure that
bundles aren’t hosted adjacent to content that might not want interference from
the bundle.&lt;/p&gt;
&lt;p&gt;At this point, I will note that replacing the content of other resources is
also the point of &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-yasskin-http-origin-signed-responses&quot;&gt;signed
exchanges&lt;/a&gt;.
The difference is that in signed exchanges, the replacement extends to other
origins. The constraints on what can be replaced and how are important details,
but the goal is the same: signed exchanges allow a bundle to speak for other
resources.&lt;/p&gt;
&lt;p&gt;As already noted, this sort of thing is already possible with &lt;a href=&quot;https://w3c.github.io/ServiceWorker/&quot;&gt;service
workers&lt;/a&gt;. Service workers take what it
means to subvert the identity of resources to the next level. A request that is
handled by a service worker can be turned into any other request or even
multiple requests. Service workers are limited though. A site can opt to
perform whatever substitutions it likes, but it can only do that for its own
requests. Bundles propose something that might be enabled for any server, even
inadvertently.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/WICG/resource-bundles/blob/main/subresource-loading.md#optionality-and-url-integrity&quot;&gt;One
proposal&lt;/a&gt;
says that all supplanted resources must be identical to the resources they
supplant. The theory there is that clients could fetch the resource from within
a bundle or directly and expect the same result. It goes on to suggest that a
mismatch between these fetches might be cause for a client to stop using the
bundle. However, it is perfectly normal in HTTP for the same resource to return
different content when fetched multiple times, even when the fetch is made by
the same client or at the same time. So it is hard to imagine how a client
would treat inconsistency as anything other than normal. If bundling provides
advantages, giving up on using bundles for that reason could make bundles
completely unreliable.&lt;/p&gt;
&lt;p&gt;One good reason for enabling equivalence of bundled and unbundled resources is
to provide a graceful fallback in the case that bundling is not supported by a
client. Attempting to ensure that the internal identifiers in bundles are
“real” and that the fallback does not change behaviour is not going to work.&lt;/p&gt;
&lt;h4 id=&quot;indirection-for-identifiers&quot;&gt;Indirection for identifiers&lt;/h4&gt;
&lt;p&gt;Addressing the problem of one resource speaking unilaterally for another
resource requires a little creativity. Here the solution is hinted at with both
service workers and JavaScript &lt;a href=&quot;https://wicg.github.io/import-maps/#note-on-import-specifiers&quot;&gt;import
maps&lt;/a&gt;. Both
allow the entity making a reference to rewrite that reference before the
browser acts on it.&lt;/p&gt;
&lt;p&gt;Import maps are especially instructive here, as they make it clear that the
mapping from the import specifier to a URI is not the &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc3986#section-5&quot;&gt;URI resolution function
in RFC 3986&lt;/a&gt; or &lt;a href=&quot;https://url.spec.whatwg.org/#url-parsing&quot;&gt;the
URL parsing algorithm in Fetch&lt;/a&gt;;
import specifiers are explicitly not URIs, relative or otherwise.&lt;/p&gt;
&lt;p&gt;This is an opportunity to add indirection, either the limited form provided in
import maps where one string is mapped to another, or the Turing-complete
version that service workers enable.&lt;/p&gt;
&lt;p&gt;That is, we allow those places that reference resources to provide the browser
with a set of rules that change how the identifiers they use are translated into
URIs. This is something that HTML has had forever, with the
&lt;a href=&quot;https://html.spec.whatwg.org/#the-base-element&quot;&gt;&lt;code&gt;&amp;lt;base&amp;gt;&lt;/code&gt;&lt;/a&gt; element. This is
also the fundamental concept behind the &lt;a href=&quot;https://discourse.wicg.io/t/proposal-fetch-maps/4259&quot;&gt;fetch maps
proposal&lt;/a&gt;, which looks
like this&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn20&quot; id=&quot;fnref20&quot;&gt;[20]&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-html&quot;&gt;&amp;lt;script type=&amp;quot;fetchmap&amp;quot;&amp;gt;
{
  &amp;quot;urls&amp;quot;: {
    &amp;quot;/styles.css&amp;quot;: &amp;quot;/styles.a74fs3.css&amp;quot;,
    &amp;quot;/bg.png&amp;quot;: &amp;quot;/bg.8e3ac4.png&amp;quot;
  }
}
&amp;lt;/script&amp;gt;
&amp;lt;link rel=&amp;quot;stylesheet&amp;quot; href=&amp;quot;/styles.css&amp;quot;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this example, when the browser is asked to fetch &lt;code&gt;/styles.css&lt;/code&gt;, it knows to
fetch &lt;code&gt;/styles.a74fs3.css&lt;/code&gt; instead.&lt;/p&gt;
&lt;p&gt;The beauty of this approach is that the change only exists where the reference
is made. The canonical identity of the resource is the same for everyone (it is
always &lt;code&gt;https://example.com/styles.a74fs3.css&lt;/code&gt;), only the way that reference is
expressed changes.&lt;/p&gt;
&lt;p&gt;In other words, the common property between these designs – service workers,
&lt;code&gt;&amp;lt;base&amp;gt;&lt;/code&gt;, import maps, or fetch maps – is that the indirection only occurs at
the explicit request of the thing that makes the reference. A site deliberately
chooses to use this facility, and if it does, it controls the substitution of
resource identities. There is no lateral replacement of content as all of the
logic occurs at the point the reference is made.&lt;/p&gt;
&lt;h4 id=&quot;making-resource-maps-work&quot;&gt;Making resource maps work&lt;/h4&gt;
&lt;p&gt;Of course, fitting this indirection into an existing system requires a few
awkward adaptations. But it seems like this particular design could be quite
workable.&lt;/p&gt;
&lt;p&gt;Anne van Kesteren pointed out that the &lt;code&gt;import:&lt;/code&gt; scheme in &lt;a href=&quot;https://github.com/WICG/import-maps#import-urls&quot;&gt;import
maps&lt;/a&gt; exists because many of
the places where identifiers appear are concretely URIs. APIs assume that they
can be manipulated as URIs and violating that expectation would break things
that rely on that. If we are going to enable this sort of indirection, then we
need to ensure that URIs stay URIs. That doesn’t mean that URIs need to be
HTTP, just that they are still URIs.&lt;/p&gt;
&lt;p&gt;You might choose to construct identifiers with a new URI scheme in order to
satisfy this requirement&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn21&quot; id=&quot;fnref21&quot;&gt;[21]&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-html&quot;&gt;&amp;lt;a href=&amp;quot;scheme-for-mappings:hats&amp;quot;&amp;gt;buy hats here&amp;lt;/a&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Of course, in the fetch map example given, those identifiers look like and can
act like URIs. They can be fetched directly, without translation, if there is
no map. That’s probably a useful feature to retain as it means that you can
find local files when the reference is found in a local file during
development. Using a new scheme won’t have that advantage. A new scheme might
be an option, but it doesn’t seem to be a necessary feature of the design.&lt;/p&gt;
&lt;p&gt;I can also credit Anne with the idea that we model this indirection as a
redirect, something like an HTTP 303 (See Other). The Web is already able to
manage redirection for all sorts of resources, so that would not naturally
disrupt things too much.&lt;/p&gt;
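&lt;p&gt;To illustrate with the fetch map example from earlier, a map entry would act as though a fetch for the stable identifier had been answered with a redirect, only without any network round trip:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;GET /styles.css HTTP/1.1
Host: example.com

HTTP/1.1 303 See Other
Location: /styles.a74fs3.css
&lt;/code&gt;&lt;/pre&gt;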
&lt;p&gt;That is not to say that this is easy, as these redirects will need to conform
to established standards for the Web, with respect to the origin model and
integration with things like &lt;a href=&quot;https://w3c.github.io/webappsec-csp/&quot;&gt;Content Security
Policy&lt;/a&gt;. It will need to be decided how
resource maps affect cross-origin content. And many other details will need to
be thought about carefully. But again, the design seems at least plausible.&lt;/p&gt;
&lt;p&gt;Of note here is that resource maps can be polyfilled with service workers. That
suggests we might just have sites build this logic into service workers. That
could work, and it might be the basis for initial experiments. A static format
is likely superior as it makes the information more readily available.&lt;/p&gt;
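&lt;p&gt;As a rough sketch, with the map contents and file names purely illustrative, such a service worker polyfill might intercept fetches and consult a static map before letting each request proceed:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-js&quot;&gt;// sw.js: a hypothetical polyfill for a static resource map
const RESOURCE_MAP = {
  &#39;/styles.css&#39;: &#39;/styles.a74fs3.css&#39;,
  &#39;/bg.png&#39;: &#39;/bg.8e3ac4.png&#39;
};

self.addEventListener(&#39;fetch&#39;, (event) =&gt; {
  const url = new URL(event.request.url);
  const target = RESOURCE_MAP[url.pathname];
  if (target) {
    // Rewrite the fetch; references elsewhere in content keep
    // using the stable identifier.
    event.respondWith(fetch(new Request(target, event.request)));
  }
});
&lt;/code&gt;&lt;/pre&gt;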
&lt;h4 id=&quot;alternatives-and-bundle-uris&quot;&gt;Alternatives and bundle URIs&lt;/h4&gt;
&lt;p&gt;Providing indirection is just one piece of enabling use of bundled content.
Seamless integration needs two additional pieces.&lt;/p&gt;
&lt;p&gt;The first is an agreed method of identifying the contents of bundles. The &lt;a href=&quot;https://datatracker.ietf.org/group/wpack/about/&quot;&gt;IETF
WPACK working group&lt;/a&gt; have had
several discussions about this. These discussions were inconclusive, in part
because it was difficult to manage conflicting requirements. However, a design
grounded in a map-like construct might loosen some of the constraints that
disqualified some of the past options that were considered.&lt;/p&gt;
&lt;p&gt;In particular, the idea that a bundle might itself have an implicit resource
map was not considered. That could enable the use of simple identifiers for
references between resources in the same bundle without forcing links in
bundled content to be rewritten. And any ugly URI scheme syntax for bundles
might then be abstracted away elegantly.&lt;/p&gt;
&lt;p&gt;The second major piece to getting this working is a map that provides multiple
alternatives. In previous proposals, mappings were strictly one-to-one. A
one-to-many map could offer browsers a choice of resources that the referencing
entity considers to be equivalent&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn22&quot; id=&quot;fnref22&quot;&gt;[22]&lt;/a&gt;&lt;/sup&gt;. The browser is then able to select the option that it prefers. If
an alternative references a bundle the browser already has, that would be good
cause to use that option.&lt;/p&gt;
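&lt;p&gt;Sketching this in a syntax borrowed from the fetch map example, and with entirely hypothetical names, a one-to-many entry might look like this, leaving aside the open question of how the style sheet would be located within the bundle:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;{
  &amp;quot;urls&amp;quot;: {
    &amp;quot;/styles.css&amp;quot;: [
      &amp;quot;https://example.com/site.bundle&amp;quot;,
      &amp;quot;/styles.a74fs3.css&amp;quot;
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A browser that already holds the bundle could use it; otherwise, it could fetch the individual file directly.&lt;/p&gt;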
&lt;p&gt;Presenting multiple options also allows browsers to experiment with different
policies with respect to fetching content when bundles are offered. If bundled
content tends to perform better on initial visits, then browsers might request
bundles then. If bundled content tends to perform poorly when there is some
valid, cached content available already, then the browser might request
individual resources in that case.&lt;/p&gt;
&lt;p&gt;A resource map might be used to enable deployment of new bundling formats, or
even new retrieval methods&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn23&quot; id=&quot;fnref23&quot;&gt;[23]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h4 id=&quot;selective-acquisition&quot;&gt;Selective acquisition&lt;/h4&gt;
&lt;p&gt;One advantage of providing an identifier map like this is that it provides a
browser with some insight into what bundles contain before fetching them&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn24&quot; id=&quot;fnref24&quot;&gt;[24]&lt;/a&gt;&lt;/sup&gt;. Thus, a browser
might be able to make a decision about whether a bundle is worth fetching. If
most of the content is stuff that the browser does not want, then it might
choose to fetch individual resources instead.&lt;/p&gt;
&lt;p&gt;Having a reference map might thereby reduce the pressure to design mechanisms
for partial bundle fetching and caching. Adding some additional metadata, like
hints about resource size, might further allow for better tuning of this logic.&lt;/p&gt;
&lt;p&gt;Reference maps could even provide content classification tools with more
information about resources. Even in a simple one-to-one mapping, like
with an import map, there are two identifiers that might be used to classify
content. Even if one of these is nonsense, the other could be usable.&lt;/p&gt;
&lt;p&gt;While this requires a bit more sophistication on the part of classifiers, it
also provides opportunities for better classification. With alternative
sources, even if the identifier for one source does not reveal any useful
information, an alternative might.&lt;/p&gt;
&lt;p&gt;Now that I’m fully into speculating about possibilities, this opens some
interesting options. The care that was taken to ensure that pages don’t break
when Google Analytics is blocked could be managed differently. Remember that
script:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-js&quot;&gt;window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag(&#39;js&#39;, new Date());
gtag(&#39;config&#39;, &#39;$XXX&#39;);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As you can see, the primary interface is always defined and the
&lt;code&gt;window.dataLayer&lt;/code&gt; object is replaced with a dumb array if the script didn’t
load. With multiple alternatives, the fallback logic here could be encoded in
the map as a &lt;code&gt;data:&lt;/code&gt; URI instead:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-html&quot;&gt;&amp;lt;element-for-mappings type=&amp;quot;text/media-type-for-mappings+json&amp;quot;&amp;gt;
{ &amp;quot;scheme-for-mappings:ga&amp;quot;: [
  &amp;quot;https://www.googletagmanager.com/gtag/js?id=$XXX&amp;quot;,
  &amp;quot;data:text/javascript;charset=utf-8;base64,d2luZG93LmRhdGFMYXllcj1bXTtmdW5jdGlvbiBndGFnKCl7ZGF0YUxheWVyLnB1c2goYXJndW1lbnRzKTt9Z3RhZygnanMnLG5ldyBEYXRlKCkpO2d0YWcoJ2NvbmZpZycsJyRYWFgnKTs=&amp;quot;
]}&amp;lt;/element-for-mappings&amp;gt;
&amp;lt;script async src=&amp;quot;scheme-for-mappings:ga&amp;quot;&amp;gt;&amp;lt;/script&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this case, a content blocker that decides to block the HTTPS fetch could
allow the &lt;code&gt;data:&lt;/code&gt; URI and thereby preserve compatibility. Nothing really
changed, except that the fallback script is async too. Of course, this is an
unlikely outcome as this is not even remotely backward-compatible, but it does
give some hints about some of the possibilities.&lt;/p&gt;
&lt;h2 id=&quot;next-steps&quot;&gt;Next steps&lt;/h2&gt;
&lt;p&gt;So that was many more words than I expected to write. The size and complexity
of this problem continues to be impressive. No doubt this conversation will
continue for some time before we reach some sort of conclusion.&lt;/p&gt;
&lt;p&gt;For me, the realization that it is possible to provide finer control over how
outgoing references are managed was a big deal. We don’t have to accept a
design that allows one resource speaking for others, we just have to allow for
control over how references are made. That’s a fairly substantial improvement
over most existing proposals and the basis upon which something good might be
built.&lt;/p&gt;
&lt;p&gt;I still have serious reservations about the caching and performance trade-offs
involved with bundling. Attempting to solve this problem with selective
fetching of bundle contents seems like far too much complexity. Not only does
it require addressing the known-hard problem of cache invalidation, it also
requires that we find solutions to problems that have defied solutions on
numerous occasions in the past.&lt;/p&gt;
&lt;p&gt;That said, I’ve concluded that giving servers the choice in how content is
assembled does not result in bad outcomes for others. Unless we include signed
exchanges&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn25&quot; id=&quot;fnref25&quot;&gt;[25]&lt;/a&gt;&lt;/sup&gt;,
we are not talking about negative externalities.&lt;/p&gt;
&lt;p&gt;If we accept that selective fetching is a difficult problem, supporting bundles
might not be all-powerful from the outset. It might only give servers and
developers more options. What we learn from trying that out might give us the
information that allows us to find good solutions later. Resource maps mean
that we can always fall back to fetching resources individually. Resource maps
could even be the foundation upon which we build new experiments with
alternative resource fetching models.&lt;/p&gt;
&lt;p&gt;All that said, the usability advantages provided by bundles seem to be
sufficient justification for enabling their support. That applies even if there
is uncertainty about performance. That applies even if we don’t initially solve
those performance problems. One enormous problem at a time, please.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Have I ever mentioned that I loathe CamelCase
names?  Thanks 1990s. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Yoav Weiss makes this claim based on his
experience with Chromium. I respect his experience here, but don’t know what
was done to reach this conclusion. I can see there being a lot more
investigation and discussion about this point. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This is due to the
way congestion control algorithms operate. These start out slow in case the
network is constrained, but gradually speed up. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Eric Rescorla suggested a possible reason that server push
regresses performance: pushing only really helps if the transmission channel
from server to client has spare capacity. Because HTTP/2 clients can make lots
of requests cheaply, it’s entirely possible that the channel is – or will soon
be – already full. If pushed resources are less important than resources the
client has already requested, even if the client eventually needs those pushed
resources, the capacity spent on pushing will delay more important responses. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Tantek Çelik pointed out that you can use a service worker to load old
content at the same time as checking asynchronously for updates. That’s even
better. The fact is, service workers can do just about anything discussed here.
That you need to write and maintain a service worker might be enough to
discourage all but the bravest of us though. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;You might reasonably suggest that this
sort of thinking tends toward suboptimal local minima. That is a fair
criticism, but my rejoinder there might be that conditioning success on a
design that reduces to a previously unsolved problem is not really a good
strategy either. Besides, accepting suboptimal local minima is part of how we
make forward progress without endless second-guessing. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I
seem to recall this being raised before Pete Snyder opened this issue, perhaps
at the &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc8752&quot;&gt;ESCAPE workshop&lt;/a&gt;, but I
can’t put a name to it. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;In particular, the split between style (CSS) and semantics
(HTML). &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;At this point, a footnote seems necessary. Yes, a
browser is an
&lt;a href=&quot;https://martinthomson.github.io/tmi/draft-thomson-tmi.html&quot;&gt;intermediary&lt;/a&gt;. All
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-hildebrand-middlebox-erosion-01&quot;&gt;previous
complaints&lt;/a&gt;
apply. It would be dishonest to deny the possibility that a browser might abuse
its position of privilege. But that is the topic for a much longer posting. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This more than makes up
for the overheads of the ad blocker in most cases, with page loads being
considerably faster on ad-heavy pages. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;If it isn’t clear, I’m firmly of the opinion that Google’s
&lt;a href=&quot;https://developers.google.com/amp/cache/overview&quot;&gt;AMP Cache&lt;/a&gt; is not just a bad
idea, but an abuse of Google’s market dominance. It also happens to be a gross
waste of resources in a lot of cases, as Google pushes content that can be
either already present or content for links that won’t ever be followed. Of
course, if they guess right and you follow a link, navigation is fast.
Whoosh. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn12&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;With increasing amounts of scripts, interfaces might
also be expressed at the JavaScript module or function level. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref12&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn13&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Yep. Pun totally intended. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref13&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn14&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Worth noting here is the care Google takes to
structure the script to avoid breaking pages when their JavaScript load is
blocked by an ad blocker. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref14&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn15&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I
wonder how many people are still fetching
&lt;code&gt;&lt;a href=&quot;https://ssl.google-analytics.com/ga.js&quot;&gt;ga.js&lt;/a&gt;&lt;/code&gt; from Google. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref15&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn16&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This isn’t a great example, because while it
prevents the code from being identified, it’s probably not a very good
solution. For starters, the advertiser no longer sees requests that come
directly from browsers, which it might use to track people. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref16&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn17&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Note that, at least for ad blocking, the biggest gains come from not
&lt;em&gt;executing&lt;/em&gt; unwanted content, as executing ad content almost always leads to a
chain of additional fetches. Saving the CPU time is the third major component
of the savings. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref17&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn18&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Yes, that
effectively means bundling them with content. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref18&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn19&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Curiously, the
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-variants&quot;&gt;Variants&lt;/a&gt;
design might not be a good fit here, as it provides enumeration of
alternatives, which is tricky for the same reason that caching in ignorance of
bundling is. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref19&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn20&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;There is lots to quibble about in the exact spelling in this
example, but I just copied from the proposal directly. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref20&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn21&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;It’s tempting here to suggest &lt;code&gt;urn:&lt;/code&gt;, but that might
cause some heads to explode. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref21&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn22&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The thought occurs that this is something
that could be exploited to allow for safe patching of dependencies when
combined with semantic versioning. For instance, I will accept any version
&lt;code&gt;X.Y.?&lt;/code&gt; of this file greater than &lt;code&gt;X.Y.Z&lt;/code&gt;. We can leave that idea for another
day though. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref22&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn23&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Using &lt;a href=&quot;https://ipfs.io/&quot;&gt;IPFS&lt;/a&gt; seems far more
plausible if you allow it as one option of many with the option for graceful
fallback. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref23&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn24&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;To
what extent providing information ahead of time can be used to improve
performance is something that I have often wondered about; it seems like it has
some interesting trade-offs that might be worth studying. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref24&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn25&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;We’ve already established that &lt;a href=&quot;https://mozilla.github.io/standards-positions/#http-origin-signed-responses&quot;&gt;signed exchanges are not good for
the
Web&lt;/a&gt;. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref25&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>Standardizing Principles</title>
    <link href="https://lowentropy.net/posts/standard-principles/"/>
    <updated>2021-01-05T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/standard-principles/</id>
    <content type="html">&lt;p&gt;There is a perennial question in standards development about the value of the different artefacts that the process kicks out.&lt;/p&gt;
&lt;p&gt;One subject that remains current is the relative value of specifications against things like compliance testing frameworks.  Reasonable people tend to place different weight on tests, with a wide range of attitudes.  In the past, more people were willing to reject attempts to invest in any shared test or compliance infrastructure.&lt;/p&gt;
&lt;p&gt;In recent years however, it has become very clear that a common test infrastructure is critical to developing a high quality standard.  Developing tests in conjunction with the standardization effort has improved the quality of specifications and implementations a great deal.&lt;/p&gt;
&lt;p&gt;Recently, I encountered an example where a standards group deliberately chose not to document behaviour, relying exclusively on the common test framework.  Understanding what is lost when that happens is worth examining.&lt;/p&gt;
&lt;h2 id=&quot;background&quot;&gt;Background&lt;/h2&gt;
&lt;p&gt;My experience with compliance testing in standards development is patchy.  It might help to describe how these have worked out.&lt;/p&gt;
&lt;h3 id=&quot;standardize-first&quot;&gt;Standardize First&lt;/h3&gt;
&lt;p&gt;Some of the early projects I was involved in relied on testing being entirely privately driven.  This can lead to each team relying almost exclusively on tests they develop internally.  Occasional pairwise interoperability testing occurs, but it is ad hoc and unreliable.&lt;/p&gt;
&lt;p&gt;This loose arrangement does tend to result in specifications being published sooner.  The cost is in less scrutiny, especially when it comes to details, so the quality of the output is not as good as it could be.&lt;/p&gt;
&lt;p&gt;This doesn’t mean that there is no compliance testing, but it requires effort.  That effort can pay off, as I have seen with &lt;a href=&quot;https://crossbar.io/autobahn/&quot;&gt;WebSockets&lt;/a&gt;, &lt;a href=&quot;https://en.wikipedia.org/wiki/FIPS_140&quot;&gt;FIPS-140&lt;/a&gt;, &lt;a href=&quot;https://cache-tests.fyi/&quot;&gt;HTTP Caching&lt;/a&gt;, and others.&lt;/p&gt;
&lt;h3 id=&quot;implement-in-parallel&quot;&gt;Implement in Parallel&lt;/h3&gt;
&lt;p&gt;My experience with &lt;a href=&quot;https://tools.ietf.org/html/rfc7540&quot;&gt;HTTP/2&lt;/a&gt; was not a whole lot different to those early projects.  The major improvement there was the level of active engagement from implementers in developing the specification.&lt;/p&gt;
&lt;p&gt;This process did not involve active development of a compliance testing framework, but there were regular interoperability tests.  I still remember Jeff Pinner deploying &lt;a href=&quot;https://tools.ietf.org/html/draft-ietf-httpbis-http2-04&quot;&gt;draft -04&lt;/a&gt; to production on &lt;a href=&quot;https://twitter.com/&quot;&gt;twitter.com&lt;/a&gt; during a meeting.  Not everyone was so fearless, but live deployment was something we saw routinely in the 13 subsequent drafts it took to finalize the work.&lt;/p&gt;
&lt;p&gt;Good feedback from implementations was key to the success of HTTP/2, which now drives &lt;a href=&quot;https://mzl.la/35bToXH&quot;&gt;well over half of the HTTP requests in Firefox&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The excellent &lt;a href=&quot;https://github.com/summerwind/h2spec&quot;&gt;h2spec&lt;/a&gt; came out a little after the release of the specification.  It has since become a valuable compliance testing tool for implementers.&lt;/p&gt;
&lt;h3 id=&quot;test-in-parallel&quot;&gt;Test in Parallel&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://tools.ietf.org/html/rfc8446&quot;&gt;TLS 1.3&lt;/a&gt; followed a similar trajectory to HTTP/2, with a few interesting twists.  Part of the testing that occurred during the development of the protocol was formal verification.  For example, &lt;a href=&quot;https://tls13tamarin.github.io/TLS13Tamarin/docs/tls13tamarin.pdf&quot;&gt;a Tamarin model of TLS 1.3&lt;/a&gt; was developed alongside the protocol, which both informed the design and provided validation of the design.  Some implementations automated compliance testing based on &lt;a href=&quot;https://boringssl.googlesource.com/boringssl/+/refs/heads/master/ssl/test/runner/&quot;&gt;a tool developed for BoringSSL&lt;/a&gt;, which turned out to be very useful.&lt;/p&gt;
&lt;p&gt;With &lt;a href=&quot;https://quicwg.org/base-drafts/draft-ietf-quic-transport.html&quot;&gt;QUIC&lt;/a&gt;, Marten Seemann and Jana Iyengar developed &lt;a href=&quot;https://github.com/marten-seemann/quic-interop-runner/&quot;&gt;a framework&lt;/a&gt; that automates testing between QUIC implementations.  This runs regularly and produces &lt;a href=&quot;https://interop.seemann.io/&quot;&gt;a detailed report&lt;/a&gt; showing how each implementation stands up under a range of conditions, some of them quite adversarial.  This has had a significant positive effect on the quality of both implementations and specifications&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/standard-principles/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;In all cases, tests have been so valuable that I can see no way of going back to a world without them.&lt;/p&gt;
&lt;h3 id=&quot;test-first&quot;&gt;Test First&lt;/h3&gt;
&lt;p&gt;Of course, no discussion of testing would be complete without mentioning the excellent &lt;a href=&quot;https://web-platform-tests.org/&quot;&gt;Web Platform Tests&lt;/a&gt;, which are now critical parts of the process adopted by the &lt;a href=&quot;https://whatwg.org/working-mode#changes&quot;&gt;WHATWG&lt;/a&gt; and &lt;a href=&quot;https://www.w3.org/2019/05/webapps-charter.html&quot;&gt;some W3C groups&lt;/a&gt;.  Web Platform Tests are considered a prerequisite for normative specification changes under these processes.&lt;/p&gt;
&lt;p&gt;Akin to &lt;a href=&quot;https://en.wikipedia.org/wiki/Test-driven_development&quot;&gt;test driven development&lt;/a&gt;, this ensures that new features and changes are not just testable, but tested, before anything is documented.  In practice the work continues in parallel, with tight feedback between development, specification, and testing.  Shorter feedback cycles mean that work can be completed faster and with higher quality.&lt;/p&gt;
&lt;h2 id=&quot;the-role-of-specifications&quot;&gt;The Role of Specifications&lt;/h2&gt;
&lt;p&gt;An obvious question that might be asked when it comes to this process, particularly where there are firm requirements for tests, is what value the specification provides.  Given sufficiently thorough testing, it should be possible to construct an interoperable implementation based solely on those tests.&lt;/p&gt;
&lt;p&gt;To go further, when &lt;a href=&quot;https://w3c.github.io/ServiceWorker/#cache-storage-match&quot;&gt;specifications consist of mostly code-like constructs&lt;/a&gt; and real implementations are open source anyway, the value of a specification seems greatly diminished.  As empirical observation of how things actually work is of more value than how they work in theory, it is reasonable to ask what value the specification provides.&lt;/p&gt;
&lt;p&gt;As my own recent experience with the &lt;a href=&quot;https://en.wikipedia.org/wiki/CUBIC_TCP&quot;&gt;Cubic congestion control algorithm&lt;/a&gt; taught me, what is implemented and deployed is what matters.  &lt;a href=&quot;https://tools.ietf.org/html/rfc8312&quot;&gt;The RFC that purports to document Cubic&lt;/a&gt; is not really implementable and barely resembles &lt;a href=&quot;https://github.com/torvalds/linux/blob/fcadab740480e0e0e9fa9bd272acd409884d431a/net/ipv4/tcp_cubic.c&quot;&gt;what real implementations do&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;So if testing is a central part of the development of new standards and people rely increasingly on tests or observing the behaviour of other implementations, it is reasonable to question what value specifications provide.&lt;/p&gt;
&lt;h3 id=&quot;a-specification-can-teach&quot;&gt;A Specification Can Teach&lt;/h3&gt;
&lt;p&gt;Specification documents often come with a bunch of normative language.  Some of the most critical text defines what it means to be conformant, describing what is permitted and what is forbidden in precise terms.&lt;/p&gt;
&lt;p&gt;Strictly normative text is certainly at risk of displacement by good testing.  But there is often a bunch of non-normative filler in specifications.  Though that text might be purely informative, it is often of significant value to people who are attempting to understand the specification in detail:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Informative text can motivate the existence of the specification.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Filler can provide insights into why things are.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Notes can point to outcomes that might not be obvious.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For specifications that are developed using &lt;a href=&quot;https://open-stand.org/about-us/principles/&quot;&gt;an open process&lt;/a&gt;, much of this information is not hidden, but it can be difficult to find&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/standard-principles/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;.  Presenting timely, relevant information to readers is useful in putting things into context.&lt;/p&gt;
&lt;h3 id=&quot;a-specification-can-capture-other-forms-of-agreement&quot;&gt;A Specification Can Capture Other Forms Of Agreement&lt;/h3&gt;
&lt;p&gt;One of the hardest lessons out of recent standards work has been the realization that many decisions are made with only superficial justification.  Developing standards based on shared principles is much harder than agreeing on what happens in certain conditions, or which bit goes where.&lt;/p&gt;
&lt;p&gt;Though it might be harder, reaching agreement&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/standard-principles/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt; on principles is far more enduring and valuable.  A specification can document that agreement.&lt;/p&gt;
&lt;p&gt;Reaching agreement or consensus on a principle can be hard for a variety of reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Dealing with abstractions can be challenging because people can develop different abstract models based on their own perspective and biases.  Subtle differences can mean a lot of talking past each other.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Abstractions can also become too far removed from reality to be useful.  This might serve you well when filing a patent application, but ultimately we depend on principles being applicable to the current work&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/standard-principles/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Agreement on principles can be difficult because it forces people to fully address differences of opinion.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Without first addressing disagreements in principle, it is possible that concrete decisions could be consistent with different perspectives.  This might not have any immediate effect, but could produce inconsistencies.  Some inconsistency can result in real problems, especially if it becomes necessary to rely more extensively on a principle that was in contention&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/standard-principles/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;However hard agreement might be to achieve, a principle that is agreed can inform multiple decisions.  Documenting a principle that has achieved agreement can therefore be more efficient over time.  Documentation can also help avoid application of inconsistent or conflicting principles over time.&lt;/p&gt;
&lt;p&gt;Documenting principles does not have a direct normative effect.  But a specification offers an opportunity to document more than just conformance requirements; it can capture other types of agreement.&lt;/p&gt;
&lt;h3 id=&quot;conformance-test-suites-can-overreach&quot;&gt;Conformance Test Suites Can Overreach&lt;/h3&gt;
&lt;p&gt;A problem that can occur with conformance testing is that the tests can disagree with specifications.  If implementations depend more on the test than the specification, this can make the conformance test the true source of the definition of what it means to interoperate.&lt;/p&gt;
&lt;p&gt;This is not inherently bad.  It can be that the tests capture something that is inherently better, because it reflects what people need, because it is easier to implement, or just because that is what interoperates.&lt;/p&gt;
&lt;p&gt;Of course, disagreement between two sources that claim authority does implementations a disservice.  A new implementation now has to know which is “correct”.  Ensuring that deployments, tests, and specifications align is critical to ensuring the viability of new implementations.&lt;/p&gt;
&lt;p&gt;The true risk with relying on tests lies in the process by which conformance tests are maintained.  Specification development processes are burdened with rules that govern how agreement is reached.  Those rules exist for good reason.&lt;/p&gt;
&lt;p&gt;Change control processes for conformance testing projects might not provide adequate protection for anti-trust or intellectual property.  They also might lack opportunities for affected stakeholders to engage.  This doesn’t have to be the case, but the governance structures underpinning most conformance suites are usually less robust than those of standards&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/standard-principles/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h2 id=&quot;conclusions&quot;&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;The exact nature of how specifications are used to guide the development of interoperable standards is something of a fluid situation.  Here I’ve laid out a case for the value of specifications: for the non-normative language they provide, for their ability to capture agreement on more than just normative functions, and for the governance structures that they use.  There are probably other reasons too, and likely counter-arguments, both of which I would be delighted to hear about.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I should also point at &lt;a href=&quot;https://quic-tracker.info.ucl.ac.be/grid&quot;&gt;QUIC Tracker&lt;/a&gt;, and &lt;a href=&quot;http://d.hatena.ne.jp/kazu-yamamoto/&quot;&gt;Kazu Yamamoto&lt;/a&gt; has started work on &lt;a href=&quot;https://github.com/kazu-yamamoto/h3spec&quot;&gt;h3spec&lt;/a&gt;; both have made significant contributions too. &lt;a href=&quot;https://lowentropy.net/posts/standard-principles/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;For example, the development of even a relatively small specification like QUIC involved more than 4000 issues and pull requests, more than 8000 email messages, not to mention all the chat messages that are not in public archives. &lt;a href=&quot;https://lowentropy.net/posts/standard-principles/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;…or consensus if that is how you spell it. &lt;a href=&quot;https://lowentropy.net/posts/standard-principles/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This is perhaps a criticism that might be levelled at &lt;a href=&quot;https://w3ctag.github.io/design-principles/#priority-of-constituencies&quot;&gt;the priority of constituencies&lt;/a&gt; or text like that in &lt;a href=&quot;https://tools.ietf.org/html/rfc8890&quot;&gt;RFC 8890&lt;/a&gt;.  However, these might be more correctly viewed as meta-principles, or ideals that guide the development of more specific and actionable principles. &lt;a href=&quot;https://lowentropy.net/posts/standard-principles/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;An example of this might be DNS, where the need for agreement on principles was neglected.  As such, the global community has no documented principles that might guide decisions on issues such as having a single global namespace or whether network operators are entitled to be involved in name resolution.  Now that encrypted DNS is being rolled out, reflective of a principle that values individual privacy, it has become obvious that people with differing views but no shared principles have been coexisting. &lt;a href=&quot;https://lowentropy.net/posts/standard-principles/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Not that these too lack opportunities for improvement, but they are the best we have. &lt;a href=&quot;https://lowentropy.net/posts/standard-principles/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>RFCs in HTML</title>
    <link href="https://lowentropy.net/posts/line-length/"/>
    <updated>2020-12-18T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/line-length/</id>
    <content type="html">&lt;p&gt;I spend a shocking amount of my time staring at IETF documents, both Internet-Drafts and RFCs.  I have spend quite a bit of time looking at GitHub README files and W3C specifications.&lt;/p&gt;
&lt;p&gt;For reading prose, the format I routinely find to be the most accessible is the text version.  This is definitely not based on the quality of the writing; any of these formats can produce unreadable documents.  What I refer to here is not the substance, but the form.  That is, how the text is laid out on my screen&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;There is clearly a degree of familiarization and bias involved in this.  A little while ago, I worked out that there is just one thing that elevates that clunky text format above the others: line length.&lt;/p&gt;
&lt;h2 id=&quot;relearning-old-lessons&quot;&gt;Relearning Old Lessons&lt;/h2&gt;
&lt;p&gt;This is hardly a new insight.  A brief web search will return &lt;a href=&quot;https://practicaltypography.com/line-length.html&quot;&gt;numerous&lt;/a&gt; &lt;a href=&quot;https://smad.jmu.edu/shen/webtype/linelength.html&quot;&gt;articles&lt;/a&gt; on the &lt;a href=&quot;https://www.fonts.com/content/learning/fontology/level-2/text-typography/length-column-width&quot;&gt;subject&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;.  All of them say the same thing: shorter lines are more readable.&lt;/p&gt;
&lt;p&gt;I was unable to find a single print newspaper that didn’t take this advice to heart, if not to extremes&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;.  Some magazines have ignored this, but those too turned out to be ill suited to reading prose and more geared toward looking at the pictures.&lt;/p&gt;
&lt;p&gt;Recommendations from most sources put a hard stop somewhere around 80 characters.  Some go a little lower or higher, but the general advice is pretty consistent.  Of course, variable-width fonts make this imprecise, but  it tends to average out.&lt;/p&gt;
&lt;h2 id=&quot;why-text-is-so-good&quot;&gt;Why Text Is So Good&lt;/h2&gt;
&lt;p&gt;I suppose that it is no accident that this corresponds to the width of the screen on a &lt;a href=&quot;https://en.wikipedia.org/wiki/VT52&quot;&gt;DEC VT52&lt;/a&gt;.  The text format of old RFCs&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt; might have been designed to fit on these small screens, or to make printing easier, but the net effect is that you can get just &lt;a href=&quot;https://tools.ietf.org/html/rfc7994#section-4.3&quot;&gt;72 characters on a line&lt;/a&gt;.  The standard tools spend three of those on a left margin for text, so that means just 69 fixed-width characters per line.&lt;/p&gt;
&lt;p&gt;That turns out to be very readable.&lt;/p&gt;
&lt;h2 id=&quot;why-html-is-so-bad&quot;&gt;Why HTML Is So Bad&lt;/h2&gt;
&lt;p&gt;The “official” HTML rendering of RFCs on &lt;a href=&quot;https://www.rfc-editor.org/&quot;&gt;rfc-editor.org&lt;/a&gt; is a little wider than this.  If I measure using whole alphabets&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;, this results in a width of 98 characters.  That’s more than the maximum in any recommendation I found.&lt;/p&gt;
&lt;p&gt;Performing a similar test on the W3C specification style&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt; used for W3C publications, I got 102 characters.  The WHATWG &lt;a href=&quot;https://fetch.spec.whatwg.org/&quot;&gt;Fetch Standard&lt;/a&gt; had room for a massive 163 characters!&lt;/p&gt;
&lt;p&gt;All of these wrap earlier than this on a smaller screen, but these are relatively small font sizes, so many screens will be wide enough to reach these values.  Many people have a screen that has the 1300 horizontal pixels&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt; needed to get to 100 characters in a W3C specification.  The official IETF HTML crams its 98 characters into just 724 pixels.&lt;/p&gt;
&lt;p&gt;High text density comes from the font size and line height being quite small in official renderings of IETF documents.  This compounds the problem as it makes tracking from one line to the next when reading more difficult.  I consider the 14px/22.4px of the official IETF rendering to be positively tiny.  I use a 9px (monospace) font in terminals, but I wouldn’t inflict that choice on others.  That W3C and WHATWG settled on 16px/24px is far more humane, though with the selected font I still find this a little on the small side.&lt;/p&gt;
&lt;p&gt;What is interesting here is that the text rendering on &lt;a href=&quot;https://tools.ietf.org/html/&quot;&gt;tools.ietf.org&lt;/a&gt; uses a value of &lt;code&gt;13.33px&lt;/code&gt;.  This seems smaller, but - at least subjectively - it is no harder to read than the &lt;code&gt;16px&lt;/code&gt; W3C/WHATWG specifications.  Also, the default font configuration in Firefox is &lt;code&gt;16px&lt;/code&gt; for most fonts and &lt;code&gt;13px&lt;/code&gt; for monospace, suggesting that smaller font sizes are better tolerated for monospace fonts.  That’s especially convenient here as it happens.&lt;/p&gt;
&lt;h2 id=&quot;making-html-readable&quot;&gt;Making HTML Readable&lt;/h2&gt;
&lt;p&gt;The fix is pretty simple: make the &lt;code&gt;max-width&lt;/code&gt; small enough that lines don’t run so long.  I set a value of &lt;code&gt;600px&lt;/code&gt;.  Combine this with a font size of &lt;code&gt;16px&lt;/code&gt; and the result is a line length of 72 characters&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
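&lt;p&gt;As a rough sketch, the core of that change is only a couple of declarations.  The bare &lt;code&gt;body&lt;/code&gt; selector and the centring margin here are simplifications; the real stylesheet is more targeted:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-css&quot;&gt;body {
  max-width: 600px;   /* caps lines at about 72 characters */
  margin: 0 auto;     /* centre the text column (illustrative) */
  font-size: 16px;
  line-height: 24px;  /* the 16px/24px pairing W3C and WHATWG use */
}
&lt;/code&gt;&lt;/pre&gt;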
&lt;p&gt;The &lt;a href=&quot;https://quicwg.org/base-drafts/draft-ietf-quic-transport.html&quot;&gt;editor’s copy of the QUIC spec&lt;/a&gt; is a fairly thorough example of this.&lt;/p&gt;
&lt;h3 id=&quot;fonts&quot;&gt;Fonts&lt;/h3&gt;
&lt;p&gt;I chose to change the font to something that is a little wider at the same time. Using &lt;a href=&quot;https://docs.microsoft.com/en-us/typography/font-list/arial&quot;&gt;Arial&lt;/a&gt; - the default sans-serif font on Windows and the font chosen by the W3C and WHATWG - adds 4-5 characters to line length and is noticeably smaller on screen.  &lt;a href=&quot;https://docs.microsoft.com/en-us/typography/font-list/times-new-roman&quot;&gt;Times New Roman&lt;/a&gt; - the default serif font - adds 9-10 characters and is smaller again.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://fonts.google.com/specimen/Lora&quot;&gt;Lora&lt;/a&gt;, which has a light serif, was my choice for text.  I know little enough about fonts that this was ultimately subjective.  &lt;a href=&quot;https://fonts.google.com/specimen/Noto+Sans&quot;&gt;Noto Sans&lt;/a&gt;, the font used in IETF official renderings, is comparable here, but I find it a little boring.&lt;/p&gt;
&lt;p&gt;Some people don’t like the visual noise of a serif font for reading on a screen.  Modern displays with high pixel density are less vulnerable to that and this is a light font with enough serif noise to add a little flair without adversely affecting readability.  Lora is very readable at &lt;code&gt;16px&lt;/code&gt;, where many other serif fonts require a larger size to be similarly clear.&lt;/p&gt;
&lt;h3 id=&quot;headings&quot;&gt;Headings&lt;/h3&gt;
&lt;p&gt;Fitting headings on a single line given the shorter line length turned out to be fiddly.  I didn’t want headings to wrap, or to use too small a font.  And IETF people have a deep and abiding love for &lt;a href=&quot;https://quicwg.org/base-drafts/draft-ietf-quic-invariants.html#section-1&quot;&gt;very long headings&lt;/a&gt;.  For this, a condensed font was ideal.&lt;/p&gt;
&lt;p&gt;A semi-condensed font might have been ideal, but there are fewer of those and it was a little hard to find one that didn’t look too jarring next to the main text&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;.  Again Google Fonts was a great resource and &lt;a href=&quot;https://fonts.google.com/specimen/Cabin+Condensed&quot;&gt;Cabin Condensed&lt;/a&gt; is OK.&lt;/p&gt;
&lt;h3 id=&quot;ascii-art&quot;&gt;ASCII Art&lt;/h3&gt;
&lt;p&gt;In setting this size, it is then necessary to consider the effect on diagrams.  IETF documents are still stuck in the dark ages when it comes to diagrams and ASCII Art still dominates there.  As the text format accepts 72 column text, so too must the figures in the HTML output.&lt;/p&gt;
&lt;p&gt;This turns out to be a bit of a compromise.  Styling of figures to include an offset from text, a border, and background shading eats up horizontal space.  In the end, I managed to reduce the text size to &lt;code&gt;13.5px&lt;/code&gt; and set &lt;code&gt;letter-spacing: -0.2px&lt;/code&gt; to slightly compress the text further and fit 72 columns in&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
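&lt;p&gt;In CSS terms, the compromise amounts to something like the following.  Again, this is a simplified sketch; the bare &lt;code&gt;pre&lt;/code&gt; selector stands in for the real figure rules:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-css&quot;&gt;pre {
  font-size: 13.5px;
  letter-spacing: -0.2px;  /* squeeze 72 monospace columns into the text width */
}
&lt;/code&gt;&lt;/pre&gt;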
&lt;h3 id=&quot;minimizing-distractions&quot;&gt;Minimizing Distractions&lt;/h3&gt;
&lt;p&gt;The styles used here are based on those from an earlier version of the official renderings.  Once the major pieces were in place, the details needed to be aligned to fit.  After fixing major items like margins and line heights to match font and size choices, a bunch of work was needed to make documents look consistent.  The first task was removing a bunch of design elements that I found distracting.&lt;/p&gt;
&lt;p&gt;The HTML rendering includes a &lt;a href=&quot;https://en.wikipedia.org/wiki/Pilcrow&quot;&gt;pilcrow&lt;/a&gt; at the end of each paragraph.  This enables linking to specific paragraphs, which is a great feature.&lt;/p&gt;
&lt;p&gt;The official styling only renders the pilcrow when the paragraph is hovered&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;, but it renders very strongly when shown and so can be distracting.  That needed softening.&lt;/p&gt;
&lt;p&gt;The default blue (&lt;code&gt;#00f&lt;/code&gt;) for links is strongly saturated, which is too assertive.  Reducing the saturation makes links blend into text better.&lt;/p&gt;
&lt;p&gt;Changing background colours on hover for titles is a nice way of indicating the presence of links, but that too was very strong.  Making that lighter made moving the mouse less of a light show.&lt;/p&gt;
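As a rough sketch, the softening described here comes down to rules of this shape; the specific colour values are illustrative, not the ones actually used in the stylesheet.

```css
/* Illustrative values only, not the stylesheet's actual colours. */
a { color: #3355aa; }  /* desaturated from #00f so links blend with text */
h2 a:hover { background-color: #eef0f4; }  /* gentler hover highlight */
```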
&lt;h3 id=&quot;cleanup&quot;&gt;Cleanup&lt;/h3&gt;
&lt;p&gt;Then there was a bunch of maintenance and tidying:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Negative margins on headings, presumably to tweak the position of headings when following internal links to section headings, went&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn12&quot; id=&quot;fnref12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
&lt;li&gt;Rules that were overwritten later in the file were consolidated.&lt;/li&gt;
&lt;li&gt;The table of contents was moved closer to content.&lt;/li&gt;
&lt;li&gt;Horizontal lines were given the flick.&lt;/li&gt;
&lt;li&gt;Table and figure captions were tightened up.&lt;/li&gt;
&lt;li&gt;Authors’ addresses were put into multiple columns.&lt;/li&gt;
&lt;li&gt;The References section got a big cleanup too.&lt;/li&gt;
&lt;li&gt;I use CSS variables (&lt;code&gt;var(--foo)&lt;/code&gt;), which is a great feature.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Finally, a bunch of work was put into making this look decent on a small screen.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;What I’ve learned from this is a newfound respect for the work designers do.  My amateur fumbling here has helped me appreciate just how much detail work goes into making something like this look good.&lt;/p&gt;
&lt;p&gt;Immense thanks are owed to Anitra Nottingham, who graciously provided feedback on earlier versions of this work.  Those versions were obviously much worse.  I also owe thanks to Mark Nottingham, James Gruessing, Adam Roach, Jeffrey Yasskin and those I’ve forgotten who each took the time to provide feedback and expertise.&lt;/p&gt;
&lt;p&gt;None of this is truly professional.  I’m still finding things that I don’t like.  I’m still not happy with various pieces of spacing, for instance.&lt;/p&gt;
&lt;p&gt;Even learning this much design is more of a curse than I’d like.  I might not ace &lt;a href=&quot;https://cantunsee.space/&quot;&gt;cantunsee&lt;/a&gt;, but I know enough to notice things like alignment issues and bad kerning&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn13&quot; id=&quot;fnref13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt; now.  I’m not sure that that has enriched my life all that much.&lt;/p&gt;
&lt;p&gt;But the main thing remains: I can read these documents now.  Cutting the line length was what did that.  I now prefer HTML if it uses this stylesheet&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn14&quot; id=&quot;fnref14&quot;&gt;[14]&lt;/a&gt;&lt;/sup&gt;.  The rest was just gravy.&lt;/p&gt;
&lt;p&gt;The stylesheet can be found &lt;a href=&quot;https://github.com/martinthomson/i-d-template/blob/main/v3.css&quot;&gt;here&lt;/a&gt;.  Contributions are welcome.  Anyone using &lt;a href=&quot;https://github.com/martinthomson/i-d-template&quot;&gt;my GitHub template&lt;/a&gt; for generating Internet-Drafts already benefits from this work.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Reading from paper is not something I can countenance; the cost in paper of my specification reading alone would be devastating and I like trees too much to do that to them. &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;And those are just links from my browsing history &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;So many hyphens… &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Officially, they are &lt;a href=&quot;https://tools.ietf.org/html/rfc7991&quot;&gt;all XML now&lt;/a&gt; and only rendered to text or HTML. &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The way to do this is to find a paragraph and open it in browser developer tools.  Add a style rule of &lt;code&gt;overflow: hidden&lt;/code&gt; then modify the content to be “abcdef…” and repeat until the text cuts off.  This follows the advice in &lt;a href=&quot;https://practicaltypography.com/line-length.html&quot;&gt;Butterick’s Practical Typography&lt;/a&gt;. &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I tested the &lt;a href=&quot;https://w3c.github.io/push-api/&quot;&gt;Push API&lt;/a&gt;, which uses &lt;a href=&quot;https://github.com/w3c/respec&quot;&gt;ReSpec&lt;/a&gt;, but specifications using &lt;a href=&quot;https://tabatkins.github.io/bikeshed/&quot;&gt;Bikeshed&lt;/a&gt; produced exactly the same result. &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Using the browser measure for pixel, which doesn’t correspond to dots on screen for devices with high pixel density. &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I wasn’t going for this deliberately, but that is how it worked out. &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;In particular, I have this thing about the shape of ‘e’ and ‘a’.  They can’t be dramatically different. &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The need for packing this tightly came when I discovered that pilcrows for figures were possible, but the official rendering put them on a blank line.  That broke the document flow badly and I wanted space for those on the line as well.  See &lt;a href=&quot;https://quicwg.org/base-drafts/draft-ietf-quic-recovery.html#section-b.5-3&quot;&gt;this example&lt;/a&gt; for how that turned out. &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Is this an accessibility problem?  I don’t know. &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn12&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I’ve learned that CSS, like many other things, lends itself easily to small hacks.  The net effect of introducing a hack is invariably that you have to add a whole bunch more corrective hacks in a death spiral.  Avoid hacks. &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref12&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn13&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;My 9-year-old son finds &lt;a href=&quot;https://www.brookes.net.au/&quot;&gt;signs for this real estate company&lt;/a&gt;, which seem deliberately bad, amusing.  It’s clearly infectious. &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref13&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn14&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Mark Nottingham has &lt;a href=&quot;https://mnot.github.io/I-D/http-grease/&quot;&gt;a different stylesheet&lt;/a&gt; that is also acceptable.  He also uses a very nice font. (Edit 2024-12-11: Mark is now using my stylesheet. Shame about that &lt;a href=&quot;https://typographyforlawyers.com/mb-fonts.html&quot;&gt;font&lt;/a&gt;.) &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref14&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>Next Level Version Negotiation</title>
    <link href="https://lowentropy.net/posts/vn/"/>
    <updated>2020-12-11T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/vn/</id>
    <content type="html">&lt;p&gt;The &lt;a href=&quot;https://www.iab.org/activities/programs/evolvability-deployability-maintainability-edm-program/&quot;&gt;IAB EDM Program&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/vn/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; met this morning.  While the overall goal of the meeting, we ended up talking a lot &lt;a href=&quot;https://intarchboard.github.io/use-it-or-lose-it/draft-iab-use-it-or-lose-it.html&quot;&gt;a document&lt;/a&gt; I wrote a while back and how to design version negotiation in protocols.&lt;/p&gt;
&lt;p&gt;This post provides a bit of background and shares some of what we learned today after what was quite a productive discussion.&lt;/p&gt;
&lt;h2 id=&quot;protocol-ossification&quot;&gt;Protocol Ossification&lt;/h2&gt;
&lt;p&gt;The subject of protocol ossification has been something of a live discussion in the past several years.  The community has come to the realization that it is effectively impossible to extend many Internet protocols without causing a distressing number of problems with existing deployments.  It seems like no protocol is unaffected&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/vn/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;.  IP, TCP, TLS, and HTTP all have various issues that prevent extensions from working correctly.&lt;/p&gt;
&lt;p&gt;A number of approaches have been tried.  &lt;a href=&quot;https://tools.ietf.org/html/rfc7540&quot;&gt;HTTP/2&lt;/a&gt;, which was developed early in this process, was deployed only for HTTPS.  Even though a cleartext variant was defined, many implementations explicitly decided not to implement that, partly motivated by these concerns.  &lt;a href=&quot;https://quicwg.org/base-drafts/draft-ietf-quic-transport.html&quot;&gt;QUIC&lt;/a&gt; doubles down on this by encrypting as much as possible.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://tools.ietf.org/html/rfc8446&quot;&gt;TLS 1.3&lt;/a&gt;, which was delayed by about &lt;em&gt;a year&lt;/em&gt; by related problems, doesn’t have that option so it ultimately used trickery to avoid notice by problematic middleboxes: TLS 1.3 looks a lot like TLS 1.2 unless you are paying close attention.&lt;/p&gt;
&lt;p&gt;One experiment that turned out to be quite successful in revealing ossification in TLS was &lt;a href=&quot;https://tools.ietf.org/html/rfc8701&quot;&gt;GREASE&lt;/a&gt;.  David Benjamin and Adam Langley, who maintain the TLS stack used by Google&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/vn/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;, found that inserting random values into different extension points had something of a cleansing effect on the TLS ecosystem.  Several TLS implementations were found to be intolerant of new extensions.&lt;/p&gt;
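RFC 8701 reserves a fixed family of values for this: two-byte codepoints made of two identical bytes whose low nibble is 0xA. A minimal sketch of how an implementation might pick one to advertise (the function names here are mine, not from any real stack):

```python
import random

def grease_values():
    # All 16 two-byte GREASE codepoints reserved by RFC 8701:
    # 0x0A0A, 0x1A1A, ..., 0xFAFA.
    return [(0x0A + 0x10 * n) * 0x0101 for n in range(16)]

def pick_grease():
    # A randomly chosen GREASE value to insert into an extension list,
    # so that peers routinely encounter codepoints they don't recognize.
    return random.choice(grease_values())

print(hex(pick_grease()))
```

Because the reserved values are scattered across the whole codepoint space, an intolerant peer can't simply special-case a single "unknown" value.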
&lt;p&gt;One observation out of the experiments with TLS was that protocol elements that routinely saw new values, like cipher suites, were less prone to failing when previously unknown values were encountered.  Those that hadn’t seen new values as often, like server name types or signature schemes, were more likely to show problems.  This caused Adam Langley to &lt;a href=&quot;https://www.imperialviolet.org/2016/05/16/agility.html&quot;&gt;advise&lt;/a&gt; that protocols “have one joint and keep it well oiled.”&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://intarchboard.github.io/use-it-or-lose-it/draft-iab-use-it-or-lose-it.html&quot;&gt;draft-iab-use-it-or-lose-it&lt;/a&gt; explores the problem space a little more thoroughly.  The draft looks at a bunch of different protocols and finds that in general the observations hold.  The central thesis is that for an extension point to be usable, it needs to be actively used.&lt;/p&gt;
&lt;!--
## Grease

David and Adam observed that the cause of this is that many extension points in protocols don&#39;t receive new usage for extended periods.  They then suggested that apparent, but meaningless, use of extension codepoints by widely deployed implementations would ensure that new values were routinely encountered by other implementations.

As failure to handle new values is a bug, and one that is relatively easy to fix, the goal here is to ensure that implementations - especially new ones - encountered conditions that would trigger failures if they had that bug.

The community is still unsure that greasing is a generally applicable technique.  There is also the suggestion that the bugs found in TLS were a one-off.  Maybe new implementations of new protocols won&#39;t have the implementation flaws of the past^[This view is tempered somewhat by experience with HTTP/2 which has some pretty serious issues.].

And then there are limitations inherent to the design.  We really only know how to use greasing where implementations are required to ignore unknown values.  Not all protocol extension points use that sort of extension model.

Greasing is being included in QUIC and HTTP/3, so the hope is that we&#39;ll continue to learn more.
--&gt;
&lt;h2 id=&quot;version-negotiation&quot;&gt;Version Negotiation&lt;/h2&gt;
&lt;p&gt;The subject of the discussion today was version negotiation.  Of all the extension points available in protocols, the one that often sees the &lt;em&gt;least&lt;/em&gt; use is version negotiation.  A version negotiation mechanism has to exist in the first version of a protocol, but it is never really tested until the second version is deployed.&lt;/p&gt;
&lt;p&gt;No matter how carefully the scheme is designed&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/vn/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;, the experience with TLS shows that even a well-designed scheme can fail.&lt;/p&gt;
&lt;p&gt;The insight for today, thanks largely to Tommy Pauly, was that the observation about extension points could be harnessed to make version negotiation work.  Tommy observed that some protocols don’t design in-protocol version negotiation schemes, but instead rely on the protocol at the next layer down.  And these protocols have been more successful at avoiding some of the pitfalls inherent to version negotiation.&lt;/p&gt;
&lt;p&gt;At the next layer down the stack, the codepoints for the higher-layer protocol are just extension codepoints.  They aren’t exceptional for the lower layer and they probably get more use.  Therefore, these extension points are less likely to end up being ossified when the time comes to rely on them.&lt;/p&gt;
&lt;h3 id=&quot;supporting-examples&quot;&gt;Supporting Examples&lt;/h3&gt;
&lt;p&gt;Tommy offered &lt;a href=&quot;https://github.com/intarchboard/edm/issues/8#issue-759871255&quot;&gt;a few examples&lt;/a&gt; and we discussed several others.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://tools.ietf.org/html/rfc8200&quot;&gt;IPv6&lt;/a&gt; was originally intended to use the IP &lt;a href=&quot;https://en.wikipedia.org/wiki/EtherType&quot;&gt;EtherType&lt;/a&gt; (0x0800) in 802.1, with routers looking at the IP version number to determine how to handle packets.  That didn’t work out&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/vn/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;.  What did work was assigning IPv6 its own EtherType (0x86dd).  This supports the idea that a function that was already in use for other reasons&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/vn/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt; was better able to support the upgrade than the in-protocol mechanisms that were originally designed for that purpose.&lt;/p&gt;
&lt;p&gt;HTTP/2 was floated as another potential example of this effect.  Though the original reason for adding &lt;a href=&quot;https://tools.ietf.org/html/rfc7301&quot;&gt;ALPN&lt;/a&gt; was performance - we wanted to ensure that we wouldn’t have to do another round trip after the TLS handshake to do an &lt;a href=&quot;https://tools.ietf.org/html/rfc2817&quot;&gt;Upgrade&lt;/a&gt; exchange - the effect is that negotiation of HTTP relied on a mechanism that was well-tested and proven at the TLS layer&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/vn/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
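For reference, this is roughly what offering ALPN looks like with Python’s standard &lt;code&gt;ssl&lt;/code&gt; module; the protocol strings are the registered identifiers for HTTP/2 and HTTP/1.1, and the rest of the connection setup is omitted.

```python
import ssl

# The client offers an ordered preference list during the TLS
# handshake and the server selects one protocol (RFC 7301).
# Only the context configuration is shown; wrap_socket() and the
# actual handshake would follow.
client_ctx = ssl.create_default_context()
client_ctx.set_alpn_protocols(["h2", "http/1.1"])

server_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
server_ctx.set_alpn_protocols(["h2"])  # protocols the server will select from
```

After the handshake, both sides read the agreed protocol from the socket’s `selected_alpn_protocol()`.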
&lt;p&gt;We observed that ALPN doesn’t work for the HTTP/2 to &lt;a href=&quot;https://quicwg.org/base-drafts/draft-ietf-quic-http.html&quot;&gt;HTTP/3&lt;/a&gt; upgrade as these protocols don’t share a transport protocol.  Here, we observed that we would likely end up relying on &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-dnsop-svcb-https-02&quot;&gt;SVCB&lt;/a&gt; and the HTTPS DNS record.&lt;/p&gt;
&lt;p&gt;Carsten Bormann also pointed at &lt;a href=&quot;https://tools.ietf.org/html/rfc8428&quot;&gt;SenML&lt;/a&gt;, which deliberately provides no inherent version negotiation.  I suggest that this is an excellent example of relying on lower-layer negotiation, in this case the content negotiation functions provided by underlying protocols like &lt;a href=&quot;https://tools.ietf.org/html/rfc7252&quot;&gt;CoAP&lt;/a&gt; or HTTP.&lt;/p&gt;
&lt;p&gt;It didn’t come up at the time, but one of my favourite examples comes from the people building web services at Mozilla.  They do not include version numbers in URLs or hostnames for their APIs and they don’t put version numbers in request or response formats.  The reasoning being that, should they need to roll a new version that is incompatible with the current one, they can always deploy to a new domain name.  I always appreciated the pragmatism of that approach, though I still see lots of &lt;code&gt;/v1/&lt;/code&gt; in public HTTP API documentation.&lt;/p&gt;
&lt;p&gt;These all seem to provide good support for the basic idea.&lt;/p&gt;
&lt;h3 id=&quot;counterexamples&quot;&gt;Counterexamples&lt;/h3&gt;
&lt;p&gt;Any rule like this isn’t worth anything without counterexamples.  Understanding counterexamples helps us understand what conditions are necessary for the theory to hold.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://tools.ietf.org/html/rfc3584&quot;&gt;SNMP&lt;/a&gt;, which was already mentioned in the draft as having successfully managed a version transition using an in-band mechanism, was a particularly interesting case study.  Several observations were made, suggesting several inter-connected reasons for success.  It was observed that there was no especially strong reason to prefer SNMPv3 over SNMPv2 (or SNMPv2c), a factor which resulted in both SNMP versions coexisting for years.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;There was an interesting sidebar at this point.  It was observed that SNMP doesn’t have any strong need to avoid version downgrade attacks in the way that a protocol like TLS might.  Other protocols might not tolerate such phlegmatic coexistence.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;SNMP clients do include probing code to determine which protocol versions are supported.  However, as network management systems include provisioning information for devices, it is usually the case that protocol support for managed devices is stored alongside other configuration.  Thus we concluded that SNMP - to the extent that it even needs version upgrades - was closest to the “shove it in the DNS” approach used for the upgrade to HTTP/3.&lt;/p&gt;
&lt;h2 id=&quot;in-practice&quot;&gt;In Practice&lt;/h2&gt;
&lt;p&gt;The lesson here is that planning for the next version doesn’t mean designing a version negotiation mechanism.  It’s possible that a perfectly good mechanism already exists.  If it does, it’s almost certainly better than anything you might cook up.&lt;/p&gt;
&lt;p&gt;This is particularly gratifying to me as I had already begun following the practice of SenML with other work.  For instance, &lt;a href=&quot;https://tools.ietf.org/html/rfc8188&quot;&gt;RFC 8188&lt;/a&gt; provides no in-band negotiation of version or even &lt;a href=&quot;https://tools.ietf.org/html/rfc7696&quot;&gt;cryptographic agility&lt;/a&gt;.  Instead, it relies on the existing content-coding negotiation mechanisms as a means of enabling its own eventual replacement.  This was somewhat controversial at the time, especially the cryptographic agility part, but in retrospect it seems to be a good choice.&lt;/p&gt;
&lt;p&gt;It’s also good to have a strong basis for rejecting profligate addition of extension points in protocols&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/vn/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;, and now it seems like we have firm reasons to avoid designing version negotiation mechanisms into every protocol.&lt;/p&gt;
&lt;p&gt;Maybe version negotiation can now be put better into context.  Version negotiation might only belong in protocols at the lowest levels of the stack&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/vn/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;.  For most protocols, which probably need to run over TLS for other reasons, ALPN and maybe SVCB can stand in for version negotiation, with the bonus that these are specifically designed to avoid adding latency.  HTTP APIs can move to a different URL.&lt;/p&gt;
&lt;p&gt;As this seems solid, I now have the task of writing a brief summary of this conclusion for the next revision of the “use it or lose it” draft.  That might take some time as there are a few &lt;a href=&quot;https://github.com/intarchboard/use-it-or-lose-it/issues&quot;&gt;open issues&lt;/a&gt; that need some attention.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Not electronic dance music sadly, it’s about Evolvability, Deployability, &amp;amp; Maintainability of Internet protocols &lt;a href=&quot;https://lowentropy.net/posts/vn/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;UDP maybe. UDP is simple enough that it doesn’t have &lt;s&gt;features&lt;/s&gt;/bugs.  Not to say that it is squeaky clean, it has plenty of baggage, with checksum issues, a reputation for being used for DoS, and issues with flow termination in NATs. &lt;a href=&quot;https://lowentropy.net/posts/vn/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://boringssl.googlesource.com/boringssl/&quot;&gt;BoringSSL&lt;/a&gt;, which is now used by a few others, including Cloudflare and Apple. &lt;a href=&quot;https://lowentropy.net/posts/vn/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://tools.ietf.org/html/rfc6709#section-4.1&quot;&gt;Section 4.1 of RFC 6709&lt;/a&gt; contains some great advice on how to design a version negotiation scheme, so that you can learn from experience.  Though pay attention to the disclaimer in the last paragraph. &lt;a href=&quot;https://lowentropy.net/posts/vn/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;No one on the call was paying sufficient attention at the time, so we don’t know precisely why.  We intend to find out, of course. &lt;a href=&quot;https://lowentropy.net/posts/vn/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;At the time, there was still reasonable cause to think that IP wouldn’t be the only network layer protocol, so other values were being used routinely. &lt;a href=&quot;https://lowentropy.net/posts/vn/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;You might rightly observe here that ALPN was brand new for HTTP/2, so the mechanism itself wasn’t exactly proven.  This is true, but there are mitigating factors.  The negotiation method is exactly the same as many other TLS extensions.  And we tested the mechanism thoroughly during HTTP/2 deployment as each new revision from the -04 draft onwards was deployed widely with a different ALPN string.  By the time HTTP/2 shipped, ALPN was definitely solid. &lt;a href=&quot;https://lowentropy.net/posts/vn/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;There is probably enough material for a long post on why this is not a problem in JSON, but I’ll just assert for now - without support - that there really is only one viable extension point in any JSON usage. &lt;a href=&quot;https://lowentropy.net/posts/vn/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;It doesn’t seem like TLS or QUIC can avoid having version negotiation. &lt;a href=&quot;https://lowentropy.net/posts/vn/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>Oblivious DoH</title>
    <link href="https://lowentropy.net/posts/odoh/"/>
    <updated>2020-12-09T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/odoh/</id>
    <content type="html">&lt;p&gt;Today we heard &lt;a href=&quot;https://blog.cloudflare.com/oblivious-dns/&quot;&gt;an announcement&lt;/a&gt; that Cloudflare, Apple, and Fastly are collaborating on a new technology for improving privacy of DNS queries using a technology they call Oblivious DoH (ODoH).&lt;/p&gt;
&lt;p&gt;This is an exciting development.  This posting examines the technology in more detail and looks at some of the challenges this will need to overcome before it can be deployed more widely.&lt;/p&gt;
&lt;h2 id=&quot;how-odoh-provides-privacy-for-dns-queries&quot;&gt;How ODoH Provides Privacy for DNS Queries&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://tools.ietf.org/html/draft-pauly-dprive-oblivious-doh-03&quot;&gt;Oblivious DoH&lt;/a&gt; is a simple &lt;a href=&quot;https://en.wikipedia.org/wiki/Mix_network&quot;&gt;mixnet&lt;/a&gt; protocol for making DNS queries.  It uses a proxy server to provide added privacy for query streams.&lt;/p&gt;
&lt;p&gt;This looks something like:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-graphviz&quot;&gt;digraph ODoH {
  graph [overlap=true, splines=line, nodesep=1.0, ordering=out];
  node [shape=rectangle, fontname=&amp;quot; &amp;quot;];
  edge [arrowhead=none];
  { rank=same; Client-&amp;gt;Proxy; Proxy-&amp;gt;Resolver; }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A common criticism of &lt;a href=&quot;https://tools.ietf.org/html/rfc8484&quot;&gt;DNS over HTTPS&lt;/a&gt; (DoH) is that it provides DoH resolvers with lots of privacy-sensitive information&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/odoh/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;.  Currently all DNS resolvers, including DoH resolvers, see the contents of queries and can link that to who is making those queries.  DoH includes connection reuse, so resolvers can link requests from the same client using the connection.&lt;/p&gt;
&lt;p&gt;In Oblivious DoH, a proxy aggregates queries from multiple clients so that the resolver is unable to link queries to individual clients. ODoH protects the IP address of the client, but it also prevents the resolver from linking queries from the same client together.  Unlike an ordinary HTTP proxy, which handles TLS connections to servers&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/odoh/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;, ODoH proxies handle queries that are individually encrypted.&lt;/p&gt;
&lt;p&gt;ODoH prevents resolvers from assembling profiles on clients by collecting the queries they make, because resolvers see queries from a large number of clients all mixed together.&lt;/p&gt;
&lt;p&gt;An ODoH proxy learns almost nothing from this process as ODoH uses &lt;a href=&quot;https://tools.ietf.org/html/draft-irtf-cfrg-hpke-06&quot;&gt;HPKE&lt;/a&gt; to encrypt both the query and the answer with keys chosen by the client and resolver.&lt;/p&gt;
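To make the division of knowledge concrete, here is a toy sketch in Python. The XOR “cipher” stands in for HPKE, which really derives keys from the resolver’s published public key, so none of this is the actual ODoH protocol; it only illustrates the shape of it: the proxy ever holds ciphertext, never the query.

```python
import os

def xor(key, data):
    # Toy stand-in for HPKE seal/open: XOR with a one-time key.
    return bytes(k ^ d for k, d in zip(key, data))

query = b"example.com. IN A?"
key = os.urandom(len(query))     # shared client/resolver secret (toy)

sealed = xor(key, query)         # client encrypts the query
forwarded = sealed               # the proxy forwards ciphertext only;
                                 # it sees the client address, not the query
recovered = xor(key, forwarded)  # resolver decrypts and can answer
assert recovered == query
```

The resolver's position is symmetric on the way back: it seals the answer, the proxy forwards it, and only the client can open it.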
&lt;p&gt;The privacy benefits of ODoH can only be undone if both the proxy and resolver cooperate.  ODoH therefore recommends that the two services be run independently, with the operator of each making a commitment to respecting privacy.&lt;/p&gt;
&lt;h2 id=&quot;costs&quot;&gt;Costs&lt;/h2&gt;
&lt;p&gt;The privacy advantages provided by the ODoH design come at a higher cost than DoH, where a client just queries the resolver directly:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The proxy adds a little latency as it needs to forward queries and responses.&lt;/li&gt;
&lt;li&gt;HPKE encryption adds up to about 100 bytes to each query.&lt;/li&gt;
&lt;li&gt;The client and resolver need to spend a little CPU time to add and remove the encryption.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Cloudflare’s tests show that the overall effect of ODoH on performance is quite modest.  These early tests even suggest some improvement for the slowest queries.  If those performance gains can be kept as they scale up their deployment, that would be strong justification for deployment.&lt;/p&gt;
&lt;h2 id=&quot;why-this-design&quot;&gt;Why This Design&lt;/h2&gt;
&lt;p&gt;A similar outcome might be achieved using a proxy that supports HTTP CONNECT.  However, to prevent the resolver from learning which queries come from the same client, each query would have to use a new connection.&lt;/p&gt;
&lt;p&gt;That gets pretty expensive.  While you might be able to use tricks to drive down latency, like sending the TLS handshake along with the HTTP CONNECT request, every request would still need a separate TCP connection and a round trip to establish it&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/odoh/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;It is also possible to use something like &lt;a href=&quot;https://www.torproject.org/&quot;&gt;Tor&lt;/a&gt;, which provides superior privacy protection, but at considerably greater cost.&lt;/p&gt;
&lt;p&gt;Using HPKE and a multiplexed protocol like &lt;a href=&quot;https://tools.ietf.org/html/rfc7540&quot;&gt;HTTP/2&lt;/a&gt; or &lt;a href=&quot;https://quicwg.org/base-drafts/draft-ietf-quic-http.html&quot;&gt;HTTP/3&lt;/a&gt; avoids per-query connection setup costs.  However, the most important thing is that it involves only minimal additional latency to get the privacy benefits&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/odoh/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h2 id=&quot;key-management-in-dns&quot;&gt;Key Management in DNS&lt;/h2&gt;
&lt;p&gt;The proposal puts HPKE keys for the resolver in the DNS&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/odoh/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;.  The idea is that clients can talk to the resolver directly to get these, then use that information to protect their queries.  As the keys are DNS records, they can be retrieved from any DNS resolver, which is a potential advantage.&lt;/p&gt;
&lt;p&gt;This also means that this ODoH design depends on DNSSEC.  Many clients rely on their resolver to perform DNSSEC validation, which doesn’t help here, so deploying something like this incrementally in clients is difficult.&lt;/p&gt;
&lt;p&gt;A better option might be to offer the HPKE public key information in response to a direct HTTP request to the resolver.  That would ensure that the key could be authenticated by the client using HTTPS and the Web PKI.&lt;/p&gt;
&lt;h2 id=&quot;trustworthiness-of-proxies&quot;&gt;Trustworthiness of Proxies&lt;/h2&gt;
&lt;p&gt;Both client and resolver will want to authenticate the proxy and only allow a trustworthy proxy.  The protocol design means that the need for trust in the proxy is limited, but it isn’t zero.&lt;/p&gt;
&lt;p&gt;Clients need to trust that the proxy is hiding their IP address.  A bad proxy could attach the client IP address to every query it forwards. Clients will want some way of knowing that the proxy won’t do this&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/odoh/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Resolvers will likely want to limit the number of proxies that they will accept requests from, because the aggregated queries from a proxy of any reasonable size will look a lot like a denial of service attack.  Mixing all the queries together denies resolvers the ability to do per-client rate limiting, which is a valuable denial of service protection measure.  Resolvers will need to apply much more generous rate limits for these proxies and trust that the proxies will take reasonable steps to ensure that individual clients are not able to generate abusive numbers of queries.&lt;/p&gt;
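&lt;p&gt;A sketch of what that could look like on the resolver side: a token bucket per source, with a far larger allowance for a vetted proxy than for an unknown direct client.  The numbers here are invented for illustration.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import time

class TokenBucket:
    # Classic token bucket: at most capacity tokens, refilled at
    # rate tokens per second; each query spends one token.
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = float(capacity), time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens &amp;lt; 1:
            return False
        self.tokens -= 1
        return True

# A vetted proxy aggregates many clients, so it gets a far more
# generous limit than any single direct client would.
limits = {
    'client': TokenBucket(rate=10, capacity=20),
    'vetted-proxy': TokenBucket(rate=10_000, capacity=20_000),
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The hard part isn’t the mechanism, it’s deciding which proxies earn the generous bucket.&lt;/p&gt;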
&lt;p&gt;This means that proxies will need to be acceptable to both client and resolver.  Early deployments will be able to rely on contracts and similar arrangements to guarantee this.  However, if use of ODoH is to scale out to support large numbers of providers of both proxies and resolvers, it could be necessary to build systems for managing these relationships.&lt;/p&gt;
&lt;h2 id=&quot;proxying-for-other-applications&quot;&gt;Proxying For Other Applications&lt;/h2&gt;
&lt;p&gt;One obvious observation about this design is that it isn’t specific to DNS queries.  In fact, a large number of request-response exchanges would gain the same privacy protections that ODoH provides.  For example, Google this week announced &lt;a href=&quot;https://blog.chromium.org/2020/12/continuing-our-journey-to-bring-instant.html&quot;&gt;a trial&lt;/a&gt; of a similar technology for preloading content.&lt;/p&gt;
&lt;p&gt;A generic design that enabled protection for HTTP queries of any sort would be ideal.  My hope is that we can design that protocol.&lt;/p&gt;
&lt;p&gt;Once you look to designing a more generic solution, there are a few extra things that might improve the design.  Automatic discovery of HTTP endpoints that allow oblivious proxying is one potential enhancement.  Servers could advertise both keys and the proxies they support so that clients can choose to use those proxies to mask their address.  This might involve automated proxy selection or discovery and even systems for encoding agreements.  There are lots of possibilities here.&lt;/p&gt;
&lt;h2 id=&quot;centralization&quot;&gt;Centralization&lt;/h2&gt;
&lt;p&gt;One criticism of DoH deployments is that they encourage consolidation of DNS resolver services.  For ODoH, options for resolvers will be limited, at least in the short term, which could push usage toward a small number of server operators in exchange for the privacy gains ODoH provides.&lt;/p&gt;
&lt;p&gt;During initial roll-out, the number of proxy operators will be limited.  Also, using a larger proxy means that your queries are mixed in with more queries from other people, providing marginally better privacy.  That might provide some impetus to consolidate.&lt;/p&gt;
&lt;p&gt;Deploying automated discovery systems for acceptable proxies might help mitigate the worst centralization effects, but it seems likely that this will not be a feature of early deployments.&lt;/p&gt;
&lt;p&gt;In the end, it would be a mistake to cry “centralization” in response to early trial deployments of a technology, which are naturally limited in scope.  Furthermore, it’s hard to know what the long term impact on the ecosystem will be.  We might never be able to separate the effect of existing trends toward consolidation from the effect of new technology.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;I like the model adopted here.  The use of a proxy neatly addresses one of the biggest concerns with the rollout of DoH: the privacy risk of having a large provider being able to gather information about streams of queries that can be linked to your IP address.&lt;/p&gt;
&lt;p&gt;ODoH breaks streams of queries into discrete transactions that are hard to assemble into activity profiles.  At the same time, ODoH makes it hard to attribute queries to individuals as it hides the origin of queries.&lt;/p&gt;
&lt;p&gt;My sense is that the benefits very much outweigh the performance costs, the protocol complexity, and the operational risks.  ODoH is a pretty big privacy win for name resolution.  The state of name resolution is pretty poor, with much of it still unprotected from snooping, interception, and poisoning.  The deployment of DoH went some way to address that, but came with some drawbacks.  Oblivious DoH takes the next logical step.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This is something all current DNS resolvers get, but the complaint is about the scale at which this information is gathered.  Some people are unhappy that network operators are unable to access this information, but I regard that as a feature. &lt;a href=&quot;https://lowentropy.net/posts/odoh/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;OK, proxies do handle individual, unencrypted HTTP requests, but that capability is hardly ever used any more now that 90% of the web is HTTPS. &lt;a href=&quot;https://lowentropy.net/posts/odoh/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Using 0-RTT doesn’t work here without some fiddly changes to TLS because the session ticket allows the server to link connections together, which is exactly the sort of linkability we’re trying to avoid. &lt;a href=&quot;https://lowentropy.net/posts/odoh/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This also makes ODoH far more susceptible to &lt;a href=&quot;https://en.wikipedia.org/wiki/Traffic_analysis&quot;&gt;traffic analysis&lt;/a&gt;, but it relies on volume and the relative similarity of DNS queries to help manage that risk. &lt;a href=&quot;https://lowentropy.net/posts/odoh/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The recursion here means that the designers of ODoH probably deserve a prize of some sort. &lt;a href=&quot;https://lowentropy.net/posts/odoh/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The &lt;a href=&quot;https://github.com/bslassey/ip-blindness&quot;&gt;willful IP blindness proposal&lt;/a&gt; goes into more detail on what might be required for this. &lt;a href=&quot;https://lowentropy.net/posts/odoh/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
</feed>
