Why does this matter?
A lot of governments are engaging with projects to build “Digital Public Infrastructure”. That term covers a range of projects, but one of the common and integral pieces relates to government-backed identity services. While some places have had some form of digital identity system for years — hi Estonia! — there are many more governments looking to roll out some sort of digital identity wallet for their citizens. Notably, the European Union recently passed a major update to their European Digital Identity Regulation, which seeks to have a union-wide digital identity system for all European citizens. India’s Aadhaar is still the largest such project with well over a billion people enrolled.
There are a few ways that these systems end up being implemented, but most take the same basic shape. A government agency is charged with issuing credentials to people. That might be tied to driver licensing, medical services, or passports, or it could be a new identity agency. That agency issues digital credentials that are destined for wallets in phones. Then, services can request that people present these credentials at certain points, as necessary.
The basic model that is generally used looks something like this:
The government agency is the “issuer”, your wallet app is a “holder”, and the service that wants your identity information is a “verifier”.
This is a model for digital credentials that is useful in describing a lot of different interactions. A key piece of that model is the difference between a credential, which is the thing that ends up in a wallet, and a presentation, which is what you show a verifier.
This document focuses on online use cases. That is, where you might be asked to present information about your identity to a website. Though there are many other uses for identity systems, online presentation of identity is becoming more common. How we use identity online is likely to shape how identity is used more broadly.
The goal of this post is to provide information and maybe a fresh perspective on the topic. This piece also has a conclusion that suggests that the truly hard problems in online identity are not technical in nature, so do not necessarily benefit from the use of selective disclosure. As much as selective disclosure is useful in some contexts, there are significant challenges in deploying it on the Web.
What is selective disclosure?
A presentation might be a reduced form of the credential. Let’s say that you have a driver license, like the following:
One way of thinking about selective disclosure is to think of it as redacting those parts of the credential that you don’t want to share.
Let’s say that you want to show that you are old enough to buy alcohol. You might imagine doing something like this:
That is, if you were presenting that credential to a store in person, you would want to show that the card truly belongs to you and that you are old enough.
If you aren’t turning up in person, the photo and physical description are not that helpful, so you might cover those as well.
You don’t need to share your exact birth date to show that you are old enough, so you might cover the month and day too. That is still too much information, but it is the best you can easily manage with a black highlighter.
If there was a “can buy alcohol” field on the license, that might be even better. But the age at which you can legally buy alcohol varies quite a bit across the world. And laws apply to the location, not the person. A 19 year old from Canada can’t buy alcohol in the US just because they can buy alcohol at home[1]. Most digital credential systems have special fields to allow for this sort of rule, so that a US[2] liquor store could use an “over_21” property, whereas a purchase in Canada might check for “over_18” or “over_19” depending on the province.
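To make that concrete, here is a minimal sketch in Python of how a verifier might pick the right field. The property names mirror the example above; the jurisdiction table is invented for illustration.

```python
# Which boolean age property a verifier should request, by jurisdiction.
# This table is illustrative, not a statement of the actual law.
AGE_FIELD_BY_JURISDICTION = {
    "US-HI": "over_21",   # Hawaii (and the rest of the US)
    "CA-QC": "over_18",   # Quebec
    "CA-ON": "over_19",   # Ontario
}

def can_sell_alcohol(presentation: dict, jurisdiction: str) -> bool:
    # The law applies to the location of the sale, not the buyer's home.
    field = AGE_FIELD_BY_JURISDICTION[jurisdiction]
    return presentation.get(field) is True

# A 19 year old Ontarian shopping in Hawaii fails the "over_21" check:
print(can_sell_alcohol({"over_18": True, "over_19": True}, "US-HI"))  # False
```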
Simple digital credentials
The simplest form of digital credential is a bag of attributes, covered by a digital signature from a recognized authority. For instance, this might be a JSON Web Token, which is basically just a digitally-signed chunk of JSON.
For our purposes, let’s run with the example, which we’d form into something like this:
{
"number": "01-47-87441",
"name": "McLOVIN",
"address": "892 MOMONA ST, HONOLULU, HI 96820",
"iss": "1998-06-18",
"exp": "2008-06-03",
"dob": "1981-06-03",
"over_18": true,
"over_21": true,
"over_55": false,
"ht": "5'10",
...
}
That could then be wrapped up and signed by whatever Hawaiian DMV issues the license. Something like this:
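As a rough sketch of what that issuance step might look like in Python, using the PyJWT library; the key generated here is a stand-in for whatever signing key the DMV actually holds.

```python
# A sketch of issuing the attributes as a signed JWT using PyJWT
# (pip install pyjwt[crypto]). The key is a stand-in for the DMV's.
import jwt
from cryptography.hazmat.primitives.asymmetric import ec

issuer_key = ec.generate_private_key(ec.SECP256R1())

claims = {
    "number": "01-47-87441",
    "name": "McLOVIN",
    "dob": "1981-06-03",
    "over_18": True,
    "over_21": True,
    "over_55": False,
}

credential = jwt.encode(claims, issuer_key, algorithm="ES256")

# Anyone who trusts the DMV's public key can validate the signature:
verified = jwt.decode(credential, issuer_key.public_key(), algorithms=["ES256"])
assert verified["over_21"] is True
```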
That isn’t perfect, because a blob of bytes like that can be copied around by anyone who receives it. Anyone holding a copy of the credential could “impersonate” our poor friend.
The way that problem is addressed is through the use of a digital wallet. The issuer requires that the wallet hold a second signing key. The wallet provides the issuer with an attestation, which is just evidence from the wallet maker (which is often the maker of your phone) that they are holding a private key in a place where it can’t be moved or copied[3]. That attestation includes the public key that matches that private key.
Once the issuer is sure that the private key is tied to the device, the issuer produces a credential that lists the public key from the wallet.
In order to use the credential, the wallet signs the credential along with some other stuff, like the current time and maybe the identity of the verifier[4], as follows:
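Continuing the sketch from above, the wallet’s device-bound key signs over the credential, the verifier’s identity, and the current time. The claim names here are illustrative; real formats define their own.

```python
# A sketch of a holder-signed presentation, continuing the example above.
# wallet_key is the device-bound key whose public half the issuer listed
# in the credential.
import time

wallet_key = ec.generate_private_key(ec.SECP256R1())  # held in secure hardware

presentation = jwt.encode(
    {
        "credential": credential,          # the issuer-signed JWT from above
        "aud": "https://liquor.example",   # the verifier this is intended for
        "iat": int(time.time()),           # when the presentation was made
    },
    wallet_key,
    algorithm="ES256",
)

# The verifier checks the wallet signature, audience, and freshness, then
# checks that this key matches the one listed inside the credential.
jwt.decode(presentation, wallet_key.public_key(), algorithms=["ES256"],
           audience="https://liquor.example")
```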
With something like this, unless someone is able to use the signing key that is in the wallet, they can’t generate a presentation that a verifier will accept. It also ensures that the wallet can use a biometric or password check to ensure that a presentation is only created when the person allows it.
That is a basic presentation that includes all the information that the issuer knows about. The problem is that this is probably more than you might be comfortable with sharing with a liquor store. After all, while you might be able to rely on the fact that the cashier in a store isn’t copying down your license details, you just know that any digital information you present is going to be saved, stored, and sold. That’s where selective disclosure is supposed to help.
Salted hash selective disclosure
One basic idea behind selective disclosure is to replace all of the data elements in a credential — or at least the ones that someone might want to keep to themselves — with placeholders. Those placeholders are replaced with a commitment to the actual values. Any values that someone wants to reveal are then included in the presentation. A verifier can validate that the revealed value matches the commitment.
The most basic sort of commitment is a hash commitment. That uses a hash function, which is really anything where it is hard to produce two inputs that result in the same output. The commitment to a value of X is H(X).
That is, you might replace the (“name”, “McLOVIN”) with a commitment like H(“name” || “McLOVIN”). The hash function ensures that it is easy to validate that the underlying values match the commitment, because the verifier can compute the hash for themselves. But it is basically impossible to recover the original values from the hash. And it is similarly difficult to find another set of values that hash to the same value, so you can’t easily substitute false information.
A key problem is that a simple hash commitment only works to protect the value of the input if that input is hard to guess in the first place. But most of the stuff on a license is pretty easy to guess in one way or another. For simple stuff like “over_21”, there are just two values: “true” or “false”. If you want to know the original value, you can just check each of the values and see which matches.
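A few lines of Python show how quickly an unsalted commitment to a low-entropy field gives way; the encoding here is made up for illustration.

```python
# Brute-forcing an unsalted commitment: with only two possible values,
# an attacker just tries both.
import hashlib

def commit(name: str, value) -> str:
    return hashlib.sha256(f"{name}||{value}".encode()).hexdigest()

hidden = commit("over_21", True)  # the placeholder in the credential

for guess in (True, False):
    if commit("over_21", guess) == hidden:
        print("over_21 is", guess)  # recovered without any disclosure
```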
Even for fields that have more values, it is possible to build a big table of hash values for every possible (or likely) value. This is called a “rainbow table”[5].
Rainbow tables don’t work if the committed value is very hard to guess. So, in addition to the value of the field, a large random number is added to the hidden value. This number is called “salt” and a different value needs to be generated for every field that can be hidden, with different values for every new credential. As long as there are many more possible salt values than can reasonably be stored in a rainbow table, there is no easy way to work out which commitment corresponds to which value.
So for each field, the issuer generates a random number and replaces all fields in the credential with H(salt || name || value), using some agreed encoding. The issuer then signs over those commitments and provides the wallet with a credential that is full of commitments, plus the full set of values that were committed to, including the associated salt.
The wallet can then use the salt and the credential to reveal a value and prove that it was included in the credential, creating a presentation something like this:
The verifier then gets a bunch of fields with the key information replaced with commitments. All of the commitments are then signed by the issuer. The verifier also gets some number of unsigned tuples of (salt, name, value). The verifier can then check that H(salt || name || value) matches one of the commitments.
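Putting the three roles together, a minimal Python sketch of the whole salted hash flow might look like the following. Real formats (SD-JWT, for example) pin down the encoding and hashing details differently.

```python
# A minimal sketch of salted hash selective disclosure.
import hashlib, json, os

def commit(salt: bytes, name: str, value) -> str:
    # H(salt || name || value), with JSON as the agreed encoding.
    return hashlib.sha256(salt + json.dumps([name, value]).encode()).hexdigest()

# Issuer: commit to every field, then sign the commitments (not shown).
fields = {"name": "McLOVIN", "dob": "1981-06-03", "over_21": True}
salts = {name: os.urandom(16) for name in fields}
commitments = {commit(salts[n], n, v) for n, v in fields.items()}
# The wallet receives the signed commitments plus all (salt, name, value).

# Wallet: disclose a single field by revealing its tuple.
disclosure = (salts["over_21"], "over_21", True)

# Verifier: recompute the hash and match it against a signed commitment.
salt, name, value = disclosure
assert commit(salt, name, value) in commitments
```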
This is the basic design that underpins a number of selective disclosure designs. Salted hash selective disclosure is pretty simple to build because it doesn’t require any fancy cryptography. However, salted hash designs have some limitations that can be a little surprising.
Other selective disclosure approaches
There are other approaches that might be used to solve this problem. Imagine that you had a set of credentials, each of which contained a single attribute. You might imagine sharing each of those credentials separately, choosing which ones you show based on what the situation demanded.
That might look something like this:
Having multiple signatures can be inefficient, but this basic idea is approximately sound[7]. There are a lot of signatures, which would make a presentation pretty unwieldy if there were lots of properties. There are digital signature schemes that make this more efficient though, like the BLS scheme, which allows multiple signatures to be folded into one.
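A sketch of the naive version, with one ordinary signature per attribute, might help. An aggregating scheme like BLS would let the revealed signatures below be folded into a single one.

```python
# One credential per attribute, each with its own signature. This uses
# Ed25519 for illustration; SD-BLS would aggregate the revealed signatures.
import json
from cryptography.hazmat.primitives.asymmetric import ed25519

issuer_key = ed25519.Ed25519PrivateKey.generate()

attributes = {"name": "McLOVIN", "over_21": True, "over_55": False}
signed = {
    name: issuer_key.sign(json.dumps([name, value]).encode())
    for name, value in attributes.items()
}

# The holder reveals only what the situation demands.
presentation = {"over_21": (True, signed["over_21"])}

# The verifier checks each revealed attribute individually.
public_key = issuer_key.public_key()
for name, (value, signature) in presentation.items():
    public_key.verify(signature, json.dumps([name, value]).encode())
```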
That is the basic idea behind SD-BLS. SD-BLS doesn’t make it cheaper for an issuer. An issuer still needs to sign a whole bunch of separate attributes. But combining signatures means that it can make presentations smaller and easier to verify. SD-BLS has some privacy advantages over salted hashes, but the primary problem that the SD-BLS proposal aims to solve is revocation, which is covered in more detail below.
Problems with salted hashes
Going back to the original example, the effect of the salted hash is that you probably get something like this:
Imagine that every field on the license is covered with the gray stuff you get on scratch lottery tickets. You can choose which to scratch off before you hand it to someone else[8]. Here’s what they learn:
- That this is a valid Hawaii driver license. That is, they learn who issued the credential.
- When the license expires.
- The value of the fields that you decided to reveal.
- How many fields you decided not to reveal.
- Any other places that you present that same credential, as discussed below.
On the plus side, and contrary to what is shown for a physical credential, the size and position of fields is not revealed for a digital credential.
Still, that is likely a bit more information than might be expected. If you only wanted to reveal the “over_21” field so that you could buy some booze, having to reveal all those other things isn’t exactly ideal.
Revealing who issued the credential seems like it might be harmless, but for a digital credential, that’s revealing a lot more than your eligibility to obtain liquor. Potentially a lot more. Maybe in Hawaii, holding a Hawaii driver license isn’t notable, but it might be distinguishing — or even disqualifying — in other places. A Hawaii driver license reveals that you likely live in Hawaii, which is not exactly relevant to your alcohol purchase. It might not even be recognized as valid in some places.
If the Hawaiian DMV uses multiple keys to issue credentials, you’ll also reveal which of those keys was used. That’s unlikely to be a big deal, but worth keeping in mind as we look at alternative approaches.
Revealing the number of fields is a relatively minor information leak. This constrains the design a little, but not in a serious way. Basically, it means that you should probably have the same set of fields for everyone.
For instance, you can’t include only the “over_XX” age fields that are true; you have to include the false ones as well or the number of fields would reveal an approximate age. That is, avoid:
{ ..., "older_than": [16, 18], ... }
Note: Some formats allow individual items in lists like this to be committed separately. The name of the list is generally revealed in that case, but the specific values are hidden. These usually just use H(salt || value) as the commitment.
And instead use:
{ ..., "over_16": true, "over_18": true, "over_21": false, "over_55": false, ... }
Expiration dates are tricky. For some purposes, like verifying that someone is allowed to drive, the verifier will need to know if the credential is not expired.
On the other hand, expiry is probably not very useful for something like age verification. After all, it’s not like you get younger once your license expires.
The exact choice of expiration date might also carry surprising information. Imagine that only one person was able to get a license one day because the office had to close or the machine broke down. If the expiry date is a fixed time after issuance, the expiry date on their license would then be unique to them, which means that revealing that expiration date would effectively be identifying them.
The final challenge here is the least obvious and most serious shortcoming of this approach: linkability.
Linkability and selective disclosure
A salted hash credential carries several things that makes the credential itself identifiable. This includes the following:
- The value of each commitment is unique and distinctive.
- The public key for the wallet.
- The signature that the issuer attaches to the credential.
Each of these is unique, so if the same credential is used in two places, it will clearly indicate that this is the same person, even if the information that is revealed is very limited.
For example, you might present an “over_21” to purchase alcohol in one place, then use the full credential somewhere else. If those two presentations use the same credential, those two sites will be able to match up the presentations. The entity that obtains the full credential can then share all that knowledge with the one that only knows you are over 21, without your involvement.
Even if the two sites only receive limited information, they can still combine the information they obtain — that you are over 21 and what you did on each site — into a profile. The building of that sort of profile online is known as unsanctioned tracking and generally regarded as a bad thing.
This sort of matching is technically called verifier-verifier linkability. The way that it can be prevented is to ensure that a completely fresh credential is used for every presentation. That includes a fresh set of commitments, a new public key from the wallet, and a new signature from the issuer (naturally, the thing that is being signed is new). At the same time, ensuring that the presentation doesn’t include any extraneous information, like expiry dates, helps.
A system like this means that wallets need to be able to handle a whole lot of credentials, including fresh public keys for each. The wallet also needs to be able to handle cases where its store of credentials runs out, especially when the wallet is unable to contact the issuer.
Issuers generally need to be able to issue large batches of credentials to avoid that happening, which involves a lot of computationally intensive work for the issuer. This makes wallets quite a bit more complex. It also increases the cost of running issuance services, because they need better availability, not just more issuance capacity.
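A sketch of that batching, with a fresh wallet key per credential. The issuance round trip here is a stand-in; a real issuer would return signed credentials with fresh salts and commitments.

```python
# Batch issuance sketch: every single-use credential gets a fresh key.
from cryptography.hazmat.primitives.asymmetric import ec

BATCH_SIZE = 50  # illustrative; enough to cover expected presentations

def issue(public_key):
    # Stand-in for the round trip in which the issuer signs a credential
    # that lists this public key.
    return {"wallet_key": public_key, "over_21": True}

batch = [
    (key, issue(key.public_key()))
    for key in (ec.generate_private_key(ec.SECP256R1()) for _ in range(BATCH_SIZE))
]

# Each presentation consumes one entry; the wallet must top up before the
# batch runs out, since the issuer may be unreachable at presentation time.
```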
In this case, SD-BLS has a small advantage over salted hashes because its “unregroupability” property means that presentations with differing sets of attributes are not linkable by verifiers. That’s a weaker guarantee than verifier-verifier unlinkability, because presentations with the same set of attributes can still be linked by a verifier; for that, fresh credentials are necessary.
Using a completely fresh credential is a fairly effective way to protect against linkability for different verifiers, but it does nothing to prevent verifier-issuer linkability. An issuer can remember the values they saw when they issued the credential. A verifier can take any one of the values from a presentation they receive (commitments, public key, or signature) and ask the issuer to fill in the blanks.
What the issuer and verifier can share isn’t limited to the credential. Maybe McLovin needed to show a passport and a utility bill in order to get a license and the DMV kept a copy. The issuer could give that information to the verifier. The verifier can also share what they have learned about the person, like what sort of alcohol they purchased.
Useful linkability
In some cases, linkability might be a useful or essential feature. Imagine that selective disclosure is used to authorize access to a system that might be misused. Selective disclosure avoids exposing the system to information that is not essential, perhaps because the system is not well suited to safeguarding private information. The system only logs access attempts and the presentation that was used.
In the event that the access results in some abuse, the abuse could be investigated using verifier-issuer linkability. For example, the access could be matched to information available to the issuer to find out who was responsible for the abuse.
The IETF is developing a couple of salted hash formats (in JSON and CBOR) that should be well suited to a number of applications where linkability is a desirable property.
All of this is a pretty serious problem for something like online age verification. Having issuers, which are often government agencies, in a position to trace activity might have an undesirable chilling effect. This is something that legislators generally recognize, and laws often include provisions that require unlinkability[9].
In short, salted hash based systems only work if you trust the issuer.
Linkable attributes
There is not much point in avoiding linkability when the disclosed information is directly linkable. For instance, if you selectively disclose your name and date of birth, that information is probably unique or highly identifying. Revealing identifying information to a verifier makes verifier-issuer linkability easy; just like revealing the same information to two verifiers makes verifier-verifier linkability simple.
This makes linkability for selective disclosure less concerning when it comes to revealing information that might be identifying.
Unlinkability therefore tends to be most useful for non-identifying attributes. Simple attributes — like whether someone meets a minimum age requirement, holds a particular qualification, or has authorization — are less likely to be inherently linkable, so are best suited to being selectively disclosed.
Privacy Pass
If the goal is to provide a simple signal, such as whether a person is older than a target age, Privacy Pass is specifically designed to prevent verifier-issuer linkability.
Privacy Pass also includes options that split the issuer into two separate functions — an issuer and an attester — where the attester is responsible for determining if a holder (or client) has the traits required for token issuance and the issuer only creates the tokens. This might be used to provide additional privacy protection.
A Privacy Pass issuer could produce a token that signifies possession of a given trait. Only those with the trait would receive the token. For age verification, the token might signify that a person is at a selected age or older.
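The trick that prevents verifier-issuer linkability is blinding. One of the publicly verifiable Privacy Pass token types is built on RSA blind signatures; the following toy sketch shows the idea with textbook RSA and deliberately tiny numbers (illustration only, nowhere near secure).

```python
# Toy RSA blind signature: the issuer signs a token it never sees.
import hashlib, secrets
from math import gcd

# Issuer's key for the "over_21" trait (toy-sized primes).
p, q, e = 1000003, 1000033, 65537
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))

token = secrets.token_bytes(16)  # chosen by the holder, hidden from the issuer
m = int.from_bytes(hashlib.sha256(token).digest(), "big") % n

# Holder blinds the token before sending it for signing.
while True:
    r = secrets.randbelow(n)
    if r > 1 and gcd(r, n) == 1:
        break
blinded = (m * pow(r, e, n)) % n

blind_sig = pow(blinded, d, n)               # issuer signs the blinded value
signature = (blind_sig * pow(r, -1, n)) % n  # holder removes the blinding

# Verifier checks the signature. The issuer never saw `token` or
# `signature`, so it cannot link this redemption to any issuance.
assert pow(signature, e, n) == m
```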
Token formats for Privacy Pass that include limited public information are also defined, which might be used to support selective disclosure. This is far less flexible than the salted hash approach as a fresh token needs to be minted with the set of traits that will be public. That requires that the issuer is more actively involved or that the different sets of public traits are known ahead of time.
Privacy Pass does not naturally provide verifier-verifier unlinkability, but a fresh token could be used for each usage, just like for the salted hash design. Some of the Privacy Pass modes can issue a batch of tokens for this reason.
In order to provide tokens for different age thresholds or traits, an issuer would need to use different public keys, each corresponding to a different trait.
Privacy Pass is therefore a credible alternative to the use of salted hash selective disclosure for very narrow cases. It is somewhat inflexible in terms of what can be expressed, but that could mean more deliberate additions of capabilities. The strong verifier-issuer unlinkability is definitely a plus, but it isn’t without shortcomings.
Key consistency
One weakness of Privacy Pass is that it depends on the issuer using the same key for everyone. The ideal privacy is provided when there is a single issuer with just one key for each trait. With more keys or more issuers, the key that is used to generate a token carries information, revealing who issued the token. This is just like the salted hash example where the verifier needs to learn that the Hawaiian DMV issued the credential.
The privacy of the system breaks down if every person receives tokens that are generated using a key that is unique to them. This risk can be limited through the use of key consistency schemes. This makes the system a little bit harder to deploy and operate.
As foreshadowed earlier, the same key switching concern also applies to a salted hash design if you don’t trust the issuer. Of course, we’ve already established that a salted hash design basically only works if you trust the issuer. Salted hash presentations are linkable based on commitments, keys, or signatures, so there is no real need to play games with keys.
Anonymous credentials
A zero knowledge proof enables the construction of evidence that a prover knows something, without revealing that information. For an identity system, it allows a holder to make assertions about a credential without revealing that credential. That creates what is called an anonymous credential.
Anonymous credentials are appealing as the basis for a credential system because the proofs themselves contain no information that might link them to the original credential.
Verifier-issuer unlinkability is a natural consequence of using a zero knowledge proof. Verifier-verifier unlinkability would be guaranteed by providing a fresh proof for each verifier, which is possible without obtaining a fresh credential. The result is that anonymous credentials provide excellent privacy characteristics.
Zero knowledge proofs trace back to systems of provable computation, which means that they are potentially very flexible. A proof can be used to prove any property that can be computed. The primary cost is in the amount of computation it takes to produce and validate the proof[10]. If the underlying credential can be adjusted to support the zero knowledge system, these costs can be reduced, which is what the BBS signature scheme does. Unmodified credentials can be used if necessary.
Thus, a proof statement for use in age verification might be a machine translation of the following compound statement:
- this holder has a credential signed by the Hawaiian DMV;
- the expiration date on the credential is later than the current date;
- the person is 21 or older (or the date of birth plus 21 years is earlier than the current date);
- the holder knows the secret key associated with the public key mentioned in the credential; and,
- the credential has not been used with the current verifier more than once on this day[11].
A statement in that form should be sufficient to establish that someone is old enough to purchase alcohol, while providing assurances that the credential was not stolen or reused. The only information that is revealed is that this is a valid Hawaiian license. We’ll see below how hiding that last bit is also possible and probably a good idea.
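To make the shape of that statement concrete, here it is as an ordinary predicate in Python. The helper functions are placeholders: a real system compiles something like this into a proof circuit, and the proof convinces the verifier that the predicate holds without revealing the credential itself.

```python
from datetime import date, timedelta

# Placeholder checks; a real system evaluates these inside the proof.
def signed_by(credential, issuer_public_key) -> bool: ...
def holder_knows_secret_for(wallet_public_key) -> bool: ...
def uses_with(verifier, day) -> int: ...

def age_check_statement(credential, verifier) -> bool:
    today = date.today()
    return (
        signed_by(credential, "Hawaiian DMV key")
        and credential["exp"] > today
        # Approximate 21 years; real systems compare calendar dates.
        and credential["dob"] <= today - timedelta(days=round(21 * 365.25))
        and holder_knows_secret_for(credential["wallet_key"])
        and uses_with(verifier, today) <= 1
    )
```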
Reuse protections
The last statement in the list above, the reuse limit, provides evidence that the credential has not been shared with others. This condition, or something like it, is a necessary piece of building a zero-knowledge system. Otherwise, the same credential could be used and reused many times by multiple people.
Limiting the number of uses doesn’t guarantee that a credential isn’t shared, but it limits the number of times that it can be reused. If the credential can only be used once per day, then that is how many times the credential can be misused by someone other than the person it was issued to.
Choosing how many times a credential might be used will vary on the exact circumstances. For instance, it might not be necessary to have the same person present proof of age to an alcohol vendor multiple times per day. Maybe it would be reasonable for the store to remember them if they come back to make multiple purchases on any given day. One use per day might be reasonable on that assumption.
In practice, multiple rate limits might be used. This can make the system more flexible over short periods (to allow for people making multiple alcohol purchases in a day) but also stricter over the long term (because people rarely need to make multiple purchases every day). For example, age checks for the purchase of alcohol might combine a three per day limit with a weekly limit of seven. Multiple conditions can be easily added to the proof, with a modest cost.
It is also possible for each verifier to specify their own rate limits according to their own conditions. A single holder would then limit the use of credentials according to those limits.
Tracking usage is easy for a single holder. An actor looking to abuse credentials by sharing and reusing them has more difficulty. A bad actor would need to carefully coordinate their reuse of a credential so that any rate limits were not exceeded.
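A sketch of how a verifier might catch over-use, in the style described in footnote 11: each presentation carries a pseudorandom tag, an honest wallet never repeats one, but a credential used beyond the limit must. In a real system the tag is computed inside the zero-knowledge proof so the verifier can check it was formed correctly; the names and encoding below are illustrative.

```python
# Reuse detection via repeating tags.
import hashlib, hmac

DAILY_LIMIT = 3

def usage_tag(credential_secret: bytes, verifier: str, day: str, n: int) -> str:
    # The tag depends on (verifier, day, n mod limit): uses n = 0, 1, 2
    # give distinct tags; a fourth use that day must repeat one of them.
    message = f"{verifier}|{day}|{n % DAILY_LIMIT}".encode()
    return hmac.new(credential_secret, message, hashlib.sha256).hexdigest()

seen: set[str] = set()

def accept(tag: str) -> bool:
    if tag in seen:
        return False  # a repeated tag means the daily limit was exceeded
    seen.add(tag)
    return True
```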
Hiding the issuer of credentials
People often do not get to choose who issues them a credential. Revealing the identity of an issuer might be more identifying than is ideal. This is especially true for people who have credentials issued by an atypical issuer.
Consider that Europe is building a union-wide system of identity. That means that verifiers will be required to accept credentials from any country in the EU. Someone accessing a service in Portugal with an Estonian credential might be unusual if most people use a Portuguese credential. Even if the presentation is limited to something like age verification, the choice of issuer becomes identifying.
This could also mean that a credential that should be valid is not recognized as such by a verifier, simply because the verifier chose not to consider that issuer. Businesses in Greece might be required by law to recognize other EU credentials, but what about a credential issued by Türkiye?
Zero knowledge proofs can also hide the issuer, only revealing that a credential was issued by one of a set of issuers. This means that a verifier is unable to discriminate on the basis of issuer. For a system that operates at scale, that creates positive outcomes for those who hold credentials from atypical issuers.
Credential revocation
Perhaps the hardest problem in any system that involves the issuance of credentials is what to do when the credential suddenly becomes invalid. For instance, if a holder is a phone, what do you do if the phone is lost or stolen?
That is the role of revocation. On the Web, certificate authorities are required to have revocation systems to deal with lost keys, attacks, change of ownership, and a range of other problems. For wallets, the risk of loss or compromise of wallets might also be addressed with revocation.
Revocation typically involves the verifier confirming with the issuer that the credential issued to the holder (or the holder itself) has not been revoked. That produces a tweak to our original three-entity system as follows:
Revocation is often the most operationally challenging aspect of running identity infrastructure. While issuance might have real-time components — particularly if the issuer needs to ensure a constant supply of credentials to maintain unlinkability — credentials might be issued ahead of time. However, revocation often requires a real-time response or something close to it. That makes a system with revocation much more difficult to design and operate.
Revoking full presentations
When a full credential or more substantive information is compromised, lack of revocation creates a serious impersonation risk. The inability to validate biometrics online means that a wallet might be exploited to perform identity theft or similarly serious crimes. Being able to revoke a wallet could be a necessary component of such a system.
The situation with a complete credential presentation, or presentations that include identifying information, is therefore fairly simple. When the presentation contains identifying information, like names and addresses, preventing linkability provides no benefit. So providing a direct means of revocation checking is easy.
With verifier-issuer linkability, the verifier can just directly ask the issuer whether the credential was revoked. This is not possible if there is a need to perform offline verification, but it might be possible to postpone such checks or rely on batched revocations (CRLite is a great example of a batched revocation system). Straightforward or not, providing adequate scale and availability make the implementation of a reliable revocation system a difficult task.
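A batched check can be as simple as a published bitstring, in the spirit of status list formats: the credential carries an index, and the verifier tests one bit. A sketch follows; note that revealing the index is itself linkable, which is acceptable here because this path already assumes verifier-issuer linkability.

```python
# A toy status list: one bit per issued credential, compressed for download.
import zlib

def is_revoked(compressed_list: bytes, index: int) -> bool:
    bits = zlib.decompress(compressed_list)
    byte, bit = divmod(index, 8)
    return bool((bits[byte] >> bit) & 1)

# Issuer side: revoke credential #1234 and publish the updated list.
status = bytearray(125_000)  # room for a million credentials
status[1234 // 8] |= 1 << (1234 % 8)
published = zlib.compress(bytes(status))

assert is_revoked(published, 1234)
assert not is_revoked(published, 42)
```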
Revoking anonymous credentials
When you have anonymous credentials, which protect against verifier-issuer linkability, revocation is very challenging. A zero-knowledge assertion that the credential has not been revoked is theoretically possible, but there are a number of serious challenges. One issue is that proof of non-revocation depends on providing real-time or near-real-time information about the underlying credential. Research into solving the problem is still active.
It is possible that revocation is unnecessary for some selective disclosure cases, especially those where zero-knowledge proofs are used. We have already accepted some baseline amount of abuse of credentials, by virtue of permitting non-identifying and unlinkable presentations. Access to a stolen credential is roughly equivalent to sharing or borrowing a credential. So, as long as the overall availability of stolen credentials is not too high relative to the availability of borrowed credentials, the value of revocation is low. In other words, if we accept some risk that credentials will be borrowed, then we can also tolerate some use of stolen credentials.
Revocation complications
Even with linkability, revocation is not entirely trivial. Revocation effectively creates a remote kill switch for every credential that exists. The safeguards around that switch are therefore crucial in determining how the system behaves.
For example, if any person can ask for revocation, that might be used to deny a person the use of a perfectly valid credential. There are well documented cases where organized crime has deprived people of access to identification documents in order to limit their ability to travel or access services.
These problems are more tied to the processes that are used, rather than the technical design. However, technical measures might be used to improve the situation. For instance, SD-BLS suggests that threshold revocation be used, where multiple actors need to agree before a credential can be revoked.
All told, if dealing with revocation on the Web has taught us anything, it might not be worth the effort to add revocation. It might be easier, and no less safe, to frequently update credentials.
Authorizing Verifiers
Selective disclosure systems can fail to achieve their goals if there is a power imbalance between verifiers and holders. For instance, a verifier might withhold services unless a person agrees to provide more information than the verifier genuinely requires. That is, the verifier might effectively extort people into providing non-essential information. A system that lets people withhold information to improve their privacy is pointless if attempts to exercise that choice are not supported.
One way to work around this is to require that verifiers be certified before they can request certain information. For instance, EU digital identity laws require that it be possible to restrict who can request a presentation. This might involve the certification of verifiers, so that verifiers would be required to provide holders with evidence that they are authorized to receive certain attributes.
A system of verifier authorization could limit overreach, but it might also render credentials ineffective in unanticipated situations, including for interactions in foreign jurisdictions.
Authorizations also need monitoring for compliance. Businesses — particularly larger businesses that engage in many activities — might gain authorization for many different purposes. Abuse might occur if a broad authorization is used where a narrower authorization is needed. That requires more than a system of authorization: it means creating a way to ensure that businesses or agencies are accountable for their use of credentials.
Quantum computers
Some of these systems depend on cryptography that is only classically secure. That is, a sufficiently powerful quantum computer might be able to attack the system.
Salted hash selective disclosure relies only on digital signatures and hash functions, which makes it the most resilient to attacks that use a quantum computer. However, many of the other systems described rely on some version of the discrete logarithm problem being difficult, which can make them vulnerable. Predicting when a cryptographically-relevant quantum computer might be created is as hard as any other attempt to look into the future, but we can understand some of the risks.
Quantum computers present two potential threats to any system that relies on classical cryptographic algorithms: forgery and linkability.
A sufficiently powerful quantum computer might use something like Shor’s algorithm to recover the secret key used to issue credentials. Once that key has been obtained, new credentials could be easily forged. Of course, forgeries are only a threat after the key is recovered.
Some schemes that rely on classical algorithms could be vulnerable to linking by a quantum computer, which could present a very serious privacy risk. This sort of linkability is a serious problem because it potentially affects presentations that are made before the quantum computer exists. Presentations that were saved by verifiers could later be linked.
Some of the potential mechanisms, such as the BBS algorithm, are still able to provide privacy even if the underlying cryptography is broken by a quantum computer. The quantum computer would be able to create forgeries, but not break privacy by linking presentations.
If we don’t need to worry about forgery until a quantum computer exists, and privacy is maintained even then, we are largely concerned with how long we might be able to use these systems. That gets back to the problem of predictions and balancing the cost of deploying a system against how long the system is going to remain secure. Credential systems take a long time to deploy, so — while they are not vulnerable to a future advance in the same way as encryption — planning for that future is likely necessary.
The limitations of technical solutions
If there is a single conclusion to this article, it is that the problems that exist in identity systems are not primarily technical. There are several very difficult problems to consider when establishing a system. Those problems only start with the selection of technology.
Any technological choice presents its own problems. Selective disclosure is a powerful tool, but with limited applicability. Properties like linkability need to be understood or managed. Otherwise, the actual privacy properties of the system might not meet expectations. The same goes for any rate limits or revocation that might be integrated.
How different actors might participate in the system needs further consideration. Decisions about who might act as an issuer in the system need a governance structure. Otherwise, some people might be unjustly denied the ability to participate.
For verifiers, their incentives need to be examined. A selective disclosure system might be built to be flexible, which might seem to empower people with choice about what they disclose. However, that flexibility might be abused by powerful verifiers to extort additional information from people.
All of which is to say: better technology does not always help as much as you might hope. Many of the problems are people problems, social problems, and governance problems, not technical problems. Technical mechanisms tend to only change the shape of non-technical problems. That is only helpful if the new shape of the problem is something that people are better able to deal with.
This is different from licensing to drive, where most countries recognize driving permits from other jurisdictions. That’s probably because buying alcohol is a simple check based on an objective measure, whereas driving a car is somewhat more involved. ↩︎
The issuer might want some additional assurances, like some controls over how the credential can be accessed, controls over what happens if a device is lost, stolen, or sold, but they all basically reduce to this basic idea. ↩︎
If the presentation didn’t include information about the verifier and time of use, one verifier could copy the presentation they receive and impersonate the person. ↩︎
Rainbow tables can handle relatively large numbers of values without too much difficulty. Even some of the richer fields can probably be put in a rainbow table. For example, there are about 1.4 million people in Hawaii. All the values for some fields are known, such as the complete set of possible addresses. Even if every person has a unique value, a very simple rainbow table for a field would take a few seconds to build and around 100 MB to store, likely a lot less. A century of birthdays would take much less storage[6]. ↩︎
In practice, a century of birthdays (40k values) will have no collisions with even a short hash. You don’t need much more than 32 bits for that many values. Furthermore, if you are willing to have a small number of values associated with each hash, you can save even more space. 40k values can be indexed with a 16-bit value and a 32-bit hash will produce very few collisions. A small number of collisions are easy to resolve by hashing a few times, so maybe this could be stored in about 320kB with no real loss of utility. ↩︎
There are a few things that need care, like whether different attributes can be bound to a different wallet key and whether the attributes need to show common provenance. With different keys, the holder might mix and match attributes from different people into a single presentation. ↩︎
To continue the tortured analogy, imagine that you take a photo of the credential to present, so that the recipient can’t just scratch off the stuff that you didn’t. Or maybe you add a clear coat of enamel. ↩︎
For example, Article 5a, 16 of the EU Digital Identity Framework requires that wallets “not allow providers of electronic attestations of attributes or any other party, after the issuance of the attestation of attributes, to obtain data that allows transactions or user behaviour to be tracked, linked or correlated, or knowledge of transactions or user behaviour to be otherwise obtained, unless explicitly authorised by the user”. ↩︎
A proof can be arbitrarily complex, so this isn’t always cheap, but most of the things we imagine here are probably very manageable. ↩︎
This isn’t quite accurate. The typical approach involves the use of tokens that repeat if the credential is reused too often. That makes it possible to catch reuse, not prevent it. ↩︎