<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Low Entropy</title>
  <subtitle>Standard Considerations</subtitle>
  
  <link href="https://lowentropy.net/feed/feed.xml" rel="self"/>
  <link href="https://lowentropy.net/"/>
  <updated>2026-02-04T00:00:00Z</updated>
  <id>https://lowentropy.net/</id>
  <author>
    <name>Martin Thomson</name>
    <email>mt@lowentropy.net</email>
  </author>
  
  <entry>
    <title>Versioning JSON for APIs</title>
    <link href="https://lowentropy.net/posts/versioning-json/"/>
    <updated>2026-02-04T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/versioning-json/</id>
    <content type="html">&lt;p&gt;I see this often.
Someone comes out with a new protocol.
Almost invariably, the first examples look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;{
  &amp;quot;version&amp;quot;: &amp;quot;1.0&amp;quot;,
  &amp;quot;some&amp;quot;: [&amp;quot;protocol&amp;quot;, &amp;quot;stuff&amp;quot;, &amp;quot;...&amp;quot;]
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I’m sorry to say that my first reaction is 🤦.&lt;/p&gt;
&lt;p&gt;It’s that version field.
Version fields like that are close to useless for versioning.&lt;/p&gt;
&lt;p&gt;This post explains why.&lt;/p&gt;
&lt;h2 id=&quot;what-can-you-do-with-a-version-field%3F&quot;&gt;What can you do with a version field?&lt;/h2&gt;
&lt;p&gt;What you concretely do with a version field
is rarely documented well in specifications for new protocols.&lt;/p&gt;
&lt;p&gt;The semantic versioning model has the recipient of the document
check that it “understands” the version
according to &lt;a href=&quot;https://semver.org/&quot;&gt;semver rules&lt;/a&gt;.
In those rules, you might have something like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Each major version will have dramatically different handling,
so any major version you aren’t prepared to handle
is an error.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Minor versions can add features,
so you might have some minimum value for minor version,
but only if there are features in that version you depend on.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Patch versions are rarely signaled in protocols,
because they shouldn’t affect compatibility.
If patch version information is available,
its only real use is to work around specific implementation bugs.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In general then,
the recipient of the version field checks it.
If it checks out, they proceed;
if it is not supported, they abort.&lt;/p&gt;
&lt;p&gt;Aborting is safe, but not &lt;em&gt;useful&lt;/em&gt;.&lt;/p&gt;
&lt;h3 id=&quot;disagree-and-abort&quot;&gt;Disagree and abort&lt;/h3&gt;
&lt;p&gt;Fundamentally, these sorts of version checks are a safeguard
against a disagreement about which protocol you are speaking.&lt;/p&gt;
&lt;p&gt;A disagreement about the protocol is quite bad.
Any disagreement is highly likely to lead to bugs.
There’s a good chance those are going to be security-relevant bugs.&lt;/p&gt;
&lt;p&gt;If you are especially unlucky,
things will continue to work for a lot of people.
That might hide the problem for a while.&lt;/p&gt;
&lt;p&gt;If you are managing the evolution of a protocol
having a peer abort when there is a disagreement
about which protocol is in use
is sort of the minimum-viable protection.&lt;/p&gt;
&lt;p&gt;The possibility of an abort is all that a version field like this can deliver.
It does not help you evolve the protocol.&lt;/p&gt;
&lt;h3 id=&quot;aborting-is-not-a-migration-plan&quot;&gt;Aborting is not a migration plan&lt;/h3&gt;
&lt;p&gt;Fast-forward to the point where you have a new version to roll out.
Version 2.0 adds a bunch of shiny, new, and incompatible features.&lt;/p&gt;
&lt;p&gt;So you tell your server to start talking the new version:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;{
  &amp;quot;version&amp;quot;: &amp;quot;2.0&amp;quot;,
  &amp;quot;some&amp;quot;: {&amp;quot;new&amp;quot;: &amp;quot;protocol&amp;quot;, &amp;quot;stuff&amp;quot;: [&amp;quot;...&amp;quot;]}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Congratulations, you avoided the bugs and security nightmare.
Also, all your existing clients have stopped working.&lt;/p&gt;
&lt;p&gt;That’s not a migration plan,
that’s a plan for future headaches.&lt;/p&gt;
&lt;p&gt;The only way to avoid those headaches
is to design a migration strategy into the initial version.
That means having a way to get off that initial version
onto the next version.&lt;/p&gt;
&lt;h2 id=&quot;incremental-additions-don%E2%80%99t-need-versions&quot;&gt;Incremental additions don’t need versions&lt;/h2&gt;
&lt;p&gt;So the first step is acknowledging that –
especially with JSON –
you probably already have a simple way to add features.&lt;/p&gt;
&lt;p&gt;One of the greatest JSON features
was never formalized.
It is the ability to add new members to objects/structs/dictionaries.
It’s rarely written down,
but the fact that most software ignores anything it doesn’t understand
is amazingly powerful&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/versioning-json/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;This is the best and easiest way to evolve a JSON format.
This doesn’t require any version signaling.
You don’t need to update the minor version number for new features,
you just add the new things you need.
As long as old software that ignores the new stuff continues to work,
you can add as much as you like.&lt;/p&gt;
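&lt;p&gt;For example, a newer document might carry an extra member
alongside the original content.
(The &amp;quot;shiny&amp;quot; member here is purely illustrative.)
Old software that ignores unknown members keeps working,
while new software can act on the addition when it is present:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;{
  &amp;quot;some&amp;quot;: [&amp;quot;protocol&amp;quot;, &amp;quot;stuff&amp;quot;, &amp;quot;...&amp;quot;],
  &amp;quot;shiny&amp;quot;: {&amp;quot;new&amp;quot;: &amp;quot;feature&amp;quot;}
}
&lt;/code&gt;&lt;/pre&gt;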
&lt;p&gt;That approach is exactly like minor versioning,
except that you don’t need to signal the minor version.
Better still, implementations can use the presence or absence of new members,
rather than the version number, to decide if things are OK,
which can lead to things working more often than otherwise.&lt;/p&gt;
&lt;p&gt;Signaling of a minor version number therefore becomes utterly pointless.&lt;/p&gt;
&lt;h2 id=&quot;big-changes-don%E2%80%99t-need-versions-either&quot;&gt;Big changes don’t need versions either&lt;/h2&gt;
&lt;p&gt;Major changes that might need to be rejected by old software
are best avoided.
Still, there can come a time when incremental feature additions
have stretched the format –
and the code that handles it –
so much that you need a clean break.&lt;/p&gt;
&lt;p&gt;At that point,
you might consider leaving older software behind
and starting over with a completely redesigned format.&lt;/p&gt;
&lt;p&gt;A version field inside the format
can stop the old software from trying to use your new stuff.
Well, that assumes that old software bothers
to check the version field that was lying around unused for years.
Some won’t and that will be fun.&lt;/p&gt;
&lt;p&gt;Either way, the best case for that
is a future where you invent increasingly complex methods
for limiting how many interactions abort.&lt;/p&gt;
&lt;p&gt;A better approach is pretty straightforward:
make the version switch at a higher layer.
In many cases, the ability to switch
is already part of the systems you are using.&lt;/p&gt;
&lt;h2 id=&quot;higher-level-switches-work&quot;&gt;Higher-level switches work&lt;/h2&gt;
&lt;p&gt;In a lot of cases, putting the new format at a new URL is the best option.
It’s easy, cheap, and gives you a bunch of really interesting options
for managing the evolution of implementations and deployments.&lt;/p&gt;
&lt;p&gt;If new clients can be configured with the new URL,
that’s going to be much easier for all involved.&lt;/p&gt;
&lt;p&gt;In cases where the location of endpoints is part of the protocol,
a new field can be added to include alternative URLs.
For example:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;{
  &amp;quot;url&amp;quot;: &amp;quot;https://example.com/the/old/location&amp;quot;,
  &amp;quot;urlv2&amp;quot;: &amp;quot;https://example.com/the/new/location&amp;quot;,
  &amp;quot;...&amp;quot;: {}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This moves the version migration problem
so that it uses a well-established method for adding features.
What was a breaking change for the format
is now a minor feature addition in a different part of the system.
The hard problem has transformed into an easy one.&lt;/p&gt;
&lt;p&gt;Prefer extension points that are already in use for other purposes.
Making use of fewer, well-tested extension points
is a major lesson of &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc9170#section-4.1&quot;&gt;RFC 9170&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;in-http&quot;&gt;In HTTP&lt;/h3&gt;
&lt;p&gt;Just for completeness,
here are some ways you can do higher-level switching with HTTP.
After all, a lot of these cases involve HTTP at some level.&lt;/p&gt;
&lt;p&gt;The high-level switching pattern can be used in HTTP header fields:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-http&quot;&gt;My-App: &amp;quot;https://example.com/the/old/location&amp;quot;
My-App-v2: &amp;quot;https://example.com/the/new/location&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Or maybe:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-http&quot;&gt;My-App: url=&amp;quot;...old/location&amp;quot;, url-v2=&amp;quot;...new/location&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The same applies to anywhere you can make that switch,
but it especially applies to places
where you have easy and well-used extension points already.&lt;/p&gt;
&lt;p&gt;HTTP also offers content negotiation,
which enjoys uneven recognition among practitioners.
Still, it can be a place to use that higher-level switching practice.&lt;/p&gt;
&lt;p&gt;The advantage of content negotiation
is that you can use the same URL as before.&lt;/p&gt;
&lt;p&gt;To use content negotiation, your format is given a media type.
Your new format is given a new and different media type.
The HTTP &lt;code&gt;Accept&lt;/code&gt; header field is populated by clients
and the server chooses the format it prefers from that set.
The choice of format is conveyed using &lt;code&gt;Content-Type&lt;/code&gt; in the response.&lt;/p&gt;
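&lt;p&gt;As a sketch,
with &lt;code&gt;application/example+json&lt;/code&gt; and &lt;code&gt;application/example-v2+json&lt;/code&gt;
standing in as hypothetical media types,
the exchange might look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-http&quot;&gt;GET /resource HTTP/1.1
Accept: application/example+json, application/example-v2+json

HTTP/1.1 200 OK
Content-Type: application/example-v2+json
&lt;/code&gt;&lt;/pre&gt;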
&lt;p&gt;Requests can also use content negotiation,
though this costs a round trip if you guess wrong.
Switching URLs is a better way to manage migrating the formats that clients produce.&lt;/p&gt;
&lt;p&gt;Don’t be tempted to put a version attribute on the media type;
just define an entirely new one.
It’s far easier that way.
Content negotiation works best by selecting from a list;
attributes require special handling
that won’t be automatically managed by servers.
Also, attributes are often stripped or ignored,
which will cause them to fail when you need them.&lt;/p&gt;
&lt;h3 id=&quot;fun-times-with-ipv6&quot;&gt;Fun times with IPv6&lt;/h3&gt;
&lt;p&gt;The IPv6 migration is a great object lesson here.
IP uses an in-band version indicator:
the first four bits of the IP packet.&lt;/p&gt;
&lt;p&gt;The hope during IPv6 development was that this version indication would be enough.
Routers would drop IPv6 packets until they were taught IPv6.&lt;/p&gt;
&lt;p&gt;In practice, that failed.
Ethernet now has a distinct code (or EtherType) for IPv6&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/versioning-json/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;.
That transformed a hard migration –
teaching routers not to choke on IPv6 packets –
into one they already managed gracefully.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The &lt;a href=&quot;https://w3ctag.github.io/design-principles/#dictionaries-for-configuration&quot;&gt;web platform design principles&lt;/a&gt; do say something.
We recently updated this language.
“Dictionaries, because of how they are treated by user agents, are also relatively future-proof.
Dictionary members that are not understood by an implementation are ignored.
New members therefore can be added without breaking older code.” &lt;a href=&quot;https://lowentropy.net/posts/versioning-json/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I realize that I’m potentially getting higher and lower confused
when talking about higher layers conceptually
or in networking stacks. &lt;a href=&quot;https://lowentropy.net/posts/versioning-json/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>The Hacklore Letter and Privacy</title>
    <link href="https://lowentropy.net/posts/hacklore-privacy/"/>
    <updated>2025-12-11T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/hacklore-privacy/</id>
    <content type="html">&lt;p&gt;Before I start, go and read &lt;a href=&quot;https://www.hacklore.org/letter&quot;&gt;https://www.hacklore.org/letter&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;When it comes to endpoint security,
unless you are operating in the &lt;a href=&quot;https://www.usenix.org/system/files/1401_08-12_mickens.pdf&quot;&gt;“Mossad”&lt;/a&gt; threat model&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;,
this is solid advice.
The letter is absolutely right that the advice we used to give people about operational security practices has not aged well.&lt;/p&gt;
&lt;p&gt;However, completely rejecting some of the defunct advice might come with privacy costs.&lt;/p&gt;
&lt;p&gt;The letter’s authors seem to have given up on online privacy, which disappoints me greatly.
Privacy nihilism isn’t really a healthy attitude and it has tainted the advice.&lt;/p&gt;
&lt;h2 id=&quot;the-good-parts&quot;&gt;The Good Parts&lt;/h2&gt;
&lt;p&gt;Let’s discharge the obviously good stuff.
Items 1 (Avoid public WiFi),
3 (Never charge devices from public USB ports),
4 (Turn off Bluetooth and NFC),
and 6 (Regularly change passwords) are all very bad advice today.&lt;/p&gt;
&lt;p&gt;The only reservations I have are minor.
The advice on USB &lt;em&gt;devices&lt;/em&gt; holds for phones and devices on the smarter end (watches, tablets, e-readers, etc…).
Less so for peripherals and other USB whatsits&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;The updated advice on security practices is also pretty good.
Updates, multi-factor authentication, and password managers are the best security advice you can give people today&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h2 id=&quot;privacy-nihilism&quot;&gt;Privacy Nihilism&lt;/h2&gt;
&lt;p&gt;Unfortunately, privacy is a different story.
We exist in a world where – if they could – many companies would collect and analyze everything you do.&lt;/p&gt;
&lt;p&gt;In terms of the letter, item 5 (Regularly “clear cookies”) is basically pure nihilism.
The implication is that you can be tracked no matter what you do.&lt;/p&gt;
&lt;p&gt;I don’t subscribe to that perspective.
Fingerprinting &lt;em&gt;is&lt;/em&gt; pretty effective, but not as good as this implies.
Not everyone is uniquely identifiable through their fingerprint.
Also, browsers are making meaningful progress at making fingerprints less useful for many people.&lt;/p&gt;
&lt;p&gt;You do have to stop giving websites your email and phone number though.
It’s absolutely true that sites are using that.
Use temporary email addresses when you can&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;That said, I don’t clear cookies.
The resulting inconvenience is just not worth it.
There is absolutely no security advantage from purging cookies.
Instead, I recommend targeted use of private browsing modes, profiles, or &lt;a href=&quot;https://support.mozilla.org/en-US/kb/how-use-firefox-containers&quot;&gt;containers&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;scanning-qr-codes-is-following-a-link&quot;&gt;Scanning QR Codes is Following a Link&lt;/h2&gt;
&lt;p&gt;Item 2 in the letter is “Never scan QR codes”.
The claim is that this is bad advice.&lt;/p&gt;
&lt;p&gt;Security-wise, this is mostly true.
Sticker attacks&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt; are probably the main reason that the security situation is not perfect.
But that’s because of a more general phishing problem&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;From a pure security perspective, the letter is absolutely correct.
Opening any link in a browser is so overwhelmingly likely to be fine
that it’s not worth worrying about.
You won’t get pwned by even the most malicious link.&lt;/p&gt;
&lt;p&gt;Browser security has gotten pretty good lately.
Browsers aren’t 100% there, but you should not worry about the gap
unless you are someone who operates in that “Mossad” threat model.&lt;/p&gt;
&lt;p&gt;It’s also a bit worse if an app –
rather than your browser –
handles the link&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;.
Either way, the risks to security are pretty remote.
I don’t worry about getting poisoned by the food I buy at the supermarket;
in the same way, you should not worry about following links.&lt;/p&gt;
&lt;p&gt;The phishing problem is that you really need to trust whatever provides you with a link
if you are going to enter information at the other end&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn6&quot; id=&quot;fnref6:1&quot;&gt;[6:1]&lt;/a&gt;&lt;/sup&gt;.
Otherwise, they could send you to some place that will steal your information&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;.
That is the case though, no matter where you find the link.&lt;/p&gt;
&lt;h2 id=&quot;scanning-qr-codes-is-not-great-for-privacy&quot;&gt;Scanning QR Codes is Not Great for Privacy&lt;/h2&gt;
&lt;p&gt;Privacy-wise, QR codes are not as straightforward as the letter makes out.
If you care about privacy, sadly the old advice holds some wisdom.&lt;/p&gt;
&lt;p&gt;The privacy risk for QR codes is related to &lt;a href=&quot;https://privacycg.github.io/nav-tracking-mitigations/&quot;&gt;navigation tracking&lt;/a&gt;.
If scanning a QR code is just following a link,
then it carries the privacy cost that comes with following links in any context&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;There are small differences between links in QR codes, email&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;, or on ordinary websites, but there’s one common factor:
the site that you go to can learn everything about the place you found the link&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;
and add that to your profile.&lt;/p&gt;
&lt;p&gt;Every time you follow a link you are adding to the information
that the destination website (or app) has about your activities.&lt;/p&gt;
&lt;p&gt;QR codes are generally only placed in one physical location,
so visiting the site almost always means that you are at that location.&lt;/p&gt;
&lt;p&gt;That is, unlike links you find online,
following a QR code can take information about where you are physically located
and add that to tracking databases.&lt;/p&gt;
&lt;p&gt;Take the QR codes that restaurants use for menus and ordering.
Many restaurants outsource all the online stuff to external services.
This is fair: restaurants would probably much rather focus on making and selling food,
which is more than difficult enough.&lt;/p&gt;
&lt;p&gt;Outsourcing means that there’s a good chance that you will end up on the same site
as you visit different restaurants.
That website now has a log of the places you visited,
including details of
when you visited,
what you ate,
the size of the bill,
and whatever else the restaurant shares with them about you.
You can almost guarantee that the information they collect is for sale,
unless the terms and conditions promise otherwise&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn13&quot; id=&quot;fnref13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h2 id=&quot;avoiding-qr-code-tracking&quot;&gt;Avoiding QR Code Tracking&lt;/h2&gt;
&lt;p&gt;So if you would rather not help people build profiles about you
every time you scan a QR code,
what can you do?&lt;/p&gt;
&lt;p&gt;Personally, I only open QR codes in a private browsing window.
That way, at least the tracking sites can’t use cookies to connect your QR code visits into a single profile.
They just get isolated visits from what might be different people.&lt;/p&gt;
&lt;p&gt;To help with that, you can maybe set your default browser to one that doesn’t keep cookies,
like &lt;a href=&quot;https://www.firefox.com/en-US/browsers/mobile/focus/&quot;&gt;Firefox Focus&lt;/a&gt;,
&lt;a href=&quot;https://duckduckgo.com/app&quot;&gt;DuckDuckGo’s Browser&lt;/a&gt;,
or any browser that you set up to not keep cookies.&lt;/p&gt;
&lt;p&gt;Products could be better in this regard.
As far as I’m aware, you can’t set a different browser for QR codes on most devices&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn14&quot; id=&quot;fnref14&quot;&gt;[14]&lt;/a&gt;&lt;/sup&gt;.
For my sins, I use an iPhone&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn15&quot; id=&quot;fnref15&quot;&gt;[15]&lt;/a&gt;&lt;/sup&gt;.
&lt;a href=&quot;https://www.firefox.com/en-US/browsers/mobile/ios/&quot;&gt;Firefox iOS&lt;/a&gt; used to have a QR code scanning button,
which made it easy to switch to private browsing and open those links in a cookie- and tracking-free tab.
A recent change made scanning QR codes much more annoying&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn16&quot; id=&quot;fnref16&quot;&gt;[16]&lt;/a&gt;&lt;/sup&gt;,
so I’m still looking for a better option there.&lt;/p&gt;
&lt;p&gt;In the end, it’s easy to see why the authors of the letter have adopted a nihilistic attitude toward privacy.
Personally, I don’t accept that outcome, even if it means a little more work on my part.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;If you are, you know already. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Those devices can be vulnerable in ways your phone isn’t.
Some will allow firmware to be updated by anything they attach to.
That means they will become a risk to any machine that they are subsequently plugged in to. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I will take the opportunity to quibble about the way they present their advice on passphrases.
My advice is to let your password manager suggest a high entropy password
and only use passwords for those things that separate you from your password manager.
That’s usually just operating system login and unlocking the password manager.
Given how few of these passwords are likely needed,
suggesting passphrases over strong passwords seems largely academic.
The usability difference between a passphrase and a strong password is tiny;
the passphrase might be more memorable, but the password might be quicker to type. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://relay.firefox.com/&quot;&gt;Firefox Relay&lt;/a&gt;,
&lt;a href=&quot;https://support.apple.com/en-au/guide/icloud/mm9d9012c9e8/icloud&quot;&gt;iCloud Hide My Email&lt;/a&gt;,
and &lt;a href=&quot;https://www.fastmail.com/blog/10-things-only-privacy-conscious-people-know-about-email-aliases/&quot;&gt;Fastmail Email Aliases&lt;/a&gt;
are examples I’m aware of, but many mail providers have similar features. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This is where an original QR code is covered with a sticker directing someone to a different site.
A QR code on a parking meter for payments is a great example.
An attacker can collect parking payments – at inflated prices – for a while before the attack is noticed. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt; &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref5:1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The golden rule of the web is:
If you are going to enter information into a site,
especially when money is involved,
type its address in to get to the site&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn8&quot; id=&quot;fnref8:1&quot;&gt;[8:1]&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt; &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref6:1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt; &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref6:2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Links can also target any app that registers interest in handling URIs.
A little more so on phones than desktop computers.
Apps generally aren’t as well hardened against attack as browsers,
but they are also generally easier to defend,
because they have less functionality.
The best advice I can give there is to be careful about what apps you install.
I liken visiting a web site to a casual encounter; installing an app is much more personal.
Either way, the extent to which you are exposed to infection increases with intimacy. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Passwords especially.
You should never type passwords into a website.
That is what a password manager is for.
You should only type passwords to get to your password manager. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt; &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref8:1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Yes, this is a straight cost, not a risk.
There’s no probability involved. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;There is a very different reason not to click on links in email&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn12&quot; id=&quot;fnref12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;.
A scammer might attempt to convince you that they are someone you trust and get you to send them something you might regret.
Like your banking password or money.
This is much like the QR code sticker attack&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn5&quot; id=&quot;fnref5:1&quot;&gt;[5:1]&lt;/a&gt;&lt;/sup&gt;,
except that the attacker only has to send you mail that passes mail filters and looks legit. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;On the web, the place that shows you a link also learns that you clicked it.
This is not true for email and QR codes, but that makes very little difference privacy-wise. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn12&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Clicking on a link in email isn’t always a bad idea.
Clicking the link lets the site know that you received their message.
That’s the whole point of emails asking you to confirm that you own an email address,
so go ahead and click those.
Just make sure to close the tab immediately.
At least before you put any other information into the site&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fn6&quot; id=&quot;fnref6:2&quot;&gt;[6:2]&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref12&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn13&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Not like you could have read terms and conditions before scanning the QR code.
Or that anyone has time to read them. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref13&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn14&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I’d love to know if there are any operating systems that let you set a different app for QR code links,
that seems like it would be a useful feature. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref14&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn15&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The 13 mini remains the only phone in a reasonable form factor that is still relatively current.
All other phones are too big.
It’s a shame that most web experiences a) run on Safari and b) are awful.
The latter is the fault of sites, not so much the device. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref15&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn16&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;OK, here goes: Unlock your phone, go to the home screen.
Open Firefox, go to the tabs view, hit the Private option, open a new tab.
Switch to the camera, scan the code, tap the option to open the link.
You need to open the tab, because Firefox will use the browsing mode that was last used. &lt;a href=&quot;https://lowentropy.net/posts/hacklore-privacy/#fnref16&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>Status Update: AI Preferences</title>
    <link href="https://lowentropy.net/posts/aipref-update/"/>
    <updated>2025-10-30T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/aipref-update/</id>
    <content type="html">&lt;p&gt;AI preferences exist to give those who create and distribute content online a way to express how they would like to see that content used.&lt;/p&gt;
&lt;p&gt;This post is a detailed technical update on the present state of the work, focusing on recent developments. I attempt to explore some of the issues that arise from those discussions.&lt;/p&gt;
&lt;h2 id=&quot;background&quot;&gt;Background&lt;/h2&gt;
&lt;p&gt;This is for those who weren’t following along; others can &lt;a href=&quot;https://lowentropy.net/posts/aipref-update/#req&quot;&gt;skip to the substance&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Websites are created for many reasons, but many sites have the same motivations: to reach an audience of humans. This might be to inform, entertain, or to provide services. In many cases, gaining the attention of an audience is a complementary goal, with that attention being used to show advertisements. Those advertisements can be a significant source of revenue for site operators, supporting their ability to continue to provide other services.&lt;/p&gt;
&lt;p&gt;Automated systems, or bots, have long been a part of the web. Web crawlers in particular are those bots that seek to explore the entirety of the web. This is done for many reasons, including archival and research, but a particularly important use of crawling is in the development of web search engines.&lt;/p&gt;
&lt;p&gt;Web search engines operate an important class of crawler. These crawlers are part of providing a valuable service to sites, as the search engines they support drive traffic – and attention – to those sites. In exchange, search engines receive their own highly valuable form of attention. Search advertising is a highly lucrative business.&lt;/p&gt;
&lt;h3 id=&quot;enter-artificial-intelligence&quot;&gt;Enter Artificial Intelligence&lt;/h3&gt;
&lt;p&gt;AI has disrupted this equilibrium significantly for two reasons.&lt;/p&gt;
&lt;p&gt;AI products have displaced some of the functions of search. If the purpose of a given search is to answer a question, a chatbot or AI-generated answer is far more convenient than search. The effect of this has been a reduction in attention for certain classes of sites.&lt;/p&gt;
&lt;p&gt;The second reason is that the sites that are the source of the knowledge that AI uses need to be crawled to gain that information. This has caused a large increase in the volume of queries from AI, both to train models and to provide grounding in the use of those models. The significant increase in costs for site operators in answering requests does not result in the valuable human attention that might have been the reason for deploying a website in the first place.&lt;/p&gt;
&lt;p&gt;In other words, AI can provide a substitute for the work of websites, depriving them of support, while also adding operational costs.&lt;/p&gt;
&lt;p&gt;This is far from being a wholly bad outcome. There are good reasons for this change: AI can provide a vastly superior experience. Accurate and specific answers to questions are just one of the potential improvements. High quality sites will continue to engage audiences in meaningful ways. And many sites exist for reasons other than to attract attention.&lt;/p&gt;
&lt;p&gt;Nonetheless, change is disruptive and there are risks that need to be managed. AI is potentially subject to the biases of its creators, which can distort messages, intentionally or accidentally. AI hallucination can mean that people or organizations are misrepresented, propagating misinformation in ways that can be hard to trace. And the potential for AI to act as a substitute for the work of human writers and artists is a serious concern.&lt;/p&gt;
&lt;h3 id=&quot;existing-tools&quot;&gt;Existing Tools&lt;/h3&gt;
&lt;p&gt;Presently, there are two tools that site operators can deploy to influence how AI interacts with the content on their sites: access control and &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc9309&quot;&gt;&lt;code&gt;robots.txt&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Access control can outright block requests from crawlers. Well-behaved crawlers make requests from IP address ranges that their operators publish and use distinctive identifiers in HTTP &lt;code&gt;User-Agent&lt;/code&gt; headers. Requests that are identified in either way can be blocked. Even a malicious crawler would struggle to use IP addresses for long without being identified and blocked.&lt;/p&gt;
&lt;p&gt;Most web crawler operators also respect the &lt;code&gt;robots.txt&lt;/code&gt; file. Site operators can publish a &lt;code&gt;robots.txt&lt;/code&gt; file that describes which parts of their site are off-limits. This is little more than a polite request to crawlers, rather than an access control mechanism. A site deploying &lt;code&gt;robots.txt&lt;/code&gt; depends somewhat more on trusting the AI crawler to act responsibly.&lt;/p&gt;
&lt;p&gt;Most crawlers do respect these requests in &lt;code&gt;robots.txt&lt;/code&gt;. The risk to the reputation of an AI company that did not respect &lt;code&gt;robots.txt&lt;/code&gt; seems sufficient to ensure compliance. After all, other AI companies would not look favorably on a competitor who gave sites cause to deploy access control instead, because they might be next.&lt;/p&gt;
&lt;p&gt;However, these are both crude tools. By convention, crawlers use their crawler name&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/aipref-update/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; to identify the purpose of their crawling. If a company crawls for several reasons, they need to crawl multiple times, each time with a different crawler name. Sites can then use &lt;code&gt;robots.txt&lt;/code&gt; to target the specific crawlers they approve of. Effective use of &lt;code&gt;robots.txt&lt;/code&gt; therefore requires that sites identify the many different crawlers that exist and understand the purpose of each before they can set the scope that each can crawl.&lt;/p&gt;
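&lt;p&gt;To illustrate, a site that welcomes a search crawler but asks an AI training crawler to stay away might publish rules like the following (the crawler names here are invented for illustration):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-text&quot;&gt;# Welcome a hypothetical search crawler everywhere.
User-Agent: ExampleSearchBot
Allow: /

# Ask a hypothetical AI training crawler to stay out.
User-Agent: ExampleAIBot
Disallow: /

# Everything else gets the default rules.
User-Agent: *
Allow: /
&lt;/code&gt;&lt;/pre&gt;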
&lt;p&gt;New crawlers, which are added all the time, only get default instructions until the site operator learns about them. This can overly constrain use for novel purposes by restricting more than necessary. Alternatively, setting more permissive defaults could endorse uses that the site might not intend.&lt;/p&gt;
&lt;h3 id=&quot;how-ai-preferences-might-help&quot;&gt;How AI Preferences Might Help&lt;/h3&gt;
&lt;p&gt;The AI preferences work aims to address this by giving site operators a means to directly express preferences about how the content they serve might be used.&lt;/p&gt;
&lt;p&gt;AI preferences are a collection of statements about different categories of use that are associated with a given piece of content (or asset). Sites can express a positive preference that indicates permission to use the content for that purpose, they can express a negative preference that requests that the usage not be applied to that content, or make no information about their preferences known.&lt;/p&gt;
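&lt;p&gt;One way to picture this tri-state model is as a mapping from categories of use to preferences, where saying nothing is distinct from granting permission. A minimal sketch in Python (the category labels are placeholders, not the normative vocabulary):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from enum import Enum

class Preference(Enum):
    ALLOW = &amp;quot;allow&amp;quot;        # positive: this use is welcome
    DISALLOW = &amp;quot;disallow&amp;quot;  # negative: please do not use the content this way
    UNSTATED = &amp;quot;unstated&amp;quot;  # no preference expressed either way

# Hypothetical category labels, for illustration only.
asset_prefs = {
    &amp;quot;search&amp;quot;: Preference.ALLOW,
    &amp;quot;train-ai&amp;quot;: Preference.DISALLOW,
}

def preference_for(prefs, category):
    # Absence of a statement is not permission; it is simply unstated.
    return prefs.get(category, Preference.UNSTATED)
&lt;/code&gt;&lt;/pre&gt;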
&lt;p&gt;Sites &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-aipref-attach-04&quot;&gt;associate AI preferences with content&lt;/a&gt; using content metadata, HTTP headers, or their &lt;code&gt;robots.txt&lt;/code&gt; file.&lt;/p&gt;
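&lt;p&gt;Concretely, this allows a preference to travel either with crawling rules or with the asset itself. The sketch below is illustrative only: the vocabulary tokens are placeholders, and the exact syntax remains subject to change as the drafts evolve:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-text&quot;&gt;# In robots.txt, alongside crawling rules:
User-Agent: *
Content-Usage: train-ai=n, search=y
Allow: /

# Or as an HTTP response header on the asset itself:
Content-Usage: train-ai=n, search=y
&lt;/code&gt;&lt;/pre&gt;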
&lt;h4 id=&quot;standard-terms&quot;&gt;Standard Terms&lt;/h4&gt;
&lt;p&gt;A set of &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-aipref-vocab-04&quot;&gt;standard definitions for content use&lt;/a&gt; ensures that site operators do not need a comprehensive understanding of what every web crawler is used for. Preferences can be expressed in common terms that crawlers – both existing and newly introduced – can understand.&lt;/p&gt;
&lt;p&gt;Companies that crawl the web for multiple reasons can unify their crawling. Content can be acquired once and different applications can limit their use to the content that has compatible preferences.&lt;/p&gt;
&lt;p&gt;The cost of this is that preferences must draw from a limited vocabulary of terms. A limited vocabulary is an asset in making the terms understandable to those who will express their preferences. Developers are also better able to write software that respects preferences.&lt;/p&gt;
&lt;p&gt;Importantly, common definitions make it more likely that different entities can agree on what each term means, even as new applications are developed.&lt;/p&gt;
&lt;p&gt;Around a common vocabulary, the goal is to allow preferences to be expressed in several ways. That way, the preferences mean the same thing, no matter how they are expressed. Content metadata can use forms of expression that fit the metadata idioms of a format, while the meaning remains consistent with the core vocabulary.&lt;/p&gt;
&lt;h4 id=&quot;just-preferences&quot;&gt;Just Preferences&lt;/h4&gt;
&lt;p&gt;A key aspect of the design of AI preferences is their discretionary nature. Preferences are not an access control system. They have no means to force crawlers into compliance. Just like &lt;code&gt;robots.txt&lt;/code&gt;, AI preferences rely on AI crawlers choosing to respect their requests.&lt;/p&gt;
&lt;p&gt;In choosing a design that includes preferences, the work recognizes that sites expressing preferences cannot perfectly anticipate the conditions where content might be used. Sometimes there are overriding interests that will mean that the right choice is to ignore the preference that was expressed. In developing the standard, a diverse set of reasons that might justify ignoring preferences was discussed, including accommodating accessibility needs, public interest, research, identifying illegal or abusive content, archival, and more.&lt;/p&gt;
&lt;p&gt;In choosing to use a design that expresses preferences, rather than a stricter prohibition, sites do rely more on trust than is even necessary with &lt;code&gt;robots.txt&lt;/code&gt;. With &lt;code&gt;robots.txt&lt;/code&gt;, a crawler might be caught out when it requests content after it was asked not to. In comparison, AI preferences say nothing about whether content is fetched or not, only about how it might be used, once obtained.&lt;/p&gt;
&lt;p&gt;Site operators therefore have no obvious way of checking that their preferences are respected. For AI applications, models can leak their inputs, but AI developers already seek to prevent that, less to avoid being caught out than to ensure that their models are capable and flexible. That means that using AI preferences is also an expression of trust in those who seek to use content. Sites are trusting AI companies to respect their preferences and to exercise good judgment in determining whether to override those preferences.&lt;/p&gt;
&lt;p&gt;Finally, choosing preferences also acknowledges that there is no widely agreed legal protection involved. Different jurisdictions are working through the implications of AI on their copyright laws, but we cannot predict the outcome of those processes. It’s possible that some laws will have something to say about how expressions of preference need to be treated, but nothing is settled.&lt;/p&gt;
&lt;p&gt;&lt;a name=&quot;req&quot;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;what-site-operators-(seem-to)-want&quot;&gt;What Site Operators (Seem To) Want&lt;/h2&gt;
&lt;p&gt;Though we have no formally agreed requirements, there has been a lot of discussion about these.  A recap is probably in order.&lt;/p&gt;
&lt;aside&gt;
Note that I&#39;m going to talk about &lt;em&gt;websites&lt;/em&gt; as the primary source of these requirements. This is because our primary focus is on mechanisms that involve the acquisition of content from sites. You should be able to replace “site” with other labels. Those familiar with copyright law should probably be a little cautious about the use of “rightsholder” though; none of the things we&#39;re building are able to establish that the entity expressing preferences holds any rights.
&lt;/aside&gt;
&lt;p&gt;Sites are not all the same in their goals with respect to preferences. However, several key themes have emerged through the process of developing the standard.&lt;/p&gt;
&lt;h3 id=&quot;preferences-about-all-uses&quot;&gt;Preferences About All Uses&lt;/h3&gt;
&lt;p&gt;A blanket approval or disapproval preference is contested, but a number of parties have indicated that it would be a useful, albeit insufficient, capability.&lt;/p&gt;
&lt;p&gt;The most common criticism of this is that it is “too broad” or an imprecise instrument. That’s a poor argument: we cannot tell someone that they cannot hold this preference. Moreover, this is a preference that is trivial to define.&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/aipref-update/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;But there is a difficult question to answer at the heart of this objection: saying “no” unconditionally precludes unexpected and unforeseen uses. That’s a concern that cuts two ways.&lt;/p&gt;
&lt;p&gt;AI was an unanticipated use for content before it became wildly successful. Developing foundational AI models would have been harder to achieve if a system of preferences existed that constrained the use of content for AI.&lt;/p&gt;
&lt;p&gt;If you regard AI as a good thing, the potential for a lot of content to be withheld from models as a result of a broad preference could be bad. While there is a lot of content that would remain available, including public domain content, having some content excluded might reduce the quality of models.&lt;/p&gt;
&lt;p&gt;Others might view the use of content by AI companies as extractive, taking content without compensation to those who created it. More so because it might be seen to enable the creation of work that competes in the market for the content that was taken. Had there been a way to express preferences about this unexpected use, perhaps that could be a starting point from which to negotiate for fair compensation.&lt;/p&gt;
&lt;p&gt;For those who might outright oppose AI, it would be unrealistic to expect that preferences could have prevented or significantly delayed the creation of AI. There is simply too much content available for training.&lt;/p&gt;
&lt;p&gt;The question of whether to include the option of stating a preference about all conceivable uses remains contested. Perhaps we need to accept that technical standards are not well-suited to addressing this conflict and leave the resolution to legislative or regulatory bodies. That might argue for including a broad category of use, leaving it to the law to determine how preferences about that category might take effect.&lt;/p&gt;
&lt;h3 id=&quot;model-training&quot;&gt;Model Training&lt;/h3&gt;
&lt;p&gt;The ability of AI models to reproduce their inputs, in whole or part, is a major concern for sites, artists, writers, and others.&lt;/p&gt;
&lt;p&gt;One concern is direct reproduction of content by models. This could directly violate copyright, which is another reason that AI companies seek to avoid having this happen.&lt;/p&gt;
&lt;p&gt;There are also aspects of works that are not protected by copyright, such as style. Seeking to protect style from reproduction might motivate the withholding of content from training.&lt;/p&gt;
&lt;p&gt;Finally, there is a view that AI companies profit from the models they create using content that was published to the web for human consumption. Some actors might withhold content in the hopes that those performing model training might be willing to pay for its use.&lt;/p&gt;
&lt;p&gt;To that end, a common request is to have a way to request that work not be included in model training.&lt;/p&gt;
&lt;h3 id=&quot;appearing-in-search-only&quot;&gt;Appearing in Search Only&lt;/h3&gt;
&lt;p&gt;The most common request is that sites be able to express a preference not to be used for model training or to have their content otherwise processed by automated systems, with the exception of search engines. Having content be discoverable through search applications is valued by many site operators.&lt;/p&gt;
&lt;p&gt;In expressing a preference for only search, site operators seem to have two major concerns:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The substitutive effect of things like AI generated overviews and content, whether that be part of search applications, chatbots, or other modalities. For sites that depend on attention, their concern is that generated answers to questions cause a reduction in people visiting their site to seek answers. For artists, they might seek to avoid the generation of content that acts as a substitute for their own creative efforts.&lt;/li&gt;
&lt;li&gt;The reputation risk that comes from misrepresentation, either due to misinformation in model training sets affecting outputs or due to the propensity of AI systems to hallucinate.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For the first, the question regarding any substitutive effect is partly a legal question. Of course, the scope of copyright law is definitely not a question for technical standards to resolve. The effort to define preferences for AI usage cannot answer that question; at best, a standard can only provide regulators with more options.&lt;/p&gt;
&lt;p&gt;The real question for standards is whether this is a coherent preference to express. In answering that question, the challenge is to construct a definition that is clear, comprehensible, and implementable.&lt;/p&gt;
&lt;p&gt;Making something implementable turns out to be especially difficult, because search engines use AI in multiple ways. While the details are intentionally obscure&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/aipref-update/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;, the operation of a search product involves both training and use of models. If content is excluded from any aspect of processing – training, inference, or otherwise – search providers could not guarantee that the content would be appropriately ranked.&lt;/p&gt;
&lt;p&gt;In discussion, it was also pointed out that there is no meaningful distinction between what people think of as “traditional” search and chatbots. These applications exist on a continuum and we can expect points of difference to continue to be less distinct as providers experiment with new features.&lt;/p&gt;
&lt;p&gt;That complication rules out simple definitions that focus on either the training of a model (that is, the process of producing model weights) or the use of trained models (or what is called inference) as part of a system.&lt;/p&gt;
&lt;h2 id=&quot;foundation-model-production-preference&quot;&gt;Foundation Model Production Preference&lt;/h2&gt;
&lt;p&gt;This is a relatively simple change. The idea is to collapse the two existing model training categories into one. That one would only address the creation of a foundation model.&lt;/p&gt;
&lt;p&gt;Like the other categories being proposed, the goal is to focus on outputs. The output in this case is a foundation model.&lt;/p&gt;
&lt;p&gt;Aside from a shift in emphasis, this change addresses some definitional challenges that were identified with the existing AI training category. It has become clear that there is no crisp delineation between what might be regarded as simple statistical techniques – such as logistic regression or ordinary least squares – and AI. Even questions of scale are not helpful when you consider that “classic” statistical models can still be enormous, such as the meteorological models used to predict weather.&lt;/p&gt;
&lt;p&gt;In comparison, there are many well-established definitions for foundation models that all broadly agree. Producing a category of use where the output is a foundation model – or fine-tuned foundation model – seems like an approach that could get support.&lt;/p&gt;
&lt;p&gt;There’s an open question about whether various fine-tuning techniques are included in the definition. One interpretation says that the output would have to be a foundation model, because that is what has the general purpose capabilities. A fine-tuned model might be specialized to a single purpose.&lt;/p&gt;
&lt;p&gt;The proposed definition includes fine tuning, mostly because a fine-tuned model is likely to inherit most of the capabilities of the tuned foundation model. Also, this expansion to the definition closes a loophole, where applying a small amount of fine tuning could be used to avoid an obligation to respect a negative preference.&lt;/p&gt;
&lt;p&gt;This still leaves several questions unresolved, such as the question about whether techniques like low-rank adaptation (LoRA), which don’t alter the parameters of a foundation model, would – or should – fit this definition.&lt;/p&gt;
&lt;h2 id=&quot;addressing-the-search-preference&quot;&gt;Addressing the Search Preference&lt;/h2&gt;
&lt;p&gt;The goal is to enable more granular expressions of preference, such that different entities can express a preference to either:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;exclude all uses except search; or&lt;/li&gt;
&lt;li&gt;exclude model training but allow search.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In doing so, we need to account for the fact that search applications use AI.&lt;/p&gt;
&lt;p&gt;This effort might consider other statements of preference, but these are the two primary use cases to address.&lt;/p&gt;
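&lt;p&gt;Assuming a blanket category exists alongside narrower ones, the two cases differ in what is left unstated. Roughly, and with purely hypothetical tokens:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-text&quot;&gt;# Exclude all uses except search:
Content-Usage: all=n, search=y

# Exclude model training, allow search, leave other uses unstated:
Content-Usage: train-ai=n, search=y
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first expression denies anything not explicitly permitted; the second denies only training and says nothing about other uses.&lt;/p&gt;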
&lt;h3 id=&quot;the-original-search-definition&quot;&gt;The Original Search Definition&lt;/h3&gt;
&lt;p&gt;Rather than approach the problem from a procedural perspective, the first attempt at a definition for “search” looked at outcomes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Using one or more assets in a search application that directs users to the location from which the assets were retrieved.&lt;/p&gt;
&lt;p&gt;Search applications can be complex and may serve multiple purposes. Only those parts of applications that direct users to the location of an asset are included in this category of use. This includes the use of titles or excerpts from assets that are used to help users select between multiple candidate options.&lt;/p&gt;
&lt;p&gt;Preferences for the Search category apply to those parts of applications that provide search capabilities, regardless of what other preferences are stated.&lt;/p&gt;
&lt;p&gt;Parts of applications that do not direct users to the location of assets, such as summaries, are not covered by this category of use.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The strength of this definition is that it says nothing about how the identified outcomes are achieved. However, there are several shortcomings:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What it means to “direct users” to a location is unclear. The fairly obvious question is whether providing a link suffices or, if not, what would be sufficient to meet this condition.&lt;/li&gt;
&lt;li&gt;There are also questions about the practicality of dividing search applications in the imagined manner. To some extent, the content presented in search applications could be linked together in non-obvious ways. For instance, content in AI overviews might be generated based on the content of pages that are linked from other parts of the page.&lt;/li&gt;
&lt;li&gt;This category was defined to be a subset of an “AI Use” category, which excluded model training. What we heard from search providers is that this does not reflect common practice in search applications, where model training is an integral part of the application.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This might not be a complete enumeration of the reservations. These specific concerns might be overcome with more work. In practice, it became clear from discussions that an alternative approach was more appealing to a number of participants.&lt;/p&gt;
&lt;h3 id=&quot;ai-output&quot;&gt;AI Output&lt;/h3&gt;
&lt;p&gt;That approach is based on &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-madhavan-aipref-displaybasedpref-01&quot;&gt;a proposal&lt;/a&gt; from Microsoft that suggested a focus on “display-based” preferences. This proposal suggested that generative AI training remain in the vocabulary, but the bulk of the proposal was focused on system outputs.&lt;/p&gt;
&lt;p&gt;This proposal identified a number of controls that existing search engines respect with respect to how content is handled in the presentation of search results. These are largely just &lt;a href=&quot;https://www.bing.com/webmasters/help/which-robots-metatags-does-bing-support-5198d240&quot;&gt;existing preference signals&lt;/a&gt; specific to search, but the proposal introduced some new ideas.&lt;/p&gt;
&lt;p&gt;These ideas aren’t intended to be limited to search applications; search is just where the concepts originate. The goal is to apply the concepts to any AI application; or, if we just view AI as a new way of building software, any system that performs computation.&lt;/p&gt;
&lt;p&gt;The key idea behind this preference is to avoid constraining use that is not observable outside of the system. These preferences do not care how the system is implemented; they only relate to what the system produces in its output.&lt;/p&gt;
&lt;p&gt;If the preferences do not say anything about the internal processing that systems perform, search applications could perform ranking on content that won’t be displayed.  AI applications would be able to process content internally as long as the content is not used in producing outputs.&lt;/p&gt;
&lt;p&gt;What that means precisely is an important question that is discussed below, but the general shape is approximately: if the preference is to allow this new category of use, then the system can use the content in its output; if the preference is to disallow this usage, the content isn’t used in producing outputs.&lt;/p&gt;
&lt;h4 id=&quot;necessary-training&quot;&gt;Necessary Training&lt;/h4&gt;
&lt;p&gt;A key component of this approach is that it explicitly allows model training. This recognizes that the training of models is an integral part of the operation of this class of application. In effect, the preference would not distinguish between training a model, using a trained model, or even non-AI uses of content.&lt;/p&gt;
&lt;h4 id=&quot;exact-text-match-or-%E2%80%9Csearch%E2%80%9D&quot;&gt;Exact Text Match or “Search”&lt;/h4&gt;
&lt;p&gt;The concept of “exact text match” is the main concession to “traditional” search in the proposal. When that option is chosen, either excerpts of content are presented in outputs or the content is not presented at all.&lt;/p&gt;
&lt;p&gt;Microsoft’s proposal for “exact text match” included an extra stipulation: in addition to the usage being limited to excerpts from the content, the output needed to include a link back to the content.&lt;/p&gt;
&lt;p&gt;The combination of verbatim excerpts and links is intended to reproduce the concept of “traditional” search, while allowing all of the internal processing that search applications rely on.&lt;/p&gt;
&lt;p&gt;Without this condition, a preference to allow AI output would result in content being used by any AI system to produce outputs. This would include everything from chatbots to search. This extra condition ensures that output does not reinterpret content, but presents it verbatim.&lt;/p&gt;
&lt;h3 id=&quot;challenges-to-resolve-with-ai-output&quot;&gt;Challenges to Resolve with AI Output&lt;/h3&gt;
&lt;p&gt;As with any change, the positives that motivate the change come with a number of new problems to resolve.&lt;/p&gt;
&lt;p&gt;My goal here is to identify the issues that are central to disagreements. That is, the problems we need to address in order to make progress. No doubt there are many other issues that I haven’t identified.&lt;/p&gt;
&lt;h4 id=&quot;%E2%80%9Csearch%E2%80%9D-category-naming&quot;&gt;“Search” Category Naming&lt;/h4&gt;
&lt;p&gt;In the current proposed vocabulary, the “exact text match” category has been tentatively labeled “search”.  This is because it is intended to address this very narrow concept of “traditional” search where the output of the system is limited to links and context.&lt;/p&gt;
&lt;p&gt;Having labels that are widely understood is a very important goal. However, that also requires that the label is a good match to the thing it applies to.&lt;/p&gt;
&lt;p&gt;This choice of label has received some criticism, largely on the grounds that it misrepresents the value that modern search products provide. In practice, those services we think of as providing search have evolved to provide so much more than this basic means of discovering content.&lt;/p&gt;
&lt;p&gt;Resolving this means dealing with this tension.&lt;/p&gt;
&lt;h4 id=&quot;category-nesting-and-spelling&quot;&gt;Category Nesting and Spelling&lt;/h4&gt;
&lt;p&gt;The “exact text match” category is presently defined as a subcategory of a more general AI output category. This makes sense in that the behavior described is a strict subset.&lt;/p&gt;
&lt;p&gt;This differs subtly from some of the nesting in other parts of the vocabulary. As a result, this nesting might not be the best technical fit.&lt;/p&gt;
&lt;p&gt;Consider the relationship between an overarching category and a foundation model training category. It is clear that there are uses within the broader category that are not foundation model training, such that it makes sense to express preferences in any of the four possible combinations.&lt;/p&gt;
&lt;p&gt;For AI output, allowing the broader category leaves no part of the subcategory excluded. It makes almost no sense to indicate a preference to allow AI output while disallowing the narrower category. The result is almost nonsensical: would it be forbidden to link to content? or, would this require reinterpretation rather than including snippets?&lt;/p&gt;
&lt;p&gt;This is the only nonsensical combination of preferences that might arise from this arrangement. This could be managed by saying that the exclusion has no effect.&lt;/p&gt;
&lt;p&gt;We might instead seek a different way of spelling these preferences. For instance, the first design discussed for this included placing conditions on a preference to allow AI output. Two conditions could be attached: “link” and “quote”, each representing a distinct restriction on use. This approach is moderately more complex in its design&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/aipref-update/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;, but the option remains.&lt;/p&gt;
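&lt;p&gt;To make the difference in spelling concrete, the two designs might look roughly like this; both spellings are hypothetical, not agreed syntax:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-text&quot;&gt;# Nested subcategory spelling: the narrow category is its own term.
Content-Usage: ai-output=n, search=y

# Condition-based spelling: one category, with conditions attached.
Content-Usage: ai-output=y;link;quote
&lt;/code&gt;&lt;/pre&gt;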
&lt;p&gt;Questions of spelling are normally relatively easy to resolve. The challenge here is in determining whether disagreements on the superficial matter mask real disagreements in principle.&lt;/p&gt;
&lt;h4 id=&quot;isolating-outputs&quot;&gt;Isolating Outputs&lt;/h4&gt;
&lt;p&gt;The core advantage of the proposal is that it doesn’t matter how the internals of the system are implemented, but it is still necessary to know that it is possible to respect preferences in practice.&lt;/p&gt;
&lt;p&gt;If the preference only applies to output, an important question arises: how separable is any internal processing from the production of output? That is, if a preference requests linking and excerpts, how does a system that uses content for internal processing ensure that the content does not affect outputs, except as requested?&lt;/p&gt;
&lt;p&gt;For a search application that seeks to produce a list of links, it might be possible to keep the process of ranking items separate from any presentation of those items. It seems reasonable to assume that content with preferences to disallow AI output could be presented using non-AI methods.&lt;/p&gt;
&lt;p&gt;But how would a chat bot ensure similar isolation? Even if separate models are used for internal processing and output, what mechanism would ensure that the output stage is isolated from the internal stage? The internal processing stage needs to communicate its conclusions and instructions to the output stage, which risks carrying information about the content it processes.&lt;/p&gt;
&lt;p&gt;The same applies whether the internal model is trained on the affected content, or whether that content is only provided as a reference at inference time.&lt;/p&gt;
&lt;p&gt;An alternative interpretation is that this internal processing and training is only permitted if either category – AI output or search – is permitted.  This is a much narrower interpretation, but it avoids questions about isolation.&lt;/p&gt;
&lt;p&gt;The answer to this question was not clear in Microsoft’s proposal. It is perhaps the most important question to resolve with respect to this proposed path.&lt;/p&gt;
&lt;h4 id=&quot;model-separation&quot;&gt;Model Separation&lt;/h4&gt;
&lt;p&gt;One potential consequence of allowing model training as part of an AI output category is that models will be trained (or fine-tuned) on content that might otherwise have an associated preference to disallow the production of models.&lt;/p&gt;
&lt;p&gt;The consequence is that the models produced cannot be made available for other purposes if the content they use does not permit the production of models. This seems manageable for applications that are deployed today, where the internals of their operation are often closed. However, the impact on more open systems is unclear.&lt;/p&gt;
&lt;p&gt;For instance, if multiple actors cooperate to deliver a service, do these output-based preferences apply at the boundary of each system? Or do they only apply to the overall system? How might preferences need to be propagated in either case?&lt;/p&gt;
&lt;h4 id=&quot;the-role-of-human-users&quot;&gt;The Role of Human Users&lt;/h4&gt;
&lt;p&gt;This question of system boundaries is most relevant when asking whether “output” is defined in relation to what is presented to a human user. We need to determine whether preferences can relate only to the output of a given system, rather than requiring human interaction. In one model, the responsibility for respecting a preference ends at the boundary of the system that the preference-respecting entity controls. Including humans in a definition is more challenging.&lt;/p&gt;
&lt;p&gt;Having definitions that seek to apply only at the point that something is presented to a human is appealing. However, that depends on having some expectation of humanity, where identifying clients as human is increasingly a contested problem. In addition to bots, agentic browsing will fundamentally change how sites are interacted with. This is all directly on behalf of users, but with the possibility that humans are not responsible for inputs and do not see outputs.&lt;/p&gt;
&lt;p&gt;Is it sufficient to present outputs on an expectation that the recipients are human, even when you know that is not assured? Similarly, is it reasonable to allow sites to assume that their inputs come from human users?&lt;/p&gt;
&lt;h2 id=&quot;no-easy-path-to-success&quot;&gt;No Easy Path to Success&lt;/h2&gt;
&lt;p&gt;Given the discussion thus far, it is clear that there is a lot more discussion needed before we can declare success. What encourages me is that there seems to be active engagement on the central problem: finding ways to address the expressed requirements on preferences in a way that can be reliably implemented in AI systems.&lt;/p&gt;
&lt;p&gt;That is made harder by our insistence on reaching consensus. Discussions thus far have shown that there are gaps in understanding, communication, and trust to be crossed. From here, we try to reach an understanding on the principles, then build on that to develop a solution.&lt;/p&gt;
&lt;p&gt;Success could provide people seeking to balance competing interests with more options. Though there’s no guarantee that this works out, it is worth the effort.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Thanks to Paul Keller and Mark Nottingham for their feedback on drafts of this post.&lt;/em&gt;&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This is not the thing that is sent in the HTTP &lt;code&gt;User-Agent&lt;/code&gt; header, but the one that the bot self-identifies with and the one that is listed in &lt;code&gt;robots.txt&lt;/code&gt;. The inconsistency is maddening, but like many things on the internet, it’s not worth getting too worked up about, because it isn’t going to change. &lt;a href=&quot;https://lowentropy.net/posts/aipref-update/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;We currently haven’t discussed the principles that might apply to defining categories of use, but this response suggests some basic guidelines: 1. Is there a reason to express a preference? 2. Can the principle be defined? 3. Can the distinction be made in real systems?  (or: can it be implemented?) &lt;a href=&quot;https://lowentropy.net/posts/aipref-update/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This is in part so that it isn’t trivial to attack the system to affect search rankings. &lt;a href=&quot;https://lowentropy.net/posts/aipref-update/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;In addition to defining how to manage parameters, we’d have to resolve what it means to attach conditions to a preference to disallow AI output. &lt;a href=&quot;https://lowentropy.net/posts/aipref-update/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>Expanding what HTTPS means</title>
    <link href="https://lowentropy.net/posts/local-https/"/>
    <updated>2024-12-24T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/local-https/</id>
    <content type="html">&lt;p&gt;So you have a device, maybe IoT, or just something that sits in a home somewhere.
You want to be able to talk to it with HTTPS.&lt;/p&gt;
&lt;p&gt;Recall &lt;a href=&quot;https://en.wikipedia.org/wiki/Zooko&#39;s_triangle&quot;&gt;Zooko’s “meaningful, unique, decentralized” naming trichotomy&lt;/a&gt;.
HTTPS chooses to drop “decentralized”,
relying on DNS as central control.&lt;/p&gt;
&lt;p&gt;In effect, HTTPS follows a pretty narrow definition.
To offer a server that works,
you need to offer a &lt;acronym title=&quot;Transport Layer Security&quot;&gt;TLS&lt;/acronym&gt; endpoint
that has a certificate that meets
a pretty &lt;a href=&quot;https://cabforum.org/working-groups/server/baseline-requirements/documents/&quot;&gt;extensive set of requirements&lt;/a&gt;.
To get that certificate,
you need a name that is uniquely yours,
according to the DNS&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/local-https/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h2 id=&quot;unique-names&quot;&gt;Unique names&lt;/h2&gt;
&lt;p&gt;It is entirely possible to assign unique names to devices.
There’s an awful lot of IoT thingamabobs out there,
but there are far more names than we could ever use.
Allocation can even be somewhat decentralized
by having manufacturers manage the assignment&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/local-https/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;The problem with unique names for IoT devices
is that they are probably not going to be memorable (thanks Zooko).
I don’t know about you,
but &lt;code&gt;printer.&amp;lt;somehash&amp;gt;.service-provider-cloud.example&lt;/code&gt; isn’t exactly convenient.
Still, this is a system that is proven to work in real deployments.&lt;/p&gt;
&lt;p&gt;If we want to make this approach work, maybe it just needs adapting.
Following this approach, the problems we’d be seeking to solve are approximately:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;How to make the names more manageable.
For instance, how you manage to securely distribute search suffixes is a significant problem.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;How to distribute certificates.
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc8555&quot;&gt;ACME&lt;/a&gt; is an obvious choice,
but what does the device talk to?
Obviously, there is some need for something to connect to the big bad Internet,
but how and how often?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Whether rules about certificates that apply to big bad Internet services fit in these contexts.
Is it OK that you need to get fresh certificates every 45 days?
How do Certificate Transparency requirements fit in this model?
Does adding lots of devices to the system lead to scaling problems?&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These problems all largely look like operational challenges.
Any protocol engineering toward this end would be aimed at smoothing over the bumps.
Many of the questions even seem to have fairly straightforward answers.&lt;/p&gt;
&lt;p&gt;I don’t want to completely dismiss this approach as infeasible,
but it seems clear that there are some pretty serious impediments.
After all,
nothing has really prevented someone from deploying systems this way.
Many have tried.
That few have succeeded&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/local-https/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;
is perhaps evidence in support of it being too hard.&lt;/p&gt;
&lt;h3 id=&quot;.onion-names&quot;&gt;.onion names&lt;/h3&gt;
&lt;p&gt;Tor’s solution to this problem is making names self-authenticating.
You take a public key
(something for which no one else can produce a valid signature)
and that becomes your identity.
Your server name becomes a hash of that public key.
Of course, “&amp;lt;somelongstring&amp;gt;.onion” as a name is definitely not user-friendly.
You won’t want to be typing that name into an address bar&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/local-https/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
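&lt;p&gt;As a rough sketch of the idea – simplified, because modern “.onion” addresses actually encode the key itself, along with a checksum and version, rather than a bare hash – a self-authenticating name can be derived like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import hashlib

# Sketch only: derive a name from a public key.
# Anyone can recompute the hash to check that a name
# matches the key a server presents; only the key holder
# can produce signatures that match the name.
def self_authenticating_name(public_key):
    return hashlib.sha256(public_key).hexdigest()[:32] + &amp;quot;.onion&amp;quot;
&lt;/code&gt;&lt;/pre&gt;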
&lt;p&gt;That use of a name that is bound to a key
recognizes that the identity of the service is bound to its name.
In the world of DNS names,
that binding is extrinsic and validated by a CA.
In Tor, that binding is intrinsic:
the name itself carries the binding.&lt;/p&gt;
&lt;p&gt;Tor requires that endpoints follow different rules to the rest of the uniquely-named servers.
Those rules include a particular protocol and deployment.
Because those rules are a bit onerous,
only a few systems are able to resolve “.onion” names.
However, this approach does suggest
that maybe there is an expansion to the definition of HTTPS
that can be made to work.&lt;/p&gt;
&lt;h3 id=&quot;.local-with-cryptographically-bound-names&quot;&gt;.local with cryptographically bound names&lt;/h3&gt;
&lt;p&gt;The same concept as Tor could be taken to local names.
Using “&amp;lt;somehash&amp;gt;.local” could be an option&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/local-https/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;.
The idea being that the name is verified differently, but still unique.&lt;/p&gt;
&lt;p&gt;A name that is cryptographically verified
means that you could maybe drop some of the requirements
you might otherwise apply to “normal” names.&lt;/p&gt;
&lt;p&gt;The trick here is that you are asking clients to change a fair bit.
Maybe less than Tor demands,
but they still need to recognize the difference.
Servers also need to understand that their name has changed.&lt;/p&gt;
&lt;p&gt;The biggest problem with relying on unique names remains:
these aren’t going to be easy to remember and type.&lt;/p&gt;
&lt;h3 id=&quot;nicknames&quot;&gt;Nicknames&lt;/h3&gt;
&lt;p&gt;One approach for dealing with ugly names is to add nicknames.
In a browser, you might have a bookmark labeled “printer”,
which navigates to your printer at “&amp;lt;somehash&amp;gt;.local”.
Or maybe you edit &lt;code&gt;/etc/hosts&lt;/code&gt; to add a name alias.&lt;/p&gt;
&lt;p&gt;Either way, usability depends on the creation
of a mapping from the friendly name to the unfriendly one.
From a security perspective,
the mapping becomes a critical component.&lt;/p&gt;
&lt;p&gt;The idea that you might receive this critical information
from the network –
for example, the &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc3397&quot;&gt;DHCP Domain Search Option&lt;/a&gt; –
is no good.
We have to assume that the network is hostile&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/local-https/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;The real challenge here is that everyone will have their own nicknames;
there can be no canonical mapping.
My printer and your printer are (probably) different devices,
but we might want to use the same nickname.&lt;/p&gt;
&lt;h3 id=&quot;tofu-and-nicknames&quot;&gt;TOFU and nicknames&lt;/h3&gt;
&lt;p&gt;Of course, in most of these cases,
what you get from a system like this
is effectively &lt;a href=&quot;https://en.wikipedia.org/wiki/Trust_on_first_use&quot;&gt;&lt;acronym title=&quot;Trust On First Use&quot;&gt;TOFU&lt;/acronym&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;That is,
you visit the server the first time
and give it a friendly name.
If that first visit was to the correct server,
you can use the nickname securely thereafter.
If not,
and an attacker was present for your first visit,
then you could be visiting them forever after.&lt;/p&gt;
&lt;p&gt;This model works pretty well for SSH.
It can also be hardened further if you care to do the extra work.&lt;/p&gt;
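&lt;p&gt;The mechanics are simple enough to sketch (hypothetical helper names; this is not SSH’s actual known-hosts logic):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Trust on first use: remember the key fingerprint seen on
# first contact; reject any later connection that differs.
pins = {}  # maps nickname to key fingerprint

def check_tofu(nickname, fingerprint):
    if nickname not in pins:
        pins[nickname] = fingerprint  # first use: trust and remember
        return True
    return pins[nickname] == fingerprint
&lt;/code&gt;&lt;/pre&gt;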
&lt;p&gt;It’s a bit rough if the server key changes,
which leads to some &lt;a href=&quot;https://www.agwa.name/blog/post/why_tofu_doesnt_work&quot;&gt;fair criticism&lt;/a&gt;.
For use in the home,
it might be good enough.&lt;/p&gt;
&lt;h2 id=&quot;non-unique-names%2C-unique-identities&quot;&gt;Non-unique names, unique identities&lt;/h2&gt;
&lt;p&gt;Recognizing that nicknames plus cryptographically-bound names
already provide unique identities in practice,
the logical next step is to just do away with the funny name entirely.&lt;/p&gt;
&lt;p&gt;The reason we want the long and awkward label is twofold:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Firstly, we need to be able to find the thing and talk to it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Then, we need to ensure that it has a unique identity,
distinct from all other servers,
so that it cannot be impersonated.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Those two things don’t need to be so tightly coupled.&lt;/p&gt;
&lt;p&gt;Finding the thing works perfectly well without a ridiculous name.
I would argue that
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc6762&quot;&gt;mDNS&lt;/a&gt; works better for people
if it uses names that make sense to them.&lt;/p&gt;
&lt;p&gt;We could use the friendly name where it makes sense
and an elaborate name –
or identifier –
everywhere that impersonation matters.&lt;/p&gt;
&lt;h3 id=&quot;managing-impersonation-risk&quot;&gt;Managing impersonation risk&lt;/h3&gt;
&lt;p&gt;If there are potentially many printers that can use “printer.local”,
how do we prevent each from impersonating any other?
The basic answer is that each needs to be presented distinctly.&lt;/p&gt;
&lt;h4 id=&quot;in-the-browser&quot;&gt;In the browser&lt;/h4&gt;
&lt;p&gt;On the web at least, this could be relatively simple.
There are two concepts that are relevant to all interactions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;An origin.
An origin is a tuple of values that are combined to form an unambiguous identifier.
Origins are the basis for all web interactions.
For ordinary HTTPS,
this is a tuple that combines
the scheme or protocol (“https”),
the hostname (“&lt;a href=&quot;http://www.example.com/&quot;&gt;www.example.com&lt;/a&gt;”),
and the server port number (443).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A site.
Certain features combine multiple origins
for reasons that are convoluted and embarrassing.
A site is defined as a test,
rather than a tuple of values.
Two origins can be &lt;a href=&quot;https://html.spec.whatwg.org/multipage/browsers.html#same-site&quot;&gt;same site&lt;/a&gt;
or &lt;a href=&quot;https://html.spec.whatwg.org/multipage/browsers.html#schemelessly-same-site&quot;&gt;schemelessly same site&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Neither of these relies on having flat names for servers,
which makes extending them a real possibility.
For instance,
“&lt;a href=&quot;https://printer.local/&quot;&gt;https://printer.local&lt;/a&gt;” might be recognized as non-unique
and therefore be assigned a tuple
that includes the server public key,
thereby ensuring that it is distinct
from all other “&lt;a href=&quot;https://printer.local/&quot;&gt;https://printer.local&lt;/a&gt;” instances.&lt;/p&gt;
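&lt;p&gt;Concretely – as a sketch of the idea, not any browser’s actual internal representation – the tuples might look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# An ordinary origin: (scheme, host, port).
origin = (&amp;quot;https&amp;quot;, &amp;quot;www.example.com&amp;quot;, 443)

# Hypothetical extension for a non-unique name: adding a hash
# of the server public key keeps two instances distinct.
printer_a = (&amp;quot;https&amp;quot;, &amp;quot;printer.local&amp;quot;, 443, &amp;quot;keyhash-a&amp;quot;)
printer_b = (&amp;quot;https&amp;quot;, &amp;quot;printer.local&amp;quot;, 443, &amp;quot;keyhash-b&amp;quot;)
assert printer_a != printer_b
&lt;/code&gt;&lt;/pre&gt;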
&lt;p&gt;From there,
many of the reasons for impersonation can be managed.
Passkeys, cookies, and any other state
that a browser associates with a given “&lt;a href=&quot;https://printer.local/&quot;&gt;https://printer.local&lt;/a&gt;”
are only presented to that instance,
not any other.
That’s a big chunk of the impersonation risk handled.&lt;/p&gt;
&lt;p&gt;Passwords and phishing remain a challenge&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/local-https/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;.
Outside of the use of a password manager,
it won’t be hard to convince people to enter a password
into the wrong instance.
That might be something that can be managed with UX changes,
but that’s unlikely to be perfect.&lt;/p&gt;
&lt;h4 id=&quot;elsewhere&quot;&gt;Elsewhere&lt;/h4&gt;
&lt;p&gt;Outside of the browser,
there are a lot of systems that do not update
in quite the same fashion as browsers.
Their definition of server identity is likely to be less precise
than the origin/site model browsers use.&lt;/p&gt;
&lt;p&gt;For these,
it might be easier to formulate a name
that includes a cryptographic binding to the public key.
That name could be used in place of the short, friendly name.
There are reserved names that can be used for this purpose.&lt;/p&gt;
&lt;p&gt;Working out how to separate out places where names need to be unique
and where they can be user-friendly isn’t that straightforward.
A starting point might be to use an ugly name everywhere,
with substitution of nicer names being done surgically.&lt;/p&gt;
&lt;p&gt;One place that might need the friendly name first is protocol interactions.
A printer might easily handle being known as “printer.local”,
but it might be less able to handle being known as “&amp;lt;somehash&amp;gt;.whatever.example”.
Using the friendly name there would keep the changes for servers to a minimum.&lt;/p&gt;
&lt;h3 id=&quot;key-rotation-and-other-problems&quot;&gt;Key rotation and other problems&lt;/h3&gt;
&lt;p&gt;One reasonable criticism of this approach is
that no mechanisms exist to support servers changing their keys.&lt;/p&gt;
&lt;p&gt;That’s mostly OK.
Key rotation will mean a new identity,
which resets existing state.
Losing state is likely tolerable for cookies and passkeys.
The phishing risk of having to enter a password to restore state,
on the other hand,
is pretty bad.&lt;/p&gt;
&lt;p&gt;That’s a genuine problem that would need work.
Of course, if the alternative is no HTTPS,
it might be a good trade.&lt;/p&gt;
&lt;p&gt;Servers in these environments probably shouldn’t be rotating keys anyway.
Things like expiration of certificates
largely only serve to ensure that servers are equipped
to deal with change.
A server at a non-unique name doesn’t have to deal with
its name disappearing or having to renew it periodically.
Those that want to deal with all of that can get a real name.&lt;/p&gt;
&lt;p&gt;Of course, this highlights how this
would require a distinct set of rules
for non-unique names.
Working out what those differences need to be is the hard part.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Extending the definition of HTTPS to include non-unique names
is potentially a big step.
However, it might mean that we can do away
with the bizarre exceptions we have for
unsecured HTTP in certain environments.&lt;/p&gt;
&lt;p&gt;This post sketched out a model
that requires very little of servers.
Servers only need to present a certificate over TLS,
with a unique key.
The model doesn’t care much what that certificate contains&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/local-https/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;.
Changes are focused on clients and what they expect from devices.&lt;/p&gt;
&lt;p&gt;Allowing a system that is obviously lesser
to share the “HTTPS” scheme with the system we know
(and love/hate/respect/loathe/dread)
might seem dishonest or misleading.
I maintain that –
as long as the servers with real names are unaffected,
as they would be –
no harm comes from a more inclusive definition.&lt;/p&gt;
&lt;p&gt;Expanding what it means to be an HTTPS server
might help eliminate unsecured local services.
After all,
cleartext HTTP is not fit for deployment to the Internet.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Or, maybe, a globally unique IP address.
Really, you don’t want that though. &lt;a href=&quot;https://lowentropy.net/posts/local-https/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Let’s pretend that the manufacturer
isn’t going to go out of business during the lifetime of the widget.
OK, I can’t pretend: this is unrealistic.
Even if they stay in business,
there is no guarantee that they will maintain the necessary services. &lt;a href=&quot;https://lowentropy.net/posts/local-https/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;With some &lt;a href=&quot;https://words.filippo.io/how-plex-is-doing-https-for-all-its-users/&quot;&gt;notable&lt;/a&gt; exceptions. &lt;a href=&quot;https://lowentropy.net/posts/local-https/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;And good luck noticing the phishing attack that replaces the name.
It’s not that hard for an attacker to replace the name
with one that matches a few characters at the start and end.
How do you think Facebook got “facebookcorewwwi.onion”? &lt;a href=&quot;https://lowentropy.net/posts/local-https/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;You might use &lt;code&gt;xx--&amp;lt;somehash&amp;gt;.local&lt;/code&gt;
or some other reserved label to eliminate the risk,
however remote,
of collisions with existing names. &lt;a href=&quot;https://lowentropy.net/posts/local-https/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc3552#section-3&quot;&gt;You hand your packets to the attacker to forward&lt;/a&gt;. &lt;a href=&quot;https://lowentropy.net/posts/local-https/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I should be recommending the use of passkeys here,
pointing to &lt;a href=&quot;https://www.imperialviolet.org/tourofwebauthn/tourofwebauthn.html&quot;&gt;Adam Langley’s nice book&lt;/a&gt;,
but – to be perfectly frank – the user experience still sucks.
Besides, denying that people use passwords is silly. &lt;a href=&quot;https://lowentropy.net/posts/local-https/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;It might not be that simple.
You probably want the server to include its name,
if only to avoid unknown key share attacks.
That might rule out the use of &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc7250&quot;&gt;raw public keys&lt;/a&gt;. &lt;a href=&quot;https://lowentropy.net/posts/local-https/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>C2PA Is Not Going To Fix Our Misinformation Problem</title>
    <link href="https://lowentropy.net/posts/c2pa/"/>
    <updated>2024-12-12T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/c2pa/</id>
    <content type="html">&lt;p&gt;A lot of people are deeply concerned about misinformation.&lt;/p&gt;
&lt;p&gt;People often come to believe in falsehoods as part of how they identify with a social group.
Once established, false beliefs are &lt;a href=&quot;https://en.wikipedia.org/wiki/Confirmation_bias&quot;&gt;hard to overcome&lt;/a&gt;.
Beliefs are a shorthand we use in trying to &lt;a href=&quot;https://theconversation.com/what-delusions-can-tell-us-about-the-cognitive-nature-of-belief-243627&quot;&gt;make sense of the world&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Misinformation is often propagated in order to engender delusion,
or a firmly-held belief that does not correspond with reality.
Prominent examples of delusions include belief in
&lt;a href=&quot;https://en.wikipedia.org/wiki/Flat_Earth&quot;&gt;a flat earth&lt;/a&gt;,
the risk of &lt;a href=&quot;https://www.cdc.gov/vaccine-safety/about/autism.html&quot;&gt;vaccines causing autism&lt;/a&gt;,
or that &lt;a href=&quot;https://www.bbc.co.uk/newsround/48774080&quot;&gt;the moon landing was staged&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Delusions –
if sufficiently widespread
or if &lt;a href=&quot;https://medium.com/incerto/the-most-intolerant-wins-the-dictatorship-of-the-small-minority-3f1f83ce4e15&quot;&gt;promoted aggressively enough&lt;/a&gt; –
can have a &lt;a href=&quot;https://www.nbcnews.com/specials/russian-disinformation-2024-election-storm-1516/index.html&quot;&gt;significant effect&lt;/a&gt;
on the operation of our society,
particularly when it comes to involvement in democratic processes.&lt;/p&gt;
&lt;p&gt;Misinformation campaigns seek to drive these effects.
For instance,
promoting a false belief
that &lt;a href=&quot;https://www.bbc.com/news/articles/c3wp6q132p2o&quot;&gt;immigrants are eating household pets&lt;/a&gt;
might motivate the implementation of laws
that lead to unjustifiable treatment of immigrants.&lt;/p&gt;
&lt;p&gt;For some, the idea that technology might help with this sort of problem
is appealing.
If misinformation is the cause of harmful delusions,
maybe having less misinformation would help.&lt;/p&gt;
&lt;p&gt;The explosion in popularity and efficacy of generative AI
has made the creation of content that carries misinformation far easier.
This has sharpened a desire to build tools to help separate truth and falsehood.&lt;/p&gt;
&lt;h2 id=&quot;a-security-mechanism&quot;&gt;A Security Mechanism&lt;/h2&gt;
&lt;p&gt;Preventing the promotion of misinformation can be formulated as a security goal.
We might set out one of two complementary goals:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It must be possible to identify fake content as fake.&lt;/li&gt;
&lt;li&gt;It must be possible to distinguish genuine content.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Our adversary might seek to pass off fake content as genuine.
However, a weaker goal might be easier to achieve:
the adversary only needs to
avoid having their fake content
identified as a fabrication.&lt;/p&gt;
&lt;p&gt;Note that we assume that once a story is established as fake,
most people will cease to believe it.
That’s a big assumption,
but we can at least pretend that this will happen
for the purposes of this analysis.&lt;/p&gt;
&lt;p&gt;In terms of capabilities,
any adversary can be assumed to be capable of
using generative AI and other tools
to produce fake content.
We also allow the adversary access to any mechanism
used to distinguish between real and fake content&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h2 id=&quot;technical-options&quot;&gt;Technical Options&lt;/h2&gt;
&lt;p&gt;Determining what is – or is not – truthful is not easy.
Given an arbitrary piece of content,
it is not trivial to determine whether it contains fact or fabrication.
After all, if it were that simple,
misinformation would not be that big a problem.&lt;/p&gt;
&lt;p&gt;Technical proposals in this space generally aim for a less ambitious goal.
One of two approaches is typically considered:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Mark fake content as fake.&lt;/li&gt;
&lt;li&gt;Mark genuine content as genuine.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Both rely on the system that creates content knowing which of the two applies.
The creator can therefore apply the requisite mark.
As long as that mark survives to be read by the consumer of the content,
what the creator knew about whether the content was “true” can be conveyed.&lt;/p&gt;
&lt;p&gt;Evaluating these options
against the goals of our adversary –
who seeks to pass off fake content as “real” –
is interesting.
Each approach requires high levels of adoption
to be successful:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If an adversary seeks to pass off fake content as real,
virtually all &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fake&quot;&gt;fake content&lt;/a&gt; needs to be marked as such.
Otherwise, people seeking to promote fake content
can simply use any means of production
that don’t add markings.
Markings also need to be very hard to remove.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In comparison,
genuine content markings might still need to be universally applied,
but it might be possible to realize benefits
when limited to &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#expectations&quot;&gt;specific outlets&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That makes markings on genuine content more appealing
as a way to help counteract misinformation.&lt;/p&gt;
&lt;h2 id=&quot;fake&quot;&gt;Attesting to Fakeness&lt;/h2&gt;
&lt;p&gt;If content (text, image, audio, or video) is produced with generative AI,
it can maybe include some way to check that it is fake.
The output of many popular generative AI tools often includes
both metadata and a small watermark.&lt;/p&gt;
&lt;p&gt;These indications are pretty useless if someone is seeking to promote a falsehood.
It is trivial to edit content to remove metadata.
Similarly, visible watermarks can be edited out of images.&lt;/p&gt;
&lt;p&gt;The response to that is a form of watermarking
that is supposed to be impossible to remove.
Either the generator embeds markings in the content as it is generated,
or the marking is applied to the output content by a specialized process.&lt;/p&gt;
&lt;p&gt;A separate system is then provided that can take any content
and determine whether it was marked.&lt;/p&gt;
&lt;p&gt;The question then becomes whether it is possible
to generate a watermark that cannot be removed.
&lt;a href=&quot;https://arxiv.org/abs/2311.04378&quot;&gt;This paper&lt;/a&gt; makes a strong case for the negative,
demonstrating that the removal –
and re-application –
of arbitrary watermarks is possible,
requiring only access to the system that rules on whether the watermark is present.&lt;/p&gt;
&lt;p&gt;Various generative AI companies
have implemented systems of markings,
including
&lt;a href=&quot;https://help.openai.com/en/articles/8912793-c2pa-in-dall-e-3&quot;&gt;metadata&lt;/a&gt;,
&lt;a href=&quot;https://help.openai.com/en/articles/6468065-dall-e-2-faq#h_75522bc940&quot;&gt;removable watermarks&lt;/a&gt;, and
&lt;a href=&quot;https://deepmind.google/technologies/synthid/&quot;&gt;watermarking that is supposed to be resistant to removal&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Furthermore,
generative AI models have to be controlled so that people
can’t generate their own content without markings.
That is clearly &lt;a href=&quot;https://huggingface.co/&quot;&gt;not feasible&lt;/a&gt;,
as much as some would like to retain control.&lt;/p&gt;
&lt;p&gt;Even if model access could be controlled,
it seems likely that watermarks will be removable.
At best, this places the systems that apply markings
in an escalating competition with adversaries
that seek to remove (or falsify) markings.&lt;/p&gt;
&lt;h2 id=&quot;content-provenance&quot;&gt;Content Provenance&lt;/h2&gt;
&lt;p&gt;There’s a case to be made for the use of metadata
in establishing where content came from,
namely &lt;em&gt;provenance&lt;/em&gt;.
If the goal is to positively show that content was generated in a particular way,
then metadata might be sufficient.&lt;/p&gt;
&lt;p&gt;Provenance could work to label content as either fake or real.
However, it is most interesting as a means of tracing real content to its source
because that might be more feasible.&lt;/p&gt;
&lt;p&gt;The most widely adopted system is &lt;a href=&quot;https://c2pa.org/&quot;&gt;C2PA&lt;/a&gt;.
This system has received a lot of attention
and is often presented as &lt;em&gt;the&lt;/em&gt; answer to online misinformation.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/criticism&quot;&gt;An unpublished opinion piece that I wrote in 2023 about C2PA&lt;/a&gt;
is highly critical.
This blog is a longer examination
of what C2PA might offer
and its shortcomings.&lt;/p&gt;
&lt;h2 id=&quot;how-c2pa-works&quot;&gt;How C2PA Works&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&quot;https://c2pa.org/specifications/specifications/2.1/specs/C2PA_Specification.html&quot;&gt;C2PA specification&lt;/a&gt;
is long and somewhat complicated&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;,
but the basics are pretty simple:&lt;/p&gt;
&lt;p&gt;Content is digitally signed by the entity that produced it.
C2PA defines a bunch of claims
that all relate to how the content was created.&lt;/p&gt;
&lt;aside&gt;
&lt;p&gt;Digital signatures are often used
to establish a claim about something.
If you trust an entity to make claims of a particular form,
a valid signature allows you to believe those claims
as they apply to a specific piece of content.&lt;/p&gt;
&lt;p&gt;For example, web browsers trust &lt;a href=&quot;https://letsencrypt.org/&quot;&gt;Let’s Encrypt&lt;/a&gt;
to make claims about the identity of websites.
They digitally sign certificates that are presented to browsers
when establishing a connection.
The identity of the site is accepted
only if the signature on the certificate is valid.&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;C2PA binds attributes to content in one of two ways.
A “hard” binding uses a cryptographic hash,
which ensures that any modification to the content invalidates the signature.
A “soft” binding binds to
a &lt;a href=&quot;https://en.wikipedia.org/wiki/Perceptual_hashing&quot;&gt;perceptual hash&lt;/a&gt;
or a watermark (more on that &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#soft&quot;&gt;below&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The C2PA metadata includes a bunch of attributes,
including a means of binding to the content,
all of which are digitally signed.&lt;/p&gt;
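&lt;p&gt;The hard-binding idea can be sketched in a few lines.
This is a toy illustration, not the C2PA wire format:
real manifests are CBOR/COSE structures signed with public keys,
and the key and field names here are invented.&lt;/p&gt;

```python
import hashlib
import hmac
import json

# Toy "hard" binding: the manifest carries a cryptographic hash of the
# content, and the manifest itself is signed. HMAC stands in for the
# public-key signature a real system would use; this key is made up.
KEY = b"demo-signing-key"

def make_manifest(content):
    manifest = {"content_hash": hashlib.sha256(content).hexdigest()}
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(KEY, payload, hashlib.sha256).hexdigest()
    return manifest

def verify(content, manifest):
    claimed = {"content_hash": manifest["content_hash"]}
    payload = json.dumps(claimed, sort_keys=True).encode()
    sig_ok = hmac.compare_digest(
        manifest["signature"],
        hmac.new(KEY, payload, hashlib.sha256).hexdigest(),
    )
    # The binding: the hash in the signed manifest must match the bytes.
    hash_ok = manifest["content_hash"] == hashlib.sha256(content).hexdigest()
    return sig_ok and hash_ok

image = b"original pixels"
m = make_manifest(image)
assert verify(image, m)             # untouched content verifies
assert not verify(image + b"!", m)  # any single-byte edit invalidates it
```

&lt;p&gt;The final assertion is the defining property of a hard binding:
any modification whatsoever breaks the link between manifest and content,
which is exactly why soft bindings exist for content that is expected to be edited.&lt;/p&gt;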
&lt;p&gt;An important type of attribute in C2PA
is one that points to source material
used in producing derivative content.
For instance, if an image is edited,
an attribute might refer to the original image.
This is supposed to enable the tracing of:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the original work, when the present work contains edits, or&lt;/li&gt;
&lt;li&gt;the components that comprise a derivative work.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;ok&quot;&gt;What Might Work in C2PA&lt;/h2&gt;
&lt;p&gt;Cryptographic assertions
that come from secured hardware
might be able to help identify “real” content.&lt;/p&gt;
&lt;p&gt;A camera or similar capture device could use C2PA
to sign the content it captures.
Provided that the keys used cannot be extracted from the hardware&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;,
an assertion by the manufacturer
might make a good case for the image being genuine.&lt;/p&gt;
&lt;p&gt;The inclusion of metadata that includes URLs for source material –
“ingredients” in C2PA-speak&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt; –
might also be useful in finding content
that contains a manufacturer signature.
That depends on the metadata including accessible URLs.
As any assertion in C2PA is optional,
this is not guaranteed.&lt;/p&gt;
&lt;h2 id=&quot;where-c2pa-does-not-deliver&quot;&gt;Where C2PA Does Not Deliver&lt;/h2&gt;
&lt;p&gt;The weaknesses in C2PA are somewhat more numerous.&lt;/p&gt;
&lt;p&gt;This section looks in more detail at some aspects of C2PA
that require greater skepticism.
These are the high-level items only;
there are other aspects of the design
that seem poorly specified or problematic&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;,
but the goal of this post is to focus on the primary problem.&lt;/p&gt;
&lt;h3 id=&quot;soft&quot;&gt;C2PA Soft Bindings&lt;/h3&gt;
&lt;p&gt;A soft binding in C2PA allows for modifications of the content.
The idea is that the content might be edited,
but the assertions would still apply.&lt;/p&gt;
&lt;p&gt;As mentioned, two options are considered in the specification:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Perceptual_hashing&quot;&gt;Perceptual hashes&lt;/a&gt;,
which are non-cryptographic digests of content
that are intended to remain stable when content is edited.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Watermarking, which binds to a watermark
that is embedded in the content.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In an adversarial setting,
the use of perceptual hashes is well-studied,
with numerous results that show exploitable weaknesses.&lt;/p&gt;
&lt;p&gt;Perceptual hashes lack the security properties of cryptographic hashes,
so they are often vulnerable to attack.
Collision and second preimage attacks are most relevant here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Collision attacks –
such as &lt;a href=&quot;https://eprint.iacr.org/2024/1869&quot;&gt;this one&lt;/a&gt; –
give an adversary the ability
to generate two pieces of content with the same fingerprint.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Second preimage attacks –
such as implemented with &lt;a href=&quot;https://github.com/anishathalye/neural-hash-collider&quot;&gt;this code&lt;/a&gt; –
allow an adversary
to take content that produces one output
and then modify completely different content
so that it results in the same fingerprint.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Either attack allows an adversary to substitute
one piece of content for another,
though the preimage attack is more flexible.&lt;/p&gt;
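&lt;p&gt;The gap between the two families of hash can be seen with a toy example.
The &quot;perceptual hash&quot; below –
one brightness bit per pixel –
is vastly simpler than real schemes like NeuralHash,
but it shows the shape of the attack:
an adversary who knows the fingerprint function
can nudge unrelated content until the fingerprints match.&lt;/p&gt;

```python
import hashlib

def toy_phash(pixels):
    # One bit per pixel: set when the pixel is brighter than the mean.
    # Real perceptual hashes are far more sophisticated, but share the
    # weakness demonstrated below.
    mean = sum(pixels) / len(pixels)
    return tuple(int(p > mean) for p in pixels)

real = [200, 40, 180, 30, 220, 50]   # stands in for genuine content
fingerprint = toy_phash(real)

fake = [90, 80, 95, 70, 100, 85]     # unrelated content
# Second preimage: push each pixel of the fake just above or below its
# own mean so that its bit pattern reproduces the real fingerprint.
mean = sum(fake) / len(fake)
forged = [mean + 10 if bit else mean - 10 for bit in fingerprint]
assert toy_phash(forged) == fingerprint  # fingerprints now collide

# A cryptographic hash offers no such wiggle room: changing one pixel
# value by one changes the digest completely (the avalanche effect).
assert hashlib.sha256(bytes(real)).hexdigest() != hashlib.sha256(
    bytes([201] + real[1:])).hexdigest()
```

&lt;p&gt;Real attacks on real perceptual hashes use gradient descent rather than arithmetic,
but the conclusion is the same:
the fingerprint constrains the adversary far less than a cryptographic hash would.&lt;/p&gt;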
&lt;p&gt;Binding to a watermark appears to be even easier to exploit.
Watermarks can often be removed –
for example, with the TrustMark-RM mode of &lt;a href=&quot;https://arxiv.org/abs/2311.18297&quot;&gt;TrustMark&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt; –
and re-applied.
That makes it possible to extract a watermark
from one piece of content
and copy it –
along with any C2PA assertions –
to entirely different content.&lt;/p&gt;
&lt;h3 id=&quot;c2pa-traceability-and-provenance&quot;&gt;C2PA Traceability and Provenance&lt;/h3&gt;
&lt;p&gt;One idea that C2PA promotes
is that source material might be traced.
When content is edited in a tool that supports C2PA,
the tool embeds information about the edits,
especially any source material.
In theory, this makes it possible to trace the provenance
of C2PA-annotated content.&lt;/p&gt;
&lt;p&gt;In practice, tracing provenance is unlikely
to be a casual process.
Some publisher sites might aid the discovery of source material,
but content that is redistributed
in other places
could be quite hard to trace&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Consider photographs that are published online.
Professional images are captured in formats
like &lt;a href=&quot;https://www.adobe.com/creativecloud/file-types/image/raw.html&quot;&gt;RAW&lt;/a&gt;
that are unsuitable for publication.
Most images are transcoded and edited before publication.&lt;/p&gt;
&lt;p&gt;To trace provenance,
editing software needs to
embed its own metadata about changes&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;,
including a means of locating the original&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Any connection between the published and original content
cannot be verified automatically in a reliable fashion.
A hard, or cryptographic, binding
is immediately invalidated
by any edit.&lt;/p&gt;
&lt;p&gt;The relationship between edited and original content
therefore cannot be validated by a machine.
Something like a perceptual hash might be used
to automate this connection.
However, as we’ve already established,
perceptual hashes are vulnerable to attack.
Any automated process based on a perceptual hash
is therefore unreliable.&lt;/p&gt;
&lt;p&gt;At best,
a human might be able to look at images
and reach their own conclusions.
That supports the view that provenance information
is unlikely to be able to take advantage
of the scaling
that might come from machine validation.&lt;/p&gt;
&lt;h3 id=&quot;drm&quot;&gt;C2PA and DRM&lt;/h3&gt;
&lt;p&gt;With a published specification,
anyone can generate a valid assertion.
That means that
C2PA verifiers need some means of deciding
which assertions to believe.&lt;/p&gt;
&lt;p&gt;For hardware capture of content (images, audio, and video),
there are relatively few manufacturers.
For the claims of a hardware manufacturer
to be credible,
they have to ensure that
the keys they use to sign assertions
can only be used with unmodified versions of their hardware.&lt;/p&gt;
&lt;p&gt;That depends on having a degree of control.
Control over access to secret keys in specialized hardware modules
means that it might be possible
to maintain the integrity of this part of the system.&lt;/p&gt;
&lt;p&gt;There is some risk
of this motivating anti-consumer actions
on the part of manufacturers.
For example, cameras could refuse to produce assertions
when used with aftermarket lenses,
or stop producing assertions
after they are repaired.&lt;/p&gt;
&lt;p&gt;As long as modifying hardware
only results in a loss of assertions,
that seems unlikely to be a serious concern
for many people.
Very few people seek to modify hardware&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;The need to restrict editing software
is far more serious.
In order for edits to be considered trustworthy,
strict controls are necessary.&lt;/p&gt;
&lt;p&gt;The need for controls
would make it impossible for open source software
to generate trustworthy assertions.
Assertions could only be generated by cloud-based –
or maybe DRM-laden –
software.&lt;/p&gt;
&lt;h2 id=&quot;completely-new-trust-infrastructure&quot;&gt;Completely New Trust Infrastructure&lt;/h2&gt;
&lt;p&gt;The idea of creating trust infrastructure
for authenticating capture device manufacturers
and editing software vendors
is somewhat daunting.&lt;/p&gt;
&lt;p&gt;Experience with the Web &lt;acronym title=&quot;Public Key Infrastructure&quot;&gt;PKI&lt;/acronym&gt;
shows that this is a non-trivial undertaking.
A governance structure needs to be put in place
to set rules for how inclusions –
and exclusions –
are decided.
Systems need to be put in place
for distributing keys
and for managing revocation.&lt;/p&gt;
&lt;p&gt;This is not a small undertaking.
However, for this particular structure,
it is not unreasonable to expect this to work out.
With a smaller set of participants than the Web PKI,
along with somewhat lower stakes,
this seems possible.&lt;/p&gt;
&lt;h3 id=&quot;alternative-trust-infrastructure-options&quot;&gt;Alternative Trust Infrastructure Options&lt;/h3&gt;
&lt;p&gt;In discussions about C2PA,
when I raised concerns about DRM,
&lt;a href=&quot;https://jeffrey.yasskin.info/&quot;&gt;Jeffrey Yasskin&lt;/a&gt;
mentioned a possible alternative direction.&lt;/p&gt;
&lt;p&gt;In that alternative,
attestations are not made by device or software vendors.
Content authors (or editors or a publisher)
would be the ones to make any assertions.
Assertions might be tied to an existing identity,
such as a website domain name,
avoiding any need to build an entirely new PKI.&lt;/p&gt;
&lt;p&gt;A simple method would be to have content signed&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;
by a site that claims it.
That immediately helps with the problem
of people attempting to pass fake information
as coming from a particular source.&lt;/p&gt;
&lt;p&gt;The most intriguing version of this idea
relies on building a reputation system for content.
If content can then be traced to its source,
the reputation associated with that source
can in some way be built up over time.&lt;/p&gt;
&lt;p&gt;The key challenge is that this latter form changes from
a definitive sort of statement –
under C2PA, content is either real or not –
to a more subjective one.
That’s potentially valuable
in that it encourages more active engagement
with the material.&lt;/p&gt;
&lt;p&gt;The idea of building new reputational systems
is fascinating
but a lot more work is needed
before anything more could be said.&lt;/p&gt;
&lt;h2 id=&quot;simpler&quot;&gt;A Simpler Provenance&lt;/h2&gt;
&lt;p&gt;The difficulty of tracing,
along with the problems associated with editing,
suggests a simpler approach.&lt;/p&gt;
&lt;p&gt;The benefits of C2PA
might be realized by a combination of
hardware-backed cryptographic assertions
and simple pointers
(that is, without digital signatures)
from edited content
to original content.&lt;/p&gt;
&lt;p&gt;Even then,
an adversary still has a few options.&lt;/p&gt;
&lt;h3 id=&quot;trickery&quot;&gt;Trickery&lt;/h3&gt;
&lt;p&gt;When facial recognition systems were originally built,
researchers found that some of these could be defeated
by &lt;a href=&quot;https://www.which.co.uk/news/article/face-recognition-mobile-phones-axNDM2P9VvyO&quot;&gt;showing the camera a photo&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn12&quot; id=&quot;fnref12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Generating a fake image with a valid assertion
could be as simple as showing a C2PA camera a photograph&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn13&quot; id=&quot;fnref13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt;.
The use of trick photography to create a false impression
is also possible.&lt;/p&gt;
&lt;h3 id=&quot;expectations&quot;&gt;No Expectations&lt;/h3&gt;
&lt;p&gt;It is probably fair to say that –
despite some uptake of C2PA –
most content in existence
does not include C2PA assertions.&lt;/p&gt;
&lt;p&gt;Limited availability
seriously undermines the value
of any provenance system
in countering misinformation.
An attacker can remove metadata
if people do not expect it to be present.&lt;/p&gt;
&lt;p&gt;This might be different for media outlets
that implement policies
that result in universal –
or at least near-universal –
use of something like C2PA.
Then, people can expect that content
produced by that outlet
will contain provenance information.&lt;/p&gt;
&lt;p&gt;Articles on social media
can still claim to be from that outlet.
However, it might become easier
to refute that sort of false claim.&lt;/p&gt;
&lt;p&gt;That might be reason enough for a media outlet
to insist on implementing something like C2PA.
After all,
the primary currency in which journalistic institutions trade
is their reputation.
Having a technical mechanism
that can support refutation of falsified articles
has some value
in terms of being able to defend their reputation.&lt;/p&gt;
&lt;p&gt;The cost might be significant,
if the benefits are not realized
until nearly all content is traceable.
That might entail replacing every camera used by journalists
and outside contributors.
Given the interconnected nature of news media,
with many outlets publishing content that is sourced from partners,
that’s likely a big ask.&lt;/p&gt;
&lt;h3 id=&quot;a-lack-of-respect-for-the-truth&quot;&gt;A Lack of Respect for the Truth&lt;/h3&gt;
&lt;p&gt;For any system like this to be effective,
people need to care
about whether something is real or not.&lt;/p&gt;
&lt;p&gt;It is not just about &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#expectations&quot;&gt;expectations&lt;/a&gt;,
people have to be motivated to interrogate claims
and seek the truth.
That’s not a problem that can be solved by technical means.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#ok&quot;&gt;narrow applicability&lt;/a&gt; of the assertions
for capture hardware
suggests that a &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#simpler&quot;&gt;simpler approach&lt;/a&gt;
might be better and more feasible.
Some applications –
such as in marking generated content –
are probably ineffectual
as a means of countering misinformation.
The &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#drm&quot;&gt;DRM aspect&lt;/a&gt; is pretty ugly,
while not really adding any value.&lt;/p&gt;
&lt;p&gt;All of which is to say that
the technical aspects of provenance systems
like C2PA
are not particularly compelling.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;We have to assume that
people will need to be able to ask whether content is real or fake
for the system to work. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;And – it pains me to say – it is not very good.
I write specifications for a living,
so I appreciate how hard it is to produce something on this scale.
Unfortunately, this specification needs far more rigor.
I suspect that the only way to implement C2PA successfully
would be to look at one of the implementations. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;That’s a big “if”, though not implausible.
Though &lt;a href=&quot;https://freedom-to-tinker.com/2010/09/16/understanding-hdcp-master-key-leak/&quot;&gt;hardware keys used in consumer hardware have been
extracted&lt;/a&gt;,
the techniques used for protecting secrets require considerable resources.
That would only invalidate the signatures from a single manufacturer
or limited product lines.
C2PA might not be worth the effort. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;C2PA can also indicate generative AI ingredients
such as the text prompt used
and the details of the generative model.
That’s not much use
in terms of protecting against use of content for misinformation,
but it might have other uses. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;For instance,
the method by which assertions can be redacted
is pretty questionable.
See &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure&quot;&gt;my post on selective disclosure&lt;/a&gt;
for more on what that sort of system
might need to do. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt; &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref5:1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;TrustMark is one of the soft binding mechanisms
that C2PA recognizes.
It’s also the first one I looked into.
I have no reason to believe that other systems are better. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;C2PA does not use standard locators
(such as &lt;code&gt;https://&lt;/code&gt;),
instead defining a new URI scheme.
That suggests that the means of locating source material
is likely not straightforward. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I did not look into how much detail about edits is recorded.
Some of the supporting material for C2PA
suggests that this could be quite detailed,
but that seems impractical
and the specification only includes
&lt;a href=&quot;https://c2pa.org/specifications/specifications/2.1/specs/C2PA_Specification.html#_actions&quot;&gt;a limited set of edit attributes&lt;/a&gt;. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;C2PA also defines metadata
for an image thumbnail.
Nothing prevents this from including a false representation. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This might be more feasible for images and video
than for audio.
Image and video capture equipment
is often integrated into a single unit.
Audio often features analog interconnections
between components,
which makes it harder to detect falsified inputs. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Yes, &lt;a href=&quot;https://lowentropy.net/posts/bundles&quot;&gt;we’ve been here before&lt;/a&gt;.
Sort of. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn12&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Modern systems use infrared or depth cameras
that are harder to spoof so trivially,
though not impossible:
&lt;a href=&quot;https://www.cyberark.com/resources/threat-research-blog/bypassing-windows-hello-without-masks-or-plastic-surgery&quot;&gt;hardware spoofing&lt;/a&gt; and
&lt;a href=&quot;https://ieeexplore.ieee.org/document/10179429&quot;&gt;depth spoofing&lt;/a&gt;
both appear to be feasible. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref12&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn13&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;C2PA has the means to attest to depth information,
but who would expect that?
Especially when you can redact any clues
that might lead someone to expect it to be present&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fn5&quot; id=&quot;fnref5:1&quot;&gt;[5:1]&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;https://lowentropy.net/posts/c2pa/#fnref13&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>Everything you need to know about selective disclosure</title>
    <link href="https://lowentropy.net/posts/selective-disclosure/"/>
    <updated>2024-11-21T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/selective-disclosure/</id>
    <content type="html">&lt;h2 id=&quot;why-does-this-matter%3F&quot;&gt;Why does this matter?&lt;/h2&gt;
&lt;p&gt;A lot of governments are engaging with projects to build “Digital Public Infrastructure”. That term covers a range of projects, but one of the common and integral pieces relates to government-backed identity services. While some places have had some form of digital identity system for years — hi Estonia! — there are many more governments looking to roll out some sort of digital identity wallet for their citizens. Notably, the European Union recently passed a major update to their &lt;a href=&quot;https://digital-strategy.ec.europa.eu/en/policies/eudi-regulation&quot;&gt;European Digital Identity Regulation&lt;/a&gt;, which seeks to have a union-wide digital identity system for all European citizens. India’s &lt;a href=&quot;https://en.wikipedia.org/wiki/Aadhaar&quot;&gt;Aadhaar&lt;/a&gt; is still the largest such project with well over a billion people enrolled.&lt;/p&gt;
&lt;p&gt;There are a few ways that these systems end up being implemented, but most take the same basic shape. A government agency will be charged with issuing people with credentials. That might be tied to driver licensing, medical services, passports, or it could be a new identity agency. That agency issues digital credentials that are destined for wallets in phones. Then, services can request that people present these credentials at certain points, as necessary.&lt;/p&gt;
&lt;p&gt;The basic model that is generally used looks something like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/3party.svg&quot; eleventy:width=&quot;640&quot; alt=&quot;Three boxes with arrows between each in series, in turn labeled: Issuer, Holder, Verifier&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The government agency is the “issuer”, your wallet app is a “holder”, and the service that wants your identity information is a “verifier”.&lt;/p&gt;
&lt;p&gt;This is a model for digital credentials that is useful in describing a lot of different interactions. A key piece of that model is the difference between a &lt;em&gt;credential&lt;/em&gt;, which is the thing that ends up in a wallet, and a &lt;em&gt;presentation&lt;/em&gt;, which is what you show a verifier.&lt;/p&gt;
&lt;p&gt;This document focuses on online use cases. That is, where you might be asked to present information about your identity to a website. Though there are many other uses for identity systems, online presentation of identity is becoming more common. How we use identity online is likely to shape how identity is used more broadly.&lt;/p&gt;
&lt;p&gt;The goal of this post is to provide information and maybe a fresh perspective on the topic. This piece also has a &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#the-limitations-of-technical-solutions&quot;&gt;conclusion&lt;/a&gt; that suggests that the truly hard problems in online identity are not technical in nature, so do not necessarily benefit from the use of selective disclosure. As much as selective disclosure is useful in some contexts, there are significant challenges in deploying it on the Web.&lt;/p&gt;
&lt;h2 id=&quot;what-is-selective-disclosure%3F&quot;&gt;What is selective disclosure?&lt;/h2&gt;
&lt;p&gt;A presentation might be a reduced form of the credential. Let’s say that you have a driver license, like the following:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/mclovin.png&quot; alt=&quot;A photo of a (fake) Hawaii driver license&quot; /&gt;&lt;/p&gt;
&lt;p&gt;One way of thinking about selective disclosure is to think of it as redacting those parts of the credential that you don’t want to share.&lt;/p&gt;
&lt;p&gt;Let’s say that you want to show that you are old enough to buy alcohol. You might imagine doing something like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/redacted.png&quot; alt=&quot;A photo of a (fake) Hawaii driver license with some fields covered with black boxes&quot; /&gt;&lt;/p&gt;
&lt;p&gt;That is, if you were presenting that credential to a store in person, you would want to show that the card truly belongs to you and that you are old enough.&lt;/p&gt;
&lt;p&gt;If you aren’t turning up in person, the photo and physical description are not that helpful, so you might cover those as well.&lt;/p&gt;
&lt;p&gt;You don’t need to share your exact birth date to show that you are old enough. You might be able to cover the month and day of those too. That is still too much information, but the best you can easily manage with a &lt;a href=&quot;https://theonion.com/cia-realizes-its-been-using-black-highlighters-all-thes-1819568147/&quot;&gt;black highlighter&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If there was a “can buy alcohol” field on the license, that might be even better. But &lt;a href=&quot;https://en.wikipedia.org/wiki/Legal_drinking_age&quot;&gt;the age at which you can legally buy alcohol&lt;/a&gt; varies quite a bit across the world. And laws apply to the location, not the person. A 19 year old from Canada can’t buy alcohol in the US just because they can buy alcohol at home&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;. Most digital credential systems have special fields to allow for this sort of rule, so that a US&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt; liquor store could use an “over_21” property, whereas a purchase in Canada might check for “over_18” or “over_19” depending on the province.&lt;/p&gt;
&lt;h2 id=&quot;simple-digital-credentials&quot;&gt;Simple digital credentials&lt;/h2&gt;
&lt;p&gt;The simplest form of digital credential is a bag of attributes, covered by a digital signature from a recognized authority. For instance, this might be a JSON Web Token, which is basically just a digitally-signed chunk of JSON.&lt;/p&gt;
&lt;p&gt;For our purposes, let’s run with the example, which we’d form into something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;{
  &amp;quot;number&amp;quot;: &amp;quot;01-47-87441&amp;quot;,
  &amp;quot;name&amp;quot;: &amp;quot;McLOVIN&amp;quot;,
  &amp;quot;address&amp;quot;: &amp;quot;892 MOMONA ST, HONOLULU, HI 96820&amp;quot;,
  &amp;quot;iss&amp;quot;: &amp;quot;1998-06-18&amp;quot;,
  &amp;quot;exp&amp;quot;: &amp;quot;2008-06-03&amp;quot;,
  &amp;quot;dob&amp;quot;: &amp;quot;1981-06-03&amp;quot;,
  &amp;quot;over_18&amp;quot;: true,
  &amp;quot;over_21&amp;quot;: true,
  &amp;quot;over_55&amp;quot;: false,
  &amp;quot;ht&amp;quot;: &amp;quot;5&#39;10&amp;quot;,
  ...
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That could then be wrapped up and signed by whatever Hawaiian DMV issues the license. Something like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/credential.svg&quot; alt=&quot;Two nested boxes, the inner containing text &amp;quot;McLOVIN&#39;s Details&amp;quot;; the outer containing text &amp;quot;Digital Signature&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;That isn’t perfect, because a blob of bytes like that can just be copied around. Anyone who received the credential could “impersonate” our poor friend.&lt;/p&gt;
&lt;p&gt;The way that problem is addressed is through the use of a digital wallet. The issuer requires that the wallet hold a second signing key. The wallet provides the issuer with an attestation, which is just evidence from the wallet maker (which is often the maker of your phone) that they are holding a private key in a place where it can’t be moved or copied&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;. That attestation includes the public key that matches that private key.&lt;/p&gt;
&lt;p&gt;Once the issuer is sure that the private key is tied to the device, the issuer produces a credential that lists the public key from the wallet.&lt;/p&gt;
&lt;p&gt;In order to use the credential, the wallet signs the credential along with some other stuff, like the current time and maybe the identity of the verifier&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;, as follows:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/presentation.svg&quot; alt=&quot;Nested boxes, the outer containing text &amp;quot;Digital signature using the Private Key from McLOVIN&#39;s Wallet&amp;quot;; two at the next level the first containing text &amp;quot;Verifier Identity, Date and Time, etc...&amp;quot;, the other containing text &amp;quot;Digital Signature using the Private Key of the Hawaii DMV&amp;quot;; the latter box contains two further boxes containing text &amp;quot;McLOVIN&#39;s Details&amp;quot; and &amp;quot;McLOVIN&#39;s Wallet Public Key&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;With something like this, unless someone is able to use the signing key that is in the wallet, they can’t generate a presentation that a verifier will accept. It also ensures that the wallet can use a biometric or password check to ensure that a presentation is only created when the person allows it.&lt;/p&gt;
&lt;p&gt;That is a basic presentation that includes all the information that the issuer knows about. The problem is that this is probably more than you might be comfortable with sharing with a liquor store. After all, while you might be able to rely on the fact that the cashier in a store isn’t copying down your license details, you just &lt;em&gt;know&lt;/em&gt; that any digital information you present is going to be saved, stored, and sold. That’s where selective disclosure is supposed to help.&lt;/p&gt;
&lt;h2 id=&quot;salted-hash-selective-disclosure&quot;&gt;Salted hash selective disclosure&lt;/h2&gt;
&lt;p&gt;One basic idea behind selective disclosure is to replace all of the data elements in a credential — or at least the ones that someone might want to keep to themselves — with placeholders. Those placeholders are replaced with a commitment to the actual values. Any values that someone wants to reveal are then included in the presentation. A verifier can validate that the revealed value matches the commitment.&lt;/p&gt;
&lt;p&gt;The most basic sort of commitment is a hash commitment. That uses a hash function, which is really anything where it is hard to produce two inputs that result in the same output. The commitment to a value of X is H(X).&lt;/p&gt;
&lt;p&gt;That is, you might replace the (“name”, “McLOVIN”) pair with a commitment like H(“name” || “McLOVIN”). The hash function ensures that it is easy to validate that the underlying values match the commitment, because the verifier can compute the hash for themselves. But it is basically impossible to recover the original values from the hash. And it is similarly difficult to find another set of values that hash to the same value, so you can’t easily substitute false information.&lt;/p&gt;
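&lt;p&gt;In code, the idea looks something like this. This is only a sketch, using SHA-256 from Node’s built-in crypto module; the ‘|’ separator here stands in for the careful, unambiguous encoding that real formats define:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;// Illustrative only: commit to a (name, value) pair with a plain hash.
const { createHash } = require('crypto');

function commit(name, value) {
  // '|' stands in for an agreed, unambiguous encoding
  return createHash('sha256').update(name + '|' + value).digest('hex');
}

const c = commit('name', 'McLOVIN');
// A verifier can recompute commit('name', 'McLOVIN') and compare it to c,
// but cannot feasibly invert c to recover the hidden value.
&lt;/code&gt;&lt;/pre&gt;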
&lt;p&gt;A key problem is that a simple hash commitment only protects the value of the input if that input is hard to guess in the first place. But most of the stuff on a license is pretty easy to guess in one way or another. For simple stuff like “over_21”, there are just two values: “true” or “false”. If you want to know the original value, you can just check each of the values and see which matches.&lt;/p&gt;
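&lt;p&gt;To see just how cheap that is, here is a sketch of the attack on an unsalted boolean commitment; it takes exactly two guesses:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;const { createHash } = require('crypto');

function H(input) {
  return createHash('sha256').update(input).digest('hex');
}

// An unsalted commitment to a boolean field...
const commitment = H('over_21' + '|' + 'true');

// ...falls to a two-entry dictionary attack.
let recovered;
for (const guess of ['true', 'false']) {
  if (H('over_21' + '|' + guess) === commitment) {
    recovered = guess; // recovered is now 'true'
  }
}
&lt;/code&gt;&lt;/pre&gt;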
&lt;p&gt;Even for fields that have more values, it is possible to build a big table of hash values for every possible (or likely) value. This is called a “rainbow table”&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/rainbow.svg&quot; alt=&quot;A diagram showing mappings from hashes to values&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Rainbow tables don’t work if the committed value is very hard to guess. So, in addition to the value of the field, a large random number is added to the hidden value. This number is called “salt” and a different value needs to be generated for every field that can be hidden, with different values for every new credential. As long as there are many more values for the salt than can reasonably be stored in a rainbow table, there is no easy way to work out which commitment corresponds to which value.&lt;/p&gt;
&lt;p&gt;So for each field, the issuer generates a random number and replaces all fields in the credential with H(salt || name || value), using some agreed encoding. The issuer then signs over those commitments and provides the wallet with a credential that is full of commitments, plus the full set of values that were committed to, including the associated salt.&lt;/p&gt;
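&lt;p&gt;Sketched in code (the function names here are made up; real formats pin down the hash and the encoding precisely):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;const { createHash, randomBytes } = require('crypto');

function H(input) {
  return createHash('sha256').update(input).digest('hex');
}

// Issuer side: replace a field with H(salt || name || value), and keep the
// (salt, name, value) tuple to hand to the wallet alongside the credential.
function sealField(name, value) {
  const salt = randomBytes(16).toString('hex'); // fresh for every field, every credential
  return {
    commitment: H(salt + '|' + name + '|' + String(value)),
    disclosure: { salt, name, value },
  };
}

const field = sealField('over_21', true);
// field.commitment goes into the credential that the issuer signs;
// field.disclosure stays with the wallet, revealed only when the holder chooses.
&lt;/code&gt;&lt;/pre&gt;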
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/sh-credential.svg&quot; alt=&quot;A credential containing commitments to values, with the value and associated salt alongside&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The wallet can then use the salt and the credential to reveal a value and prove that it was included in the credential, creating a presentation something like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/sh-presentation.svg&quot; alt=&quot;A presentation using the credential, with selected values and their salt alongside&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The verifier then gets a bunch of fields with the key information replaced with commitments. All of the commitments are then signed by the issuer. The verifier also gets some number of unsigned tuples of (salt, name, value). The verifier can then check that H(salt || name || value) matches one of the commitments.&lt;/p&gt;
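&lt;p&gt;The verifier’s side of that check is just a recomputation. As a sketch, with the same caveats about the encoding:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;const { createHash } = require('crypto');

function H(input) {
  return createHash('sha256').update(input).digest('hex');
}

// Verifier side: a disclosed (salt, name, value) tuple is accepted only if its
// recomputed commitment appears among the issuer-signed commitments.
function isDisclosed(signedCommitments, disclosure) {
  const recomputed = H(disclosure.salt + '|' + disclosure.name + '|' + String(disclosure.value));
  return signedCommitments.includes(recomputed);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Any tuple that doesn’t hash to one of the signed commitments is rejected, so a holder can’t substitute values that the issuer never signed over.&lt;/p&gt;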
&lt;p&gt;This is the basic design that underpins a number of selective disclosure designs. Salted hash selective disclosure is pretty simple to build because it doesn’t require any fancy cryptography. However, salted hash designs have some limitations that can be a little surprising.&lt;/p&gt;
&lt;h3 id=&quot;other-selective-disclosure-approaches&quot;&gt;Other selective disclosure approaches&lt;/h3&gt;
&lt;p&gt;There are other approaches that might be used to solve this problem. Imagine that you had a set of credentials, each of which contained a single attribute. You might imagine sharing each of those credentials separately, choosing which ones you show based on what the situation demanded.&lt;/p&gt;
&lt;p&gt;That might look something like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/sd-bls-credential.svg&quot; alt=&quot;A presentation that includes multiple separate credentials, each with a single attribute&quot; /&gt;&lt;/p&gt;
&lt;p&gt;This basic idea is approximately sound&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;, but all of those signatures would make a presentation pretty unwieldy if there were lots of properties. There are digital signature schemes that make this more efficient though, like the &lt;a href=&quot;https://en.wikipedia.org/wiki/BLS_digital_signature&quot;&gt;BLS&lt;/a&gt; scheme, which allows multiple signatures to be folded into one.&lt;/p&gt;
&lt;p&gt;That is the basic idea behind &lt;a href=&quot;https://arxiv.org/abs/2406.19035&quot;&gt;SD-BLS&lt;/a&gt;. SD-BLS doesn’t make it cheaper for an issuer. An issuer still needs to sign a whole bunch of separate attributes. But combining signatures means that it can make presentations smaller and easier to verify. SD-BLS has some privacy advantages over salted hashes, but the primary problem that the SD-BLS proposal aims to solve is revocation, which is covered in more detail &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#credential-revocation&quot;&gt;below&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;problems-with-salted-hashes&quot;&gt;Problems with salted hashes&lt;/h3&gt;
&lt;p&gt;Going back to the original example, the effect of the salted hash is that you probably get something like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/sd-redacted.png&quot; alt=&quot;A Hawaii driver license with all the fields covered with gray rectangles, except the expiry date&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Imagine that every field on the license is covered with the gray stuff you get on scratch lottery tickets. You can choose which to scratch off before you hand it to someone else&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;. Here’s what they learn:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;That this is a valid Hawaii driver license. That is, they learn who issued the credential.&lt;/li&gt;
&lt;li&gt;When the license expires.&lt;/li&gt;
&lt;li&gt;The value of the fields that you decided to reveal.&lt;/li&gt;
&lt;li&gt;How many fields you decided not to reveal.&lt;/li&gt;
&lt;li&gt;Any other places that you present that same credential, as discussed &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#linkability-and-selective-disclosure&quot;&gt;below&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;On the plus side, and contrary to what is shown for a physical credential, the size and position of fields is not revealed for a digital credential.&lt;/p&gt;
&lt;p&gt;Still, that is likely a bit more information than might be expected. If you only wanted to reveal the “over_21” field so that you could buy some booze, having to reveal all those other things isn’t exactly ideal.&lt;/p&gt;
&lt;p&gt;Revealing who issued the credential seems like it might be harmless, but for a digital credential, that’s revealing a lot more than your eligibility to obtain liquor. Potentially a lot more. Maybe in Hawaii, holding a Hawaii driver license isn’t notable, but it might be distinguishing — or even disqualifying — in other places. A Hawaii driver license reveals that you likely live in Hawaii, which is not exactly relevant to your alcohol purchase. It might not even be recognized as valid in some places.&lt;/p&gt;
&lt;p&gt;If the Hawaiian DMV uses multiple keys to issue credentials, you’ll also reveal which of those keys was used. That’s unlikely to be a big deal, but worth keeping in mind as we look at alternative approaches.&lt;/p&gt;
&lt;p&gt;Revealing the number of fields is a relatively minor information leak. This constrains the design a little, but not in a serious way. Basically, it means that you should probably have the same set of fields for everyone.&lt;/p&gt;
&lt;p&gt;For instance, you can’t include only the “over_XX” age fields that are true; you have to include the false ones as well or the number of fields would reveal an approximate age. That is, avoid:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;{ ..., &amp;quot;older_than&amp;quot;: [16, 18], ... }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note: Some formats allow individual items in lists like this to be committed separately. The name of the list is generally revealed in that case, but the specific values are hidden. These usually just use H(salt || value) as the commitment.&lt;/p&gt;
&lt;p&gt;And instead use:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;{ ..., &amp;quot;over_16&amp;quot;: true, &amp;quot;over_18&amp;quot;: true, &amp;quot;over_21&amp;quot;: false, &amp;quot;over_55&amp;quot;: false, ... }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Expiration dates are tricky. For some purposes, like verifying that someone is allowed to drive, the verifier will need to know that the credential has not expired.&lt;/p&gt;
&lt;p&gt;On the other hand, expiry is probably not very useful for something like age verification. After all, it’s not like you get younger once your license expires.&lt;/p&gt;
&lt;p&gt;The exact choice of expiration date might also carry surprising information. Imagine that only one person was able to get a license one day because the office had to close or the machine broke down. If the expiry date is a fixed time after issuance, the expiry date on their license would then be unique to them, which means that revealing that expiration date would effectively be identifying them.&lt;/p&gt;
&lt;p&gt;The final challenge here is the least obvious and most serious shortcoming of this approach: linkability.&lt;/p&gt;
&lt;h2 id=&quot;linkability-and-selective-disclosure&quot;&gt;Linkability and selective disclosure&lt;/h2&gt;
&lt;p&gt;A salted hash credential carries several things that make the credential itself identifiable. This includes the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The value of each commitment is unique and distinctive.&lt;/li&gt;
&lt;li&gt;The public key for the wallet.&lt;/li&gt;
&lt;li&gt;The signature that the issuer attaches to the credential.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each of these is unique, so if the same credential is used in two places, it will clearly indicate that this is the same person, even if the information that is revealed is very limited.&lt;/p&gt;
&lt;p&gt;For example, you might present an “over_21” to purchase alcohol in one place, then use the full credential somewhere else. If those two presentations use the same credential, those two sites will be able to match up the presentations. The entity that obtains the full credential can then share all that knowledge with the one that only knows you are over 21, without your involvement.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/vv-linkability.svg&quot; alt=&quot;A version of the issuer-holder-verifier diagram with multiple verifiers&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Even if the two sites only receive limited information, they can still combine the information they obtain — that you are over 21 and what you did on each site — into a profile. The building of that sort of profile online is known as &lt;a href=&quot;https://www.w3.org/2001/tag/doc/unsanctioned-tracking/&quot;&gt;unsanctioned tracking&lt;/a&gt; and generally regarded as a bad thing.&lt;/p&gt;
&lt;p&gt;This sort of matching is technically called &lt;strong&gt;verifier-verifier linkability&lt;/strong&gt;. The way that it can be prevented is to ensure that a completely fresh credential is used for every presentation. That includes a fresh set of commitments, a new public key from the wallet, and a new signature from the issuer (naturally, the thing that is being signed is new). At the same time, ensuring that the presentation doesn’t include any extraneous information, like expiry dates, helps.&lt;/p&gt;
&lt;p&gt;A system like this means that wallets need to be able to handle a whole lot of credentials, including fresh public keys for each. The wallet also needs to be able to handle cases where its store of credentials runs out, especially when the wallet is unable to contact the issuer.&lt;/p&gt;
&lt;p&gt;Issuers generally need to issue larger batches of credentials to avoid that happening, which involves a lot of computationally intensive work. Batching makes wallets quite a bit more complex. It also increases the cost of running issuance services, because they need better availability, not just more issuance capacity.&lt;/p&gt;
&lt;p&gt;In this case, SD-BLS has a small advantage over salted hashes because its “unregroupability” property means that presentations with differing sets of attributes are not linkable by verifiers. That’s a weaker guarantee than verifier-verifier unlinkability, because presentations with the same set of attributes can still be linked by a verifier; for that, fresh credentials are necessary.&lt;/p&gt;
&lt;p&gt;Using a completely fresh credential is a fairly effective way to protect against linkability for different verifiers, but it does nothing to prevent &lt;strong&gt;verifier-issuer linkability&lt;/strong&gt;. An issuer can remember the values they saw when they issued the credential. A verifier can take any one of the values from a presentation they receive (commitments, public key, or signature) and ask the issuer to fill in the blanks. The issuer and verifier can then share anything that they know about the person, not limited to what is included in the credential.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/iv-linkability.svg&quot; alt=&quot;A version of the issuer-holder-verifier diagram with a bidirectional arrow between issuer and verifier&quot; /&gt;&lt;/p&gt;
&lt;p&gt;What the issuer and verifier can share isn’t limited to the credential.  They can share anything they know, not just the stuff that was included in the credential. Maybe McLovin needed to show a passport and a utility bill in order to get a license and the DMV kept a copy. The issuer could give that information to the verifier. The verifier can also share what they have learned about the person, like what sort of alcohol they purchased.&lt;/p&gt;
&lt;h3 id=&quot;useful-linkability&quot;&gt;Useful linkability&lt;/h3&gt;
&lt;p&gt;In some cases, linkability might be a useful or essential feature. Imagine that selective disclosure is used to authorize access to a system that might be misused. Selective disclosure avoids exposing the system to information that is not essential. Maybe the system is not well suited to safeguarding private information. The system only logs access attempts and the presentation that was used.&lt;/p&gt;
&lt;p&gt;In the event that the access results in some abuse, the abuse could be investigated using verifier-issuer linkability. For example, the access could be matched to information available to the issuer to find out who was responsible for the abuse.&lt;/p&gt;
&lt;p&gt;The IETF is developing a couple of salted hash formats (in &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-oauth-selective-disclosure-jwt&quot;&gt;JSON&lt;/a&gt; and &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-spice-sd-cwt&quot;&gt;CBOR&lt;/a&gt;) that should be well suited to a number of applications where linkability is a desirable property.&lt;/p&gt;
&lt;p&gt;All of this is a pretty serious problem for something like online age verification. Putting issuers, which are often government agencies, in a position to trace activity might have an undesirable &lt;a href=&quot;https://en.wikipedia.org/wiki/Chilling_effect&quot;&gt;chilling effect&lt;/a&gt;. This is something that legislators generally recognize, and laws often include provisions that require unlinkability&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;In short, salted hash based systems only work if you trust the issuer.&lt;/p&gt;
&lt;h3 id=&quot;linkable-attributes&quot;&gt;Linkable attributes&lt;/h3&gt;
&lt;p&gt;There is not much point in avoiding linkability when the disclosed information is directly linkable. For instance, if you selectively disclose your name and date of birth, that information is probably unique or highly identifying. Revealing identifying information to a verifier makes verifier-issuer linkability easy; just like revealing the same information to two verifiers makes verifier-verifier linkability simple.&lt;/p&gt;
&lt;p&gt;This makes linkability for selective disclosure less concerning when it comes to revealing information that might be identifying.&lt;/p&gt;
&lt;p&gt;Unlinkability therefore tends to be most useful for non-identifying attributes. Simple attributes — like whether someone meets a minimum age requirement, holds a particular qualification, or has authorization — are less likely to be inherently linkable, so are best suited to being selectively disclosed.&lt;/p&gt;
&lt;h2 id=&quot;privacy-pass&quot;&gt;Privacy Pass&lt;/h2&gt;
&lt;p&gt;If the goal is to provide a simple signal, such as whether a person is older than a target age, &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc9576&quot;&gt;Privacy Pass&lt;/a&gt; is specifically designed to prevent verifier-issuer linkability.&lt;/p&gt;
&lt;p&gt;Privacy Pass also includes options that split the issuer into two separate functions — an issuer and an attester — where the attester is responsible for determining if a holder (or client) has the traits required for token issuance and the issuer only creates the tokens. This might be used to provide additional privacy protection.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/privacy-pass.svg&quot; alt=&quot;The four entities of the Privacy Pass architecture: Issuer, Attester, Holder/Client, and Verifier/Service&quot; /&gt;&lt;/p&gt;
&lt;p&gt;A Privacy Pass issuer could produce a token that signifies possession of a given trait. Only those with the trait would receive the token. For age verification, the token might signify that a person is at a selected age or older.&lt;/p&gt;
&lt;p&gt;Token formats for Privacy Pass that include limited &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-privacypass-public-metadata-issuance&quot;&gt;public information&lt;/a&gt; are also defined, which might be used to support selective disclosure. This is far less flexible than the salted hash approach as a fresh token needs to be minted with the set of traits that will be public. That requires that the issuer is more actively involved or that the different sets of public traits are known ahead of time.&lt;/p&gt;
&lt;p&gt;Privacy Pass does not naturally provide verifier-verifier unlinkability, but a fresh token could be used for each usage, just like for the salted hash design. Some of the Privacy Pass modes can &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-privacypass-batched-tokens&quot;&gt;issue a batch of tokens&lt;/a&gt; for this reason.&lt;/p&gt;
&lt;p&gt;In order to provide tokens for different age thresholds or traits, an issuer would need to use different public keys, each corresponding to a different trait.&lt;/p&gt;
&lt;p&gt;Privacy Pass is therefore a credible alternative to the use of salted hash selective disclosure for very narrow cases. It is somewhat inflexible in terms of what can be expressed, but that could mean more deliberate additions of capabilities. The strong verifier-issuer unlinkability is definitely a plus, but it isn’t without shortcomings.&lt;/p&gt;
&lt;h3 id=&quot;key-consistency&quot;&gt;Key consistency&lt;/h3&gt;
&lt;p&gt;One weakness of Privacy Pass is that it depends on the issuer using the same key for everyone. The ideal privacy is provided when there is a single issuer with just one key for each trait. With more keys or more issuers, the key that is used to generate a token carries information, revealing who issued the token. This is just like the salted hash example where the verifier needs to learn that the Hawaiian DMV issued the credential.&lt;/p&gt;
&lt;p&gt;The privacy of the system breaks down if every person receives tokens that are generated using a key that is unique to them. This risk can be limited through the use of &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-privacypass-key-consistency&quot;&gt;key consistency&lt;/a&gt; schemes. This makes the system a little bit harder to deploy and operate.&lt;/p&gt;
&lt;p&gt;As foreshadowed earlier, the same key switching concern also applies to a salted hash design if you don’t trust the issuer. Of course, we’ve already established that a salted hash design basically only works if you trust the issuer. Salted hash presentations are linkable based on commitments, keys, or signatures, so there is no real need to play games with keys.&lt;/p&gt;
&lt;h2 id=&quot;anonymous-credentials&quot;&gt;Anonymous credentials&lt;/h2&gt;
&lt;p&gt;A zero knowledge proof enables the construction of evidence that a prover knows something, without revealing that information. For an identity system, it allows a holder to make assertions about a credential without revealing that credential. That creates what is called an anonymous credential.&lt;/p&gt;
&lt;p&gt;Anonymous credentials are appealing as the basis for a credential system because the proofs themselves contain no information that might link them to the original credential.&lt;/p&gt;
&lt;p&gt;Verifier-issuer unlinkability is a natural consequence of using a zero knowledge proof. Verifier-verifier unlinkability would be guaranteed by providing a fresh proof for each verifier, which is possible without obtaining a fresh credential. The result is that anonymous credentials provide excellent privacy characteristics.&lt;/p&gt;
&lt;p&gt;Zero knowledge proofs trace back to systems of provable computation, which means that they are potentially very flexible. A proof can be used to prove any property that can be computed. The primary cost is in the amount of computation it takes to produce and validate the proof&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;. If the underlying credential can be adjusted to support the zero knowledge system, these costs can be reduced, which is what the BBS signature scheme does. Unmodified credentials can be used if necessary.&lt;/p&gt;
&lt;p&gt;Thus, a proof statement for use in age verification might be a machine translation of the following compound statement:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;this holder has a credential signed by the Hawaiian DMV;&lt;/li&gt;
&lt;li&gt;the expiration date on the credential is later than the current date;&lt;/li&gt;
&lt;li&gt;the person is 21 or older (or the date of birth plus 21 years is earlier than the current date);&lt;/li&gt;
&lt;li&gt;the holder knows the secret key associated with the public key mentioned in the credential; and,&lt;/li&gt;
&lt;li&gt;the credential has not been used with the current verifier more than once on this day&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
&lt;/ul&gt;
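&lt;p&gt;Loosely speaking, the proof convinces the verifier that a predicate like the following returns true, without the verifier ever seeing its inputs. This is a hypothetical sketch: the signature and reuse conditions are reduced to simple flags here, where the real statements are cryptographic checks:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;// Hypothetical: the predicate that the proof attests to. The witness
// (credential contents, wallet key) never leaves the holder.
function plusYears(isoDate, years) {
  return String(Number(isoDate.slice(0, 4)) + years) + isoDate.slice(4);
}

function statementHolds(witness, context) {
  return witness.issuer === 'Hawaii DMV'
      &amp;&amp; witness.expiry &gt; context.today               // ISO dates compare as strings
      &amp;&amp; plusYears(witness.dob, 21) &lt;= context.today  // 21 or older
      &amp;&amp; witness.walletKeyProven                      // stands in for the key check
      &amp;&amp; witness.usesWithVerifierToday &lt; 1;           // stands in for the reuse check
}
&lt;/code&gt;&lt;/pre&gt;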
&lt;p&gt;A statement in that form should be sufficient to establish that someone is old enough to purchase alcohol, while providing assurances that the credential was not stolen or reused. The only information that is revealed is that this is a valid Hawaiian license. We’ll see below how hiding that last bit is also possible and probably a good idea.&lt;/p&gt;
&lt;h3 id=&quot;reuse-protections&quot;&gt;Reuse protections&lt;/h3&gt;
&lt;p&gt;The last statement from the set of statements above provides evidence that the credential has not been shared with others. This condition, or something like it, is a necessary piece of building a zero-knowledge system. Otherwise, the same credential can be used and reused many times by multiple people.&lt;/p&gt;
&lt;p&gt;Limiting the number of uses doesn’t guarantee that a credential isn’t shared, but it limits the number of times that it can be reused. If the credential can only be used once per day, then that is how many times the credential can be misused by someone other than the person it was issued to.&lt;/p&gt;
&lt;p&gt;Choosing how many times a credential might be used will depend on the exact circumstances. For instance, it might not be necessary for the same person to present proof of age to an alcohol vendor multiple times per day. Maybe it would be reasonable for the store to remember them if they come back to make multiple purchases on any given day. One use per day might be reasonable on that assumption.&lt;/p&gt;
&lt;p&gt;In practice, multiple rate limits might be used. This can make the system more flexible over short periods (to allow for people making multiple alcohol purchases in a day) but also stricter over the long term (because people rarely need to make multiple purchases every day). For example, age checks for the purchase of alcohol might combine a three per day limit with a weekly limit of seven. Multiple conditions can be easily added to the proof, with a modest cost.&lt;/p&gt;
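&lt;p&gt;The holder-side bookkeeping for combined limits is simple. A sketch of the three-per-day, seven-per-week example (the names here are made up):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-javascript&quot;&gt;// Hypothetical holder-side check before creating a presentation.
// timestamps: past presentation times (in milliseconds) for this verifier.
function canPresent(timestamps, now) {
  const DAY = 24 * 60 * 60 * 1000;
  const lastDay = timestamps.filter(function (t) { return now - t &lt; DAY; });
  const lastWeek = timestamps.filter(function (t) { return now - t &lt; 7 * DAY; });
  return lastDay.length &lt; 3 &amp;&amp; lastWeek.length &lt; 7;
}
&lt;/code&gt;&lt;/pre&gt;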
&lt;p&gt;It is also possible for each verifier to specify their own rate limits according to their own conditions. A single holder would then limit the use of credentials according to those limits.&lt;/p&gt;
&lt;p&gt;Tracking usage is easy for a single holder. An actor looking to abuse credentials by sharing and reusing them has more difficulty. A bad actor would need to carefully coordinate their reuse of a credential so that any rate limits were not exceeded.&lt;/p&gt;
&lt;h3 id=&quot;hiding-the-issuer-of-credentials&quot;&gt;Hiding the issuer of credentials&lt;/h3&gt;
&lt;p&gt;People often do not get to choose who issues them a credential. Revealing the identity of an issuer might be more identifying than is ideal. This is especially true for people who have credentials issued by an atypical issuer.&lt;/p&gt;
&lt;p&gt;Consider that Europe is building a union-wide system of identity. That means that verifiers will be required to accept credentials from any country in the EU. Someone accessing a service in Portugal with an Estonian credential might be unusual if most people use a Portuguese credential. Even if the presentation is limited to something like age verification, the choice of issuer becomes identifying.&lt;/p&gt;
&lt;p&gt;This could also mean that a credential that should be valid is not recognized as such by a verifier, simply because the verifier chose not to accept that issuer. Businesses in Greece might be required by law to recognize other EU credentials, but what about a credential issued by Türkiye?&lt;/p&gt;
&lt;p&gt;Zero knowledge proofs can also hide the issuer, only revealing that a credential was issued by one of a set of issuers. This means that a verifier is unable to discriminate on the basis of issuer. For a system that operates at scale, that creates positive outcomes for those who hold credentials from atypical issuers.&lt;/p&gt;
&lt;h2 id=&quot;credential-revocation&quot;&gt;Credential revocation&lt;/h2&gt;
&lt;p&gt;Perhaps the hardest problem in any system that involves the issuance of credentials is what to do when the credential suddenly becomes invalid. For instance, if a holder is a phone, what do you do if the phone is lost or stolen?&lt;/p&gt;
&lt;p&gt;That is the role of revocation. On the Web, certificate authorities are required to have revocation systems to deal with lost keys, attacks, change of ownership, and a range of other problems. For wallets, the risk of loss or compromise of wallets might also be addressed with revocation.&lt;/p&gt;
&lt;p&gt;Revocation typically involves the verifier confirming with the issuer that the credential issued to the holder (or the holder itself) has not been revoked. That produces a tweak to our original three-entity system as follows:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://lowentropy.net/posts/selective-disclosure/revocation.svg&quot; alt=&quot;Issuer-holder-verifier model with an arrow looping back from verifier to issuer&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Revocation is often the most operationally challenging aspect of running identity infrastructure. While issuance might have real-time components — particularly if the issuer needs to ensure a constant supply of credentials to maintain unlinkability — credentials might be issued ahead of time.  However, revocation often requires a real-time response or something close to it. That makes a system with revocation much more difficult to design and operate.&lt;/p&gt;
&lt;h3 id=&quot;revoking-full-presentations&quot;&gt;Revoking full presentations&lt;/h3&gt;
&lt;p&gt;When a full credential or more substantive information is compromised, lack of revocation creates a serious impersonation risk. The inability to validate biometrics online means that a wallet might be exploited to perform identity theft or similarly serious crimes. Being able to revoke a wallet could be a necessary component of such a system.&lt;/p&gt;
&lt;p&gt;The situation with a complete credential presentation, or presentations that include identifying information, is therefore fairly simple. When the presentation contains identifying information, like names and addresses, preventing linkability provides no benefit. So providing a direct means of revocation checking is easy.&lt;/p&gt;
&lt;p&gt;With verifier-issuer linkability, the verifier can just directly ask the issuer whether the credential was revoked. This is not possible if there is a need to perform offline verification, but it might be possible to postpone such checks or rely on batched revocations (&lt;a href=&quot;https://blog.mozilla.org/security/2020/01/09/crlite-part-1-all-web-pki-revocations-compressed/&quot;&gt;CRLite&lt;/a&gt; is a great example of a batched revocation system). Straightforward or not, providing adequate scale and availability makes the implementation of a reliable revocation system a difficult task.&lt;/p&gt;
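&lt;p&gt;To make the batched approach concrete, here is a minimal sketch of a CRLite-style check: the verifier periodically downloads a compact snapshot of revoked credential identifiers and consults it locally, with no per-presentation call to the issuer. The class and method names are invented for illustration, and a real deployment would use a far more compact structure (CRLite uses a filter cascade) rather than a plain set of digests.&lt;/p&gt;

```python
import hashlib

class BatchedRevocationSnapshot:
    """Hypothetical verifier-side revocation check in the style of CRLite:
    revocations are distributed in periodic batches, so checks work offline
    at the cost of some staleness between snapshots."""

    def __init__(self, revoked_ids, snapshot_time):
        # Store digests of revoked identifiers; a real system would compress
        # this, for example into a Bloom filter cascade.
        self.revoked = {hashlib.sha256(i.encode()).digest() for i in revoked_ids}
        self.snapshot_time = snapshot_time

    def is_revoked(self, credential_id):
        return hashlib.sha256(credential_id.encode()).digest() in self.revoked

    def is_stale(self, now, max_age):
        """Revocations newer than the snapshot are invisible until refresh."""
        return now - self.snapshot_time > max_age
```

&lt;p&gt;The trade-off is visible in the last method: the batch makes offline checking possible, but any revocation that happens after the snapshot is invisible until the next download.&lt;/p&gt;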
&lt;h3 id=&quot;revoking-anonymous-credentials&quot;&gt;Revoking anonymous credentials&lt;/h3&gt;
&lt;p&gt;When you have anonymous credentials, which protect against verifier-issuer linkability, revocation is very challenging. A zero-knowledge assertion that the credential has not been revoked is theoretically possible, but there are a number of serious challenges. One issue is that proof of non-revocation depends on providing real-time or near-real-time information about the underlying credential. Research into solving the problem is still active.&lt;/p&gt;
&lt;p&gt;It is possible that revocation is unnecessary for some selective disclosure cases, especially those where zero-knowledge proofs are used. We have already accepted some baseline amount of abuse of credentials, by virtue of permitting non-identifying and unlinkable presentations. Access to a stolen credential is roughly equivalent to sharing or borrowing a credential. So, as long as the overall availability of stolen credentials is not too high relative to the availability of borrowed credentials, the value of revocation is low. In other words, if we accept some risk that credentials will be borrowed, then we can also tolerate some use of stolen credentials.&lt;/p&gt;
&lt;h3 id=&quot;revocation-complications&quot;&gt;Revocation complications&lt;/h3&gt;
&lt;p&gt;Even with linkability, revocation is not entirely trivial. Revocation effectively creates a remote kill switch for every credential that exists. The safeguards around that switch are therefore crucial in determining how the system behaves.&lt;/p&gt;
&lt;p&gt;For example, if any person can ask for revocation, that might be used to deny a person the use of a perfectly valid credential. There are well documented cases where organized crime has deprived people of access to identification documents in order to limit their ability to travel or access services.&lt;/p&gt;
&lt;p&gt;These problems are more tied to the processes that are used, rather than the technical design. However, technical measures might be used to improve the situation. For instance, SD-BLS suggests that threshold revocation be used, where multiple actors need to agree before a credential can be revoked.&lt;/p&gt;
&lt;p&gt;All told, if dealing with revocation on the Web has taught us anything, it might not be worth the effort to add revocation at all. It might be easier — and no less safe — to frequently update credentials.&lt;/p&gt;
&lt;h2 id=&quot;authorizing-verifiers&quot;&gt;Authorizing Verifiers&lt;/h2&gt;
&lt;p&gt;Selective disclosure systems can fail to achieve their goals if there is a power imbalance between verifiers and holders. For instance, a verifier might withhold services unless a person agrees to provide more information than the verifier genuinely requires.  That is, the verifier might effectively extort people to provide non-essential information. A system that allows information to be withheld to improve privacy is pointless unless people can actually exercise that choice.&lt;/p&gt;
&lt;p&gt;One way to work around this is to require that verifiers be certified before they can request certain information.  For instance, EU digital identity laws require that it be possible to restrict who can request a presentation. This might involve the certification of verifiers, so that verifiers would be required to provide holders with evidence that they are authorized to receive certain attributes.&lt;/p&gt;
&lt;p&gt;A system of verifier authorization could limit overreach, but it might also render credentials ineffective in unanticipated situations, including for interactions in foreign jurisdictions.&lt;/p&gt;
&lt;p&gt;Authorizations also need monitoring for compliance. Businesses — particularly larger businesses that engage in many activities — might gain authorization for many different purposes.  Abuse might occur if a broad authorization is used where a narrower authorization is needed. That requires more than a system of authorization; it means creating a way to ensure that businesses or agencies are accountable for their use of credentials.&lt;/p&gt;
&lt;h2 id=&quot;quantum-computers&quot;&gt;Quantum computers&lt;/h2&gt;
&lt;p&gt;Some of these systems depend on cryptography that is only classically secure. That is, a sufficiently powerful quantum computer might be able to attack the system.&lt;/p&gt;
&lt;p&gt;Salted hash selective disclosure relies only on digital signatures and hash functions, which makes it the most resilient to attacks that use a quantum computer. However, many of the other systems described rely on some version of the discrete logarithm problem being difficult, which can make them vulnerable. Predicting when a cryptographically relevant quantum computer might be created is as hard as any other attempt to look into the future, but we can understand some of the risks.&lt;/p&gt;
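&lt;p&gt;The salted-hash construction is simple enough to sketch. Each attribute is committed to as hash(salt || value); the issuer signs the list of digests, and disclosing an attribute means revealing its (salt, value) pair so that the verifier can recompute the digest. This is a minimal illustration of the general idea, not any particular specification; the function names are invented.&lt;/p&gt;

```python
import hashlib
import hmac
import os

def commit(value):
    """Commit to one attribute value with a fresh random salt.
    The issuer signs the resulting digests; only hashes and signatures
    are involved, which is the source of the quantum resilience."""
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + value.encode()).hexdigest()
    return salt, digest

def verify_disclosure(value, salt, digest):
    """The verifier recomputes the digest from a disclosed (salt, value)
    pair and compares it with the signed commitment."""
    expected = hashlib.sha256(salt + value.encode()).hexdigest()
    return hmac.compare_digest(expected, digest)
```

&lt;p&gt;Because each commitment uses a fresh salt, two issuances of the same value produce unrelated digests, which is also why guessing attacks only work when a field has few possible values and the salt is exposed or weak.&lt;/p&gt;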
&lt;p&gt;Quantum computers present two potential threats to any system that relies on classical cryptographic algorithms: forgery and linkability.&lt;/p&gt;
&lt;p&gt;A sufficiently powerful quantum computer might use something like &lt;a href=&quot;https://en.wikipedia.org/wiki/Shor%27s_algorithm&quot;&gt;Shor’s algorithm&lt;/a&gt; to recover the secret key used to issue credentials. Once that key has been obtained, new credentials could be easily forged. Of course, forgeries are only a threat after the key is recovered.&lt;/p&gt;
&lt;p&gt;Some schemes that rely on classical algorithms could be vulnerable to linking by a quantum computer, which could present a very serious privacy risk. This sort of linkability is a serious problem because it potentially affects presentations that are made before the quantum computer exists. Presentations that were saved by verifiers could later be linked.&lt;/p&gt;
&lt;p&gt;Some of the potential mechanisms, such as the &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-bbs-signatures&quot;&gt;BBS algorithm&lt;/a&gt;, are still able to provide privacy, even if the underlying cryptography is broken by a quantum computer. The quantum computer would be able to create forgeries, but not break privacy by linking presentations.&lt;/p&gt;
&lt;p&gt;If we don’t need to worry about forgery until a quantum computer exists and privacy is maintained even then, we are largely concerned with how long we might be able to use these systems. That gets back to the problem of predictions and balancing the cost of deploying a system against how long the system is going to remain secure. Credential systems take a long time to deploy, so — while they are not vulnerable to a future advance in the same way as encryption — planning for that future is likely necessary.&lt;/p&gt;
&lt;h2 id=&quot;the-limitations-of-technical-solutions&quot;&gt;The limitations of technical solutions&lt;/h2&gt;
&lt;p&gt;If there is a single conclusion to this article, it is that the problems that exist in identity systems are not primarily technical. There are several very difficult problems to consider when establishing a system. Those problems only start with the selection of technology.&lt;/p&gt;
&lt;p&gt;Any technological choice presents its own problems. Selective disclosure is a powerful tool, but with limited applicability. Properties like linkability need to be understood or managed. Otherwise, the actual privacy properties of the system might not meet expectations. The same goes for any rate limits or revocation that might be integrated.&lt;/p&gt;
&lt;p&gt;How different actors might participate in the system needs further consideration. Decisions about who might act as an issuer in the system need a governance structure. Otherwise, some people might be unjustly denied the ability to participate.&lt;/p&gt;
&lt;p&gt;For verifiers, their incentives need to be examined. A selective disclosure system might be built to be flexible, which might seem to empower people with choice about what they disclose. However, that flexibility might be abused by powerful verifiers to extort additional information from people.&lt;/p&gt;
&lt;p&gt;All of which is to say: better technology does not always help as much as you might hope. Many of the problems are people problems, social problems, and governance problems, not technical problems. Technical mechanisms tend to only change the shape of non-technical problems. That is only helpful if the new shape of the problem is something that people are better able to deal with.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This is different from licensing to drive, where most countries recognize driving permits from other jurisdictions. That’s probably because buying alcohol is a simple check based on an objective measure, whereas driving a car is somewhat more involved. &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Well, most of the US.  It has to do with &lt;a href=&quot;https://en.wikipedia.org/wiki/National_Minimum_Drinking_Age_Act&quot;&gt;highways&lt;/a&gt;. &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The issuer might want some additional assurances, like controls over how the credential can be accessed and over what happens if a device is lost, stolen, or sold, but they all reduce to the same basic idea. &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;If the presentation didn’t include information about the verifier and time of use, one verifier could copy the presentation they receive and impersonate the person. &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Rainbow tables can handle relatively large numbers of values without too much difficulty. Even some of the richer fields can probably be put in a rainbow table. For example, there are about 1.4 million people in Hawaii. All the values for some fields are known, such as the complete set of possible addresses. Even if every person has a unique value, a very simple rainbow table for a field would take a few seconds to build and around 100Mb to store, likely a lot less. A century of birthdays would take much less storage&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;. &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;In practice, a century of birthdays (40k values) will have no collisions with even a short hash. You don’t need much more than 32 bits for that many values. Furthermore, if you are willing to have a small number of values associated with each hash, you can save even more space. 40k values can be indexed with a 16-bit value and a 32-bit hash will produce very few collisions. A small number of collisions are easy to resolve by hashing a few times, so maybe this could be stored in about 320kB with no real loss of utility. &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;There are a few things that need care, like whether different attributes can be bound to a different wallet key and whether the attributes need to show common provenance. With different keys, the holder might mix and match attributes from different people into a single presentation. &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;To continue the tortured analogy, imagine that you take a photo of the credential to present, so that the recipient can’t just scratch off the stuff that you didn’t. Or maybe you add a clear coat of enamel. &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;For example, Article 5a, 16 of the &lt;a href=&quot;https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A32024R1183&quot;&gt;EU Digital Identity Framework&lt;/a&gt; requires that wallets “not allow providers of electronic attestations of attributes or any other party, after the issuance of the attestation of attributes, to obtain data that allows transactions or user behaviour to be tracked, linked or correlated, or knowledge of transactions or user behaviour to be otherwise obtained, unless explicitly authorised by the user”. &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;A proof can be arbitrarily complex, so this isn’t always cheap, but most of the things we imagine here are probably very manageable. &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This isn’t quite accurate.  The typical approach involves the use of tokens that repeat if the credential is reused too often. That makes it possible to catch reuse, not prevent it. &lt;a href=&quot;https://lowentropy.net/posts/selective-disclosure/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>Thoughts on TAG Design Reviews</title>
    <link href="https://lowentropy.net/posts/tag2023/"/>
    <updated>2023-11-21T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/tag2023/</id>
    <content type="html">&lt;p&gt;Before I start on my thoughts, if you work for a W3C member organization, please
head to &lt;a href=&quot;https://www.w3.org/2023/10/tag-nominations&quot;&gt;the 2023 TAG Election
page&lt;/a&gt;.  Voting is open until
2023-12-14.&lt;/p&gt;
&lt;p&gt;If you are considering how you might like to rank me when voting, read on.  I
can’t promise that this post will provide much additional context, but it might.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;The W3C &lt;acronym title=&quot;Technical Architecture Group&quot;&gt;TAG&lt;/acronym&gt; is a bit of
a strange institution.  The TAG occupies a position of some privilege due to its
standing within the W3C and the long-standing participation and sponsorship of
Sir Tim Berners-Lee.&lt;/p&gt;
&lt;p&gt;The TAG also has a history marked by notable documents produced under its
letterhead.  The TAG, through its &lt;a href=&quot;https://tag.w3.org/findings/&quot;&gt;findings&lt;/a&gt;, has
been responsible for recognizing and analyzing certain key trends in the
evolution of the Web, providing some key pieces of architectural guidance.  The
TAG also publishes documents with general guidance for people seeking to improve
the Web, like &lt;a href=&quot;https://w3ctag.github.io/design-principles/&quot;&gt;design principles&lt;/a&gt;
and a &lt;a href=&quot;https://www.w3.org/TR/security-privacy-questionnaire/&quot;&gt;security and privacy
questionnaire&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;On a day-to-day basis, however, the TAG provides hands-on guidance to people
looking to add new capabilities to the Web, primarily through &lt;a href=&quot;https://github.com/w3ctag/design-reviews&quot;&gt;design
reviews&lt;/a&gt;.  Records of early reviews
trace back to 2013 in the TAG repository, but the practice has deeper roots.&lt;/p&gt;
&lt;p&gt;The modern review record starts with a meager 5 reviews in the latter half of
2013. More recently, the TAG closed a total of 85 design reviews in 2022&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/tag2023/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;.  Already, in 2023, there have been 106 design review
requests opened.&lt;/p&gt;
&lt;p&gt;The function of the TAG as a body primarily focused on reviewing new Web APIs is
one that took a while to settle.  A key driver of this increase in volume has
clearly been the inclusion of TAG review as a formal precondition for shipping
Web-visible changes in the &lt;a href=&quot;https://www.chromium.org/blink/launching-features/&quot;&gt;Chromium
project&lt;/a&gt;.  Chromium
consequently drives a lot of this review load with 73 of the 106 new requests
that arrived in 2023 clearly marked as originating from “Google”, “Chromium”, or
“Microsoft” as a primary driver or funder of the work&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/tag2023/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;.
That is nearly 70% of the total review load attributed to Chromium.  This is in
addition to those design reviews that were initiated on behalf of a W3C group in
which Chromium contributors were instrumental in the work.&lt;/p&gt;
&lt;p&gt;Obviously, at a rate of more than 2 reviews a week, that’s a fairly major outlay
in terms of time for the TAG.  Proposals vary in size, but some of them are
quite substantial.  A good review requires reading lengthy explainers and
specifications, filling gaps in understanding by talking to different people,
considering alternative options, and building an understanding of the broader
context.  A proper review for a more substantial proposal can take weeks or even
months to research, discuss, and write up.&lt;/p&gt;
&lt;p&gt;The TAG is &lt;a href=&quot;https://github.com/w3c/AB-memberonly/issues/171&quot;&gt;expanding in size&lt;/a&gt;
this year. An increase to 12 members (8 elected, 4 appointed) does give the TAG
more capacity, albeit with added coordination costs reducing efficiency.  This
is predicated on the idea that reviews are the most important function of the
TAG.  If that is the case, adding more capacity seems like a reasonable
reaction.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;That an action is superficially reasonable is not the standard to apply when
making such a decision.  As with a design review, an examination of the
alternatives is generally illuminating.  Once those alternatives are understood,
we might again conclude that the proposal on the table is the best possible
path, but we do so with a more complete understanding of what opportunities are
lost or foreclosed as a result.  The &lt;a href=&quot;https://www.w3.org/2023/05/12-ab-minutes.html#t09&quot;&gt;AB
minutes&lt;/a&gt; of the decision do
not reflect that process, but then they are only responding to a request from
the TAG.&lt;/p&gt;
&lt;p&gt;There are several other equally reasonable ways of dealing with increased
workload.  If reviews are taking too long, it might be possible to find ways to
make reviewing easier or faster.  Perhaps the TAG has exhausted their options in
that area already.  Maybe they have looked at selectively rejecting more design
review requests.  Maybe they have considered finding ways to offload review work
onto other bodies, like &lt;a href=&quot;https://www.w3.org/Privacy/IG/&quot;&gt;PING&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;From my limited perspective, it is not clear that these avenues of investigation
have been fully explored.  For instance, I have good experience with the effective
directorate system that the &lt;acronym title=&quot;Internet Engineering Steering
Group&quot;&gt;IESG&lt;/acronym&gt; uses to greatly alleviate their workload, but I see no
evidence of an effort to delegate in a similar fashion.&lt;/p&gt;
&lt;p&gt;TAG members each volunteer time from their ordinarily busy day jobs, so any
excess load spent on reviewing is time that is not available for higher
functions.  In addition to review load, the TAG has a role in &lt;a href=&quot;https://www.w3.org/2023/Process-20231103/#council&quot;&gt;W3C
Councils&lt;/a&gt; and other critical
procedural functions in the W3C process.  Those tasks are generally not easily
delegated or dealt with by a subset of TAG members.&lt;/p&gt;
&lt;p&gt;I am supportive of efforts to better use the TAG for key procedural
functions, like the W3C Council. Those functions make the TAG more important in
a good way.  The W3C needs people in the role who have good judgment and the
experience to inform that judgment.&lt;/p&gt;
&lt;p&gt;Along with that, it is important to reserve some space for the TAG to provide
technical leadership for the W3C and the Web community as a whole.  After time
spent on the procedural functions demanded by the process, design reviews have
the potential to completely drain any time TAG members have to dedicate to the
role, leaving no spare capacity.  Ideally, there needs to be some remaining
space for the careful and thoughtful work that leadership demands.&lt;/p&gt;
&lt;p&gt;Effective technical leadership depends somewhat on the TAG being exposed to how
the platform is evolving.  Reviews are a great way to gain some of that
exposure, but that does not mean that the TAG needs to review every single
proposal.&lt;/p&gt;
&lt;p&gt;I don’t have a specific plan yet.  If appointed, I will need some time to
understand what the role is and what options are available.  I consider myself
quite capable of performing that sort of review and I expect it would be easy to
settle into that function.  But I have no intent of letting design reviews
dominate my time; the TAG – and the Web – deserves better.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;A note
on the numbers here: The TAG has a template that they use for design reviews and
I have only selected reviews that include the string “requesting a TAG review”,
as present in that template.  There were other issues closed in this period,
some of which are probably also pre-template design reviews, but I haven’t
carefully reviewed those. &lt;a href=&quot;https://lowentropy.net/posts/tag2023/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;For posterity, this is
the search I used: &lt;code&gt;opened_since(2023-01-01) not(opened_since(2024-01-01)) body(&amp;quot;requesting a TAG review&amp;quot;) body(&amp;quot;(?:driving the (?:design|specification)|funded by):&#92;&#92;s+&#92;&#92;[?(?:Microsoft|Google|Chromium)&amp;quot;))&lt;/code&gt;,
using &lt;a href=&quot;https://github.com/MikeBishop/archive-repo/tree/main/post-processing/viewer&quot;&gt;a tool I
built&lt;/a&gt;
in combination with the excellent &lt;a href=&quot;https://github.com/MikeBishop/archive-repo&quot;&gt;GitHub issue archival
tool&lt;/a&gt; that Mike Bishop wrote. &lt;a href=&quot;https://lowentropy.net/posts/tag2023/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>Fraud, Abuse, Fingerprinting, Privacy, and Openness</title>
    <link href="https://lowentropy.net/posts/fraud/"/>
    <updated>2023-08-23T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/fraud/</id>
    <content type="html">&lt;p&gt;Fraud and abuse online are pretty serious problems.  How sites manage fraud is
something of a mystery to most people.  Indeed, as this post will show, that’s
deliberate.&lt;/p&gt;
&lt;p&gt;This post provides an outline of how fraud management operates.  It looks at the
basic techniques that are used and the challenges involved.  In doing so, it
explores the tension between fraud management and privacy.&lt;/p&gt;
&lt;p&gt;Hopefully this post helps you understand why fingerprinting is bad for privacy;
why you should nevertheless be happy that your bank is fingerprinting you; and,
why efforts to replace fingerprinting are unlikely to change anything.&lt;/p&gt;
&lt;p&gt;Fraud and abuse are a consequence of the way the Web works.  Recognizing that
these are a part of the cost of a Web that values privacy, openness, and equity
is hard, but I can’t see a better option.&lt;/p&gt;
&lt;h2 id=&quot;what-sorts-of-fraud-and-abuse%3F&quot;&gt;What sorts of fraud and abuse?&lt;/h2&gt;
&lt;p&gt;This post concentrates on the conduct of fraud or abuse using online services.
Web-based services mostly, but mobile apps and similar services have similar
concerns.&lt;/p&gt;
&lt;p&gt;The sorts of fraud and abuse of most interest are those that operate at scale.
One-off theft needs different treatment.  Click fraud in advertising is a good
example.  Click fraud is where a site seeks to convince advertisers that ads
have been shown to people in order to get more money.  Click fraud is a constant
companion to the advertising industry, and one that is unlikely to ever go away.
Managing click fraud is an important part of participating in advertising, and
something that affects everyone that uses online services.&lt;/p&gt;
&lt;p&gt;Outside of advertising, fraud management techniques&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/fraud/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; are also used to manage the risk of fake accounts that
are created for fraud or abuse purposes.  Online stores and banks also use fraud
management as part of an overall strategy for managing the risk of payment fraud
or theft.&lt;/p&gt;
&lt;p&gt;This is a very high-level overview, so most of this document applies equally to
lots of different fraud and abuse scenarios.  Obviously, each situation will be
different, but I’m glossing over the details.&lt;/p&gt;
&lt;aside&gt;
&lt;p&gt;I find the parallels between fraud management and denial of service mitigation
interesting.  I’ve called out some of those similarities and differences
throughout.&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id=&quot;understanding-online-fraud-and-abuse&quot;&gt;Understanding online fraud and abuse&lt;/h2&gt;
&lt;p&gt;Let’s say that you have a site that makes some information or service available.
This site will attract clients, which we can split into two basic groups:
clients that the site wants to serve, and clients that the site does not want to
serve.&lt;/p&gt;
&lt;aside&gt;
&lt;p&gt;Why the site does not want to serve the clients from the latter group does not
matter that much, but there are some common themes we tend to see.
Distinguishing between humans and bots is a very common goal.
&lt;a href=&quot;https://en.wikipedia.org/wiki/CAPTCHA&quot;&gt;CAPTCHAs&lt;/a&gt; are supposed to be able to
distinguish this.  Of course, CAPTCHAs have always had very poor accessibility
properties and, increasingly, computers are better at solving CAPTCHAs than
humans&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/fraud/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;That doesn’t stop sites from wanting to be able to pick out a bot.  For
advertising cases, sites will want to serve humans – after all, bots are
unlikely to change their purchasing habits as a result of “viewing” an ad.
Similarly, sites that provide goods that are limited in quantity – such as theatre
tickets or limited run goods like sneakers&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/fraud/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt; – might prefer to ensure that their
inventory is only sold to people.&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;The attacker in this model seeks to access the service for some reason.  In
order to do so, the attacker attempts to convince sites that they are a real
client.&lt;/p&gt;
&lt;p&gt;For click fraud, a site might seek to convince its advertising partners that ads
were shown to real people.  The goal is to convince the advertiser to pay the
fraudulent site more money.  Sophisticated click fraud can also involve faking
clicks or &lt;a href=&quot;https://support.google.com/google-ads/answer/6365?hl=en&quot;&gt;ad
conversions&lt;/a&gt; in an
effort to falsely convince the advertiser that the ads on the fraudulent site
are more valuable because they appear to drive sales.&lt;/p&gt;
&lt;p&gt;An adversary rarely gains much by performing a single instance of fraud.  They
will often seek to automate fraud, accessing the service as many times as
possible.  Fraud at scale can be very damaging, but it also means that it is
easier to detect.&lt;/p&gt;
&lt;p&gt;Automation allows fraud to be conducted at scale, but it also creates telltales:
signals that allow an attack to be recognized.&lt;/p&gt;
&lt;h3 id=&quot;detection&quot;&gt;Detection&lt;/h3&gt;
&lt;p&gt;Detection is the first stage for anyone looking to defeat fraud or abuse.  To do
that, site operators will look for anomalies of any sort.  Maybe the attack will
appear as an increase in incoming requests or a repetitive pattern of accesses.&lt;/p&gt;
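&lt;p&gt;To make this concrete, here is a deliberately crude sketch of how a volume
anomaly might be flagged.  The numbers and the z-score test are invented for
illustration; real systems use much richer models:&lt;/p&gt;

```python
import statistics

def is_anomalous(history, current, threshold=3.0):
    """Flag a request count that sits well outside recent history.
    A deliberately simple sketch; real detectors are far richer."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return current != mean
    return abs(current - mean) / stdev > threshold

# Hourly request counts for a quiet site, then a sudden spike.
baseline = [102, 97, 110, 105, 99, 101, 95, 108]
print(is_anomalous(baseline, 104))  # ordinary hour
print(is_anomalous(baseline, 900))  # likely an attack
```

&lt;p&gt;A spike like this is the easy case; repetitive patterns of access, discussed
next, take more work to find.&lt;/p&gt;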
&lt;p&gt;Repetition might be a key to detecting fraud.  An attacker might try to have
their attacks blend in with real humans that are also accessing the system.  An
attacker’s ability to mimic human behaviour is usually limited, as they often
hope to execute many fraudulent transactions.  Attackers have to balance the
risk that they are detected against the desire to complete multiple actions
before they are detected.&lt;/p&gt;
&lt;p&gt;Detecting fraud and abuse relies on a range of techniques.  Anti-fraud people
generally keep details of their methods secret, but we know that they use both
automated and manual techniques.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Automated systems generally use machine learning that is trained on the
details of past attacks.  This scales really well and allows for repeat
attacks to be detected quickly and efficiently.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Human experts can be better at recognizing new forms of attack.  Attacks that
are detected by automated systems can be confirmed by humans before deploying
interventions.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of course, attackers are also constantly trying to adapt their techniques to
evade detection.  Detecting an attack can take time.&lt;/p&gt;
&lt;h3 id=&quot;identification%2Fclassification&quot;&gt;Identification/classification&lt;/h3&gt;
&lt;p&gt;It is not enough to know that fraud is occurring.  Once recognized, the pattern
of fraudulent behaviour needs to be classified, so that future attacks can be
recognized.&lt;/p&gt;
&lt;p&gt;As noted, most fraud is automated in some way.  Even when humans are involved,
operating at any significant scale means working to a script.  Whether executed
by machines or humans, the script will be designed to evade existing defenses.
That careful scripting can itself produce patterns.  If a pattern can be found,
attempts at fraud can be distinguished from genuine attempts by people to visit
the site.&lt;/p&gt;
&lt;p&gt;Patterns in abuse manifest in one of two ways:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Common software&lt;/em&gt;.  If attackers only use a specific piece of hardware or
software, then any common characteristics might be revealed by
fingerprinting.  Even if the attacker varies some characteristics (like the
&lt;code&gt;User-Agent&lt;/code&gt; header or similar obvious things), other characteristics might
stay the same, which can be used to recognize the attack.  This is why
browser fingerprinting is a valuable tool for managing fraud.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Common practices&lt;/em&gt;.  Software or scripted interaction can produce fixed
patterns of behaviour that can be used to recognize an attempted attack.
Clues might exist in the timing of actions or the consistency of interaction
patterns.  For instance, automated fraud might not exhibit the sorts of
variance in mouse movements that a diverse set of people could.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
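&lt;p&gt;A rough sketch of the first approach might look like the following.  The
attribute names are invented for illustration, not a real fingerprinting
schema:&lt;/p&gt;

```python
import hashlib
from collections import Counter

def fingerprint(attrs):
    """Hash a set of observable client characteristics into a stable key.
    The attribute names used below are illustrative only."""
    material = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.sha256(material.encode()).hexdigest()[:16]

seen = Counter()
requests = [
    {"tls_ciphers": "c1,c2,c3", "accept_language": "en-US", "canvas_hash": "abc"},
    {"tls_ciphers": "c1,c2,c3", "accept_language": "en-US", "canvas_hash": "abc"},
    {"tls_ciphers": "c9,c2", "accept_language": "de-DE", "canvas_hash": "xyz"},
]
for req in requests:
    seen[fingerprint(req)] += 1

# Heavily repeated fingerprints become candidates for extra scrutiny.
print(seen.most_common(1))
```

&lt;p&gt;The point is that varying the obvious attributes does not help an attacker if
any of the hashed characteristics stays constant across their requests.&lt;/p&gt;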
&lt;p&gt;The script that is followed by an attacker might try to vary some of these
things.  However, unless the attack script can simulate the sort of diversity
that real people naturally exhibit – which is unlikely – any resulting common
patterns can be used to identify likely attempts at fraud.&lt;/p&gt;
&lt;p&gt;Once a pattern is established, future attempts can be recognized.  Also, if
enough information has been recorded from past interactions, previously
undetected fraud might now be identifiable.&lt;/p&gt;
&lt;p&gt;Learned patterns can sometimes be used on multiple sites.  If an attack is
detected and thwarted on one site, similar attacks on other sites might be
easier to identify.  Fraud and abuse detection services that operate across many
sites can therefore be very effective at detecting and mitigating attacks on
multiple sites.&lt;/p&gt;
&lt;h4 id=&quot;fingerprinting-and-privacy&quot;&gt;Fingerprinting and privacy&lt;/h4&gt;
&lt;p&gt;Browser makers generally regard browser fingerprinting as an attack on user
privacy.  The &lt;a href=&quot;https://amiunique.org/&quot;&gt;fingerprint of a browser&lt;/a&gt; is consistent
across sites in ways that are hard to control.  Browsers can have unique or
nearly-unique fingerprints, which means that people can be effectively
identified and &lt;a href=&quot;https://www.w3.org/2001/tag/doc/unsanctioned-tracking/&quot;&gt;tracked&lt;/a&gt;
using the fingerprint of their browser, against their wishes or expectations.&lt;/p&gt;
&lt;p&gt;Fingerprinting used this way undermines controls that browsers use to maintain
&lt;a href=&quot;https://digitalcommons.law.uw.edu/wlr/vol79/iss1/10/&quot;&gt;contextual integrity&lt;/a&gt;.
Circumventing these controls is unfortunately widespread.  Services exist that
offer “cookie-less tracking” capabilities, which can include linking
cross-site activity using browser fingerprinting or “primary identifiers”&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/fraud/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Fingerprinting options in browsers continue to evolve in two directions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;New browser features, especially those with personalization or hardware
interactions, can expand the ways in which browsers might become more
identifiable through fingerprinting.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Browser privacy engineers are constantly reducing the ways in which browsers
can be fingerprinted.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Though these efforts often pull in different directions, the general trend is
toward reduced effectiveness of fingerprinting.  Browsers are gradually becoming
more homogenous in their observable behaviour despite the introduction of new
capabilities.  New features that might be used for fingerprinting tend not to be
accessible without active user intervention, making them far less reliable as a
means of identification.  Existing rich sources of fingerprinting information –
like plugin or font enumeration – will eventually be far more limited.&lt;/p&gt;
&lt;aside&gt;
&lt;p&gt;I’m deliberately ignoring the use of IP addresses for client identification.  IP
addresses are still a very effective tool for managing fraud and abuse, just as
they are crucial for managing denial of service risk.  IP addresses provide
information that can increase the effectiveness of fingerprinting, sometimes
dramatically.  Like other fingerprinting options, IP addresses might become
less useful for fingerprinting as time goes by.  We hope.&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;Reductions in the effectiveness of fingerprinting are unlikely to ever result in
every browser looking identical.  More homogenous browser fingerprints make the
set of people who share any given fingerprint larger.  In turn, this reduces,
without eliminating, the odds that a site can successfully reidentify someone
using a fingerprint.&lt;/p&gt;
&lt;p&gt;Reduced effectiveness of fingerprinting might limit the ability of sites to
distinguish between real and abusive activity.  This places stronger reliance
on other signals, like behavioural cues.  It might also mean that additional
checks are needed to discriminate between suspicious and wanted activity, though
this comes with its own hazards.&lt;/p&gt;
&lt;p&gt;Even when fingerprinting is less useful, fingerprints can still help in managing
fraud.  Though many users might share the same fingerprint, additional scrutiny
can be reserved for those browsers that share a fingerprint with the attacker.&lt;/p&gt;
&lt;h3 id=&quot;mitigation-strategies&quot;&gt;Mitigation strategies&lt;/h3&gt;
&lt;p&gt;Once a particular instance of fraud is detected and a pattern has been
established, it becomes possible to mitigate the effects of the attack.  This
can involve some difficult choices.&lt;/p&gt;
&lt;p&gt;Because detecting fraud is difficult, sites often tolerate extensive fraud
before they can begin mitigation.  Classification takes time and can be error
prone.  Furthermore, sites don’t want to annoy their customers by falsely
accusing them of fraud.&lt;/p&gt;
&lt;h4 id=&quot;stringing-attackers-along&quot;&gt;Stringing attackers along&lt;/h4&gt;
&lt;p&gt;Tolerance of apparent abuse can have other positive effects.  A change in how a
site reacts to attempted abuse might tip an attacker off that their method is no
longer viable.  To that end, a site might allow abuse to continue, without any
obvious reaction&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/fraud/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;A site that reacts to fraud in obvious ways will also reveal when fraud has
escaped detection.  This can be worse, as it allows an attacker to learn when
their attack was successful.  Tolerating fraud attempts deprives the attacker of
immediate feedback.&lt;/p&gt;
&lt;aside&gt;
&lt;p&gt;This is where fraud mitigation differs from something like denial of service
attacks.  In a denial of service attack, the attacker exhausts resources.  Here,
attacks are often swift, as attackers attempt to surprise defenders or catch
them off guard.  Detection, identification, and mitigation need to occur
rapidly to blunt this sort of attack.&lt;/p&gt;
&lt;p&gt;Denial of service mitigation generally involves finding the cheapest possible
way to block attacks.  Leading an attacker to believe that they have not been
detected does little good when the resource that should be protected is actively
being expended.&lt;/p&gt;
&lt;/aside&gt;
&lt;p&gt;Delaying the obvious effects of mitigation allows abuse detection to remain
effective for longer. Similarly, providing feedback about abuse in the aggregate
might prevent an attacker from learning when specific tactics were successful.
Attackers that receive less feedback or late feedback cannot adapt as quickly
and so are able to evade detection for a smaller proportion of the overall time.&lt;/p&gt;
&lt;h4 id=&quot;addressing-past-abuse&quot;&gt;Addressing past abuse&lt;/h4&gt;
&lt;p&gt;A delayed response depends on being able to somehow negate or mitigate the
effect of fraud from the past.  This is also helpful where instances of fraud or
abuse previously escaped detection.&lt;/p&gt;
&lt;p&gt;For something like click fraud, the effect of fraud is often payment, which is
not immediate.  The cost of fraud can be effectively managed if it can be
detected before payment comes due.  The advertiser can refuse to pay for
fraudulent ad placements and disqualify any conversions that are attributed to
them.  The same applies to credit card fraud, where settlement of payments can
be delayed to allow time for fraudulent patterns to be detected.&lt;/p&gt;
&lt;p&gt;It is not always possible to retroactively mitigate fraud or delay its effect.
Sites can instead require additional checks or delays. These might not deprive
an attacker of feedback on whether their evasive methods were successful, but
changes in response could thwart or slow attacks.&lt;/p&gt;
&lt;h3 id=&quot;security-by-obscurity&quot;&gt;Security by obscurity&lt;/h3&gt;
&lt;p&gt;As someone who works in other areas of security, this overall approach to
managing fraud seems very … brittle.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Kerckhoffs%27s_principle&quot;&gt;Kerckhoffs’s principle&lt;/a&gt;
– which guides the design of most security systems – says that you design
systems that depend only on protecting the key and not keeping the details of
how a system is built secret.  A system design that is public knowledge can be
analysed and improved upon by many.  Keeping the details of the system secret,
known as security by obscurity, is considered bad form and usually indicative
of a weak system design.&lt;/p&gt;
&lt;p&gt;Here, security assurances rely very much on security by obscurity.  Detecting
fraud depends on spotting patterns, then building ways of recognizing those
patterns.  An attacker that can avoid detection might be able to conduct fraud
with impunity.  That is, the system of defense relies on techniques so fragile
that knowledge of their details would render them ineffectual.&lt;/p&gt;
&lt;h2 id=&quot;is-there-hope-for-new-tools%3F&quot;&gt;Is there hope for new tools?&lt;/h2&gt;
&lt;p&gt;There are some technologies that offer some hope of helping manage fraud and
abuse risk.  However, my expectation is that these will only support existing
methods.&lt;/p&gt;
&lt;p&gt;Any improvements these might provide are unlikely to result in changes in
behaviour.  Anything that helps attackers avoid detection will be exploited to
the maximum extent possible; anything that helps defenders detect fraud or abuse
will just be used to supplement existing information sources.&lt;/p&gt;
&lt;h3 id=&quot;privacy-pass&quot;&gt;Privacy Pass&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-privacypass-architecture&quot;&gt;Privacy
Pass&lt;/a&gt;
offers a way for sites to exchange information about the trustworthiness of
their visitors.  If one site decides that someone is trustworthy, it can give
the browser an anonymous token.  Other sites can be told that someone is
trustworthy by passing them this token.&lt;/p&gt;
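&lt;p&gt;A toy sketch might make the unlinkability property concrete.  This uses
textbook RSA blind signatures with tiny illustrative numbers – my own
simplification, not the actual Privacy Pass construction, which standardizes
stronger variants of this idea:&lt;/p&gt;

```python
# Toy sketch of the blind-signature idea behind Privacy Pass style tokens.
# Tiny textbook RSA parameters; real deployments use standardized
# constructions (blind RSA or VOPRFs), never code like this.

n, e, d = 3233, 17, 2753  # issuer key: n = 61 * 53

def issue(blinded):
    """Issuer signs a blinded token; it never sees the token itself."""
    return pow(blinded, d, n)

# Client side: blind a token value, have it signed, then unblind.
token = 1234
r = 7  # blinding factor, coprime with n; would be random in practice
blinded = (token * pow(r, e, n)) % n
blind_sig = issue(blinded)
signature = (blind_sig * pow(r, -1, n)) % n

# Any site can verify the signature against the issuer's public key,
# but the issuer cannot link this token back to its issuance.
print(pow(signature, e, n) == token)
```

&lt;p&gt;The issuer only ever sees the blinded value, so even if it colludes with the
site that later receives the token, it cannot connect the two events.&lt;/p&gt;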
&lt;p&gt;Ostensibly, Privacy Pass tokens cannot carry information; only the presence (or
absence) of a token carries any information.  A browser might be told that the
token means “trustworthy”, but it could mean anything&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/fraud/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;.  That
means that the token issuer needs to be trusted.&lt;/p&gt;
&lt;p&gt;How a site determines whether to provide a token also has consequences.  Take
Apple’s &lt;a href=&quot;https://developer.apple.com/news/?id=huqjyh7k&quot;&gt;Private Access Tokens&lt;/a&gt;,
which are supposed to mean that the browser is trustworthy, but they really
carry a cryptographically-backed assertion that the holder has an Apple device.
For sites looking to find a lucrative advertising audience, this provides a
strong indicator that a visitor is rich enough to be able to afford Apple
hardware.  That is bankable information.&lt;/p&gt;
&lt;p&gt;This is an example of how the method used to decide whether to provide a token
can itself leak information.  In order to protect this information, a decent
proportion of tokens would need to be issued using alternative methods.&lt;/p&gt;
&lt;p&gt;We also need to ensure that sites do not become overly reliant on tokens.
Otherwise, people who are unable to produce a token could find themselves unable
to access services.  People routinely fail to convince computers of their status
as a human for many reasons&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/fraud/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;.  Clients might be able to
withhold some proportion of tokens so that sites might learn not to become
dependent on them.&lt;/p&gt;
&lt;p&gt;If these shortcomings are addressed somehow, it is possible that Privacy Pass
could help sites detect or identify fraud or abuse.  However, implementing the
safeguards necessary to protect privacy and equitable access is not easy.  It
might not even be worth it.&lt;/p&gt;
&lt;h3 id=&quot;questionable-options&quot;&gt;Questionable options&lt;/h3&gt;
&lt;p&gt;Google have proposed an extension to Privacy Pass that carries &lt;a href=&quot;https://eprint.iacr.org/2020/072&quot;&gt;secret
information&lt;/a&gt;.  The goal here is to allow sites
to rely on an assessment of trust that is made by another site, but not reveal
the decision to the client.  All clients would be expected to retrieve a token
and proffer one in order to access the service.  Suspicious clients would be
given a token that secretly identifies them as such.&lt;/p&gt;
&lt;p&gt;This would avoid revealing to clients that they have been identified as
potentially fraudulent, but it comes with two problems:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Any determination would only be based on information available to the site
that provides the token.  The marking would be less reliable as a result, being
based only on the client identity or browser fingerprint&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/fraud/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;.  Consequently, any such marking would not be directly usable; it would
need to be combined with other indicators, like how the client behaves.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Clients that might be secretly classified as dishonest have far less
incentive to carry a token that might label them as such.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The secret bit also carries information, which – again – could mean anything.
Anything like this would need safeguards against privacy abuse by token
providers.&lt;/p&gt;
&lt;p&gt;Google have also proposed &lt;a href=&quot;https://github.com/RupertBenWiser/Web-Environment-Integrity/blob/main/explainer.md&quot;&gt;Web Environment
Integrity&lt;/a&gt;,
which seeks to suppress diversity of client software.  Eric Rescorla has a good
explanation of &lt;a href=&quot;https://educatedguesswork.org/posts/wei/&quot;&gt;how this sort of approach is
problematic&lt;/a&gt;.  Without proper
safeguards, the same concerns apply to Apple’s Private Access Tokens.&lt;/p&gt;
&lt;p&gt;The key insight for me is that all of these technologies risk placing
restrictions on how people access the Web.  Some more than others.  But openness
is worth protecting, even if it does make some things harder.  Fraud and abuse
management are in some ways a product of that openness, but so is user
empowerment, equity of access, and privacy.&lt;/p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;
&lt;p&gt;It seems unlikely that anything is going to change.  Those who want to commit
fraud will continue to try to evade detection and those who are trying to stop
them will try increasingly invasive methods, including fingerprinting.&lt;/p&gt;
&lt;p&gt;Fraud and abuse are something that many sites contend with. There are no easy or
assured methods for managing fraud or abuse risk.  Defenders look for patterns,
both in client characteristics and their behaviour.  Fingerprinting browsers
this way can have poor privacy consequences.  Concealing how attacks are
classified is the only way to ensure that attackers do not adapt their methods
to avoid protections.  New methods for classification might help, but they
create new challenges that will need to be managed.&lt;/p&gt;
&lt;p&gt;Fraud is here to stay.  Fingerprinting too.  I wish that I had a better story to
tell, but this is one of the prices we pay for an open Web.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I’m not comfortable using
the more widely used “anti-fraud” term here.  It sounds too definite, as if to
imply that fraud can be prevented perfectly.  Fraud and abuse can be managed,
but not so absolutely. &lt;a href=&quot;https://lowentropy.net/posts/fraud/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This story has been widely misreported, see
(&lt;a href=&quot;https://www.schneier.com/blog/archives/2023/08/bots-are-better-than-humans-at-solving-captchas.html&quot;&gt;Schneier&lt;/a&gt;,
&lt;a href=&quot;https://www.theregister.com/2023/08/15/so_much_for_captcha_then/&quot;&gt;The
Register&lt;/a&gt;, and
&lt;a href=&quot;https://hardware.slashdot.org/story/23/08/10/0439241/bots-are-better-than-humans-at-cracking-are-you-a-robot-captcha-tests-study-finds&quot;&gt;Slashdot&lt;/a&gt;). These
articles cite a recent &lt;a href=&quot;https://arxiv.org/abs/2307.12108&quot;&gt;study&lt;/a&gt; from UC Irvine,
which cites &lt;a href=&quot;https://arxiv.org/abs/2307.12108&quot;&gt;a study from 2014&lt;/a&gt; that applies
to a largely defunct CAPTCHA method.  CAPTCHA fans might hold out
&lt;a href=&quot;https://arxiv.org/abs/2209.06293&quot;&gt;some&lt;/a&gt;
&lt;a href=&quot;https://www.nature.com/articles/d41586-023-02361-7&quot;&gt;hope&lt;/a&gt;, though maybe the
rest of us would be happy to never see another inane test. &lt;a href=&quot;https://lowentropy.net/posts/fraud/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;There is a whole industry around the
scalping of limited run sneakers, to the point that there are specialist cloud
services that &lt;a href=&quot;https://sneakerserver.com/&quot;&gt;boast extra low latency access&lt;/a&gt; to
the sites for major sneaker vendors. &lt;a href=&quot;https://lowentropy.net/posts/fraud/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Think
email addresses or phone numbers.  These sites like to pretend that these
practices are privacy respecting, but collecting primary identifiers often
involves deceptive practices. For example, making access to a service
conditional on providing a phone number. &lt;a href=&quot;https://lowentropy.net/posts/fraud/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;It is widely believed that, during the Second World War,
the British chose not to act on intelligence gained from their breaking of
Enigma codes.  No doubt the Admiralty did exercise discretion in how it used the
information it gained, but &lt;a href=&quot;https://drenigma.org/2021/09/21/who-spilt-the-beans-how-the-enigma-secret-was-revealed/&quot;&gt;the famous case of the bombing of Coventry in
November 1940 was not one of these
instances&lt;/a&gt;. &lt;a href=&quot;https://lowentropy.net/posts/fraud/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;It could be bad if tokens
had something to say about the colour of a person’s skin or their gender
identity.  There are more bad uses than good ones for these tokens. &lt;a href=&quot;https://lowentropy.net/posts/fraud/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Finally, a good reason to cite &lt;a href=&quot;https://arxiv.org/abs/2307.12108&quot;&gt;the study mentioned
previously&lt;/a&gt;. &lt;a href=&quot;https://lowentropy.net/posts/fraud/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;A fingerprint could
be re-evaluated on the other site without using a token, so that isn’t much
help. &lt;a href=&quot;https://lowentropy.net/posts/fraud/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>Entropy and Privacy Analysis</title>
    <link href="https://lowentropy.net/posts/entropy-privacy/"/>
    <updated>2022-05-27T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/entropy-privacy/</id>
    <content type="html">&lt;p&gt;Aggregation is a powerful tool when it comes to providing privacy for users. But
analysis that relies on aggregate statistics for privacy loss hides some of the
worst effects of designs.&lt;/p&gt;
&lt;h2 id=&quot;background&quot;&gt;Background&lt;/h2&gt;
&lt;p&gt;A lot of my time recently has been spent looking at various proposals for
improving online advertising.  A lot of this work is centred on the &lt;a href=&quot;https://patcg.github.io/&quot;&gt;Private
Advertising Technology Community Group&lt;/a&gt; in the W3C
where the goal is to find designs that improve advertising while maintaining
strong technical protections for privacy.&lt;/p&gt;
&lt;p&gt;Deciding whether a design does in fact provide strong privacy
protections first requires understanding what that means.  That is a large
topic on which the conversation is continuing.  In this post, my goal is to look
at some aspects of how we might critically evaluate the privacy characteristics
of proposals.&lt;/p&gt;
&lt;h2 id=&quot;limitations-of-differential-privacy&quot;&gt;Limitations of Differential Privacy&lt;/h2&gt;
&lt;p&gt;A number of designs have been proposed in this space with supporting analysis
that is based on differential privacy. Providing differential privacy involves
adding noise to measurements using a tunable parameter (usually called
$&#92;varepsilon$) that hides individual contributions under a random distribution.&lt;/p&gt;
&lt;p&gt;I’m a big fan of differential privacy.  But while it provides a good basis for
understanding the impact of a proposal, a long-running system needs to release
information continuously in order to maintain basic utility.&lt;/p&gt;
&lt;p&gt;Continuous release of data potentially leads to the protections offered by
differential privacy noise being less effective over time.  It is prudent
therefore to understand the operation of the system without the protection
afforded by noise.  This is particularly relevant where the noise uses a large
$&#92;varepsilon$ value or is applied to unaggregated outputs, where it can be
easier to cancel the effect of noise by looking at multiple output values.&lt;/p&gt;
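&lt;p&gt;A small sketch might make this concrete.  It uses the Laplace mechanism for a
simple counting query – a standard construction, though the parameters and
counts here are invented for illustration – and shows how repeated releases of
the same statistic let an observer average the noise away:&lt;/p&gt;

```python
import random
import statistics

def dp_count(true_count, epsilon):
    """Laplace mechanism for a counting query (sensitivity 1).
    Noise scale is 1/epsilon; a Laplace sample is the difference
    of two exponential samples."""
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

random.seed(1)
true_count = 42

# One release hides the true value reasonably well...
print(dp_count(true_count, epsilon=0.1))

# ...but averaging many releases of the same statistic recovers it.
releases = [dp_count(true_count, epsilon=0.1) for _ in range(10_000)]
print(round(statistics.mean(releases)))
```

&lt;p&gt;Formally, each release spends privacy budget, which is exactly why
long-running systems cannot rely on noise alone.&lt;/p&gt;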
&lt;p&gt;Information exposure is often expressed using information theoretic statistics
like entropy.  This note explores how entropy — or any single statistic
— is a poor basis for privacy analysis and suggests options for more
rigorous analysis.&lt;/p&gt;
&lt;h2 id=&quot;information-theory-and-privacy&quot;&gt;Information Theory and Privacy&lt;/h2&gt;
&lt;p&gt;Analysis of Web privacy features often looks at the number of bits of
information that a system releases to an adversary. Analyses of this type use
the distribution of probabilities of all events as a way of estimating the
amount of information that might be provided by a specific event.&lt;/p&gt;
&lt;p&gt;In information theory, each event provides information or surprisal, defined by
a relationship with the probability of the event:&lt;/p&gt;
&lt;p&gt;$$I(x)=-&#92;log_2(P(x))$$&lt;/p&gt;
&lt;p&gt;The reason we might use information is that if a feature releases too much
information, then people might be individually identified.  They might no longer
be anonymous.  Their activities might be linked to them specifically.  The
information can be used to form a profile based on their actions or further
joined to their identity or identities.&lt;/p&gt;
&lt;p&gt;Generally, we consider it a problem when information enables identification of
individuals.  We might express concern if:&lt;/p&gt;
&lt;p&gt;$$2^{I(x)} &#92;ge &#92;text{size of population}$$&lt;/p&gt;
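&lt;p&gt;This check is simple to compute.  A small sketch, using an invented population
size and event probabilities:&lt;/p&gt;

```python
import math

def surprisal_bits(p):
    """Information revealed by an event with probability p, in bits."""
    return -math.log2(p)

population = 10_000_000

# A characteristic shared by 1 in 100 people reveals about 6.6 bits;
# far too little, on its own, to single anyone out of 10 million.
common = surprisal_bits(1 / 100)
print(round(common, 2), 2**common >= population)

# A characteristic shared by 1 in 20 million reveals about 24 bits,
# enough to uniquely identify a person in this population.
rare = surprisal_bits(1 / 20_000_000)
print(round(rare, 2), 2**rare >= population)
```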
&lt;p&gt;Because surprisal is about specific events, it can be a little unwieldy.
Surprisal is not useful for reaching a holistic understanding of the system.  A
statistic that summarizes all potential outcomes is more useful in gaining
insight into how the system operates as a whole.  A common statistic used in
this context is entropy, which provides a mean or expected surprisal across a
sampled population:&lt;/p&gt;
&lt;p&gt;$$H(X)=&#92;sum_{x&#92;in X}P(x)I(x)=-&#92;sum_{x&#92;in X}P(x)&#92;log_2(P(x))=-&#92;frac{1}{N}&#92;sum_{i=1}^N&#92;log_2(P(x_i))$$&lt;/p&gt;
&lt;p&gt;Entropy has a number of applications. For instance, it can be used to determine
an optimal encoding of the information from many events, using entropy coding
(such as Huffman or Arithmetic coding).&lt;/p&gt;
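&lt;p&gt;The definition translates directly into code.  A minimal sketch:&lt;/p&gt;

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy of a distribution: the mean surprisal, in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A fair coin carries exactly one bit per flip...
print(entropy_bits([0.5, 0.5]))
# ...while a heavily biased coin carries much less.
print(entropy_bits([0.99, 0.01]))
```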
&lt;h2 id=&quot;using-entropy-in-privacy-analysis&quot;&gt;Using Entropy in Privacy Analysis&lt;/h2&gt;
&lt;p&gt;Specific statistics are useful in privacy analysis to the extent that
they provide an understanding of the overall shape of the system. However,
simple statistics tend to lose information about exceptional circumstances.&lt;/p&gt;
&lt;p&gt;Entropy has real trouble with rare events. Low probability events have high
surprisal, but as entropy scales their contribution by their probability, they
contribute less to the total entropy than higher probability events.&lt;/p&gt;
&lt;p&gt;In general, revealing more information is undesirable from a privacy
perspective. Toward that end, it might seem obvious that minimizing entropy is
desirable. However, this can be shown to be counterproductive for individual
privacy, even if a single statistic is improved.&lt;/p&gt;
&lt;div class=&quot;callout&quot;&gt;
&lt;p&gt;An example might help prime intuition. A cohort of 100 people is arbitrarily
allocated into two groups. If people are evenly distributed into groups of 50,
revealing the group that a person has been allocated provides just a single bit
of information, that is, surprisal is 1 bit. The total entropy of the system is
1 bit.&lt;/p&gt;
&lt;p&gt;An asymmetric allocation can produce a different result. If 99 people are
allocated to one group and a single person to the other, revealing that someone
is in the first group provides almost no information at 0.0145 bits. On the
contrary, revealing the allocation for the lone person in the second group —
which uniquely identifies that person — produces a much larger surprisal of
6.64 bits. Though this is clearly a privacy problem for that person, their
privacy loss is not reflected in the total entropy of the system, which at
0.0808 bits is close to zero.&lt;/p&gt;
&lt;p&gt;Entropy tells us that the average information revealed for all users is very
small.  That conclusion about the aggregate is reflected in the entropy
statistic, but it hides the disproportionately large impact on the single user
who loses the most.&lt;/p&gt;
&lt;/div&gt;
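&lt;p&gt;The numbers in the example are easy to reproduce:&lt;/p&gt;

```python
import math

def surprisal(p):
    """Surprisal of an event with probability p, in bits."""
    return -math.log2(p)

# The cohort example: 99 people in one group, 1 in the other.
p_big, p_small = 0.99, 0.01

# Almost no information is revealed about members of the large group...
print(round(surprisal(p_big), 4))    # 0.0145 bits
# ...but the lone member of the small group is uniquely identified.
print(round(surprisal(p_small), 2))  # 6.64 bits

# The entropy (mean surprisal) stays close to zero regardless.
entropy = p_big * surprisal(p_big) + p_small * surprisal(p_small)
print(round(entropy, 4))             # 0.0808 bits
```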
&lt;p&gt;The more asymmetric the information contributed by individuals, the lower the
entropy of the overall system.&lt;/p&gt;
&lt;p&gt;Limiting analysis to simple statistics, and entropy in particular, can hide
privacy problems. Somewhat counterintuitively, the lower the entropy of a
system, the more heavily the adverse consequences of a design fall on a
minority of participants.&lt;/p&gt;
&lt;p&gt;This is not a revelatory insight.  It is well known that &lt;a href=&quot;https://janhove.github.io/teaching/2016/11/21/what-correlations-look-like&quot;&gt;a single metric is
often a poor means of understanding
data&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Entropy can provide a misleading intuitive understanding of privacy as it
relates to the experience of individual users.&lt;/p&gt;
&lt;h2 id=&quot;recommendations&quot;&gt;Recommendations&lt;/h2&gt;
&lt;p&gt;Information entropy remains useful as a means of understanding the overall
utility of the information that a system provides. Understanding key statistics
as part of a design is valuable.  However, for entropy measures in particular,
this is only useful from a perspective that seeks to reduce overall utility;
entropy provides almost no information about the experience of individuals.&lt;/p&gt;
&lt;h3 id=&quot;understand-the-surprisal-distribution&quot;&gt;Understand the Surprisal Distribution&lt;/h3&gt;
&lt;p&gt;Examining only the mean surprisal offers very little insight into a
system. Statistical analysis rarely considers a mean value in isolation. Most
statistical treatment takes the shape of the underlying distribution into
account.&lt;/p&gt;
&lt;p&gt;For privacy analysis, understanding the distribution of surprisal values is
useful.  Even just looking at percentiles might offer greater insight into the
nature of the privacy loss for those who are most adversely affected.&lt;/p&gt;
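&lt;p&gt;As a sketch of what that might look like, with entirely made-up numbers:
take a population spread unevenly across configuration groups, compute
per-user surprisal, then look at quartiles and the worst case rather than the
mean alone:&lt;/p&gt;

```python
import math
import statistics

# Hypothetical population of 1000 users in groups of different sizes.
group_sizes = [500, 300, 150, 40, 9, 1]
total = sum(group_sizes)

# Per-user surprisal: users in small groups reveal more bits.
surprisals = []
for size in group_sizes:
    surprisals.extend([-math.log2(size / total)] * size)

print('mean (entropy):', round(statistics.fmean(surprisals), 3))
print('quartiles:', [round(q, 3) for q in statistics.quantiles(surprisals, n=4)])
print('worst case:', round(max(surprisals), 3))  # the user in a group of one
```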
&lt;p&gt;Shortcomings of entropy are shared by related statistics, like &lt;a href=&quot;https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence&quot;&gt;Kullback–Leibler
divergence&lt;/a&gt;
or mutual information, which estimate information gain relative to a known
distribution.  Considering percentiles and other statistics can improve
understanding.&lt;/p&gt;
&lt;p&gt;Knowing the distribution of surprisal also makes it possible to combine
privacy loss metrics.  As privacy is affected by multiple concurrent efforts to change the
way people use the Web, the interaction of features can be hard to understand.
Richer expressions of the effect of changes might allow for joint analysis to be
performed.  Though it requires assumptions about the extent to which different
surprisal distributions might be correlated, analyses of surprisal that assume
either complete independence or perfect correlation could provide insights into
the potential extent of privacy loss from combining features.&lt;/p&gt;
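&lt;p&gt;A minimal sketch of such a joint analysis, with hypothetical per-user
surprisal values for two features: under complete independence the bits simply
add, while perfect correlation (modeled crudely here as the combination
revealing no more than the more revealing of the two features) bounds the loss
from below:&lt;/p&gt;

```python
# Hypothetical surprisal values (in bits) for the same five users,
# as revealed by two different features.
feature_a = [0.5, 0.5, 1.0, 2.0, 6.0]
feature_b = [0.3, 1.5, 1.5, 0.2, 5.0]

# Complete independence: information adds, so surprisals sum.
independent = [x + y for x, y in zip(feature_a, feature_b)]

# Perfect correlation: one crude model is that the combination reveals
# no more than the more revealing of the two features.
correlated = [max(x, y) for x, y in zip(feature_a, feature_b)]

print(independent)  # upper bound on combined surprisal per user
print(correlated)   # lower bound per user
```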
&lt;p&gt;For example, it might be useful to consider the interaction of a proposal with
extant browser fingerprinting.  The users who reveal the most information using
the proposal might not be the same users who reveal the most fingerprinting
information.  Analysis could show that there are no problems or it might help
guide further research that would provide solutions.&lt;/p&gt;
&lt;p&gt;More relevant to privacy might be understanding the proportion of individuals
that are potentially identifiable using a system. A common privacy goal is to
maintain a minimum anonymity set size.  It might be possible to apply
knowledge of a surprisal distribution to estimating the size of a population
where the anonymity set becomes too small for some users.  This information
might then guide the creation of safeguards.&lt;/p&gt;
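&lt;p&gt;One rough way to sketch that estimate: a user whose observable
configuration carries &lt;em&gt;s&lt;/em&gt; bits of surprisal sits, to a first
approximation, in an anonymity set of about &lt;em&gt;N&lt;/em&gt; · 2&lt;sup&gt;-&lt;em&gt;s&lt;/em&gt;&lt;/sup&gt;
people in a population of &lt;em&gt;N&lt;/em&gt;. The numbers and threshold below are
invented for illustration:&lt;/p&gt;

```python
# Rough sketch: estimate anonymity-set sizes from per-user surprisal.
def anonymity_set(population, surprisal_bits):
    # A user revealing s bits is roughly one of population / 2**s people.
    return population * 2 ** (-surprisal_bits)

def at_risk(population, surprisals, floor=20):
    # Count users whose estimated anonymity set falls below the floor.
    return sum(1 for s in surprisals if floor > anonymity_set(population, s))

# Invented numbers: 10,000 users; three of them reveal 3, 8, and 12 bits.
print(anonymity_set(10_000, 3))     # 1250.0
print(at_risk(10_000, [3, 8, 12]))  # 1: only the 12-bit user is at risk
```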
&lt;h3 id=&quot;consider-the-worst-case&quot;&gt;Consider the Worst Case&lt;/h3&gt;
&lt;p&gt;A worst-case analysis is worth considering from the perspective of understanding
how the system treats the privacy of all those who might be affected. That is,
consider the implications for users on the tail of any distribution.  Small user
populations will effectively guarantee that any result is drawn from the tail of
a larger distribution.&lt;/p&gt;
&lt;p&gt;Concentrating on cases where information might be attributed to individuals
can miss privacy problems that arise from people being identifiable in small
groups. Understand how likely it is that smaller groups might be affected.&lt;/p&gt;
&lt;p&gt;The potential for targeting of individuals or small groups might justify
disqualification of — or at least adjustments to — a proposal.  The Web is
for everyone, not just most people.&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <title>Bundling for the Web</title>
    <link href="https://lowentropy.net/posts/bundles/"/>
    <updated>2021-02-26T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/bundles/</id>
    <content type="html">&lt;p&gt;The idea of bundling is deceptively simple. Take a bunch of stuff and glom them
into a single package. So why is it so difficult to teach the web how to
bundle?&lt;/p&gt;
&lt;aside&gt;
&lt;p&gt;First a note: this is my personal opinion and an incomplete one at that. Though
I work for Mozilla, this post is part of my process of grappling with a complex
problem. Mozilla has made &lt;a href=&quot;https://mozilla.github.io/standards-positions/#bundled-exchanges&quot;&gt;a
statement&lt;/a&gt;
regarding Google’s bundled exchanges, which still applies. That position might
be revised and I’ll have some say in that if it does, but any process will
involve a discussion with a group of my colleagues who each add their own
perspectives and experience.&lt;/p&gt;
&lt;/aside&gt;
&lt;h2 id=&quot;the-web-already-does-bundling&quot;&gt;The Web already does bundling&lt;/h2&gt;
&lt;p&gt;A bundled resource is a resource that composes multiple pieces of content.
Bundles can consist of content only of a single type or mixed types.&lt;/p&gt;
&lt;p&gt;Take something like JavaScript&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;. A very large proportion of the JavaScript content on the web is
bundled today. If you haven’t bundled, minified, and compressed your
JavaScript, you have left easy performance wins unrealized.&lt;/p&gt;
&lt;p&gt;HTML is a bundling format in its own right, with inline JavaScript and CSS.
Bundling other content is also possible with &lt;code&gt;data:&lt;/code&gt; URIs, even if this has
some drawbacks.&lt;/p&gt;
&lt;p&gt;Then there are &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Glossary/CSS_preprocessor&quot;&gt;CSS
preprocessors&lt;/a&gt;,
which provide bundling options, &lt;a href=&quot;https://developers.google.com/web/fundamentals/design-and-ux/responsive/images#use_image_sprites&quot;&gt;image
spriting&lt;/a&gt;,
and myriad other hacks.&lt;/p&gt;
&lt;p&gt;And that leaves aside the whole mess of zipfiles, tarballs, and self-extracting
executables that are used for a variety of Web-adjacent purposes. Those matter
too, but they are generally not Web-visible.&lt;/p&gt;
&lt;h2 id=&quot;why-we-might-want-bundles&quot;&gt;Why we might want bundles&lt;/h2&gt;
&lt;p&gt;What is immediately clear from this brief review of available Web bundling
options is that they are all terrible in varying degrees. The reasons are
varied, and a close examination of them is probably not worthwhile.&lt;/p&gt;
&lt;p&gt;It might be best just to view this as the legacy of a system that evolved in
piecemeal fashion; an evolutionary artifact along a dimension that nature did
not regard as critical to success.&lt;/p&gt;
&lt;p&gt;I’m more interested in the balance of different pressures, both for and against
bundling. There are good reasons in support of bundling, and quite a few
reasons to be cautious, but it looks like the time has come to consider
bundling seriously.&lt;/p&gt;
&lt;p&gt;I doubt that introducing native support for bundling technology will
fundamentally change the way Web content is delivered. I see it more as an
opportunity to expand the toolkit to allow for more use cases and more flexible
deployment options.&lt;/p&gt;
&lt;p&gt;In researching this, I was reminded of work that Jonas Sicking did to identify
&lt;a href=&quot;https://gist.github.com/sicking/6dcd3b771612b2f6f1bb&quot;&gt;use cases&lt;/a&gt;. There are
lots of reasons and requirements that are worth looking at. Some of the
reasoning is dated, but there is a lot of relevant material, even five years
on.&lt;/p&gt;
&lt;h3 id=&quot;efficiency&quot;&gt;Efficiency&lt;/h3&gt;
&lt;p&gt;One set of touted advantages for bundling relate to performance and efficiency.
Today, we have a better understanding of the ways in which performance is
affected by resource composition, so this has been narrowed down to two primary
features: compression efficiency and reduced overheads.&lt;/p&gt;
&lt;p&gt;I also want to address another reason that is often cited: providing content
that a client will need, but doesn’t yet know about.&lt;/p&gt;
&lt;h4 id=&quot;shared-compression&quot;&gt;Shared compression&lt;/h4&gt;
&lt;p&gt;Compression efficiency can be dramatically improved if similar resources are
bundled together. This is because the larger shared context results in more
repetition and gives a compressor more opportunities to find and exploit
similarities.&lt;/p&gt;
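&lt;p&gt;The effect is easy to demonstrate with any general-purpose compressor. This
sketch uses zlib and two invented scripts that share most of their text:&lt;/p&gt;

```python
import zlib

# Two hypothetical scripts that share a lot of boilerplate.
script_a = b'function greet(name) { console.log(name); }' * 20
script_b = b'function leave(name) { console.log(name); }' * 20

separate = len(zlib.compress(script_a)) + len(zlib.compress(script_b))
together = len(zlib.compress(script_a + script_b))

# Compressing the concatenation lets the second script reuse the first
# as shared context, so the bundle compresses smaller than the parts.
print(separate, together)
```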
&lt;p&gt;Bundling is not the only way to achieve this. Alternative methods of attaining
compression gains have been explored, such as
&lt;a href=&quot;https://docs.google.com/document/d/1REMkwjXY5yFOkJwtJPjCMwZ4Shx3D9vfdAytV_KQCUo/edit?usp=sharing&quot;&gt;SDCH&lt;/a&gt;
and &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-vkrasnov-h2-compression-dictionaries-03&quot;&gt;cross-stream compression contexts for
HTTP/2&lt;/a&gt;.
Prototypes of the latter showed immense improvements in compression efficiency
and corresponding performance gains.&lt;/p&gt;
&lt;p&gt;General solutions like these have not been successful in finding ways to manage
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-handte-httpbis-dict-sec-00&quot;&gt;operational security
concerns&lt;/a&gt;.
The hope with bundles is that bundling can occur as a build process. As the
build occurs before deploying content to servers, no sensitive or user-specific
data will be involved. This is somewhat at odds with some of the dynamic
features involved, but that sort of separation could be an effective strategy
for managing this security risk.&lt;/p&gt;
&lt;h4 id=&quot;reduced-overheads&quot;&gt;Reduced overheads&lt;/h4&gt;
&lt;p&gt;Bundling could also reduce overheads. While HTTP/2 and HTTP/3 reduce the cost
of making requests, those costs still compound when multiple resources are
involved. The claim here is that internal handling of individual requests in
browsers has inefficiencies that are hard to eliminate without some form of
bundling.&lt;/p&gt;
&lt;p&gt;I find it curious that protocol-level inefficiencies are not blamed here, but
rather inter-process communication between internal browser processes. Not
having examined this closely&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;, I can’t really speak to these
claims.&lt;/p&gt;
&lt;p&gt;What I do know is that performance in this space is subtle. When we were
building HTTP/2, we found that performance was highly sensitive to the number
of requests that could be made by clients in the first few round trips of a
connection. The way that networking protocols work means that there is very
limited space for sending anything early in a connection&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;. The main motivation for &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc7541&quot;&gt;HTTP
header compression&lt;/a&gt; was that it
allowed significantly more requests to be made early in a connection. By
reducing request counts, bundling might do the same.&lt;/p&gt;
&lt;h4 id=&quot;eliminating-round-trips&quot;&gt;Eliminating round trips&lt;/h4&gt;
&lt;p&gt;One of the other potential benefits of bundling is in eliminating additional
round trips. For content that is requested, a bundle might provide resources
that a client does not know that it needs yet. Without bundling, a resource
that references another resource adds an additional round trip as the first
resource needs to be fetched before the second one is even known to the client.&lt;/p&gt;
&lt;p&gt;Again, experience with HTTP/2 suggests that performance gains from sending
extra resources are not easy to obtain. This is exactly what HTTP/2 server push
promised to provide. However, as we have learned with server push, the wins
here are not easy to realize. Attempts to improve performance with server push
often produced mixed results and sometimes large
regressions&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;.
The problem is that servers are unable to accurately predict when to push
content, so they push data that is not needed. To date, no studies have shown
reliable strategies that servers can use to improve performance with server
push.&lt;/p&gt;
&lt;h4 id=&quot;realizing-performance-improvements&quot;&gt;Realizing performance improvements&lt;/h4&gt;
&lt;p&gt;For bundles to realize performance gains from eliminating round trips, the
compression gains would need to be enough to counteract any potential waste. This
is more challenging if bundles are built statically.&lt;/p&gt;
&lt;p&gt;I personally remain lukewarm on using bundling as a performance tool.
Shortcomings in protocols – or implementations – seem like they could be
addressed at that level.&lt;/p&gt;
&lt;h3 id=&quot;ergonomics&quot;&gt;Ergonomics&lt;/h3&gt;
&lt;p&gt;The use of bundlers is an established practice in Web development. Being able
to outsource some of the responsibility for managing the complexities of
content delivery is no doubt part of the appeal.&lt;/p&gt;
&lt;p&gt;The value of being able to compose complex content into a single package
should not be underestimated.&lt;/p&gt;
&lt;p&gt;Bundling of content into a single file is a property common to many systems.
Providing a single item to manage with a single identity simplifies
interactions. This is how most people expect content of all kinds to be
delivered, whether it is
&lt;a href=&quot;https://en.wikipedia.org/wiki/Self-extracting_archive&quot;&gt;applications&lt;/a&gt;,
&lt;a href=&quot;https://en.wikipedia.org/wiki/Comparison_of_e-book_formats&quot;&gt;books&lt;/a&gt;,
&lt;a href=&quot;https://en.wikipedia.org/wiki/Library_(computing)&quot;&gt;libraries&lt;/a&gt;, or any other
sort of digital artifact. The Web here is something of an aberration in that it
resists the idea that parts of it can be roped off into a discrete unit with a
finite size and name.&lt;/p&gt;
&lt;p&gt;Though this usage pattern might be partly attributed to path dependence, the
usability benefits of individual files cannot be so readily dismissed. Being
able to manage bundles as a single unit where necessary, but identify the
component pieces is likely to be a fairly large gain for developers.&lt;/p&gt;
&lt;p&gt;For me, this reason might be enough to justify using bundles, even over some of
their drawbacks.&lt;/p&gt;
&lt;h2 id=&quot;why-we-might-not-want-bundles&quot;&gt;Why we might not want bundles&lt;/h2&gt;
&lt;p&gt;The act of bundling subsumes the identity of each piece of bundled content
into the identity of the bundle that is formed. This produces a number of effects,
some of them desirable (as discussed), some of them less so.&lt;/p&gt;
&lt;p&gt;As far as effects go, whether they are valuable or harmful might depend on
context and perspective. Some of these effects might simply be managed as
trade-offs, with site or server developers being able to choose how content is
composed in order to balance various factors like total bytes transferred or
latency.&lt;/p&gt;
&lt;p&gt;If bundling only represented trade-offs that affected the operation of servers,
then we might be able to resolve whether the feature is worth pursuing on the
grounds of simple cost-benefit. Where things get more interesting is where
choices might involve depriving others of their own choices. Balancing the
needs of clients and servers is occasionally necessary. Determining the effect
of server choices on clients – and the people they might act for – is
therefore an important part of any analysis we might perform.&lt;/p&gt;
&lt;h3 id=&quot;cache-efficiency-and-bundle-composition&quot;&gt;Cache efficiency and bundle composition&lt;/h3&gt;
&lt;p&gt;Content construction and serving infrastructure generally operates with
imperfect knowledge of the state of caches. Not knowing what a client might
need can make it hard to know what content to serve at any given point in time.&lt;/p&gt;
&lt;p&gt;Optimizing the composition of the bundles used on a site for clients with a
variety of cache states can be particularly challenging if caches operate at
the granularity of resources. Clients that have no prior state might benefit
from maximal bundling, which allows better realization of the aforementioned
efficiency gains.&lt;/p&gt;
&lt;p&gt;On the other hand, clients that have previously received an older version of
the same content might only need to receive updates for those things that have
changed. Similarly, clients that have previously received content for other
pages might already hold some of the same content. In both cases, receiving
copies of content that was already transferred might negate any efficiency
gains.&lt;/p&gt;
&lt;p&gt;This is a problem that JavaScript bundlers have to deal with today. As an
optimization problem it is made difficult by the combination of poor
information about client state with the complexity of code dependency graphs
and the potential for clients to follow different paths through sites.&lt;/p&gt;
&lt;p&gt;For example, consider the code that is used on an article page on a
hypothetical news site and the code used on the home page of the same site.
Some of that code will be common, if we make the assumption that site
developers use common tools. Bundlers might deal with this by making three
bundles: one of common code, plus one each of article and home page code. For a
very simple site like this, that allows all the code to be delivered in just
two bundles on either type of page, plus an extra bundle when navigating from
an article to the home page or vice versa.&lt;/p&gt;
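&lt;p&gt;That split can be sketched with simple set operations over module names
(the names here are invented):&lt;/p&gt;

```python
# Modules needed by each type of page on the hypothetical news site.
article = {'framework', 'layout', 'comments', 'paywall'}
home = {'framework', 'layout', 'carousel', 'headlines'}

common = article.intersection(home)        # shipped with every page
article_only = article.difference(common)  # extra bundle for articles
home_only = home.difference(common)        # extra bundle for the home page

print(sorted(common))        # ['framework', 'layout']
print(sorted(article_only))  # ['comments', 'paywall']
print(sorted(home_only))     # ['carousel', 'headlines']
```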
&lt;p&gt;As the number of different types of page increases, splitting code into
multiple bundles breaks down. The number of bundle permutations can increase
much faster than the number of discrete uses. In the extreme, the number of
bundles could grow combinatorially with the number of types of page, limited
only by the number of resources that might be bundled. Of course, well before
that point is reached, the complexity cost of bundling likely exceeds any
benefits it might provide.&lt;/p&gt;
&lt;p&gt;To deal with this, bundlers have a bunch of heuristics that balance the costs
of providing too much data in a bundle for a particular purpose, against the
costs of potentially providing bundled data that is already present. Some sites
take this a little further and use service workers to enhance browser caching
logic&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;It is at this point that you might recognize an opportunity. If clients
understood the structure of bundles, then maybe they could do something to
avoid fetching redundant data. Maybe providing a way to selectively request
pieces of bundles could reduce the cost of fetching bundles when parts of the
bundle are already present. That would allow the bundlers to skew their
heuristics more toward putting stuff in bundles. It might even be possible to
tune first-time queries this way.&lt;/p&gt;
&lt;p&gt;The thing is, we’ve already tried that.&lt;/p&gt;
&lt;h4 id=&quot;a-standard-for-inefficient-caching&quot;&gt;A standard for inefficient caching&lt;/h4&gt;
&lt;p&gt;There is a long history in HTTP of failed innovation when it comes to
standardizing improvements for cache efficiency. Though cache invalidation is
recognized as one of the &lt;a href=&quot;https://www.karlton.org/2017/12/naming-things-hard/&quot;&gt;hard
problems&lt;/a&gt; in computer
science, there are quite a few examples of successful deployments of
proprietary solutions in server and CDN infrastructure.&lt;/p&gt;
&lt;p&gt;A few caching innovations have made it into HTTP over time, such as the recent
&lt;a href=&quot;https://datatracker.ietf.org/doc//html/rfc8246&quot;&gt;immutable Cache-Control
directive&lt;/a&gt;. That particular
solution is quite relevant in this context due to the way that it supports
content-based URI construction, but it is still narrower in applicability than
a good solution in this space might need.&lt;/p&gt;
&lt;p&gt;If we view bundling as a process that happens as part of site construction,
bundles might be treated as opaque blobs by servers. Servers that aren’t aware
of bundle structure are likely to end up sending more bits than the client
needs. To avoid this, servers and clients both need to be aware of the contents
of bundles.&lt;/p&gt;
&lt;h4 id=&quot;cache-digests&quot;&gt;Cache digests&lt;/h4&gt;
&lt;p&gt;Once both the client and server are aware of individual resources within
bundles, this problem starts to look very much like server push.&lt;/p&gt;
&lt;p&gt;Previous attempts to solve the problem of knowing what to push aimed to improve
the information available to servers. &lt;a href=&quot;https://tools.ietf.org/html/draft-ietf-httpbis-cache-digest-05&quot;&gt;Cache
digests&lt;/a&gt; is the
most notable attempt here. It got several revisions into the IETF working group
process. It still failed.&lt;/p&gt;
&lt;p&gt;If the goal of failing is to learn, then this too was a failure largely for the
most ignominious of reasons: no deployment. Claims from clients that cache
digests are too expensive to implement seem reasonable, but not entirely
satisfactory in light of the change to use &lt;a href=&quot;https://en.wikipedia.org/wiki/Cuckoo_filter&quot;&gt;Cuckoo
filters&lt;/a&gt; in later versions. More
so with recent storage partitioning work.&lt;/p&gt;
&lt;p&gt;The point of this little digression is to highlight the inherent difficulties
in trying to fix this problem by layering in enhancements to the caching model.
More so when that requires replicating the infrastructure we have for
individual resources at the level of bundled content.&lt;/p&gt;
&lt;p&gt;My view is that it would be unwise to attempt to tackle a problem like this as
part of trying to introduce a new feature. If the success of bundling depends
on finding a solution to this problem, then I would be surprised, but it might
suggest that the marginal benefit of bundling – at least for performance – is
not sufficient to justify the effort&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h3 id=&quot;prioritization-is-harder&quot;&gt;Prioritization is harder&lt;/h3&gt;
&lt;p&gt;Mark Nottingham reminded me that even if servers and clients are modified so
that they are aware of individual resources, there are still limitations.
Bundles might contain resources with different priorities. It might be
impossible to avoid performance regressions.&lt;/p&gt;
&lt;p&gt;It is certainly possible to invent a new system for ensuring that bundles are
properly prioritized, but that requires good knowledge of relative priority at
the time that bundles are constructed.&lt;/p&gt;
&lt;p&gt;Putting important stuff first is likely a good strategy, but that has drawbacks
too. Servers need to know where to apply priority changes when serving bundles
or the low-priority pieces will be served at the same priority as high-priority
pieces. The relative priority of resources will need to be known at bundling
time. Bundling content that might change in priority in response to &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-priority&quot;&gt;client
signals&lt;/a&gt;
might result in priority inversions and performance regressions.&lt;/p&gt;
&lt;p&gt;Just like with caching, addressing prioritization shortcomings could require
replicating a lot of the machinery we have for prioritizing individual
resources within bundles.&lt;/p&gt;
&lt;h3 id=&quot;erasing-resource-identity&quot;&gt;Erasing resource identity&lt;/h3&gt;
&lt;p&gt;An &lt;a href=&quot;https://github.com/WICG/webpackage/issues/551&quot;&gt;issue&lt;/a&gt; that was first&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt; raised by Brave is that the use of bundles creates
opportunities for sites to obfuscate the identity of resources. The thesis
being that bundling could &lt;a href=&quot;https://brave.com/webbundles-harmful-to-content-blocking-security-tools-and-the-open-web/&quot;&gt;confound content blocking
techniques&lt;/a&gt;
as it would make rewriting of identifiers easier.&lt;/p&gt;
&lt;p&gt;For those who rely on the identity of resources to understand the semantics and
intent of the identified resource, there are some ways in which bundling might
affect their decision-making. The primary concern is that references between
resources in the same bundle are fundamentally more malleable than other
references. As the reference and reference target are in the same place, it is
trivial – at least in theory – to change the identifier.&lt;/p&gt;
&lt;p&gt;Brave and several others are therefore concerned that bundling will make it
easier to prevent URI-based classification of resources. In the extreme,
identifiers could be rewritten for every request, negating any attempt to use
those identifiers for classification.&lt;/p&gt;
&lt;p&gt;One of the most interesting properties of the Web is the way that it insinuates
a browser – and user agency – into the process. The way that happens is that
the Web&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt; is structurally biased toward functioning better when sites expose
semantic information to browsers. This property, sometimes called semantic
transparency, is what allows browsers to be opinionated about content rather
than acting as a dumb pipe&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h4 id=&quot;yes%2C-it%E2%80%99s-about-ad-blockers&quot;&gt;Yes, it’s about ad blockers&lt;/h4&gt;
&lt;p&gt;Just so that this is clear, this is mostly about blocking advertising.&lt;/p&gt;
&lt;p&gt;While more advanced ad blocking techniques also draw on contextual clues about
resources, those methods are more costly. Most ad blocking decisions are made
based on the URI of resources. Using the resource identity allows the ad
blocker to prevent the load, which not only means that the ad is not displayed,
but the resources needed to retrieve it are not spent&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;While &lt;a href=&quot;https://www.statista.com/statistics/804008/ad-blocking-reach-usage-us/&quot;&gt;many people might choose to block
ads&lt;/a&gt;,
sites don’t like being denied the revenue that advertising provides. Some sites
use techniques that are designed to show advertising to users of ad blockers,
so it is not unreasonable to expect tools to be used to prevent classification.&lt;/p&gt;
&lt;p&gt;It is important to note that this is not a situation that requires an absolute
certainty. The sorry state of Web privacy means that we have a lot of places
where various forces are in tension or transition. The point of Brave’s
complaint here is not that bundling outright prevents the sort of
classification they seek, but that it changes the balance of system dynamics by
giving sites another tool that they might employ to avoid classification.&lt;/p&gt;
&lt;p&gt;Of course, when it is a question of degree, we need to discuss and agree how
much the introduction of such a tool affects the existing system. That’s where
this gets hard.&lt;/p&gt;
&lt;h4 id=&quot;coordination-artifacts&quot;&gt;Coordination artifacts&lt;/h4&gt;
&lt;p&gt;As much as these concerns are serious, I tend to think that Jeffrey Yasskin’s
&lt;a href=&quot;https://medium.com/@jyasskin/why-do-url-based-ad-blockers-work-3a13b08a1167&quot;&gt;analysis of the
problem&lt;/a&gt;
is broadly correct. That analysis essentially concludes that the reason we have
URIs is to facilitate coordination between different entities. As long as there
is a need to coordinate between the different entities that provide the
resources that might be composed into a web page, that coordination will expose
information that can be used for classification.&lt;/p&gt;
&lt;p&gt;That is, to the extent to which bundles enable obfuscation of identifiers, that
obfuscation needs to be coordinated. Any coordination that would enable
obfuscation with bundling is equally effective and easy to apply without
bundling.&lt;/p&gt;
&lt;h4 id=&quot;single-page-coordination&quot;&gt;Single-page coordination&lt;/h4&gt;
&lt;p&gt;Take a single Web page. Pretend for a moment that the web page exists in a
vacuum, with no relationship to other pages at all. You could take all the
resources that comprise that page and form them into a single bundle. As all
resources are in the one place, it would be trivial to rewrite the references
between those resources. Or, the identity of resources could be erased entirely
by inlining everything. If every request for that page produced a bundle with a
different set of resource identifiers, it would be impossible to infer anything
about the contents of resources based on their identity alone.&lt;/p&gt;
&lt;p&gt;A unitary bundle for every page is an extreme that is almost certainly
impractical. If sites were delivered this way, there would be no caching, which
means no reuse of common components. Using the Web would be terribly slow.&lt;/p&gt;
&lt;p&gt;Providing strong incentive to deploy pages as discrete bundles – something
Google Search has done to &lt;a href=&quot;https://developers.google.com/search/docs/guides/about-amp#about-signed-exchange&quot;&gt;enable preloading search results for cooperating
sites&lt;/a&gt;
– could effectively force sites to bundle in this way. Erasing or obfuscating
internal links in these bundles does seem natural at this point, if only to try
to reclaim some of the lost performance, but that assumes an unnatural pressure
toward bundling&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Absent perverse incentives, sites are often built from components developed by
multiple groups, even if that is just different teams working at the same
company. To the extent that teams operate independently, they need to agree on
how they interface. The closer the teams work together, and the more tightly
they are able to coordinate, the more flexible those interfaces can be.&lt;/p&gt;
&lt;p&gt;There are several natural interface points on the Web. Of these the URI remains
a key interface point&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn12&quot; id=&quot;fnref12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;. A simple string
that provides a handle for a whole bundle&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn13&quot; id=&quot;fnref13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt; of
collected concepts is a powerful abstraction.&lt;/p&gt;
&lt;h4 id=&quot;cross-site-coordination&quot;&gt;Cross-site coordination&lt;/h4&gt;
&lt;p&gt;Interfaces between components therefore often use URIs, especially once
cross-origin content is involved. For widely-used components that enable
communication between sites, URIs are almost always involved. If you want to
use &lt;a href=&quot;https://reactjs.org/&quot;&gt;React&lt;/a&gt;, the primary interface is a URI:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-html&quot;&gt;&amp;lt;script src=&amp;quot;https://unpkg.com/react@17/umd/react.production.min.js&amp;quot; crossorigin&amp;gt;&amp;lt;/script&amp;gt;
&amp;lt;script src=&amp;quot;https://unpkg.com/react-dom@17/umd/react-dom.production.min.js&amp;quot; crossorigin&amp;gt;&amp;lt;/script&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you want to add &lt;a href=&quot;https://developers.google.com/analytics/devguides/collection/gtagjs&quot;&gt;Google
analytics&lt;/a&gt;,
there is a bit of JavaScript&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn14&quot; id=&quot;fnref14&quot;&gt;[14]&lt;/a&gt;&lt;/sup&gt; as well, but the URI is still key:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-html&quot;&gt;&amp;lt;script async src=&amp;quot;https://www.googletagmanager.com/gtag/js?id=$XXX&amp;quot;&amp;gt;&amp;lt;/script&amp;gt;
&amp;lt;script&amp;gt;
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag(&#39;js&#39;, new Date());
  gtag(&#39;config&#39;, &#39;$XXX&#39;);
&amp;lt;/script&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The same applies to &lt;a href=&quot;https://support.google.com/adsense/answer/9274634&quot;&gt;advertising&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The scale of coordination required to change these URIs is such that changes
cannot be effected on a per-request basis; they need months, if not years&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn15&quot; id=&quot;fnref15&quot;&gt;[15]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Even for resources on the same site, a version of the same coordination problem
exists. Content that might be used by multiple pages will be requested at
different times. At a minimum, changing the identity of resources would mean
forgoing any reuse of cached resources. Caching provides such a large
performance advantage that I can’t imagine sites giving that up.&lt;/p&gt;
&lt;p&gt;Even if caching were not incentive enough, I suggest that the benefits of
reference stability are enough to ensure that identifiers don’t change
arbitrarily.&lt;/p&gt;
&lt;h4 id=&quot;loose-coupling&quot;&gt;Loose coupling&lt;/h4&gt;
&lt;p&gt;As long as loose coupling is a feature of Web development, the way that
resources are identified will remain a key part of how the interfaces between
components are managed. Those identifiers will therefore tend to be stable. That
stability will allow the semantics of those resources to be learned.&lt;/p&gt;
&lt;p&gt;Bundles do not change these dynamics in any meaningful way, except to the
extent that they might enable better atomicity. That is, it becomes easier to
coordinate changes to references and content if the content is distributed in a
single indivisible unit. That’s not nothing, but – as the case of selective
fetches and cache optimization highlights – content from bundles needs to be
reused in a different context, so the application of indivisible units is
severely limited.&lt;/p&gt;
&lt;p&gt;Of course, there are ways of enabling coordination that might allow for
constructing identifiers that are less semantically meaningful. To draw on the
earlier point about the Web already having bundling options, advertising code
could be inlined with other JavaScript or in HTML, rather than having it load
directly from the advertiser&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn16&quot; id=&quot;fnref16&quot;&gt;[16]&lt;/a&gt;&lt;/sup&gt;. In the extreme,
servers could rewrite all content and encrypt all URIs with a per-user key. None
of this depends on the deployment of new Web bundling technology, but it does
require close coordination.&lt;/p&gt;
&lt;h3 id=&quot;all-or-nothing-bundles&quot;&gt;All or nothing bundles&lt;/h3&gt;
&lt;p&gt;Even if it were possible to identify unwanted content, opponents of bundling
point out that placing that content in the same bundle as critical resources
makes it difficult to avoid loading the unwanted content. Some of the
performance gains from content blockers are the result of not fetching
content&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn17&quot; id=&quot;fnref17&quot;&gt;[17]&lt;/a&gt;&lt;/sup&gt;. Bundling unwanted content might eliminate the cost and
performance benefits of content blocking.&lt;/p&gt;
&lt;p&gt;This is another important criticism that ties in with the earlier concerns
regarding bundle composition and reuse. And, similar to previous problems, the
concern is not that native, generic bundling capabilities enable this sort of
bundling, but that they make it more readily accessible.&lt;/p&gt;
&lt;p&gt;This problem, more so than the caching one, might motivate designs for
selective acquisition of bundled content.&lt;/p&gt;
&lt;p&gt;Existing techniques for selective content fetching, like &lt;a href=&quot;https://httpwg.org/http-core/draft-ietf-httpbis-semantics-latest.html#range.requests&quot;&gt;HTTP range
requests&lt;/a&gt;,
don’t reliably work here, as compression can render byte ranges useless. That
leads to inventing new systems for selective acquisition of bundles. Selective
removal of content from compressed bundles does seem to be possible &lt;a href=&quot;https://dev.to/riknelix/fast-and-efficient-recompression-using-previous-compression-artifacts-47g5&quot;&gt;at some
levels&lt;/a&gt;,
but this leads to a complex system, and the effects on other protocol
participants are non-trivial.&lt;/p&gt;
&lt;p&gt;At some level, clients might want to say “just send me all the code, without
the advertising”, but that might not work so well. Asking for bundle manifests
so that content might be selectively fetched adds an additional round trip.
Moving bundle manifests out of the bundles and into content&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn18&quot; id=&quot;fnref18&quot;&gt;[18]&lt;/a&gt;&lt;/sup&gt; gives clients the information
they need to be selective about which resources they want, but it requires
moving information about the composition of resources into the content that
references it. That too requires coordination.&lt;/p&gt;
&lt;p&gt;For caches, this can add an extra burden. Using the Vary HTTP header field
would be necessary to ensure that caches would not break when content from
bundles is fetched selectively&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn19&quot; id=&quot;fnref19&quot;&gt;[19]&lt;/a&gt;&lt;/sup&gt;. But a cache needs full awareness of these requests and how they are
applied if it is to avoid a combinatorial explosion of different
bundles. Without updating caches to understand selectors, the
effect is that caches end up bearing the load for the myriad permutations of
bundles that might be needed.&lt;/p&gt;
&lt;h3 id=&quot;supplanting-resource-identity&quot;&gt;Supplanting resource identity&lt;/h3&gt;
&lt;p&gt;A final concern is the ability – at least in active proposals – for
bundled content to be identified with URIs from the same origin as the bundle
itself. For example, a bundle at &lt;code&gt;https://example.com/foo/bundle&lt;/code&gt; might contain
content that is identified as &lt;code&gt;https://example.com/foo/script.js&lt;/code&gt;. This is a
&lt;a href=&quot;https://github.com/w3ctag/packaging-on-the-web/issues/10&quot;&gt;long-standing
concern&lt;/a&gt; that applies
to many previous attempts at bundling or packaging.&lt;/p&gt;
&lt;p&gt;This ability is constrained, but the intent is to have content in a bundle act
as a valid substitute for other resources. The reasoning is that a fallback is
needed for those cases where bundles aren’t optimal or aren’t available. This
has implications for anyone deploying a server, who now needs to ensure that
bundles aren’t hosted adjacent to content that might not want interference from
the bundle.&lt;/p&gt;
&lt;p&gt;At this point, I will note that replacing the content of other resources is
also the point of &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-yasskin-http-origin-signed-responses&quot;&gt;signed
exchanges&lt;/a&gt;.
The difference is that in signed exchanges, the replacement extends to other
origins. The constraints on what can be replaced and how are important details,
but the goal is the same: signed exchanges allow a bundle to speak for other
resources.&lt;/p&gt;
&lt;p&gt;As already noted, this sort of thing is already possible with &lt;a href=&quot;https://w3c.github.io/ServiceWorker/&quot;&gt;service
workers&lt;/a&gt;. Service workers take what it
means to subvert the identity of resources to the next level. A request that is
handled by a service worker can be turned into any other request or even
multiple requests. Service workers are limited though. A site can opt to
perform whatever substitutions it likes, but it can only do that for its own
requests. Bundles propose something that might be enabled for any server, even
inadvertently.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/WICG/resource-bundles/blob/main/subresource-loading.md#optionality-and-url-integrity&quot;&gt;One
proposal&lt;/a&gt;
says that all supplanted resources must be identical to the resources they
supplant. The theory there is that clients could fetch the resource from within
a bundle or directly and expect the same result. It goes on to suggest that a
mismatch between these fetches might be cause for a client to stop using the
bundle. However, it is perfectly normal in HTTP for the same resource to return
different content when fetched multiple times, even when the fetch is made by
the same client or at the same time. So it is hard to imagine how a client
would treat inconsistency as anything other than normal. If bundling provides
advantages, giving up on using bundles for that reason could make bundles
completely unreliable.&lt;/p&gt;
&lt;p&gt;One good reason for enabling equivalence of bundled and unbundled resources is
to provide a graceful fallback in the case that bundling is not supported by a
client. Attempting to ensure that the internal identifiers in bundles are
“real” and that the fallback does not change behaviour is not going to work.&lt;/p&gt;
&lt;h4 id=&quot;indirection-for-identifiers&quot;&gt;Indirection for identifiers&lt;/h4&gt;
&lt;p&gt;Addressing the problem of one resource speaking unilaterally for another
resource requires a little creativity. Here the solution is hinted at with both
service workers and JavaScript &lt;a href=&quot;https://wicg.github.io/import-maps/#note-on-import-specifiers&quot;&gt;import
maps&lt;/a&gt;. Both
allow the entity making a reference to rewrite that reference before the
browser acts on it.&lt;/p&gt;
&lt;p&gt;Import maps are especially instructive here, as they make it clear that the
mapping from the import specifier to a URI is not the &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc3986#section-5&quot;&gt;URI resolution function
in RFC 3986&lt;/a&gt; or &lt;a href=&quot;https://url.spec.whatwg.org/#url-parsing&quot;&gt;the
URL parsing algorithm in Fetch&lt;/a&gt;;
import specifiers are explicitly not URIs, relative or otherwise.&lt;/p&gt;
&lt;p&gt;This is an opportunity to add indirection, either the limited form provided in
import maps where one string is mapped to another, or the Turing-complete
version that service workers enable.&lt;/p&gt;
&lt;p&gt;That is, we allow those places that reference resources to provide the browser
with a set of rules that change how the identifiers they use are translated into
URIs. This is something that HTML has had forever, with the
&lt;a href=&quot;https://html.spec.whatwg.org/#the-base-element&quot;&gt;&lt;code&gt;&amp;lt;base&amp;gt;&lt;/code&gt;&lt;/a&gt; element. This is
also the fundamental concept behind the &lt;a href=&quot;https://discourse.wicg.io/t/proposal-fetch-maps/4259&quot;&gt;fetch maps
proposal&lt;/a&gt;, which looks
like this&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn20&quot; id=&quot;fnref20&quot;&gt;[20]&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-html&quot;&gt;&amp;lt;script type=&amp;quot;fetchmap&amp;quot;&amp;gt;
{
  &amp;quot;urls&amp;quot;: {
    &amp;quot;/styles.css&amp;quot;: &amp;quot;/styles.a74fs3.css&amp;quot;,
    &amp;quot;/bg.png&amp;quot;: &amp;quot;/bg.8e3ac4.png&amp;quot;
  }
}
&amp;lt;/script&amp;gt;
&amp;lt;link rel=&amp;quot;stylesheet&amp;quot; href=&amp;quot;/styles.css&amp;quot;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this example, when the browser is asked to fetch &lt;code&gt;/styles.css&lt;/code&gt;, it knows to
fetch &lt;code&gt;/styles.a74fs3.css&lt;/code&gt; instead.&lt;/p&gt;
&lt;p&gt;The beauty of this approach is that the change only exists where the reference
is made. The canonical identity of the resource is the same for everyone (it is
always &lt;code&gt;https://example.com/styles.a74fs3.css&lt;/code&gt;), only the way that reference is
expressed changes.&lt;/p&gt;
&lt;p&gt;In other words, the common property between these designs – service workers,
&lt;code&gt;&amp;lt;base&amp;gt;&lt;/code&gt;, import maps, or fetch maps – is that the indirection only occurs at
the explicit request of the thing that makes the reference. A site deliberately
chooses to use this facility, and if it does, it controls the substitution of
resource identities. There is no lateral replacement of content as all of the
logic occurs at the point the reference is made.&lt;/p&gt;
&lt;h4 id=&quot;making-resource-maps-work&quot;&gt;Making resource maps work&lt;/h4&gt;
&lt;p&gt;Of course, fitting this indirection into an existing system requires a few
awkward adaptations. But it seems like this particular design could be quite
workable.&lt;/p&gt;
&lt;p&gt;Anne van Kesteren pointed out that the &lt;code&gt;import:&lt;/code&gt; scheme in &lt;a href=&quot;https://github.com/WICG/import-maps#import-urls&quot;&gt;import
maps&lt;/a&gt; exists because many of
the places where identifiers appear are concretely URIs. APIs assume that they
can be manipulated as URIs and violating that expectation would break things
that rely on that. If we are going to enable this sort of indirection, then we
need to ensure that URIs stay URIs. That doesn’t mean that URIs need to be
HTTP, just that they are still URIs.&lt;/p&gt;
&lt;p&gt;You might choose to construct identifiers with a new URI scheme in order to
satisfy this requirement&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn21&quot; id=&quot;fnref21&quot;&gt;[21]&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-html&quot;&gt;&amp;lt;a href=&amp;quot;scheme-for-mappings:hats&amp;quot;&amp;gt;buy hats here&amp;lt;/a&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Of course, in the fetch map example given, those identifiers look like and can
act like URIs. They can be fetched directly, without translation, if there is
no map. That’s probably a useful feature to retain as it means that you can
find local files when the reference is found in a local file during
development. Using a new scheme won’t have that advantage. A new scheme might
be an option, but it doesn’t seem to be a necessary feature of the design.&lt;/p&gt;
&lt;p&gt;I can also credit Anne with the idea that we model this indirection as a
redirect, something like an HTTP 303 (See Other). The Web is already able to
manage redirection for all sorts of resources, so that would not naturally
disrupt things too much.&lt;/p&gt;
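&lt;p&gt;To illustrate with the fetch map example from earlier, a map entry would act as though a fetch for the stable identifier had been answered with a redirect, only without any network round trip:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;GET /styles.css HTTP/1.1
Host: example.com

HTTP/1.1 303 See Other
Location: /styles.a74fs3.css
&lt;/code&gt;&lt;/pre&gt;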
&lt;p&gt;That is not to say that this is easy, as these redirects will need to conform
to established standards for the Web, with respect to the origin model and
integration with things like &lt;a href=&quot;https://w3c.github.io/webappsec-csp/&quot;&gt;Content Security
Policy&lt;/a&gt;. It will need to be decided how
resource maps affect cross-origin content. And many other details will need to
be thought about carefully. But again, the design seems at least plausible.&lt;/p&gt;
&lt;p&gt;Of note here is that resource maps can be polyfilled with service workers. That
suggests we might just have sites build this logic into service workers. That
could work, and it might be the basis for initial experiments. A static format
is likely superior as it makes the information more readily available.&lt;/p&gt;
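&lt;p&gt;As a rough sketch, with the map contents and file names purely illustrative, such a service worker polyfill might intercept fetches and consult a static map before letting each request proceed:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-js&quot;&gt;// sw.js: a hypothetical polyfill for a static resource map
const RESOURCE_MAP = {
  &#39;/styles.css&#39;: &#39;/styles.a74fs3.css&#39;,
  &#39;/bg.png&#39;: &#39;/bg.8e3ac4.png&#39;
};

self.addEventListener(&#39;fetch&#39;, (event) =&gt; {
  const url = new URL(event.request.url);
  const target = RESOURCE_MAP[url.pathname];
  if (target) {
    // Rewrite the fetch; references elsewhere in content keep
    // using the stable identifier.
    event.respondWith(fetch(new Request(target, event.request)));
  }
});
&lt;/code&gt;&lt;/pre&gt;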
&lt;h4 id=&quot;alternatives-and-bundle-uris&quot;&gt;Alternatives and bundle URIs&lt;/h4&gt;
&lt;p&gt;Providing indirection is just one piece of enabling use of bundled content.
Seamless integration needs two additional pieces.&lt;/p&gt;
&lt;p&gt;The first is an agreed method of identifying the contents of bundles. The &lt;a href=&quot;https://datatracker.ietf.org/group/wpack/about/&quot;&gt;IETF
WPACK working group&lt;/a&gt; have had
several discussions about this. These discussions were inconclusive, in part
because it was difficult to manage conflicting requirements. However, a design
grounded in a map-like construct might loosen some of the constraints that
disqualified some of the past options that were considered.&lt;/p&gt;
&lt;p&gt;In particular, the idea that a bundle might itself have an implicit resource
map was not considered. That could enable the use of simple identifiers for
references between resources in the same bundle without forcing links in
bundled content to be rewritten. And any ugly URI scheme syntax for bundles
might then be abstracted away elegantly.&lt;/p&gt;
&lt;p&gt;The second major piece to getting this working is a map that provides multiple
alternatives. In previous proposals, mappings were strictly one-to-one. A
one-to-many map could offer browsers a choice of resources that the referencing
entity considers to be equivalent&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn22&quot; id=&quot;fnref22&quot;&gt;[22]&lt;/a&gt;&lt;/sup&gt;. The browser is then able to select the option that it prefers. If
an alternative references a bundle the browser already has, that would be good
cause to use that option.&lt;/p&gt;
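&lt;p&gt;Sketching this in a syntax borrowed from the fetch map example, and with entirely hypothetical names, a one-to-many entry might look like this, leaving aside the open question of how the style sheet would be located within the bundle:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;{
  &amp;quot;urls&amp;quot;: {
    &amp;quot;/styles.css&amp;quot;: [
      &amp;quot;https://example.com/site.bundle&amp;quot;,
      &amp;quot;/styles.a74fs3.css&amp;quot;
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A browser that already holds the bundle could use it; otherwise, it could fetch the individual file directly.&lt;/p&gt;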
&lt;p&gt;Presenting multiple options also allows browsers to experiment with different
policies with respect to fetching content when bundles are offered. If bundled
content tends to perform better on initial visits, then browsers might request
bundles then. If bundled content tends to perform poorly when there is some
valid, cached content available already, then the browser might request
individual resources in that case.&lt;/p&gt;
&lt;p&gt;A resource map might be used to enable deployment of new bundling formats, or
even new retrieval methods&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn23&quot; id=&quot;fnref23&quot;&gt;[23]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h4 id=&quot;selective-acquisition&quot;&gt;Selective acquisition&lt;/h4&gt;
&lt;p&gt;One advantage of providing an identifier map like this is that it provides a
browser with some insight into what bundles contain before fetching them&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn24&quot; id=&quot;fnref24&quot;&gt;[24]&lt;/a&gt;&lt;/sup&gt;. Thus, a browser
might be able to make a decision about whether a bundle is worth fetching. If
most of the content is stuff that the browser does not want, then it might
choose to fetch individual resources instead.&lt;/p&gt;
&lt;p&gt;Having a reference map might thereby reduce the pressure to design mechanisms
for partial bundle fetching and caching. Adding some additional metadata, like
hints about resource size, might further allow for better tuning of this logic.&lt;/p&gt;
&lt;p&gt;Reference maps could even provide content classification tools with more
information about resources. Even in a simple one-to-one mapping, like
with an import map, there are two identifiers that might be used to classify
content. Even if one of these is nonsense, the other could be usable.&lt;/p&gt;
&lt;p&gt;While this requires a bit more sophistication on the part of classifiers, it
also provides opportunities for better classification. With alternative
sources, even if the identifier for one source does not reveal any useful
information, an alternative might.&lt;/p&gt;
&lt;p&gt;Now that I’m fully into speculating about possibilities, this opens some
interesting options. The care that was taken to ensure that pages don’t break
when Google Analytics is blocked could be managed differently. Remember that
script:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-js&quot;&gt;window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag(&#39;js&#39;, new Date());
gtag(&#39;config&#39;, &#39;$XXX&#39;);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As you can see, the primary interface is always defined and the
&lt;code&gt;window.dataLayer&lt;/code&gt; object is replaced with a dumb array if the script didn’t
load. With multiple alternatives, the fallback logic here could be encoded in
the map as a &lt;code&gt;data:&lt;/code&gt; URI instead:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-html&quot;&gt;&amp;lt;element-for-mappings type=&amp;quot;text/media-type-for-mappings+json&amp;quot;&amp;gt;
{ &amp;quot;scheme-for-mappings:ga&amp;quot;: [
  &amp;quot;https://www.googletagmanager.com/gtag/js?id=$XXX&amp;quot;,
  &amp;quot;data:text/javascript;charset=utf-8;base64,d2luZG93LmRhdGFMYXllcj1bXTtmdW5jdGlvbiBndGFnKCl7ZGF0YUxheWVyLnB1c2goYXJndW1lbnRzKTt9Z3RhZygnanMnLG5ldyBEYXRlKCkpO2d0YWcoJ2NvbmZpZycsJyRYWFgnKTs=&amp;quot;
]}&amp;lt;/element-for-mappings&amp;gt;
&amp;lt;script async src=&amp;quot;scheme-for-mappings:ga&amp;quot;&amp;gt;&amp;lt;/script&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this case, a content blocker that decides to block the HTTPS fetch could
allow the &lt;code&gt;data:&lt;/code&gt; URI and thereby preserve compatibility. Nothing really
changed, except that the fallback script is async too. Of course, this is an
unlikely outcome as this is not even remotely backward-compatible, but it does
give some hints about some of the possibilities.&lt;/p&gt;
&lt;h2 id=&quot;next-steps&quot;&gt;Next steps&lt;/h2&gt;
&lt;p&gt;So that was many more words than I expected to write. The size and complexity
of this problem continues to be impressive. No doubt this conversation will
continue for some time before we reach some sort of conclusion.&lt;/p&gt;
&lt;p&gt;For me, the realization that it is possible to provide finer control over how
outgoing references are managed was a big deal. We don’t have to accept a
design that allows one resource speaking for others, we just have to allow for
control over how references are made. That’s a fairly substantial improvement
over most existing proposals and the basis upon which something good might be
built.&lt;/p&gt;
&lt;p&gt;I still have serious reservations about the caching and performance trade-offs
involved with bundling. Attempting to solve this problem with selective
fetching of bundle contents seems like far too much complexity. Not only does
it require addressing the known-hard problem of cache invalidation, it also
requires that we find solutions to problems that have defied solutions on
numerous occasions in the past.&lt;/p&gt;
&lt;p&gt;That said, I’ve concluded that giving servers the choice in how content is
assembled does not result in bad outcomes for others. Unless we include signed
exchanges&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/bundles/#fn25&quot; id=&quot;fnref25&quot;&gt;[25]&lt;/a&gt;&lt;/sup&gt;,
we are not talking about negative externalities.&lt;/p&gt;
&lt;p&gt;If we accept that selective fetching is a difficult problem, supporting bundles
might not be all-powerful from the outset. It might only give servers and
developers more options. What we learn from trying that out might give us the
information that allows us to find good solutions later. Resource maps mean
that we can always fall back to fetching resources individually. Resource maps
could even be the foundation upon which we build new experiments with
alternative resource fetching models.&lt;/p&gt;
&lt;p&gt;All that said, the usability advantages provided by bundles seem to be
sufficient justification for enabling their support. That applies even if there
is uncertainty about performance. That applies even if we don’t initially solve
those performance problems. One enormous problem at a time, please.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Have I ever mentioned that I loathe CamelCase
names?  Thanks 1990s. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Yoav Weiss makes this claim based on his
experience with Chromium. I respect his experience here, but don’t know what
was done to reach this conclusion. I can see there being a lot more
investigation and discussion about this point. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This is due to the
way congestion control algorithms operate. These start out slow in case the
network is constrained, but gradually speed up. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Eric Rescorla suggested a possible reason that server push
regresses performance: pushing only really helps if the transmission channel
from server to client has spare capacity. Because HTTP/2 clients can make lots
of requests cheaply, it’s entirely possible that the channel is – or will soon
be – already full. If pushed resources are less important than resources the
client has already requested, even if the client eventually needs those pushed
resources, the capacity spent on pushing will delay more important responses. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Tantek Çelik pointed out that you can use a service worker to load old
content at the same time as checking asynchronously for updates. That’s even
better. The fact is, service workers can do just about anything discussed here.
That you need to write and maintain a service worker might be enough to
discourage all but the bravest of us though. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;You might reasonably suggest that this
sort of thinking tends toward suboptimal local minima. That is a fair
criticism, but my rejoinder there might be that conditioning success on a
design that reduces to a previously unsolved problem is not really a good
strategy either. Besides, accepting suboptimal local minima is part of how we
make forward progress without endless second-guessing. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I
seem to recall this being raised before Pete Snyder opened this issue, perhaps
at the &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc8752&quot;&gt;ESCAPE workshop&lt;/a&gt;, but I
can’t put a name to it. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;In particular, the split between style (CSS) and semantics
(HTML). &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;At this point, a footnote seems necessary. Yes, a
browser is an
&lt;a href=&quot;https://martinthomson.github.io/tmi/draft-thomson-tmi.html&quot;&gt;intermediary&lt;/a&gt;. All
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-hildebrand-middlebox-erosion-01&quot;&gt;previous
complaints&lt;/a&gt;
apply. It would be dishonest to deny the possibility that a browser might abuse
its position of privilege. But that is the topic for a much longer posting. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This more than makes up
for the overheads of the ad blocker in most cases, with page loads being
considerably faster on ad-heavy pages. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;If it isn’t clear, I’m firmly of the opinion that Google’s
&lt;a href=&quot;https://developers.google.com/amp/cache/overview&quot;&gt;AMP Cache&lt;/a&gt; is not just a bad
idea, but an abuse of Google’s market dominance. It also happens to be a gross
waste of resources in a lot of cases, as Google pushes content that can be
either already present or content for links that won’t ever be followed. Of
course, if they guess right and you follow a link, navigation is fast.
Whoosh. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn12&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;With increasing amounts of scripts, interfaces might
also be expressed at the JavaScript module or function level. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref12&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn13&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Yep. Pun totally intended. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref13&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn14&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Worth noting here is the care Google takes to
structure the script to avoid breaking pages when their JavaScript load is
blocked by an ad blocker. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref14&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn15&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I
wonder how many people are still fetching
&lt;code&gt;&lt;a href=&quot;https://ssl.google-analytics.com/ga.js&quot;&gt;ga.js&lt;/a&gt;&lt;/code&gt; from Google. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref15&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn16&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This isn’t a great example, because while it
prevents the code from being identified, it’s probably not a very good
solution. For starters, the advertiser no longer sees requests that come
directly from browsers, which it might use to track people. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref16&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn17&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Note that, at least for ad blocking, the biggest gains come from not
&lt;em&gt;executing&lt;/em&gt; unwanted content, as executing ad content almost always leads to a
chain of additional fetches. Saving the CPU time is the third major component
of the savings. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref17&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn18&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Yes, that
effectively means bundling them with content. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref18&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn19&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Curiously, the
&lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-variants&quot;&gt;Variants&lt;/a&gt;
design might not be a good fit here, as it provides enumeration of
alternatives, which is tricky for the same reason that caching in ignorance of
bundling is. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref19&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn20&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;There is lots to quibble about in the exact spelling in this
example, but I just copied from the proposal directly. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref20&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn21&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;It’s tempting here to suggest &lt;code&gt;urn:&lt;/code&gt;, but that might
cause some heads to explode. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref21&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn22&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The thought occurs that this is something
that could be exploited to allow for safe patching of dependencies when
combined with semantic versioning. For instance, I will accept any version
&lt;code&gt;X.Y.?&lt;/code&gt; of this file greater than &lt;code&gt;X.Y.Z&lt;/code&gt;. We can leave that idea for another
day though. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref22&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn23&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Using &lt;a href=&quot;https://ipfs.io/&quot;&gt;IPFS&lt;/a&gt; seems far more
plausible if you allow it as one option of many with the option for graceful
fallback. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref23&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn24&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;To
what extent providing information ahead of time can be used to improve
performance is something that I have often wondered about; it seems like it has
some interesting trade-offs that might be worth studying. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref24&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn25&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;We’ve already established that &lt;a href=&quot;https://mozilla.github.io/standards-positions/#http-origin-signed-responses&quot;&gt;signed exchanges are not good for
the
Web&lt;/a&gt;. &lt;a href=&quot;https://lowentropy.net/posts/bundles/#fnref25&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>Standardizing Principles</title>
    <link href="https://lowentropy.net/posts/standard-principles/"/>
    <updated>2021-01-05T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/standard-principles/</id>
    <content type="html">&lt;p&gt;There is a perennial question in standards development about the value of the different artefacts that the process kicks out.&lt;/p&gt;
&lt;p&gt;One subject that remains current is the relative value of specifications against things like compliance testing frameworks.  Reasonable people tend to place different weight on tests, with a wide range of attitudes.  In the past, more people were willing to reject attempts to invest in any shared test or compliance infrastructure.&lt;/p&gt;
&lt;p&gt;In recent years however, it has become very clear that a common test infrastructure is critical to developing a high quality standard.  Developing tests in conjunction with the standardization effort has improved the quality of specifications and implementations a great deal.&lt;/p&gt;
&lt;p&gt;Recently, I encountered an example where a standards group deliberately chose not to document behaviour, relying exclusively on the common test framework.  Understanding what is lost when that happens is worth examining.&lt;/p&gt;
&lt;h2 id=&quot;background&quot;&gt;Background&lt;/h2&gt;
&lt;p&gt;My experience with compliance testing in standards development is patchy.  It might help to describe how these have worked out.&lt;/p&gt;
&lt;h3 id=&quot;standardize-first&quot;&gt;Standardize First&lt;/h3&gt;
&lt;p&gt;Some of the early projects I was involved in relied on testing being entirely privately driven.  This can lead to each team relying almost exclusively on tests they develop internally.  Occasional pairwise interoperability testing occurs, but it is ad hoc and unreliable.&lt;/p&gt;
&lt;p&gt;This loose arrangement does tend to result in specifications being published sooner.  The cost is in less scrutiny, especially when it comes to details, so the quality of the output is not as good as it could be.&lt;/p&gt;
&lt;p&gt;This doesn’t mean that there is no compliance testing, but it requires effort.  That effort can pay off, as I have seen with &lt;a href=&quot;https://crossbar.io/autobahn/&quot;&gt;WebSockets&lt;/a&gt;, &lt;a href=&quot;https://en.wikipedia.org/wiki/FIPS_140&quot;&gt;FIPS-140&lt;/a&gt;, &lt;a href=&quot;https://cache-tests.fyi/&quot;&gt;HTTP Caching&lt;/a&gt;, and others.&lt;/p&gt;
&lt;h3 id=&quot;implement-in-parallel&quot;&gt;Implement in Parallel&lt;/h3&gt;
&lt;p&gt;My experience with &lt;a href=&quot;https://tools.ietf.org/html/rfc7540&quot;&gt;HTTP/2&lt;/a&gt; was not a whole lot different to those early projects.  The major improvement there was the level of active engagement from implementers in developing the specification.&lt;/p&gt;
&lt;p&gt;This process did not involve active development of a compliance testing framework, but there were regular interoperability tests.  I still remember Jeff Pinner deploying &lt;a href=&quot;https://tools.ietf.org/html/draft-ietf-httpbis-http2-04&quot;&gt;draft -04&lt;/a&gt; to production on &lt;a href=&quot;https://twitter.com/&quot;&gt;twitter.com&lt;/a&gt; during a meeting.  Not everyone was so fearless, but live deployment was something we saw routinely in the 13 subsequent drafts it took to finalize the work.&lt;/p&gt;
&lt;p&gt;Good feedback from implementations was key to the success of HTTP/2, which now drives &lt;a href=&quot;https://mzl.la/35bToXH&quot;&gt;well over half of the HTTP requests in Firefox&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The excellent &lt;a href=&quot;https://github.com/summerwind/h2spec&quot;&gt;h2spec&lt;/a&gt; came out a little after the release of the specification.  It has since become a valuable compliance testing tool for implementers.&lt;/p&gt;
&lt;h3 id=&quot;test-in-parallel&quot;&gt;Test in Parallel&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://tools.ietf.org/html/rfc8446&quot;&gt;TLS 1.3&lt;/a&gt; followed a similar trajectory to HTTP/2, with a few interesting twists.  Part of the testing that occurred during the development of the protocol was formal verification.  For example, &lt;a href=&quot;https://tls13tamarin.github.io/TLS13Tamarin/docs/tls13tamarin.pdf&quot;&gt;a Tamarin model of TLS 1.3&lt;/a&gt; was developed alongside the protocol, which both informed the design and provided validation of the design.  Some implementations automated compliance testing based on &lt;a href=&quot;https://boringssl.googlesource.com/boringssl/+/refs/heads/master/ssl/test/runner/&quot;&gt;a tool developed for BoringSSL&lt;/a&gt;, which turned out to be very useful.&lt;/p&gt;
&lt;p&gt;With &lt;a href=&quot;https://quicwg.org/base-drafts/draft-ietf-quic-transport.html&quot;&gt;QUIC&lt;/a&gt;, Marten Seemann and Jana Iyengar developed &lt;a href=&quot;https://github.com/marten-seemann/quic-interop-runner/&quot;&gt;a framework&lt;/a&gt; that automates testing between QUIC implementations.  This runs regularly and produces &lt;a href=&quot;https://interop.seemann.io/&quot;&gt;a detailed report&lt;/a&gt; showing how each implementation stands up under a range of conditions, some of them quite adversarial.  This has had a significant positive effect on the quality of both implementations and specifications&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/standard-principles/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;In all cases, tests have been so valuable that I can see no way of going back to a world without them.&lt;/p&gt;
&lt;h3 id=&quot;test-first&quot;&gt;Test First&lt;/h3&gt;
&lt;p&gt;Of course, no discussion of testing would be complete without mentioning the excellent &lt;a href=&quot;https://web-platform-tests.org/&quot;&gt;Web Platform Tests&lt;/a&gt;, which are now critical parts of the process adopted by the &lt;a href=&quot;https://whatwg.org/working-mode#changes&quot;&gt;WHATWG&lt;/a&gt; and &lt;a href=&quot;https://www.w3.org/2019/05/webapps-charter.html&quot;&gt;some W3C groups&lt;/a&gt;.  Web Platform Tests are considered a prerequisite for normative specification changes under these processes.&lt;/p&gt;
&lt;p&gt;Akin to &lt;a href=&quot;https://en.wikipedia.org/wiki/Test-driven_development&quot;&gt;test driven development&lt;/a&gt;, this ensures that new features and changes are not just testable, but tested, before anything is documented.  In practice the work continues in parallel, with tight feedback between development, specification, and testing.  Shorter feedback cycles mean that work can be completed faster and with higher quality.&lt;/p&gt;
&lt;h2 id=&quot;the-role-of-specifications&quot;&gt;The Role of Specifications&lt;/h2&gt;
&lt;p&gt;An obvious question that might be asked when it comes to this process, particularly where there are firm requirements for tests, is what value the specification provides.  Given sufficiently thorough testing, it should be possible to construct an interoperable implementation based solely on those tests.&lt;/p&gt;
&lt;p&gt;To go further, when &lt;a href=&quot;https://w3c.github.io/ServiceWorker/#cache-storage-match&quot;&gt;specifications consist of mostly code-like constructs&lt;/a&gt; and real implementations are open source anyway, the value of a specification seems greatly diminished.  As empirical observation of how things actually work is of more value than how they work in theory, it is reasonable to ask what value the specification provides.&lt;/p&gt;
&lt;p&gt;As my own recent experience with the &lt;a href=&quot;https://en.wikipedia.org/wiki/CUBIC_TCP&quot;&gt;Cubic congestion control algorithm&lt;/a&gt; taught me, what is implemented and deployed is what matters.  &lt;a href=&quot;https://tools.ietf.org/html/rfc8312&quot;&gt;The RFC that purports to document Cubic&lt;/a&gt; is not really implementable and barely resembles &lt;a href=&quot;https://github.com/torvalds/linux/blob/fcadab740480e0e0e9fa9bd272acd409884d431a/net/ipv4/tcp_cubic.c&quot;&gt;what real implementations do&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;So if testing is a central part of the development of new standards and people rely increasingly on tests or observing the behaviour of other implementations, it is reasonable to question what value specifications provide.&lt;/p&gt;
&lt;h3 id=&quot;a-specification-can-teach&quot;&gt;A Specification Can Teach&lt;/h3&gt;
&lt;p&gt;Specification documents often come with a bunch of normative language.  Some of the most critical text defines what it means to be conformant, describing what is permitted and what is forbidden in precise terms.&lt;/p&gt;
&lt;p&gt;Strictly normative text is certainly at risk of displacement by good testing.  But there is often a bunch of non-normative filler in specifications.  Though that text might be purely informative, it is often of significant value to people who are attempting to understand the specification in detail:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Informative text can motivate the existence of the specification.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Filler can provide insights into why things are.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Notes can point to outcomes that might not be obvious.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For specifications that are developed using &lt;a href=&quot;https://open-stand.org/about-us/principles/&quot;&gt;an open process&lt;/a&gt;, much of this information is not hidden, but it can be difficult to find&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/standard-principles/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;.  Presenting timely, relevant information to readers is useful in putting things into context.&lt;/p&gt;
&lt;h3 id=&quot;a-specification-can-capture-other-forms-of-agreement&quot;&gt;A Specification Can Capture Other Forms Of Agreement&lt;/h3&gt;
&lt;p&gt;One of the hardest lessons out of recent standards work has been the realization that many decisions are made with only superficial justification.  Developing standards based on shared principles is much harder than agreeing on what happens in certain conditions, or which bit goes where.&lt;/p&gt;
&lt;p&gt;Though it might be harder, reaching agreement&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/standard-principles/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt; on principles is far more enduring and valuable.  A specification can document that agreement.&lt;/p&gt;
&lt;p&gt;Reaching agreement or consensus on a principle can be hard for a variety of reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Dealing with abstractions can be challenging because people can develop different abstract models based on their own perspective and biases.  Subtle differences can mean a lot of talking past each other.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Abstractions can also become too far removed from reality to be useful.  This might serve you well when filing a patent application, but ultimately we depend on principles being applicable to the current work&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/standard-principles/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Agreement on principles can be difficult because it forces people to fully address differences of opinion.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Without first addressing disagreements in principle, it is possible that concrete decisions could be consistent with different perspectives.  This might not have any immediate effect, but could produce inconsistencies.  Some inconsistency can result in real problems, especially if it becomes necessary to rely more extensively on a principle that was in contention&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/standard-principles/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;However hard agreement might be to achieve, a principle that is agreed can inform multiple decisions.  Documenting a principle that has achieved agreement can therefore be more efficient over time.  Documentation can also help avoid application of inconsistent or conflicting principles over time.&lt;/p&gt;
&lt;p&gt;Documenting principles does not have a direct normative effect.  But a specification offers an opportunity to document more than just conformance requirements; it can capture other types of agreement.&lt;/p&gt;
&lt;h3 id=&quot;conformance-test-suites-can-overreach&quot;&gt;Conformance Test Suites Can Overreach&lt;/h3&gt;
&lt;p&gt;A problem that can occur with conformance testing is that the tests can disagree with specifications.  If implementations depend more on the test than the specification, this can make the conformance test the true source of the definition of what it means to interoperate.&lt;/p&gt;
&lt;p&gt;This is not inherently bad.  It can be that the tests capture something that is inherently better, because it reflects what people need, because it is easier to implement, or just because that is what interoperates.&lt;/p&gt;
&lt;p&gt;Of course, disagreement between two sources that claim authority does implementations a disservice.  A new implementation now has to know which is “correct”.  Ensuring that deployments, tests, and specifications align is critical to ensuring the viability of new implementations.&lt;/p&gt;
&lt;p&gt;The true risk with relying on tests lies in the process by which conformance tests are maintained.  Specification development processes are burdened with rules that govern how agreement is reached.  Those rules exist for good reason.&lt;/p&gt;
&lt;p&gt;Change control processes for conformance testing projects might not provide adequate protection for anti-trust or intellectual property.  They also might lack opportunities for affected stakeholders to engage.  This doesn’t have to be the case, but the governance structures underpinning most conformance suites are usually less robust than those of standards&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/standard-principles/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h2 id=&quot;conclusions&quot;&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;The exact nature of how specifications are used to guide the development of interoperable standards is something of a fluid situation.  Here I’ve laid out a case for the value of specifications: for the non-normative language they provide, for their ability to capture agreement on more than just normative functions, and for the governance structures that they use.  There are probably other reasons too, and likely counter-arguments, both of which I would be delighted to hear about.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I should also point at &lt;a href=&quot;https://quic-tracker.info.ucl.ac.be/grid&quot;&gt;QUIC Tracker&lt;/a&gt;, and &lt;a href=&quot;http://d.hatena.ne.jp/kazu-yamamoto/&quot;&gt;Kazu Yamamoto&lt;/a&gt; has started work on &lt;a href=&quot;https://github.com/kazu-yamamoto/h3spec&quot;&gt;h3spec&lt;/a&gt;; both have made significant contributions too. &lt;a href=&quot;https://lowentropy.net/posts/standard-principles/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;For example, the development of even a relatively small specification like QUIC involved more than 4000 issues and pull requests, more than 8000 email messages, not to mention all the chat messages that are not in public archives. &lt;a href=&quot;https://lowentropy.net/posts/standard-principles/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;…or consensus if that is how you spell it. &lt;a href=&quot;https://lowentropy.net/posts/standard-principles/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This is perhaps a criticism that might be levelled at &lt;a href=&quot;https://w3ctag.github.io/design-principles/#priority-of-constituencies&quot;&gt;the priority of constituencies&lt;/a&gt; or text like that in &lt;a href=&quot;https://tools.ietf.org/html/rfc8890&quot;&gt;RFC 8890&lt;/a&gt;.  However, these might be more correctly viewed as meta-principles, or ideals that guide the development of more specific and actionable principles. &lt;a href=&quot;https://lowentropy.net/posts/standard-principles/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;An example of this might be DNS, where the need for agreement on principles was neglected.  As such, the global community has no documented principles that might guide decisions on issues such as having a single global namespace or whether network operators are entitled to be involved in name resolution.  Now that encrypted DNS is being rolled out, reflective of a principle that values individual privacy, it has become obvious that people with differing views but no shared principles have been coexisting. &lt;a href=&quot;https://lowentropy.net/posts/standard-principles/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Not that these too lack opportunities for improvement, but they are the best we have. &lt;a href=&quot;https://lowentropy.net/posts/standard-principles/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>RFCs in HTML</title>
    <link href="https://lowentropy.net/posts/line-length/"/>
    <updated>2020-12-18T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/line-length/</id>
    <content type="html">&lt;p&gt;I spend a shocking amount of my time staring at IETF documents, both Internet-Drafts and RFCs.  I have spend quite a bit of time looking at GitHub README files and W3C specifications.&lt;/p&gt;
&lt;p&gt;For reading prose, the format I routinely find to be the most accessible is the text version.  This is definitely not based on the quality of the writing; any of these formats can produce unreadable documents.  What I refer to here is not the substance, but the form.  That is, how the text is laid out on my screen&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;There is clearly a degree of familiarization and bias involved in this.  A little while ago, I worked out that there is just one thing that elevates that clunky text format above the others: line length.&lt;/p&gt;
&lt;h2 id=&quot;relearning-old-lessons&quot;&gt;Relearning Old Lessons&lt;/h2&gt;
&lt;p&gt;This is hardly a new insight.  A brief web search will return &lt;a href=&quot;https://practicaltypography.com/line-length.html&quot;&gt;numerous&lt;/a&gt; &lt;a href=&quot;https://smad.jmu.edu/shen/webtype/linelength.html&quot;&gt;articles&lt;/a&gt; on the &lt;a href=&quot;https://www.fonts.com/content/learning/fontology/level-2/text-typography/length-column-width&quot;&gt;subject&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;.  All of them say the same thing: shorter lines are more readable.&lt;/p&gt;
&lt;p&gt;I was unable to find a single print newspaper that didn’t take this advice to heart, if not to extremes&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;.  Some magazines have ignored this, but those too turned out to be ill suited to reading prose and more geared toward looking at the pictures.&lt;/p&gt;
&lt;p&gt;Recommendations from most sources put a hard stop somewhere around 80 characters.  Some go a little lower or higher, but the general advice is pretty consistent.  Of course, variable-width fonts make this imprecise, but  it tends to average out.&lt;/p&gt;
&lt;h2 id=&quot;why-text-is-so-good&quot;&gt;Why Text Is So Good&lt;/h2&gt;
&lt;p&gt;I suppose that it is no accident that this corresponds to the width of the screen on a &lt;a href=&quot;https://en.wikipedia.org/wiki/VT52&quot;&gt;DEC VT52&lt;/a&gt;.  The text format of old RFCs&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt; might have been designed to fit on these small screens, or to make printing easier, but the net effect is that you can get just &lt;a href=&quot;https://tools.ietf.org/html/rfc7994#section-4.3&quot;&gt;72 characters on a line&lt;/a&gt;.  The standard tools spend three of those on a left margin for text, so that means just 69 fixed-width characters per line.&lt;/p&gt;
&lt;p&gt;That turns out to be very readable.&lt;/p&gt;
&lt;h2 id=&quot;why-html-is-so-bad&quot;&gt;Why HTML Is So Bad&lt;/h2&gt;
&lt;p&gt;The “official” HTML rendering of RFCs on &lt;a href=&quot;https://www.rfc-editor.org/&quot;&gt;rfc-editor.org&lt;/a&gt; is a little wider than this.  If I measure using whole alphabets&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;, this results in a width of 98 characters.  That’s more than the maximum in any recommendation I found.&lt;/p&gt;
&lt;p&gt;Performing a similar test on the W3C specification style&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt; used for W3C publications, I got 102 characters.  The WHATWG &lt;a href=&quot;https://fetch.spec.whatwg.org/&quot;&gt;Fetch Standard&lt;/a&gt; had room for a massive 163 characters!&lt;/p&gt;
&lt;p&gt;All of these wrap earlier than this on a smaller screen, but these are relatively small font sizes, so many screens will be wide enough to reach these values.  Many people have a screen that has the 1300 horizontal pixels&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt; needed to get to 100 characters in a W3C specification.  The official IETF HTML crams its 98 characters into just 724 pixels.&lt;/p&gt;
&lt;p&gt;High text density comes from the font size and line height being quite small in official renderings of IETF documents.  This compounds the problem as it makes tracking from one line to the next when reading more difficult.  I consider the 14px/22.4px of the official IETF rendering to be positively tiny.  I use a 9px (monospace) font in terminals, but I wouldn’t inflict that choice on others.  That W3C and WHATWG settled on 16px/24px is far more humane, though with the selected font I still find this a little on the small side.&lt;/p&gt;
&lt;p&gt;What is interesting here is that the text rendering on &lt;a href=&quot;https://tools.ietf.org/html/&quot;&gt;tools.ietf.org&lt;/a&gt; uses a value of &lt;code&gt;13.33px&lt;/code&gt;.  This seems smaller, but - at least subjectively - it is no harder to read than the &lt;code&gt;16px&lt;/code&gt; W3C/WHATWG specifications.  Also, the default font configuration in Firefox is &lt;code&gt;16px&lt;/code&gt; for most fonts and &lt;code&gt;13px&lt;/code&gt; for monospace, suggesting that smaller font sizes are better tolerated for monospace fonts.  That’s especially convenient here as it happens.&lt;/p&gt;
&lt;h2 id=&quot;making-html-readable&quot;&gt;Making HTML Readable&lt;/h2&gt;
&lt;p&gt;The fix is pretty simple: make the &lt;code&gt;max-width&lt;/code&gt; small enough that lines don’t run so long.  I set a value of &lt;code&gt;600px&lt;/code&gt;.  Combine this with a font size of &lt;code&gt;16px&lt;/code&gt; and the result is a line length of 72 characters&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
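&lt;p&gt;As a rough sketch, the core of that change is only a couple of declarations.  The bare &lt;code&gt;body&lt;/code&gt; selector and the centring margin here are simplifications; the real stylesheet is more targeted:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-css&quot;&gt;body {
  max-width: 600px;   /* caps lines at about 72 characters */
  margin: 0 auto;     /* centre the text column (illustrative) */
  font-size: 16px;
  line-height: 24px;  /* the 16px/24px pairing W3C and WHATWG use */
}
&lt;/code&gt;&lt;/pre&gt;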
&lt;p&gt;The &lt;a href=&quot;https://quicwg.org/base-drafts/draft-ietf-quic-transport.html&quot;&gt;editor’s copy of the QUIC spec&lt;/a&gt; is a fairly thorough example of this.&lt;/p&gt;
&lt;h3 id=&quot;fonts&quot;&gt;Fonts&lt;/h3&gt;
&lt;p&gt;I chose to change the font to something that is a little wider at the same time. Using &lt;a href=&quot;https://docs.microsoft.com/en-us/typography/font-list/arial&quot;&gt;Arial&lt;/a&gt; - the default sans-serif font on Windows and the font chosen by the W3C and WHATWG - adds 4-5 characters to line length and is noticeably smaller on screen.  &lt;a href=&quot;https://docs.microsoft.com/en-us/typography/font-list/times-new-roman&quot;&gt;Times New Roman&lt;/a&gt; - the default serif font - adds 9-10 characters and is smaller again.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://fonts.google.com/specimen/Lora&quot;&gt;Lora&lt;/a&gt;, which has a light serif, was my choice for text.  I know little enough about fonts that this was ultimately subjective.  &lt;a href=&quot;https://fonts.google.com/specimen/Noto+Sans&quot;&gt;Noto Sans&lt;/a&gt;, the font used in IETF official renderings, is comparable here, but I find it a little boring.&lt;/p&gt;
&lt;p&gt;Some people don’t like the visual noise of a serif font for reading on a screen.  Modern displays with high pixel density are less vulnerable to that and this is a light font with enough serif noise to add a little flair without adversely affecting readability.  Lora is very readable at &lt;code&gt;16px&lt;/code&gt;, where many other serif fonts require a larger size to be similarly clear.&lt;/p&gt;
&lt;h3 id=&quot;headings&quot;&gt;Headings&lt;/h3&gt;
&lt;p&gt;Fitting headings on a single line given the shorter line length turned out to be fiddly.  I didn’t want headings to wrap, or to use too small a font.  And IETF people have a deep and abiding love for &lt;a href=&quot;https://quicwg.org/base-drafts/draft-ietf-quic-invariants.html#section-1&quot;&gt;very long headings&lt;/a&gt;.  For this, a condensed font was ideal.&lt;/p&gt;
&lt;p&gt;A semi-condensed font might have been ideal, but there are fewer of those and it was a little hard to find one that didn’t look too jarring next to the main text&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;.  Again Google Fonts was a great resource and &lt;a href=&quot;https://fonts.google.com/specimen/Cabin+Condensed&quot;&gt;Cabin Condensed&lt;/a&gt; is OK.&lt;/p&gt;
&lt;h3 id=&quot;ascii-art&quot;&gt;ASCII Art&lt;/h3&gt;
&lt;p&gt;In setting this size, it is then necessary to consider the effect on diagrams.  IETF documents are still stuck in the dark ages when it comes to diagrams and ASCII Art still dominates there.  As the text format accepts 72 column text, so too must the figures in the HTML output.&lt;/p&gt;
&lt;p&gt;This turns out to be a bit of a compromise.  Styling of figures to include an offset from text, a border, and background shading eats up horizontal space.  In the end, I managed to reduce the text size to &lt;code&gt;13.5px&lt;/code&gt; and set &lt;code&gt;letter-spacing: -0.2px&lt;/code&gt; to slightly compress the text further and fit 72 columns in&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
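&lt;p&gt;In CSS terms, the compromise amounts to something like the following.  Again, this is a simplified sketch; the bare &lt;code&gt;pre&lt;/code&gt; selector stands in for the real figure rules:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-css&quot;&gt;pre {
  font-size: 13.5px;
  letter-spacing: -0.2px;  /* squeeze 72 monospace columns into the text width */
}
&lt;/code&gt;&lt;/pre&gt;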
&lt;h3 id=&quot;minimizing-distractions&quot;&gt;Minimizing Distractions&lt;/h3&gt;
&lt;p&gt;The styles used here are based on those from an earlier version of the official renderings.  Once the major pieces were in place, the details needed to be aligned to fit.  After fixing major items like margins and line heights to match font and size choices, a bunch of work was needed to make documents look consistent.  The first task was removing a bunch of design elements that I found distracting.&lt;/p&gt;
&lt;p&gt;The HTML rendering includes a &lt;a href=&quot;https://en.wikipedia.org/wiki/Pilcrow&quot;&gt;pilcrow&lt;/a&gt; at the end of each paragraph.  This enables linking to specific paragraphs, which is a great feature.&lt;/p&gt;
&lt;p&gt;The official styling only renders the pilcrow when the paragraph is hovered&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;, but it renders very strongly when shown and so can be distracting.  That needed softening.&lt;/p&gt;
&lt;p&gt;The default blue (&lt;code&gt;#00f&lt;/code&gt;) for links is strongly saturated, which is too assertive.  Reducing the saturation makes links blend into text better.&lt;/p&gt;
&lt;p&gt;Changing background colours on hover for titles is a nice way of indicating the presence of links, but that too was very strong.  Making that lighter made moving the mouse less of a light show.&lt;/p&gt;
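As a rough sketch, the softening described here comes down to rules of this shape; the specific colour values are illustrative, not the ones actually used in the stylesheet.

```css
/* Illustrative values only, not the stylesheet's actual colours. */
a { color: #3355aa; }  /* desaturated from #00f so links blend with text */
h2 a:hover { background-color: #eef0f4; }  /* gentler hover highlight */
```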
&lt;h3 id=&quot;cleanup&quot;&gt;Cleanup&lt;/h3&gt;
&lt;p&gt;Then there was a bunch of maintenance and tidying:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Negative margins on headings, presumably to tweak the position of headings when following internal links to section headings, went&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn12&quot; id=&quot;fnref12&quot;&gt;[12]&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
&lt;li&gt;Rules that were overwritten later in the file were consolidated.&lt;/li&gt;
&lt;li&gt;The table of contents was moved closer to content.&lt;/li&gt;
&lt;li&gt;Horizontal lines were given the flick.&lt;/li&gt;
&lt;li&gt;Table and figure captions were tightened up.&lt;/li&gt;
&lt;li&gt;Authors’ addresses were put into multiple columns.&lt;/li&gt;
&lt;li&gt;The References section got a big cleanup too.&lt;/li&gt;
&lt;li&gt;I use CSS variables (&lt;code&gt;var(--foo)&lt;/code&gt;), which is a great feature.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Finally, a bunch of work was put into making this look decent on a small screen.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;What I’ve learned from this is a newfound respect for the work designers do.  My amateur fumbling here has helped me appreciate just how much detail work goes into making something like this look good.&lt;/p&gt;
&lt;p&gt;Immense thanks are owed to Anitra Nottingham, who graciously provided feedback on earlier versions of this work.  Those versions were obviously much worse.  I also owe thanks to Mark Nottingham, James Gruessing, Adam Roach, Jeffrey Yasskin and those I’ve forgotten who each took the time to provide feedback and expertise.&lt;/p&gt;
&lt;p&gt;None of this is truly professional.  I’m still finding things that I don’t like.  I’m still not happy with various pieces of spacing, for instance.&lt;/p&gt;
&lt;p&gt;Even learning this much design is more of a curse than I’d like.  I might not ace &lt;a href=&quot;https://cantunsee.space/&quot;&gt;cantunsee&lt;/a&gt;, but I know enough to notice things like alignment issues and bad kerning&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn13&quot; id=&quot;fnref13&quot;&gt;[13]&lt;/a&gt;&lt;/sup&gt; now.  I’m not sure that that has enriched my life all that much.&lt;/p&gt;
&lt;p&gt;But the main thing remains: I can read these documents now.  Cutting the line length was what did that.  I now prefer HTML if it uses this stylesheet&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/line-length/#fn14&quot; id=&quot;fnref14&quot;&gt;[14]&lt;/a&gt;&lt;/sup&gt;.  The rest was just gravy.&lt;/p&gt;
&lt;p&gt;The stylesheet can be found &lt;a href=&quot;https://github.com/martinthomson/i-d-template/blob/main/v3.css&quot;&gt;here&lt;/a&gt;.  Contributions are welcome.  Anyone using &lt;a href=&quot;https://github.com/martinthomson/i-d-template&quot;&gt;my GitHub template&lt;/a&gt; for generating Internet-Drafts already benefits from this work.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Reading from paper is not something I can countenance; the cost in paper of my specification reading alone would be devastating and I like trees too much to do that to them. &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;And those are just links from my browsing history &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;So many hyphens… &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Officially, they are &lt;a href=&quot;https://tools.ietf.org/html/rfc7991&quot;&gt;all XML now&lt;/a&gt; and only rendered to text or HTML. &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The way to do this is to find a paragraph and open it in browser developer tools.  Add a style rule of &lt;code&gt;overflow: hidden&lt;/code&gt; then modify the content to be “abcdef…” and repeat until the text cuts off.  This follows the advice in &lt;a href=&quot;https://practicaltypography.com/line-length.html&quot;&gt;Butterick’s Practical Typography&lt;/a&gt;. &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I tested the &lt;a href=&quot;https://w3c.github.io/push-api/&quot;&gt;Push API&lt;/a&gt;, which uses &lt;a href=&quot;https://github.com/w3c/respec&quot;&gt;ReSpec&lt;/a&gt;, but specifications using &lt;a href=&quot;https://tabatkins.github.io/bikeshed/&quot;&gt;Bikeshed&lt;/a&gt; produced exactly the same result. &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Using the browser measure for pixel, which doesn’t correspond to dots on screen for devices with high pixel density. &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I wasn’t going for this deliberately, but that is how it worked out. &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;In particular, I have this thing about the shape of ‘e’ and ‘a’.  They can’t be dramatically different. &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The need for packing this tightly came when I discovered that pilcrows for figures were possible, but the official rendering put them on a blank line.  That broke the document flow badly and I wanted space for those on the line as well.  See &lt;a href=&quot;https://quicwg.org/base-drafts/draft-ietf-quic-recovery.html#section-b.5-3&quot;&gt;this example&lt;/a&gt; for how that turned out. &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Is this an accessibility problem?  I don’t know. &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn12&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;I’ve learned that CSS, like many other things, lends itself easily to small hacks.  The net effect of introducing a hack is invariably that you have to add a whole bunch more corrective hacks in a death spiral.  Avoid hacks. &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref12&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn13&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;My 9-year-old son finds &lt;a href=&quot;https://www.brookes.net.au/&quot;&gt;signs for this real estate company&lt;/a&gt;, which seem deliberately bad, amusing.  It’s clearly infectious. &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref13&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn14&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Mark Nottingham has &lt;a href=&quot;https://mnot.github.io/I-D/http-grease/&quot;&gt;a different stylesheet&lt;/a&gt; that is also acceptable.  He also uses a very nice font. (Edit 2024-12-11: Mark is now using my stylesheet. Shame about that &lt;a href=&quot;https://typographyforlawyers.com/mb-fonts.html&quot;&gt;font&lt;/a&gt;.) &lt;a href=&quot;https://lowentropy.net/posts/line-length/#fnref14&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>Next Level Version Negotiation</title>
    <link href="https://lowentropy.net/posts/vn/"/>
    <updated>2020-12-11T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/vn/</id>
    <content type="html">&lt;p&gt;The &lt;a href=&quot;https://www.iab.org/activities/programs/evolvability-deployability-maintainability-edm-program/&quot;&gt;IAB EDM Program&lt;/a&gt;&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/vn/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt; met this morning.  While the overall goal of the meeting, we ended up talking a lot &lt;a href=&quot;https://intarchboard.github.io/use-it-or-lose-it/draft-iab-use-it-or-lose-it.html&quot;&gt;a document&lt;/a&gt; I wrote a while back and how to design version negotiation in protocols.&lt;/p&gt;
&lt;p&gt;This post provides a bit of background and shares some of what we learned today after what was quite a productive discussion.&lt;/p&gt;
&lt;h2 id=&quot;protocol-ossification&quot;&gt;Protocol Ossification&lt;/h2&gt;
&lt;p&gt;The subject of protocol ossification has been something of a live discussion in the past several years.  The community has come to the realization that it is effectively impossible to extend many Internet protocols without causing a distressing number of problems with existing deployments.  It seems like no protocol is unaffected&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/vn/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;.  IP, TCP, TLS, and HTTP all have various issues that prevent extensions from working correctly.&lt;/p&gt;
&lt;p&gt;A number of approaches have been tried.  &lt;a href=&quot;https://tools.ietf.org/html/rfc7540&quot;&gt;HTTP/2&lt;/a&gt;, which was developed early in this process, was deployed only for HTTPS.  Even though a cleartext variant was defined, many implementations explicitly decided not to implement that, partly motivated by these concerns.  &lt;a href=&quot;https://quicwg.org/base-drafts/draft-ietf-quic-transport.html&quot;&gt;QUIC&lt;/a&gt; doubles down on this by encrypting as much as possible.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://tools.ietf.org/html/rfc8446&quot;&gt;TLS 1.3&lt;/a&gt;, which was delayed by about &lt;em&gt;a year&lt;/em&gt; by related problems, doesn’t have that option so it ultimately used trickery to avoid notice by problematic middleboxes: TLS 1.3 looks a lot like TLS 1.2 unless you are paying close attention.&lt;/p&gt;
&lt;p&gt;One experiment that turned out to be quite successful in revealing ossification in TLS was &lt;a href=&quot;https://tools.ietf.org/html/rfc8701&quot;&gt;GREASE&lt;/a&gt;.  David Benjamin and Adam Langley, who maintain the TLS stack used by Google&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/vn/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;, found that inserting random values into different extension points had something of a cleansing effect on the TLS ecosystem.  Several TLS implementations were found to be intolerant of new extensions.&lt;/p&gt;
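RFC 8701 reserves a fixed family of values for this: two-byte codepoints made of two identical bytes whose low nibble is 0xA. A minimal sketch of how an implementation might pick one to advertise (the function names here are mine, not from any real stack):

```python
import random

def grease_values():
    # All 16 two-byte GREASE codepoints reserved by RFC 8701:
    # 0x0A0A, 0x1A1A, ..., 0xFAFA.
    return [(0x0A + 0x10 * n) * 0x0101 for n in range(16)]

def pick_grease():
    # A randomly chosen GREASE value to insert into an extension list,
    # so that peers routinely encounter codepoints they don't recognize.
    return random.choice(grease_values())

print(hex(pick_grease()))
```

Because the reserved values are scattered across the whole codepoint space, an intolerant peer can't simply special-case a single "unknown" value.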
&lt;p&gt;One observation out of the experiments with TLS was that protocol elements that routinely saw new values, like cipher suites, were less prone to failing when previously unknown values were encountered.  Those that hadn’t seen new values as often, like server name types or signature schemes, were more likely to show problems.  This caused Adam Langley to &lt;a href=&quot;https://www.imperialviolet.org/2016/05/16/agility.html&quot;&gt;advise&lt;/a&gt; that protocols “have one joint and keep it well oiled.”&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://intarchboard.github.io/use-it-or-lose-it/draft-iab-use-it-or-lose-it.html&quot;&gt;draft-iab-use-it-or-lose-it&lt;/a&gt; explores the problem space a little more thoroughly.  The draft looks at a bunch of different protocols and finds that in general the observations hold.  The central thesis is that for an extension point to be usable, it needs to be actively used.&lt;/p&gt;
&lt;!--
## Grease

David and Adam observed that the cause of this is that many extension points in protocols don&#39;t receive new usage for extended periods.  They then suggested that apparent, but meaningless, use of extension codepoints by widely deployed implementations would ensure that new values were routinely encountered by other implementations.

As failure to handle new values is a bug, and one that is relatively easy to fix, the goal here is to ensure that implementations - especially new ones - encountered conditions that would trigger failures if they had that bug.

The community is still unsure that greasing is a generally applicable technique.  There is also the suggestion that the bugs found in TLS were a one-off.  Maybe new implementations of new protocols won&#39;t have the implementation flaws of the past^[This view is tempered somewhat by experience with HTTP/2 which has some pretty serious issues.].

And then there are limitations inherent to the design.  We really only know how to use greasing where implementations are required to ignore unknown values.  Not all protocol extension points use that sort of extension model.

Greasing is being included in QUIC and HTTP/3, so the hope is that we&#39;ll continue to learn more.
--&gt;
&lt;h2 id=&quot;version-negotiation&quot;&gt;Version Negotiation&lt;/h2&gt;
&lt;p&gt;The subject of the discussion today was version negotiation.  Of all the extension points available in protocols, the one that often sees the &lt;em&gt;least&lt;/em&gt; use is version negotiation.  A version negotiation mechanism has to exist in the first version of a protocol, but it is never really tested until the second version is deployed.&lt;/p&gt;
&lt;p&gt;No matter how carefully the scheme is designed&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/vn/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;, the experience with TLS shows that even a well-designed scheme can fail.&lt;/p&gt;
&lt;p&gt;The insight for today, thanks largely to Tommy Pauly, was that the observation about extension points could be harnessed to make version negotiation work.  Tommy observed that some protocols don’t design in-protocol version negotiation schemes, but instead rely on the protocol at the next layer down.  And these protocols have been more successful at avoiding some of the pitfalls inherent to version negotiation.&lt;/p&gt;
&lt;p&gt;At the next layer down the stack, the codepoints for the higher-layer protocol are just extension codepoints.  They aren’t exceptional for the lower layer and they probably get more use.  Therefore, these extension points are less likely to end up being ossified when the time comes to rely on them.&lt;/p&gt;
&lt;h3 id=&quot;supporting-examples&quot;&gt;Supporting Examples&lt;/h3&gt;
&lt;p&gt;Tommy offered &lt;a href=&quot;https://github.com/intarchboard/edm/issues/8#issue-759871255&quot;&gt;a few examples&lt;/a&gt; and we discussed several others.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://tools.ietf.org/html/rfc8200&quot;&gt;IPv6&lt;/a&gt; was originally intended to use the IP &lt;a href=&quot;https://en.wikipedia.org/wiki/EtherType&quot;&gt;EtherType&lt;/a&gt; (0x0800) in 802.1, with routers looking at the IP version number to determine how to handle packets.  That didn’t work out&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/vn/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;.  What did work was assigning IPv6 its own EtherType (0x86dd).  This supports the idea that a function that was already in use for other reasons&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/vn/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt; was better able to support the upgrade than the in-protocol mechanisms that were originally designed for that purpose.&lt;/p&gt;
&lt;p&gt;HTTP/2 was floated as another potential example of this effect.  Though the original reason for adding &lt;a href=&quot;https://tools.ietf.org/html/rfc7301&quot;&gt;ALPN&lt;/a&gt; was performance - we wanted to ensure that we wouldn’t have to do another round trip after the TLS handshake to do an &lt;a href=&quot;https://tools.ietf.org/html/rfc2817&quot;&gt;Upgrade&lt;/a&gt; exchange - the effect is that negotiation of HTTP relied on a mechanism that was well-tested and proven at the TLS layer&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/vn/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
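For reference, this is roughly what offering ALPN looks like with Python’s standard &lt;code&gt;ssl&lt;/code&gt; module; the protocol strings are the registered identifiers for HTTP/2 and HTTP/1.1, and the rest of the connection setup is omitted.

```python
import ssl

# The client offers an ordered preference list during the TLS
# handshake and the server selects one protocol (RFC 7301).
# Only the context configuration is shown; wrap_socket() and the
# actual handshake would follow.
client_ctx = ssl.create_default_context()
client_ctx.set_alpn_protocols(["h2", "http/1.1"])

server_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
server_ctx.set_alpn_protocols(["h2"])  # protocols the server will select from
```

After the handshake, both sides read the agreed protocol from the socket’s `selected_alpn_protocol()`.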
&lt;p&gt;We observed that ALPN doesn’t work for the HTTP/2 to &lt;a href=&quot;https://quicwg.org/base-drafts/draft-ietf-quic-http.html&quot;&gt;HTTP/3&lt;/a&gt; upgrade as these protocols don’t share a transport protocol.  Here, we observed that we would likely end up relying on &lt;a href=&quot;https://datatracker.ietf.org/doc/html/draft-ietf-dnsop-svcb-https-02&quot;&gt;SVCB&lt;/a&gt; and the HTTPS DNS record.&lt;/p&gt;
&lt;p&gt;Carsten Bormann also pointed at &lt;a href=&quot;https://tools.ietf.org/html/rfc8428&quot;&gt;SenML&lt;/a&gt;, which deliberately provides no inherent version negotiation.  I suggest that this is an excellent example of relying on lower-layer negotiation, in this case the content negotiation functions provided by underlying protocols like &lt;a href=&quot;https://tools.ietf.org/html/rfc7252&quot;&gt;CoAP&lt;/a&gt; or HTTP.&lt;/p&gt;
&lt;p&gt;It didn’t come up at the time, but one of my favourite examples comes from the people building web services at Mozilla.  They do not include version numbers in URLs or hostnames for their APIs and they don’t put version numbers in request or response formats.  The reasoning being that, should they need to roll a new version that is incompatible with the current one, they can always deploy to a new domain name.  I always appreciated the pragmatism of that approach, though I still see lots of &lt;code&gt;/v1/&lt;/code&gt; in public HTTP API documentation.&lt;/p&gt;
&lt;p&gt;These all seem to provide good support for the basic idea.&lt;/p&gt;
&lt;h3 id=&quot;counterexamples&quot;&gt;Counterexamples&lt;/h3&gt;
&lt;p&gt;Any rule like this isn’t worth anything without counterexamples.  Understanding counterexamples helps us understand what conditions are necessary for the theory to hold.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://tools.ietf.org/html/rfc3584&quot;&gt;SNMP&lt;/a&gt;, which was already mentioned in the draft as having successfully managed a version transition using an in-band mechanism, was a particularly interesting case study.  Several observations were made, suggesting several inter-connected reasons for success.  It was observed that there was no especially strong reason to prefer SNMPv3 over SNMPv2 (or SNMPv2c), a factor which resulted in both SNMP versions coexisting for years.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;There was an interesting sidebar at this point.  It was observed that SNMP doesn’t have any strong need to avoid version downgrade attacks in the way that a protocol like TLS might.  Other protocols might not tolerate such phlegmatic coexistence.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;SNMP clients do include probing code to determine which protocol versions are supported.  However, as network management systems include provisioning information for devices, it is usually the case that protocol support for managed devices is stored alongside other configuration.  Thus we concluded that SNMP - to the extent that it even needs version upgrades - was closest to the “shove it in the DNS” approach used for the upgrade to HTTP/3.&lt;/p&gt;
&lt;h2 id=&quot;in-practice&quot;&gt;In Practice&lt;/h2&gt;
&lt;p&gt;The lesson here is that planning for the next version doesn’t mean designing a version negotiation mechanism.  It’s possible that a perfectly good mechanism already exists.  If it does, it’s almost certainly better than anything you might cook up.&lt;/p&gt;
&lt;p&gt;This is particularly gratifying to me as I had already begun following the practice of SenML with other work.  For instance, &lt;a href=&quot;https://tools.ietf.org/html/rfc8188&quot;&gt;RFC 8188&lt;/a&gt; provides no in-band negotiation of version or even &lt;a href=&quot;https://tools.ietf.org/html/rfc7696&quot;&gt;cryptographic agility&lt;/a&gt;.  Instead, it relies on the existing content-coding negotiation mechanisms as a means of enabling its own eventual replacement.  This was somewhat controversial at the time, especially the cryptographic agility part, but in retrospect it seems to be a good choice.&lt;/p&gt;
&lt;p&gt;It’s also good to have a strong basis for rejecting profligate addition of extension points in protocols&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/vn/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;, and now it seems like we have firm reasons to avoid designing version negotiation mechanisms into every protocol.&lt;/p&gt;
&lt;p&gt;Maybe version negotiation can now be put better into context.  Version negotiation might only belong in protocols at the lowest levels of the stack&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/vn/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;.  For most protocols, which probably need to run over TLS for other reasons, ALPN and maybe SVCB can stand in for version negotiation, with the bonus that these are specifically designed to avoid adding latency.  HTTP APIs can move to a different URL.&lt;/p&gt;
&lt;p&gt;As this seems solid, I now have the task of writing a brief summary of this conclusion for the next revision of the “use it or lose it” draft.  That might take some time as there are a few &lt;a href=&quot;https://github.com/intarchboard/use-it-or-lose-it/issues&quot;&gt;open issues&lt;/a&gt; that need some attention.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Not electronic dance music sadly, it’s about Evolvability, Deployability, &amp;amp; Maintainability of Internet protocols &lt;a href=&quot;https://lowentropy.net/posts/vn/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;UDP maybe. UDP is simple enough that it doesn’t have &lt;s&gt;features&lt;/s&gt;/bugs.  Not to say that it is squeaky clean, it has plenty of baggage, with checksum issues, a reputation for being used for DoS, and issues with flow termination in NATs. &lt;a href=&quot;https://lowentropy.net/posts/vn/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://boringssl.googlesource.com/boringssl/&quot;&gt;BoringSSL&lt;/a&gt;, which is now used by a few others, including Cloudflare and Apple. &lt;a href=&quot;https://lowentropy.net/posts/vn/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://tools.ietf.org/html/rfc6709#section-4.1&quot;&gt;Section 4.1 of RFC 6709&lt;/a&gt; contains some great advice on how to design a version negotiation scheme, so that you can learn from experience.  Though pay attention to the disclaimer in the last paragraph. &lt;a href=&quot;https://lowentropy.net/posts/vn/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;No one on the call was paying sufficient attention at the time, so we don’t know precisely why.  We intend to find out, of course. &lt;a href=&quot;https://lowentropy.net/posts/vn/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;At the time, there was still reasonable cause to think that IP wouldn’t be the only network layer protocol, so other values were being used routinely. &lt;a href=&quot;https://lowentropy.net/posts/vn/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;You might rightly observe here that ALPN was brand new for HTTP/2, so the mechanism itself wasn’t exactly proven.  This is true, but there are mitigating factors.  The negotiation method is exactly the same as many other TLS extensions.  And we tested the mechanism thoroughly during HTTP/2 deployment as each new revision from the -04 draft onwards was deployed widely with a different ALPN string.  By the time HTTP/2 shipped, ALPN was definitely solid. &lt;a href=&quot;https://lowentropy.net/posts/vn/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;There is probably enough material for a long post on why this is not a problem in JSON, but I’ll just assert for now - without support - that there really is only one viable extension point in any JSON usage. &lt;a href=&quot;https://lowentropy.net/posts/vn/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;It doesn’t seem like TLS or QUIC can avoid having version negotiation. &lt;a href=&quot;https://lowentropy.net/posts/vn/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>Oblivious DoH</title>
    <link href="https://lowentropy.net/posts/odoh/"/>
    <updated>2020-12-09T00:00:00Z</updated>
    <id>https://lowentropy.net/posts/odoh/</id>
    <content type="html">&lt;p&gt;Today we heard &lt;a href=&quot;https://blog.cloudflare.com/oblivious-dns/&quot;&gt;an announcement&lt;/a&gt; that Cloudflare, Apple, and Fastly are collaborating on a new technology for improving privacy of DNS queries using a technology they call Oblivious DoH (ODoH).&lt;/p&gt;
&lt;p&gt;This is an exciting development.  This posting examines the technology in more detail and looks at some of the challenges this will need to overcome before it can be deployed more widely.&lt;/p&gt;
&lt;h2 id=&quot;how-odoh-provides-privacy-for-dns-queries&quot;&gt;How ODoH Provides Privacy for DNS Queries&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://tools.ietf.org/html/draft-pauly-dprive-oblivious-doh-03&quot;&gt;Oblivious DoH&lt;/a&gt; is a simple &lt;a href=&quot;https://en.wikipedia.org/wiki/Mix_network&quot;&gt;mixnet&lt;/a&gt; protocol for making DNS queries.  It uses a proxy server to provide added privacy for query streams.&lt;/p&gt;
&lt;p&gt;This looks something like:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-graphviz&quot;&gt;digraph ODoH {
  graph [overlap=true, splines=line, nodesep=1.0, ordering=out];
  node [shape=rectangle, fontname=&amp;quot; &amp;quot;];
  edge [arrowhead=none];
  { rank=same; Client-&amp;gt;Proxy; Proxy-&amp;gt;Resolver; }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A common criticism of &lt;a href=&quot;https://tools.ietf.org/html/rfc8484&quot;&gt;DNS over HTTPS&lt;/a&gt; (DoH) is that it provides DoH resolvers with lots of privacy-sensitive information&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/odoh/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;.  Currently all DNS resolvers, including DoH resolvers, see the contents of queries and can link that to who is making those queries.  DoH includes connection reuse, so resolvers can link requests from the same client using the connection.&lt;/p&gt;
&lt;p&gt;In Oblivious DoH, a proxy aggregates queries from multiple clients so that the resolver is unable to link queries to individual clients. ODoH protects the IP address of the client, but it also prevents the resolver from linking queries from the same client together.  Unlike an ordinary HTTP proxy, which handles TLS connections to servers&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/odoh/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;, ODoH proxies handle queries that are individually encrypted.&lt;/p&gt;
&lt;p&gt;ODoH prevents resolvers from assembling profiles on clients by collecting the queries they make, because resolvers see queries from a large number of clients all mixed together.&lt;/p&gt;
&lt;p&gt;An ODoH proxy learns almost nothing from this process as ODoH uses &lt;a href=&quot;https://tools.ietf.org/html/draft-irtf-cfrg-hpke-06&quot;&gt;HPKE&lt;/a&gt; to encrypt both the query and the answer with keys chosen by the client and resolver.&lt;/p&gt;
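To make the division of knowledge concrete, here is a toy sketch in Python. The XOR “cipher” stands in for HPKE, which really derives keys from the resolver’s published public key, so none of this is the actual ODoH protocol; it only illustrates the shape of it: the proxy ever holds ciphertext, never the query.

```python
import os

def xor(key, data):
    # Toy stand-in for HPKE seal/open: XOR with a one-time key.
    return bytes(k ^ d for k, d in zip(key, data))

query = b"example.com. IN A?"
key = os.urandom(len(query))     # shared client/resolver secret (toy)

sealed = xor(key, query)         # client encrypts the query
forwarded = sealed               # the proxy forwards ciphertext only;
                                 # it sees the client address, not the query
recovered = xor(key, forwarded)  # resolver decrypts and can answer
assert recovered == query
```

The resolver's position is symmetric on the way back: it seals the answer, the proxy forwards it, and only the client can open it.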
&lt;p&gt;The privacy benefits of ODoH can only be undone if both the proxy and resolver cooperate.  ODoH therefore recommends that the two services be run independently, with the operator of each making a commitment to respecting privacy.&lt;/p&gt;
&lt;h2 id=&quot;costs&quot;&gt;Costs&lt;/h2&gt;
&lt;p&gt;The privacy advantages provided by the ODoH design come at a higher cost than DoH, where a client just queries the resolver directly:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The proxy adds a little latency as it needs to forward queries and responses.&lt;/li&gt;
&lt;li&gt;HPKE encryption adds up to about 100 bytes to each query.&lt;/li&gt;
&lt;li&gt;The client and resolver need to spend a little CPU time to add and remove the encryption.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Cloudflare’s tests show that the overall effect of ODoH on performance is quite modest.  These early tests even suggest some improvement for the slowest queries.  If those performance gains can be kept as they scale up their deployment, that would be strong justification for deployment.&lt;/p&gt;
&lt;h2 id=&quot;why-this-design&quot;&gt;Why This Design&lt;/h2&gt;
&lt;p&gt;A similar outcome might be achieved using a proxy that supports HTTP CONNECT.  However, to prevent the resolver from learning which queries come from the same client, each query would have to use a new connection.&lt;/p&gt;
&lt;p&gt;That gets pretty expensive.  While you might be able to use tricks to drive down latency, like sending the TLS handshake along with the HTTP CONNECT request, every request would still need a separate TCP connection and a round trip to establish it&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/odoh/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;It is also possible to use something like &lt;a href=&quot;https://www.torproject.org/&quot;&gt;Tor&lt;/a&gt;, which provides superior privacy protection, but at considerably greater cost.&lt;/p&gt;
&lt;p&gt;Using HPKE and a multiplexed protocol like &lt;a href=&quot;https://tools.ietf.org/html/rfc7540&quot;&gt;HTTP/2&lt;/a&gt; or &lt;a href=&quot;https://quicwg.org/base-drafts/draft-ietf-quic-http.html&quot;&gt;HTTP/3&lt;/a&gt; avoids per-query connection setup costs.  However, the most important thing is that it involves only minimal additional latency to get the privacy benefits&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/odoh/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h2 id=&quot;key-management-in-dns&quot;&gt;Key Management in DNS&lt;/h2&gt;
&lt;p&gt;The proposal puts HPKE keys for the resolver in the DNS&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/odoh/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt;.  The idea is that clients can talk to the resolver directly to get these, then use that information to protect their queries.  As the keys are DNS records, they can be retrieved from any DNS resolver, which is a potential advantage.&lt;/p&gt;
&lt;p&gt;This also means that this ODoH design depends on DNSSEC.  Many clients rely on their resolver to perform DNSSEC validation, which doesn’t help here, so deploying something like this incrementally in clients is difficult.&lt;/p&gt;
&lt;p&gt;A better option might be to offer the HPKE public key information in response to a direct HTTP request to the resolver.  That would ensure that the key could be authenticated by the client using HTTPS and the Web PKI.&lt;/p&gt;
&lt;h2 id=&quot;trustworthiness-of-proxies&quot;&gt;Trustworthiness of Proxies&lt;/h2&gt;
&lt;p&gt;Both client and resolver will want to authenticate the proxy and only allow a trustworthy proxy.  The protocol design means that the need for trust in the proxy is limited, but it isn’t zero.&lt;/p&gt;
&lt;p&gt;Clients need to trust that the proxy is hiding their IP address.  A bad proxy could attach the client IP address to every query it forwards. Clients will want some way of knowing that the proxy won’t do this&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://lowentropy.net/posts/odoh/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Resolvers will likely want to limit the number of proxies that they will accept requests from, because the aggregated queries from a proxy of any reasonable size will look a lot like a denial of service attack.  Mixing all the queries together denies resolvers the ability to do per-client rate limiting, which is a valuable denial of service protection measure.  Resolvers will need to apply much more generous rate limits for these proxies and trust that the proxies will take reasonable steps to ensure that individual clients are not able to generate abusive numbers of queries.&lt;/p&gt;
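&lt;p&gt;A sketch of what that could look like on the resolver side: a token bucket per source, with a far larger allowance for a vetted proxy than for an unknown direct client.  The numbers here are invented for illustration.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import time

class TokenBucket:
    # Classic token bucket: at most capacity tokens, refilled at
    # rate tokens per second; each query spends one token.
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = float(capacity), time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens &amp;lt; 1:
            return False
        self.tokens -= 1
        return True

# A vetted proxy aggregates many clients, so it gets a far more
# generous limit than any single direct client would.
limits = {
    'client': TokenBucket(rate=10, capacity=20),
    'vetted-proxy': TokenBucket(rate=10_000, capacity=20_000),
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The hard part isn’t the mechanism, it’s deciding which proxies earn the generous bucket.&lt;/p&gt;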
&lt;p&gt;This means that proxies will need to be acceptable to both client and resolver.  Early deployments will be able to rely on contracts and similar arrangements to guarantee this.  However, if use of ODoH is to scale out to support large numbers of providers of both proxies and resolvers, it could be necessary to build systems for managing these relationships.&lt;/p&gt;
&lt;h2 id=&quot;proxying-for-other-applications&quot;&gt;Proxying For Other Applications&lt;/h2&gt;
&lt;p&gt;One obvious observation about this design is that it isn’t specific to DNS queries.  In fact, a large number of request-response exchanges would gain the same privacy protections that ODoH provides.  For example, Google this week announced &lt;a href=&quot;https://blog.chromium.org/2020/12/continuing-our-journey-to-bring-instant.html&quot;&gt;a trial&lt;/a&gt; of a similar technology for preloading content.&lt;/p&gt;
&lt;p&gt;A generic design that enabled protection for HTTP queries of any sort would be ideal.  My hope is that we can design that protocol.&lt;/p&gt;
&lt;p&gt;Once you look to designing a more generic solution, there are a few extra things that might improve the design.  Automatic discovery of HTTP endpoints that allow oblivious proxying is one potential enhancement.  Servers could advertise both keys and the proxies they support so that clients can choose to use those proxies to mask their address.  This might involve automated proxy selection or discovery and even systems for encoding agreements.  There are lots of possibilities here.&lt;/p&gt;
&lt;h2 id=&quot;centralization&quot;&gt;Centralization&lt;/h2&gt;
&lt;p&gt;One criticism of DoH deployments is that they encourage consolidation of DNS resolver services.  For ODoH, options for resolvers will be limited, at least in the short term, which could push usage toward a small number of server operators in exchange for the privacy gains ODoH provides.&lt;/p&gt;
&lt;p&gt;During initial roll-out, the number of proxy operators will be limited.  Also, using a larger proxy means that your queries are mixed in with more queries from other people, providing marginally better privacy.  That might provide some impetus to consolidate.&lt;/p&gt;
&lt;p&gt;Deploying automated discovery systems for acceptable proxies might help mitigate the worst centralization effects, but it seems likely that this will not be a feature of early deployments.&lt;/p&gt;
&lt;p&gt;In the end, it would be a mistake to cry “centralization” in response to early trial deployments of a technology, which are naturally limited in scope.  Furthermore, it’s hard to know what the long term impact on the ecosystem will be.  We might never be able to separate the effect of existing trends toward consolidation from the effect of new technology.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;I like the model adopted here.  The use of a proxy neatly addresses one of the biggest concerns with the rollout of DoH: the privacy risk of having a large provider being able to gather information about streams of queries that can be linked to your IP address.&lt;/p&gt;
&lt;p&gt;ODoH breaks streams of queries into discrete transactions that are hard to assemble into activity profiles.  At the same time, ODoH makes it hard to attribute queries to individuals as it hides the origin of queries.&lt;/p&gt;
&lt;p&gt;My sense is that the benefits very much outweigh the performance costs, the protocol complexity, and the operational risks.  ODoH is a pretty big privacy win for name resolution.  The state of name resolution is pretty poor, with much of it still unprotected from snooping, interception, and poisoning.  The deployment of DoH went some way to address that, but came with some drawbacks.  Oblivious DoH takes the next logical step.&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This is something all current DNS resolvers get, but the complaint is about the scale at which this information is gathered.  Some people are unhappy that network operators are unable to access this information, but I regard that as a feature. &lt;a href=&quot;https://lowentropy.net/posts/odoh/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;OK, proxies do handle individual, unencrypted HTTP requests, but that capability is hardly ever used any more now that 90% of the web is HTTPS. &lt;a href=&quot;https://lowentropy.net/posts/odoh/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;Using 0-RTT doesn’t work here without some fiddly changes to TLS because the session ticket allows the server to link connections together, which is exactly the sort of linkability we’re trying to avoid. &lt;a href=&quot;https://lowentropy.net/posts/odoh/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;This also makes ODoH far more susceptible to &lt;a href=&quot;https://en.wikipedia.org/wiki/Traffic_analysis&quot;&gt;traffic analysis&lt;/a&gt;, but it relies on volume and the relative similarity of DNS queries to help manage that risk. &lt;a href=&quot;https://lowentropy.net/posts/odoh/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The recursion here means that the designers of ODoH probably deserve a prize of some sort. &lt;a href=&quot;https://lowentropy.net/posts/odoh/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;The &lt;a href=&quot;https://github.com/bslassey/ip-blindness&quot;&gt;willful IP blindness proposal&lt;/a&gt; goes into more detail on what might be required for this. &lt;a href=&quot;https://lowentropy.net/posts/odoh/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
</feed>
