Next Level Version Negotiation

The IAB EDM Program^[1] met this morning. While the overall goal of the meeting, we ended up talking a lot a document I wrote a while back and how to design version negotiation in protocols.

This post provides a bit of background and shares some of what we learned today after what was quite a productive discussion.

Protocol Ossification

The subject of protocol ossification has been something of a live discussion in the past several years. The community has come to the realization that it is effectively impossible to extend many Internet protocols without causing a distressing number of problems with existing deployments. It seems like no protocol is unaffected^[2]. IP, TCP, TLS, and HTTP all have various issues that prevent extensions from working correctly.

A number of approaches have been tried. HTTP/2, which was developed early in this process, was deployed only for HTTPS. Even though a cleartext variant was defined, many implementations explicitly decided not to implement that, partly motivated by these concerns. QUIC doubles down on this by encrypting as much as possible.

TLS 1.3, which was delayed by about a year by related problems, doesn’t have that option so it ultimately used trickery to avoid notice by problematic middleboxes: TLS 1.3 looks a lot like TLS 1.2 unless you are paying close attention.

One experiment that turned out to quite successful in revealing ossification in TLS was GREASE. David Benjamin and Adam Langley, who maintain the TLS stack used by Google^[3] found that inserting random values into different extension points had something of a cleansing effect on the TLS ecosystem. Several TLS implementations were found to be intolerant of new extensions.

One observation out of the experiments with TLS was that protocol elements that routinely saw new values, like cipher suites, were less prone to failing when previously unknown values were encountered. Those that hadn’t seen new values as often, like server name types or signature schemes, were more likely to show problems. This caused Adam Langley to advise that protocols “have one joint and keep it well oiled.”

draft-iab-use-it-or-lose-it explores the problem space a little more thoroughly. The draft looks at a bunch of different protocols and finds that in general the observations hold. The central thesis is that for an extension point to be usable, it needs to be actively used.

Version Negotiation

The subject of the discussion today was version negotiation. Of all the extension points available in protocols, the one that often sees the least use is version negotation. A version negotiation mechanism has to exist in the first version of a protocol, but it is never really tested until the second version is deployed.

No matter how carefully the scheme is designed^[4], the experience with TLS shows that even a well-designed scheme can fail.

The insight for today, thanks largely to Tommy Pauly, was that the observation about extension points could be harnessed to make version negotiation work. Tommy observed that some protocols don’t design in-protocol version negotiation schemes, but instead rely on the protocol at the next layer down. And these protocols have been more successful at avoid some of the pitfalls inherent to version negotiation.

At the next layer down the stack, the codepoints for the higher-layer protocol are just extension codepoints. They aren’t exceptional for the lower layer and they probably get more use. Therefore, these extension points are less likely to end up being ossified when the time comes to rely on them.

Supporting Examples

Tommy offered a few examples and we discussed several others.

IPv6 was originally intended to use the IP EtherType (0x0800) in 802.1, with routers looking at the IP version number to determine how to handle packets. That didn’t work out^[5]. What did work was assigning IPv6 its own EtherType (0x86dd). This supports the idea that a function that was already in use for other reasons^[6] was better able to support the upgrade than the in-protocol mechanisms that were originally designed for that purpose.

HTTP/2 was floated as another potential example of this effect. Though the original reason for adding ALPN was performance - we wanted to ensure that we wouldn’t have to do another round trip after the TLS handshake to do Upgrade exchange - the effect is that negotiation of HTTP relied on a mechanism that was well-tested and proven at the TLS layer^[7].

We observed that ALPN doesn’t work for the HTTP/2 to HTTP/3 upgrade as these protocols don’t share a protocol. Here, we observed that we would likely end up relying on SVCB and the HTTPS DNS record.

Carsten Bormann also pointed at SenML, which deliberately provides no inherent version negotiation. I suggest that this is an excellent example of relying on lower-layer negotiation, in this case the content negotiation functions provided by underlying protocols like CoAP or HTTP.

It didn’t come up at the time, but one of my favourite examples comes from the people building web services at Mozilla. They do not include version numbers in URLs or hostnames for their APIs and they don’t put version numbers in request or response formats. The reasoning being that, should they need to roll a new version that is incompatible with the current one, they can always deploy to a new domain name. I always appreciated the pragmatism of that approach, though I still see lots of /v1/ in public HTTP API documentation.

These all seem to provide good support for the basic idea.

Counterexamples

Any rule like this isn’t worth anything without counterexamples. Understanding counterexamples helps us understand what conditions are necessary for the theory to hold.

SNMP, which was already mentioned in the draft as having successfully managed a version transition using an in-band mechanism, was a particularly interesting case study. Several observations were made, suggesting several inter-connected reasons for success. It was observed that there was no especially strong reason to prefer SNMPv3 over SNMPv2 (or SNMPv2c), a factor which resulted in both SNMP versions coexisting for years.

There was an interesting sidebar at this point. It was observed that SNMP doesn’t have any strong need to avoid version downgrade attacks in the way that a protocol like TLS might. Other protocols might not tolerate such phlegmatic coexistence.

While SNMP clients do include probing code to determine what protocols were supported. However, as network management systems include provisioning information for devices, it is usually the case that protocol support for managed devices is stored alongside other configuration. Thus we concluded that SNMP - to the extent that it even needs version upgrades - was closest to the “shove it in the DNS” approach used for the upgrade to HTTP/3.

In Practice

The lesson here is that planning for the next version doesn’t mean designing a version negotiation mechanism. It’s possible that a perfectly good mechanism already exists. If it does, it’s almost certainly better than anything you might cook up.

This is particularly gratifying to me as I had already begun following the practice of SenML with other work. For instance, RFC 8188 provides no in-band negotiation of version or even cryptographic agility. Instead, it relies on the existing content-coding negotiation mechanisms as a means of enabling its own eventual replacement. This was somewhat controversial at the time, especially the cryptographic agility part, but in retrospect it seems to be a good choice.

It’s also good to have a strong basis for rejecting profligate addition of extension points in protocols^[8], but now it seems like we have firm reasons to avoid designing version negotiation mechanisms into every protocol.

Maybe version negotiation can now be put better into context. Version negotiation might only belong in protocols at the lowest levels of the stack^[9]. For most protocols, which probably need to run over TLS for other reasons, ALPN and maybe SVCB can stand in for version negotiation, with the bonus that these are specifically designed to avoid adding latency. HTTP APIs can move to a different URL.

As this seems solid, I now have the task of writing a brief summary of this conclusion for the next revision of the “use it or lose it” draft. That might take some time as there are a few open issues that need some attention.

Not electronic dance music sadly, it’s about Evolvability, Deployability, & Maintainability of Internet protocols ↩︎
UDP maybe. UDP is simple enough that it doesn’t have ~~features~~/bugs. Not to say that it is squeaky clean, it has plenty of baggage, with checksum issues, a reputation for being used for DoS, and issues with flow termination in NATs. ↩︎
BoringSSL, which is now used by a few others, including Cloudflare and Apple. ↩︎
Section 4.1 of RFC 6709 contains some great advice on how to design a version negotiation scheme, so that you can learn from experience. Though pay attention to the disclaimer in the last paragraph. ↩︎
No one on the call was paying sufficient attention at the time, so we don’t know precisely why. We intend to find out, of course. ↩︎
At the time, there was still reasonable cause to think that IP wouldn’t be the only network layer protocol, so other values were being used routinely. ↩︎
You might rightly observe here that ALPN was brand new for HTTP/2, so the mechanism itself wasn’t exactly proven. This is true, but there are mitigating factors. The negotiation method is exactly the same as many other TLS extensions. And we tested the mechanism thoroughly during HTTP/2 deployment as each new revision from the -04 draft onwards was deployed widely with a different ALPN string. By the time HTTP/2 shipped, ALPN was definitely solid. ↩︎
There is probably enough material for a long post on why this is not a problem in JSON, but I’ll just assert for now - without support - that there really is only one viable extension point in any JSON usage. ↩︎
It doesn’t seem like TLS or QUIC can avoid having version negotiation. ↩︎