There is a perennial question in standards development about the value of the different artefacts that the process kicks out.
One subject that remains current is the relative value of specifications against things like compliance testing frameworks. Reasonable people tend to place different weight on tests, with a wide range of attitudes. In the past, more people were willing to reject attempts to invest in any shared test or compliance infrastructure.
In recent years, however, it has become very clear that a common test infrastructure is critical to developing a high-quality standard. Developing tests in conjunction with the standardization effort has improved the quality of specifications and implementations a great deal.
Recently, I encountered an example where a standards group deliberately chose not to document behaviour, relying exclusively on the common test framework. Understanding what is lost when this
My experience with compliance testing in standards development is patchy. It might help to describe how these have worked out.
Some of the early projects I was involved in relied on testing being entirely privately driven. This can lead to each team relying almost exclusively on tests they develop internally. Occasional pairwise interoperability testing occurs, but it is ad hoc and unreliable.
This loose arrangement does tend to result in specifications being published sooner. The cost is in less scrutiny, especially when it comes to details, so the quality of the output is not as good as it could be.
This doesn’t mean that there is no compliance testing, but it requires effort. That effort can pay off, as I have seen with WebSockets, FIPS-140, HTTP Caching, and others.
Implement in Parallel
My experience with HTTP/2 was not a whole lot different to those early projects. The major improvement there was the level of active engagement from implementers in developing the specification.
This process did not involve active development of a compliance testing framework, but there were regular interoperability tests. I still remember Jeff Pinner deploying draft -04 to production on twitter.com during a meeting. Not everyone was so fearless, but live deployment was something we saw routinely in the 13 subsequent drafts it took to finalize the work.
Good feedback from implementations was key to the success of HTTP/2, which now drives well over half of the HTTP requests in Firefox.
The excellent h2spec came out a little after the release of the specification. It has since become a valuable compliance testing tool for implementers.
Test in Parallel
TLS 1.3 followed a similar trajectory to HTTP/2, with a few interesting twists. Part of the testing that occurred during the development of the protocol was formal verification. For example, a Tamarin model of TLS 1.3 was developed alongside the protocol, which both informed the design and provided validation of the design. Some implementations automated compliance testing based on a tool developed for BoringSSL, which turned out to be very useful.
With QUIC, Marten Seemann and Jana Iyengar developed a framework that automates testing between QUIC implementations. This runs regularly and produces a detailed report showing how each implementation stands up under a range of conditions, some of them quite adversarial. This has had a significant positive effect on the quality of both implementations and specifications.
Overall, tests have been so valuable in every case that I can see no way of going back to a world without them.
Of course, I would be remiss not to mention the excellent Web Platform Tests, which are now a critical part of the process adopted by the WHATWG and some W3C groups. Web Platform Tests are considered a prerequisite for normative specification changes under these processes.
Akin to test-driven development, this ensures that new features and changes are not just testable, but tested, before anything is documented. In practice the work continues in parallel, with tight feedback between development, specification, and testing. Shorter feedback cycles mean that work can be completed faster and with higher quality.
The Role of Specifications
An obvious question that might be asked when it comes to this process, particularly where there are firm requirements for tests, is what value the specification provides. Given sufficiently thorough testing, it should be possible to construct an interoperable implementation based solely on those tests.
To go further, when specifications consist of mostly code-like constructs and real implementations are open source anyway, the value of a specification seems greatly diminished. As empirical observation of how things actually work is of more value than how they work in theory, it is reasonable to ask what value the specification provides.
As my own recent experience with the Cubic congestion control algorithm taught me, what is implemented and deployed is what matters. The RFC that purports to document Cubic is not really implementable and barely resembles what real implementations do.
So if testing is a central part of the development of new standards and people rely increasingly on tests or observing the behaviour of other implementations, it is reasonable to question what value specifications provide.
A Specification Can Teach
Specification documents often come with a bunch of normative language. Some of the most critical text defines what it means to be conformant, describing what is permitted and what is forbidden in precise terms.
Strictly normative text is certainly at risk of displacement by good testing. But there is often a bunch of non-normative filler in specifications. Though text might be purely informative, it is often of significant value to people who are attempting to understand the specification in detail:
Informative text can motivate the existence of the specification.
Filler can provide insights into why things are.
Notes can point to outcomes that might not be obvious.
For specifications that are developed using an open process, much of this information is not hidden, but it can be difficult to find. Presenting timely, relevant information to readers is useful in putting things into context.
A Specification Can Capture Other Forms Of Agreement
One of the hardest lessons out of recent standards work has been the realization that many decisions are made with only superficial justification. Developing standards based on shared principles is much harder than agreeing on what happens in certain conditions, or which bit goes where.
Though it might be harder, reaching agreement on principles is far more enduring and valuable. A specification can document that agreement.
Reaching agreement or consensus on a principle can be hard for a variety of reasons:
Dealing with abstractions can be challenging because people can develop different abstract models based on their own perspective and biases. Subtle differences can mean a lot of talking past each other.
Abstractions can also become too far removed from reality to be useful. This might serve you well when filing a patent application, but ultimately we depend on principles being applicable to the current work.
Agreement on principles can be difficult because it forces people to fully address differences of opinion.
Without first addressing disagreements in principle, it is possible that concrete decisions could be consistent with different perspectives. This might not have any immediate effect, but could produce inconsistencies. Some inconsistency can result in real problems, especially if it becomes necessary to rely more extensively on a principle that was in contention.
However hard agreement might be to achieve, a principle that is agreed can inform multiple decisions. Documenting a principle that has achieved agreement can therefore be more efficient over time. Documentation can also help avoid application of inconsistent or conflicting principles over time.
Documenting principles does not have a direct normative effect. But a specification offers an opportunity to document more than just conformance requirements; it can capture other types of agreement.
Conformance Test Suites Can Overreach
A problem that can occur with conformance testing is that the tests can disagree with specifications. If implementations depend more on the test than the specification, this can make the conformance test the true source of the definition of what it means to interoperate.
This is not inherently bad. It can be that the tests capture something that is inherently better, because it reflects what people need, because it is easier to implement, or just because that is what interoperates.
Of course, disagreement between two sources that claim authority does implementations a disservice. A new implementation now has to know which is “correct”. Ensuring that deployments, tests, and specifications align is critical to ensuring the viability of new implementations.
The true risk with relying on tests is the process by which conformance tests are maintained. Specification development processes are burdened with rules that govern how agreement is reached. Those rules exist for good reason.
Change control processes for conformance testing projects might not provide adequate protection for anti-trust or intellectual property. They also might lack opportunities for affected stakeholders to engage. This doesn’t have to be the case, but the governance structures underpinning most conformance suites are usually less robust than those of standards.
The exact nature of how specifications are used to guide the development of interoperable standards is something of a fluid situation. Here I’ve laid out a case for the value of specifications: for the non-normative language they provide, for their ability to capture agreement on more than just normative functions, and for the governance structures that they use. There are probably other reasons too, and likely counter-arguments, both of which I would be delighted to hear about.
I should also point to QUIC Tracker and to h3spec, which Kazu Yamamoto has started work on; both have made significant contributions too. ↩︎
For example, the development of even a relatively small specification like QUIC involved more than 4000 issues and pull requests, more than 8000 email messages, not to mention all the chat messages that are not in public archives. ↩︎
…or consensus if that is how you spell it. ↩︎
This is perhaps a criticism that might be levelled at the priority of constituencies or text like that in RFC 8890. However, these might be more correctly viewed as meta-principles, or ideals that guide the development of more specific and actionable principles. ↩︎
An example of this might be DNS, where the need for agreement on principles was neglected. As such, the global community has no documented principles that might guide decisions on issues such as having a single global namespace or whether network operators are entitled to be involved in name resolution. Now that encrypted DNS is being rolled out, reflective of a principle that values individual privacy, it has become obvious that people with differing views but no shared principles have been coexisting. ↩︎
Not that these lack opportunities for improvement either, but they are the best we have. ↩︎