Low Entropy

RFCs in HTML

I spend a shocking amount of my time staring at IETF documents, both Internet-Drafts and RFCs. I have also spent quite a bit of time looking at GitHub README files and W3C specifications.

For reading prose, the format I routinely find most accessible is the text version. This has nothing to do with the quality of the writing; all of these formats produce unreadable documents. What I refer to here is not the substance but the form, that is, how the text is laid out on my screen[1].

Familiarity and bias clearly play a part in this, but a little while ago I worked out that just one thing elevates that clunky text format above the others: line length.

Relearning Old Lessons

This is hardly a new insight. A brief web search will return numerous articles on the subject[2]. All of them say the same thing: shorter lines are more readable.

I was unable to find a single print newspaper that didn’t take this advice to heart, if not to extremes[3]. Some magazines have ignored this, but those too turned out to be ill-suited to reading prose and more geared toward looking at the pictures.

Recommendations from most sources put a hard stop somewhere around 80 characters. Some go a little lower or higher, but the general advice is pretty consistent. Of course, variable-width fonts make this imprecise, but it tends to average out.

Why Text Is So Good

I suppose that it is no accident that this corresponds to the 80-column width of the screen on a DEC VT52. The text format of old RFCs[4] might have been chosen to fit on these small screens, or to make printing easier, but the net effect is that you get just 72 characters on a line. The standard tools spend three of those on a left margin for text, which leaves just 69 fixed-width characters per line.

That turns out to be very readable.

Why HTML Is So Bad

The “official” HTML renderings of RFCs on rfc-editor.org are a little wider than this. Measuring with whole alphabets[5], I get a width of 98 characters. That’s more than the maximum in any recommendation I found.

Performing a similar test on the style used for W3C specifications[6], I got 102 characters. The WHATWG Fetch Standard had room for a massive 163 characters!

All of these wrap earlier on smaller screens, but the font sizes are relatively small, so many screens are wide enough to reach these values. Many people have a screen with the 1300 horizontal pixels[7] needed to get to 100 characters in a W3C specification. The official IETF HTML crams its 98 characters into just 724 pixels.

The high text density comes from the small font size and line height in the official renderings of IETF documents. This compounds the problem, as it makes it harder to track from one line to the next while reading. I consider the 14px font size and 22.4px line height of the official IETF rendering to be positively tiny. I use a 9px (monospace) font in terminals, but I wouldn’t inflict that choice on others. That W3C and WHATWG settled on 16px/24px is far more humane, though with the selected font I still find this a little on the small side.

What is interesting here is that the text rendering on tools.ietf.org uses a font size of 13.33px. This seems smaller but, at least subjectively, it is no harder to read than the 16px W3C and WHATWG specifications. Also, the default font configuration in Firefox is 16px for most fonts and 13px for monospace, which suggests that smaller sizes are better tolerated for monospace fonts. As it happens, that’s especially convenient here.

Making HTML Readable

The fix is pretty simple: set a max-width small enough that lines don’t run so long. I chose 600px. Combined with a font size of 16px, the result is a line length of 72 characters[8].
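The arithmetic behind these numbers is simple enough to sketch. The Python below treats line length as the available width divided by the average width of one character. That average depends on the font and is not something CSS exposes. The value used here (about 0.52em) is only what the 600px, 16px, and 72-character figures above imply for this body font, so treat it as an assumption rather than a real font metric.

```python
# Sketch: estimating line length from a CSS max-width and font size.
# ASSUMPTION: AVG_CHAR_EM is inferred from the figures in the text
# (600px max-width, 16px font, ~72 characters). It is not a published
# metric for any font and will differ for other typefaces.
AVG_CHAR_EM = 600 / (72 * 16)  # average advance width in em (~0.52)


def chars_per_line(max_width_px: float, font_size_px: float,
                   avg_char_em: float = AVG_CHAR_EM) -> float:
    """Approximate number of characters that fit on one line."""
    return max_width_px / (font_size_px * avg_char_em)


def width_for_measure(chars: int, font_size_px: float,
                      avg_char_em: float = AVG_CHAR_EM) -> float:
    """Approximate max-width (in CSS px) needed for `chars` characters."""
    return chars * font_size_px * avg_char_em


if __name__ == "__main__":
    print(round(chars_per_line(600, 16)))    # 72
    print(round(chars_per_line(600, 18)))    # 64: a larger font shortens lines
    print(round(width_for_measure(80, 16)))  # 667
```

Run in the other direction, the same assumed average suggests that reaching the 80-character hard stop at 16px would take a max-width of roughly 667px.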

The editor’s copy of the QUIC spec is a fairly thorough example of this.

Fonts

At the same time, I chose to change to a font that is a little wider. Arial (the default sans-serif font on Windows and the font chosen by the W3C and WHATWG) adds 4-5 characters to the line length and is noticeably smaller on screen. Times New Roman (the default serif font) adds 9-10 characters and is smaller again.

Lora, which has a light serif, was my choice for body text. I know little enough about fonts that this was ultimately subjective. Noto Sans, the font used in the official IETF renderings, is comparable in this respect, but I find it a little boring.

Some people don’t like the visual noise of a serif font for reading on a screen. Modern displays with high pixel density are less vulnerable to that, and Lora is a light font with just enough serif to add a little flair without hurting readability. Lora is very readable at 16px, where many other serif fonts need a larger size to be similarly clear.

Headings

Fitting headings on a single line with the shorter line length turned out to be fiddly. I didn’t want headings to wrap, or to use too small a font, and IETF people have a deep and abiding love for very long headings. For this, a condensed font was the obvious choice.

A semi-condensed font might have been ideal, but there are fewer of those, and it was hard to find one that didn’t look jarring next to the main text[9]. Google Fonts was a great resource here, and Cabin Condensed is OK.

ASCII Art

Having set this width, I then had to consider the effect on diagrams. IETF documents are still stuck in the dark ages when it comes to diagrams, and ASCII art still dominates there. Because the text format accepts 72-column text, the figures in the HTML output must fit 72 columns too.

This turns out to be a bit of a compromise. Styling figures with an offset from the text, a border, and background shading eats up horizontal space. In the end, I reduced the font size to 13.5px and set letter-spacing: -0.2px to compress the text slightly, so that 72 columns fit[10].
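A rough check of that fit is sketched below. The text does not name the figure font, so this assumes a monospace font whose glyphs each advance 0.6em, which is common (Courier New, for example) but should be verified for the font actually in use. Each column then costs the glyph advance plus the letter-spacing.

```python
# Sketch: checking whether 72 columns of ASCII art fit the text column.
# ASSUMPTION: the monospace font advances 0.6em per glyph. That is typical
# but should be checked for whichever font the stylesheet really uses.
MONO_ADVANCE_EM = 0.6
COLUMNS = 72  # the figure width the plain-text RFC format allows


def figure_width_px(columns: int, font_size_px: float,
                    letter_spacing_px: float = 0.0,
                    advance_em: float = MONO_ADVANCE_EM) -> float:
    """Approximate rendered width of `columns` monospace characters.

    Browsers generally add CSS letter-spacing to each character, so every
    column costs the glyph advance plus the letter-spacing.
    """
    return columns * (font_size_px * advance_em + letter_spacing_px)


if __name__ == "__main__":
    print(f"{figure_width_px(COLUMNS, 13.5, -0.2):.1f}px")  # 568.8px
    print(f"{figure_width_px(COLUMNS, 13.5):.1f}px")        # 583.2px
    print(f"{figure_width_px(COLUMNS, 14):.1f}px")          # 604.8px
```

Under that assumption, 72 columns at 13.5px with -0.2px letter-spacing come to about 569px, leaving some room inside a 600px column for the figure's offset, border, and shading, while 14px without letter-spacing would already overflow.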

Minimizing Distractions

The styles used here are based on those from an earlier version of the official renderings. Once the major pieces were in place, the details needed to be aligned to fit. After fixing major items like margins and line heights to match the font and size choices, a bunch of work was needed to make documents look consistent. The first task was removing a number of design elements that I found distracting.

The HTML rendering includes a pilcrow at the end of each paragraph. This enables linking to specific paragraphs, which is a great feature.

The official styling only renders the pilcrow when the paragraph is hovered[11], but it renders very strongly when shown and so can be distracting. That needed softening.

The default blue (#00f) for links is strongly saturated, which is too assertive. Reducing the saturation makes links blend into text better.
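Reducing saturation while keeping hue and lightness fixed is easy to express in HSL terms. The sketch below uses Python's standard colorsys module. The 0.6 saturation in the example is a hypothetical value chosen for illustration, not the colour the stylesheet actually uses.

```python
import colorsys


def desaturate(hex_colour: str, saturation: float) -> str:
    """Return hex_colour with its HSL saturation set to `saturation` (0-1).

    Hue and lightness are kept, so the result is the same blue, only less
    intense. Accepts "#rgb" shorthand or "#rrggbb".
    """
    if len(hex_colour) == 4:  # expand shorthand such as "#00f"
        hex_colour = "#" + "".join(c * 2 for c in hex_colour[1:])
    r, g, b = (int(hex_colour[i:i + 2], 16) / 255 for i in (1, 3, 5))
    # colorsys orders the components as hue, lightness, saturation
    h, l, _ = colorsys.rgb_to_hls(r, g, b)
    r, g, b = colorsys.hls_to_rgb(h, l, saturation)
    return "#{:02x}{:02x}{:02x}".format(*(round(c * 255) for c in (r, g, b)))


if __name__ == "__main__":
    # Hypothetical: drop full saturation (1.0) to 0.6.
    print(desaturate("#00f", 0.6))  # #3333cc
```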

Changing background colours on hover for titles is a nice way of indicating the presence of links, but that too was very strong. Making that lighter made moving the mouse less of a light show.

Cleanup

Then there was a bunch of maintenance and tidying:

Finally, a bunch of work was put into making this look decent on a small screen.

Conclusion

What I’ve learned from this is a newfound respect for the work designers do. My amateur fumbling here has helped me appreciate just how much detailed work goes into making something like this look good.

Immense thanks are owed to Anitra Nottingham, who graciously provided feedback on earlier versions of this work. Those versions were obviously much worse. I also owe thanks to Mark Nottingham, James Gruessing, Adam Roach, Jeffrey Yasskin and those I’ve forgotten who each took the time to provide feedback and expertise.

None of this is truly professional. I’m still finding things that I don’t like. I’m still not happy with various pieces of spacing, for instance.

Even learning this much design is more of a curse than I’d like. I might not ace cantunsee, but I know enough to notice things like alignment issues and bad kerning[13] now. I’m not sure that that has enriched my life all that much.

But the main thing remains: I can read these documents now. Cutting the line length was what did that. I now prefer HTML if it uses this stylesheet[14]. The rest was just gravy.

The stylesheet can be found here. Contributions are welcome. Anyone using my GitHub template for generating Internet-Drafts already benefits from this work.


  1. Reading from paper is not something I can countenance; the cost in paper of my specification reading alone would be devastating, and I like trees too much to do that to them. ↩︎

  2. And those are just links from my browsing history ↩︎

  3. So many hyphens… ↩︎

  4. Officially, they are all XML now and only rendered to text or HTML. ↩︎

  5. The way to do this is to find a paragraph and open it in the browser developer tools. Add a style rule of overflow: hidden, then replace the content with “abcdef…”, repeating the alphabet until the text is cut off. This follows the advice in Butterick’s Practical Typography. ↩︎

  6. I tested the Push API, which uses ReSpec, but specifications using Bikeshed produced exactly the same result. ↩︎

  7. These are pixels as the browser measures them, which don’t correspond to physical dots on screen for devices with high pixel density. ↩︎

  8. I wasn’t going for this deliberately, but that is how it worked out. ↩︎

  9. In particular, I have this thing about the shape of ‘e’ and ‘a’. They can’t be dramatically different. ↩︎

  10. The need for packing this tightly came when I discovered that pilcrows for figures were possible, but the official rendering put them on a blank line. That broke the document flow badly and I wanted space for those on the line as well. See this example for how that turned out. ↩︎

  11. Is this an accessibility problem? I don’t know. ↩︎

  12. I’ve learned that CSS, like many other things, lends itself easily to small hacks. The net effect of introducing a hack is invariably that you have to add a whole bunch more corrective hacks in a death spiral. Avoid hacks. ↩︎

  13. My 9-year-old son is amused by the signs for this real estate company, which seem deliberately bad. It’s clearly infectious. ↩︎

  14. Mark Nottingham has a different stylesheet that is also acceptable. He also uses a very nice font. ↩︎