
The handling of widowed headings across pages in Apple Books is of particular concern. Since 1997, CSS has had properties to handle this, and yet browsers including Safari and Firefox still don’t…
The physical copies of my book on Web Typography sold out quickly. I self-published, and print runs are expensive when you’re funding them yourself, so numbers were limited. However it was always my plan to publish an ebook at the same time, and that has out-sold the hard copy by an order of magnitude.
I set myself some pretty stiff criteria for the ebook – it needed to replicate the design of the print edition as far as possible, adapting to the medium when required. To this day I’m proud of the result. I completely hand-coded the ePub (meaning it’s mostly HTML and CSS under the hood), and I believe the effort paid off. If you’ll forgive the rather un-British boasting, I still think it’s one of the more advanced ebooks out there: with embedded fonts, SVG images, alt text, bold typographic hierarchy, JavaScript-driven syntax highlighting and what I hope is a nuanced, highly readable overall design. Not bad for an ebook anyway, although I’ll grant you the bar is not set high (notable exceptions include A Book Apart publications).
All hubris aside, I am still frequently embarrassed by how the ebook renders, particularly in Apple Books. Like a well-structured webpage, my book uses a lot of headings and subheadings – I wrote it to be referenced as much as to be read, so this helps the scannability of the text. However, Apple Books and other ebook readers powered by WebKit, Gecko or old versions of Blink will happily do this to headings:

Notice the orphaned heading “Lean on six centuries of typesetting experience” with its following paragraph out of sight on the next page. This is a typographic no-no, and has been for – um – six centuries. Far better for the reader to have the heading attached to its paragraph on the next page, even if that means leaving some redundant whitespace in its place.
Since 1997(!) and the early drafts of CSS2, there has been an easy way to tell browsers not to insert a page break directly after, or in the middle of, a heading:
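/* Legacy CSS 2 paged-media properties */
h2 {
  page-break-after: avoid;
  page-break-inside: avoid;
}

/* Modern equivalents from CSS Fragmentation, which also apply to columns */
h2 {
  break-after: avoid;
  break-inside: avoid;
}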
However 26 years later, break-after:avoid is still not supported by either Safari or Firefox, and was only introduced to Chrome 108 in December 2022. I’ve put together a test for support of break-after and break-inside in multi-column layout. Have a play with it in Chrome – try removing break-inside:avoid and then break-after:avoid from the h2 rule in the CSS and you should see how the subheadings end up at the bottom of a column, or worse still, split over two columns.
Browser support for CSS properties tends to follow demand from web developers. Unlike in 1997 – or indeed 2017 – there is now an annual Interop arrangement between browser rendering engine makers in which they agree a common list of priorities for CSS and other web technologies. Interop 2024 has just closed for new proposals. Unfortunately I didn’t manage to submit a request in time for breaking controls to be universally implemented. Thankfully Scott Kellum of Typetura did put in a proposal for advanced multi-column layouts to be improved, and this included support for break- properties. Sadly there’s little to no clamour for it from other developers – the blog post you’re reading probably doubles the published demand, and that’s just for within columns.
Update: Annoyingly the proposal was not selected for Interop 2024. I'll just have to keep prodding the bug reports and keep my fingers crossed they are fixed soon – these bugs are older than some of my colleagues!
Paged media is very much a forgotten aspect, and it’s probably true that web pages are rarely printed in the grand scheme of things. However, ebooks are definitely a popular form of paged media and deserve attention. I’d certainly like to read ebooks without failed typographic fundamentals.
While everyone here seems to talk about the epub angle to the story, there's also simply the deeper story here, that "the web's" handling of paged media and the CSS paged media specs (to which his epub problem is related) is a never ending shitshow. Not only for epubs, for everybody who actually wants to print to real paper, too, ideally with a working cross browser solution.
The mistake is largely not in the specs, but in the lack of support for them. Page breaking controls, weirdly breaking tables, lack of access to the area outside the page box to influence headers/footers without weird hacks, etc. For printing, the 1990s never ended.
This leads to the bizarre situation where basically everyone who has semi complex printing needs in web applications will create PDF and then print that - and for creating those PDFs, often HTML to PDF conversion is used, just with actually implemented CSS for paged media. Which again proves that the spec is at least 99% there, if somebody would just kindly implement it in a browser, too.
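As a rough sketch of the kind of paged-media CSS being discussed, the rules below set the page geometry and ask for tables to break sensibly. This assumes a renderer that actually honours these rules, which varies a lot between browsers and HTML-to-PDF tools. The `report` class name is hypothetical.

```css
/* Page geometry for print or HTML-to-PDF output */
@page {
  size: A4;
  margin: 20mm 15mm;
}

/* Repeat the header rows on every page the table spans.
   This is the default display value for thead; it is stated
   explicitly in case another stylesheet has overridden it. */
table.report thead {
  display: table-header-group;
}

/* Ask the renderer not to split a row or a figure across pages */
table.report tr,
figure {
  break-inside: avoid;
}
```

Whether a row or figure really stays in one piece depends on the renderer, so the output still needs checking page by page.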
Won't be more complex than having the latest WebGL whatever thing in your browser engine ;-)
Some months ago I wanted to format/print some documents, and given the existing tooling I had I decided to try the html->pdf route. I fully agree it is a shitshow. The way things break across pages is hard to fix even when hand-tuning the html itself (not just by working around it with css) to avoid content being cut across margins and pages no matter what. I've found chrome to be "less bad", but still unusable. Column handling is an even bigger joke.
In the end I exported the document to libreoffice, and got something way more usable in a few hours just by editing the styles than whatever I was able to do in days of fiddling with html+browser.
iBooks on Apple might get a pass as it doesn't need to paginate, but truth be told it seems that epub/ebooks and ereaders in general are being targeted at novels and romance, where form factor, typesetting and formatting don't matter that much.
I have access to ebooks through my local library and there's no way I would use, let alone buy, any technical ebook.
Not to mention, I've seen a steady average decline in the quality of printed media in general over the last ~15 years. A lot less attention is put in the typesetting and layout. Even the print quality itself is lower, which I think is due to the smaller and cheaper print runs being done now also for more popular titles.
I thought book quality started going downhill circa 1990.
I am a fan of the old mass market paperbacks. These had a reputation of being low quality books back in the day because they are cheap and not super-durable but I think they are high quality from a Deming point of view because they are made by a process that is highly repeatable. Circa 2000 I thought my 1970s paperbacks were in great shape, but by 2010 they were seriously yellowing.
I just looked at my bookshelf and found a '59 James Blish anthology that I bought for 50 cents maybe ten years ago, it is in "poor" condition and will probably crack if I read it without taking great care. Next to that I found a copy of Galbraith's The Affluent Society from 1958 which is perfectly usable except I'd be worried about the cover coming off. A Frank Herbert book from '68 is stained but in great shape other than the cover also being at risk. A '74 Herbert book is a touch discolored but has no problems at all.
(My collection includes not just science fiction of that era but also both self-help and serious books on psychology as well as books about science, politics, social sciences, etc. Government reports about inflation or race relations would be published as mass market paperbacks. You could get Plato and Sartre and Freud and the rest of the Western literary heavyweights)
The construction, materials, process, and such were repeatable enough that they even fail consistently. Not permanent, but 50 years is not bad. The right size to go in a purse or side pocket of a backpack (e.g. part of the loadout of a bibliomaniac who has 12 books in his backpack) I've got to find a good way to reinforce the cover (adhesive tape?)
Those are no longer produced, today it is trade paperbacks. There is wide variation in the dimension, construction, materials and processes for these. You sometimes find a trade paperback that is beautiful, strongly constructed and printed on acid free paper. Others you pay $50 for and the binding breaks the first time you lay the book open on the table.
> adhesive tape?
Don't - it yellows too.
I use thin 2" tape to wrap the corners and the oldest one is probably 10 years old and no yellowing.
Might depend on the age and or brand of the tape - I've seen old tape (30+ years maybe) that has yellowed. I have a 15 years old book at home with some tape and it's okay, except for the tape that wasn't in contact with the book (which is yellowed).
Plenty of high-quality books are being printed in the indie RPG community.
Suggestions for better quality paper providers?
I think being able to format a page just for printing, esp. with HTML/CSS itself is a killer feature and is gigantically underestimated.
I understand that printers are Satan incarnate and run on the concentrated sins of cost-cutting engineers, and nobody has time to read that 1.5 page article someone wrote by giving proper effort, but scientific articles, books, nice blog posts we want to delve into, etc. are real, and regardless of the substance they run on, printers are real things and they are used nevertheless.
The tendency to assume that everyone is running on high end laptops and cutting edge, network connected tablets is making me angry sometimes. Implementing features like this will make many programmers' life easier who need to be able to generate and print reports from their web applications, too.
It's not only about books and shrewd people who want to print a blog post and study/read that on a table or wherever.
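For the report-printing case, a print-only stylesheet is the usual starting point. The following is a minimal sketch, not a complete solution. The selectors `.toolbar` and `.no-print` are hypothetical names for screen-only parts of an application.

```css
@media print {
  /* Hide navigation and other screen-only chrome */
  nav,
  .toolbar,
  .no-print {
    display: none;
  }

  /* Plain, ink-friendly text sized in points */
  body {
    font: 11pt/1.4 serif;
    color: #000;
    background: #fff;
  }

  /* Show the target of external links, since paper cannot be clicked */
  a[href^="http"]::after {
    content: " (" attr(href) ")";
  }
}
```

This covers what gets printed. It does not solve where pages break, which is the part the thread is complaining about.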
Completely agree - I wrote a book that had some specific layout requirements in HTML, and while it was easier to get something up and running than LaTeX, getting the printing part right was very painful (not least of all because no browsers seem to support things like page numbering).
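For reference, the CSS Paged Media specification does define page numbering, through margin boxes and the `page` and `pages` counters. A sketch of the syntax follows. It assumes a renderer that implements margin boxes, which print-oriented HTML-to-PDF tools tend to do and browsers historically have not.

```css
@page {
  /* Footer box centred in the bottom margin of every page */
  @bottom-center {
    content: "Page " counter(page) " of " counter(pages);
    font-size: 9pt;
  }
}

/* No page number on the first page, e.g. a title page */
@page :first {
  @bottom-center {
    content: none;
  }
}
```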
Another frustrating thing with ebooks is that you can't get them in PDF format any more. So much time is spent making a nicely formatted hardcopy edition, then the ebook is only available as a terribly auto-converted epub that throws away all the layout and style. Particularly cookbooks, as well as anything technical, I just can't stand how lazy, ugly, and difficult to read the epubs are. All the tooling already exists to produce PDFs identical to the print version, but no, we can't have those.
For me, it's the opposite. Whenever I have the choice, I want EPUB, not PDF. The problem with PDFs is that you don't have a device that is the same size as the original page size in most cases.
It depends on what kind of book it is. Fiction EPUB all day. Technical books with figures, tables or code listings, PDF!
Exactly! EPUBs are great for novels or anything you just want to read line-by-line
They're really great for anything I want to read on my phone, including technical books.
I'll take a slightly-less-great ePub over a PDF I have to scroll around or use terrible reflow heuristics via some reader any day.
Really, the point is, why can't we have both? I use and enjoy epubs extensively, but also in many contexts I strongly prefer PDF.
Getting a passable PDF from an ePub is probably significantly easier than the reverse, so I'm all for having both, yes! (And please don't charge me twice for the privilege, publishers.)
Each format has its strengths
PDFs on smaller screens - it’s like wrestling with the page just to read a single line
Another problematic aspect is if one has poor eyesight and wishes to use a larger font size. One ends up having to scroll horizontally for each line of text for single column pages. For two-column pages, one has to scroll back a page after reading the first column.
Sometimes one can use a landscape-oriented display to avoid horizontal scrolling, but even if the same word count fits on the screen I seem to be annoyed by the low line count.
Providing large type and huge type PDFs would not entirely solve this problem as sometimes even one with poor eyesight might prefer a smaller font for scanning/skimming. Having to acquire two PDFs and switch between them based on mode of use seems suboptimal.
Fixed paged presentation has significant advantages for familiar reference material; some people seem to have a spatial memory that makes finding specific content by flipping pages faster than trying several search phrases (with the occasional benefit of serendipity).
Poetry often benefits from not reflowing lines and page breaking within a stanza is often more jarring than within a paragraph of prose. Yet a reader might prefer inferior typography over having to use a magnifying glass or carry a very large display.
One might be able to get some of the advantages of paged media for figures and tables by having header (or footer) pop-up links to such content when it is on the same "page" as the displayed text. This is not as low effort as moving one's eyes, but it might be better than inlined presentation on a small (relative to font size) display.
Even with print, there would be times when breaking the text to fit a figure is more disruptive than having the figure on a separate page. There would also be times when all the relevant figures would not fit on the same page as the related text.
Having a separate booklet of illustrations might make going back and forth between text and illustration easier, similar to having a lexicon or commentary open while reading. However, that also introduces position tracking in another book and other inconveniences.
Even when my vision was better, reading academic papers distributed as PDFs (usually 2-column) on a computer screen was less enjoyable than reading similar material in a reflowable format. Academic papers also do not seem to benefit as much from pagination as other writings.
One question: Why do people make PDFs in A4 format? Wouldn't it make better sense to start making them in A5 or A6, so that they could be better read on e-readers, phones, and on part of a computer screen (which is landscape oriented)?
Adapting PDFs for devices won't save us. HTML, being designed around reflow, had the ultimate solution from day one - and yet we've managed to screw that up so badly it spawned a whole industry sub-specialty of "responsive design". When authors start producing multiple PDF versions for different devices and print, how long until someone gets tired of "extra work" and comes up with "responsive PDFs"?
(Also I feel that by default, non-book PDFs tend to show up in the US "Letter" size, which looks deceptively similar to A4, until you try to print it.)
> how long until someone gets tired of "extra work" and comes up with "responsive PDFs"?
I hate to break it to you but, https://blog.developer.adobe.com/adobe-sensei-makes-responsi...
Yes! This is what I keep complaining about! HTML likewise solved accessibility, but then it goes right through the cycle: someone extends it in a way that requires a special reader, then they focus on people with that reader at the expense of everyone else. Unless you stop the cycle from happening, going to a new format doesn't help!
Like you mention, HTML already exists for adaptive text reflow. I assume that people making PDFs want their layouts fixed. But maybe an A5 format would make more sense, even if you're printing it?
Also: What did people screw up with HTML in your opinion?
> Also: What did people screw up with HTML in your opinion?
The problem with PDFs is that you need to create multiple layouts to make them look good in print and on a variety of commonly used screen sizes; all those layouts are extra work. HTML, by its very nature, doesn't have this problem, and yet somehow today we still have to design multiple layouts to support print and common screen sizes. And in practice, we usually don't - instead, we design one layout optimized for mobile phones, and ignore how bad it looks on desktop or in print. "Responsive web design" turned into forcing HTML to behave like a PDF, except using "iPhone" instead of "A4" as the size.
If you make your PDFs in A5, you can print two of them on an A4 paper and read the paper in landscape orientation. For the same reasons the size fits well for displaying on a computer screen and on a tablet/e-reader. It's still a bit too big to squeeze down to a cell phone, but at least better than A4/Letter size.
As for responsive HTML, it's the responsibility of the designer to make it work if they are worth their salt. Like you say, HTML without CSS is already responsive. If businesses understood that there is a big segment of customers who will always use their computer and never their phone when it's time to make a purchase, perhaps they'd be better at it.
Constant zooming and scrolling just to read a single page
Surely the only difference between an A4 PDF and an A6 one would be text size?
I think the point is that if you are designing with that size in mind you may make different decisions about the column layout etc.
I would have thought that a PDF of a book would normally be made in the size of the physical book, which could be A4, but usually isn't (at least not when I look at my bookshelves).
Text size decides everything: paragraph size, heading breaks, figure placement, etc.
Probably also column layout. 2-column documents are fine for A4 sizes, but terrible for A6 or most e-reader screen sizes. Scroll down, then up & across, then down, then across, then repeat. Versus just scroll down or just turn pages.
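For comparison, reflowable HTML or EPUB content does not have to commit to a column count. Giving CSS multi-column layout a target column width instead lets the renderer fit as many columns as the screen allows. This is a sketch with a hypothetical `.article-body` class; it does not help a fixed-layout PDF.

```css
.article-body {
  /* Target width per column; the renderer fits as many columns as will fit,
     so a narrow e-reader gets one column and a wide screen gets two or more */
  column-width: 22em;
  column-gap: 2em;
}
```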
Depends which parameter you choose to hold fixed. You could shrink your text and keep the layout and page count or keep your text size fixed and increase the page count. If people were doing layout for a fixed A5 or A6 size they will probably make many different choices compared to laying out for A4.
Yes, exactly. That's what matters.
PDFs are in Letter (rarely A4) format, quite simply, to be printed on Letter paper :) The computer screen view is secondary.
> PDFs are in Letter (rarely A4)
You must be from North America. In the rest of the world, it’s always A4. I encounter A4 PDFs fairly often, but don’t know how long it would be since I encountered Letter, but easily years.
Yes, but today most of them never exit cyberspace. Wouldn't it be more reasonable to consider that instead of printing?
Edit: Also, are there any advantages with large papers like A4/Letter for physical prints, except that you can fit more on a single page?
It’s as simple as most consumer printers printing A4/Letter, and most paper being A4/Letter.
The more I think about it, the more I'm getting convinced that A4/Letter was a mistake. Maybe we'll see something like A5 as a standard in the future, that would be neat.
Part of this is that books and screens are not interchangeable pieces of technology. Books are still supreme when it comes to reference material and lookup speed but each page needs static typesetting. Screens are more fragile, expensive, generally smaller, with lower contrast and/or resolution, but they allow a fully flexible display with variable font sizes, the ability to scroll half way between pages etc. A PDF on a screen is all of the downsides of books with none of the upsides of screens.
I agree, I absolutely hate reading basically everything other than novels/non-fiction narratives in epub. All the work they did laying out the pages is just thrown away! Trying to read any sort of instructional book as an epub is straight up infuriating.
I work for a children's reading platform, and the book publishers universally send us PDFs of everything. We have a bespoke system to convert the PDFs into SVG for higher flexibility and added interactivity.
Literal Kids board books are getting a better treatment!
It really shows that when there’s care taken with formatting, it makes a huge difference in how engaging and accessible the material is
I think it's a great point!
Perhaps authors could also produce PDFs designed for common tablets as well, and therefore get the exact expected format.
I do agree with the author that paged format are difficult with browsers to this day, and I also hope this can improve.
I love the PDF format in many ways. Its adoption is widespread, so you can send it with confidence. It's an ISO standard, "self-contained" (a single file, unlike HTML). However, something I’d love for the specification to incorporate is responsive design, which would significantly improve accessibility given the proliferation of mobile devices.
Not to mention the ugly/unusable rendering of mathematical formulae in ebooks on my Kindle, which is gathering dust.
Layouting is an art and a craft, and the fact that it's automated by people who lack the specialized knowledge, or for whom it is not a priority (quarter century old bug reports, really?) suggests that in 2025, you should still avoid ebooks if you care about quality and aesthetics.
This is a shame because e-ink is just becoming usable. Anyhow, long live the paper book!
Have you tried KOReader[0]? It supports multiple ebook formats, including epub and cbz. You'll need to jailbreak[1] your Kindle though.
[0] http://koreader.rocks/ [1] https://github.com/notmarek/LanguageBreak
For anyone who might want to jailbreak their Kindle in the future, you'll want to enable airplane mode otherwise it will automatically update its firmware (patching the jailbreak) and there's no way to disable that.
It'll keep updating itself as long as it's powered on, even if you haven't used it in months, and there's no telling how long it'll take for current firmware versions to be supported. The latest jailbroken version is 17 months old.
https://kindlemodding.gitbook.io/kindlemodding/getting-start...
Blame the (metal) compositor unions which back in the day bargained for sinecures where all their members were guaranteed perpetual employment rather than choosing to participate in the digital revolution.
Fortunately, some folks did work to preserve the craft and beauty of books --- Dr. Donald Knuth taking a decade off from writing _The Art of Computer Programming_ to create TeX (though initially he thought he'd do it over a sabbatical) is one shining example.
Robert Bringhurst's authoring _The Elements of Typographic Style_ also made a huge difference (I've lost count of how many copies I've given as gifts to folks).
A further issue is that doing a good page layout over an entire chapter (or book if the pagination is continuous) is an NP-hard problem --- I've had a chapter come out correctly on a first pass exactly once in my career (fastest 40 minutes of my life). The usual work-flow is something like:
- check all characters to ensure that hyphens are properly set, en and em dashes replace them where appropriate, and correct the setting of any instances of what should be special characters such as prime or double primes
- assign all formatting and ensure that all heads and paragraphs have settings which will forbid widows/orphans and verify that the callouts for all figures/photos/tables are correct
- review the entire chapter from beginning to end, page by page, verifying that each ends as it should at the bottom of the page, and that a referenced element shows on that page spread
- for instances where things don't work out, check to see which paragraphs can be adjusted to run longer or shorter by one or more lines, adjusting this until one finds a set of adjustments which results in a proper appearance for the page/spread --- repeat for all future pages --- if a particular spread/figure placement is a problem, back up and see if changing previous pages will fix it --- check the last page to ensure that it is full enough, if not, adjust previous spread, if that doesn't work, see if running the entire chapter long or short by a line will fix it.
- review the entire chapter again to ensure that there are no bad breaks or stacks, add discretionary hyphens or non-breaking spaces or adjust paragraph settings as necessary, ensuring that pages still base-align
If someone wants to write an ePub reader or page formatter which can do that, I'd be glad to see it.
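CSS can express only the simplest parts of the workflow above: the widow/orphan settings and "don't break inside this element". A sketch, assuming a renderer that honours these properties (support is uneven):

```css
p {
  /* At least 3 lines of a paragraph must remain at the bottom of a page */
  orphans: 3;
  /* At least 3 lines must carry over to the top of the next page */
  widows: 3;
}

/* Ask for figures and tables to stay in one piece */
figure,
table {
  break-inside: avoid;
}
```

The remaining steps, such as running paragraphs a line long or short and backing up over earlier spreads, have no CSS equivalent. They are global adjustments rather than per-element rules.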
Fascinating, but as an ebook consumer my standards are quite a bit lower. I’m happy if the relevant figures are on the same page as the text (but that’s important), and the spacing is not absolutely awful.
If this is still a problem thirty years after the invention of the web, then I say: So much the worse for mathematical notation. In the future, mathematical ideas will be expressed in other ways.
Mathematical notation, even when considering all its faults, won't be easy to replace.
Simple math, maybe. But, for anything complex, any other symbol would require completely different ways of being expressed, if your aim is to make it more readable for newcomers that is.
You severely underestimate mathematicians' ability to cling to older more convenient forms of expression.
I imagine blackboards and chalk will be used in advanced mathematics for a few centuries yet.
You could include the formulae as jpg's, not html.
JPEG is the absolute worst possible solution here. If MathML or similar is not supported, use an SVG or PDF so that it's zoomable and not made of pixels. It's also slightly readable by screen readers (although you probably want some sort of alt-text for those anyway).
If no vector formats are supported, use PNG, or another "lossless" format not JPEG. JPEG's compression is designed for photos where the probability of 2 neighbouring pixels being the same is tiny. Note that PNG doesn't have to be lossless - if you want to shrink the file size you can reduce the resolution or the colour space.
Even GIF is a much better choice than JPEG for a diagram, mathematical formula or logo with hard edges and a small number of colours. SVG is usually the right choice, (but don't do what one designer did for me and embed a JPEG in an SVG instead of giving me an SVG direct from Illustrator or Inkscape).
I have not once seen a formulae-as-images solution that I would consider acceptable, aesthetically. Common problems are:
- It is almost impossible to align the baseline of the formula to the baseline of the surrounding text.
- Often, images are only used for "complex" formulae, while simple ones are implemented using normal typesetting. This resolves the baseline issue for simple formulae, but now the fonts between simple and complex formulae don't match. (This requires extra concentration for the reader, as in other contexts, different font styles are frequently used meaningfully.)
- The images often have sub-par resolution.
Furthermore, you would need to express them in text anyways for accessibility.
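The baseline and scaling problems can be reduced, though not eliminated, by using vector images sized in em units so they follow the reader's font size. This is a sketch only. The class names are hypothetical, and the `vertical-align` offset is a placeholder, since the right value depends on how far each formula descends below the baseline.

```css
/* Inline formula image (ideally SVG) that scales with the surrounding text */
img.math-inline {
  height: 1.1em;
  /* Shift the image down so its baseline sits near the text baseline;
     in practice this needs a per-formula value */
  vertical-align: -0.25em;
}

/* Displayed formula on its own line, never wider than the page */
img.math-display {
  display: block;
  margin: 1em auto;
  max-width: 100%;
  height: auto;
}
```

This does nothing for the font mismatch or for theming, and the images still need alt text.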
One advantage of ereaders is that the fonts can be set to any size convenient for the device and the person reading it.
A fixed pixel size image of something you want to read does not go well together with rendered text at all. It's okay for photos, but very definitely not for formulas, which are basically mostly text.
I'm not even talking about the aesthetics, different fonts because they too can be set by the user, and layout, since what's inside the image is fixed and untouchable by the renderer that handles all the rest of the text.
On my tablet I can use two fingers to zoom. But I pretty much never need to do that with a full size tablet. That's why I bought one with the retina display.
But zooming scales the fonts only. For pixel images you have the pixels that are in it and that's it. Scaling those either up or down does not produce good text.
Now I'm just waiting for the inevitable "AI image scaler" that handles text inside images.
> Now I'm just waiting for the inevitable "AI image scaler" that handles text inside images.
I'm surprised this isn't a thing already, as it seems doable with what people called "AI" 20 years ago. I mean, unless some unusual/non-default font was used, upscaling text on an image should be almost trivial. Ligatures notwithstanding, "printed letters" have a fixed shape, so:
1. Identify the typeface, size, weight, etc. by looking at the pixels of the text;
2. OCR the text (which should be 100% reliable);
3. Blank out the original text pixels; re-render the content (from step 2.) at a larger size (using parameters from step 1.).
I'm hedging here; it feels to me that OCR-ing normal text that never left the digital realm should be 100% reliable, but I'm not a specialist in that subfield so I surely must be missing something...
> I'm hedging here; it feels to me that OCR-ing normal text that never left the digital realm should be 100% reliable, but I'm not a specialist in that subfield so I surely must be missing something...
A string set in a given font at a given size won't always render as a fixed pattern of pixels. The font describes the curves of the letter forms and how that's rasterized depends on lots of factors such as the zoom level, exactly how the font rendering engine is implemented, whether or not anti-aliasing is turned on which is further complicated by the fact that the text can be set in any color with any other color as a background, etc. And there are a LOT of fonts.
Lastly, OCR is not just about recognizing letter shapes but has to contend with how the text flows. It has to understand line-breaks, multi-column layouts, captions, pulled quotes, page-numbers, hyphenation and all the other weird shit that we make text do.
It's definitely possible, as Google Translate (and before it, Word Lens) does exactly that.
Word Lens was demonstrated in late 2010.
That attitude leads to the shitty epubs we currently have. You either do a fixed-size PDF layout, or you have a proper dynamic solution. For technical/mathematical content, I am not interested in anything in between, given that PDF just works for me, and is easily achieved with tools today.
Alt texts are a thing on epubs. I would hope Amazon's format can do them as well.
Problem solved!
Doesn't work well with theming.