Publ: Development Blog

Asynchronous workers

Posted (6 months ago)

Today I got two major bits of functionality in: Publ will now asynchronously scan the content index (which speeds up startup and fixes some annoying race conditions with entry creation), and it also asynchronously generates image renditions (which makes pages not take forever to load on first render, and will also use multiple CPU cores if available). Seems to work well so far.

I was running into scaling problems with beesbuzz.biz (what with there being a couple thousand entries and some pages with hundreds of images on it) and this keeps it feeling pretty good.

So, this brings us up to version 0.1.14.

A bit about pseudo-asynchronous images

One of the compromises I had to do is that since WSGI is purely blocking, if a rendition isn’t yet available it has to return something, so I have it pause the service thread for 1/10 of a second and then redirect back to an async wrapper. And if that same wrapper gets hit 10 times it returns a placeholder colorburst image. The idea being that if there’s a lot of images, putting in a slight delay will hopefully cause the browser to wait until later to try reloading it. And in general its eems to work although larger galleries still end up getting the color bursts. The delay isn’t ideal, but in a multithreaded fronting server (such as gunicorn) this should at least roughly approximate proper async operation, and this is still way better than having the entire site freeze until all the images finish rendering.

If a colorburst image appears, simply reloading the page should clear it up, although there’s a possibility that the redirection might end up getting cached. If that becomes a problem it’ll be easy enough to add a cache-buster to the URL. (UPDATE: This indeed ended up happening both here and on the heroku deployment, so I implemented a trivial fix which will be part of the next release. I also ran into a different silly problem I could have predicted which also has a fairly straightforward fix available.)

For what it’s worth, here’s a few randomly-generated async failure colorbursts:

And, since they go through the actual async handler for this site, that also serves as a demonstration of what it looks like when a file fails to generate in time.

In the future there could also be a clientside javascript library that detects the async URLs (or I could float a data-publ attribute up or something) and then periodically tries reloading images until they appear. For now that feels like overkill, though, since this is generally only an issue the first time a page is rendered and even when I load a year’s worth of comics on an empty rendition cache I still only see a handful of these little colorburst images.

Unfortunately, the only way to properly fix this would be to move to an application stack that doesn’t rely on WSGI, but that means not being deployable on Dreamhost shared hosting anymore (or really any situation where you don’t have complete control over your fronting server), and I’m not particularly interested in raising the base requirements just to make one thing slightly less annoying.

Anyway! This was the last big code feature I felt was necessary to implement before I made my main site go live on Publ. All that’s holding me back now is getting my templates done.

How the site migration has gone

On that note, getting my stuff from Movable Type into Publ has been pretty straightforward, for the most part. I started out by making a Movable Type template that stuffed my entries into a Publ-ready Markdown file with the various Path-Alias headers set (and also a header for the existing Disqus comment thread ID where appropriate), and then I hacked together an HTML to Publ-Markdown converter that more or less handles things sorta-okay (the main tricky thing it deals with being the multi-image stuff, which is different enough from normal Markdown image tags that a standard converter probably wouldn’t have cut it).

In retrospect I would have been better off using an existing HTML to Markdown converter and then handling image tag consolidation myself, but oh well, live and learn.

Anyway, another thing I wrote was a useful script that catalogs all the images on the site and generates a report of which entries can’t resolve their images, and which images are never used. This made cleaning everything up a lot easier.

And, a really nice validating thing happened — I decided to completely change the organization of my art section and this was as simple as moving directories around. And this actually ended up unearthing a bunch of content I’d forgotten about and which had been unreachable from my site!

So, all of the content I care about has migrated over, and I will probably spend some time migrating the more useful parts of my blog although I’m not all that interested in preserving that right now. But I can always do that later, based on seeing what entries 404 or people asking me where such-and-such entry went.