Publ: Development Blog

Publ v0.7.14 released

Posted (a year ago)

Publ v0.7.14 is now released. Changes:

  • Images now only get rendered when they’re first retrieved
  • Certain classes of date-handling issues are handled less-badly on 32-bit systems

Previously, every image rendition would be rendered if it was requested by the template, which wasted a lot of processing time and, more importantly, storage space for things like OpenGraph tags and so on, which are generally not seen very often but are refreshed every time search engines crawl a site.

The image rendition cache for was around 2.6 GB large (2609784 KiB, to be precise), for example, and I guesstimated that about half of that was spurious OpenGraph card images.

After deploying this version of Publ and doing wget -mpk (which mirrors the site and all images via <a href>/<img src[set]>/stylesheets, but not things like og:image), the resulting rendition cache size is around a gigabyte — less than half the original size! This should account for all images that will remain in the cache; obviously OpenGraph card images will appear temporarily when items are posted to Twitter et al (which is, of course, the point), but they shouldn’t persist forever.

The main trickiness in this change was making the asynchronous image handler encode the render parameters into the URL, and use a signed URL to prevent obvious render request attacks (which could lead to a denial-of-service). The logic for doing this really reminds me of how much the image rendering stuff could use some more refactoring. This was actually pretty straightforward to do, although the initial implementation made for some very long (and often redundant and not-CDN-friendly) URLs, because the image render logic sees all template arguments, not just the ones sent for imaging.

For now I’ve added some logic to simply filter out any arguments that don’t affect the image rendition directly. In the future I’d rather change the overall pipeline logic such that what gets sent to LocalImage._get_rendition is just the final attributes needed for the actual render, rather than the parameters that go into the calculation, but, again, refactoring.

None of this really matters, of course, since the async handler URL is only transitory anyway, but it’s always nice to not leak information, and to keep page sizes as small as possible. Technically the URLs could be made somewhat smaller if I were to use a more efficient serializer, but that’s a project for some other day.

Oh, also, a nice side-effect of this change is that, combined with the lazy-loading change I made a while ago, this actually makes images feel somewhat more responsive on large image sets, since Publ only starts to render images when they become visible, rather than clogging up its render queue with a bunch of items up-front. Theoretically this should also cut down on the number of “timeout blobs” which appear in the wild; in some of my spot testing I was unable to get any to appear at all, whereas before this change they were pretty common.