Publ: Development Blog

Publ v0.7.31 released

Posted (2 months ago)

There’s a new release of Publ. There are no new features, but there’s a huge performance improvement.

I’d been having performance issues on my larger sites, where the main Atom feed was taking a long time to render. It didn’t bother me too much, because thanks to aggressive caching it only caused an occasional slow page load (on the order of a few seconds), but I thought there was probably something wrong with the I/O characteristics of how pages render.

Boy howdy was I wrong about that.

The way that Publ handles count-based pagination is to start at the first eligible entry and then traverse forward through the query results until it finds the requested number of visible ones. There are a few other fancy things for handling unauthorized entries (for example, view.has_unauthorized, and being able to retrieve a limited number of unauthorized entries for friends-only stubs on feeds and so on), but the prevailing assumption was that PonyORM would use a query result cursor, fetching rows only as the loop’s iterator advanced.
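
To make that traversal concrete, here is a minimal sketch of the idea. It is not Publ’s actual code: `take_visible` and `is_visible` are hypothetical names, and the entries are just any Python iterable. The point is that the loop stops as soon as it has enough visible entries, which only saves work if the underlying iterable is lazy.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def take_visible(
    entries: Iterable[T],
    is_visible: Callable[[T], bool],
    count: int,
) -> List[T]:
    """Walk forward through `entries`, collecting up to `count` visible ones.

    Hypothetical helper for illustration; `is_visible` stands in for
    whatever auth check decides whether the current user can see an entry.
    """
    visible: List[T] = []
    if count <= 0:
        return visible
    for entry in entries:
        if is_visible(entry):
            visible.append(entry)
            if len(visible) >= count:
                # Stop early. With a lazy source, nothing past this
                # point is ever fetched.
                break
    return visible
```

If `entries` is a generator or a real database cursor, the early `break` means only a handful of rows get touched. If the source materializes everything before handing over the first item, the `break` saves nothing.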

Turns out that, no. No it doesn’t.

In fact, what was happening was that Pony was retrieving every possible entry first, before Publ’s auth filtering could run. So on my main site, for example, which has around 3300 entries, it was retrieving all 3300 entry rows in order to render the Atom feed.

So, yeah, no wonder it was taking multiple seconds to render!

Anyway, now in those situations, Publ just fetches chunks of entries at a time, basically doing its own ad-hoc cursor. It’s not quite as efficient as it would be with a proper database cursor (since now it’s using LIMIT/OFFSET queries), and it could still be way faster if I were using a database layer other than Pony (for example, switching to SQLAlchemy, or eschewing a database entirely, as I’ve briefly discussed and rambled about). But wow, this makes a huge performance improvement overall.
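
For the curious, the general shape of an ad-hoc LIMIT/OFFSET cursor looks something like the sketch below. This is an illustration under stated assumptions, not the code in Publ: it uses Python’s built-in `sqlite3` module instead of Pony, and the `entry` table, its columns, and the function names (`make_demo_db`, `iter_chunked`, `first_visible`) are all made up for the example.

```python
import sqlite3
from typing import Callable, Iterator, List, Tuple

Row = Tuple[int, str, int]  # (id, title, private); hypothetical schema


def make_demo_db(n: int) -> sqlite3.Connection:
    """Build an in-memory table of n entries; every third one is private."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT, private INTEGER)"
    )
    conn.executemany(
        "INSERT INTO entry (id, title, private) VALUES (?, ?, ?)",
        [(i, f"Entry {i}", 1 if i % 3 == 0 else 0) for i in range(1, n + 1)],
    )
    return conn


def iter_chunked(conn: sqlite3.Connection, chunk_size: int = 50) -> Iterator[Row]:
    """Yield entries newest-first, fetching only chunk_size rows per query."""
    offset = 0
    while True:
        rows = conn.execute(
            "SELECT id, title, private FROM entry "
            "ORDER BY id DESC LIMIT ? OFFSET ?",
            (chunk_size, offset),
        ).fetchall()
        yield from rows
        if len(rows) < chunk_size:
            return  # a short (or empty) chunk means the results ran out
        offset += chunk_size


def first_visible(
    conn: sqlite3.Connection,
    is_visible: Callable[[Row], bool],
    count: int,
    chunk_size: int = 50,
) -> List[Row]:
    """Collect the first `count` visible entries, one chunk at a time."""
    found: List[Row] = []
    if count <= 0:
        return found
    for row in iter_chunked(conn, chunk_size):
        if is_visible(row):
            found.append(row)
            if len(found) >= count:
                break  # no further chunks are requested
    return found
```

With a 3300-entry table and a chunk size of 10, asking for five visible entries issues a single query for ten rows instead of pulling all 3300:

```python
conn = make_demo_db(3300)
latest = first_visible(conn, lambda row: row[2] == 0, count=5, chunk_size=10)
```

The tradeoff is the one mentioned above: each later chunk makes the database skip over the OFFSET rows again, so a real cursor would still be cheaper for deep pages.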

Also, while I was trying to diagnose the issue, I followed a red herring about changing the way that entry auth is stored, and ended up simplifying it into what I should have done to begin with. Unfortunately that required a schema change, which means sites need to be reindexed after upgrading. Fortunately that’s still a largely transparent operation.

Someday I need to do some pretty big overhauls to Publ. But for now I’m happy just keeping it going and making it better all the time.

Anyway, huge thanks to BearPerson from eevee’s Discord for having the curiosity and tenacity to find the actual performance issue.