<?xml version="1.0" encoding="utf-8"?>



<feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:fh="http://purl.org/syndication/history/1.0"
    xmlns:at="http://purl.org/atompub/tombstones/1.0">

    <title>Publ: Development Blog</title>
    <subtitle>A personal publishing system for the modern web</subtitle>
    <link href="http://publ.beesbuzz.biz/blog/feed?tag=performance" rel="self" />
    <link href="http://publ.beesbuzz.biz/blog/feed" rel="current" />
    <link href="https://busybee.superfeedr.com" rel="hub" />
    
    
    <link href="http://publ.beesbuzz.biz/blog/" />
    <fh:archive />
    <id>tag:publ.beesbuzz.biz,2020-01-07:blog</id>
    <updated>2024-10-02T02:13:22-07:00</updated>

    
    <entry>
        <title>Publ v0.7.31 released</title>
        <link href="http://publ.beesbuzz.biz/blog/569-v0.7.31-released" rel="alternate" type="text/html" />
        <published>2024-10-02T02:13:22-07:00</published>
        <updated>2024-10-02T02:13:22-07:00</updated>
        <id>urn:uuid:7d73cb31-dd0c-537b-9f47-07c45ba77692</id>
        <author><name>fluffy</name></author>
        <content type="html">
<![CDATA[
<p>There&rsquo;s a new release of Publ. There are no new features, but there&rsquo;s a <em>huge</em> performance improvement.</p><p>I&rsquo;d been having performance issues on my larger sites, where the main Atom feed was taking a long time to render. It didn&rsquo;t bother me too much, because thanks to aggressive caching it would only cause an occasional slow page load (on the order of a few seconds), but I assumed there was probably something wrong with the I/O characteristics of how pages render.</p><p>Boy <em>howdy</em> was I wrong about that.</p>

<p>The way that Publ handles count-based pagination is to look at the first eligible entry and then traverse forward through the query results until it finds the requested number of visible ones. There are a few other fancy things for handling the presence of unauthorized entries for various purposes (for example, <a href="http://publ.beesbuzz.biz/manual/api/150-View-object#has_unauthorized"><code>view.has_unauthorized</code></a> and being able to retrieve a limited number of unauthorized entries for friends-only stubs on feeds and so on), but the prevailing assumption was that PonyORM would simply page through a query result cursor each time the loop&rsquo;s iterator advanced.</p><p>Turns out that, no. No it doesn&rsquo;t.</p><p>In fact, Pony was retrieving <em>every</em> possible entry first, before Publ&rsquo;s auth filtering could run. So on <a href="https://beesbuzz.biz/">my main site</a>, for example, which has around 3300 entries, it was retrieving all 3300 entry rows just to render the Atom feed.</p><p>So, yeah, no wonder it was taking multiple seconds to render!</p><p>Anyway, in those situations Publ now fetches chunks of entries at a time, basically implementing its own ad-hoc cursor. 
It&rsquo;s not quite as efficient as a proper database cursor (since it now uses <code>LIMIT</code>/<code>OFFSET</code> queries), and it could still be much faster if I were using a database layer other than Pony (for example, switching to SQLAlchemy, or eschewing a database entirely as I&rsquo;ve <a href="http://publ.beesbuzz.biz/blog/274-Publ-v0.7.26-released">briefly discussed</a> and <a href="https://beesbuzz.biz/code/3635-Making-a-hash-of-data">rambled</a> about), but wow, this is a huge performance improvement overall.</p><p>Also, while I was trying to diagnose the issue, I went down a red-herring path of changing the way that entry auth is stored, and ended up simplifying it into the form I should have used to begin with. Unfortunately that required a schema change, which means sites need to be reindexed after upgrading. Fortunately that&rsquo;s still a largely transparent operation.</p><p>Someday I need to do some pretty big overhauls to Publ. But for now I&rsquo;m happy just keeping it going and making it better all the time.</p><p>Anyway, <strong><em>huge</em></strong> thanks to BearPerson from eevee&rsquo;s discord for having the curiosity and tenacity to find the <em>actual</em> performance issue.</p>
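<p>The chunked &ldquo;ad-hoc cursor&rdquo; approach can be sketched like this (a rough illustration with hypothetical names, not Publ&rsquo;s actual code): fetch fixed-size chunks and stop as soon as enough visible entries have been found, instead of materializing every row up front:</p>

```python
# Hypothetical sketch of the chunked "ad-hoc cursor" approach; the
# query_chunk callable stands in for a LIMIT/OFFSET-style database query.

def fetch_visible(query_chunk, is_visible, want, chunk_size=50):
    """Yield up to `want` visible entries, fetching the backing query
    in LIMIT/OFFSET-style chunks instead of retrieving every row."""
    offset = 0
    found = 0
    while found < want:
        chunk = query_chunk(limit=chunk_size, offset=offset)
        if not chunk:
            break  # ran out of rows entirely
        for entry in chunk:
            if is_visible(entry):
                yield entry
                found += 1
                if found == want:
                    return  # stop early: no need to touch later rows
        offset += chunk_size
```

<p>The win comes from the early return: a feed that only needs 10 visible entries touches a few chunks rather than all 3300 rows.</p>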

]]>
        </content>
    </entry>
    
    <entry>
        <title>SQLite vs. Postgres, at a glance</title>
        <link href="http://publ.beesbuzz.biz/blog/616-SQLite-vs.-Postgres-at-a-glance" rel="alternate" type="text/html" />
        <published>2021-05-05T11:28:50-07:00</published>
        <updated>2021-05-05T11:28:50-07:00</updated>
        <id>urn:uuid:bb6c33bc-50d5-587e-8d94-8820bb28c7e4</id>
        <author><name>fluffy</name></author>
        <content type="html">
<![CDATA[
<p>There&rsquo;s a general belief that SQLite is a &ldquo;slow&rdquo; database and Postgres is &ldquo;fast,&rdquo; and many software packages (including FOSS) insist that SQLite is only suitable for testing and doesn&rsquo;t scale. However, this doesn&rsquo;t make much sense when you think about it; SQLite is an in-process database, so there&rsquo;s no communication overhead between the service and the database, and because it&rsquo;s only designed to be accessed from a single process it can use optimistic locking to speed up transactions.</p><p>Since I was installing Postgres for another purpose on my webserver, I decided to quickly see whether Publ performs better on Postgres or SQLite. To test the performance, I timed both a full site reindex and several renders of the Atom feed for <a href="https://beesbuzz.biz/">my website</a> on each database (using the debug Flask server, with caching disabled).</p>

<p>Times are in seconds:</p>
<table>
<thead>
<tr>
<th>Database</th>
<th>Index</th>
<th>Atom feed</th>
</tr>
</thead>

<tbody>
<tr>
<td>SQLite</td>
<td>23.267</td>
<td>0.799</td>
</tr>
<tr>
<td>Postgres 12</td>
<td>26.132</td>
<td>1.270</td>
</tr>
</tbody>
</table>
<p>So, SQLite is, as I had assumed, substantially faster than Postgres, and it also has much lower administrative overhead. I will therefore continue to recommend it as the database of choice for traditionally-hosted deployments.</p><p>My belief is that, in general, if only a single process connects to the database (i.e. you don&rsquo;t have a cluster talking to one database instance), SQLite will perform better than Postgres. The reason to use Postgres is so that you can scale to multiple processes or servers talking to a single centralized data store; if you can structure your system so that each database is only ever accessed by a single process, SQLite is going to perform much better.</p><p>There are other considerations, of course, but if performance is your primary concern, SQLite isn&rsquo;t a bad way to go.</p>
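<p>For reference, a comparison like this can be reproduced with a trivial harness along these lines (a sketch; <code>reindex</code> and <code>render_feed</code> are placeholders for whatever operations you want to time on each backend):</p>

```python
import time

def time_it(label, fn, runs=5):
    """Run fn() several times and report the best wall-clock time.
    Taking the minimum reduces noise from caches warming up and from
    other processes on the machine."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    print(f"{label}: best of {runs} runs: {min(times):.3f}s")
    return min(times)

# Usage (hypothetical callables, one per database backend):
#   time_it("reindex", reindex, runs=1)
#   time_it("atom feed", render_feed)
```
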

]]>
        </content>
    </entry>
    
    <entry>
        <title>Publ 0.6.6, Authl 0.4.0</title>
        <link href="http://publ.beesbuzz.biz/blog/85-Publ-0.6.6-Authl-0.4.0" rel="alternate" type="text/html" />
        <published>2020-05-31T03:32:50-07:00</published>
        <updated>2020-05-31T03:32:50-07:00</updated>
        <id>urn:uuid:d31d1c24-6e50-5491-8b27-d9515e0db01a</id>
        <author><name>fluffy</name></author>
        <content type="html">
<![CDATA[
<p>I&rsquo;ve just released new versions of Publ and Authl.</p><p>Publ v0.6.6 changes:</p>
<ul>
<li>Fixed a regression that made it impossible to log out</li>
<li>Fixed a problem where <code>WWW-Authenticate</code> headers weren&rsquo;t being cached properly</li>
<li>Improved the changed-file cache-busting methodology</li>
<li>Added object pooling to Entry, Category, and View (for a potentially big memory and performance improvement)</li>
</ul>
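<p>The object pooling in that last item can be sketched roughly like this (an illustrative pattern, not Publ&rsquo;s actual implementation): a weak-valued pool keyed by record, so that repeated lookups of the same entry share one live instance without preventing garbage collection:</p>

```python
# Illustrative object-pooling pattern; names are hypothetical.
import weakref

class Entry:
    # Weak values: pooled instances are reclaimed once nothing else
    # references them, so the pool never pins memory
    _pool = weakref.WeakValueDictionary()

    def __new__(cls, record_id):
        # Reuse the live instance for this record if one exists, so
        # per-instance caches (rendered bodies, etc.) are shared
        cached = cls._pool.get(record_id)
        if cached is not None:
            return cached
        instance = super().__new__(cls)
        cls._pool[record_id] = instance
        return instance

    def __init__(self, record_id):
        self.record_id = record_id
```
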
<p>Authl v0.4.0 changes:</p>
<ul>
<li>Finally started to add unit tests</li>
<li>Removed some legacy WebFinger code that was no longer relevant and never actually exercised</li>
<li>Added a mechanism to allow providers to go directly to login, as appropriate</li>
<li>Added friendly visual icons for providers which support them (a so-called &ldquo;<a href="https://indieweb.org/NASCAR_problem">NASCAR interface</a>&rdquo;)</li>
</ul>


<h2 id="85_h2_1_Publ-0.6.6"><a href="http://publ.beesbuzz.biz/blog/85-Publ-0.6.6-Authl-0.4.0#85_h2_1_Publ-0.6.6"></a>Publ 0.6.6</h2><p>The main reason for this update is just that the embarrassing logout bug was rearing its head and I wanted to fix it on my site without monkeypatching it or temporarily moving to git head or whatever. The <code>WWW-Authenticate</code> fix is nice, though, as it&rsquo;s related to some work I&rsquo;m doing on Pushl (namely adding the ability to retrieve bearer tokens from an external helper program).</p><p>It&rsquo;s difficult to estimate what a performance change will be like based on testing on a developer desktop vs. a production VPS. In particular, the various I/O performance characteristics can vary a lot, and Publ is primarily I/O bound. In my desktop-side testing I found that the object pooling increased performance by 15%, which is already pretty great, but that&rsquo;s also on a machine with a lot of memory, a huge file cache, and no disk virtualization. I deployed Publ 0.6.6 on my personal website only around half an hour ago, but already my site monitoring is showing a <em>rather impressive</em> performance improvement. For example, the Atom feed used to take around 30 seconds to render on a cache miss. Right now it seems to take 2.5 seconds.</p><p>So, yeah, it takes only about 10% of the time to run now &ndash; that&rsquo;s around a <em>900% performance improvement</em> in a typical deployment scenario. So, that&rsquo;s pretty great.</p><p>Right now the largest remaining performance bottleneck seems to be in PonyORM, which is unfortunate. I haven&rsquo;t yet figured out whether it&rsquo;s with PonyORM itself, or with its interface to sqlite. 
From what I can tell, the way that trace profiling works in Python means that code which makes a lot of function calls looks quite a lot slower than long-running work within a single function, so things that do a lot of abstraction and dependency injection (like, say, PonyORM) get unfairly penalized in trace profiling. A sample-based profiling approach would be much fairer and more realistic, but I haven&rsquo;t found any sample-based Python profilers (and I don&rsquo;t know enough about Python&rsquo;s internals to know if that&rsquo;s even a possibility).</p><p>My short-term goals for Publ are otherwise unchanged since the <a href="http://publ.beesbuzz.biz/blog/467-Publ-v0.6.5">last release announcement</a>.</p><h2 id="85_h2_2_Authl-0.4.0"><a href="http://publ.beesbuzz.biz/blog/85-Publ-0.6.6-Authl-0.4.0#85_h2_2_Authl-0.4.0"></a>Authl 0.4.0</h2><p>I hadn&rsquo;t worked on Authl in quite some time, but I felt like it needed some attention.</p><p>These Authl changes are basically some UX improvements that had been bugging me for a while; there was an awful lot of text to read, which was possibly scary to newcomers. 
Now there&rsquo;s still just as much text to read but there&rsquo;s friendly icons for a bunch of the supported services, and silo services such as Twitter can now go straight to the login flow without implying that the username is necessary.</p><p>Here&rsquo;s a before and after on the default Flask template:</p><p><a href="http://publ.beesbuzz.biz/blog/85-Publ-0.6.6-Authl-0.4.0"><img src="http://publ.beesbuzz.biz/static/_img/7b/2a08/authl-0.3.6_08775e7819_320x153.png" width="320" height="153" srcset="http://publ.beesbuzz.biz/static/_img/7b/2a08/authl-0.3.6_08775e7819_320x153.png 1x, http://publ.beesbuzz.biz/static/_img/7b/2a08/authl-0.3.6_08775e7819_640x306.png 2x" loading="lazy" alt="authl-0.3.6.png" title="v0.3.6"></a><a href="http://publ.beesbuzz.biz/blog/85-Publ-0.6.6-Authl-0.4.0"><img src="http://publ.beesbuzz.biz/static/_img/b1/3d85/authl-0.4.0_5c04af48c1_320x160.png" width="320" height="160" srcset="http://publ.beesbuzz.biz/static/_img/b1/3d85/authl-0.4.0_5c04af48c1_320x160.png 1x, http://publ.beesbuzz.biz/static/_img/b1/3d85/authl-0.4.0_5c04af48c1_640x320.png 2x" loading="lazy" alt="authl-0.4.0.png" title="v0.4.0"></a></p><p>The next thing I want to work on for Authl is finally adding actual support for user profiles. This would also probably go along with things like adding more providers, particularly Facebook, Tumblr, and maybe even OpenID 1.x (i.e. Dreamwidth). 
Better profile support means having a friendlier greeting than just the canonical identity URL, among other things that people might want in their own federated login use cases.</p><h2 id="85_h2_3_Some-other-thoughts-of-things-th"><a href="http://publ.beesbuzz.biz/blog/85-Publ-0.6.6-Authl-0.4.0#85_h2_3_Some-other-thoughts-of-things-th"></a>Some other thoughts of things that would be neat</h2><p>Now that Publ supports <a href="http://publ.beesbuzz.biz/blog/738-Publ-v0.6.4-now-with-attachments">entry attachments</a>, it might be reasonable to add native server-side webmentions; rather than fetching the mentions from webmention.io on every page view, have a webhook on update that triggers a script that fetches and formats the mentions as an attachment that can then be rendered and cached, as well as getting all of the benefits of SEO that it would bring. For some sites, having the comments be indexed by the search engines makes a <em>huge</em> difference to page ranking, since the conversation about an article can add in some useful keywords that weren&rsquo;t in the actual article. (Not to mention it improves the page&rsquo;s &ldquo;freshness&rdquo; as far as the search engine is concerned.)</p><p>Another thought I&rsquo;ve had about attachments is they could be used to implement a server-side comment system, although that would require a <em>lot</em> more work than webmention rendering (UI, moderation/spam-filtering, migrating stuff <em>again</em>) and after all the work I put into my <a href="https://posativ.org/isso/">Isso</a> setup I&rsquo;m not quite ready to think about how to actually do that. I&rsquo;d probably want to do it in the form of having a mechanism to pre-render the Isso comment thread and form into an HTML attachment rather than having every part of it handled via Publ entry attachments.</p>
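<p>A minimal sketch of the formatting half of that webhook idea (hypothetical names; the fetch from webmention.io is omitted): turn a list of jf2-style mention dicts into an HTML fragment that a script could store as a cacheable attachment:</p>

```python
# Hypothetical sketch: format fetched webmentions as a cacheable HTML
# fragment. The mention dicts follow the general jf2 shape; the fetching
# step (e.g. from webmention.io) is deliberately left out.
import html

def render_mentions(mentions):
    """Render a list of jf2-style mention dicts as an HTML fragment."""
    items = []
    for m in mentions:
        author = html.escape(m.get("author", {}).get("name", "someone"))
        url = html.escape(m.get("url", ""))
        text = html.escape(m.get("content", {}).get("text", ""))
        items.append(f'<li><a href="{url}">{author}</a>: {text}</li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

<p>Since the fragment is plain static HTML, it gets all the caching and search-indexing benefits discussed above without any per-page-view fetching.</p>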

]]>
        </content>
    </entry>
    
    <entry>
        <title>Caching stats update</title>
        <link href="http://publ.beesbuzz.biz/blog/287-Caching-stats-update" rel="alternate" type="text/html" />
        <published>2020-02-05T13:23:28-08:00</published>
        <updated>2020-02-05T13:23:28-08:00</updated>
        <id>urn:uuid:39ff7921-798e-5013-a96e-7bfdb8ccf119</id>
        <author><name>fluffy</name></author>
        <content type="html">
<![CDATA[
<p>A few weeks ago I had discovered that <a href="http://publ.beesbuzz.biz/blog/304-v0.5.12-released-and-lots-of-documentation-fixes">caching wasn&rsquo;t actually being used most of the time</a>, and took some stats snapshots for future comparison.</p><p>Now that Publ has been running with correct caching for a while, let&rsquo;s see how things have changed!</p>

<h2 id="287_h2_1_Caveats"><a href="http://publ.beesbuzz.biz/blog/287-Caching-stats-update#287_h2_1_Caveats"></a>Caveats</h2><p>These stats are based on overall site usage, so they include both manual browsing and search crawlers, feed readers, and the like. Simply looking at the cache statistics doesn&rsquo;t paint a very clear picture of the actual performance improvements; in the stats, 10 users being able to quickly load a fresh blog entry will be far overshadowed by a single search engine spidering the entire website and thrashing the cache, but those 10 users are, to me, far more important.</p><h2 id="287_h2_2_Measurements"><a href="http://publ.beesbuzz.biz/blog/287-Caching-stats-update#287_h2_2_Measurements"></a>Measurements</h2><h3 id="287_h3_3_Throughput"><a href="http://publ.beesbuzz.biz/blog/287-Caching-stats-update#287_h3_3_Throughput"></a>Throughput</h3><p>Here&rsquo;s a measurement of how much traffic the cache actually sees:</p><p><a href="http://publ.beesbuzz.biz/blog/287-Caching-stats-update"><img src="http://publ.beesbuzz.biz/static/_img/4d/c3a4/memcached_bytes-week-20191231_dd3a3e801a_320x180.png" width="320" height="180" srcset="http://publ.beesbuzz.biz/static/_img/4d/c3a4/memcached_bytes-week-20191231_dd3a3e801a_320x180.png 1x, http://publ.beesbuzz.biz/static/_img/4d/c3a4/memcached_bytes-week-20191231_dd3a3e801a_640x360.png 2x" loading="lazy" alt="memcached_bytes-week-20191231.png" title="Cache throughput, December 31 2019"></a><a href="http://publ.beesbuzz.biz/blog/287-Caching-stats-update"><img src="http://publ.beesbuzz.biz/static/_img/d5/ab7a/memcached_bytes-week-20200205_d7b44703ac_320x180.png" width="320" height="180" srcset="http://publ.beesbuzz.biz/static/_img/d5/ab7a/memcached_bytes-week-20200205_d7b44703ac_320x180.png 1x, http://publ.beesbuzz.biz/static/_img/d5/ab7a/memcached_bytes-week-20200205_d7b44703ac_640x360.png 2x" loading="lazy" alt="memcached_bytes-week-20200205.png" title="Cache throughput, February 5 
2020"></a></p><p>The first graph shows that before I fixed the caching, very little was being written to the cache, but the amount being read from it was pretty steady. As soon as the fix was made and the cache was being written to, amazingly enough it started actually receiving traffic. In the initial spike of activity, the read and write rate were about the same, which seems plausible for a cache that&rsquo;s being filled in with a relatively low hit rate. There&rsquo;s a steady read rate of around 40K/second and a steady write rate of around 8K/sec &ndash; most of that being internal routines that were being written to the cache, uselessly.</p><p>The second graph (post-fix) shows a cache that&rsquo;s actually being actively used. There&rsquo;s an average write rate of 12K/sec, and a read rate of 17K/sec. There are also several write spikes at around 25K/sec, which I am suspecting are due to search crawler traffic.</p><h3 id="287_h3_4_Allocation"><a href="http://publ.beesbuzz.biz/blog/287-Caching-stats-update#287_h3_4_Allocation"></a>Allocation</h3><p>This is where things get a bit more useful to look at &ndash; how much stuff is actively being held in the cache?</p><p><a href="http://publ.beesbuzz.biz/blog/287-Caching-stats-update"><img src="http://publ.beesbuzz.biz/static/_img/50/5b4f/memcached_counters-week-20191231_98eea433c8_320x189.png" width="320" height="189" srcset="http://publ.beesbuzz.biz/static/_img/50/5b4f/memcached_counters-week-20191231_98eea433c8_320x189.png 1x, http://publ.beesbuzz.biz/static/_img/50/5b4f/memcached_counters-week-20191231_98eea433c8_640x377.png 2x" loading="lazy" alt="memcached_counters-week-20191231.png" title="Memory allocation, December 31 2019"></a><a href="http://publ.beesbuzz.biz/blog/287-Caching-stats-update"><img src="http://publ.beesbuzz.biz/static/_img/8a/414e/memcached_counters-week-20200205_6c17cef995_320x189.png" width="320" height="189" 
srcset="http://publ.beesbuzz.biz/static/_img/8a/414e/memcached_counters-week-20200205_6c17cef995_320x189.png 1x, http://publ.beesbuzz.biz/static/_img/8a/414e/memcached_counters-week-20200205_6c17cef995_640x377.png 2x" loading="lazy" alt="memcached_counters-week-20200205.png" title="Memory allocation, February 5 2020"></a></p><p>Before the cache fix, the answer to that was, &ldquo;Not much.&rdquo; The cache was averaging a size of a mere 868KB, and after I flipped the caching fix over, it jumped up considerably. During my testing of the fix, the size would spike up substantially and then drop down as cache items got evicted.</p><p>After the cache fix, the allocation went way up. It never went below 2MB, and during the write spikes it would jump up to 7MB or so. This is still far short of the 64MB I have allocated for the cache process.</p><h3 id="287_h3_5_Commands-results"><a href="http://publ.beesbuzz.biz/blog/287-Caching-stats-update#287_h3_5_Commands-results"></a>Commands/results</h3><p>Here&rsquo;s what is actually happening in terms of the cache hits and misses:</p><p><a href="http://publ.beesbuzz.biz/blog/287-Caching-stats-update"><img src="http://publ.beesbuzz.biz/static/_img/3e/0b68/memcached_rates-week-20191231_11ae089b24_320x202.png" width="320" height="202" srcset="http://publ.beesbuzz.biz/static/_img/3e/0b68/memcached_rates-week-20191231_11ae089b24_320x202.png 1x, http://publ.beesbuzz.biz/static/_img/3e/0b68/memcached_rates-week-20191231_11ae089b24_640x403.png 2x" loading="lazy" alt="memcached_rates-week-20191231.png" title="Commands, December 31 2019"></a><a href="http://publ.beesbuzz.biz/blog/287-Caching-stats-update"><img src="http://publ.beesbuzz.biz/static/_img/aa/32c5/memcached_rates-week-20200205_afb89652ea_320x202.png" width="320" height="202" srcset="http://publ.beesbuzz.biz/static/_img/aa/32c5/memcached_rates-week-20200205_afb89652ea_320x202.png 1x, 
http://publ.beesbuzz.biz/static/_img/aa/32c5/memcached_rates-week-20200205_afb89652ea_640x403.png 2x" loading="lazy" alt="memcached_rates-week-20200205.png" title="Commands, February 5 2020"></a></p><p>Before, the graph shows an average of 44 hits per second and 0.63 misses per second. The GET and SET rates are (unsurprisingly) more or less the same.</p><p>After, we see much more interesting patterns &ndash; and not in a good way. It&rsquo;s averaging only 13 hits per second and 0.8 misses per second, but those are only averages. Eyeballing the graph, it looks like the miss rate spikes at about the same time as the incoming traffic spikes, and outside of those spikes the hit rate is around 13 and the miss rate is&hellip; too small to reasonably estimate.</p><h3 id="287_h3_6_Page-load-time"><a href="http://publ.beesbuzz.biz/blog/287-Caching-stats-update#287_h3_6_Page-load-time"></a>Page load time</h3><p>When I made the change I also started monitoring the load time of a handful of URLs, which is <em>interesting</em>:</p><p><a href="http://publ.beesbuzz.biz/blog/287-Caching-stats-update"><img src="http://publ.beesbuzz.biz/static/_img/e1/2ff5/http_loadtime-week-20200205_2696cef180_320x197.png" width="320" height="197" srcset="http://publ.beesbuzz.biz/static/_img/e1/2ff5/http_loadtime-week-20200205_2696cef180_320x197.png 1x, http://publ.beesbuzz.biz/static/_img/e1/2ff5/http_loadtime-week-20200205_2696cef180_640x395.png 2x" loading="lazy" alt="http_loadtime-week-20200205.png" title="page load time"></a></p><p>What&rsquo;s interesting about these graphs is that Munin loads those URLs once every 5 minutes &ndash; which happens to be the cache timeout &ndash; and that does a lot to explain the rather chaotic nature of the load time graph, especially on the Atom feed (minimum of 113ms, maximum of 45 seconds, average of 12 seconds). The Atom feed is probably the most loadtime-intense page on my entire website, and would most strongly benefit from caching. 
This graph tells me that based on the average vs. max times, the Atom feed is getting a hit rate of around 25%. That isn&rsquo;t great.</p><h2 id="287_h2_7_Conclusions"><a href="http://publ.beesbuzz.biz/blog/287-Caching-stats-update#287_h2_7_Conclusions"></a>Conclusions</h2><p>Aggregate memcached stats aren&rsquo;t really that useful for determining cache performance at this scale.</p><p>More to the point, the cache <em>as currently configured</em> probably isn&rsquo;t really making much of a difference. Items are falling out of the cache before they&rsquo;re really being reused.</p><h2 id="287_h2_8_Next-steps"><a href="http://publ.beesbuzz.biz/blog/287-Caching-stats-update#287_h2_8_Next-steps"></a>Next steps</h2><p>It&rsquo;s worth noting that the default memcached expiry time is 5 minutes (which also happens to be how I had my sites configured), which feels like a good tradeoff between content staleness and performance optimization. However, Publ <a href="https://github.com/PlaidWeb/Publ/commit/6ae4ae5731da46027ced9f0ea381dad66e3584a4#diff-650397549bec3d65892e233d5bd328f6R113">soft-expires all cached items</a> when there&rsquo;s a content change, so the only things that should linger with a longer expiry time are things like the &ldquo;5 minutes ago&rdquo; human-readable times on entries, which really don&rsquo;t matter if they&rsquo;re outdated.</p><p>As an experiment I will try increasing the cache timeout to an hour on all of my sites and see what effect that has. My hypothesis is that the allocation size and hit rate will both go up substantially, and the average page load time will go <em>way</em> down, with (much smaller) hourly spikes and otherwise a very fast page load (except for when I&rsquo;m making content changes, of course).</p><p>I&rsquo;m also tempted to try setting the default expiry to 0 &ndash; as in, never expire, only evict &ndash; and see what effect that has on performance. 
I probably won&rsquo;t, though &ndash; it would have an odd effect on the display of humanized time intervals and make that way too nondeterministic for my taste.</p><h2 id="287_h2_9_Update-Initial-results"><a href="http://publ.beesbuzz.biz/blog/287-Caching-stats-update#287_h2_9_Update-Initial-results"></a><mark>Update:</mark> Initial results</h2><p>Even after just a few hours it becomes <em>pretty obvious</em> what effect this change had:</p><p><a href="http://publ.beesbuzz.biz/blog/287-Caching-stats-update"><img src="http://publ.beesbuzz.biz/static/_img/86/7de4/apache_processes-pinpoint=1580849415-1580957415_c6b236631a_320x193.png" width="320" height="193" srcset="http://publ.beesbuzz.biz/static/_img/86/7de4/apache_processes-pinpoint=1580849415-1580957415_c6b236631a_320x193.png 1x, http://publ.beesbuzz.biz/static/_img/86/7de4/apache_processes-pinpoint=1580849415-1580957415_c6b236631a_640x386.png 2x" loading="lazy" alt="apache_processes-pinpoint=1580849415,1580957415.png" title="Apache process counts"></a><a href="http://publ.beesbuzz.biz/blog/287-Caching-stats-update"><img src="http://publ.beesbuzz.biz/static/_img/07/92c4/http_loadtime-pinpoint=1580849415-1580957415_9500d1701e_320x197.png" width="320" height="197" srcset="http://publ.beesbuzz.biz/static/_img/07/92c4/http_loadtime-pinpoint=1580849415-1580957415_9500d1701e_320x197.png 1x, http://publ.beesbuzz.biz/static/_img/07/92c4/http_loadtime-pinpoint=1580849415-1580957415_9500d1701e_640x395.png 2x" loading="lazy" alt="http_loadtime-pinpoint=1580849415,1580957415.png" title="Page load time"></a><a href="http://publ.beesbuzz.biz/blog/287-Caching-stats-update"><img src="http://publ.beesbuzz.biz/static/_img/41/ada6/memcached_bytes-pinpoint=1580849415-1580957415_82bcce1c79_320x180.png" width="320" height="180" srcset="http://publ.beesbuzz.biz/static/_img/41/ada6/memcached_bytes-pinpoint=1580849415-1580957415_82bcce1c79_320x180.png 1x, 
http://publ.beesbuzz.biz/static/_img/41/ada6/memcached_bytes-pinpoint=1580849415-1580957415_82bcce1c79_640x360.png 2x" loading="lazy" alt="memcached_bytes-pinpoint=1580849415,1580957415.png" title="memcached throughput"></a></p><p>The actual effect is a bit surprising, though; I would have expected the quiescent RAM allocation to be closer to the peak, and for the incoming (<code>SET</code>) traffic to be spikier after that as well. I wonder if improved site performance caused a malfunctioning spider to stop hammering my site quite so much, or something. I do know there are a bunch of spiders that have historically been pretty aggressive.</p><p>Of course the most important metric &ndash; page load time &ndash; has ended up <em>exactly</em> as I expected, dropping to an average of 2ms for everything, and only being even that high because of the hourly spikes. I guess the fact that Munin is still seeing the spikes means that Munin is keeping my cache warm (for a handful of pages), so, thanks Munin!</p><p><a href="http://publ.beesbuzz.biz/blog/287-Caching-stats-update"><img src="http://publ.beesbuzz.biz/static/_img/38/811e/http_loadtime-pinpoint=1580929669-1580958694_5cb55b7b33_320x197.png" width="320" height="197" srcset="http://publ.beesbuzz.biz/static/_img/38/811e/http_loadtime-pinpoint=1580929669-1580958694_5cb55b7b33_320x197.png 1x, http://publ.beesbuzz.biz/static/_img/38/811e/http_loadtime-pinpoint=1580929669-1580958694_5cb55b7b33_640x395.png 2x" loading="lazy" alt="http_loadtime-pinpoint=1580929669,1580958694.png" title="Munin keeping the cache warm"></a></p><p>Maybe I should set the cache expiration to a prime number so that it is less likely to be touched on an exact 5-minute interval.</p>
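<p>The expiry tradeoff being tuned here can be illustrated with a toy TTL cache (a sketch, not the actual memcached setup): a longer <code>ttl</code> means more hits at the cost of staler content, which is why bumping the timeout from 5 minutes to an hour moved the needle so much:</p>

```python
# Toy TTL cache illustrating the expiry-time tradeoff; real deployments
# use memcached, but the hit/miss behavior is the same in principle.
import time

class TTLCache:
    def __init__(self, ttl):
        self.ttl = ttl
        self._store = {}

    def get(self, key, now=None):
        # `now` is injectable so the behavior can be tested deterministically
        now = time.monotonic() if now is None else now
        hit = self._store.get(key)
        if hit is None:
            return None
        value, stored_at = hit
        if now - stored_at > self.ttl:
            del self._store[key]  # expired: behaves like a miss
            return None
        return value

    def set(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self._store[key] = (value, now)
```
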

]]>
        </content>
    </entry>
    
    <entry>
        <title>Publ v0.5.14 released!</title>
        <link href="http://publ.beesbuzz.biz/blog/273-Publ-v0.5.14-released" rel="alternate" type="text/html" />
        <published>2020-02-04T17:40:14-08:00</published>
        <updated>2020-02-04T17:40:14-08:00</updated>
        <id>urn:uuid:88837d92-096d-56c5-a963-f5d39a480788</id>
        <author><name>fluffy</name></author>
        <content type="html">
<![CDATA[
<p>Today I released v0.5.14 of Publ, which has a bunch of improvements:</p>
<ul>
<li>Fixed a bug in card retrieval when there&rsquo;s no summary</li>
<li>Admin panel works again</li>
<li>Markdown entry headings now get individual permalinks (the presentation of which can be templated)</li>
<li>Markdown entry headings can be extracted into an outline to be used for a <a href="http://publ.beesbuzz.biz/manual/api/115-Entry-objects#toc">table of contents</a></li>
<li>Lots of performance improvements around ToC and footnote extraction, and template API functions in general</li>
</ul>


<h2 id="273_h2_1_Entry-headings"><a href="http://publ.beesbuzz.biz/blog/273-Publ-v0.5.14-released#273_h2_1_Entry-headings"></a>Entry headings</h2><p>Because of the new entry headings, by default all headings will get an empty <code>&lt;a href&gt;</code> at the beginning; this is intended as a styling hook so that you can use stylesheet rules to add a visible anchor/link to them.</p><p>The format of headings is also templatizable, by passing a <code>heading_template</code> argument to the <code>entry.body</code>/<code>more</code> property-function things; the default value is <code>&quot;{link}&lt;/a&gt;{text}&quot;</code> but you can do, for example, <code>&quot;{link}{text}&lt;/a&gt;&quot;</code> if you want to put the entire heading into a link (although this means that having links within your headings becomes undefined), or <code>&quot;{text}&quot;</code> if you want the old behavior, or <code>&quot;{text}{link}#&lt;/a&gt;&quot;</code> if you want the anchor marker to be part of the text data (rather than styled via CSS), or whatever. Part of why I opted for the current default is that it seemed to be maximally-useful while minimally-intrusive as far as changing existing layouts.</p><p>The <code>&quot;{link}&quot;</code> template fragment can also take some further configuration; if you just want to set its CSS class name, you can do that with the <code>heading_link_class</code> configuration, and you can also set any other arbitrary HTML attributes by passing a dict to <code>heading_link_config</code>. 
For example, if you want them to all have a <code>title=&quot;permalink&quot;</code> you can pass the value <code>{&quot;title&quot;:&quot;permalink&quot;}</code> for that configuration value.</p><p>Currently there is no way to have those vary across the links, however; a more robust configuration mechanism (that can perhaps take in functions or format strings or the like) is certainly possible but it felt out-of-scope for this feature.</p><h2 id="273_h2_2_Tables-of-Contents"><a href="http://publ.beesbuzz.biz/blog/273-Publ-v0.5.14-released#273_h2_2_Tables-of-Contents"></a>Tables of Contents</h2><p>I was originally just building the heading permalink generation as a standalone feature, but then I realized that while I was doing that I might as well provide tables of contents, too. My original plan for ToCs was to make use of Misaka&rsquo;s built-in ToC formatter, but getting that to work alongside Publ seemed pretty challenging, especially since Publ has the multiple document section stuff that Misaka wasn&rsquo;t intended to work with. (Incidentally, I went through similar things with the built-in footnotes stuff.)</p><p>A ToC is accessible simply by looking at <code>entry.toc</code>. While it takes all of the usual HTML formatting arguments, none of them really have any effect (aside from disabling smartquotes and enabling XHTML). It does add its own argument, <code>max_depth</code>, which chooses how many levels of the ToC to show. This is relative to the highest level in the entry, so you can continue to use whatever top heading level you prefer.</p><h2 id="273_h2_3_Performance-improvements"><a href="http://publ.beesbuzz.biz/blog/273-Publ-v0.5.14-released#273_h2_3_Performance-improvements"></a>Performance improvements</h2><p>So, footnotes had a bit of a performance impact in that rendering them out also requires rendering the entire entry multiple times, which can add up a lot. 
This came down to part of how Publ allows you to use template functions as if they are properties, using a too-clever wrapper called <code>CallableProxy</code>. Put briefly, this is an object that wraps a function in such a way that if you use the function directly in a Jinja context, it gets treated as if you&rsquo;re using the default return value instead. Unfortunately, there are various things at various layers of Flask that make it end up calling the function multiple times, which can be really slow &mdash; especially if the function, say, runs Markdown formatting on the entire entry.</p><p>A long time ago I had <code>CallableProxy</code> set up such that it would cache the return value of the default call, but this had other implications when I started supporting HTTPS and user login and so on. Depending on how objects got pooled and cached this could cause the wrong content to be displayed to the end user &mdash; definitely not a good thing! At the very least this would often cause bugs where outgoing links would flip-flop between being <code>http://</code> or <code>https://</code> or using the wrong hostname or the like, and in the worst case this could theoretically cause page content to display for someone who wasn&rsquo;t authorized to see it, or vice-versa (although I don&rsquo;t believe there&rsquo;s an actual way of causing this). 
But in any case, it was bad.</p><p>So, what I ended up doing was instead of naïvely caching the default function return, I wrapped it behind <a href="https://docs.python.org/3/library/functools.html#functools.lru_cache"><code>@functools.lru_cache</code></a> and made the various aspects that make these functions non-idempotent part of the cache key; for now this is just the request URL and the current user.</p><p>This cut down on a lot of chaff, but there was still a long ways to go!</p><p>Both tables of contents and footnotes have global numberings, meaning rendering <code>entry.more</code> gets affected by how many of those things live in <code>entry.body</code>. There were some pretty wonky ways I was trying to keep track of that stuff, but generally-speaking this meant that rendering <code>entry.more</code> also required rendering (and discarding) <code>entry.body</code>. This could also end up rendering a whole bunch of extra images, too, if the image configuration between <code>body</code> and <code>more</code> doesn&rsquo;t match. There were also some attempts at caching the various fragments&#39; buffers, but this got unwieldy and unpleasant.</p><p>So, what does it do instead now?</p><p>First, there&rsquo;s a faster counting-only Markdown processor that will only count the number of footnotes and headings, which lets us do some cute optimizations especially on <code>entry.more</code>.</p><p>Next, any time a count of footnotes or headings comes about naturally based on some other bit of processing, that information gets cached for later.</p><p>This has cut down on the amount of rendering that has to take place. 
There&rsquo;s still some redundancies (for example, it still has to render out all of the entry content even when it&rsquo;s only trying to extract the footnotes or table of contents) but this at least cuts down on the amount of stuff that has to happen.</p><p>Combined with the <code>CallableProxy</code> optimization this means it&rsquo;s also doing <em>way</em> less work when you&rsquo;re simply checking to see if an entry has a TOC or footnotes, such as when doing:</p><figure class="blockcode"><pre class="highlight" data-language="html" data-line-numbers><span class="line" id="e273cb1L1"><a class="line-number" href="http://publ.beesbuzz.biz/blog/273-Publ-v0.5.14-released#e273cb1L1"></a><span class="line-content">{% if entry.toc %}</span></span>
<span class="line" id="e273cb1L2"><a class="line-number" href="http://publ.beesbuzz.biz/blog/273-Publ-v0.5.14-released#e273cb1L2"></a><span class="line-content"><span class="p">&lt;</span><span class="nt">nav</span> <span class="na">id</span><span class="o">=</span><span class="s">&quot;toc&quot;</span><span class="p">&gt;&lt;</span><span class="nt">h2</span><span class="p">&gt;</span>Table of Contents<span class="p">&lt;/</span><span class="nt">h2</span><span class="p">&gt;</span>{{entry.toc}}<span class="p">&lt;/</span><span class="nt">nav</span><span class="p">&gt;</span></span></span>
<span class="line" id="e273cb1L3"><a class="line-number" href="http://publ.beesbuzz.biz/blog/273-Publ-v0.5.14-released#e273cb1L3"></a><span class="line-content">{% endif %}</span></span>
</pre></figure><p>This still unfortunately will be doing extraneous rendering &mdash; including image renditions &mdash; as will the similar <code>entry.footnotes</code> code, but still, a 3x improvement is a lot better, even if it&rsquo;s not ideal.</p><p>An obvious next step would be to make a headings-only renderer for the table of contents. This won&rsquo;t help with footnotes, unfortunately, and there&rsquo;s no reasonable way to prevent footnotes from rendering images that are outside of footnotes (and it still needs to be able to render images for ones that <em>are</em> in footnotes), but still, a partial improvement is still better than <em>no</em> improvement.</p><h2 id="273_h2_4_So-why-is-this-still-v0.5.x"><a href="http://publ.beesbuzz.biz/blog/273-Publ-v0.5.14-released#273_h2_4_So-why-is-this-still-v0.5.x"></a>So why is this still v0.5.x?</h2><p>At some point I decided that my versioning scheme would be based on &ldquo;milestones,&rdquo; and for v0.6 my milestone is having automated testing and unit coverage set up. This seems to be kind of foolish; what I&rsquo;m doing isn&rsquo;t quite <a href="https://semver.org/">semantic versioning</a>, which says that versioning should be based on <code>major.minor.patch</code> where <code>major</code> is for backwards-incompatible changes, <code>minor</code> is for added functionality, and <code>patch</code> is for bug fixes. But if I&rsquo;d been releasing Publ with that scheme, we&rsquo;d be on, like, v2.193.0 by now or something.</p><p>Fortunately, semver does have <a href="https://semver.org/#spec-item-4">a provision for in-development software</a>:</p>
<blockquote>
<p>Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable.</p></blockquote>
<p>so I think I&rsquo;m still following the spirit of semantic versioning for now. If I ever decide to release 1.0 I&rsquo;ll have to re-evaluate my version numbering though!</p><h2 id="273_h2_5_Errata"><a href="http://publ.beesbuzz.biz/blog/273-Publ-v0.5.14-released#273_h2_5_Errata"></a>Errata</h2><p>There is an issue with watchdog v0.10.0 which causes problems with Pipenv or other lockfile-based deployment mechanisms. If you are developing on macOS and deploying on Linux, I highly recommend pinning the version with:</p><figure class="blockcode"><pre class="highlight" data-language="bash" data-line-numbers><span class="line" id="e273cb2L1"><a class="line-number" href="http://publ.beesbuzz.biz/blog/273-Publ-v0.5.14-released#e273cb2L1"></a><span class="line-content">pipenv<span class="w"> </span>install<span class="w"> </span><span class="nv">watchdog</span><span class="o">==</span><span class="m">0</span>.9.0</span></span>
<span class="line" id="e273cb2L2"><a class="line-number" href="http://publ.beesbuzz.biz/blog/273-Publ-v0.5.14-released#e273cb2L2"></a><span class="line-content">pipenv<span class="w"> </span>clean</span></span>
</pre></figure><p>See <a href="http://publ.beesbuzz.biz/faq#watchdog090">the FAQ</a> for more information.</p>
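<p>For illustration, here&rsquo;s a rough sketch of the <code>CallableProxy</code> idea described above (this is not Publ&rsquo;s actual implementation; the names and the exact shape of the cache key are assumptions):</p>

```python
import functools

def expensive_render():
    """Stand-in for rendering an entry's Markdown (the slow part)."""
    expensive_render.calls += 1
    return "rendered output"
expensive_render.calls = 0

class CallableProxy:
    """Simplified sketch: wraps a function so a template can use it both
    as a plain value (str via {{entry.body}}) and as a call (entry.body(...))."""
    def __init__(self, func, cache_key=lambda: None):
        self._func = func
        self._cache_key = cache_key
        # Memoize the default call, keyed on the context bits that can
        # change the output; per the description above, that's the
        # request URL and the current user
        self._default = functools.lru_cache(maxsize=64)(lambda key: func())

    def __call__(self, *args, **kwargs):
        # Explicit calls always run the wrapped function
        return self._func(*args, **kwargs)

    def __str__(self):
        return self._default(self._cache_key())

body = CallableProxy(expensive_render,
                     cache_key=lambda: ("https://example.com/entry/1", "alice"))
str(body); str(body); str(body)   # only the first use actually renders
```

<p>The point is that repeated default uses in a template hit the memoized value (keyed on URL and user, so different viewers never see each other&rsquo;s content), while explicit calls with arguments still run the function.</p>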

]]>
        </content>
    </entry>
    
    <entry>
        <title>v0.5.12 released, and lots of documentation fixes</title>
        <link href="http://publ.beesbuzz.biz/blog/304-v0.5.12-released-and-lots-of-documentation-fixes" rel="alternate" type="text/html" />
        <published>2019-12-31T00:02:13-08:00</published>
        <updated>2019-12-31T00:02:13-08:00</updated>
        <id>urn:uuid:2a1ac309-6a63-58a3-af88-c284086ec640</id>
        <author><name>fluffy</name></author>
        <content type="html">
<![CDATA[
<h2 id="304_h2_1_Release-notes"><a href="http://publ.beesbuzz.biz/blog/304-v0.5.12-released-and-lots-of-documentation-fixes#304_h2_1_Release-notes"></a>Release notes</h2><p>Today I got a fire lit under me and decided to do a bunch of bug fixing and general performance improvements.</p><p>Changes since v0.5.11:</p>
<ul>
<li>Fixed a micro-optimization which was causing some pretty bad cache problems (I really should write a blog entry about this but tl;dr micro-optimizations are usually bugs in disguise)</li>
<li>Fixed an issue which was causing the page render cache to not actually activate most of the time (you <em>know</em> there&rsquo;s going to be a ramble about this below&hellip;)</li>
<li>Fixed a bunch of spurious log messages about nested transactions</li>
<li>Refactored the way that <code>markup=False</code> works, making it available from all Markdown/HTML contexts</li>
<li>Changed <code>no_smartquotes=True</code> to <code>smartquotes=False</code> (<code>no_smartquotes</code> is retained for template compatibility) (although I missed this on <code>entry.title</code>; I&rsquo;ve already <a href="https://github.com/PlaidWeb/Publ/commit/004fb47a3c53830081579e6ae5c1133f1ca2581e">committed a fix</a> for the next version)</li>
<li>Improve the way that the page render cache interacts with templates</li>
<li>Fixed an issue where changing a template might cause issues to occur until the cache expires</li>
</ul>
<h2 id="304_h2_2_Documentation-improvements"><a href="http://publ.beesbuzz.biz/blog/304-v0.5.12-released-and-lots-of-documentation-fixes#304_h2_2_Documentation-improvements"></a>Documentation improvements</h2>
<ul>
<li>The <a href="http://publ.beesbuzz.biz/manual/deploying/1278-Self-hosting-Publ">Apache/nginx deployment guide</a> is vastly improved:

<ul>
<li>Now it uses UNIX domain sockets instead of localhost ports, making service provisioning a bit easier</li>
<li>The systemd unit is now a user unit instead of a system unit, which improves security and also allows for gentler service restarts</li>
</ul></li>
<li>The <a href="http://publ.beesbuzz.biz/manual/deploying/441-Continuous-deployment-with-git">git deployment guide</a> has been updated per the above, and also some of the code snippets are cleaned up</li>
<li>The information about <a href="http://publ.beesbuzz.biz/html-processing">HTML processing</a> and <a href="http://publ.beesbuzz.biz/image-renditions">image renditions</a> has been consolidated and cleaned up</li>
<li>Information about <a href="http://publ.beesbuzz.biz/manual/706-User-authentication">private posts</a> and <a href="http://publ.beesbuzz.biz/manual/formats/1341-User-configuration-file">user configuration</a> has also been cleaned up somewhat</li>
<li>Also lots of updates to the <a href="https://github.com/PlaidWeb/Publ-templates-beesbuzz.biz/">beesbuzz.biz Publ templates</a></li>
</ul>


<h2 id="304_h2_3_The-caching-stuff"><a href="http://publ.beesbuzz.biz/blog/304-v0.5.12-released-and-lots-of-documentation-fixes#304_h2_3_The-caching-stuff"></a>The caching stuff</h2><p>So, once upon a time, the page render cache was caching at the response level, rather than the render level, which seemed like a good idea at the time. But then I realized this was bad: if the request came from a browser that could potentially receive a <a href="https://httpstatuses.com/304">not modified response</a>, response-level caching would break things badly. So, in that situation it just turned the render cache off.</p><p>This of course had the silly side effect of making the rendition cache not active in precisely the situation when it should most be active!</p><p>Later I refactored the rendition cache to cache at the render level, so that the request routing and response (which are cheap) were always evaluated and only the page render itself was cached. But I forgot to remove the check above.</p><p>So, all this time, the caching system was only being used for caching&hellip; stuff that didn&rsquo;t really benefit from being cached. Like low-level file lookups, which aren&rsquo;t exactly a performance hog (and could lead to rather unfortunate issues with template locations being out-of-date until cache expiry took place).</p><p>Anyway, after getting the cache to actually work, I also realized there were a few things I could do to make stale cached renditions no longer linger. 
Previously, the cache key that&rsquo;s generated for a rendition just involved (essentially) the file paths of the relevant items in the URL; category templates would know about the template&rsquo;s file path and the category path, and entry templates would additionally know about the entry ID, and then at a global level it would also know about the request&rsquo;s base URL (so it would cache different hostnames and schemes differently, which also had the nice side-effect of eliminating key conflicts if two sites were configured with the same memcached key prefix but I digress).</p><p>Well, first I realized it was pretty trivial to have entries and templates express their file fingerprint as part of their cache key, so changes to templates and entries would cause immediate cache misses &ndash; meaning instant updates on the next page load. But this would only apply to content updates on entry pages, not on category pages.</p><p>So I started to go down a rabbit hole where updates to entries would also update the cache key for the category itself, which caused indexing to take a lot more time and also required storing metadata about <em>all</em> categories (and not just ones with configuration metadata) in the database, and this had a few other annoying side-effects (meaning bugs) that had to be ironed out. And it still wouldn&rsquo;t help to update category pages which change due to an update to an entry in a different category.</p><p>Then I realized that the easiest thing to do would be to have the latest file modification be part of the cache key; any content file update would then basically invalidate the entire page render cache. Given that most sites only update very infrequently this seemed like a nice tradeoff. So I started implementing that&hellip;</p><p>&hellip;and then realized that in the early days of me adding caching to Publ, <em>I had already implemented that</em> since I thought it would be useful, and it was just not being used at all! 
(And I had even touched this code when I was adding mypy annotations to everything, but didn&rsquo;t even think about it&hellip;)</p><p>So, a bit of functionality that had been there all along now theoretically makes the rendition cache a lot faster, even around site resets. Neat.</p><p>In any case, after all this work I decided to do some benchmarking. I used <a href="https://github.com/Gabriel439/bench"><code>bench</code></a> to time rendering the Publ tests index page, and the results were interesting:</p>
<ul>
<li><p>No cache</p><figure class="blockcode"><pre><span class="line"><span class="line-content">time                 90.92 ms   (85.06 ms .. 97.29 ms)</span></span>
<span class="line"><span class="line-content">                     0.993 R²   (0.985 R² .. 1.000 R²)</span></span>
<span class="line"><span class="line-content">mean                 87.33 ms   (86.27 ms .. 90.70 ms)</span></span>
<span class="line"><span class="line-content">std dev              2.868 ms   (968.0 μs .. 4.970 ms)</span></span>
</pre></figure></li>
<li><p>SimpleCache (in-process object store)</p><figure class="blockcode"><pre><span class="line"><span class="line-content">time                 37.22 ms   (36.19 ms .. 38.11 ms)</span></span>
<span class="line"><span class="line-content">                     0.999 R²   (0.998 R² .. 1.000 R²)</span></span>
<span class="line"><span class="line-content">mean                 38.10 ms   (37.39 ms .. 40.58 ms)</span></span>
<span class="line"><span class="line-content">std dev              2.433 ms   (469.3 μs .. 4.620 ms)</span></span>
<span class="line"><span class="line-content">variance introduced by outliers: 19% (moderately inflated)</span></span>
</pre></figure></li>
<li><p>MemcacheD</p><figure class="blockcode"><pre><span class="line"><span class="line-content">time                 38.38 ms   (37.95 ms .. 39.06 ms)</span></span>
<span class="line"><span class="line-content">                     0.999 R²   (0.999 R² .. 1.000 R²)</span></span>
<span class="line"><span class="line-content">mean                 38.21 ms   (37.92 ms .. 38.51 ms)</span></span>
<span class="line"><span class="line-content">std dev              570.3 μs   (428.0 μs .. 762.1 μs)</span></span>
</pre></figure></li>
</ul>
<p>So, at least on that fairly simple test, the tests index page runs about 2x faster with a cache present than without. (MemcacheD is a little slower than SimpleCache, but that&rsquo;s to be expected, as it has to serialize/deserialize objects over the network. Frankly I&rsquo;m surprised it&rsquo;s only that small of a difference!)</p><p>Then I decided to benchmark the main page of <a href="https://beesbuzz.biz/">my personal website</a>, which is rather more complicated. Running locally I got these results:</p>
<ul>
<li><p>No cache</p><figure class="blockcode"><pre><span class="line"><span class="line-content">time                 280.0 ms   (274.8 ms .. 284.9 ms)</span></span>
<span class="line"><span class="line-content">                     1.000 R²   (0.999 R² .. 1.000 R²)</span></span>
<span class="line"><span class="line-content">mean                 278.0 ms   (277.1 ms .. 279.7 ms)</span></span>
<span class="line"><span class="line-content">std dev              1.548 ms   (749.5 μs .. 2.023 ms)</span></span>
<span class="line"><span class="line-content">variance introduced by outliers: 16% (moderately inflated)</span></span>
</pre></figure></li>
<li><p>SimpleCache</p><figure class="blockcode"><pre><span class="line"><span class="line-content">time                 19.32 ms   (19.19 ms .. 19.42 ms)</span></span>
<span class="line"><span class="line-content">                     1.000 R²   (1.000 R² .. 1.000 R²)</span></span>
<span class="line"><span class="line-content">mean                 19.28 ms   (19.21 ms .. 19.38 ms)</span></span>
<span class="line"><span class="line-content">std dev              201.5 μs   (138.9 μs .. 289.7 μs)</span></span>
</pre></figure></li>
<li><p>MemcacheD</p><figure class="blockcode"><pre><span class="line"><span class="line-content">time                 20.85 ms   (20.62 ms .. 21.13 ms)</span></span>
<span class="line"><span class="line-content">                     0.999 R²   (0.998 R² .. 1.000 R²)</span></span>
<span class="line"><span class="line-content">mean                 20.57 ms   (20.44 ms .. 20.74 ms)</span></span>
<span class="line"><span class="line-content">std dev              341.0 μs   (254.2 μs .. 511.7 μs)</span></span>
</pre></figure></li>
</ul>
<p>So, yeah, 14x faster&hellip; And my site feels way more responsive now, too, at least when Pushl isn&rsquo;t thrashing the cache.</p><p>Time will tell just how much of a difference this makes in practical terms; I&rsquo;ve had <a href="http://munin-monitoring.org/">munin</a> monitoring my MemcacheD for a while and the graphs made it look like it was pretty effective but it was of course not actually monitoring anything useful. But here&rsquo;s some graphs of the last week:</p><p><a href="http://publ.beesbuzz.biz/blog/304-v0.5.12-released-and-lots-of-documentation-fixes"><img src="http://publ.beesbuzz.biz/static/_img/4d/c3a4/memcached_bytes-week-20191231_dd3a3e801a_320x180.png" width="320" height="180" srcset="http://publ.beesbuzz.biz/static/_img/4d/c3a4/memcached_bytes-week-20191231_dd3a3e801a_320x180.png 1x, http://publ.beesbuzz.biz/static/_img/4d/c3a4/memcached_bytes-week-20191231_dd3a3e801a_640x360.png 2x" loading="lazy" alt="memcached_bytes-week-20191231.png" title="MemcacheD bytes"></a><a href="http://publ.beesbuzz.biz/blog/304-v0.5.12-released-and-lots-of-documentation-fixes"><img src="http://publ.beesbuzz.biz/static/_img/50/5b4f/memcached_counters-week-20191231_98eea433c8_320x189.png" width="320" height="189" srcset="http://publ.beesbuzz.biz/static/_img/50/5b4f/memcached_counters-week-20191231_98eea433c8_320x189.png 1x, http://publ.beesbuzz.biz/static/_img/50/5b4f/memcached_counters-week-20191231_98eea433c8_640x377.png 2x" loading="lazy" alt="memcached_counters-week-20191231.png" title="MemcacheD counters"></a><a href="http://publ.beesbuzz.biz/blog/304-v0.5.12-released-and-lots-of-documentation-fixes"><img src="http://publ.beesbuzz.biz/static/_img/3e/0b68/memcached_rates-week-20191231_11ae089b24_320x202.png" width="320" height="202" srcset="http://publ.beesbuzz.biz/static/_img/3e/0b68/memcached_rates-week-20191231_11ae089b24_320x202.png 1x, http://publ.beesbuzz.biz/static/_img/3e/0b68/memcached_rates-week-20191231_11ae089b24_640x403.png 2x" loading="lazy" 
alt="memcached_rates-week-20191231.png" title="MemcacheD rates"></a></p><p>In a week or so I&rsquo;ll see what they&rsquo;re like and if there&rsquo;s any difference. I&rsquo;m also just realizing that my &ldquo;HTTP load time&rdquo; graph isn&rsquo;t actually very useful so I need to configure Munin more appropriately.</p><p>I&rsquo;m also not entirely sure what those semi-regular spikes in MemcacheD traffic have been; it&rsquo;s unfortunately not easy to tell what individual things are using MemcacheD since it&rsquo;s just a big ol&#39; global key-value store, more or less.</p>
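<p>The cache-key scheme described above might be sketched like so (hypothetical helper names, and Publ&rsquo;s real keys involve more state, such as the category path and entry ID):</p>

```python
import hashlib
import os

def fingerprint(path):
    """Cheap file fingerprint: size plus mtime, so edits change the key."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)

def render_cache_key(base_url, template_path, content_paths):
    """Build a cache key from the request's base URL, the template's own
    fingerprint, and the newest modification time across all content
    files; any content update then invalidates every cached render."""
    latest_change = max(os.stat(p).st_mtime_ns for p in content_paths)
    parts = (base_url, template_path, fingerprint(template_path),
             latest_change)
    return hashlib.sha1(repr(parts).encode()).hexdigest()
```

<p>Keying on the base URL is what caches different hostnames and schemes separately (and avoids key conflicts between two sites sharing a memcached key prefix), and the newest-modification term gives the &ldquo;any content file update invalidates the whole page render cache&rdquo; behavior.</p>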

]]>
        </content>
    </entry>
    
    <entry>
        <title>Goodbye peewee, hello PonyORM</title>
        <link href="http://publ.beesbuzz.biz/blog/1080-Goodbye-peewee-hello-PonyORM" rel="alternate" type="text/html" />
        <published>2018-09-19T02:27:21-07:00</published>
        <updated>2018-09-19T02:27:21-07:00</updated>
        <id>urn:uuid:26ccac61-8792-54c6-8681-eb173adee58c</id>
        <author><name>fluffy</name></author>
        <content type="html">
<![CDATA[
<p>For a number of reasons, I have replaced the backing ORM. Previously I was using peewee, but now I&rsquo;m using <a href="http://ponyorm.com">PonyORM</a>. The primary reason for this is purely ideological; I do not want to use software which is maintained by someone with a track record of toxic behavior.  peewee&rsquo;s maintainer responds to issues and feature requests with shouting and dismissive snark; PonyORM&rsquo;s maintainer responds with helpfulness and grace. I am a <a href="//beesbuzz.biz/7502">strong proponent of the latter</a>.</p><p>PonyORM&rsquo;s API is also significantly more Pythonic, and rather than abusing operator overloads for clever query building purposes, it abuses Python&rsquo;s AST functionality to parse <em>actual Python expressions</em> into SQL queries. Seriously, <a href="https://stackoverflow.com/questions/16115713/how-pony-orm-does-its-tricks">look at this explanation of it</a> and tell me that isn&rsquo;t just <em>amazing</em>.</p>

<p>There are a few downsides to Pony so far, though:</p>
<ul>
<li><p>While it&rsquo;s possible to adapt arbitrary types into database fields, queries don&rsquo;t actually work on them (so at least for Enums I have to convert at query time, which turns out to not be a huge deal)</p></li>
<li><p>There&rsquo;s no simple way to incrementally build a query with an OR branch in it (which I don&rsquo;t actually use anywhere at present but I did have to rework some query API stuff to do that)</p></li>
<li><p>Not really a downside but Pony treats <code>&#39;&#39;</code> and <code>NULL</code> as equivalent, which has some fun implications for storing empty strings in a table</p><p>Of course, SQLite does this too, internally, and my existing code for that case wasn&rsquo;t actually &ldquo;correct&rdquo; (but it happened to work with SQLite anyway). So moving to Pony meant I had to make this <em>actually correct</em> which, on the plus side, means that Publ is more likely to work with MySQL or Postgres (which I haven&rsquo;t tested yet)</p></li>
</ul>
<p>In addition to PonyORM I evaluated a few other options; my other front-runner was to simply store all of the data in in-memory tables and use <code>sorted([e for e in model.Entry where e.foo &gt; bar])</code> or whatever. Which was a gigantic pain to think about. Granted, a lot of what made it painful is stuff I had to do in order to support Pony as well (namely the switch from a query-building syntax to incremental list comprehensions), but the Pony approach happens to also be way more efficient since it can use indexes and also does all the filtering at once and so on.</p><p>Anyway, I&rsquo;m rambling here. How about we look at some quick benchmarks to see if this hurts performance! All these timings are based on building <a href="http://beesbuzz.biz">beesbuzz.biz</a>, which is getting to be a reasonably-large site at this point. These timings are based on simply running it locally on my desktop.</p><p>For the index scan I ran a simple Python script that looks like:</p><figure class="blockcode"><pre class="highlight" data-language="python" data-line-numbers><span class="line" id="e1080cb1L1"><a class="line-number" href="http://publ.beesbuzz.biz/blog/1080-Goodbye-peewee-hello-PonyORM#e1080cb1L1"></a><span class="line-content"><span class="kn">from</span><span class="w"> </span><span class="nn">main</span><span class="w"> </span><span class="kn">import</span> <span class="n">app</span></span></span>
<span class="line" id="e1080cb1L2"><a class="line-number" href="http://publ.beesbuzz.biz/blog/1080-Goodbye-peewee-hello-PonyORM#e1080cb1L2"></a><span class="line-content"><span class="kn">from</span><span class="w"> </span><span class="nn">publ</span><span class="w"> </span><span class="kn">import</span> <span class="n">model</span></span></span>
<span class="line" id="e1080cb1L3"><a class="line-number" href="http://publ.beesbuzz.biz/blog/1080-Goodbye-peewee-hello-PonyORM#e1080cb1L3"></a><span class="line-content"><span class="n">model</span><span class="o">.</span><span class="n">scan_index</span><span class="p">()</span></span></span>
</pre></figure><p>which just sets up the configuration as appropriate and scans the index directly and exits. For the spidering I ran it under gunicorn with <code>gunicorn main:app</code> and used the command:</p><figure class="blockcode"><pre><span class="line"><span class="line-content">time wget --spider -r http://localhost:8000 -X /static,/comics</span></span>
</pre></figure><p>To keep things as fair as I could I spidered the entire site once without checking the time (so that the image cache would be pre-populated, to eliminate its I/O overhead as a variable).</p><h3 id="1080_h3_1_peewee"><a href="http://publ.beesbuzz.biz/blog/1080-Goodbye-peewee-hello-PonyORM#1080_h3_1_peewee"></a>peewee</h3><p>Initial index scan:</p><figure class="blockcode"><pre class="highlight" data-language="terminal-session" data-line-numbers><span class="line" id="e1080cb3L1"><a class="line-number" href="http://publ.beesbuzz.biz/blog/1080-Goodbye-peewee-hello-PonyORM#e1080cb3L1"></a><span class="line-content">$ time pipenv run python ./timing-test.py</span></span>
<span class="line" id="e1080cb3L2"><a class="line-number" href="http://publ.beesbuzz.biz/blog/1080-Goodbye-peewee-hello-PonyORM#e1080cb3L2"></a><span class="line-content"></span></span>
<span class="line" id="e1080cb3L3"><a class="line-number" href="http://publ.beesbuzz.biz/blog/1080-Goodbye-peewee-hello-PonyORM#e1080cb3L3"></a><span class="line-content">real    0m33.809s</span></span>
<span class="line" id="e1080cb3L4"><a class="line-number" href="http://publ.beesbuzz.biz/blog/1080-Goodbye-peewee-hello-PonyORM#e1080cb3L4"></a><span class="line-content">user    0m10.916s</span></span>
<span class="line" id="e1080cb3L5"><a class="line-number" href="http://publ.beesbuzz.biz/blog/1080-Goodbye-peewee-hello-PonyORM#e1080cb3L5"></a><span class="line-content">sys 0m4.004s</span></span>
</pre></figure><p>Time to spider entire website:</p><figure class="blockcode"><pre><span class="line"><span class="line-content">Total wall clock time: 20s</span></span>
<span class="line"><span class="line-content">Downloaded: 421 files, 4.4M in 1.1s (4.12 MB/s)</span></span>
<span class="line"><span class="line-content"></span></span>
<span class="line"><span class="line-content">real    0m20.514s</span></span>
<span class="line"><span class="line-content">user    0m0.285s</span></span>
<span class="line"><span class="line-content">sys 0m0.435s</span></span>
</pre></figure><p>Memory usage after spidering: around 78.6MB according to macOS Activity Monitor</p><h3 id="1080_h3_2_PonyORM"><a href="http://publ.beesbuzz.biz/blog/1080-Goodbye-peewee-hello-PonyORM#1080_h3_2_PonyORM"></a>PonyORM</h3><p>Initial index scan:</p><figure class="blockcode"><pre><span class="line"><span class="line-content">real    0m13.897s</span></span>
<span class="line"><span class="line-content">user    0m4.562s</span></span>
<span class="line"><span class="line-content">sys 0m2.864s</span></span>
</pre></figure><p>Website spider time:</p><figure class="blockcode"><pre><span class="line"><span class="line-content">Total wall clock time: 20s</span></span>
<span class="line"><span class="line-content">Downloaded: 421 files, 4.4M in 1.3s (3.39 MB/s)</span></span>
<span class="line"><span class="line-content"></span></span>
<span class="line"><span class="line-content">real    0m20.041s</span></span>
<span class="line"><span class="line-content">user    0m0.335s</span></span>
<span class="line"><span class="line-content">sys 0m0.444s</span></span>
</pre></figure><p>Memory usage after spidering: 72.6MB</p><h3 id="1080_h3_3_Conclusions"><a href="http://publ.beesbuzz.biz/blog/1080-Goodbye-peewee-hello-PonyORM#1080_h3_3_Conclusions"></a>Conclusions</h3><p>PonyORM takes a little less RAM and it has faster writes. Its queries are also marginally faster. But not enough to make a meaningful difference.</p><p>Anyway, I&rsquo;m mostly just happy that this doesn&rsquo;t significantly <em>hurt</em> performance. The fact that it improves the end product while supporting positive influences in the F/OSS community is a bonus!</p><p>Anyway, the deployed site is still running Publ v0.2.3, but the first Pony-based release will come soon as v0.3.0.</p>
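<p>To make the efficiency point concrete, here&rsquo;s a toy comparison using the standard library&rsquo;s <code>sqlite3</code> standing in for what an expression-to-SQL ORM like Pony generates from <code>select(e for e in Entry if e.views &gt; 10)</code> (a sketch, not Pony&rsquo;s actual output):</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, views INTEGER)")
conn.executemany("INSERT INTO entry VALUES (?, ?)",
                 [(1, 5), (2, 50), (3, 500)])
conn.execute("CREATE INDEX idx_views ON entry (views)")

# The in-memory approach: fetch every row, then filter in Python
rows = conn.execute("SELECT id, views FROM entry").fetchall()
popular_py = sorted(eid for eid, views in rows if views > 10)

# What an expression-to-SQL ORM effectively emits for the same filter:
# the database does the filtering (and can use the index on views)
popular_sql = [eid for (eid,) in conn.execute(
    "SELECT id FROM entry WHERE views > 10 ORDER BY id")]

assert popular_py == popular_sql
```

<p>Both give the same result, but the second form never materializes the rows that the filter rejects, which is exactly why pushing the comprehension down into SQL pays off on a reasonably-large site.</p>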

]]>
        </content>
    </entry>
    

    
</feed>