Publ: Development Blog

Publ v0.5.14 released!

2020-02-04T17:40:14-08:00

Today I released v0.5.14 of Publ, which has a bunch of improvements:

Fixed a bug in card retrieval when there’s no summary
Admin panel works again
Markdown entry headings now get individual permalinks (the presentation of which can be templated)
Markdown entry headings can be extracted into an outline to be used for a table of contents
Lots of performance improvements around ToC and footnote extraction, and template API functions in general

Entry headings

With the new heading permalinks, all headings now get an empty <a> link at the beginning by default; this is intended as a styling hook so that you can use stylesheet rules to add a visible anchor/link to them.

The format of headings is also templatizable, by passing a heading_template argument to the entry.body/more property-function things. The default value is "{link}</a>{text}", but you can use, for example, "{link}{text}</a>" if you want to put the entire heading into a link (although this means that the behavior of links within your headings becomes undefined), or "{text}" if you want the old behavior, or "{text}{link}#</a>" if you want the anchor marker to be part of the text data (rather than styled via CSS), or whatever. Part of why I opted for the current default is that it seemed to be maximally useful while minimally intrusive as far as changing existing layouts.

The "{link}" template fragment can also take some further configuration; if you just want to set its CSS class name, you can do that with the heading_link_class configuration, and you can also set any other arbitrary HTML attributes by passing a dict to heading_link_config. For example, if you want them to all have a title="permalink" you can pass the value {"title":"permalink"} for that configuration value.
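As a rough illustration of how these options fit together, here is a minimal Python sketch. It is not Publ's actual implementation: render_link_open and render_heading are hypothetical helpers, and it assumes that "{link}" expands to an opening <a> tag which the template is responsible for closing, and that the heading carries an id matching the link target.

```python
from html import escape


def render_link_open(anchor, heading_link_class=None, heading_link_config=None):
    """Build the opening <a> tag that "{link}" is assumed to expand to."""
    attrs = {"href": "#" + anchor}
    if heading_link_class:
        attrs["class"] = heading_link_class
    # Arbitrary extra attributes, e.g. {"title": "permalink"}
    attrs.update(heading_link_config or {})
    rendered = " ".join(
        f'{name}="{escape(str(value), quote=True)}"' for name, value in attrs.items()
    )
    return f"<a {rendered}>"


def render_heading(text, anchor, level=2,
                   heading_template="{link}</a>{text}", **link_args):
    """Apply a heading template; the id attribute here is an assumption."""
    link = render_link_open(anchor, **link_args)
    inner = heading_template.format(link=link, text=text)
    return f'<h{level} id="{anchor}">{inner}</h{level}>'


default_heading = render_heading("Intro", "intro")
linked_heading = render_heading(
    "Intro", "intro",
    heading_template="{link}{text}</a>",
    heading_link_class="permalink",
    heading_link_config={"title": "permalink"},
)
```

With the assumed default template the link stays empty so a stylesheet can decorate it, while the second call wraps the whole heading text in the link.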

Currently there is no way to have those vary across the links, however; a more robust configuration mechanism (that can perhaps take in functions or format strings or the like) is certainly possible but it felt out-of-scope for this feature.

Tables of Contents

I was originally just building the heading permalink generation as a standalone feature, but then I realized that while I was doing that I might as well provide tables of contents, too. My original plan for ToCs was to make use of Misaka’s built-in ToC formatter, but getting that to work alongside Publ seemed pretty challenging, especially since Publ has the multiple document section stuff that Misaka wasn’t intended to work with. (Incidentally, I went through similar things with the built-in footnotes stuff.)

A ToC is accessible simply by looking at entry.toc. While it takes all of the usual HTML formatting arguments, none of them really have any effect (aside from disabling smartquotes and enabling XHTML). It does add its own argument, max_depth, which chooses how many levels of the ToC to show. This is relative to the highest level in the entry, so you can continue to use whatever top heading level you prefer.
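To make the "relative to the highest level" behavior concrete, here is a small Python sketch of the idea. The build_outline function and its (level, text) input format are assumptions for illustration, not Publ's real data structures.

```python
def build_outline(headings, max_depth=None):
    """Filter headings by depth relative to the topmost level present.

    headings is a list of (level, text) pairs in document order, where
    level is the HTML heading level (1 for <h1>, 2 for <h2>, and so on).
    """
    if not headings:
        return []
    # The numerically smallest level in the entry counts as depth 1
    top = min(level for level, _ in headings)
    outline = []
    for level, text in headings:
        depth = level - top + 1
        if max_depth is None or depth <= max_depth:
            outline.append((depth, text))
    return outline


entry_headings = [(2, "Setup"), (3, "Install"), (4, "From source"), (2, "Usage")]
toc = build_outline(entry_headings, max_depth=2)
```

Here the entry starts at <h2>, so max_depth=2 keeps the <h2> and <h3> headings and drops the <h4>.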

Performance improvements

So, footnotes had a bit of a performance impact in that rendering those out also requires rendering out the entire entry multiple times, which can add up a lot. This came down to part of how Publ allows you to use template functions as if they are properties, using a too-clever wrapper called CallableProxy. Put briefly, this is an object that wraps a function in such a way that if you use the function directly in a Jinja context, it gets treated as if you’re using the default return value instead. Unfortunately, there are various things at various layers of Flask that make it end up calling the function multiple times, which can be really slow — especially if the function, say, runs Markdown formatting on the entire entry.
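The wrapper idea can be sketched roughly as follows. This is a simplified stand-in (the class name CallableProxySketch and the body function are made up for the example), not the real CallableProxy, but it shows how a plain truthiness check followed by output ends up calling the wrapped function twice.

```python
class CallableProxySketch:
    """Wrap a function so that using the object directly acts like func()."""

    def __init__(self, func):
        self._func = func

    def __call__(self, *args, **kwargs):
        # Explicit call with arguments, like entry.body(some_arg=...)
        return self._func(*args, **kwargs)

    def __str__(self):
        # Used when the template outputs the object directly
        return str(self._func())

    def __bool__(self):
        # Used by a template "if" check
        return bool(self._func())


calls = []


def body(**kwargs):
    """Stand-in for an expensive render, such as Markdown formatting."""
    calls.append(kwargs)
    return "<p>rendered</p>"


proxy = CallableProxySketch(body)
text = ""
if proxy:              # first full render, just to test truthiness
    text = str(proxy)  # second full render, to produce the output
```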

A long time ago I had CallableProxy set up such that it would cache the return value of the default call, but this had other implications when I started supporting HTTPS and user login and so on. Depending on how objects got pooled and cached this could cause the wrong content to be displayed to the end user — definitely not a good thing! At the very least this would often cause bugs where outgoing links would flip-flop between being http:// or https:// or using the wrong hostname or the like, and in the worst case this could theoretically cause page content to display for someone who wasn’t authorized to see it, or vice-versa (although I don’t believe there’s an actual way of causing this). But in any case, it was bad.

So, what I ended up doing was instead of naïvely caching the default function return, I wrapped it behind @functools.lru_cache and made the various aspects that make these functions non-idempotent part of the cache key; for now this is just the request URL and the current user.
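A minimal sketch of that caching approach might look like the following. Everything here is an assumption for illustration: current_request is a plain dict standing in for the per-request state, and CachedProxySketch is a hypothetical class rather than Publ's code.

```python
import functools

# Hypothetical stand-in for per-request state (URL and logged-in user)
current_request = {"url": "https://example.com/entry", "user": None}


class CachedProxySketch:
    """Cache the default call, keyed on whatever makes the result vary."""

    def __init__(self, func):
        self._func = func
        # One cache per proxy; the arguments below form the cache key
        self._cached_default = functools.lru_cache(maxsize=32)(self._call_default)

    def _call_default(self, url, user):
        # url and user are unused here; they only distinguish cache entries
        return self._func()

    def __call__(self, *args, **kwargs):
        return self._func(*args, **kwargs)

    def __str__(self):
        return str(self._cached_default(current_request["url"],
                                        current_request["user"]))


render_log = []


def render_body():
    """Stand-in for an expensive render whose output depends on the request."""
    url = current_request["url"]
    render_log.append(url)
    return f'<a href="{url}">permalink</a>'


proxy = CachedProxySketch(render_body)
first = str(proxy)
second = str(proxy)  # same URL and user, so this is served from the cache
```

Changing either the URL or the user produces a different cache key, so the stale-content problem described above does not come back.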

This cut down on a lot of chaff, but there was still a long ways to go!

Both tables of contents and footnotes have global numberings, meaning rendering entry.more gets affected by how many of those things live in entry.body. There were some pretty wonky ways I was trying to keep track of that stuff, but generally speaking this meant that rendering entry.more also required rendering (and discarding) entry.body. This could also end up rendering a whole bunch of extra images, too, if the image configuration for body and more doesn’t match. There were also some attempts at caching the various fragments' buffers, but this got unwieldy and unpleasant.

So, what does it do instead now?

First, there’s a faster counting-only Markdown processor that will only count the number of footnotes and headings, which lets us do some cute optimizations especially on entry.more.

Next, any time a count of footnotes or headings comes about naturally based on some other bit of processing, that information gets cached for later.
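The two steps above can be sketched in Python along these lines. This is only an approximation of the idea: the regular expressions stand in for a real counting-only Markdown pass (they would miscount inside code blocks, for instance), and EntryCounts is a hypothetical class.

```python
import re

# Crude stand-ins for a real counting-only Markdown processor
FOOTNOTE_DEF = re.compile(r"^\[\^[^\]]+\]:", re.MULTILINE)
HEADING = re.compile(r"^#{1,6}\s+\S", re.MULTILINE)


def count_items(markdown_text):
    """Count footnote definitions and headings without rendering anything."""
    return {
        "footnotes": len(FOOTNOTE_DEF.findall(markdown_text)),
        "headings": len(HEADING.findall(markdown_text)),
    }


class EntryCounts:
    """Remember counts per section so they are computed at most once."""

    def __init__(self, body, more):
        self.body = body
        self.more = more
        self._counts = {}

    def counts(self, section):
        if section not in self._counts:
            self._counts[section] = count_items(getattr(self, section))
        return self._counts[section]

    def first_footnote_number(self, section):
        # Numbering is global, so "more" continues where "body" left off
        if section == "body":
            return 1
        return self.counts("body")["footnotes"] + 1


entry = EntryCounts(
    body="# Title\n\nText[^a] and[^b].\n\n[^a]: one\n[^b]: two\n",
    more="## Sub\n\nMore[^c].\n\n[^c]: three\n",
)
start = entry.first_footnote_number("more")
```

With the body's counts known, rendering entry.more can start its footnote numbering at the right place without rendering entry.body first.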

This has cut down on the amount of rendering that has to take place. There are still some redundancies (for example, it still has to render out all of the entry content even when it’s only trying to extract the footnotes or table of contents) but this at least cuts down on the amount of stuff that has to happen.

Combined with the CallableProxy optimization this means it’s also doing way less work when you’re simply checking to see if an entry has a TOC or footnotes, such as when doing:

{% if entry.toc %}
<nav id="toc"><h2>Table of Contents</h2>{{entry.toc}}</nav>
{% endif %}

This still unfortunately will be doing extraneous rendering — including image renditions — as will the similar entry.footnotes code, but still, a 3x improvement is a lot better, even if it’s not ideal.

An obvious next step would be to make a headings-only renderer for the table of contents. This won’t help with footnotes, unfortunately, and there’s no reasonable way to prevent footnotes from rendering images that are outside of footnotes (and it still needs to be able to render images for ones that are in footnotes), but a partial improvement is still better than no improvement.

So why is this still v0.5.x?

At some point I decided that my versioning scheme would be based on “milestones,” and for v0.6 my milestone is having automated testing and unit coverage set up. This seems to be kind of foolish; what I’m doing isn’t quite semantic versioning, which says that versioning should be based on major.minor.patch where major is for backwards-incompatible changes, minor is for added functionality, and patch is for bug fixes. But if I’d been releasing Publ with that scheme, we’d be on, like, v2.193.0 by now or something.

Fortunately, semver does have a provision for in-development software:

Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable.

so I think I’m still following the spirit of semantic versioning for now. If I ever decide to release 1.0 I’ll have to re-evaluate my version numbering though!

Errata

There is a problem in watchdog v0.10.0 which breaks Pipenv and other lockfile-based deployment mechanisms. If you are developing on macOS and deploying on Linux, I highly recommend pinning the version with:

pipenv install watchdog==0.9.0
pipenv clean

See the FAQ for more information.

Authl v0.2.0, now in beta status!

2019-08-19T01:49:00-07:00

I’ve released Authl v0.2.0. Changes since v0.1.8:

Added support for Twitter
Big ol' refactor to support Twitter (see the fuller discussion below the cut!)
Released to beta!

And changes from v0.1.7 to v0.1.8 (which I didn’t bother to post an announcement about):

Fixed an incredibly minor security issue in the Mastodon client (the client_secret was leaking but in the context of Mastodon that couldn’t really be used for anything anyway)
Centralize/refactor the login token management, allowing for future flexibility in the service stack
Make callback IDs protocol-stable, which helps with some stricter services (e.g. Twitter)

So, the big ol' refactor: Previously the redirection target wasn’t part of the actual auth flow; the intention was that the site would just encode the redirection target into the callback_url parameter. This was an artifact of how the original Authl prototype was using signed URLs for the email handler, and based on a common mechanism that’s used in a lot of newer APIs in general.

However, it has a few problems:

Twitter (and probably other OAuth providers) require a strict match on the callback URL
It leaks information about what people are logging in to see
It assumes that the app is built in a Flasky way

So, as part of this refactor, the handler.initiate_auth() method now takes an additional parameter, redir, and it’s up to the handler to keep track of that value as part of its flow. And then that value is passed back to the app when an identity is verified, and it’s up to the application to use that value. Which ends up actually being a lot cleaner anyway, and it simplified a bunch of stuff in the default Flask handlers.
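To illustrate the shape of this flow, here is a hedged Python sketch. It is not Authl's real handler interface: SketchHandler, its check_callback method, the argument order, and the use of an opaque state token are all assumptions made for the example.

```python
import secrets


class SketchHandler:
    """Hypothetical handler that tracks the redirect target itself."""

    def __init__(self):
        # Pending logins, keyed by an opaque state token
        self._pending = {}

    def initiate_auth(self, id_url, callback_url, redir):
        state = secrets.token_urlsafe(16)
        self._pending[state] = (id_url, redir)
        # The callback URL is passed through unchanged, so a provider that
        # requires a strict match sees the same URL for every login
        return {"callback_url": callback_url, "state": state}

    def check_callback(self, state):
        if state not in self._pending:
            return None
        id_url, redir = self._pending.pop(state)
        # The application decides what to do with redir
        return {"identity": id_url, "redir": redir}


handler = SketchHandler()
started = handler.initiate_auth(
    "https://example.com/@someone",
    "https://mysite.example/cb",
    redir="/private/page",
)
verified = handler.check_callback(started["state"])
```

Because the redirect target never appears in the callback URL, it also is not exposed to the identity provider.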

This does mean that the API has now changed in an incompatible way, thus the minor version bump, although it only affects anyone who was using Authl outside of Flask, and if anyone was using Authl outside of Flask I’d be pretty surprised. (For that matter I doubt if anyone’s been using Authl inside Flask except me!)

Anyway, another facet of the Twitter handler is that it provides the URL as https://twitter.com/username#id; for example, mine is https://twitter.com/fluffy#993171. The reason for this is that if you change your username, someone else could claim your old username and log in as you. Unfortunately this does mean that if someone changes their Twitter username, their Authl user ID will also change, meaning that they will lose whatever access was granted to the old identity. I have some ideas on how to make this work a bit better, although that’ll be part of normalizing how user profiles work (and currently user profiles aren’t actually consumed by Publ, for whatever it’s worth).

Of course in a Publ context it’s easy enough to just see that the user ID hasn’t changed and update the ACLs accordingly. It’s annoying but it’s possible (and straightforward and secure).
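As a small illustration of that check, the following Python sketch compares the numeric ID carried in the URL fragment. The helper names are made up for this example and are not part of Authl or Publ.

```python
from urllib.parse import urlsplit


def make_identity(username, user_id):
    """Compose an identity URL in the username#id form described above."""
    return f"https://twitter.com/{username}#{user_id}"


def parse_identity(identity_url):
    """Split an identity URL into (username, user_id)."""
    parts = urlsplit(identity_url)
    return parts.path.strip("/"), parts.fragment


def same_account(old_identity, new_identity):
    """Two identities match when their non-empty ID fragments are equal."""
    old_id = parse_identity(old_identity)[1]
    new_id = parse_identity(new_identity)[1]
    return bool(old_id) and old_id == new_id


before = make_identity("fluffy", 993171)
after = make_identity("newname", 993171)
renamed = same_account(before, after)
```

If renamed is true, the ACL entry for the old identity URL can be updated to the new one.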

Anyway, I’d like to thank Kyle Mahan for having written silo.pub, as I used its Twitter handler as a reference for this implementation. And also Kevin Marks for pointing me towards it as a reference in the first place. Because holy heck it’s hard to find useful information on providing web-based Twitter login in Python!

Auth is working nicely

2019-07-08T11:56:58-07:00

I’ve released Authl 0.1.1, which adds support for Mastodon authentication. And the Publ test suite now is up-to-date with that as well.

There are a few things I want to do on Publ before I release a version for use on my own website, the big one being the ability to provide a better login page, and some refactoring around built-in templates now that built-in templates are becoming a thing.

I also really want to redo how I manage the documentation site, because it’s getting kind of untenable at this point.

Anyway, really soon I’ll have properly-private content on my website again, and hopefully this will be enough of a feature for people to actually be interested in Publ!

Pushl v0.2.0 released

2019-03-07T00:05:24-08:00

So, I just released v0.2.0 of Pushl. It was a pretty big change, in that I pretty much rewrote all the networking stuff, and fixed some pretty ridiculous bugs with the caching implementation as well.

The main thing is now it’s using async I/O instead of thread-per-connection, so it’s way more efficient and also times out correctly.
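For a sense of what that change looks like in general, here is a self-contained Python sketch using only the standard library. It is not Pushl's code: fake_fetch simulates a network request with asyncio.sleep, and the timeout values are arbitrary.

```python
import asyncio


async def fake_fetch(url, delay):
    """Simulated request; a real client would do network I/O here."""
    await asyncio.sleep(delay)
    return f"ok {url}"


async def fetch_with_timeout(url, delay, timeout):
    """Give up on a single request once the timeout has passed."""
    try:
        return await asyncio.wait_for(fake_fetch(url, delay), timeout=timeout)
    except asyncio.TimeoutError:
        return f"timeout {url}"


async def fetch_all(jobs, timeout=0.2):
    """Run every request concurrently on one thread, keeping input order."""
    tasks = (fetch_with_timeout(url, delay, timeout) for url, delay in jobs)
    return await asyncio.gather(*tasks)


results = asyncio.run(fetch_all([("a", 0.01), ("b", 1.0), ("c", 0.02)]))
```

All three requests wait at the same time, so the total time is bounded by the timeout rather than by the sum of the delays, and the slow request is cancelled instead of hanging.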

And oh gosh, I had so many tiny but critical errors in the way caching was implemented – no wonder it kept on acting as if there was no cached state. Yeesh.

Anyway, I’ll let this run on my site for a few days and if I like what I see I’ll upgrade it to beta status on PyPI.