Publ: Development Blog

Reblob!

2019-04-28T02:39:54-07:00

It’s been a while since I’ve worked on IndieWeb stuff, but I finally got around to releasing an extremely preliminary version of reblob, a little commandline thingus to make this stuff easier. Eventually I’ll also have a server-based version here, at least as an example.

Pushl v0.2.4, now with a proper user-agent

2019-03-15T17:29:27-07:00

While trying to figure out some weird access patterns on the day-job site I had the realization Pushl wasn’t actually specifying a user-agent, so it was just coming through as the generic aiohttp one, which isn’t very friendly.

Now it sends a reasonable user-agent by default, and you can override it with the --user-agent flag if you want to set your own, for your own analytics or whatever.
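As a rough sketch of the default-plus-override idea (not Pushl's actual code): the user-agent format, the project URL, and the request_headers helper below are all assumptions made up for illustration.

```python
from typing import Optional

# Hypothetical values: the real user-agent string Pushl sends is not shown here.
VERSION = "0.2.4"
DEFAULT_USER_AGENT = f"Pushl/{VERSION} (+https://example.com/pushl)"


def request_headers(user_agent: Optional[str] = None) -> dict:
    """Build request headers, falling back to the default user-agent."""
    return {"User-Agent": user_agent or DEFAULT_USER_AGENT}


print(request_headers())                       # default identification
print(request_headers("my-site-crawler/1.0"))  # value from a --user-agent flag
```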

Oh, and I had quietly released 0.2.3 a few days ago; there were just some minor internal changes to logging and also declaring Pushl as beta, rather than alpha, software.

Pushl 0.2.2

2019-03-10T18:25:58-07:00

I’ve done a bunch more work on Pushl to try to get it more stable. In particular, it will now only recurse into feeds that are on domains that were declared in the initial requests. I also seem to have cleared up some cases which were causing it to hang, and added a global timeout which will, hopefully, prevent it from hanging indefinitely.
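The domain restriction could look something like the following minimal sketch. The helper names allowed_domains and should_recurse are hypothetical, and comparing on the URL's network location is an assumption about what "same domain" means here.

```python
from urllib.parse import urlparse


def allowed_domains(initial_urls):
    """Collect the domains named in the URLs given on the command line."""
    return {urlparse(url).netloc.lower() for url in initial_urls}


def should_recurse(feed_url, domains):
    """Only follow a discovered feed if its domain was declared up front."""
    return urlparse(feed_url).netloc.lower() in domains


domains = allowed_domains(["https://example.com/feed", "http://forum.example.com/"])
print(should_recurse("https://example.com/comments/feed", domains))
print(should_recurse("https://elsewhere.example.org/feed", domains))
```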

I do wish I could figure out what is causing the hangs when they do happen though. Oh well. Some discussion of the issue below the cut.

So, there are two main tasks, process_feed and process_entry, which can both be spawned by the command line processor, and which can also spawn each other. (process_feed generally spawns process_entry as a matter of course, process_entry only spawns process_feed if -r is set.)

Both of these tasks will asynchronously fetch the data for the item itself, but then will gather a list of additional tasks to start in parallel, such as sending off WebSub/WebMention notifications or the aforementioned additional feed and entry processing tasks. And, because of the way asyncio works, the last thing each task does is wait for its pending tasks to complete.
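The shape of that pattern, reduced to a toy, is roughly the following. This is a sketch under assumptions, not the real Pushl code: fetch and notify are stand-ins for the network work, the function signatures are invented, and the global timeout is done with asyncio.wait_for.

```python
import asyncio


async def fetch(url):
    await asyncio.sleep(0)  # stand-in for the real network fetch
    return url


async def notify(url, log):
    await asyncio.sleep(0)  # stand-in for sending a WebSub/Webmention ping
    log.append(f"notified {url}")


async def process_entry(url, log):
    await fetch(url)
    pending = [asyncio.create_task(notify(url, log))]
    await asyncio.wait(pending)  # last step: wait for everything we spawned


async def process_feed(url, entries, log):
    await fetch(url)
    pending = [asyncio.create_task(process_entry(e, log)) for e in entries]
    if pending:  # asyncio.wait raises ValueError on an empty collection
        await asyncio.wait(pending)


async def main():
    log = []
    feed = process_feed("http://example.com/feed",
                        ["http://example.com/a", "http://example.com/b"], log)
    await asyncio.wait_for(feed, timeout=5)  # global timeout as a safety net
    return log


log = asyncio.run(main())
print(log)
```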

The thing is, the only thing that ever hangs is that pending wait!

I’ve added a lot of logging to everything to see where every part of every process begins and ends, in a way that I can match things up in pairs, and every single individual task completes. But that await asyncio.wait(pending) will sometimes just wait forever. If I inspect the list of pending tasks when this does happen, every one is in the done state, so asyncio.wait should just be returning. But it isn’t.
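One way to bound a wait like that and see what is left over is the timeout argument of asyncio.wait, which returns the done and not-done sets instead of blocking forever. The sketch below uses made-up quick and stuck coroutines; it shows the inspection technique only and does not explain the hang described above.

```python
import asyncio


async def quick():
    await asyncio.sleep(0)
    return "ok"


async def stuck():
    await asyncio.sleep(3600)  # simulates a task that never finishes in time


async def wait_and_report(tasks, timeout):
    done, not_done = await asyncio.wait(tasks, timeout=timeout)
    for task in not_done:
        task.cancel()  # give up on anything still running
    # Let the cancellations finish before returning.
    await asyncio.gather(*not_done, return_exceptions=True)
    return len(done), len(not_done)


async def main():
    tasks = [asyncio.create_task(quick()), asyncio.create_task(stuck())]
    return await wait_and_report(tasks, timeout=0.1)


result = asyncio.run(main())
print(result)  # (1, 1): one task finished, one was still pending
```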

It’s not even deterministic, which means that there’s probably something timing-related. Which would make me worry about there being a deadlock, but… there’s nowhere that a deadlock could sneak in, either. Any time a task is fired off it’s done as a new instance. (The one exception is getting a webmention endpoint, which is cached using async_lru, but that doesn’t have any dependencies on anything that has a pending list, and isn’t a thing that’s hanging anyway.) Any duplicated work is discarded before any await statement, so there’s no way any cyclic dependencies are happening, and all local file access is non-asynchronous. And when it does hang, the usual pattern is that there will be 2-3 process_feed tasks waiting on 6-7 process_entry tasks, which have all completed all of their async work but are waiting on their pending tasks.

I’m sure there’s just some dang typo somewhere that is causing something weird to happen, although pylint and flake8 haven’t found any of the usual telltale signs of that.

But of course, now that I’ve written a blog entry about trying to diagnose the problem, I can’t get the problem to recur, even on things that used to reproduce it 100%. WHATEVER.

Pushl v0.2.1 released

2019-03-07T22:27:02-08:00

I’ve been working on getting Pushl much more stable and reliable, particularly around a persistent “too many open files” error I was having, which turned out to be primarily due to a file descriptor leak in the caching routines. Oops.
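The actual leak isn't shown here, but a common way this class of bug shows up in cache code is opening a file without closing it deterministically. A generic sketch (the function names and the JSON cache format are assumptions, not Pushl's implementation):

```python
import json
import os
import tempfile


def load_cached_leaky(path):
    # The file object is never closed explicitly; the descriptor stays open
    # until the garbage collector gets around to it.
    return json.load(open(path))


def load_cached(path):
    # The context manager closes the descriptor as soon as the block exits.
    with open(path) as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    cache_file = os.path.join(tmp, "cache.json")
    with open(cache_file, "w") as handle:
        json.dump({"etag": "abc123"}, handle)
    cached = load_cached(cache_file)

print(cached)
```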

Anyway, there’s also seemingly a problem with how aiohttp manages its connection pool, at least on macOS, so I’ve disabled connection keep-alive by default. However, if you still want to use keep-alive, there’s now a --keepalive option to allow you to do that. I’m finding that it doesn’t really improve performance all that much anyway.
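A sketch of how an opt-in flag like that could map onto connection settings. The assumption here is that the resulting options would be passed to aiohttp.TCPConnector, whose force_close parameter closes each connection after use; the connector_options helper and the parser are hypothetical, and the block itself deliberately avoids importing aiohttp.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="pushl-sketch")
    parser.add_argument("--keepalive", action="store_true",
                        help="reuse connections instead of closing them after each request")
    return parser


def connector_options(keepalive):
    # Assumption: these would be used as aiohttp.TCPConnector(**options).
    # force_close=True disables keep-alive, which is the default here.
    return {"force_close": not keepalive}


default_opts = connector_options(build_parser().parse_args([]).keepalive)
keepalive_opts = connector_options(build_parser().parse_args(["--keepalive"]).keepalive)
print(default_opts, keepalive_opts)
```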

This is feeling beta-ready but I’ll give it a few days for other issues to shake out first.

Pushl v0.2.0 released

2019-03-07T00:05:24-08:00

So, I just released v0.2.0 of Pushl. It was a big change: I pretty much rewrote all the networking stuff, and fixed some ridiculous bugs with the caching implementation as well.

The main thing is now it’s using async I/O instead of thread-per-connection, so it’s way more efficient and also times out correctly.
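To illustrate the difference in a toy form: with async I/O, many requests share one thread, and each can be given its own deadline. In this sketch fake_fetch stands in for a real HTTP request and the delays and timeout values are arbitrary; none of it is taken from Pushl itself.

```python
import asyncio


async def fake_fetch(url, delay):
    await asyncio.sleep(delay)  # stand-in for network latency
    return f"fetched {url}"


async def fetch_with_timeout(url, delay, timeout):
    try:
        return await asyncio.wait_for(fake_fetch(url, delay), timeout)
    except asyncio.TimeoutError:
        return f"timed out {url}"


async def main():
    jobs = [("a", 0.01), ("b", 5), ("c", 0.01)]
    # All three run concurrently on one thread; "b" is cut off at its deadline.
    return await asyncio.gather(*(fetch_with_timeout(u, d, 0.2) for u, d in jobs))


results = asyncio.run(main())
print(results)  # ['fetched a', 'timed out b', 'fetched c']
```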

And oh gosh, I had so many tiny but critical errors in the way caching was implemented – no wonder it kept on acting as if there was no cached state. Yeesh.

Anyway, I’ll let this run on my site for a few days and if I like what I see I’ll upgrade it to beta status on PyPI.

An early-alpha Movable Type importer

2019-02-20T15:42:18-08:00

For those folks who want to import their content from Movable Type over to Publ, I’ve finally gotten around to writing an importer. Currently it only attempts to convert entry content and category metadata, and only using SQLite-formatted database dumps.
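In very rough terms, that kind of conversion means reading rows out of the SQLite dump and writing each one out as an entry file. The sketch below is not the importer: the table and column names are a simplified, hypothetical schema, and the "Title:" header-then-body output is an assumed entry layout.

```python
import sqlite3

# Hypothetical, simplified schema standing in for a real Movable Type dump.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT, body TEXT, category_id INTEGER);
    INSERT INTO category VALUES (1, 'blog');
    INSERT INTO entry VALUES (1, 'Hello', 'First post', 1);
""")


def export_entries(connection):
    """Yield (category, entry_text) pairs for every row in the dump."""
    rows = connection.execute(
        "SELECT e.title, e.body, c.label FROM entry e "
        "LEFT JOIN category c ON c.id = e.category_id ORDER BY e.id"
    )
    for title, body, category in rows:
        yield category, f"Title: {title}\n\n{body}\n"


exported = list(export_entries(conn))
print(exported)
```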

See its README.md for the (incredibly rough) usage instructions.

Eventually I want to try to automatically convert templates from MT’s scripting language to Jinja-Publ templates, although there’s a bunch of stuff that’s going to be difficult to port across and a lot of stuff is just plain not feasible to even try, so don’t expect that to become a major thing any time soon.

Pushl v0.1.7

2019-01-14T21:28:44-08:00

I ended up doing some more work on Pushl and have now released v0.1.7. The major changes:

Did a bunch of refactoring to make the code a little cleaner and handle configuration more appropriately
Added a configurable timeout for connections (which now defaults to 15)
Added a --version option on the command line arguments
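For anyone curious how options like these are typically wired up, here is a minimal argparse sketch. Only --version and the default of 15 come from the list above; the --timeout flag name, the program name, and reading the value as seconds are assumptions.

```python
import argparse

VERSION = "0.1.7"


def build_parser():
    parser = argparse.ArgumentParser(prog="pushl-sketch")
    # Hypothetical flag name; the unit is assumed to be seconds.
    parser.add_argument("--timeout", type=int, default=15,
                        help="connection timeout")
    # argparse prints the version string and exits when this flag is given.
    parser.add_argument("--version", action="version",
                        version=f"%(prog)s {VERSION}")
    return parser


args = build_parser().parse_args(["--timeout", "30"])
print(args.timeout)
```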

Also, some suggested usage ideas below the cut!

Installation

An installation guide is available in the project README, but the short version is to make sure you have Python 3 available and then run the following at a command prompt:

pip3 install pushl

which should do everything you need to install it. (On Linux or macOS you may need to run sudo pip3 install pushl instead, depending on how your system is set up.)

Some usage ideas

The main use for Pushl is to send Webmentions and Pingbacks from any arbitrary blog to link targets, regardless of blogging platform (for example, Jekyll, Movable Type, Pelican, or, of course, Publ). But it can be used for a lot more than that!

For example, the -e/--entry flag can be used to send webmentions from a specific page:

pushl -e http://example.com/blog/page/12345

And if this page embeds feed discovery tags, you can combine that with -r to also recursively apply to its feeds; for example:

pushl -re http://forum.example.com/

This works especially well with forum software such as phpBB and XenForo, both of which support feed discovery. And this will help website publishers to know when their content is being discussed, with forum posts appearing as “pingbacks” on their site!
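"Feed discovery" here means link tags in the page head that point at its feeds. A minimal sketch of finding them with the standard library follows; the FeedLinkParser class and the sample HTML are illustrative, and this is not how Pushl itself necessarily parses pages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collect feed URLs advertised via <link rel="alternate"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        if "alternate" in rels and mime in FEED_TYPES and attr.get("href"):
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, attr["href"]))


page = """<html><head>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/atom+xml" href="/feed.atom">
</head><body></body></html>"""

parser = FeedLinkParser("http://forum.example.com/")
parser.feed(page)
print(parser.feeds)
```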

Of course, when using it with a forum or a sporadically-updating blog or whatever you’ll probably want it to be in a cron job. There’s more information about how to set that up in the project README.

Pushl v0.1.6 released

2019-01-13T20:48:35-08:00

It’s been a while since I’ve updated Pushl but today I released v0.1.6. It includes the following fixes:

Now it supports Pingback as well as Webmention
Improved the threading defaults and connection pooling
Also checks entries for updates even if the feed didn’t change (in case something changed in the more text or page metadata or whatever)
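The two protocols differ mainly in how the notification is encoded: a Webmention is a form-encoded POST with source and target fields, while a Pingback is an XML-RPC call to pingback.ping with the same two URLs. The sketch below only builds the request bodies (the helper names are made up, and endpoint discovery and the actual HTTP requests are left out):

```python
import xmlrpc.client
from urllib.parse import parse_qs, urlencode


def webmention_body(source, target):
    """Form-encoded body for a Webmention POST."""
    return urlencode({"source": source, "target": target})


def pingback_body(source, target):
    """XML-RPC request body for a pingback.ping call."""
    return xmlrpc.client.dumps((source, target), methodname="pingback.ping")


source = "http://example.com/blog/page/12345"
target = "http://other.example.org/post"

wm = webmention_body(source, target)
pb = pingback_body(source, target)
print(wm)
print(pb)
```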

Anyway, it should just be a pip install --upgrade pushl (or pipenv update) away.

Pushl v0.1.5

2018-12-22T01:35:02-08:00

While I’m fixing random stuff in Publ, I figured I’d finally fix some problems with Pushl too. Nothing major here, just:

Stability: Fixed a bug where feeds that don’t declare links caused the worker to die before entries got processed
Performance: Now we use a global connection pool (so connections can be reused)
Fixed a minor correctness issue with archive feeds (which actually doesn’t make any difference in the real world but whatever)

Embedding webmention.io pings on your site

2018-12-20T23:14:47-08:00

Are you using webmention.io as your webmention endpoint? Want to get your incoming webmentions displayed on your website?

Well, you’re in luck: I wrote a simple-ish script for that. (You’ll probably want the accompanying stylesheet too.) And it doesn’t even require that you use Publ – it should work with any CMS, static or dynamic. The only requirement is that you use either webmention.io or something that has a similar enough retrieval API.
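To give a feel for what "retrieval API" means, here is a Python sketch of the same idea, separate from the script mentioned above. It assumes the webmention.io JF2 endpoint (api/mentions.jf2 with a target parameter) and a response with a children list whose items carry wm-property and url fields; the sample data is made up, and no network request is made.

```python
import json
from collections import defaultdict
from urllib.parse import urlencode


def mentions_url(target):
    """URL to request mentions of one page (endpoint shape is an assumption)."""
    return "https://webmention.io/api/mentions.jf2?" + urlencode({"target": target})


def group_mentions(feed):
    """Group mention URLs by type (like, reply, plain mention, and so on)."""
    groups = defaultdict(list)
    for item in feed.get("children", []):
        groups[item.get("wm-property", "mention-of")].append(item.get("url"))
    return dict(groups)


# Made-up sample shaped like the assumed response format.
SAMPLE = json.loads("""{"type": "feed", "children": [
    {"wm-property": "like-of", "url": "https://a.example/1"},
    {"wm-property": "in-reply-to", "url": "https://b.example/2"},
    {"wm-property": "like-of", "url": "https://c.example/3"}
]}""")

print(mentions_url("http://example.com/blog/page/12345"))
grouped = group_mentions(SAMPLE)
print(grouped)
```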

I wrote more about it on my blog, where you can also see it in use. For now, I’m just going to use the sample site repository to manage it (and issues against it).

It’s MIT-licensed, so feel free to use it wherever and however you want and to modify it for your needs. I might improve it down the road but for now it’s mostly just a quick itch-scratching hack that does things the way I want it to.