Publ: Development Blog

Pushl 0.2.2

Posted (9 days ago)

I’ve done a bunch more work on Pushl to try to make it more stable. In particular, it will now only recurse into feeds that are on domains that were declared in the initial requests. I also seem to have cleared up some of the cases that were causing it to hang, and I’ve added a global timeout which will, hopefully, prevent it from hanging indefinitely.
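For illustration, here is a minimal sketch of those two safeguards. This is not Pushl’s actual code: the helper names (allowed_domains, may_recurse, run_with_deadline) are made up for this example, and only urllib.parse.urlparse and asyncio.wait_for are real library calls.

```python
import asyncio
from urllib.parse import urlparse


def allowed_domains(initial_urls):
    """Collect the domains named in the initial requests (hypothetical helper)."""
    return {urlparse(url).netloc.lower() for url in initial_urls}


def may_recurse(feed_url, domains):
    """Only recurse into a feed if its domain was declared up front."""
    return urlparse(feed_url).netloc.lower() in domains


async def run_with_deadline(coro, seconds):
    """Global timeout: give up on the whole run after `seconds`."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        # The run was cancelled instead of hanging forever.
        return None


if __name__ == "__main__":
    domains = allowed_domains(["https://example.com/feed"])
    print(may_recurse("https://example.com/other-feed", domains))
    print(may_recurse("https://elsewhere.example/feed", domains))
    print(asyncio.run(run_with_deadline(asyncio.sleep(10), 0.01)))
```

The timeout doesn’t explain a hang; it just guarantees the process eventually exits.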

I do wish I could figure out what is causing the hangs when they do happen though. Oh well. Some discussion of the issue below the cut.

So, there are two main tasks, process_feed and process_entry, which can both be spawned by the command line processor, and which can also spawn each other. (process_feed generally spawns process_entry as a matter of course; process_entry only spawns process_feed if -r is set.)

Both of these tasks will asynchronously fetch the data for the item itself, but then will gather a list of additional tasks to start in parallel, such as sending off WebSub/WebMention notifications or the aforementioned additional feed and entry processing tasks. And, because of the way asyncio works, the last thing each task does is wait for its pending tasks to complete.
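The shape described above can be sketched roughly like this. It is a simplified stand-in, not the real Pushl source: SITE, fetch, and notify are hypothetical placeholders for the network work, and the recurse parameter plays the role of the -r flag.

```python
import asyncio

# Hypothetical link structure: feeds list entries, entries may link to feeds.
SITE = {
    "feed:main": ["entry:1", "entry:2"],
    "entry:1": ["feed:other"],
    "entry:2": [],
    "feed:other": ["entry:3"],
    "entry:3": [],
}


async def fetch(url):
    """Stand-in for asynchronously fetching the item itself."""
    await asyncio.sleep(0)
    return SITE[url]


async def notify(url):
    """Stand-in for sending a WebSub/WebMention notification."""
    await asyncio.sleep(0)


async def process_entry(url, log, recurse):
    links = await fetch(url)
    pending = [asyncio.create_task(notify(url))]
    if recurse:  # only spawn feed processing when -r is set
        pending += [asyncio.create_task(process_feed(f, log, recurse)) for f in links]
    # Last step: wait for everything this task started.
    if pending:  # asyncio.wait raises ValueError on an empty collection
        await asyncio.wait(pending)
    log.append(url)


async def process_feed(url, log, recurse):
    entries = await fetch(url)
    pending = [asyncio.create_task(process_entry(e, log, recurse)) for e in entries]
    if pending:
        await asyncio.wait(pending)
    log.append(url)


def run_demo(recurse):
    log = []
    asyncio.run(process_feed("feed:main", log, recurse))
    return log


if __name__ == "__main__":
    print(run_demo(recurse=True))
```

Each task logs its own completion only after its pending wait returns, so the top-level feed is always the last thing to finish.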

The thing is, the only thing that ever hangs is that pending wait!

I’ve added a lot of logging to everything to see where every part of every process begins and ends, in a way that lets me match things up in pairs, and every single individual task completes. But that await asyncio.wait(pending) will sometimes just wait forever. If I inspect the list of pending tasks when this happens, every one of them is in the done state, so asyncio.wait should just return. But it doesn’t.
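For reference, this is the behavior I’d expect, as a standalone sketch (work is a made-up coroutine, nothing from Pushl): calling asyncio.wait on tasks that are already done should come back right away with all of them in the done set. Passing a timeout also turns a silent hang into something that can be logged.

```python
import asyncio


async def work(n):
    """Hypothetical unit of work."""
    await asyncio.sleep(0)
    return n * 2


async def wait_on_finished_tasks():
    tasks = [asyncio.create_task(work(n)) for n in range(3)]
    await asyncio.gather(*tasks)  # make sure everything has finished first
    assert all(task.done() for task in tasks)

    # Waiting on already-finished tasks should not block; the timeout is
    # only a safety net so that a hang would become visible.
    done, still_pending = await asyncio.wait(tasks, timeout=1)
    for task in still_pending:
        print("still waiting on", task)
    return len(done), len(still_pending)


def run_demo():
    return asyncio.run(wait_on_finished_tasks())


if __name__ == "__main__":
    print(run_demo())
```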

It’s not even deterministic, which means that there’s probably something timing-related. That would make me worry about a deadlock, but… there’s nowhere that a deadlock could sneak in, either. Any time a task is fired off, it’s done as a new instance. (The one exception is getting a webmention endpoint, which is cached using async_lru, but that doesn’t depend on anything that has a pending list, and it isn’t the thing that’s hanging anyway.) Any duplicated work is discarded before any await statement, so there’s no way any cyclic dependencies are happening. All local file access is synchronous. And when it does hang, the usual pattern is that there will be 2-3 process_feed tasks waiting on 6-7 process_entry tasks, which have all completed all of their async work but are waiting on their pending tasks.
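To show what I mean by discarding duplicated work before any await, here’s a small sketch under my own assumptions (process_once and the seen set are illustrative names, not Pushl’s). Because the check and the add happen with no await between them, no other task can sneak in between the two on a single event loop.

```python
import asyncio


async def process_once(url, seen, results):
    # Check-and-mark happens before the first await, so it cannot be
    # interleaved with another task on the same event loop.
    if url in seen:
        return False  # duplicate: discarded without ever awaiting
    seen.add(url)

    await asyncio.sleep(0)  # stand-in for the real async work
    results.append(url)
    return True


async def main():
    seen, results = set(), []
    urls = ["a", "b", "a", "a", "b"]
    outcomes = await asyncio.gather(
        *(process_once(url, seen, results) for url in urls)
    )
    return outcomes, results


if __name__ == "__main__":
    print(asyncio.run(main()))
```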

I’m sure there’s just some dang typo somewhere that is causing something weird to happen, although pylint and flake8 haven’t found any of the usual telltale signs of that.

But of course, now that I’ve written a blog entry about trying to diagnose the problem, I can’t get the problem to recur, even on things that used to reproduce it 100%. WHATEVER.