<?xml version="1.0" encoding="utf-8"?>



<feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:fh="http://purl.org/syndication/history/1.0"
    xmlns:at="http://purl.org/atompub/tombstones/1.0">

    <title>Publ: Development Blog</title>
    <subtitle>A personal publishing system for the modern web</subtitle>
    <link href="https://publ.beesbuzz.biz/blog/feed?tag=help-wanted" rel="self" />
    <link href="https://publ.beesbuzz.biz/blog/feed" rel="current" />
    <link href="https://busybee.superfeedr.com" rel="hub" />
    
    
    <link href="https://publ.beesbuzz.biz/blog/" />
    <fh:archive />
    <id>tag:publ.beesbuzz.biz,2020-01-07:blog</id>
    <updated>2019-03-10T18:25:58-07:00</updated>

    
    <entry>
        <title>Pushl 0.2.2</title>
        <link href="https://publ.beesbuzz.biz/blog/756-Pushl-0.2.2" rel="alternate" type="text/html" />
        <published>2019-03-10T18:25:58-07:00</published>
        <updated>2019-03-10T18:25:58-07:00</updated>
        <id>urn:uuid:d9b55a70-8000-5092-be97-2775d4a24cba</id>
        <author><name>fluffy</name></author>
        <content type="html">
<![CDATA[
<p>I&rsquo;ve done a bunch more work on Pushl to try to make it more stable. In particular, it now only recurses into feeds on domains that were declared in the initial requests. I also seem to have cleared up some of the cases that were causing it to hang, and I&rsquo;ve added a global timeout which will, hopefully, prevent it from hanging indefinitely.</p><p>I do wish I could figure out what causes the hangs when they do happen, though. Oh well. Some discussion of the issue below the cut.</p>

<p>So, there are two main tasks, <a href="https://github.com/PlaidWeb/Pushl/blob/01b1d438382bd5c06851626d3dadcd6e3d8cb3f3/pushl/__init__.py#L31"><code>process_feed</code></a> and <a href="https://github.com/PlaidWeb/Pushl/blob/01b1d438382bd5c06851626d3dadcd6e3d8cb3f3/pushl/__init__.py#L96"><code>process_entry</code></a>, which can both be spawned by the command-line processor, and which can also spawn each other. (<code>process_feed</code> generally spawns <code>process_entry</code> as a matter of course; <code>process_entry</code> only spawns <code>process_feed</code> if <code>-r</code> is set.)</p><p>Both of these tasks asynchronously fetch the data for the item itself, then gather a list of additional tasks to start in parallel, such as sending off WebSub/Webmention notifications or the aforementioned additional feed and entry processing tasks. And, because of the way <code>asyncio</code> works, the last thing each task does is wait for its pending tasks to complete.</p><p>The thing is, the <em>only</em> thing that <em>ever</em> hangs is that pending wait!</p><p>I&rsquo;ve added a lot of logging to everything to see where every part of every process begins and ends, in a way that lets me match things up in pairs, and every single individual task completes. But that <code>await asyncio.wait(pending)</code> will sometimes just wait forever. If I inspect the list of pending tasks when this happens, every one is in the <code>done</code> state, so <code>asyncio.wait</code> should just return. But it doesn&rsquo;t.</p><p>It&rsquo;s not even deterministic, so it&rsquo;s probably something timing-related. Which would make me worry about there being a deadlock, but&hellip; there&rsquo;s nowhere that a deadlock could sneak in, either. 
Any time a task is fired off, it&rsquo;s done as a new instance (except for the specific case of getting a Webmention endpoint, which is cached using <code>async_lru</code> but doesn&rsquo;t have any dependencies on anything that has a pending list, and isn&rsquo;t a thing that&rsquo;s hanging anyway). Any duplicated work is discarded before any <code>await</code> statement (so there&rsquo;s no way any cyclic dependencies are happening), and all local file access is synchronous. And like, when it does hang, the usual pattern is that there will be 2-3 <code>process_feed</code> tasks waiting on 6-7 <code>process_entry</code> tasks, which have all completed all of their async work but are waiting on <em>their</em> pending tasks.</p><p>I&rsquo;m sure there&rsquo;s just some dang typo somewhere that is causing something weird to happen, although <code>pylint</code> and <code>flake8</code> haven&rsquo;t found any of the usual telltale signs of that.</p><p>But of course, now that I&rsquo;ve written a blog entry about trying to diagnose the problem, I can&rsquo;t get the problem to recur, even on things that used to reproduce it 100% of the time. <strong>WHATEVER.</strong></p>
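<p>For anyone trying to picture the setup, here&rsquo;s a minimal sketch of the fan-out-then-wait pattern described above. It is not Pushl&rsquo;s actual code: <code>send_ping</code>, the <code>seen</code> set, the <code>run</code> helper, and the use of <code>asyncio.wait_for</code> as the global timeout are all assumptions made for illustration, and the <code>asyncio.sleep(0)</code> calls stand in for real network fetches.</p>

```python
import asyncio


async def send_ping(url, n):
    """Hypothetical stand-in for one WebSub/Webmention notification."""
    await asyncio.sleep(0)  # stands in for the outgoing request
    return (url, n)


async def process_entry(url):
    """Fetch an entry, then fan out its notifications and wait for them."""
    await asyncio.sleep(0)  # stands in for fetching the entry itself
    pending = [asyncio.create_task(send_ping(url, n)) for n in range(2)]
    if pending:
        # The last thing the task does: wait for everything it spawned.
        await asyncio.wait(pending)
    return url


async def process_feed(entries, seen):
    """Fetch a feed, then spawn one process_entry task per new entry."""
    await asyncio.sleep(0)  # stands in for fetching the feed itself
    pending = []
    for entry in entries:
        # Duplicate work is discarded here, before any await in this loop,
        # so two tasks can never end up waiting on each other.
        if entry in seen:
            continue
        seen.add(entry)
        pending.append(asyncio.create_task(process_entry(entry)))
    if pending:
        await asyncio.wait(pending)
    return sorted(task.result() for task in pending)


def run(timeout=5):
    """Run the whole tree under one global timeout (an assumed mechanism)."""
    async def main():
        feed = process_feed(["a", "b", "a"], set())
        # wait_for cancels the feed task and raises a timeout error
        # if the tree of tasks has not finished within `timeout` seconds.
        return await asyncio.wait_for(feed, timeout)
    return asyncio.run(main())
```

<p>Calling <code>run()</code> processes the two unique entries and returns their names; the duplicate is skipped before anything is awaited. The <code>if pending</code> guards are there because <code>asyncio.wait</code> raises <code>ValueError</code> when handed an empty collection.</p>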

]]>
        </content>
    </entry>
    

    
</feed>