<?xml version="1.0" encoding="utf-8"?>



<feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:fh="http://purl.org/syndication/history/1.0"
    xmlns:at="http://purl.org/atompub/tombstones/1.0">

    <title>Publ: Development Blog</title>
    <subtitle>A personal publishing system for the modern web</subtitle>
    <link href="http://publ.beesbuzz.biz/blog/feed?tag=community" rel="self" />
    <link href="http://publ.beesbuzz.biz/blog/feed" rel="current" />
    <link href="https://busybee.superfeedr.com" rel="hub" />
    
    
    <link href="http://publ.beesbuzz.biz/blog/" />
    <fh:archive />
    <id>tag:publ.beesbuzz.biz,2020-01-07:blog</id>
    <updated>2018-09-19T02:27:21-07:00</updated>

    
    <entry>
        <title>Goodbye peewee, hello PonyORM</title>
        <link href="http://publ.beesbuzz.biz/blog/1080-Goodbye-peewee-hello-PonyORM" rel="alternate" type="text/html" />
        <published>2018-09-19T02:27:21-07:00</published>
        <updated>2018-09-19T02:27:21-07:00</updated>
        <id>urn:uuid:26ccac61-8792-54c6-8681-eb173adee58c</id>
        <author><name>fluffy</name></author>
        <content type="html">
<![CDATA[
<p>For a number of reasons, I have replaced the backing ORM. Previously I was using peewee, but now I&rsquo;m using <a href="http://ponyorm.com">PonyORM</a>. The primary reason for this is purely ideological; I do not want to use software which is maintained by someone with a track record of toxic behavior.  peewee&rsquo;s maintainer responds to issues and feature requests with shouting and dismissive snark; PonyORM&rsquo;s maintainer responds with helpfulness and grace. I am a <a href="//beesbuzz.biz/7502">strong proponent of the latter</a>.</p><p>PonyORM&rsquo;s API is also significantly more Pythonic, and rather than abusing operator overloads for clever query building purposes, it abuses Python&rsquo;s AST functionality to parse <em>actual Python expressions</em> into SQL queries. Seriously, <a href="https://stackoverflow.com/questions/16115713/how-pony-orm-does-its-tricks">look at this explanation of it</a> and tell me that isn&rsquo;t just <em>amazing</em>.</p>
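<p>To give a rough feel for the expression-to-SQL idea, here&rsquo;s a minimal sketch using only the standard library&rsquo;s <code>ast</code> module. This is <em>not</em> Pony&rsquo;s code or API: the helper names <code>to_sql</code> and <code>where_clause</code> are made up for illustration, it parses a source string rather than a live generator, it only handles a handful of operators, and it uses <code>repr()</code> for literals, which is not safe SQL escaping.</p>

```python
import ast

# Map Python operators to SQL equivalents (small, illustrative subset).
_COMPARE_OPS = {ast.Gt: ">", ast.GtE: ">=", ast.Eq: "=", ast.NotEq: "!="}
_BOOL_OPS = {ast.And: "AND", ast.Or: "OR"}


def to_sql(node):
    """Recursively render a tiny subset of Python expression AST as SQL text."""
    if isinstance(node, ast.BoolOp):
        joiner = " " + _BOOL_OPS[type(node.op)] + " "
        return "(" + joiner.join(to_sql(value) for value in node.values) + ")"
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        left = to_sql(node.left)
        right = to_sql(node.comparators[0])
        return left + " " + _COMPARE_OPS[type(node.ops[0])] + " " + right
    if isinstance(node, ast.Attribute):
        # e.g. `e.status` becomes the column name `status`
        return node.attr
    if isinstance(node, ast.Constant):
        # repr() is only a stand-in here; real code would use bound parameters
        return repr(node.value)
    raise ValueError("unsupported expression: " + ast.dump(node))


def where_clause(expression):
    """Parse a Python expression string and return a SQL WHERE clause."""
    tree = ast.parse(expression, mode="eval")
    return "WHERE " + to_sql(tree.body)


print(where_clause("e.status == 'published' and e.views > 100"))
```

<p>The point is just that the filter is written as an ordinary Python expression and the translation happens by walking its syntax tree, rather than by overloading operators on special column objects.</p>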

<p>There are a few downsides to Pony so far, though:</p>
<ul>
<li><p>While it&rsquo;s possible to adapt arbitrary types into database fields, queries don&rsquo;t actually work on them (so at least for Enums I have to convert at query time, which turns out to not be a huge deal)</p></li>
<li><p>There&rsquo;s no simple way to incrementally build a query with an OR branch in it (I don&rsquo;t actually use OR anywhere at present, but getting to that point meant reworking some of the query API)</p></li>
<li><p>Not really a downside but Pony treats <code>&#39;&#39;</code> and <code>NULL</code> as equivalent, which has some fun implications for storing empty strings in a table</p><p>Of course, SQLite does this too, internally, and my existing code for that case wasn&rsquo;t actually &ldquo;correct&rdquo; (but it happened to work with SQLite anyway). So moving to Pony meant I had to make this <em>actually correct</em> which, on the plus side, means that Publ is more likely to work with MySQL or Postgres (which I haven&rsquo;t tested yet)</p></li>
</ul>
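<p>The OR limitation in the list above is easier to see with plain Python than with prose. The sketch below is an assumption-laden illustration, not Publ&rsquo;s or Pony&rsquo;s code: <code>Entry</code>, <code>ENTRIES</code>, <code>apply_filters</code> and <code>any_of</code> are all hypothetical names. Each incremental filter step can only narrow the result, so an OR has to be folded into a single step up front.</p>

```python
from collections import namedtuple

# Hypothetical stand-in for an entry record; not Publ's actual model.
Entry = namedtuple("Entry", ["title", "category", "draft"])

ENTRIES = [
    Entry("Hello", "blog", False),
    Entry("Sketch", "art", False),
    Entry("Notes", "blog", True),
    Entry("Comic", "comics", False),
]


def apply_filters(entries, predicates):
    """Apply predicates one after another; every step narrows the result (AND)."""
    result = iter(entries)
    for predicate in predicates:
        # filter() binds the predicate immediately, avoiding late-binding surprises
        result = filter(predicate, result)
    return list(result)


def any_of(*predicates):
    """Combine alternatives into one predicate so OR fits into a single step."""
    def combined(entry):
        return any(p(entry) for p in predicates)
    return combined


def not_draft(entry):
    return not entry.draft


def in_blog(entry):
    return entry.category == "blog"


def in_art(entry):
    return entry.category == "art"


# AND: non-draft entries in the blog category.
print([e.title for e in apply_filters(ENTRIES, [not_draft, in_blog])])

# OR: blog or art, expressed as one combined step rather than two successive ones.
print([e.title for e in apply_filters(ENTRIES, [not_draft, any_of(in_blog, in_art)])])
```

<p>Applying <code>in_blog</code> and then <code>in_art</code> as two separate steps would return nothing at all, which is exactly why OR doesn&rsquo;t fit an incremental approach without some extra plumbing.</p>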
<p>In addition to PonyORM I evaluated a few other options; my other front-runner was to simply store all of the data in in-memory tables and use <code>sorted([e for e in model.Entry if e.foo &gt; bar])</code> or whatever, which was a gigantic pain to think about. Granted, a lot of what made it painful is stuff I had to do in order to support Pony as well (namely the switch from a query-building syntax to incremental list comprehensions), but the Pony approach also happens to be way more efficient, since it can use indexes and does all the filtering at once.</p><p>Anyway, I&rsquo;m rambling here. How about we look at some quick benchmarks to see if this hurts performance! All of these timings come from building <a href="http://beesbuzz.biz">beesbuzz.biz</a>, which is getting to be a reasonably large site at this point, running locally on my desktop.</p><p>For the index scan I ran a simple Python script that looks like:</p><figure class="blockcode"><pre class="highlight" data-language="python" data-line-numbers><span class="line" id="e1080cb1L1"><a class="line-number" href="http://publ.beesbuzz.biz/blog/1080-Goodbye-peewee-hello-PonyORM#e1080cb1L1"></a><span class="line-content"><span class="kn">from</span><span class="w"> </span><span class="nn">main</span><span class="w"> </span><span class="kn">import</span> <span class="n">app</span></span></span>
<span class="line" id="e1080cb1L2"><a class="line-number" href="http://publ.beesbuzz.biz/blog/1080-Goodbye-peewee-hello-PonyORM#e1080cb1L2"></a><span class="line-content"><span class="kn">from</span><span class="w"> </span><span class="nn">publ</span><span class="w"> </span><span class="kn">import</span> <span class="n">model</span></span></span>
<span class="line" id="e1080cb1L3"><a class="line-number" href="http://publ.beesbuzz.biz/blog/1080-Goodbye-peewee-hello-PonyORM#e1080cb1L3"></a><span class="line-content"><span class="n">model</span><span class="o">.</span><span class="n">scan_index</span><span class="p">()</span></span></span>
</pre></figure><p>which just sets up the configuration as appropriate, scans the index directly, and exits. For the spidering I ran it under gunicorn with <code>gunicorn main:app</code> and used the command:</p><figure class="blockcode"><pre><span class="line"><span class="line-content">time wget --spider -r http://localhost:8000 -X /static,/comics</span></span>
</pre></figure><p>To keep things as fair as I could I spidered the entire site once without checking the time (so that the image cache would be pre-populated, to eliminate its I/O overhead as a variable).</p><h3 id="1080_h3_1_peewee"><a href="http://publ.beesbuzz.biz/blog/1080-Goodbye-peewee-hello-PonyORM#1080_h3_1_peewee"></a>peewee</h3><p>Initial index scan:</p><figure class="blockcode"><pre class="highlight" data-language="terminal-session" data-line-numbers><span class="line" id="e1080cb3L1"><a class="line-number" href="http://publ.beesbuzz.biz/blog/1080-Goodbye-peewee-hello-PonyORM#e1080cb3L1"></a><span class="line-content">$ time pipenv run python ./timing-test.py</span></span>
<span class="line" id="e1080cb3L2"><a class="line-number" href="http://publ.beesbuzz.biz/blog/1080-Goodbye-peewee-hello-PonyORM#e1080cb3L2"></a><span class="line-content"></span></span>
<span class="line" id="e1080cb3L3"><a class="line-number" href="http://publ.beesbuzz.biz/blog/1080-Goodbye-peewee-hello-PonyORM#e1080cb3L3"></a><span class="line-content">real    0m33.809s</span></span>
<span class="line" id="e1080cb3L4"><a class="line-number" href="http://publ.beesbuzz.biz/blog/1080-Goodbye-peewee-hello-PonyORM#e1080cb3L4"></a><span class="line-content">user    0m10.916s</span></span>
<span class="line" id="e1080cb3L5"><a class="line-number" href="http://publ.beesbuzz.biz/blog/1080-Goodbye-peewee-hello-PonyORM#e1080cb3L5"></a><span class="line-content">sys     0m4.004s</span></span>
</pre></figure><p>Time to spider entire website:</p><figure class="blockcode"><pre><span class="line"><span class="line-content">Total wall clock time: 20s</span></span>
<span class="line"><span class="line-content">Downloaded: 421 files, 4.4M in 1.1s (4.12 MB/s)</span></span>
<span class="line"><span class="line-content"></span></span>
<span class="line"><span class="line-content">real    0m20.514s</span></span>
<span class="line"><span class="line-content">user    0m0.285s</span></span>
<span class="line"><span class="line-content">sys     0m0.435s</span></span>
</pre></figure><p>Memory usage after spidering: around 78.6MB according to macOS Activity Monitor</p><h3 id="1080_h3_2_PonyORM"><a href="http://publ.beesbuzz.biz/blog/1080-Goodbye-peewee-hello-PonyORM#1080_h3_2_PonyORM"></a>PonyORM</h3><p>Initial index scan:</p><figure class="blockcode"><pre><span class="line"><span class="line-content">real    0m13.897s</span></span>
<span class="line"><span class="line-content">user    0m4.562s</span></span>
<span class="line"><span class="line-content">sys     0m2.864s</span></span>
</pre></figure><p>Website spider time:</p><figure class="blockcode"><pre><span class="line"><span class="line-content">Total wall clock time: 20s</span></span>
<span class="line"><span class="line-content">Downloaded: 421 files, 4.4M in 1.3s (3.39 MB/s)</span></span>
<span class="line"><span class="line-content"></span></span>
<span class="line"><span class="line-content">real    0m20.041s</span></span>
<span class="line"><span class="line-content">user    0m0.335s</span></span>
<span class="line"><span class="line-content">sys     0m0.444s</span></span>
</pre></figure><p>Memory usage after spidering: 72.6MB</p><h3 id="1080_h3_3_Conclusions"><a href="http://publ.beesbuzz.biz/blog/1080-Goodbye-peewee-hello-PonyORM#1080_h3_3_Conclusions"></a>Conclusions</h3><p>PonyORM takes a little less RAM (72.6MB versus 78.6MB) and it has faster writes (the initial index scan went from about 34 seconds to about 14). Its queries are also marginally faster (20.0 seconds versus 20.5 to spider the site), but not enough to make a meaningful difference.</p><p>Mostly I&rsquo;m just happy that this doesn&rsquo;t significantly <em>hurt</em> performance. The fact that it improves the end product while supporting positive influences in the F/OSS community is a bonus!</p><p>The deployed site is still running Publ v0.2.3, but the first Pony-based release will come soon as v0.3.0.</p>

]]>
        </content>
    </entry>
    

    
</feed>