<?xml version="1.0" encoding="utf-8"?>



<feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:fh="http://purl.org/syndication/history/1.0"
    xmlns:at="http://purl.org/atompub/tombstones/1.0">

    <title>Publ: Development Blog</title>
    <subtitle>A personal publishing system for the modern web</subtitle>
    <link href="http://publ.beesbuzz.biz/blog/feed?tag=rant" rel="self" />
    <link href="http://publ.beesbuzz.biz/blog/feed" rel="current" />
    <link href="https://busybee.superfeedr.com" rel="hub" />
    
    
    <link href="http://publ.beesbuzz.biz/blog/" />
    <fh:archive />
    <id>tag:publ.beesbuzz.biz,2020-01-07:blog</id>
    <updated>2018-05-25T21:42:32-07:00</updated>

    
    <entry>
        <title>So much for Dreamhost</title>
        <link href="http://publ.beesbuzz.biz/blog/358-So-much-for-Dreamhost" rel="alternate" type="text/html" />
        <published>2018-05-25T21:42:32-07:00</published>
        <updated>2018-05-25T21:42:32-07:00</updated>
        <id>urn:uuid:1c5aad01-e4f5-4c6c-8322-941734f47fe6</id>
        <author><name>fluffy</name></author>
        <content type="html">
<![CDATA[
<p>One of the overarching reasons I decided to build Publ the way I did was in order to take advantage of Dreamhost&rsquo;s support for Passenger WSGI. I was expecting that to be the primary means of hosting my main site (which is way too big for a Heroku instance) and given how smoothly things were working with this site on Dreamhost I figured it wouldn&rsquo;t be a big deal.</p><p>However, there was a <em>huge</em> monkey wrench thrown into things when I switched my site&rsquo;s configuration over to Passenger; despite all of my configuration being exactly the same between publ.beesbuzz.biz and beesbuzz.biz, the rendition cache on beesbuzz.biz was getting its permissions set wrong, and there was some rather weird behavior with how it was making the temporary files to begin with.</p><p>In investigating this I attempted to upgrade my packages on publ.beesbuzz.biz, and all h*ck broke loose.</p>

<p>Basically, Dreamhost, being shared hosting, is in the business of overselling capacity. They used to do a very good job of managing their capacity. But then things like WordPress happened, and more sites got bigger and more complex and started taking way more memory, and for whatever reason Dreamhost decided that they would shift towards <em>only</em> supporting sites built in WordPress (or basic static hosting), and then they started getting increasingly aggressive about their &ldquo;procwatch&rdquo; process-killer, and somewhere along the line it reached a tipping point where now you can&rsquo;t even run <code>pipenv install</code> without tripping their process monitoring.</p><p>I must have just been at the knife&rsquo;s edge of that with publ.beesbuzz.biz, because spinning up a second Publ app was too much for it to handle.</p><p>So, for now, <del>I&rsquo;ve rolled beesbuzz.biz back to my old MovableType-based site</del>, I have made the Heroku instance of publ.beesbuzz.biz the official one (if you are reading this then great, DNS has propagated!), and I am going to look into deploying Publ on my <a href="https://www.linode.com/?r=3387618616c77ee52a3a617c0218697a9c36bc9b">LiNode</a> VPS, which it turns out has <em>way</em> more capacity than I&rsquo;m using (thanks to them having given me incremental upgrades over the 6.5 years I&rsquo;ve been with them) and which should be just fine for this purpose.</p><p><mark>UPDATE</mark>: I have now deployed the new beesbuzz.biz on my LiNode VPS and it went off without a hitch, although DNS is probably going to take a while to propagate. 
Configuration is a bit fiddly though, and I&rsquo;d really like this to be easy for non-server-experts to do!</p><p>In the long run I&rsquo;m going to move my stuff away from Dreamhost, because beesbuzz.biz was my last major site running there and at this point I&rsquo;m basically paying $7/month for mediocre DNS service.</p><p>So, while setting things up on <a href="https://www.linode.com/?r=3387618616c77ee52a3a617c0218697a9c36bc9b">LiNode</a> is going to be more difficult, that is what I&rsquo;ll be going with for now (mostly because my LiNode plan just renewed like a month ago so I have two more years prepaid anyway).</p><p>In the longer term I&rsquo;m going to look at other webhosts; <a href="https://www.webfaction.com">WebFaction</a> looks pretty good, for example, and they come highly recommended in the Python developer community. And their pricing is quite competitive!</p><p>Anyway, getting Flask running on gunicorn with an Apache reverse proxy was fairly straightforward. It&rsquo;s not the simplest thing to get going but at least I have a <a href="http://beesbuzz.biz">working site</a> (modulo DNS caching, anyway). Hopefully I can get my SSL sorted out soon too.</p>

]]>
        </content>
    </entry>
    
    <entry>
        <title>Dates are hard</title>
        <link href="http://publ.beesbuzz.biz/blog/398-Dates-are-hard" rel="alternate" type="text/html" />
        <published>2018-05-18T12:00:00-07:00</published>
        <updated>2018-05-18T12:00:00-07:00</updated>
        <id>urn:uuid:5f4cc46b-fd8d-4e49-9938-63d8ecf6d08a</id>
        <author><name>fluffy</name></author>
        <content type="html">
<![CDATA[
<p>There&rsquo;s an old joke in programming, that the two hardest things to do are naming things, cache invalidation, and off-by-one errors. But this doesn&rsquo;t pay sufficient respect to one of the other hardest things, namely handling date and time.</p>

<p>Many systems don&rsquo;t bother handling dates in any sort of universal way; they just
treat all entry times as being local time and call it a day. But this has a
problem: whenever the time zone changes, it means that every date it refers to
is now different than how it was when it first happened. Any traveling
photographer who has tried reconciling EXIF times in their photo software after
going across the world understands this pain. So does anyone who attempts to
schedule recurring meetings between different time zones (or even different
hemispheres), especially when Daylight Saving changes in one locale but not in
the other (or in opposite directions).</p><p>This is also a problem for any given CMS, and it&rsquo;s an intractable problem.</p><h3 id="398_h3_1_A-naïve-approach"><a href="http://publ.beesbuzz.biz/blog/398-Dates-are-hard#398_h3_1_A-naïve-approach"></a>A naïve approach</h3><p>An approach I&rsquo;ve seen several times is to simply store entry dates in local
time, and format them accordingly. But this messes up anything that&rsquo;s
timezone-aware; scheduled posts made for 2:30 AM will appear, then disappear, then
reappear again when daylight saving ends, and Atom feeds will have items slip
around whenever there&rsquo;s a time change (or if the person running the site decides
to move into a different timezone or whatever). So, this makes the data
unstable; it&rsquo;s only a minor hassle in the grand scheme of things but it still
represents a data integrity error.</p><h3 id="398_h3_2_Publ-s-attempt-at-being-clever"><a href="http://publ.beesbuzz.biz/blog/398-Dates-are-hard#398_h3_2_Publ-s-attempt-at-being-clever"></a>Publ&rsquo;s attempt at being clever</h3><p>At present, what Publ does is to store dates based on their local time of
writing, but keeps them with a timezone, and indexes them based on UTC for the
purpose of pagination and so on. It formats the date based on its original post
time in its respective time zone, and this seems to work okay; dates always
appear the same no matter when you look at them, and the relative time offset is
stable with respect to when it&rsquo;s being calculated.</p><p>But date-based pagination always has to be based on <em>something</em>, and I chose
local time for that. And this can cause all sorts of weirdness to happen,
especially for entries posted between 11 PM and 1 AM (depending on
circumstances).</p><p>Say an entry is posted at 11:30 PM on January 31; for me (pacific time) this
puts it at 23:30-08:00, which is the same as 07:30 UTC on February 1. Then
later, Daylight Saving kicks in. The entry is still at 23:30-08:00 (i.e. 07:30
UTC).</p><p>Now say someone is looking at January entries. During standard time their
pagination range is going to be 00:00-08:00 on January 1 thru 23:59-08:00 on
January 31, which translates to UTC as 08:00 January 1 - 08:00 February 1. Okay,
so 07:30 UTC on February 1 comes before 08:00 February 1. Great, my January 31
entry still appears in January.</p><p>But then they come back during DST, and the local timezone is now -07:00. So
someone browsing the site for January entries now gets a pagination range of
07:00 January 1 - 07:00 February 1. Suddenly my last-minute-of-January entry is
now part of February&rsquo;s page instead.</p><p>Let&rsquo;s say down the road I move to New York, which means my local timezone is now
-05:00 standard or -04:00 daylight saving. Oops, all of the pagination for my
site has changed again. And what&rsquo;s worse, all the older entries no longer make
<em>any</em> sort of sense, especially my posted-near-midnight comics.</p><p>Incidentally, this violates one of the core tenets of Publ — that pagination
should be <em>stable</em>.</p><h3 id="398_h3_3_Splitting-the-difference"><a href="http://publ.beesbuzz.biz/blog/398-Dates-are-hard#398_h3_3_Splitting-the-difference"></a>Splitting the difference?</h3><p>So, how about this approach: always paginate and sort entries based on what
their <em>local</em> time is (so an entry posted on 01/31/2017 always appears to be on
the page for 01/31/2017 regardless of the indicated time zone), and only use the
UTC normalization for determining a relative interval to the current time (i.e.
whether it&rsquo;s in the future for scheduled posts, and how many seconds ago it was
posted for the &ldquo;N seconds ago&rdquo; display). This <em>seems</em> like an okay compromise,
although it does mean that if a person is traveling between time zones things
might get a little weird around the boundaries, and sorting might not always
make perfect temporal sense (but it exposes fewer boundary conditions that will
make pagination break, so while it&rsquo;s not <a href="https://www.youtube.com/watch?v=hou0lU8WMgo">technically
correct</a> it&rsquo;s at least
predictable).</p><p>But, that seems less broken than other possibilities. It satisfies the principle
of least surprise, it keeps pagination stable, and it keeps the presented date
consistent with the authored date (even if it might cause some weird
jumping-around in some cases).</p><p>So, I think that is what I will change Publ to do. It&rsquo;s (slightly) more code and
more annoyance but it seems like the best path forward.</p><p>Even if it means time will sometimes run backward.</p>
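<p>To make the January 31 example above concrete, here is a minimal sketch of the two pagination rules. It is not Publ&rsquo;s actual code: the function names <code>in_month_page</code> and <code>local_page</code> are hypothetical, the year 2018 is an arbitrary choice, and fixed UTC offsets stand in for a real timezone database.</p>

```python
from datetime import datetime, timedelta, timezone

PST = timezone(timedelta(hours=-8))  # Pacific standard time
PDT = timezone(timedelta(hours=-7))  # Pacific daylight time

# The example entry: 11:30 PM on January 31, stored with its own offset.
# (The year is an assumption made purely for illustration.)
ENTRY = datetime(2018, 1, 31, 23, 30, tzinfo=PST)


def in_month_page(entry, year, month, site_tz):
    """Window rule: the month boundaries are computed in whatever offset
    the site is using right now, then compared as absolute instants."""
    start = datetime(year, month, 1, tzinfo=site_tz)
    following = datetime(year + month // 12, month % 12 + 1, 1, tzinfo=site_tz)
    return following > entry >= start


def local_page(entry):
    """Local-date rule: the entry's own wall-clock date picks the page."""
    return (entry.year, entry.month)


print(in_month_page(ENTRY, 2018, 1, PST))  # site currently on standard time
print(in_month_page(ENTRY, 2018, 1, PDT))  # site currently on daylight time
print(local_page(ENTRY))                   # same answer either way
```

<p>Under the window rule the entry belongs to January while the site is on standard time and falls out of January once the site offset changes; under the local-date rule the page never depends on the site&rsquo;s current offset.</p>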

]]>
        </content>
    </entry>
    
    <entry>
        <title>The Trouble with PHP</title>
        <link href="http://publ.beesbuzz.biz/blog/246-The-Trouble-with-PHP" rel="alternate" type="text/html" />
        <published>2018-05-08T00:00:00-07:00</published>
        <updated>2018-05-08T00:00:00-07:00</updated>
        <id>urn:uuid:bed88efa-a822-4a13-acaf-77df79bb0a12</id>
        <author><name>fluffy</name></author>
        <content type="html">
<![CDATA[
<p>I&rsquo;ve had people ask me why I&rsquo;m not building Publ using PHP. While <a href="http://phpsadness.com">much</a>
has been <a href="https://www.quaxio.com/wtf/php.html">written</a> on this <a href="http://tracks.ranea.org/post/13908062333/php-is-not-an-acceptable-cobol">subject</a> from
a standpoint of what&rsquo;s wrong with the language (and with which I agree quite a lot!), that isn&rsquo;t, to me, the core of the problem with PHP on the web.</p><p>So, I want to talk a bit about some of the more fundamental issues with PHP, which actually go back well before PHP even existed and are inextricably linked with
the way PHP applications themselves are installed and run.</p><p>(I will be glossing over a lot of details here.)</p>

<h3 id="246_h3_1_Some-history"><a href="http://publ.beesbuzz.biz/blog/246-The-Trouble-with-PHP#246_h3_1_Some-history"></a>Some history</h3><p>Back when the web was first created, it was all based around serving up static files. You&rsquo;d have an HTML file (usually served up from a <code>public_html</code> directory
inside your user account on some server you
had access to, which was sometimes named or aliased <code>www</code> but more often was just some random machine living on your university&rsquo;s network), and it
acted much like a simplified version of FTP — someone would go to a URL like <code>http://example.com/~username/</code> and you&rsquo;d see an ugly directory index of the
files in there (if you didn&rsquo;t override it with an <code>index.html</code> or, more often in those days, <code>index.htm</code>), and then someone would click on the page
they wanted to look at like <code>homepage3.html</code> and it would retrieve this file and whatever flaming skull .gif files it linked to in an <code>&lt;img&gt;</code> tag and the copy of <code>canyon.mid</code> you put an <code>&lt;embed&gt;</code> around, and that would be that. The web server was really just a file server that happened to speak HTTP.</p><p>Then one day, servers started supporting things called SSIs, short for &ldquo;<a href="https://en.wikipedia.org/wiki/Server_Side_Includes">server-side includes</a>.&rdquo; This let you do some very simple templatization of your
site; the server wouldn&rsquo;t just serve up the HTML file directly, but it would scan it for simple SSI tags that told the server to replace this tag
with another file, so that you could, for example, have a single navigation header that was shared between all your pages, and a common
footer or whatever.</p><p>But this mechanism was still pretty limited, and so about two minutes later someone came up with the idea of the <a href="https://en.wikipedia.org/wiki/Common_Gateway_Interface">Common Gateway Interface</a>, or CGI;
this would make it so the server would see a special URL like <code>/cgi-bin/formail.pl</code> and instead of serving up the content of the file, it would
run the file as a separate program and serve up its output.</p><p>At this time, HTTP generally used just a single verb, <code>GET</code>, which would get a resource. CGI needed a way of passing in parameters to the
program. Instead of just running the program like a command line (which would be very insecure), they passed in parameters through
environment variables; for example, if the user requested the &ldquo;file&rdquo; at <code>/cgi-bin/formail.pl?email=fwiffo@example.com&amp;text=Hi+I+like+your+site!</code>,
the web server would set the environment variable <code>QUERY_STRING</code> to the value of everything after the <code>?</code>, which <code>formail.pl</code> would then
parse out.</p><p>If the <code>POST</code> verb were used instead, then the server would also read some additional data from the user&rsquo;s web browser and then send that
to the script via its standard input.</p><p>Basically, the web server was no longer just a file server, but a primitive command processor.</p><h3 id="246_h3_2_Early-security"><a href="http://publ.beesbuzz.biz/blog/246-The-Trouble-with-PHP#246_h3_2_Early-security"></a>Early security</h3><p>Back when this first started, system administrators knew better than to let just <em>anyone</em> run just <em>any</em> program from the web server.
After all, people might do silly things like make it very easy to execute arbitrary commands on the server — and since the web server
often ran as the root/administrator user, this would be very bad indeed. Even the admins who were savvy enough to set up a special
sandbox user for the HTTP server would still need it to run everything from a common, trusted account that might have had
access to common areas of the server.</p><p>So, the usual approach was to have just a single <code>/cgi-bin/</code> directory with <em>trusted</em> programs that were vetted and installed by
the administrator, for things that they felt were important or useful for everyone to have. Usually this would be things like
standard guest books (the great-great-grandfather to comment sections) or email contact forms (since spam was starting to become
a problem and it was already dangerous to put your email address on the public web).</p><p>Back in these days people generally didn&rsquo;t have a database — after all, Oracle was expensive — and it didn&rsquo;t really matter anyway;
if you wanted to have a complex website you&rsquo;d just run some sort of static site generator (which was often written in tcsh or Perl or
something) and if you needed scheduled posts you&rsquo;d do it by having a <code>cron</code> job periodically update things. So, it wasn&rsquo;t really
that much of an impediment to have this setup.</p><p>If you were really savvy and wanted to run, say, an interactive online multiplayer game of your own design, you&rsquo;d simply
run your own server (often under your desk in your dorm room) and you&rsquo;d have root access and could install everything you
wanted in <code>/cgi-bin/</code>.</p><p>Because <em>everything</em> in <code>/cgi-bin/</code> was run as a program, you knew better than to let your scripts save other files into
that same directory; if it was a thing where people could upload files or post comments, it&rsquo;s not like it would do any
good anyway (since then the server would try to run them as programs, and you can&rsquo;t run a .jpg).</p><h3 id="246_h3_3_Shared-hosting"><a href="http://publ.beesbuzz.biz/blog/246-The-Trouble-with-PHP#246_h3_3_Shared-hosting"></a>Shared hosting</h3><p>Then as the web really started to take off, shared hosting providers started appearing, and CGI access became a pretty
commonly-requested high-end feature. Generally the shared hosting providers didn&rsquo;t want to let just <em>anyone</em> upload a
script to be run by the server, but they also didn&rsquo;t want to have to manually vet each and every script that users
wanted to install. So, as a compromise, they set up special rules so that within your own server space you could have a <code>/cgi-bin/</code> directory
and that things run from that directory would run under your account, rather than as the web server (using a mechanism called <code>suexec</code>).</p><p>This provided a pretty good compromise; users still had to know what they were doing in order to install their scripts, but they still
ran from a little sandboxed location, and because of the way <code>suexec</code> worked it was pretty unlikely for even a very badly-written script
to cause problems, because if the script tried to save out an executable file into the <code>cgi-bin</code> directory, it wouldn&rsquo;t be saved out
with execute privileges, so it would just cause an error 500 to occur. After all, <code>/cgi-bin/picture.jpg</code> wasn&rsquo;t a program, so why should it run?</p><h3 id="246_h3_4_Increased-flexibility"><a href="http://publ.beesbuzz.biz/blog/246-The-Trouble-with-PHP#246_h3_4_Increased-flexibility"></a>Increased flexibility</h3><p>But then things started to get a little more complicated. People wanted their main index page to be able to run as a script, without it
forwarding the page to <code>/cgi-bin/index.pl</code> or whatever.</p><p>So, another compromise happened: the CGI mechanism, which previously was set up to only run the scripts from the <code>/cgi-bin/</code> directory, got a
few new rules, such as &ldquo;if the filename ends in <code>.cgi</code> (or other common extensions like <code>.pl</code> or <code>.py</code>)
run it as a script.&rdquo; It still needed permissions to be set correctly, though,
and by this point <code>suexec</code> was generally set up so that there were even more rigorous checks before it would run the script.
And there were so many safety checks in place that this was still <em>generally</em> okay.</p><p>Around this time it also started becoming common to have access to a database such as MySQL or PostgreSQL, which allowed more flexibility
and more two-way content. Forums became a thing. So did early blogs. Most of this software started out by having the database just for
storage and the software would simply write out static files, but this started to have scaling problems and the webserver got busy with
the software writing these files out <em>all the time</em>, so it became more common for the software to simply read from the database directly
as it ran. This helped somewhat, but it also shifted a significant amount of load over to constantly
establishing short-lived database connections,
because every time the forum program ran it had to connect.</p><h3 id="246_h3_5_Hello-PHP"><a href="http://publ.beesbuzz.biz/blog/246-The-Trouble-with-PHP#246_h3_5_Hello-PHP"></a>Hello PHP</h3><p>At some point, PHP started to get popular.</p><p>PHP itself was originally intended as another way of adding server-side scripting into HTML files; it was in effect a templating
system for HTML. In the earliest days it was often just treated as another scripting language; the server would be configured to
consider <code>.php</code> as another name for <code>.cgi</code> or <code>.pl</code> or whatever, and the file would still be run
as a script. In some cases it even needed to
start with <code>#!/usr/local/bin/php</code> and it needed to be set executable with the correct permissions and so on (although this setup was uncommon).</p><p>However, most sites used <code>mod_php</code>, a server extension that allowed the web server to handle PHP files directly. In many respects it was very similar to <code>mod_cgi</code>, except it did a few
interesting things. One of the undeniable benefits was that it was now able to maintain the database connection persistently,
rather than having to re-establish a connection every time a script ran. It was also generally a bit nicer for speed because
commonly-used PHP scripts could stay in memory and not have to be re-interpreted every time a page was loaded.</p><p>But there were a couple of other implications this led to. In particular:</p>
<ul>
<li>It embedded the PHP interpreter into the web server itself (rather than running it as an external program)</li>
<li>Since it was no longer shelling out to an external program, it could always run a .php file regardless of its execution permissions — and so that&rsquo;s what it did</li>
</ul>
<p>There were a few different variations on this and it didn&rsquo;t always just run PHP from the web server (for example, some of the
better hosts figured out that they could have each user run their own separate per-user FastCGI server that would
run the PHP programs as the separate users, or whatever) but regardless of the setup, you now had PHP always running
and not having to care about the permissions of the file, meaning you now had some persistent process running what was
essentially executable code without the usual safeguards that a shared server would have.</p><p>This actually seemed like a good thing at the time, but then many, many pieces of software started allowing arbitrary
people to upload images, and often wouldn&rsquo;t make sure that what was supposedly an image was <em>actually</em> an image&hellip;</p><p>And so that&rsquo;s where we stand today.</p><p>This makes sites potentially vulnerable even if they aren&rsquo;t written in PHP themselves; for example, if your HTML
directory permissions are set to be slightly too permissive, and another site on the server gets hacked, that hacked site
can potentially be used to place a <code>.php</code> file into your site, and since <code>mod_php</code> doesn&rsquo;t check ownership permissions
it now runs on your site with whatever permissions PHP would normally run in your account. (And this isn&rsquo;t just
theoretical; I&rsquo;ve had sites hacked in this way! Now I run a nightly script that ensures that my directory permissions
are correct and tells me about new <code>.php</code> files that appeared since the last check, just to be sure.)</p><p>So, long story short, one of the biggest problems with PHP isn&rsquo;t with the language itself, but with the way that PHP
gets run; people (and their bots) can find ways to upload
arbitrary files with a .php extension and, if that upload is visible to the webserver (which it often will be), then a
request to view that file will execute that file, regardless of its origin, and from there it can do anything that your own site can.</p><h3 id="246_h3_6_Other-PHP-features-of-note"><a href="http://publ.beesbuzz.biz/blog/246-The-Trouble-with-PHP#246_h3_6_Other-PHP-features-of-note"></a>Other PHP features of note</h3><p>Granted, the erroneously-executable upload feature is only responsible for <em>some</em> of the security exploits I&rsquo;ve
seen in the wild. I wasn&rsquo;t really intending to get into language-specific issues (after all, I linked to much
better, more-comprehensive articles about it in the introduction), but it&rsquo;s worth mentioning some of them
anyway, as I have seen all of these be used to hack websites I&rsquo;ve helped to clean up and secure.</p><p>The biggest one: For a very long time, the <a href="http://php.net/manual/en/function.include.php"><code>include()</code> function</a>
would happily support any arbitrary URL and would download and run whatever URL it was given. And it was very easy for a
PHP script to be accidentally written to allow an arbitrary user to provide such an arbitrary URL. (And by &ldquo;a very long time&rdquo;
I mean that this was the default configuration until very recently, and many hosts still configure it that way for backwards
compatibility.)</p>
<blockquote>
<p>Some might be looking at the PHP docs I linked to there and thinking, &ldquo;wait, but it&rsquo;s not running the PHP code locally.&rdquo;
What the docs mean is that if you do something like <code>include(&#39;http://example.com/foo.php&#39;);</code> it&rsquo;s the output of <code>foo.php</code> that gets
included. However, that output could in turn be more PHP code, which would then be executed locally, meaning on your server.
And PHP doesn&rsquo;t even care what the file extension is; doing an <code>include()</code> on <code>asdf.txt</code> or <code>pony.jpg</code> will happily execute whatever <code>&lt;?php ?&gt;</code>
blocks exist inside of it as well.</p></blockquote>
<p>There are also a few other features of PHP that lend themselves to arbitrary code execution. One particularly <em>fun</em> one was the
PCRE <code>e</code> flag, which indicated that the replacement produced by a regular-expression substitution should be evaluated as PHP code; and as PCRE
flags are embedded into the regular expression itself, a carefully-crafted search term (on a less-carefully-crafted search page)
could run arbitrary code. Fortunately, this has been removed in PHP 7; unfortunately, a lot of web hosts still run PHP 5
(or older!) and so this option — which never had a single legitimate usage — is still available on the vast majority of
web servers out there.</p><h3 id="246_h3_7_How-Flask-and-therefore-Publ-are"><a href="http://publ.beesbuzz.biz/blog/246-The-Trouble-with-PHP#246_h3_7_How-Flask-and-therefore-Publ-are"></a>How Flask (and therefore Publ) are different</h3><p>So, I&rsquo;m posting this on the Publ blog, which implies that I&rsquo;m trying to build a favorable comparison for Publ. And that&rsquo;s
a perfectly fine inference to draw.</p><p>Publ is built on Flask, which uses the <a href="https://en.wikipedia.org/wiki/Web_Server_Gateway_Interface">WSGI</a> (Web Server Gateway Interface),
rather than the CGI, model of execution. This is a bit more complicated
than I want to get into but the short version is that rather than the web server running a program based on the URL,
Publ stays running as a standalone program that the webserver sends commands to as requests come in. So, it&rsquo;s
never asking a file how it should be run, but instead it&rsquo;s telling a single program to handle a request. So, there&rsquo;s
no danger of some random file being executed when it shouldn&rsquo;t be.</p><p>&ldquo;But wait,&rdquo; you might ask, &ldquo;isn&rsquo;t that exactly what you were complaining about <code>mod_php</code> doing?&rdquo; Well, that&rsquo;s true, <code>mod_php</code> works by
always having the PHP interpreter running and able to execute whatever arbitrary code it comes across. However, in the Python
world, code is kept
separate from data. Loading a URL in Flask isn&rsquo;t mapping to a script file that gets loaded and run, it&rsquo;s calling an established,
fixed function that loads a content file and formats it through a template.</p><p>Another thing that Flask does is it separates out template content (which is executable) from static file content. Static files aren&rsquo;t executable
by default. Templates can
embed arbitrarily-complex code, but they can only use functions that are provided to them — there&rsquo;s no direct access to the entire Python standard
library, for example, and so the most dangerous functions aren&rsquo;t included by default. (And Publ does not provide any of
those functions either, at least not purposefully.)</p>
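<p>As a rough illustration of the WSGI model described above, here is a minimal sketch using only the Python standard library. It is not how Flask or Publ is implemented: the <code>PAGES</code> table, the <code>app</code> callable, and the <code>request</code> helper are all hypothetical names. The point is only that the server hands every request to one fixed function, and the requested path is treated as a lookup key rather than as a file to execute.</p>

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical content table: request paths are only ever used as keys.
PAGES = {
    "/": "Welcome",
    "/blog/": "Blog index",
}


def app(environ, start_response):
    """A single fixed WSGI callable; the server calls it for every request."""
    path = environ.get("PATH_INFO", "/")
    body = PAGES.get(path)
    if body is None:
        status, body = "404 Not Found", "Not found"
    else:
        status = "200 OK"
    start_response(status, [("Content-Type", "text/plain; charset=utf-8")])
    return [body.encode("utf-8")]


def request(path):
    """Call the app directly, as a WSGI server would; return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fill in the required WSGI keys
    environ["PATH_INFO"] = path
    seen = {}

    def start_response(status, headers):
        seen["status"] = status

    chunks = app(environ, start_response)
    return seen["status"], b"".join(chunks).decode("utf-8")


print(request("/"))
print(request("/evil.php"))
```

<p>In this sketch a request for <code>/evil.php</code> simply misses the lookup table; nothing about the URL can select a different piece of code to run.</p>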
<blockquote>
<p><mark>Important note:</mark> When I say static files aren&rsquo;t executable by default, this simply refers to how Publ sees them. If your site is configured to serve up static files where PHP or CGI scripts are executable, then any such scripts that end up in your static files will indeed be executable. This is going to be the case on pretty much any shared hosting provider, for example.</p><p>Also, regardless of the server setup, Publ can&rsquo;t magically protect your content or template directory from outright misconfigurations with permissions. Even classic static sites need to be secured from third-party/unauthorized access.</p></blockquote>
<p>Publ itself also only knows how to handle a handful of content formats — Markdown, HTML, and images — and ignores everything else. So if a
<code>.php</code> file somehow ends up in the content directory, it won&rsquo;t matter at all — Publ just ignores it. It will never
attempt to run code that&rsquo;s embedded in a content file, nor does it even know <em>how</em> to. And Publ doesn&rsquo;t handle
arbitrary user uploads anyway (nor is there any plan to ever support this); anything that would be potentially hazardous
would have been put there by some other means.</p><p>Publ&rsquo;s design is basically just a fancy way of presenting static files, just like in the early days of the web. It just
serves up the static files dynamically. Or, as I keep on saying, Publ is like a static publishing system, only dynamic.</p><p>(Of course, if your directory permissions are set wrong, someone can still use someone else&rsquo;s exploited PHP-based site to attack
your account and modify Publ&rsquo;s code. But there&rsquo;s nothing that Flask or Publ can do to prevent that, and this is just a general security problem that impacts everyone regardless of what they&rsquo;re running.)</p><p>It would of course be foolish of me to claim that Publ itself is 100% secure and impossible to hack. And at least on Dreamhost
there&rsquo;s the very real possibility that somehow an arbitrary .php file gets injected into the static files (perhaps by an
incorrect directory permission or whatever), which isn&rsquo;t a flaw in Publ itself but the end result (a hacked site) is the
same. So far as I can tell there&rsquo;s no way to entirely disable PHP on a Dreamhost-based Publ instance, and it&rsquo;s really the
ability to run PHP that makes PHP so dangerous in this world.</p><p>So, I&rsquo;m not going to claim that Publ is 100% secure or unhackable. But it sure has one heck of a head start.</p>
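<p>For anyone who wants something like the nightly check mentioned earlier, here is a minimal sketch of the idea, assuming a POSIX host. It is not my actual script: <code>audit_tree</code> is a hypothetical name, and it reports every <code>.php</code> file it finds rather than only the ones that appeared since the last run (that would need a saved list to compare against).</p>

```python
import os
import stat


def audit_tree(root):
    """Return (php_files, writable_dirs) found under root.

    php_files: paths whose name ends in .php (case-insensitive).
    writable_dirs: directories writable by group or by others.
    """
    php_files, writable_dirs = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        mode = os.stat(dirpath).st_mode
        # Flag directories that someone other than the owner can write to.
        if mode & (stat.S_IWGRP | stat.S_IWOTH):
            writable_dirs.append(dirpath)
        for name in filenames:
            if name.lower().endswith(".php"):
                php_files.append(os.path.join(dirpath, name))
    return sorted(php_files), sorted(writable_dirs)


if __name__ == "__main__":
    found_php, loose_dirs = audit_tree(".")
    for path in found_php:
        print("php file:", path)
    for path in loose_dirs:
        print("writable by group/other:", path)
```

<p>Run from cron against a site&rsquo;s document root, the printed output becomes the nightly report; an empty report means nothing suspicious was found by these two checks, not that the site is secure.</p>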

]]>
        </content>
    </entry>
    

    
</feed>