Move by a thousand cuts

Posted by David on Dec 11th, 2008

I am moving.  I work way out in a different suburb now, and Gwinnett lacks the public transportation I depended upon to avoid driving a similar distance to Alpharetta.  Also, my current apartment is awful and I am sick of it.  I am tired of stepping in dog shit in the mornings, of hearing each nuance of my neighbors’ getting crunk, of ignoring the burnt out husk of one of the buildings, the occasional gunshot, the broken alarm in that Georgia Power truck that starts beeping whenever the temperature drops, the ham-handed attempts at gentrification that are clogging the streets with the accoutrements of construction and killing any character that decrepit shopping center on the corner may have once had, and I am tired of ignoring the general atmosphere of apathy and futility.  I need a change.

I’m doing things a little differently this time around.  Instead of renting a U-Haul and bribing some friends with lunch and a case of PBR, I’m hiring people that do this sort of thing for a living.  Moving is awful, and I never again want to drag that washer and dryer up or down any flights of stairs.  I can just do the American thing and throw money at the problem to make it go away, but, unfortunately, there’s more to my bright idea.  Starting with the premises that a) acquiring boxes is a major hassle, and the easy route of buying boxes from the movers that will be used once is wasteful; b) I’m already traveling every morning in the general direction of the new apartment; c) I have a week between getting the keys to the new place and the date reserved with the movers, and d) car can hold things and move them between points, I came to the conclusion that I could just use whatever containers I had on hand—a handful of boxes saved from the last move, recycling bins, stolen milk crates, a bucket—to move everything that isn’t furniture in the week leading up to the big move.  I could drop everything off in the morning, empty the boxes and containers into a corner somewhere and bring the empties back at night.  I neglected a couple of things: a) I own several heavy things that are not furniture and b) I drive a compact car.

I’m moving to Suwanee, more or less at the point where Gwinnett, Fulton and Forsyth counties all meet.  It’s farther north than Discover Mills but not as far as you have to drive to see an IMAX movie that isn’t about birds.  I haven’t yet come to terms with living this far out in exurbia, but it seems to have a lot going for it.  I’ll be close enough to work that I can bike again, there are little pockets of places to go and things to do even if not a whole city’s worth, and most surprising, there are a lot of bikers in the area.  This part of Gwinnett county has bike trails and bike lanes and yuppies fearlessly riding carbon fiber down busy streets.  Maybe it’ll be an ok place to be.  I got the keys on Monday.

I won’t be able to move everything I originally wanted by this Saturday, but it’s been going a lot better than the revised, somewhat panicked estimate I made once I figured out how many boxes of stuff I could move at a time.  All of the heavy books are moved, and I’ll at least have all of the furniture cleared off before the weekend, along with most everything else except for a closet or two and probably the bathroom.  I’m waiting for the shelves and tables and cetera before I try to figure out where to place anything in the new apartment, and so far it looks like the kitchen is going to be the biggest problem.  The new kitchen, though a little smaller, is also more open and has more usable counter space, so it’s really an upgrade in that I’ll be able to cook without struggling to find room for a cutting board and maybe I won’t break as many things when not rushing around in a claustrophobic alcove.  But I lost some cabinet space.  I’ll have to be more creative about storage.

My first neighborly encounter was with a lady who wears too much perfume and owns a little yappy dog that peed on my car.  She lives in the apartment next to mine, and while I was bumping and clumping around dragging the first wave of heavy boxes up the stairs, Mr. Yappy spent much of the time barking at the door, challenging my presence and all the noises of moving stuff.  Once I stepped inside and closed the door, I couldn’t hear a thing.  Maybe this new place won’t be so bad.

Software is awful

Posted by David on Dec 5th, 2008

I used the Wordpress-native XML format to import the nanoblogger data.  WXR is basically just RSS with some extra tags, nanoblogger can create RSS on its own, and for the extra tags I thought that Python would be a good idea.  Python is nice in that it lets you write something quick and sloppy without actually looking too sloppy (I’m looking at you, perl), and it’s supposed to be able to handle XML pretty well.  Maybe it can, but I certainly haven’t figured out how.
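As a rough illustration of the "RSS with some extra tags" idea, here is a minimal sketch that adds a couple of WXR-style elements to each item of an RSS feed using xml.dom.minidom. The wp namespace URI and the element names (wp:status, wp:post_type) are assumptions based on what a WXR 1.0 export usually looks like and should be checked against a real export file; the sample feed and the add_wxr_tags helper are made up for the example.

```python
from xml.dom import minidom

# Assumed namespace URI for WXR 1.0; verify against a real WordPress export.
WP_NS = "http://wordpress.org/export/1.0/"

# A made-up, minimal RSS feed standing in for nanoblogger's output.
RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Try it all over again</title>
      <pubDate>Thu, 04 Dec 2008 12:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def add_wxr_tags(rss_text):
    """Return a DOM document with WXR-style tags added to every <item>."""
    doc = minidom.parseString(rss_text)
    # Declare the wp: prefix on the root so the new tags serialize sensibly.
    doc.documentElement.setAttribute("xmlns:wp", WP_NS)
    for item in doc.getElementsByTagName("item"):
        # Assumed tag names and values; a real import needs more fields.
        for name, value in (("wp:status", "publish"), ("wp:post_type", "post")):
            tag = doc.createElementNS(WP_NS, name)
            tag.appendChild(doc.createTextNode(value))
            item.appendChild(tag)
    return doc


wxr_doc = add_wxr_tags(RSS)
wxr_statuses = [
    node.firstChild.data
    for node in wxr_doc.getElementsByTagNameNS(WP_NS, "status")
]
```

Even this small case takes a namespace declaration, a qualified name, and a separate text node per tag, which is the verbosity the rest of this post complains about.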

There are basically two ways to parse XML: the SAX method, a low-level, procedural technique that requires a maze of callback functions to examine the document as it’s parsed, and DOM, an object-oriented method that works with a completely parsed document.  Along with the XML format itself, the W3C also created the Document Object Model, a standard for accessing and manipulating a parsed XML tree, and like most W3C standards, DOM is awful.  It caters to the lowest common denominator of languages (C), and most XML parsers try to implement DOM with standards-compliance in mind, turning what should be a high-level language into an awful, procedural mess.
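To make the contrast concrete, here is a small sketch that pulls the text out of every <item> both ways, using the standard library's xml.sax and xml.dom.minidom. The sample document and the names ItemHandler, items_via_sax and items_via_dom are invented for the example, and it assumes <item> holds only text.

```python
import xml.sax
from xml.dom import minidom

XML = "<root><item>first</item><item>second</item></root>"


class ItemHandler(xml.sax.ContentHandler):
    """SAX: collect <item> text through callbacks fired during parsing."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._buffer = None  # None means "not inside an <item>"

    def startElement(self, name, attrs):
        if name == "item":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._buffer))
            self._buffer = None


def items_via_sax(text):
    handler = ItemHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.items


def items_via_dom(text):
    """DOM: parse everything first, then walk the finished tree."""
    doc = minidom.parseString(text)
    return [
        "".join(child.data for child in item.childNodes)
        for item in doc.getElementsByTagName("item")
    ]
```

The SAX version has to carry its own state between callbacks, while the DOM version gets a whole tree but still has to stitch the text together by hand.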

XML’s verbosity becomes even more boggling once parsed.  For an example, let’s take a look at a simple XML document.

<?xml version="1.0"?>
<root>
   <item>The text you actually want with maybe some <![CDATA[cdata segments]]> in it</item>
</root>

Let’s say you’re starting at the root node of the document, <root>.  What you’d want is a way to get to the text in the <item> child of <root>.  The CDATA part doesn’t matter—it has the same semantics as plain text; CDATA just changes the quoting rules, and the content of <item> is in effect everything that’s there with the <![CDATA[]]> stripped out.  But that’s not what you get.  In DOM terms, <root> contains three nodes: a text node with the linebreak and spaces between <root> and <item>, the actual <item> element, and another text node with the last linebreak before the closing </root>.  <item> also contains three nodes: the text leading up to the CDATA section, the CDATA node, and then the text after it.  Given a strict DOM implementation in Python, the most python-y way that easily comes to mind for getting to the text would be something like:

reduce(lambda x, y: x + y.nodeValue, [''] + doc.documentElement.childNodes[1].childNodes)
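For anyone who wants to check those node counts, here is a runnable sketch using xml.dom.minidom on the sample document. It is written for Python 3, where reduce() lives in functools, and it passes the empty string as reduce()'s initial value instead of prepending it to the list; the variable names are just for illustration.

```python
from functools import reduce
from xml.dom import minidom

XML = """<?xml version="1.0"?>
<root>
   <item>The text you actually want with maybe some <![CDATA[cdata segments]]> in it</item>
</root>"""

doc = minidom.parseString(XML)
root = doc.documentElement

# <root> holds whitespace text, the <item> element, and more whitespace.
root_children = [node.nodeName for node in root.childNodes]

# <item> holds text, a CDATA section, and more text.
item = root.childNodes[1]
item_children = [node.nodeName for node in item.childNodes]

# Same idea as the one-liner above: glue the nodeValue of each child together.
text = reduce(lambda acc, node: acc + node.nodeValue, item.childNodes, "")
```

Here root_children comes out as text, element, text and item_children as text, CDATA section, text, matching the description above.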

That, of course, depends on the particulars of all that whitespace we don’t care about.  Now suppose <root> contains several <item> elements, and perhaps some elements of other types.  You might try to make yourself a list of <item>s using something like

[node for node in doc.documentElement.childNodes if node.nodeName == 'item']

and even now we’re getting sloppy.  nodeName isn’t exactly the same as tagName; nodeName might pick up an unwisely named processing instruction, so we really ought to add a check that node is an Element, and we haven’t even started looking at namespaces.  Xpath offers a query language for getting at particular nodes with particular names and properties, but xpath will just return a NodeList object and leave you back at the beginning as far as getting to the content.
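The difference between the sloppy filter and a stricter one can be sketched with minidom as follows. The sample document, including its deliberately unwise processing instruction, is made up for the example, and namespaces are still ignored.

```python
from xml.dom import minidom, Node

XML = """<root>
  <?item not-really-an-item?>
  <item>first</item>
  <other>ignored</other>
  <item>second</item>
</root>"""

root = minidom.parseString(XML).documentElement

# Naive filter: also matches the processing instruction whose target is "item".
naive = [node for node in root.childNodes if node.nodeName == "item"]

# Stricter filter: require an Element before comparing the tag name.
strict = [
    node
    for node in root.childNodes
    if node.nodeType == Node.ELEMENT_NODE and node.tagName == "item"
]
```

The naive list picks up three nodes and the strict one two, so the extra check is not just pedantry.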

If you don’t see anything wrong with this article so far, you might want to stop reading now.  I am mad at you.

Python comes with a DOM implementation, xml.dom.minidom, for DOM Level 1, and it includes a specification for DOM Level 2—which is basically the same thing as far as node selection—that others can implement.  Pyxml provides DOM Level 2, and both it and minidom are fairly faithful implementations of the W3C standards.  XML in these systems is not an easily manipulated tree, but instead a forest of corner cases and finger-bending verbosity.  This is XML in Python.  Even freaking Javascript handles it better than this.

The best alternative I’ve been able to find is amara, Uche Ogbuji’s attempt to interpret XML in a python-friendly way.  It’s actually pretty nice.  For the document above, I could access the item node (again using “doc” as the parsed document object) using doc.root.item.  For a document with more than one <item>, the same code selects the first <item> node but can also be used as an array or an iterator.  As for the content, the node object implements __str__ sensibly, so just using the node in a context that expects a string will provide the text content, CDATA and all.  It just about makes XML make some sense.
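To show the general shape of that kind of access, here is a toy wrapper over minidom. To be clear, this is not amara's implementation or its API: the Bound class and bind function are hypothetical names, and the sketch ignores attributes, namespaces and nested markup inside the text.

```python
from xml.dom import minidom, Node


class Bound:
    """Hypothetical wrapper giving attribute-style access to child elements."""

    def __init__(self, elements):
        self._elements = elements  # every sibling element sharing this name

    def __getattr__(self, name):
        # Avoid recursing on private names such as _elements.
        if name.startswith("_"):
            raise AttributeError(name)
        # Look for child elements of the first wrapped node by tag name.
        matches = [
            child
            for child in self._elements[0].childNodes
            if child.nodeType == Node.ELEMENT_NODE and child.tagName == name
        ]
        if not matches:
            raise AttributeError(name)
        return Bound(matches)

    def __iter__(self):
        # Iterate over every element with this name, not just the first.
        return (Bound([element]) for element in self._elements)

    def __str__(self):
        # Concatenate the text and CDATA children of the first element.
        return "".join(
            child.data
            for child in self._elements[0].childNodes
            if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
        )


def bind(text):
    """Parse text and return a wrapper positioned above the root element."""
    return Bound([minidom.parseString(text)])


doc = bind("<root><item>one <![CDATA[& two]]></item><item>three</item></root>")
first_text = str(doc.root.item)
all_texts = [str(item) for item in doc.root.item]
```

With this, doc.root.item reads the first <item> as a string and also iterates over all of them, which is roughly the behavior described above.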

Compared to the trials of pyxml or the similarly low-level libxml2 bindings, my problems with amara seem almost trivial.  The first concerns namespaces, an issue that seems doomed to be awful in any implementation.  Google for “xpath default namespace” if you want some fun bedtime reading.  Amara ignores namespaces if you ignore them, which, since you can’t include a colon in a python property, usually works for the best.  The namespace URI is available as a property of the node objects, and, as an added bonus, the amara parser will load the document’s namespace prefixes for use in xpath expressions and serialization.  It also provides a means of specifying a set of namespace prefixes when parsing the document, but I’m not sure where these are actually used.  The extra prefixes seem to be available for xpath, but not for the names used when creating new elements, and serialization will still use whatever was in the original document unless overridden in the serialization function call.  So I guess my complaint here is that the API could stand some better documentation.  And prefixes in element creation would be nice, or at least nicer if it turns out they’re there and I just don’t understand how to use them.
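For a taste of the default namespace problem without amara, here is a small sketch using the standard library's xml.etree.ElementTree and its limited XPath subset. That is a different library from the ones discussed here, used only because it shows the issue in a few lines; the document, the urn:example:blog URI and the "b" prefix are all made up.

```python
import xml.etree.ElementTree as ET

# Made-up document with a default namespace (no prefix on the tags).
XML = '<root xmlns="urn:example:blog"><item>first</item><item>second</item></root>'

root = ET.fromstring(XML)

# An unprefixed path looks for <item> in no namespace, so it finds nothing.
unqualified = root.findall("item")

# Binding an arbitrary prefix to the same URI makes the query match.
NS = {"b": "urn:example:blog"}
qualified = [node.text for node in root.findall("b:item", NS)]
```

The document never uses a prefix, but the query has to invent one anyway, which is the heart of most of that bedtime reading.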

A bigger complaint I have with amara concerns how it handles one of the nastier quirks of Python.  In Python there are two types of strings: the regular kind, and the unicode kind.  Usually this difference isn’t a problem; 'string' and u'string' seem like they would be the same thing, and usually they are.  Python’s idea of objects and types uses a concept known as “duck typing” (if it looks like a duck, and it walks like a duck…), which just means that object types don’t matter as much as the methods they implement.  For example, the str and unicode objects both implement the join() method, so an object of either type can be used in a context that expects join().  The problem with amara is that it requires every string—new element names, attribute names, node and attribute contents—to be a unicode type.  The especially annoying problem with amara is that it doesn’t fail to create nodes using regular strings, but it does fail to serialize nodes using regular strings.
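The same kind of trap can be sketched in Python 3, where the split is between bytes and str rather than str and unicode, so treat this as an analogy and not a reproduction of the amara problem. The to_text helper is a hypothetical name, and UTF-8 is an assumed encoding.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Both types implement join(), so duck typing is happy with either one...
joined_text = ", ".join(["a", "b"])
joined_bytes = b", ".join([b"a", b"b"])

# ...but they are still different types, which matters to a strict consumer.
same_type = type(joined_text) is type(joined_bytes)

# Normalizing at the boundary, before anything reaches the XML library.
element_name = to_text(b"item")
```

Converting everything once, on the way in, is one way to avoid finding out at serialization time that the wrong kind of string got through.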

What I really want out of python is about what amara is doing, something that can turn tag names into object names, convert attributes to and from the python dictionary type, and generally hide most of the nastier parts of XML while still exposing enough of it when needed, like the cdataSectionElements parameter in the serializer that I needed in order to make Wordpress not freak out when given unquoted post contents.  But I’d like something that behaves more intuitively for all cases, and, in a language that claims to be pretty alright for XML processing, I’d like XML methods better suited to the language itself built into the standard library.
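As a sketch of what writing post contents as CDATA looks like, here is the effect done by hand with xml.dom.minidom, which has no cdataSectionElements option, so the CDATA node is created explicitly. The content:encoded element is assumed to be where WXR keeps the post body, and the item_with_cdata helper and sample post are invented for the example.

```python
from xml.dom import minidom

# Namespace of the RSS content module, which defines content:encoded.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"


def item_with_cdata(title, body_html):
    """Build an <item> whose post body is written as a CDATA section."""
    doc = minidom.Document()
    item = doc.createElement("item")
    item.setAttribute("xmlns:content", CONTENT_NS)
    doc.appendChild(item)

    title_el = doc.createElement("title")
    title_el.appendChild(doc.createTextNode(title))
    item.appendChild(title_el)

    body_el = doc.createElementNS(CONTENT_NS, "content:encoded")
    # CDATA cannot contain "]]>", so a real importer would need to split it.
    body_el.appendChild(doc.createCDATASection(body_html))
    item.appendChild(body_el)
    return item.toxml()


serialized = item_with_cdata("Hello", "<p>Unquoted <b>markup</b> & ampersands</p>")
```

The markup inside the CDATA section comes out exactly as it went in, while the same string in a plain text node would be escaped.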

Try it all over again

Posted by David on Dec 4th, 2008

It’s been a couple of years now, so I figured it was time for another change.

Until fairly recently, I ran this site from my own computer, usually the worst computer I had that could still boot, since anything better was being used for something else.  Besides the administrative headaches, the MP part of LAMP was problematic for a machine with limited resources that had a tendency to overheat and crash.  Nanoblogger was an attractive choice when I moved back to HTTP, since it did all of the processing while publishing instead of serving.  Instead of a database it had a directory full of text files, and it created a new set of static HTML pages every time I wrote something.  I didn’t have to worry about overhead or any of the testing and debugging that comes with a web programming language.  There was no run-time to fail; either nanoblogger output new pages every time or it didn’t.
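The publish-time model is simple enough to sketch in a few lines of Python. This is not how nanoblogger is actually implemented; the publish function, the one-file-per-post layout and the "first line is the title" convention are all assumptions made for the example.

```python
import html
import tempfile
from pathlib import Path

PAGE = "<html><head><title>{0}</title></head><body><h1>{0}</h1><p>{1}</p></body></html>"


def publish(source_dir, output_dir):
    """Render every .txt post in source_dir to a static .html page."""
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for post in sorted(source_dir.glob("*.txt")):
        # Assumed layout: first line is the title, the rest is the body.
        title, _, body = post.read_text(encoding="utf-8").partition("\n")
        page = PAGE.format(html.escape(title), html.escape(body.strip()))
        target = output_dir / (post.stem + ".html")
        target.write_text(page, encoding="utf-8")
        written.append(target.name)
    return written


# Usage with throwaway directories.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "posts")
    src.mkdir()
    (src / "hello.txt").write_text("Hello\nFirst post.", encoding="utf-8")
    pages = publish(src, Path(tmp, "site"))
    rendered = Path(tmp, "site", "hello.html").read_text(encoding="utf-8")
```

All the work happens when publish() runs, and the web server only ever hands out finished files. The catch is that every page gets rebuilt on every run.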

The main downside to nanoblogger is that it makes publishing slow.  Especially with the number of pages I was trying to squeeze through it, any particular blog post would take five or ten minutes to generate, whereas a dynamic content management system could have quickly inserted a few records into a database and regenerated the pages on the fly.  Nanoblogger didn’t react well to changes in categories or pages older than the newest 10, and rebuilding the whole site after some such major change took hours.  The other main downside to nanoblogger is that it’s not very well designed or maintained.  It has a plugin system of sorts, but it’s hard to use (bash isn’t the best language for doing anything interesting, for one), and there isn’t enough of a community to build an interesting library of nanoblogger extensions.  Nanoblogger is an interesting idea, but in practice it’s inconvenient and inflexible.

I don’t need to worry anymore about database administration or processing overhead because I pay someone else to worry about all that, and the hosting comes with a wordpress installer, so I figured I’d give it a shot.  I don’t know if it’s the best or easiest, but it seems to have enough of a community that I don’t need to care.

As for that community, one big snag I hit in the transition was in the choice of theme.  This version of Wordpress comes with two themes: the way it looked a couple of versions ago, which had the same sort of plain, slapped-together look I had with nanoblogger, but with different colors, and the way things look now, which uses a blue background behind a narrow white column of text.  The Wordpress default is part of the popular “fixed width” paradigm, which basically means that a web developer made some incorrect assumptions about the dimensions of your browser window.  Remember those pages back when companies were just starting to think they could make money off the Web, back when everyone had a geocities account; those pages that would say things like “Best viewed in Netscape 3.0 at 800×600”?  Remember how annoying that was twelve freaking years ago?  We’re doing the same thing all over again, it just takes an extra file to say it.

I don’t have a high opinion of CSS.  I understand why it exists and why we should use it, but I don’t think that separating layout from content can be done in something as unrestricted as HTML.  No one writes CSS for all HTML: it can’t be done.  You might be able to do something with fonts and colors, but what about all those <div>’s where your tables used to be, or the <span>’s that you used instead of <font>?  The average stylesheet only makes sense when paired with a particular template.  We haven’t improved anything, just changed the vocabulary and made the layout more difficult as tags turn into id and class selectors that never quite work the way they used to.  But CSS is what we’re stuck with.  I can understand when people take shortcuts to get something working instead of making it conform to what might be correct today, but seriously, we shouldn’t be specifying a width for the entire page in pixels anymore.  It’s like a single newspaper column all the way down the page, and I don’t even get to read Garfield at the end.

The theme I settled on was Zen in Grey, which I chose because it’s variable width and I think it looks mostly ok.  The CSS came broken, since the author apparently doesn’t use the calendar that was turned on by default, but a few extra paddings and marginses at least made all the boxes not overlap.  I don’t know how correct it is and I don’t really care.  I like this theme better than the other options I’ve seen, but I’m not terribly attached to it.  If anyone out there has the motivation to do something better, send me some files.  I’ll buy you lunch or something.