And now for something completely Pythonic...

New Themes in Sphinx 1.0

written by Georg, on Saturday, January 9, 2010 9:42.

Since theming support was introduced in Sphinx 0.6, I’ve seen a few good themes, which I’ve subsequently added to the core and which will be part of Sphinx 1.0.

These themes are:

scrolls, designed by Armin Ronacher, used for Jinja:


agogo, designed by Andi Albrecht, used for sqlparse:


nature, designed by Martin Mahner, used for pip:


haiku, originally designed for the Haiku OS user guide:


My thanks go to all theme designers. I hope that this new selection will lead to increased diversity in Sphinx documentation, and to the development of many new themes in the future!
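To try one of these themes in a project, the theme name goes into the project’s conf.py. A minimal sketch, assuming a Sphinx version that ships the theme (the choice of nature here is arbitrary):

```python
# conf.py (excerpt)
# Name of the built-in HTML theme to use; "scrolls", "agogo" and "haiku"
# would be selected the same way.
html_theme = "nature"
```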

Mercurial codesmell extension

written by Georg, on Friday, January 8, 2010 13:42.

I know it’s bad behavior, but I’m simply too lazy to do a hg diff before commit, so quite often I committed debugging statements like import pdb; pdb.set_trace() or print foo, left in some module by accident.

Thankfully I’m using Mercurial for most projects now, so it was easy (and fun) to hack up a little extension to “fix” this.

The extension is called hgcodesmell and can be found, as always, on BitBucket. It currently asks you for confirmation when it recognizes any of these smelly changes:

  • Debugging helpers left in Python code:
    • print statements
    • pdb.set_trace()
    • 1/0
  • vim :q and similar ex commands leaking into the source
  • Windows newlines (on non-Windows platforms)

The patterns to recognize are in a simple dictionary mapping from file name glob patterns to a list of (regex, reason) pairs that are checked against all added lines in files that match the respective pattern:

import re

# smelly patterns are tuples (regex, reason)
print_stmt = (re.compile(r'^\+\s*print\b'), 'print statement')
zero_div = (re.compile(r'^\+\s*1/0'), 'zero division error')
set_trace = (re.compile(r'\bpdb\.set_trace\(\)'), 'set_trace')
vim_cmd = (re.compile(r':(w|wq|q|x)$', re.M), 'vim exit command')

# the master dict maps glob patterns to a list of smelly patterns
SMELLY_STUFF = {
    '*.py': [print_stmt, zero_div, set_trace],
    '*': [vim_cmd],
}

It has since saved me quite a few hg rollbacks or, worse, extra commits to fix such stuff, and I think you might find it useful too. Get it from here, and please send me any new patterns that might be useful for others!
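For illustration, here is a rough sketch of how such a dictionary could be applied to the added lines of a diff. It is not the extension’s actual code: the find_smells helper and its arguments are made up for this example, and the snippet is written so that it also runs on Python 3.

```python
import fnmatch
import re

# smelly patterns are tuples (regex, reason), as in the post
print_stmt = (re.compile(r'^\+\s*print\b'), 'print statement')
zero_div = (re.compile(r'^\+\s*1/0'), 'zero division error')
set_trace = (re.compile(r'\bpdb\.set_trace\(\)'), 'set_trace')
vim_cmd = (re.compile(r':(w|wq|q|x)$', re.M), 'vim exit command')

SMELLY_STUFF = {
    '*.py': [print_stmt, zero_div, set_trace],
    '*': [vim_cmd],
}

def find_smells(filename, diff_lines, patterns=SMELLY_STUFF):
    """Return (line, reason) pairs for added diff lines that look smelly.

    Hypothetical helper: filename is matched against each glob, and only
    lines starting with '+' (but not the '+++' file header) are checked.
    """
    hits = []
    for glob, checks in patterns.items():
        if not fnmatch.fnmatch(filename, glob):
            continue
        for line in diff_lines:
            if not line.startswith('+') or line.startswith('+++'):
                continue
            for regex, reason in checks:
                if regex.search(line):
                    hits.append((line, reason))
    return hits

smells = find_smells('mod.py', [
    '+    print foo',
    ' print bar',                      # context line, not an addition
    '+import pdb; pdb.set_trace()',
    '+x = 1',
])
```

Here smells ends up with one entry for the added print statement and one for the set_trace() call; the unchanged context line is ignored.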

Status of Sphinx

written by Georg, on Thursday, January 7, 2010 21:32.

Reading several reactions to Tarek’s New Year meme suggestion, I saw Sphinx in the first answer of quite a few of them, and that made me very happy. Thanks to you all!

As a result, I feel I should give a short summary of what is up next for Sphinx.

Although it may have seemed that way before Christmas, unlike Armin I am not going to switch away from my Python development (I’ve also volunteered for the post of release manager for Python 3.2.x). My work has been slowed quite a bit by the diploma thesis, though, so progress is not always as fast as I had hoped.

These are the next milestones:

  • A bugfix release 0.6.4 (with a compatibility fix for the recently released Pygments 1.2)
  • A 1.0 beta release containing the domains work
  • Bringing the sphinx-web Summer of Code branch up to date and including it at some point

I’ll also write a short post about other news in 1.0 soon.

New in Sphinx 1.0: Domains

written by Georg, on Saturday, September 12, 2009 21:21.

The time I’m most productive working on my projects is when I should really be doing other important stuff. There’s no better incentive than “at least do something productive” in that case…

So I’ll write a bit about an exciting new feature of Sphinx 1.0, the upcoming release (no, don’t hold your breath).

As you may know, Sphinx started out as a “proprietary” tool used exclusively for building the Python documentation after the switch to reStructuredText for markup. When I decided to make it available for public use, I had to remove quite a lot of Python-specific stuff (and I keep finding small such items even today). But still, Sphinx is very centered on documentation of Python code: Its most fundamental directives, like the class directive, document a Python object. The naming and argument parsing is Python-specific, there is a module index of all Python modules you documented, you can link to other Python objects via intersphinx, etc. But there was already support for documenting C stuff (since the Python/C API was also part of the original documentation), and I know people also document C++. Even the Sphinx docs themselves abuse the Python object markup for HTML template elements.

Exactly that class directive was what triggered a step away from the Python-centric viewpoint: in my naive happiness about how neat .. class:: Foo looks, I overlooked docutils’ own directive of that name. Later I tried to correct this by exporting the latter as cssclass, which doesn’t do it justice though. The thread on the mailing list brought up the possibility to “namespace” directives, and that was the starting point. Finally, at EuroPython, Armin and I came up with the concept of “domains”.

A domain is a collection of markup to describe and link to “objects” belonging together, e.g. elements of a programming language. Directive and role names are of the form domain:name.

Of course, there is a way to set a “default domain” so that you can continue writing .. function:: foo() instead of the more verbose .. py:function:: foo(). (The name “domain” is awkward, but “language” is already taken for “the language the docs are written in” and anyway doesn’t cover the full possible range of domain usage. Other terms like “group” or “context” don’t feel better.)
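As a toy illustration of the naming scheme (this is not Sphinx code; split_name and its default_domain parameter are invented for the example), resolving a directive or role name against a default domain could look like this:

```python
def split_name(name, default_domain="py"):
    """Split 'domain:name' into (domain, name).

    Names without a 'domain:' prefix fall back to default_domain.
    """
    domain, sep, short_name = name.partition(":")
    if not sep:
        # No prefix given, e.g. plain "function"
        return default_domain, name
    return domain, short_name

explicit = split_name("py:function")
implicit = split_name("function")
```

Both calls yield the same pair, which is the point of a default domain: the short spelling keeps working while the prefixed one stays unambiguous.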

Anyway, when domain support is fully implemented, and assuming someone writes a JavaScript domain extension, you will be able to document a JavaScript project (or a mixed project) with Sphinx just as comfortably as you document a Python project right now. Domain-specific objects and rules are used for

  • obviously, directives to document objects
  • resolving references to these objects via roles
  • the search function (it currently lists matching Python objects first)
  • the inventory for intersphinx
  • the global module index (I’m still unsure how to do that, though)

Extensibility will allow adding directives/roles to existing domains, subclassing existing domains to change behavior, and adding completely new domains.

Extension authors will loathe me for this, because I not only broke lots of (internal) interfaces, but also took the opportunity to refactor a few other spots that always bugged me. I do however offer help porting extensions to anyone, once the release is out. I really think this step was necessary for Sphinx to look ahead into a bright future :-)

Please, comment or (preferably) write to sphinx-dev with any wishes, comments or naming suggestions you have. Ah yes, the code is at a BitBucket branch.

Interpolation surprise

written by Georg, on Saturday, September 12, 2009 20:08.

Every moderately proficient Python programmer probably knows that these two snippets are equivalent:

"%s" % (obj,)
"%s" % str(obj)

(The 1-tuple in the first case is to make this equivalent even for tuples.)
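A quick check of why the 1-tuple matters (a small sketch; the variable names are only for illustration):

```python
point = (1, 2)

# Wrapping the tuple in a 1-tuple formats it as a single value ...
wrapped = "%s" % (point,)
assert wrapped == "%s" % str(point)

# ... while passing it bare makes % treat it as two arguments for one %s.
try:
    "%s" % point
except TypeError:
    bare_failed = True
else:
    bare_failed = False
```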

Now, what about this:

"%s %s" % (a, b)
"%s %s" % (str(a), str(b))

Surely, these are equivalent as well?

It turns out that they aren’t! Given this code:

class Surprise(object):
    def __str__(self):
        return "[str]"
    def __unicode__(self):
        return u"[unicode]"

surprise = Surprise()

print "%s %s" % ("foo", surprise)
print "%s %s" % (u"foo", surprise)

I think it should by now be apparent what the surprise is :) The first line prints foo [str], the second one foo [unicode].

In short, as soon as a Unicode item has been interpolated into a bytestring (which makes the result a Unicode string), further items automatically use Unicode conversion, given by __unicode__(), if available. This came up recently on the docutils mailing list, where a test failure was triggered because Python 2.6’s IOError has a buggy __unicode__() implementation.

Blog changeover notice

written by Georg, on Friday, June 26, 2009 20:54.

This is just a short notice that I'll be moving this blog in the next few days to a new home at pythonic.pocoo.org, finally using the pocoo-grown Zine as the blogging engine.

Finding objects' names

written by Georg, on Saturday, May 30, 2009 11:34.

Every now and then, Python newbies get into a situation where they think they need to know the “name” of an object, i.e. the name it is bound to. The canonical answer is that there is no such thing as a one-to-one mapping between objects and names; an object doesn’t need a name to be alive, only a reference to it, and a name can refer to many objects, depending on which namespace you look at.

Still, Python’s introspection capabilities are powerful enough that it is possible to find out what names an object is bound to, which this snippet shows:

import gc, sys

def find_names(obj):
    frame = sys._getframe()
    for frame in iter(lambda: frame.f_back, None):
        frame.f_locals
    result = []
    for referrer in gc.get_referrers(obj):
        if isinstance(referrer, dict):
            for k, v in referrer.iteritems():
                if v is obj:
                    result.append(k)
    return result

foo = []

def demo():
    bar = foo
    print find_names(bar)

demo()

This prints ['foo', 'bar'].

The second part of the find_names function is straightforward once you know about the gc module; Python’s GC implementation makes it possible to query for all objects that refer (own a reference to) another object. This can be used to find all namespaces (which are just Python dictionaries) that reference our given object.
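The dictionary scan on its own can be sketched as follows, written here for Python 3 (items() instead of iteritems()). The names find_dict_names, target and namespace are made up for the example, and because the frame walk is left out, only names stored in real dictionaries are reported:

```python
import gc

def find_dict_names(obj):
    """Return the keys under which obj is stored in any GC-tracked dict."""
    names = []
    for referrer in gc.get_referrers(obj):
        if isinstance(referrer, dict):
            for key, value in referrer.items():
                if value is obj:
                    names.append(key)
    return names

target = []
# An ordinary dict standing in for a namespace that holds two names
namespace = {"foo": target, "bar": target}
found = find_dict_names(target)
```

Besides "foo" and "bar", found may contain further keys (such as "target" itself when the snippet runs at module level), since every dictionary that refers to the object is scanned.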

The first part is trickier: if you leave out the first three lines of the function, you will only get ['foo'] as the script’s output. It seems that there is no dictionary with “bar” referring to our object. What happens there?

Well, since the names of all locals are known at compile-time, Python “optimizes” function locals, putting them into an array instead of a dictionary, which speeds up access tremendously. But there is a way to get at the locals in a dictionary, namely the locals() function. There must be some way to get Python to create it for us! This is best done by accessing the frame object’s f_locals attribute, which creates and caches this dictionary; if the whole chain of currently executing frames is traversed, as the first for loop does, this creates dict references to all locals.
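A small sketch of that attribute in isolation (snapshot_locals is a made-up name, and sys._getframe() is a CPython implementation detail): asking a frame for f_locals yields a name-to-value mapping of the function’s locals, which is copied into a plain dict here.

```python
import sys

def snapshot_locals():
    bar = [1, 2, 3]
    frame = sys._getframe()        # the currently executing frame
    return dict(frame.f_locals)    # copy of the name -> value mapping

local_names = snapshot_locals()
```

The returned dict contains an entry for bar, even though inside the function bar lives in a fast local slot rather than in a dictionary.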

Not that this function is useful for anything ;-)

Expert Python Programming - a review

written by Georg, on Thursday, March 19, 2009 19:02.

Like many others, I've been asked to review "Expert Python Programming" by Tarek Ziadé, in exchange for a free copy. I've done so gladly, since I had already proofread the chapter devoted to documentation and especially Sphinx, before the release, and the overall goals and style of the book looked very nice to me.

The author was obviously very qualified to write the book, being a long-time Python developer and nowadays regular contributor to the Python project -- he took over Distutils maintenance a while ago (and it needed a maintainer badly).

After reading the whole thing, I can only recommend the book to anyone who knows basic Python lore, and is eager to learn "how the wizards do it". The material covered is a whole lot, but it is presented in a clear way so that you remember the important things right away, and keep all others at the back of your head, ready to spring forward when the time is right. Best of all, it makes you enjoy coding and using the language and tools, by making the most of them.

A unique aspect of the book is that "programming" in the title doesn't stand for "coding" alone, but that the whole development process is illuminated, and you can see best practices in every step that makes a successful overall programming project, be it an open-source one or a commercial one: code, version control, documentation, testing and optimizing, deployment, ...

The part about advanced coding practices is about what I had expected. Many of those little corners of Python that make it so powerful yet enjoyable are covered: the whole story about iterators, decorators, class internals, to mention a few.

It has been said by others that the lack of 3.0 coverage is a minus for this book. However, 2.x is going to stick around for a long time, which is easy to see from the small number of open-source projects ported so far, let alone commercial ones, where money to spend on porting is certainly not an unlimited resource. So the omission is not at all detrimental, and I dare say that when you've read this book you're ready to apply the principles to 3.0-based programming easily :-) That said, a future edition covering a few of Python 3.x's many additions would be nice.

What made me especially proud is that, as said before, Sphinx gets mentioned as one possibility to write good documentation for a project. I'm sure the share of Sphinx users that were prompted to try Sphinx out by this book isn't too small. (And, since the version portrayed in the book was still 0.1.x, they will be delighted by what has been improved since then.)

So: my recommendation goes to beginners wanting to know more, as well as interested people who already are experts in other programming languages and want to know "how it's done in Python".

Imports in functions? You sure about that?

written by Georg, on Wednesday, March 4, 2009 17:43.

First, we have to be glad that Python has such a flexible import system, allowing imports everywhere in the code. For some applications, that's quite nice, such as not requiring optional modules or loading them on demand to boost performance in the common case.

However, if you combine that with threading you can get bitten: Python has an import lock, which lets only one thread at a time import a module; that's pretty essential because of the large amount of shared globals involved in importing (think sys.modules, sys.path_importer_cache and friends). And the import lock is even acquired when the requested module is already loaded (i.e. present in sys.modules).

A while ago, I wrote a server application that, on startup, starts a thread that imports modules, and one of these imports has a side-effect that takes quite a long time. (Which is not good practice but required by the environment it runs in.) And curiously, during that startup, no client could connect. What happened?

A bit of debugging showed: the server is a SocketServer.ThreadingTCPServer subclass, and its process_request method calls import threading. This was fixed easily by overriding that method.
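Such an override might look roughly like the following sketch. It is written against Python 3, where the module is called socketserver; the class name PreImportedServer is invented, and the method body is a simplified stand-in for what the mix-in does, minus the local import:

```python
import socketserver
import threading   # imported once at module level, not per request

class PreImportedServer(socketserver.ThreadingTCPServer):
    def process_request(self, request, client_address):
        """Handle each request in a new thread without a function-level import."""
        thread = threading.Thread(
            target=self.process_request_thread,
            args=(request, client_address),
        )
        thread.daemon = self.daemon_threads
        thread.start()
```

The design choice is simply to move the import to module load time, so that no import statement runs on the request path.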

After adding and changing some code, I noticed the problem was there again. And again, I found an import, this time in Queue.__init__. (It imports threading as well.) While the local import in the SocketServer.ThreadingMixIn makes some sense, this one does not, at least to me, since after all, if you import the Queue module you intend to use it, and therefore to instantiate a Queue.

My hackish workaround was to pre-create a freelist of queues and use them in my handler ;)

Fun with SCIgen

written by Georg, on Wednesday, January 28, 2009 20:58.

Probably everyone has heard of the automatic Computer Science paper generator written by three clever MIT students. Well, since I had a bit of time to spare (but was too tired for serious work) I adapted it for physics, especially solid state physics and neutron scattering.

The repository is public, and here is an example paper.

Oh yes, it's written in Perl. If I have a bit more time to spare, I'll perhaps translate it into Python. For now, I mostly adapted the content it generates, added a bit more math, stuff like that.

The amazing thing is how easy it was to adapt -- their "grammar" for papers is really easy to get into (and it doesn't have dollar signs!). Let us thank god no Java is involved, else it'd be one gigantic XML file, very handy to edit. Have a look at the rule file -- it should be clear within a few minutes how it works.

NB: if anyone doubted that English was the right choice for the academic language, that this works is a very compelling argument, isn't it?