And now for something completely Pythonic...

Sphinx, past and future

written by Georg, on Tuesday, December 25, 2012 8:00.

[This entry was written on request for the Japanese "Sphinx advent calendar". These guys are amazing! They even held a Sphinx conference this year, with 70 people present.]

Sphinx is now more than five years old, and it is probably not unreasonable to take a quick look back at the history, as well as the future.

The project started sometime in early 2007; this post is the earliest mention I could find on the Python mailing lists. At the time, the source for the Python documentation was in LaTeX, which I absolutely love for writing scientific content, but I didn't (and don't) think suitable for code documentation: it is not an obvious markup language, and it deters contributors from the documentation -- although we always stressed we'd accept any form of text.

Also, the source was not readily available, and the mapping from an HTML page (which is where most users saw errors) to a file in the documentation source was not obvious. Don't even ask about the toolchain needed to convert the source to HTML (it involved a certain 4-letter language, which is not something a Python developer relishes hacking in).

My code sought to change this, using the already somewhat established reStructuredText and its implementation "docutils" as the basis. reST is maybe not the prettiest markup language out there, but at the time (and even more today) it fulfilled several requirements:

  • lightweight markup (simple documents are readable without "disturbing" character noise)
  • easily extendable (both code-wise and markup-wise)
  • Python implementation available
  • a strong bond to Python (reST is specified as markup language in several PEPs)

With lots of "+1" for the new format (and the new "green" design, which was created by Armin), the new toolchain was accepted by python-dev, and in August I converted the whole Python docs. (By the way, that was in the middle of the initial Python 3000 work, which is another fascinating story...) Some features were still missing, and some took a long time to finally get implemented, such as chapter numbering.

Now that Python had a nice shiny new documentation, a lot of people asked "I CAN HAZ?". I had not considered that, mostly because the "old LaTeX system" was also available for the public, and nobody seemed to be using it. (In fact, thinking back, I don't recall what people actually were using. If you still remember, let me know!)

But with a system that was actually kind of Pythonic and made certain things easy, it seems I had hit a sweet spot. Quickly, the project named Sphinx was born. Today I wish I had found a better name that wasn't already taken; at the time, it was a play on the "Pyramid" system used for the python.org website. Apologies to CMU Sphinx and Sphinx search, both of which I've since used and which are great projects. (Amusingly, the latter also uses an "eye of Horus" logo. No, I didn't know about this then.)

But Sphinx it was, and it became popular so quickly I was completely surprised. You can look at the history of the "who uses it" page; the best thing was that a couple of big projects like Django, numpy and matplotlib jumped on the train quite fast.

Since the codebase was completely adapted to the needs of Python, it took quite a while at first to remove all specifics and hardcoded strings. But once that was done, I could advance quickly to implement features; most notably one feature that Python doesn't use, but most others do: autodoc. autodoc is a big deal since it represents what I consider a near-perfect match between automatically generated and hand-written documentation. Auto-generated things are usually ugly beasts that you can only make sense of if you already know a lot about the software you're reading about. Tutorials usually have no place there, since you wouldn't want to put them into your source code files. Hand-writing documentation is tedious and a mountainous job, but usually leads to docs that are easy to read and understand, and can include more prose than you're comfortable putting into your source files. That's why Python does it that way.

autodoc combines both in a fashion that I think makes sense: you hand-write the overall structure of your docs, write prose as necessary, and order the description of API items as it makes sense from a logical point of view, not an implementation point of view. Then you include documentation of the API items from docstrings.

Other than autodoc, lots of other features were added over the years. The highlights are, in no special order:

  • Linking between documentation sets with intersphinx
  • Including doctests and running them from Sphinx
  • HTML themes, with a number of themes now available
  • Media support, including mathematics and diagrams
  • Support for more output formats, such as Texinfo and Epub

Our milestones can be seen here. The "big" 1.0 release took place in 2010, with the addition of "domains" that extend the reach of Sphinx to languages other than Python. As can be expected, we have not taken over the whole world ... yet :)

Until recently, Sphinx has always been more or less a dictatorship. Not that there weren't lots of contributions! In 2010, Sphinx even had several Summer of Code students, working on features as diverse as Python 3 porting, internationalization and the web application interface. But in the end, I was overseeing all fixes and pull requests. And with my graduate studies beginning, I found myself with less and less time to master that mountain of work.

Therefore, this year we formed a team of developers working on the future of Sphinx. So far, the push privileges have been given to

  • Ervin Hegedüs
  • Jon Waltman
  • Kevin Hunter
  • La Min Ko
  • Nikolaj van Omme
  • Robert Lehmann
  • Roland Meister
  • Takayuki Shimizukawa

I'm very grateful to all of them. There is also a new sphinx-dev mailing list for development coordination, while the support and user list has moved to sphinx-users. Since the formation of the development team, a lot has moved and the 1.2 release with mostly bug fixes and a few new features is very near.

How do we go on? I consider the feature set quite complete, but there are always some things missing. Well, these are my and the team's thoughts about the future development:

  • Our version of fully automatic doc generation, sphinx-apidoc, is not very smart yet, since I've never been a fan of complete automation. But I recognize that others are interested in that.
  • Autodoc should become more debuggable. For example, the intermediate generated reST should become accessible.
  • The internationalization feature is not widely used yet, mostly because there are still some warts in the implementation. We plan to fix these, and make a good example by starting to use it for Sphinx itself, with documentation translated to as many languages as possible.
  • The web application support should become easier to use, to e.g. have easy inclusion of comments and suggestions from users.
  • The docutils have grown support for many things pioneered in Sphinx, such as mathematics and code highlighting. I would like to merge these features, which at the moment use incompatible markup.

That's it for now. Merry Christmas!

New Themes in Sphinx 1.0

written by Georg, on Saturday, January 9, 2010 9:42.

Since theming support was introduced in Sphinx 0.6, I’ve seen a few good ones that I’ve subsequently added to the core, and that will be part of Sphinx 1.0.

These themes are:

scrolls, designed by Armin Ronacher, used for Jinja:


agogo, designed by Andi Albrecht, used for sqlparse:


nature, designed by Martin Mahner, used for pip:


haiku, originally designed for the Haiku OS user guide:


My thanks go to all theme designers. I hope that this new selection will lead to increased diversity in Sphinx documentation, and to the development of many new themes in the future!

Mercurial codesmell extension

written by Georg, on Friday, January 8, 2010 13:42.

I know it’s bad behavior, but I’m simply too lazy to do a hg diff before commit, so quite often I committed debugging statements like import pdb; pdb.set_trace() or print foo, left in some module by accident.

Thankfully I’m using Mercurial for most projects now, so it was easy (and fun) to hack up a little extension to “fix” this.

The extension is called hgcodesmell and can be found, as always, on BitBucket. It currently asks you for confirmation when it recognizes any of these smelly changes:

  • Debugging helpers left in Python code:
    • print statements
    • pdb.set_trace()
    • 1/0
  • vim :q and similar ex commands leaking into the source
  • Windows newlines (on non-Windows platforms)

The patterns to recognize are in a simple dictionary mapping from file name glob patterns to a list of (regex, reason) pairs that are checked against all added lines in files that match the respective pattern:

import re

# smelly patterns are tuples (regex, reason)
print_stmt = (re.compile(r'^\+\s*print\b'), 'print statement')
zero_div = (re.compile(r'^\+\s*1/0'), 'zero division error')
set_trace = (re.compile(r'\bpdb\.set_trace\(\)'), 'set_trace')
vim_cmd = (re.compile(r':(w|wq|q|x)$', re.M), 'vim exit command')

# the master dict maps glob patterns to a list of smelly patterns
SMELLY_STUFF = {
    '*.py': [print_stmt, zero_div, set_trace],
    '*': [vim_cmd],
}
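To make the matching step concrete, here is a minimal sketch of how such a table could be applied to the added lines of a diff. It is not the extension's actual code: the function name find_smells and the use of fnmatch for the glob matching are assumptions made for illustration, and the lines passed in are assumed to still carry the leading "+" of a unified diff.

```python
import fnmatch
import re

# Same table as above, repeated so that the sketch is self-contained.
print_stmt = (re.compile(r'^\+\s*print\b'), 'print statement')
zero_div = (re.compile(r'^\+\s*1/0'), 'zero division error')
set_trace = (re.compile(r'\bpdb\.set_trace\(\)'), 'set_trace')
vim_cmd = (re.compile(r':(w|wq|q|x)$', re.M), 'vim exit command')

SMELLY_STUFF = {
    '*.py': [print_stmt, zero_div, set_trace],
    '*': [vim_cmd],
}


def find_smells(filename, added_lines):
    """Return (line, reason) pairs for every added line that looks smelly.

    Hypothetical helper: every glob that matches the file name contributes
    its patterns, and each pattern is searched in each added line.
    """
    hits = []
    for glob, patterns in SMELLY_STUFF.items():
        if not fnmatch.fnmatch(filename, glob):
            continue
        for line in added_lines:
            for regex, reason in patterns:
                if regex.search(line):
                    hits.append((line, reason))
    return hits


if __name__ == '__main__':
    diff_lines = ['+    print foo', '+x = 1', '+:wq']
    for line, reason in find_smells('module.py', diff_lines):
        print('%s: %r' % (reason, line))
```

A commit hook built on this would only need to ask for confirmation whenever the returned list is not empty.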

It has since saved me quite a few hg rollbacks or, worse, commits to fix such stuff, and I think you might find it useful too – get it from here, and please send me any new patterns that are useful for others too!

Status of Sphinx

written by Georg, on Thursday, January 7, 2010 21:32.

Reading several reactions to Tarek’s New Year meme suggestion, I saw Sphinx in the first answer of quite a few of them, and that made me very happy. Thanks to you all!

As a result, I feel I should give a short summary of what is up next for Sphinx.

Although it may have seemed that way before Christmas, I am, unlike Armin, not going to switch away from my Python development (I’ve also volunteered for the post of release manager for Python 3.2.x). My diploma thesis has slowed it down quite a bit, though, so progress is not always as fast as I had hoped.

These are the next milestones:

  • A bugfix release 0.6.4 (with a compatibility fix for the recently released Pygments 1.2)
  • A 1.0 beta release containing the domains work
  • Bringing the sphinx-web Summer of Code branch up to date and including it at some point

I’ll also write a short post about other news in 1.0 soon.

New in Sphinx 1.0: Domains

written by Georg, on Saturday, September 12, 2009 21:21.

The time I’m most productive working on my projects is when I should really be doing other important stuff. There’s no better incentive than “at least do something productive” in that case…

So I’ll write a bit about an exciting new feature of Sphinx 1.0, the upcoming release (no, don’t hold your breath).

As you may know, Sphinx started out as a “proprietary” tool used exclusively for building the Python documentation after the switch to reStructuredText for markup. When I decided to make it available for public use, I had to remove quite a lot of Python-specific stuff (and I keep finding small such items even today). But still, Sphinx is very centered on documentation of Python code: Its most fundamental directives, like the class directive, document a Python object. The naming and argument parsing is Python-specific, there is a module index of all Python modules you documented, you can link to other Python objects via intersphinx, etc. But there was already support for documenting C stuff (since the Python/C API was also part of the original documentation), and I know people also document C++. Even the Sphinx docs themselves abuse the Python object markup for HTML template elements.

Exactly that class directive was what triggered a step away from the Python-centric viewpoint: in my naive happiness about how neat .. class:: Foo looks, I overlooked docutils’ own directive of that name. Later I tried to correct this by exporting the latter as cssclass, which doesn’t do it justice though. The thread on the mailing list brought up the possibility to “namespace” directives, and that was the starting point. Finally, at EuroPython, Armin and I came up with the concept of “domains”.

A domain is a collection of markup to describe and link to “objects” belonging together, e.g. elements of a programming language. Directive and role names are of the form domain:name.

Of course, there is a way to set a “default domain” so that you can continue writing .. function:: foo() instead of the more verbose .. py:function:: foo(). (The name “domain” is awkward, but “language” is already taken for “the language the docs are written in” and anyway doesn’t cover the full possible range of domain usage. Other terms like “group” or “context” don’t feel better.)
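As a toy model of the naming scheme (and nothing more: this is not how Sphinx implements it, and the function name and the 'py' default are assumptions for illustration), resolving a directive or role name against a default domain could look like this:

```python
def split_domain_name(name, default_domain='py'):
    """Split 'domain:name' into (domain, name).

    A name without a colon is looked up in the default domain, which is
    what lets '.. function:: foo()' stand for '.. py:function:: foo()'.
    """
    domain, sep, short_name = name.partition(':')
    if not sep:
        # No explicit domain given: fall back to the default one.
        return default_domain, name
    return domain, short_name


if __name__ == '__main__':
    for name in ('py:function', 'function', 'js:function'):
        print(name, '->', split_domain_name(name))
```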

Anyway, when domain support is fully implemented, and assuming someone writes a JavaScript domain extension, you will be able to document a JavaScript project (or a mixed project) with Sphinx just as comfortably as you document a Python project right now. Domain-specific objects and rules are used for

  • obviously, directives to document objects
  • resolving references to these objects via roles
  • the search function (it currently lists matching Python objects first)
  • the inventory for intersphinx
  • the global module index (I’m still unsure how to do that, though)

Extensibility will allow adding directives/roles to existing domains, subclassing existing domains to change behavior, and adding completely new domains.

Extension authors will loathe me for this, because I not only broke lots of (internal) interfaces, but also took the opportunity to refactor a few other spots that always bugged me. I do however offer help porting extensions to anyone, once the release is out. I really think this step was necessary for Sphinx to look ahead into a bright future :-)

Please, comment or (preferably) write to sphinx-dev with any wishes, comments or naming suggestions you have. Ah yes, the code is at a BitBucket branch.

Interpolation surprise

written by Georg, on Saturday, September 12, 2009 20:08.

Every moderately proficient Python programmer probably knows that these two snippets are equivalent:

"%s" % (obj,)
"%s" % str(obj)

(The 1-tuple in the first case is to make this equivalent even for tuples.)

Now, what about this:

"%s %s" % (a, b)
"%s %s" % (str(a), str(b))

Surely, these are equivalent as well?

It turns out that they aren’t! Given this code:

class Surprise(object):
    def __str__(self):
        return "[str]"
    def __unicode__(self):
        return u"[unicode]"

surprise = Surprise()

print "%s %s" % ("foo", surprise)
print "%s %s" % (u"foo", surprise)

I think it should by now be apparent what the surprise is :)

In short, as soon as a Unicode item has been interpolated into a bytestring (which makes the result a Unicode string), further items automatically use Unicode conversion, given by __unicode__(), if available. So the first line prints foo [str], while the second prints foo [unicode]. This came up recently on the docutils mailing list, where a test failure was triggered because Python 2.6’s IOError has a buggy __unicode__() implementation.

Blog changeover notice

written by Georg, on Friday, June 26, 2009 20:54.

This is just a short notice that I'll be moving this blog in the next few days to a new home at pythonic.pocoo.org, finally using the pocoo-grown Zine as the blogging engine.

Finding objects' names

written by Georg, on Saturday, May 30, 2009 11:34.

Every now and then, Python newbies get into a situation where they think they need to know the “name” of an object, i.e. the name it is bound to. The canonical answer is that there is no such thing as a one-to-one mapping of objects and names; an object doesn’t need a name to be alive, only a reference to it, and a name can refer to many objects, depending on which namespace you look at.

Still, Python’s introspection capabilities are powerful enough that it is possible to find out what names an object is bound to, which this snippet shows:

import gc, sys

def find_names(obj):
    frame = sys._getframe()
    for frame in iter(lambda: frame.f_back, None):
        frame.f_locals
    result = []
    for referrer in gc.get_referrers(obj):
        if isinstance(referrer, dict):
            for k, v in referrer.iteritems():
                if v is obj:
                    result.append(k)
    return result

foo = []

def demo():
    bar = foo
    print find_names(bar)

demo()

This prints ['foo', 'bar'].

The second part of the find_names function is straightforward once you know about the gc module; Python’s GC implementation makes it possible to query for all objects that refer (own a reference to) another object. This can be used to find all namespaces (which are just Python dictionaries) that reference our given object.
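The gc half on its own can be sketched in Python 3 syntax as follows. The name find_dict_names is made up for this sketch, and the frame-walking half is deliberately left out, since what f_locals returns depends on interpreter internals that differ between Python versions.

```python
import gc


def find_dict_names(obj):
    """Return every key under which obj is stored in a gc-tracked dict."""
    names = []
    for referrer in gc.get_referrers(obj):
        # Namespaces are plain dictionaries, so only dicts are of interest.
        if isinstance(referrer, dict):
            # Copy the items so the dict cannot change while we iterate.
            for key, value in list(referrer.items()):
                if value is obj:
                    names.append(key)
    return names


foo = []

if __name__ == '__main__':
    # At module level, the list is referenced by the module's globals dict.
    print(find_dict_names(foo))
```

Note that this finds the object in any dictionary that holds it, not only in namespaces.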

The first part is trickier: if you leave out the first three lines of the function, you will only get ['foo'] as the script’s output. It seems that there is no dictionary with “bar” referring to our object. What happens there?

Well, since the names of all locals are known at compile-time, Python “optimizes” function locals, putting them into an array instead of a dictionary, which speeds up access tremendously. But there is a way to get at the locals in a dictionary, namely the locals() function. There must be some way to get Python to create it for us! This is best done by accessing the frame object’s f_locals attribute, which creates and caches this dictionary. Traversing the whole chain of currently executing frames, as the first for loop does, therefore creates dict references to all locals.

Not that this function is useful for anything ;-)

Expert Python Programming - a review

written by Georg, on Thursday, March 19, 2009 19:02.

Like many others, I've been asked to review "Expert Python Programming" by Tarek Ziadé, in exchange for a free copy. I've done so gladly, since I had already proofread the chapter devoted to documentation and especially Sphinx, before the release, and the overall goals and style of the book looked very nice to me.

The author was obviously very qualified to write the book, being a long-time Python developer and nowadays regular contributor to the Python project -- he took over Distutils maintenance a while ago (and it needed a maintainer badly).

After reading the whole thing, I can only recommend the book to anyone who knows basic Python lore, and is eager to learn "how the wizards do it". The material covered is a whole lot, but it is presented in a clear way so that you remember the important things right away, and keep all others at the back of your head, ready to spring forward when the time is right. Best of all, it makes you enjoy coding and using the language and tools, by making the most of them.

A unique aspect of the book is that "programming" in the title doesn't stand for "coding" alone, but that the whole development process is illuminated, and you can see best practices in every step that makes a successful overall programming project, be it an open-source one or a commercial one: code, version control, documentation, testing and optimizing, deployment, ...

The part about advanced coding practices is about what I had expected. Many of those little corners of Python that make it so powerful yet enjoyable are covered: the whole story about iterators, decorators, class internals, to mention a few.

It has been said by others that the lack of 3.0 coverage is a minus for this book. However, since 2.x is going to stick around for a long time, which is easy to see by the number of ported open-source projects, let alone commercial ones where money spent on porting is certainly not an unlimited resource, it is not at all detrimental, and I dare say that when you've read this book you're ready to apply the principles to 3.0-based programming easily :-) That said, a future edition with a few of Python 3.x's many additions would be nice.

What made me especially proud is that, as said before, Sphinx gets mentioned as one possibility to write good documentation for a project. I'm sure the share of Sphinx users that were prompted to try Sphinx out by this book isn't too small. (And, since the version portrayed in the book was still 0.1.x, they will be delighted by what has been improved since then.)

So: my recommendation goes to beginners wanting to know more, as well as interested people who already are experts in other programming languages and want to know "how it's done in Python".

Imports in functions? You sure about that?

written by Georg, on Wednesday, March 4, 2009 17:43.

First, we have to be glad that Python has such a flexible import system, allowing imports everywhere in the code. For some applications, that's quite nice, such as not requiring optional modules or loading them on demand to boost performance in the common case.

However, if you combine that with threading you can get bitten: Python has an import lock, which lets only one thread at a time import a module; that's pretty essential because of the large amount of shared globals involved in importing (think sys.modules, sys.path_importer_cache and friends). And the import lock is even acquired when the requested module is already loaded (i.e. present in sys.modules).

A while ago, I wrote a server application that, on startup, starts a thread that imports modules, and one of these imports has a side-effect that takes quite a long time. (Which is not good practice but required by the environment it runs in.) And curiously, during that startup, no client could connect. What happened?

A bit of debugging showed: the server is a SocketServer.ThreadingTCPServer subclass, and its process_request method calls import threading. This was fixed easily by overwriting that method.
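A sketch of what such an override could look like, written against Python 3's socketserver module (the module was called SocketServer in Python 2). The class name is made up, and this is an assumption about the shape of the fix rather than the original code: the point is only that threading is imported once at module level, so that process_request itself never executes an import statement.

```python
import socketserver
import threading  # imported once, at module load time


class PreimportingThreadingServer(socketserver.ThreadingTCPServer):
    """Hypothetical server whose request dispatch performs no imports."""

    def process_request(self, request, client_address):
        # Simplified on purpose: start one thread per request, using the
        # module-level threading import and no extra bookkeeping.
        thread = threading.Thread(
            target=self.process_request_thread,
            args=(request, client_address),
        )
        thread.daemon = self.daemon_threads
        thread.start()
```

With this in place, a long-running import in another thread can no longer stall the accept loop through this method, although any other function-level import on the request path would still have to be found and removed.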

After adding and changing some code, I noticed the problem was there again. And again, I found an import, this time in Queue.__init__. (It imports threading as well.) While the local import in the SocketServer.ThreadingMixIn makes some sense, this one does not, at least to me, since after all, if you import the Queue module you intend to use it, and therefore to instantiate a Queue.

My hackish workaround was to pre-create a freelist of queues and use them in my handler ;)