Sphinx, past and future
Sphinx is now more than five years old, so it seems reasonable to take a quick look back at its history and a look ahead at its future.
The project started sometime in early 2007; this post is the earliest mention I could find on the Python mailing lists. At the time, the source for the Python documentation was in LaTeX, which I absolutely love for writing scientific content but didn't (and don't) think suitable for code documentation: it is not an obvious markup language, and it deterred contributors from working on the documentation -- although we always stressed we'd accept any form of text.
Also, the source was not readily available, and the mapping from an HTML page (which is where most users saw errors) to a file in the documentation source was not obvious. Don't even ask about the toolchain needed to convert the source to HTML (it involved a certain 4-letter language, which is not something a Python developer relishes hacking in).
My code sought to change this, using the already somewhat established reStructuredText and its implementation "docutils" as the basis. reST is maybe not the prettiest markup language out there, but at the time (and even more today) it fulfilled several requirements:
- lightweight markup (simple documents are readable without "disturbing" character noise)
- easily extendable (both code-wise and markup-wise)
- Python implementation available
- a strong bond to Python (reST is specified as the markup language in several PEPs)
With lots of "+1" for the new format (and the new "green" design, which was created by Armin), the new toolchain was accepted by python-dev, and in August I converted the whole Python docs. (By the way, that was in the middle of the initial Python 3000 work, which is another fascinating story...) Some features were still missing, and some took a long time to finally get implemented, such as chapter numbering.
Now that Python had nice, shiny new documentation, a lot of people asked "I CAN HAZ?". I had not considered that, mostly because the "old LaTeX system" was also available to the public, and nobody seemed to be using it. (In fact, thinking back, I don't recall what people actually were using. If you still remember, let me know!)
But with a system that was actually kind of Pythonic and made certain things easy, it seems I had hit a sweet spot. Quickly, the project named Sphinx was born. Today I wish I had found a better name that wasn't already taken; at the time, it was a play on the "Pyramid" system used for the python.org website. Apologies to CMU Sphinx and Sphinx search, both of which I've since used and which are great projects. (Amusingly, the latter also uses an "eye of Horus" logo. No, I didn't know about this then.)
But Sphinx it was, and it became popular so quickly I was completely surprised. You can look at the history of the "who uses it" page; the best thing was that a couple of big projects like Django, numpy and matplotlib jumped on the train quite fast.
Since the codebase was completely adapted to the needs of Python, it took quite a while at first to remove all specifics and hardcoded strings. But once that was done, I could advance quickly to implement features; most notably one feature that Python doesn't use, but most others do: autodoc. autodoc is a big deal since it represents what I consider a near-perfect match between automatically generated and hand-written documentation. Auto-generated things are usually ugly beasts that you can only make sense of if you already know a lot about the software you're reading about. Tutorials usually have no place there, since you wouldn't want to put them into your source code files. Hand-writing documentation is a tedious, mountainous job, but it usually leads to docs that are easy to read and understand, and that can include more prose than you're comfortable putting into your source files. That's why Python does it that way.
autodoc combines both in a fashion that I think makes sense: you hand-write the overall structure of your docs, write prose as necessary, and order the description of API items as it makes sense from a logical point of view, not an implementation point of view. Then you include documentation of the API items from docstrings.
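To make the idea concrete, here is a minimal sketch in plain Python. It is not how autodoc is implemented; in a real project you would use autodoc's directives (such as `autofunction`) in the reST source. The functions `area`, `perimeter` and `render_page` are hypothetical names made up for this illustration. The point is only that the author picks the prose and the order, while the item descriptions come from the docstrings.

```python
import inspect


def area(width, height):
    """Return the area of a rectangle with the given width and height."""
    return width * height


def perimeter(width, height):
    """Return the perimeter of a rectangle with the given sides."""
    return 2 * (width + height)


def render_page(prose, items):
    """Interleave hand-written prose with the docstrings of the listed objects."""
    lines = [prose, ""]
    for obj in items:
        # The author chooses the order; the description comes from the docstring.
        lines.append(f"{obj.__name__}{inspect.signature(obj)}")
        lines.append("    " + (inspect.getdoc(obj) or ""))
        lines.append("")
    return "\n".join(lines)


# Hand-written structure: prose first, then the API items in a logical order
# (here deliberately different from the order of definition in the source).
page = render_page(
    "Rectangles are described by a width and a height.",
    [perimeter, area],
)
print(page)
```

The order of `[perimeter, area]` is the documentation author's choice, which is the "logical point of view, not an implementation point of view" part.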
Other than autodoc, lots of other features were added over the years. The highlights are, in no special order:
- Linking between the documentation of different projects with intersphinx
- Including doctests and running them from Sphinx
- HTML themes, with a number of themes now available
- Media support, including mathematics and diagrams
- Support for more output formats, such as Texinfo and Epub
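Several of these features are switched on from a project's `conf.py`, which is itself a Python file. The following is only a sketch of what such a file might contain: the project name is made up, and which extensions and themes are available depends on the Sphinx version you have installed.

```python
# conf.py -- a hypothetical minimal Sphinx configuration (values are illustrative)

project = "example-project"  # assumed project name, not a real project

# Extensions that ship with Sphinx and correspond to features in the list above.
extensions = [
    "sphinx.ext.autodoc",      # pull API descriptions from docstrings
    "sphinx.ext.intersphinx",  # link to objects in other projects' docs
    "sphinx.ext.doctest",      # run doctest blocks found in the docs
    "sphinx.ext.mathjax",      # render mathematics in HTML output
]

# Where intersphinx looks up the object inventory of another project.
intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
}

# One of the HTML themes bundled with recent Sphinx versions.
html_theme = "alabaster"
```

Output formats are chosen at build time rather than in this file, for example with `sphinx-build -b epub` or `sphinx-build -b texinfo`.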
Our milestones can be seen here. The "big" 1.0 release took place in 2010, with the addition of "domains" that extend the reach of Sphinx to languages other than Python. As can be expected, we have not taken over the whole world ... yet :)
Until recently, Sphinx has always been more or less a dictatorship. Not that there weren't lots of contributions! In 2010, Sphinx even had several Summer of Code students, working on features as diverse as Python 3 porting, internationalization and the web application interface. But in the end, I was overseeing all fixes and pull requests. And with my graduate studies beginning, I found myself with less and less time to master that mountain of work.
Therefore, this year we formed a team of developers working on the future of Sphinx. So far, the push privileges have been given to
- Ervin Hegedüs
- Jon Waltman
- Kevin Hunter
- La Min Ko
- Nikolaj van Omme
- Robert Lehmann
- Roland Meister
- Takayuki Shimizukawa
I'm very grateful to all of them. There is also a new sphinx-dev mailing list for development coordination, while the support and user list has moved to sphinx-users. Since the formation of the development team, a lot has happened, and the 1.2 release, with mostly bug fixes and a few new features, is very near.
How do we go on? I consider the feature set quite complete, but there are always some things missing. Here are my and the team's thoughts about future development:
- Our version of fully automatic doc generation, sphinx-apidoc, is not very smart yet, since I've never been a fan of complete automation. But I recognize that others are interested in that.
- Autodoc should become more debuggable. For example, the intermediate generated reST should become accessible.
- The internationalization feature is not widely used yet, mostly because there are still some warts in the implementation. We plan to fix these, and make a good example by starting to use it for Sphinx itself, with documentation translated to as many languages as possible.
- The web application support should become easier to use, to e.g. have easy inclusion of comments and suggestions from users.
- The docutils have grown support for many things pioneered in Sphinx, such as mathematics and code highlighting. I would like to merge these features, which at the moment use incompatible markup.
That's it for now. Merry Christmas!



