In Defense of PyPI

Everyone on the Python Planet is probably already familiar with Peter Fein’s recent article about PyPI use (or lack thereof). In case you are not: what struck me most about it was the number of folks who joined the “PyPI bashing” in the comments. That inspired me to write this post “in defense of PyPI”. I would like to offer the Python community a summary of what I think are the general criticisms, along with my responses as a “sysadmin / developer type”.

First let me say this: I love PyPI! And I agree with Peter: if your package isn’t on PyPI, it “doesn’t exist”. I might not put it quite that strongly, but I would say that if you are publishing open source Python code, it is fairly important to consider uploading it to the Python Package Index.

Why?

Because Everybody Wins

Believe it or not, the general Python community is interested in seeing your code. Whether to use it for an example, or to avoid reinventing the wheel, or whatever the reason; we’d like a chance to see your code. But if you don’t publish it to PyPI, we may never get that chance!

For better or worse, PyPI is the canonical place on Earth for Python packages. It’s the CPAN of Python. I understand that not everyone is 100% comfortable with this, but that doesn’t make it any less true. If you accept that “open source is good”, and that “Python rules”, then you simply must take this next leap of faith: “PyPI is the place for Python packages”.

[waves hand]

Moving on, why else should you consider uploading your packages to PyPI?

Because It Is The “Right” Thing To Do

Another thing that struck me is the number of folks who (appear to) confuse “version control” with “distribution”. If I’m not mistaken, Launchpad, Github, and Bitbucket are primarily designed for Bazaar, Git, and Mercurial hosting respectively. These sites can host your distribution tarballs, but they certainly weren’t designed and built to do so. Rather, they were designed and built to host your source code.

In some cases, a project may wish to host its own distribution server. Whether it be for redundancy (although PyPI has begun to tackle this) or “branding” or other reasons, I would argue this is the preferred way of handling it: in addition to uploading to PyPI, not in place of it.

Why?

Because It Is Not That Hard

Ahem… we get it. The situation with easy_install is “less than ideal”. But this is something to be fixed, not avoided. If you are receiving too many support requests, may I suggest simply telling people not to use easy_install. Or, if the problem is proper packaging, learn how to test your packages before uploading them. Due to the large number of screwed up releases I’ve made, I’ve come to rely on a local PyPI and a virtualenv to test installations. Others use even simpler methods. And with tools like mkrelease, it’s easy to upload your package to multiple PyPI locations with just a single command (although leaping-tall-buildings-in-a-single-bound is not yet supported).
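For the curious, here is a rough sketch of the "test against a local PyPI" idea. It only builds the pip command that installs a package from one specific index; the package name and the index URL are hypothetical placeholders, and the helper name install_command is mine, not part of any tool. The --index-url option is pip's way of pointing at an index other than the public one.

```python
import sys


def install_command(package, index_url, python=sys.executable):
    """Build a pip command that installs `package` from `index_url`."""
    return [
        python, "-m", "pip", "install",
        "--index-url", index_url,  # use this index instead of the public PyPI
        package,
    ]


if __name__ == "__main__":
    # Both values below are hypothetical placeholders.
    cmd = install_command("my.package==1.0", "http://localhost:8080/simple")
    print(" ".join(cmd))
```

Pass the interpreter of a fresh virtualenv as python and hand the list to subprocess.run to do the actual install. Keep in mind that with --index-url any dependency missing from the local index will fail to install, which is often exactly what you want to find out before a release.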

The point is, please consider helping the community fix the problem rather than simply avoiding it. There are folks actively trying to improve the situation right now.

Let’s see, what else?

Because It Does Not Have To Be Perfect

Over the years I’ve seen various and sundry criticisms of the PyPI user interface. Fine. I have not looked into the current development process, but I assume the author/maintainers would be open to some constructive criticism and/or development assistance.

It doesn’t have to be Github-sexy to be useful. If you would like to report a bug or feature request, do it here (at least, I think that is the right place.)

Conclusion

I hope this convinces at least some folks to consider uploading their packages to PyPI. If it doesn’t, please let me know why in the comments.


  • http://dholth.pip.verisignlabs.com/ Daniel

    I liked your article, but I think the fundamental misunderstanding comes from a disconnect between programmers who have hundreds of dependencies, and those who don’t.

    Those who don’t are in the privileged position of being able to evaluate, download, update, and debug each of their four or five dependencies, writing the rest from scratch. The probability that something will break after ‘upgrade all’ is low, only an occasional nuisance. They live in an illusory world where properly written packages ensure backwards compatibility and upgrades are easy.

    Those of us who are familiar with more complex systems live in a wholly different, more chaotic world. When you are building complex systems out of many components it is simply not feasible to write or even manually download each dependency. The probability that something will break during an upgrade, either due to bugs or unexpected interactions between packages, approaches 1. In this world it is critical to pin every version, test the combination, isolate this set of software from the rest of the system and protect it from unexpected upgrades. From this random universe large, useful software is born, but you need unpleasant things like Apache Maven (for Java) or your own local copies of everything you need out of pypi.

    I keep running across this disconnect. I wish I knew a better way to make the former group believe in the latter group’s problems, but I don’t know any better way than exposure.

  • Anonymous

    Thanks for the feedback, Daniel! I’m not sure I understand though, what does “product complexity” (i.e. software comprised of many Python packages) have to do with the individual package authors either choosing to upload to PyPI or not? In the case of the Plone project, a large number of packages are routinely uploaded to PyPI (with each release) and complexity is managed “downstream” (i.e. by the “product” authors). That means that with the Plone project, you’ll see things like a versions.cfg file published with each release, that contains the correct individual package versions per “top level” release (usually one of the many Python packages, e.g. Products.CMFPlone), and so forth.
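To make the versions.cfg idea a bit more concrete, here is a minimal sketch that reads a [versions] section and turns it into exact pins. The file contents are invented for illustration (they are not taken from any real Plone release), and read_pins and as_requirements are hypothetical helper names. A real buildout file usually contains more than this one section.

```python
import configparser

# Hypothetical pins for illustration; not taken from any real release.
VERSIONS_CFG = """\
[versions]
Products.CMFPlone = 4.0.1
zope.interface = 3.6.1
"""


def read_pins(text):
    """Return {package: version} from the [versions] section of `text`."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep package names case-sensitive
    parser.read_string(text)
    return dict(parser["versions"])


def as_requirements(pins):
    """Render the pins as sorted 'name==version' strings."""
    return [f"{name}=={version}" for name, version in sorted(pins.items())]


if __name__ == "__main__":
    for line in as_requirements(read_pins(VERSIONS_CFG)):
        print(line)
```

The point is that the individual package authors only upload releases; the known-good combination lives in one small file that the "product" authors maintain.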

  • Stefan

    Just some comments that came to my mind while reading your post:

    * for me the phrase should be: if your package is not in debian/ubuntu repository, it doesn’t exist.

    * Because Everybody Wins: well, open source is good, Python rules, but the Python infrastructure does not rule in the same way as the language rules. I don’t love the python.org page, and I don’t think that the bug tracker is great.
    And the argument: ‘put it there because everyone puts it there’ is a bit circular…

    * Because It Is The “Right” Thing To Do: Launchpad is not designed to host bazaar controlled source code, but to meet the needs of project development and management. That includes source code, but also translations, bug tracking, and so on. Combined with launchpad are PPAs and they are specifically designed to package and distribute software. If I would assume that PyPI is useful, then as a meta-information center, so no code, just meta-data. With great search and rating features. Maybe something like this…

    * Because It Is Not That Hard: I will never again use easy_install or related software. Dependency handling is a very complicated task, and Debian does it very, very well. Why should I sacrifice all the comfort of automated package management just because Python and Ruby and Perl and Latex and … all have created their own mini distribution sites with their own tools? Why should I sacrifice the ease of ‘apt-get upgrade’ for a manual list of all the tools I need to call to do all this in all the languages I have on my system?
    So: it’s not hard, but also not useful if you already use a linux distribution. Rebuilding something that already exists is hardly ‘helping the community’. Maybe ‘helping the Windows or Mac community’ and I understand that this is a good goal, but it is in contrast to your first point: ‘open source is good’. If you want to help the open source community, then build debian packages…

    * Because It Does Not Have To Be Perfect: If the right tools came with PyPI, its user interface would be very uninteresting. What is the user interface of the Ubuntu universe repository? No idea, but apt-get and software-center can handle it, and those tools have great (CLI or GUI) user interfaces!

    So the conclusion: for many people PyPI might be great. Maybe because they don’t have the right tools in their OS or for other reasons. But in general I think there are better tools and better ways.

  • Anonymous

    Good comments Stefan, thank you! The most striking thing about your comments to me is the “use apt instead” argument, which I don’t fully understand. Apt (and other package managers) are used to assemble an operating system. Distutils and Setuptools are used to assemble an application. Yes, there are similarities, but I don’t see wholesale replacement of Distutils/Setuptools/Etc with Apt (or another package manager) as the answer. If it were, wouldn’t someone have done this already? I’m sure the Distutils2 folks have no interest in re-inventing the wheel…

  • Michael

    There is a very big gap between the design and implementation quality of Python and that of PyPI. In addition, I expect that most developers are adept in using Google to find stuff, and the stuff that you can find on Google includes lots and lots of open source Python code. Even if it is a module that was last updated in Python 2.4 days, it may still be of use to you in accelerating your own development projects. Google knows about close to 100% of open source Python stuff, PyPI does not.

    And since PyPI is tied in with that atrocity called “eggs”, I do not make it my first port of call when looking for code. And when I do use it, if there is a tarball or zip, then I download that through the web browser. And when I am stuck with an egg, I unzip it and move the folder into my Python library folder manually.

  • Michael

    I can understand why Linux oriented people would prefer to use apt-get to do package management. What I don’t understand is why the Python package repository does not support generating .deb and .rpm packages. Or .msi packages. PyPI is a tool whose view is too narrow at the technical level and this concept of PyPI being the be-all and end-all Python repository doesn’t make sense. There is a mismatch in what is desired of PyPI and what PyPI delivers.

  • Foo

    PyPI is great. Period.

  • Bas

    I really like PyPI, but I have to say that the search just makes me wanna cry.

  • http://dholth.pip.verisignlabs.com/ Daniel

    In the original article, Peter responds to the non-pypi-packaging author’s quote: “Yes I know, I should do this, but I hate such complex and silly technologues as easy_install and eggs and everything that transforms Python into a Java-like ugly piece of “programming-tool-for-the-dummy-masses” ;-)”

    I think he may be alluding to Apache Maven, a Java dependency management tool that keeps a local repository of pinned versions of all of your project’s dependencies.

    We are talking about a guy who thinks it’s silly to have tools that help you download and manage all the dependencies for a project. He thinks it is acceptable to manually download and “python setup.py” all of the packages he uses. His projects probably do not have more than a few dependencies. He probably does not use virtualenv to isolate his programs from each other.

    Suppose this person got a job as a Plone developer, working with a project that has more than 100 dependencies. In that case, he would at least be forced to think the complex and silly technologues [sic] were necessary. Eventually he might even be convinced they were useful. Maybe the experience would motivate him to upload his package to pypi.

    Aside: have you ever tried to convince someone of the necessity of source control?

    Basically, the challenge is to convince programmers who do not personally need their software to be on pypi (they have time to download and install all of their dependencies some other way) to do it for the good of the community.

  • Stefan

    Maybe I misunderstand the purpose of PyPI. Let me try to explain by comparing three aspects:

    * the tools (the explanations are nearly copy and paste from the corresponding web pages):
    pip install pythonpackage # pythonpackage is a package on PyPI. This installs the package and all its dependencies

    gem install rubypackage # rubypackage is a package on rubygems.org. This installs the package and all its dependencies

    apt-get install package # package is a package in the repository. This installs the package and all its dependencies

    so to me there is a similarity here.

    * the meta information:

    a package on PyPI has information about: name, version, author, author_email, maintainer, maintainer_email, url, description, long_description, download_url, classifiers, platforms, license

    a debian package has information about: name, version, author, author_email, maintainer, maintainer_email, url, description, long_description, architectures, sections, license, and so on (there is a lot more, I know)

    so, again, there seems to be a lot of similarity

    * the purpose:
    give the user/developer the tools to find and use packages.

    I understand that there has to be a tool to assemble an application (but that is not PyPI, or is it?) and a way to assemble an operating system (which is another way of saying: a way to manage libraries and applications).

    I’m not saying that apt (or rpm) are the answer to everything. And I know that debian packages don’t create themselves, but are a lot of work to maintain. And I also know that inside a (python) debian package there still is a setup.py or similar, so this can not be replaced by apt.

    But my point is more that all the individual tools, that ruby, python, latex, perl, and so on created, all try to solve the same problem: help to find software in a central repository and use that software by managing dependencies and having some sort of database of installed software.

    So I fully understand the need for something like that, I just don’t think that PyPI is the best way. But a tool that can also handle ruby, latex, perl, C, C++, haskell, … you name it: one of these tools is the combination of apt and a Debian repository.

  • http://whatschrisdoing.com/ Chris Lambacher

    I will attempt to show the difference between an OS level packaging system and a Python level packaging system.

    I am going to ignore for a moment the advantages of having pypi for platforms like Windows that have no native package management format ( installers and .msi files do not count because they do not solve dependencies). I will also ignore ruby and perl since my experience with them is limited.

    At a base level easy_install, pip and zc.buildout provide a convenient way to install a Python package and all of its dependencies into a Python “environment”. What is a Python environment? At its most base level it is a version and installation of Python. On most Linux distros you can install a couple of versions of Python, maybe the last 2 versions of 2.x and a version or two of 3.x. You can have an easy_install for each version and it will install the packages into the site-packages of each installation.

    Well, the OS package managers can handle that, they can provide a package for each version or one package that detects the installed Python versions and installs the package in each of those versions. So that use case is handled.

    But what happens when you have two packages that conflict with each other? Maybe you have one web app that runs on TurboGears 1.0 and one that runs on TurboGears 2.0 and they depend on different versions of the same package. Well in the dark days of Python package management you would probably need to either change the name of the package when you introduced a conflicting version, or the package itself would need to be version aware (PyGTK comes to mind).

    Enter easy_install (or rather setuptools). Setuptools was an extension of distutils that provided some interesting features. The key feature for our particular use case is the ability to install two versions of the same package in the same python environment and have your app tell setuptools which version of the package it needed before the first import statement for that package. This checked for installed *eggs* and inserted the version of the egg required into sys.path.
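A toy sketch of the mechanism described above may help: two versions of the same module sit side by side, and the application picks one by putting its directory on sys.path before the first import. This is not how setuptools is implemented (setuptools exposes the real thing through pkg_resources.require); the fakepkg_demo module, the directory layout, and the helper names are all made up for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_fake_install(root, name, version):
    """Create <root>/<name>-<version>/<name>.py exposing __version__."""
    target = Path(root) / f"{name}-{version}"
    target.mkdir(parents=True)
    (target / f"{name}.py").write_text(f"__version__ = {version!r}\n")
    return target


def require(root, name, version):
    """Put the directory of the requested version at the front of sys.path."""
    target = Path(root) / f"{name}-{version}"
    if not target.is_dir():
        raise LookupError(f"{name} {version} is not installed under {root}")
    sys.path.insert(0, str(target))
    return target


def demo():
    """Install two fake versions, select 1.0, and return what got imported."""
    with tempfile.TemporaryDirectory() as root:
        make_fake_install(root, "fakepkg_demo", "1.0")
        make_fake_install(root, "fakepkg_demo", "2.0")
        chosen = require(root, "fakepkg_demo", "1.0")  # before the first import
        try:
            module = importlib.import_module("fakepkg_demo")
            return module.__version__
        finally:
            # Undo the side effects so the demo leaves no trace behind.
            sys.path.remove(str(chosen))
            sys.modules.pop("fakepkg_demo", None)


if __name__ == "__main__":
    print(demo())
```

Note that the selection only works because it happens before the import; once a module is in sys.modules, changing sys.path no longer affects it. That is a big part of why everything had to be "aware that this was going on".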

    Of course this required that everything be aware that this was going on. It was hard and a pain. There were a couple of other early options, including the ability to install to a different location than the global site-packages and use PYTHONPATH to tell the interpreter to also include the packages installed there.

    Of course this was all very manual, error prone and required a whole huge amount of extra effort on the part of the developer using the package installed by easy_install. Setuptools also provided a way to provide a virtual python installation that only worked on Posix style systems (i.e. no windows), but I have no experience with that.

    Soon after (or around the same time, not sure on the timing) zc.buildout (from the Zope crowd) and workingenv (from Ian Bicking) came out as other virtual python installation methods. workingenv was superseded by virtualenv (also by Ian Bicking). If you do anything in the Zope world, you will likely come in contact with zc.buildout; otherwise the most prevalent python virtualizer is virtualenv.

    I know more about virtualenv, so I will talk about that. What does virtualenv give you? An easy way to have completely isolated Python environments.
    $ virtualenv mynewenv
    $ source mynewenv/bin/activate
    (mynewenv) $ easy_install packagetotest
    ….
    (mynewenv) $ #test the new package and determine that you don’t like it
    (mynewenv) $ deactivate
    $ rm -rf mynewenv
    and it is like it never existed. Those two TurboGears apps? Separate virtualenv instances. Django apps that require different versions? Separate virtualenv instances. Want to know if the new version of SciPy breaks your code? Create a new virtualenv instance.
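The same create, try, throw away cycle can be sketched from Python itself. To be clear about the substitution: this uses the standard library's venv module rather than the virtualenv tool shown in the transcript above, and create_env and throwaway_env are hypothetical helper names. Treat it as an illustration of the workflow, not a replacement for the commands above.

```python
import os
import shutil
import tempfile
import venv
from pathlib import Path


def create_env(path, with_pip=True):
    """Create an isolated environment at `path` and return its interpreter."""
    venv.EnvBuilder(with_pip=with_pip).create(path)
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return Path(path) / bin_dir / exe


def throwaway_env(with_pip=True):
    """Create an environment, check that it was set up, then delete it."""
    root = tempfile.mkdtemp()
    env_dir = Path(root) / "mynewenv"
    try:
        python = create_env(env_dir, with_pip=with_pip)
        # This is where you would install and test the package using `python`.
        created = python.exists() and (env_dir / "pyvenv.cfg").is_file()
    finally:
        shutil.rmtree(root)  # "and it is like it never existed"
    return created, env_dir.exists()


if __name__ == "__main__":
    print(throwaway_env(with_pip=False))
```

Each environment is just a directory, so two apps that need conflicting versions simply get two directories, and deleting one cannot affect the other or the system Python.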

    How is dpkg going to deal with the above use case?

    “Apt (and other package managers) are used to assemble an operating system. Distutils and Setuptools are used to assemble an application.”

  • http://whatschrisdoing.com/ Chris Lambacher

    If your problem is that .egg files are installed zipped, you can also use “easy_install -Z”. See http://peak.telecommunity.com/DevCenter/EasyInstall#compressed-installation for more information and the rationale behind .egg files as compressed files vs directories. pip defaults to uncompressed installation.

  • http://whatschrisdoing.com/ Chris Lambacher

    “If I would assume that PyPI is useful, then as a meta-information center, so no code, just meta-data. With great search and rating features. Maybe something like this…”

    If you are providing your package to PyPI, you can choose to provide links to download rather than host on PyPI directly. Hosting the download is a convenience function which means that people can use github/bitbucket/whatever to host their source code and releases can be as simple as “python setup.py sdist upload”. The developer does not have to think about how to host releases on a web site. You can also set up your entry once using the interface on the site and provide a link to where your downloads are, and easy_install/pip will follow that link and look for installable items. The easy_install crowd gets the benefits they are looking for and you get the workflow you want.

    For most people, if the software does not exist in the package management system, it does not exist. For Python library authors you get a little bit of wiggle room, but if I can’t work it into my dependency management stream (read as pip or easy_install or zc.buildout) it is not very likely I am going to use it.

  • Stefan

    Thanks for the comment, Chris.
    You have a good point here, the handling of different versions of software at the same time is not easy. If I have a library written in C, I typically have a filename that contains a version number (libmystuff.so.1.0.1). In this case, dpkg can handle the different versions very well.

    Python does not offer this feature in its import (as far as I know), so it makes the life of developers and maintainers harder. The solutions you describe (e.g. virtualenv) are good workarounds for this shortcoming. But if you think this solution through to its end, then a virtual machine would do the job much better. As long as only pure python packages are concerned, ‘virtualenv’ works. But there are cases where it will stop working. I was told that in Ruby, the gems mechanism goes as far as replacing shell commands that change the cwd and things like this, just to make sure that the created environment can not be left. But even that will not work in the extreme case, where a python library depends on a certain kernel version. What does virtualenv do then? It can not install and run a complete new kernel alongside the currently running kernel. That would be a virtual machine then.

    And indeed, I guess that a virtual machine might be a very good way to test if the new django or scipy version breaks all your code.

    I’m really not saying that setuptools are bad or virtualenv isn’t great to have (I do use it for django + pinax). But I say that if there were a choice, it would be for stable APIs, system wide installations and one tool to handle the software management. This is the goal of linux distributions. For a distro release, the APIs of the libraries are frozen (bug fixes are allowed), software is installed system wide, even in multiple versions, if the language allows it. And it has a tool to manage dependencies, installations and so on.

    But I still don’t understand why apt should be for the assembly of an operating system and therefore NOT be a replacement for PyPI? Operating systems are made up of software (libraries, programs, …)! apt is certainly not a replacement for ‘make’. And if ‘distutils’ is the equivalent of ‘make’ for python, then what does this tell me about PyPI? It tells me that I need distutils, but not more.

    To sum up, I still think that ‘virtualenv’ should be better left to virtual machines and PyPI better left to distribution repositories. And for example SUSE’s Build Service (http://en.opensuse.org/Build_Service) does exactly that: it tests compatibility for a lot of distributions in virtual machines and builds the packages for those distributions. I think that this service sums up what I think should be the right way(TM).
