A Local Caching Proxy for “pypi.python.org” via Docker


If your infrastructure automation installs packages from PyPI (the Python Package Index) via “pip” or similar tools, you can save yourself from annoying “pypi.python.org timeout” errors by running a local caching proxy of the PyPI service. After trying several of these services, I found “devpi” to be the most resilient. It’s available both as a Python package and as a Docker container that you can run in your data center.


If you have infrastructure automation that tries to install packages from PyPI, then you’ve undoubtedly encountered availability issues with the PyPI web service, hosted at “pypi.python.org”. Example email alerts that we see from our Puppet infrastructure look like:

Tue Sep 02 23:47:23 -0700 2014 /Stage[main]/Jenkins::Dev_jenkins_slave/Package[jenkins-check-for-success]
(err): Could not evaluate: Could not get latest version: 
Timeout while contacting pypi.python.org: execution expired

One could choose to ignore these sorts of connectivity issues since they are transient, but there are quite a few negative consequences to doing so:

  • If you’re spinning up new machines on-demand and they require a Python package as part of their configuration, then your ability to consistently spin up these machines is compromised by a dependency that is completely out of your control.
  • If your infrastructure automation is configured to send email alerts on these types of errors, you’ll be getting un-actionable emails that add to the noise coming from your infrastructure. This effectively makes your alerting system less valuable, as your team will be trained to ignore its email alerts.
  • As you scale your infrastructure to hundreds or thousands of nodes, you’ll be receiving a lot of alerts about connectivity issues with “pypi.python.org” at all hours of the day and night. When “pypi.python.org” goes down hard for a prolonged period of time, every node in your infrastructure will simultaneously bombard you with alerts about not being able to contact “pypi.python.org”.


The solution for this problem is to run a local caching proxy for “pypi.python.org” within your data center. The Python community has developed proxy packages like pypiserver, chishop, devpi, and others specifically for this use case. After extensive research and trying several of them out, I’ve found devpi to be the most resilient as well as the most actively developed as of this writing.

One can either install devpi as a Python package (see the instructions on their website) or via a Docker container. Since our infrastructure has been making the move to “All Docker Everything,” I’ll write up the steps I took to set up the Docker container running devpi and how I configured our clients to use it.

Devpi Server Installation

Here’s some sample Puppet code for how to download & run the “scrapinghub/devpi” container with an nginx proxy in front of it. (I discussed why having an nginx proxy in front is advantageous in Private Docker Registry w/Nginx Proxy for Stats Collection)

You’ll want to change “DEVPI_PASSWORD” and the hostname for the Nginx vhost below.

# devpi server & nginx configuration
docker::image { 'scrapinghub/devpi': }

docker::run { 'devpi':
    image    => 'scrapinghub/devpi',
    ports    => ['3141:3141',],
    use_name => true,
    env      => ['DEVPI_PASSWORD=1234',],
}

nginx::resource::upstream { 'pypi_app':
    members => ['localhost:3141',],
}

nginx::resource::vhost { 'vi-pypi.lab.vi.local':
    proxy => 'http://pypi_app',
}
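If you’re not using Puppet, the same container can be brought up by hand with plain Docker commands. This is just a sketch mirroring the Puppet parameters above (same example image, port, container name, and password):

```shell
# Pull the devpi image and run it detached, with a fixed container
# name and the devpi-server port published on the host.
docker pull scrapinghub/devpi
docker run -d \
  --name devpi \
  -p 3141:3141 \
  -e DEVPI_PASSWORD=1234 \
  scrapinghub/devpi
```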

Once your container is running you can run “docker logs devpi” to see what it is up to. You can see the “devpi” proxy saving your bacon when “pypi.python.org” occasionally becomes unavailable via log statements like these:

 - - [22/Jul/2014 22:12:09] "GET /root/public/+simple/argparse/ HTTP/1.0" 200 4316
2014-07-22 22:12:09,251 [INFO ] requests.packages.urllib3.connectionpool: Resetting dropped connection: pypi.python.org
2014-07-22 22:12:09,301 [INFO ] devpi_server.filestore: cache-streaming: https://pypi.python.org/packages/source/a/argparse/argparse-1.2.1.tar.gz, target root/pypi/+f/2fb/ef8cb61e506c706957ab6e135840c/argparse-1.2.1.tar.gz
2014-07-22 22:12:09,301 [INFO ] devpi_server.filestore: starting file iteration: root/pypi/+f/2fb/ef8cb61e506c706957ab6e135840c/argparse-1.2.1.tar.gz (size 69297)

Python Client Configuration

On the client side we need to configure both “pip” and “easy_install” to use the devpi container we just instantiated. This requires creating a special configuration file for each of those Python package managers. The configuration file tells those package managers to use your devpi proxy server for their package index URL.

You’ll want to change the URL to point to the hostname you use within your own infrastructure.

# ~/.pip/pip.conf
[global]
index-url = http://vi-pypi.lab.vi.local/root/public/

# ~/.pydistutils.cfg
[easy_install]
index_url = http://vi-pypi.lab.vi.local/root/public/
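If you’re distributing these files with something other than Puppet, a small shell snippet can generate both of them (the hostname is the example one used throughout this post; substitute your own):

```shell
# Write pip and easy_install configs pointing at the local devpi proxy.
INDEX_URL="http://vi-pypi.lab.vi.local/root/public/"

mkdir -p "$HOME/.pip"
cat > "$HOME/.pip/pip.conf" <<EOF
[global]
index-url = $INDEX_URL
EOF

cat > "$HOME/.pydistutils.cfg" <<EOF
[easy_install]
index_url = $INDEX_URL
EOF
```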

But Wait, There’s More – Uploading Your Own Python Packages

One of the additional benefits of running a local PyPI proxy is that it becomes a distribution point for your private Python packages. Instead of clumsily checking out SCM repos full of your own custom Python scripts to each machine in your infrastructure, you can install your Python scripts as first-class Python packages, the same way you would install packages from PyPI. This lets you properly version your packages and define dependency requirements between your packages.

Creating a “setup.py” file for each of your Python projects is outside the scope of this post, but details can be found online. Once your Python project has its “setup.py” file, uploading your versioned package to your local devpi instance requires just a few simple commands. From our Jenkins job which publishes a new version of a Python package upon a git push:

# from the cwd of "setup.py"
devpi use http://vi-pypi.lab.vi.local/root/public/
devpi login root --password 1234 
devpi upload

More details at: http://doc.devpi.net/latest/quickstart-releaseprocess.html
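Once a package has been uploaded, any node whose pip is pointed at the proxy can install it like a normal PyPI package. The package name below is a hypothetical placeholder:

```shell
# Installs from the devpi proxy, since ~/.pip/pip.conf already points
# pip's index-url at root/public.
pip install mycompany-scripts==1.0.0
```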


By running a local caching proxy of “pypi.python.org” we’re able to improve the reliability of our infrastructure because we are no longer beholden to the availability of an external dependency. We also get the added benefit of having a proper Python package distribution point, which allows us to have better development & deployment practices. Finally, this local caching proxy provides better performance for installing packages, as local network copies are significantly faster than downloading from an external website.

3 thoughts on “A Local Caching Proxy for ‘pypi.python.org’ via Docker”

  1. hi Dan, nice post. I’m the author of the scrapinghub/devpi docker image.
    It was a nice surprise to see it featured in your post.

    To be honest, I have been thinking about adding nginx inside the image to serve files directly, as recommended by the devpi devs; an SSL setup is a plus too. So far the image is very simple and does its job, but I also want to improve it so you can interact with the server using devpi-client commands and linked containers.


    • Hi Daniel, nice running into you here!🙂

      re: “nginx to serve files as recommended by devpi devs” – I wasn’t aware of this recommendation. Is it to offload the serving of the tar balls from the Python layer and serve directly from nginx? Do you have a link to this recommendation?

      re: adding nginx – Hmmm I kind of like the simplicity of the container as you have it now. I would suggest either having the nginx process in a separate linked container, or a wholly separate devpi container/Dockerfile that does a more full-blown configuration with nginx, SSL, etc. My $0.02.

      Thanks for writing!

      • re: “nginx to serve files as recommended by devpi devs”

        Yes, it is about serving static files directly, I think it is a good improvement to offload work of devpi-server process. You probably already found it, the link is http://doc.devpi.net/latest/quickstart-server.html#nginx-devpi-conf-nginx-site-config

        re: adding nginx – Hmmm I kind of like the simplicity of the container as you have it now.

        I understand it is good practice to run specialized containers and link them to get the extra functionality, but I use devpi to speedup development on vagrant, having it in a single container is much easier to distribute and get it running than orchestrating multiple containers.
