Succeeding through Laziness and Open Source

Back in mid-2014 I was in the midst of Docker-izing the build process at Virtual Instruments. As part of that work I’d open sourced one component of that system, the Docker-in-Docker Jenkins build slave which I’d created.


While claiming that I was driven by altruistic motivations when posting this code to GitHub (GH) would make for a great ex-post narrative, I have to admit that the real reasons for making the code publicly available were much more practical:

  • At the time, Docker image repositories on the Docker Hub Registry had to be tied to a GitHub repo (they’ve since added Bitbucket support).
  • I was too cheap to pay for a private GitHub repo.

… And thus the code for the Docker-in-Docker Jenkins slave became open source! 😀

Unfortunately, making this image publicly available presented some challenges soon thereafter: Folks started linking their blog posts to it, people I’d never met emailed me asking for help in getting set up with this system, others started filing issues against me on either GH or the Docker Hub Registry, and I started receiving pull requests (PRs) to my GH repo.

Having switched employers just a few months after posting the code to GH, dealing with the issues and PRs was a bit of a challenge: My new employer didn’t have a Dockerized build system (yet), and short of setting up my own personal Jenkins server and Dockerized build slaves, there was no way for me to verify issues/fixes/PRs for this side-project. And so “tehranian/dind-jenkins-slave” stagnated on GH with relatively little participation from me.

Having largely forgotten about this project, I was quite surprised a few weeks ago when perusing the GH repo for Disqus. I accidentally discovered that the engineering team at Disqus had forked my repo and had been actively committing changes to their fork!

Their changes had:

  • Optimized the container’s layers to make it smaller in size,
  • Updated the image to work with new versions of Docker,
  • And also modified some environment variable names to avoid collisions with names that popular frameworks would use.

Prompted by this, I went back to my own GH repo, looked at the graph of all other forks, and saw that several others had forked my GH repo as well.

One such fork had updated my image to work with Docker Swarm and also to be able to easily use SSH keys for authenticating with the build slave instead of using password-based auth.

“How cool!”, I thought. I’d put an idea into the public domain a year ago, others had found it, and improved it in ways that I couldn’t have imagined. Further, their improvements were now available for myself and others to use!

My Delphix colleague Michael Coyle summed this all up very nicely, saying “As a software developer I can only realistically work for one organization at a time. Open source allows developers from different organizations to collaborate with each other without boundaries. In that way one actually can contribute to more than one organization at once.”

In hindsight I’m absolutely delighted that my unwillingness to purchase a private GitHub repo led to me contributing the Docker-in-Docker Jenkins slave to the public domain. There was nothing proprietary that Virtual Instruments could have used in its product, and by making it available, other organizations like Disqus and CloudBees have been able to benefit, along with software developers on the other side of the planet. How exciting!

Ansible Role for Package Hosting & Caching

The Operations Team at Delphix has recently published an Ansible role called delphix.package-caching-proxy. We are using this role internally to host binary packages (ex. RPMs, Python “pip” packages), as well as to locally cache external packages that our infrastructure and build process depends upon.

This role provides:

It also provides the hooks for monitoring the front-end Nginx server through collectd. More details here.

Why Is this Useful?

This sort of infrastructure can be useful in a variety of situations, for example:

  • When your organization has remote offices/employees whose productivity would benefit from having fast, local access to large binaries like ISOs, OVAs, or OS packages.
  • When your dev process depends on external dependencies from services that are susceptible to outages, ex. NPM or PyPI.
  • When your dev process depends on third-party artifacts that are pinned to certain versions and you want a local copy of those pinned dependencies in case those specific versions become unavailable in the future.

Sample Usage

While there are a variety of configuration options for this role, the default configuration can be deployed with an Ansible playbook as simple as the following:

- hosts: all
  roles:
    - delphix.package-caching-proxy

Underneath the Covers

This role works by deploying a front-end Nginx webserver to do HTTP caching, and also configures several Nginx server blocks (analogous to Apache vhosts) which delegate to Docker containers for the apps that run the Docker Registry, the PyPI server, etc.
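To make the caching half of this concrete, here’s a minimal Python sketch of the read-through pattern that a caching front end applies to upstream packages. It is not the role’s implementation (Nginx’s proxy cache does the real work); `ReadThroughCache` and the `fetch_upstream` callable are hypothetical stand-ins for the cache and for an upstream such as PyPI or NPM.

```python
import hashlib
import os


class ReadThroughCache:
    """Serve objects from a local directory, fetching from upstream only on a miss.

    Hypothetical illustration of the caching idea; not code from the Ansible role.
    """

    def __init__(self, cache_dir, fetch_upstream):
        self.cache_dir = cache_dir
        self.fetch_upstream = fetch_upstream  # callable: url -> bytes
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, url):
        # Key each cached object by a hash of its URL.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, digest)

    def get(self, url):
        path = self._path(url)
        if os.path.exists(path):
            # Hit: serve the local copy without contacting upstream.
            with open(path, "rb") as f:
                return f.read()
        # Miss: fetch once from upstream and keep a local copy.
        data = self.fetch_upstream(url)
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # atomic rename, so readers never see a partial file
        return data
```

The property that matters for the outage and pinned-version scenarios is that a hit never touches upstream: once a package has been fetched, it keeps being served even if the upstream service is down or the version later disappears.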

Downloading, Source Code, and Additional Documentation/Examples

This role is hosted in Ansible Galaxy at, and the source code is available on GitHub at:

Additional documentation and examples are available in the file in the GitHub repo, at:


Shoutouts to some deserving folks:

  • My former Development Infrastructure Engineering Team at VMware, who proved this idea out by implementing a similar set of caching proxy servers for our global remote offices in order to improve developer productivity.
  • The folks who conceived the Snakes on a Plane Docker Global Hack Day project.

Building Vagrant Boxes with Nested VMs using Packer

In “Improving Developer Productivity with Vagrant” I discussed the productivity benefits gained from using Vagrant in our software development tool chain. Here are some more details about the mechanics of how we created those Vagrant boxes as part of every build of our product.

Using Packer to Build VMware-Compatible Vagrant Boxes

Packer is a tool for creating machine images which was also written by HashiCorp, the authors of Vagrant. It can build machine images for almost any type of environment, including Amazon AWS, Docker, Google Compute Engine, KVM, Vagrant, VMware, Xen, and more.

We used Packer’s built-in VMware builder and Vagrant post-processor to create the Vagrant boxes for users to run on their local desktops/laptops via VMware Fusion or Workstation.

Note: This required each user to install Vagrant’s for-purchase VMware plugin. In our usage of running Vagrant boxes locally we noted that the VMware virtualization providers delivered far better IO performance and stability than the free Oracle VirtualBox provider. In short, the for-purchase Vagrant-VMware plugin was worth every penny!

Running VMware Workstation VMs Nested in ESXi

One of the hurdles I came across in integrating the building of the Vagrant boxes into our existing build system is that Packer’s VMware builder needs to spin up a VM using Workstation or Fusion in order to perform configuration of the Vagrant box. Given that our builds were already running in static VMs, this meant that we needed to be able to run Workstation VMs nested within an ESXi VM with a Linux guest OS!

This sort of VM nesting was somewhat complicated to set up in the days of vSphere 5.0, but in vSphere 5.1+ it has become a lot simpler. With vSphere 5.1+ one just needs to make sure that the ESXi VMs are running with “Virtual Hardware Version 9” or newer, and enable “Hardware assisted virtualization” for the VM within the vSphere web client.

Here’s what the correct configuration for supporting nested VMs looks like:

[Screenshot: VM configuration for supporting nested VMs]
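If you’d rather verify those two settings from a script than eyeball the web client, the sketch below reads them out of a VM’s .vmx file. It assumes you can read the .vmx locally, and that the hardware version is stored as `virtualHW.version` and the “Hardware assisted virtualization” checkbox as `vhv.enable`; treat those key names as my understanding of vSphere 5.1+ rather than something covered above. `parse_vmx` and `nested_virt_problems` are hypothetical helpers.

```python
def parse_vmx(text):
    """Parse .vmx lines of the form key = "value" into a dict with lowercased keys."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and anything that isn't key = value
        key, _, value = line.partition("=")
        settings[key.strip().lower()] = value.strip().strip('"')
    return settings


def nested_virt_problems(settings, min_hw_version=9):
    """Return a list of problems; an empty list means the VM looks ready for nesting.

    Assumed keys: virtualHW.version and vhv.enable (see lead-in).
    """
    problems = []
    hw = settings.get("virtualhw.version", "0")
    if not hw.isdigit() or int(hw) < min_hw_version:
        problems.append(f"virtual hardware version {hw} is older than {min_hw_version}")
    if settings.get("vhv.enable", "false").lower() != "true":
        problems.append("hardware assisted virtualization (vhv.enable) is not enabled")
    return problems
```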

Packer’s Built-in Remote vSphere Hypervisor Builder

One question that an informed user of Packer may correctly ask is: “Why not use Packer’s built-in Remote vSphere Hypervisor Builder and create the VM directly on ESXi? Wouldn’t this remove the need for running nested VMs?”

I agree that this would be a better solution in theory. There are several reasons why I chose to go with nested VMs instead:

  1. The “Remote vSphere Hypervisor Builder” requires manually running an “esxcli” command on your ESXi boxes to enable some sort of “GuestIP hack”. Doing this type of configuration on our production ESXi cluster seemed sketchy to me.
  2. The “Remote vSphere Hypervisor Builder” doesn’t work through vSphere, but instead directly ssh’es into your ESXi boxes as a privileged user in order to create the VM. The login credentials for that privileged ESXi/ssh user must be kept in the Packer build script or some other area of our build system. Again, this seems less than ideal to me.
  3. As far as I can tell from the docs, the “Remote vSphere Hypervisor Builder” only works with the “vmware-iso” builder and not the “vmware-vmx” builder. This would’ve painted us into a corner as we had plans to switch from the “vmware-iso” builder to the “vmware-vmx” builder once it had become available.
  4. The “Remote vSphere Hypervisor Builder” was not available when I implemented our nested VM solution because we were early adopters of Packer. It was easier to stick with a working solution that we already had 😛

Automating the Install of VMware Workstation via Puppet

One other mechanical piece I’ll share is how we automated the installation of VMware Workstation 10.0 into our static build VMs. Since all of the build VM configuration is done via Puppet, we could automate the installation of Workstation 10 with the following bit of Puppet code:

  # Install VMware Workstation 10
  $vmware_installer = '/mnt/devops/software/vmware/VMware-Workstation-Full-10.0.0-1295980.x86_64.bundle'
  $vmware_installer_options = '--eulas-agreed --required'
  exec {'Install VMware Workstation 10':
    command => "${vmware_installer} ${vmware_installer_options}",
    # Skip the install if Workstation is already present
    creates => '/usr/lib/vmware/config',
    user    => 'root',
    require => [Mount['/mnt/devops'], Package['kernel-default-devel']],
  }
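The `creates` attribute is what makes that `exec` safe to apply on every Puppet run: Puppet skips the command when the given path already exists. A rough Python equivalent of that guard (semantics only, not how Puppet implements it; `exec_unless_created` is a hypothetical name) looks like this:

```python
import os
import subprocess


def exec_unless_created(command, creates):
    """Run `command` (a list of args) only if the path `creates` does not exist yet.

    Mirrors the semantics of Puppet's exec `creates` attribute.
    Returns True if the command ran, False if it was skipped.
    """
    if os.path.exists(creates):
        return False  # already installed: nothing to do, so re-runs are harmless
    subprocess.run(command, check=True)  # raises CalledProcessError on a non-zero exit
    return True
```

Called with the installer path plus `--eulas-agreed --required` as the command and `/usr/lib/vmware/config` as the marker, the installer would run on the first pass and be skipped on every pass after that.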

Improving Developer Productivity with Vagrant

As part of improving developer productivity at Virtual Instruments during development of VirtualWisdom 4.0, I introduced Vagrant to the development team. At the time, the product was being re-architected from a monolithic Java app into a service-oriented architecture (SOA). Without Vagrant, the challenge for a given Java developer working on any one of the Java services was that there was no integration environment available for that developer to test their service against. In other words, a developer could run their Java service locally, but without the other co-requisite services and databases they couldn’t do anything useful with it.

How Not To Solve

We could have documented a long set of instructions in a wiki, detailing how to set up and run each one of the Java services locally, along with instructions on how to set up and run each of the databases manually, but there would be several problems with this approach:

  1. Following such instructions would be a very manual, time-consuming, and mistake-prone process. The total time on such efforts would be multiplied by the size of the R&D team as each developer would have to duplicate this effort on their own.
  2. Such instructions would be a “living document”, continually changing over time. This means that if Jack followed the instructions on Day X, the instructions that Jane followed on Day X+Y could be potentially different and lead to two very different integration environments.
  3. All of our developers were running Mac OS or Windows laptops, but the product environment was SUSE Linux Enterprise Server 11 (SLES 11). Regardless of how complete our setup instructions could be, there would still be the issue of consistency of environment. If developers were to test their Java services in hand-crafted environments that were not identical to the actual environment that QA tested in or that the customer ran the product in, then we would be sure to hit issues where functionality would work in one developer’s environment, but not in QA or in the customer’s environment! (i.e., “It worked on my box!”)

A Better Approach


Turning our integration environment into a portable Vagrant box (a virtual machine) solved all of these issues. The Vagrant box was an easily distributable artifact generated by our build process that contained fully configured instances of all of the Java services and databases that comprised our product. Developers could download the Vagrant box and get it running in minutes. The process for running the Vagrant box was so simple that even managers and directors could download a “Vagrantfile” and “vagrant up” to get a recent build running locally on their laptops. Finally, the Vagrant box generated by our build process utilized the identical SLES 11 environment that QA and customers would be running with, so developers would not be running into issues related to differences in environment. I will write a follow-up post about how we use Packer in our build process to create the Vagrant box, but for now I’ll provide some details about our Vagrant box workflow.

The “Vagrantfile”

Here’s a partial sample of our “Vagrantfile” where I’d like to call a few things out:

VAGRANTFILE_API_VERSION = "2"  # Do not modify

VM_RAM_MB = "4096"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|

  # Box name & URL to download from
  config.vm.box = "vw-21609-431"
  config.vm.box_url = ""


  # Create a private network interface, /dev/eth1. This allows host-only access
  # to the machine using a specific IP. The host OS is available to the guest
  # at the "" IP address.
  config.vm.network :private_network, ip: ""

  config.vm.provider "vmware_fusion" do |v|
    v.gui = VM_SHOW_CONSOLE
    v.vmx["memsize"]  = VM_RAM_MB
    v.vmx["numvcpus"] = VM_NUM_CPUS
  end

end
Keep in mind that the “Vagrantfile” is executable Ruby code, so there are virtually limitless possibilities for what one can accomplish depending on your needs and desired workflow.

Private Networking and Our “services.conf”

The workflow used by the developers of our Java services is to run the service that they are modifying via the IDE in their host OS (ex: Eclipse or IntelliJ), and to have all other services and databases running within the Vagrant box (the guest OS). To facilitate communication between the host OS and guest OS, we direct the “Vagrantfile” to create a private network with static IP addresses for the host and guest. Here our host OS will have the IP “” while the guest will be available at “”:

  # Create a private network interface, /dev/eth1. This allows host-only access
  # to the machine using a specific IP. The host OS is available to the guest
  # at the "" IP address.
  config.vm.network :private_network, ip: ""

With private networking in place, we modified our Java services to read the locations of their peer services from a hierarchy of configuration files. When a Java service initializes, it reads the following configuration files, in increasing order of precedence, to determine how to connect to the other services:

  • /etc/vi/services.conf (the default settings)
  • /vagrant/services.conf
  • ~/services.conf (highest precedence)

Sample contents for these “services.conf” files:

# /vagrant/services.conf

The “services.conf” hierarchy allows a developer to direct the service running in their IDE/host OS to connect to the Java services running within the Vagrant box/guest OS (via “~/services.conf”), as needed. It also allows the developer to configure the services within the Vagrant box/guest OS to connect to the Java services running on the host OS via the “/vagrant/services.conf” file. One clarification – The “/vagrant/services.conf” file actually lives on the host OS in the working directory of the “Vagrantfile” that the developer downloads. The file appears as “/vagrant/services.conf” via the default shared folder provided by Vagrant. Having the “/vagrant/services.conf” live on the host OS is especially convenient as it allows for easy editing, and more importantly it provides persistence of the developer’s configuration when tearing down and re-initializing newer versions of our Vagrant box.
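Here’s a minimal Python sketch of that lookup order. Our services are Java and the file contents aren’t shown above, so the INI layout, the `[services]` section, and `load_services` are all assumptions for illustration; the point is only the precedence rule, where a later file overrides the same key from an earlier one and missing files are simply skipped.

```python
import configparser
import os

# Lowest precedence first; later files override earlier ones.
DEFAULT_HIERARCHY = (
    "/etc/vi/services.conf",
    "/vagrant/services.conf",
    os.path.expanduser("~/services.conf"),
)


def load_services(paths=DEFAULT_HIERARCHY):
    """Merge the hypothetical [services] section across the hierarchy of files."""
    parser = configparser.ConfigParser()
    # read() silently skips files that don't exist and applies the rest in order,
    # so a key set in a later file replaces the same key from an earlier file.
    parser.read(paths)
    if not parser.has_section("services"):
        return {}
    return dict(parser.items("services"))
```

Because the merge happens key by key, a developer’s “~/services.conf” only needs the one or two services they’re redirecting; everything else falls through to the defaults.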

Easy Downloading with “box_url”

As part of our workflow I found it to be easiest to have users not download the Vagrant .box file directly, but instead to download the small (~3KB) “Vagrantfile” which in turn contains the URL for the .box file. When the user runs “vagrant up” from the cwd of this “Vagrantfile”, Vagrant will automatically detect that the Vagrant box of the respective name is not in the local library and start to download the Vagrant box from the URL listed in the “Vagrantfile”.

  # Box name & URL to download from
  config.vm.box = "vw-21609-431"
  config.vm.box_url = ""

More details are available in the Vagrant docs:

Note: Earlier this year the authors of Vagrant released a SaaS service for box distribution called Vagrant Cloud. You may want to look into using this, along with the newer functionality of Vagrant box versioning. We are not using the Vagrant Cloud SaaS service yet as our solution pre-dates the availability of this service and there hasn’t been sufficient motivation to change our workflow.

VM Hardware Customization

In our “Vagrantfile” I wanted to make it dead-simple for people to be able to modify the hardware resources. At VI some developers had very new laptops with lots of RAM while others had older laptops. Putting the following Ruby variables at the top of the “Vagrantfile” made it easy for someone that knows absolutely nothing about Ruby to edit the hardware configuration of their Vagrant box:

VM_RAM_MB = "4096"


  config.vm.provider "vmware_fusion" do |v|
    v.gui = VM_SHOW_CONSOLE
    v.vmx["memsize"]  = VM_RAM_MB
    v.vmx["numvcpus"] = VM_NUM_CPUS
  end

In developing an SOA application, having a Vagrant box in which developers can integrate the services they are working on has been an enormous boon for developer productivity. Downloading and running a Vagrant box is orders of magnitude faster than configuring and starting services by hand. The Vagrant box also solves the problem of “consistency of environment”, allowing developers to run their code in an environment that closely matches the QA/customer environment. In the post-mortem analysis of our VirtualWisdom 4.0 release, having Vagrant boxes for developer integration of our Java services was identified as one of the big “wins” of the release. As the Director of Engineering said, “Without the developer productivity gains from Vagrant, we would not have been able to ship VirtualWisdom 4.0 when we did.”

Docker + Jenkins: Dynamically Provisioning SLES 11 Build Containers


Using Jenkins’ Docker Plugin, we can dynamically spin up SLES 11 build slaves on demand to run our builds. One of the hurdles to getting there was creating a SLES 11 Docker base image, since there are no SLES 11 container images available on the Docker Hub Registry. We used SUSE’s Kiwi imaging tool to create a base SLES 11 Docker image for ourselves, and then layered our build environment and Jenkins build slave support on top of it. After configuring Jenkins’ Docker plugin to use our home-grown SLES image, we were off and running with our containerized SLES builds!

Jenkins/Docker Plugin

The path to Docker-izing our build slaves started with stumbling across this Docker Plugin for Jenkins: This plugin allows one to use Docker to dynamically provision a build slave, run a single build, and then tear down that slave, optionally saving it. This is very similar in workflow to the build VM provisioning system that I created while working on VMware’s Release Engineering team, but much lighter weight. Compared to VMs, Docker containers can be spun up in milliseconds instead of in a few minutes, and they are much lighter on hardware resources.

The above link to the Jenkins wiki provides details about how to configure your environment as well as how to configure your container images. Some high-level notes:

  • Your base OS needs to have Docker listening on a TCP port. By default, Docker only listens on a Unix socket.
  • The container needs to run “sshd” for Jenkins to connect to it. I suspect that once the container is provisioned, Jenkins just treats it as a plain-old SSH slave.
  • In my testing, the Docker/Jenkins plugin was not able to connect via SSH to the containers it provisioned when using Docker 1.2.0. After trial and error, I found that the current version of the Jenkins plugin (0.6) works well with Docker 1.0-1.1.2, but Docker 1.2.0+ did not work with this Jenkins Plugin. I used Puppet to make sure that our Ubuntu build server base VMs only had Docker 1.1.2 installed. Ex:
    • # VW-10576: install docker on the ubuntu master/slaves
      # * Have Docker listen on a TCP port per instructions at:
      # * Use Docker 1.1.2 and not anything newer. At the time of writing this
      # comment, Docker 1.2.0+ does not work with the Jenkins/Docker
      # plugin (the port for sshd fails to map to an external port).
      class { 'docker':
        tcp_bind => 'tcp://',
        version  => '1.1.2',
      }
  • There is a sample Docker/Jenkins slave based on “ubuntu:latest” available at: I would recommend getting that working as a proof-of-concept before venturing into building your own custom build slave containers. It’s helpful to be familiar with the “Dockerfile” for that image as well:
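Before wiring up Jenkins, it’s worth confirming that the Docker host actually answers on TCP and is running a version in the range that worked for me. Here’s a sketch in Python, assuming the daemon speaks plain HTTP (no TLS) on port 4243 as in the proof-of-concept described below, and using the Remote API’s `/version` endpoint. The 1.0 through 1.1.2 range simply encodes what I observed with plugin 0.6; `docker_version` and `plugin_compatible` are hypothetical helpers.

```python
import json
import urllib.request


def docker_version(host, port=4243, timeout=5):
    """Return the daemon's version string from the Remote API's /version endpoint.

    Assumes an unauthenticated, non-TLS TCP listener (see lead-in).
    """
    url = f"http://{host}:{port}/version"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["Version"]


def plugin_compatible(version, lowest=(1, 0, 0), highest=(1, 1, 2)):
    """True if `version` falls in the range observed to work with plugin 0.6."""
    core = version.split("-")[0]  # drop suffixes such as "-dev"
    parts = tuple(int(p) for p in core.split("."))
    parts += (0,) * (3 - len(parts))  # treat "1.0" as (1, 0, 0)
    return lowest <= parts[:3] <= highest
```

Running `plugin_compatible(docker_version("my-docker-host"))` from the Jenkins master (with your own hostname) checks both that the TCP listener is reachable and that the version is in range.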

Once you have the Docker Plugin installed, you need to go to your Jenkins “System Configuration” page and add your Docker host as a new cloud provider. In my proof-of-concept case, this is an Ubuntu 12.04 VM running Docker 1.1.2, listening on port 4243, configured to use the “evarga/jenkins-slave” image, providing the “docker-slave” label which I can then configure my Jenkins build job to be restricted to. The Jenkins configuration looks like this:


Jenkins’ “System Configuration” for a Docker host

I then configured a job named “docker-test” to use that “docker-slave” label and run a shell script with basic commands like “ps -eafwww”, “cat /etc/issue”, and “java -version”. Running that job, I see that it successfully spins up a container of “evarga/jenkins-slave” and runs my little script. Note the hostname at the top of the log, and output of “ps” in the screenshot below:


A proof-of-concept of spinning up a Docker container on demand


Creating Our SLES 11 Base Image

Having built up the confidence that we can spin up other people’s containers on-demand, we now turned to creating our SLES 11 Docker build image. For reasons that I can only assume are licensing issues, SLES 11 does not have a base image up on the Docker Hub Registry in the same vein as the images that Ubuntu, Fedora, CentOS, and others have available.

Luckily I stumbled upon the following blog post:

At Virtual Instruments we were already using Kiwi to build the OVAs of our build VMs, so it wasn’t much more work to follow that blog post and get Kiwi to generate a tarball that could be consumed by “docker import”. This worked well for the next proof-of-concept phase, but ultimately we decided to go down another path.

Rather than have Kiwi generate fully configured build images for us, we decided it’d be best to follow the conventions of the “Docker Way” and have Kiwi generate a SLES 11 base image which we could then use with a “FROM” statement in a “Dockerfile” and install the build environment via the Dockerfile. One of the advantages of this is that we only have to use Kiwi to generate the base image the first time. After that we can stay in Docker-land to build the subsequent images. Additionally, having a shared base image among all of our build image tags should allow for space savings as Docker optimizes the layering of filesystems over a common base image.

Configuring the Image for Use with Jenkins

Taking a SLES 11 image with our build environment installed and getting it to work with the Jenkins Docker plugin took a little bit of work, mainly spent trying to configure “sshd” correctly. Below is the “Dockerfile” that builds upon a SLES image with our build environment installed and prepares it for use with Jenkins:

# This Dockerfile is used to build an image containing basic
# configuration to be used as a Jenkins slave build node.

MAINTAINER Dan Tehranian <>

# Add user & group "jenkins" to the image and set its password
RUN groupadd jenkins
RUN useradd -m -g jenkins -s /bin/bash jenkins
RUN echo "jenkins:jenkins" | chpasswd

# Having "sshd" running in the container is a requirement of the Jenkins/Docker
# plugin. See:

# Create the ssh host keys needed for sshd
RUN ssh-keygen -A

# Fix sshd's configuration for use within the container. See VW-10576 for details.
RUN sed -i -e 's/^UsePAM .*/UsePAM no/' /etc/ssh/sshd_config
RUN sed -i -e 's/^PasswordAuthentication .*/PasswordAuthentication yes/' /etc/ssh/sshd_config

# Expose the standard SSH port
EXPOSE 22

# Start the ssh daemon in the foreground so the container keeps running
CMD ["/usr/sbin/sshd", "-D"]
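Since most of the trouble here was with sshd, a quick way to check that a running container’s sshd is reachable on its mapped port is to read the identification line every SSH server sends on connect (RFC 4253: `SSH-protoversion-softwareversion`, optionally followed by comments). The sketch below assumes that line is the first thing the server sends, which is typical of OpenSSH; host and port are placeholders for wherever Docker mapped port 22, and both helpers are hypothetical.

```python
import socket


def parse_ssh_banner(line):
    """Split an SSH identification line into (protoversion, softwareversion)."""
    line = line.rstrip("\r\n")
    if not line.startswith("SSH-"):
        raise ValueError(f"not an SSH identification line: {line!r}")
    ident = line.split(" ", 1)[0]  # drop the optional comments after the first space
    _, proto, software = ident.split("-", 2)
    return proto, software


def read_ssh_banner(host, port=22, timeout=5):
    """Connect to host:port and return the server's first line, decoded."""
    data = b""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # RFC 4253 caps the identification line at 255 characters.
        while b"\n" not in data and len(data) < 255:
            chunk = sock.recv(255)
            if not chunk:
                break  # server closed the connection
            data += chunk
    return data.split(b"\n", 1)[0].decode("ascii", errors="replace")
```

If `parse_ssh_banner(read_ssh_banner(docker_host, mapped_port))` raises or times out, the problem is on the container/port-mapping side rather than in Jenkins.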

Running a Maven Build Inside of a SLES 11 Docker Container

Having created this new image and pushed it to our internal docker repo, we can now go back to Jenkins’ “System Configuration” page and add a new image to our Docker cloud provider. Creating a new Jenkins “Maven Job” which utilizes this new SLES 11 image and running a build, we can see our SLES 11 container getting spun up, code getting checked out from our internal git repo, and Maven being invoked:


Hooray! A successful Maven build inside of a Docker container!


Output from the Maven Build that was run in the container. LGTM!



There are a whole slew of benefits to a system like this:

  • We don’t have to run & support SLES 11 VMs in our infrastructure alongside the easier-to-manage Ubuntu VMs. We can just run Ubuntu 12.04 VMs as the base OS and spin up SLES slaves as needed. This makes testing of our Puppet repository a lot easier as this gives us a homogeneous OS environment!
  • We can have portable and separate build environment images for each of our branches. Ex: legacy product branches can continue to have old versions of the JDK and third party libraries that are updated only when needed, but our mainline development can have a build image with tools that are updated independently.
    • This is significantly better than the “toolchain repository” solution that we had at VMware, where several hundred GBs of binaries were checked into a monolithic Perforce repo.
  • Thanks to Docker image tags, we can tag the build image at each GA release and keep that build environment saved. This makes reproducing builds significantly easier!
  • Having a Docker image of the build environment allows our developers to do local builds via their IDEs, if they so choose. Using Vagrant’s Docker provider, developers can spin up a Docker container of the build environment for their respective branch on their local machines, regardless of their host OS – Windows, Mac, or Linux. This allows developers to build RPMs with the same libraries and tools that the build system would!

RELENG 2014 Wrap-Up

I was a speaker at the 2nd International Workshop on Release Engineering @ Google HQ last week.  I enjoyed meeting up with other release engineers and sharing ideas with them.  Here are some talks I enjoyed:

Finally, my own presentation went quite well. There were a lot of people in the audience that came up to chat with me afterwards and it seemed that the message really resonated with the audience (confirmation bias, much? : ). From the sounds of it, I may have an opportunity to give an extended version of this talk at Google and VMware in the near future, which is great because there were several ROI examples and software industry anecdotes around quality and time-to-market that I had to cut out to meet the time limit.

Here are the slides I used:

Jenkins Job DSL Plugin Saves the Day

At Virtual Instruments we have some ~50 Jenkins jobs that generate the RPMs from each of our Git repos as well as the appliance images of our product (ISO, OVA, PXE, Vagrant, etc).

Over the past few months my team had put a lot of time & effort into taking the configuration of all of those pre-existing, manually created jobs and codifying them via the Jenkins Job DSL Plugin.  This week all of that hidden “plumbing” work finally paid its first dividends for us when we had to cut the branch for our upcoming release, and the benefit was enormous.  Since the Job DSL Plugin puts all of our configuration into source control and gives us the ability to programmatically create all of our Jenkins jobs, it saved us many painful hours of manual job configuration via the Jenkins UI.  Instead of having to copy each of those existing 50 jobs and modify them by hand, we only had to do some editing of our Groovy files, send for review, submit the changes to our Git repo, and watch the new jobs get created automatically.

Given that this was our first time doing this type of release branching we had to work out a few kinks and do extra validation, but by 5pm we had all of the continuous builds working for each repo as well as images for our product being successfully created.

Even better is that as we were working out a few obscure kinks with the developers, those developers quickly realized that they too could edit the Groovy files in Git to make simple changes to the job configuration.  In other words, the developers realized that they were not gated upon our DevOps team to make job configuration changes for them.  I’m hoping that we can expand on this in the coming weeks and have the developers take even more ownership of the build job configurations.

The big benefit with the Job DSL Plugin is that it takes job configuration that was previously done by hand in a UI and puts it into a documented format in source control where the changes can be reviewed via Review Board and tracked over time.  With that we can confidently provide access to making job configuration changes to the whole development team which ultimately makes for a quicker configuration change process and a lighter workload for my team.
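The reason the branch cut became cheap is that job definitions turned into data derived from code. The Python sketch below is not the Job DSL API (that’s Groovy); it just illustrates the idea with hypothetical names (`JobSpec`, `jobs_for_branch`, the placeholder build command): given a list of repos and a branch, generate one job spec per repo, so cutting a release branch is one more call rather than ~50 hand-copied jobs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical, minimal description of one build job."""
    name: str
    repo: str
    branch: str
    build_command: str


def jobs_for_branch(repos, branch, build_command="make rpm"):
    """Return one build-job spec per repo for the given branch.

    The naming scheme and default build command are placeholders.
    """
    return [
        JobSpec(
            name=f"{repo}-{branch}",
            repo=repo,
            branch=branch,
            build_command=build_command,
        )
        for repo in repos
    ]
```

With the repo list kept in source control, a new release branch is a reviewed one-line change, and anyone on the team (not just DevOps) can make it.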