Succeeding through Laziness and Open Source

Back in mid-2014 I was in the midst of Docker-izing the build process at Virtual Instruments. As part of that work I’d open sourced one component of that system, the Docker-in-Docker Jenkins build slave which I’d created.

docker-meme

While claiming that I was driven by altruistic motivations when posting this code to GitHub (GH) would make for a great ex-post narrative, I have to admit that the real reasons for making the code publicly available were much more practical:

  • At the time the Docker image repositories on the Docker Hub Registry had to be tied to a GitHub repo (They’ve added Bitbucket support since then).
  • I was too cheap to pay for a private GitHub repo.

… And thus the code for the Docker-in-Docker Jenkins slave became open source! 😀

Unfortunately, making this image publicly available presented some challenges soon thereafter: Folks started linking their blog posts to it, people I’d never met emailed me asking for help in getting set up w/this system, others started filing issues against me on either GH or the Docker Hub Registry, and I started receiving pull-requests (PRs) to my GH repo.

Having switched employers just a few months after posting the code to GH, dealing with the issues and PRs was a bit of a challenge: My new employer didn’t have a Dockerized build system (yet), and short of setting up my own personal Jenkins server and Dockerized build slaves, there was no way for me to verify issues/fixes/PRs for this side-project. And so “tehranian/dind-jenkins-slave” stagnated on GH with relatively little participation from me.

Having largely forgotten about this project, I was quite surprised a few weeks ago when perusing the GH repo for Disqus. I accidentally discovered that the engineering team at Disqus had forked my repo and had been actively committing changes to their fork!

Their changes had:

  • Optimized the container’s layers to make it smaller in size,
  • Updated the image to work with new versions of Docker,
  • And also modified some environment variable names to avoid collisions with names that popular frameworks would use.

Prompted by this, I went back to my own GH repo, looked at the graph of all other forks, and saw that several others had forked my GH repo as well.

One such fork had updated my image to work with Docker Swarm and also to be able to easily use SSH keys for authenticating with the build slave instead of using password-based auth.

“How cool!”, I thought. I’d put an idea into the public domain a year ago, others had found it, and improved it in ways that I couldn’t have imagined. Further, their improvements were now available for myself and others to use!

My Delphix colleague Michael Coyle summed this all up very nicely, saying “As a software developer I can only realistically work for one organization at a time. Open source allows developers from different organizations to collaborate with each other without boundaries. In that way one actually can contribute to more than one organization at once.”

In hindsight I’m absolutely delighted that my unwillingness to purchase a private GitHub repo led to me contributing the Docker-in-Docker Jenkins slave to the public domain. There was nothing proprietary that Virtual Instruments could have used in its product, and by making it available other organizations like Disqus, CloudBees have been able to benefit, along with software developers on the other side of the planet. How exciting!

Testing Ansible Roles with Test Kitchen

Recently while attending DevOps Days Austin 2015, I participated in a breakout session focused on how to test code for configuration management tools like Puppet, Chef, and Ansible. Having started to use Ansible to manage our infrastructure at Delphix I was searching for a way to automate the testing of our configuration management code across a variety of platforms, including Ubuntu, CentOS, RHEL, and Delphix’s custom Illumos-based OS, DelphixOS. Dealing with testing across all of those platforms is a seemingly daunting task to say the least!

Intro to Test Kitchen

The conversation in that breakout session introduced me to Test Kitchen (GitHub), a tool that I’ve been very impressed by and have had quite a bit of fun writing tests for. Test Kitchen is a tool for automated testing of configuration management code written for tools like Ansible. It automates the process of spinning up test VMs, running your configuration management tool against those VMs, executing verification tests against those VMs, and then tearing down the test VMs.

What’s makes Test Kitchen so powerful and useful is its modular design:

Using Test Kitchen

After learning about Test Kitchen at the DevOps Days conference, I did some more research and stumbled across the following presentation which was instrumental in getting started with Test Kitchen and Ansible: Testing Ansible Roles with Test Kitchen, Serverspec and RSpec (SlideShare).

In summary one needs to add three files to their Ansible role to begin using Test Kitchen:

  • A “.kitchen.yml” file at the top-level. This file describes:
    • The driver to use for VM provisioning. Ex: Vagrant, AWS, Docker, etc.
    • The provisioner to use. Ex: Puppet, Chef, Ansible.
    • A list of 1 or more operating to test against. Ex: Ubuntu 12.04, Ubuntu 14.04, CentOS 6.5, or even a custom VM image specified by URL.
    • A list of test suites to run.
  • A “test/integration/test-suite-name/test-suite-name.yml” file which contains the Ansible playbook to be applied.
  • One or more test files in “test/integration/test-suite-name/test-driver-name/”. For example, when using the BATS test-runner to run a test suite named “default”: “test/integration/default/bats/my-test.bats”.

Example Code

A full example of Test Kitchen w/Ansible is available via the delphix.package-caching-proxy Ansible role in Delphix’s GitHub repo. Here are direct links to the aforementioned files/directories:682240

Running Test Kitchen

Using Test Kitchen couldn’t be easier. From the directory that contains your “.kitchen.yml” file, just run “kitchen test” to automatically create your VMs, configure them, and run tests against them:

$ kitchen test
-----> Starting Kitchen (v1.4.1)
-----> Cleaning up any prior instances of 
-----> Destroying ...
 Finished destroying  (0m0.00s).
-----> Testing 
-----> Creating ...
 Bringing machine 'default' up with 'virtualbox' provider...
 ==> default: Importing base box 'opscode-ubuntu-14.04'...
==> default: Matching MAC address for NAT networking...
 ==> default: Setting the name of the VM: kitchen-ansible-package-caching-proxy-default-ubuntu-1404_default_1435180384440_80322
 ==> default: Clearing any previously set network interfaces...
 ==> default: Preparing network interfaces based on configuration...
 default: Adapter 1: nat
 ==> default: Forwarding ports...
 default: 22 => 2222 (adapter 1)
 ==> default: Booting VM...
 ==> default: Waiting for machine to boot. This may take a few minutes...

..  ...

-----> Running bats test suite
 ✓ Accessing the apt-cacher-ng vhost should load the configuration page for Apt-Cacher-NG
 ✓ Hitting the apt-cacher proxy on the proxy port should succeed
 ✓ The previous command that hit ftp.debian.org should have placed some files in the cache
 ✓ Accessing the devpi server on port 3141 should return a valid JSON response
 ✓ Accessing the devpi server via the nginx vhost should return a valid JSON response
 ✓ Downloading a Python package via our PyPI proxy should succeed
 ✓ We should still be able to install Python packages when the devpi contianer's backend is broken
 ✓ The vhost for the docker registry should be available
 ✓ The docker registry's /_ping url should return valid JSON
 ✓ The docker registry's /v1/_ping url should return valid JSON
 ✓ The front-end serer's root url should return http 204
 ✓ The front-end server's /_status location should return statistics from our web server
 ✓ Accessing http://www.google.com through our proxy should always return a cache miss
 ✓ Downloading a file that is not in the cache should result in a cache miss
 ✓ Downloading a file that is in the cache should result in a cache hit
 ✓ Setting the header 'X-Refresh: true' should result in a bypass of the cache
 ✓ Trying to purge when it's not in the cache should return 404
 ✓ Downloading the file again after purging from the cache should yield a cache miss
 ✓ The yum repo's vhost should return HTTP 200

 19 tests, 0 failures
 Finished verifying  (1m52.26s).
-----> Kitchen is finished. (1m52.49s)

And there you have it, one command to automate your entire VM testing workflow!

Next Steps

Giving individual developers on our team the ability to quickly run a suite of automated tests is a big win, but that’s only the first step. The workflow we’re planning is to have Jenkins also run these automated Ansible tests every time someone pushes to our git repo. If those tests succeed we can automatically trigger a run of Ansible against our production inventory. If, on the other hand, the Jenkins job which runs the tests is failing (red), we can use that to prevent Ansible from running against our production inventory. This would be a big win for validating infrastructure changes before pushing them to production.

ansible_logo_black_square

Ansible vs Puppet – Hands-On with Ansible

This is part 2/2 in a series. For part #1 see: Ansible vs Puppet – An Overview of the Solutions.

Notes & Findings From Going Hands-On with Ansible

After playing with Ansible for a week to Ansible-ize Graphite/Grafana (via Docker) and Jenkins (via an Ansible Galaxy role), here are my notes about Ansible:

  • “Batteries Included” and OSS Module Quality
    • While Ansible does include more modules out of the box, the “batteries included” claim is misleading. IMO an Ansible shop will have to rely heavily upon Ansible Galaxy to find community-created modules (Ex: for installing Jenkins, dockerd, or ntp), just as a Puppet shop would have to rely upon PuppetForge.
    • The quality and quantity of the modules on Ansible Galaxy is about on par with what is available at PuppetForge. Just as with PuppetForge, there are multiple implementations for any given module (ex: nginx, ntp, jenkins), each with their own quirks, strengths, and deficiencies.
    • Perhaps this is a deficiency of all of the configuration management systems. Ultimately a shop’s familiarity with Python or Ruby may add some preference here.
  • Package Installations
    • Coming from Puppet-land this seemed worthy of pointing out: Ansible does not abstract an OS’s package manager the same way that Puppet does with the “package” resource. Users explicitly call out the package manager to be used. Ex: the “apt” module or “yum” module. One can see that Ansible provides a tad bit less abstraction. FWIW a package installed via “pip” or “gem” in Puppet still requires explicit naming of the package provider. Not saying that either is better or worse here. Just a noticable difference to an Ansible newbie.
  • Programming Language Constructs
  • Noop Mode
  • Agent-less
    • Ansible’s agent-less, SSH-based push workflow actually was notably easier to deal with than a Puppetmaster, slave agents, SSL certs, etc.
  • Learning Curve
    • If I use my imagination and pretend that I was starting to use a configuration management tool for the first time, I perceive that I’d have an easier time picking up Ansible. Even though I’m not a fan of YAML by any stretch of the imagination, Ansible playbooks are a bit easier to write & understand than Puppet manifests.

Conclusions

After three years of using Puppet at VMware and Virtual Instruments, the thought of not continuing to use the market leader in configuration management tools seemed like a radical idea when it was first suggested to me. After spending several weeks researching Ansible and using it hands-on, I came to the conclusion that Ansible is a perfectly viable alternative to Puppet. I tend to agree with Lyft’s conclusion that if you have a centralized Ops team in change of deployments then they can own a Puppet codebase. On the other hand if you want more wide-spread ownership of your configuration management scripts, a tool with a shallower learning curve like Ansible is a better choice.