Managing Secrets with Ansible Vault – The Missing Guide (Part 2 of 2)

(This post is part 2/2 in a series. For part 1 see: Managing Secrets with Ansible Vault – The Missing Guide (Part 1 of 2))

How to use Ansible Vault with Test Kitchen

Once you’ve codified all of your secrets into Ansible “var files” and encrypted them with Ansible Vault, you’ll probably want to test the deployment of these secrets with Test Kitchen. Unfortunately you will quickly find that Test Kitchen does not play nicely with Vault: in order for Test Kitchen to run “ansible-playbook”, it now needs your Vault password to decrypt the secrets within the var files.

How does the “kitchen-ansible” plugin expect to receive the password to your Vault? Via a plain-text file on your filesystem, as specified by the “ansible_vault_password_file” parameter in your “.kitchen.yml” file. Oh boy!
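
For reference, this is roughly what that looks like with the “kitchen-ansible” provisioner. The block below is a minimal sketch, and the password-file path is made up:

# .kitchen.yml - the naive approach (not recommended)
provisioner:
  name: ansible_playbook
  ansible_vault_password_file: /home/me/.vault_pass.txt  # plain-text Vault password on disk!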

This does not seem like a scalable solution to me… I hardly trust myself to manage a plain-text file with the password to our Vault. Beyond that, I would be terrified to let an entire organization of folks know the password to the Vault and instruct them to store that password in a plain-text file on their own respective filesystems just so that they could run Test Kitchen tests as they iterate w/Ansible. In practice this would be only marginally better than simply checking the secrets into git as plain-text, as all this structure around Vault and Ansible vars would only be pushing the problem of secret management one level higher.

So how can we test with Test Kitchen when using Ansible Vault? Here’s a nifty solution to the problem that builds upon the solution that we implemented in Part 1 of this guide:

  • Define a well-known Unix hostname for your Test Kitchen VM. Ex: “test-kitchen”
  • Create two versions of your vars files: One for production which is encrypted, the other for your test environment which is unencrypted. The structure of the files will be largely the same (ex. the files to be placed, w/their respective owner, group, mode), but the contents of the files for production will differ from the files for your test environment.
  • In “tasks/main.yml”, use “include_vars” to include the appropriate var file for whichever environment you happen to be in. This can be done by using the “with_first_found” arg to “include_vars”. See example below.
# .kitchen.yml
---
# Set the hostname of our Test Kitchen-created VM to be “test-kitchen”
driver:
  name: vagrant
  vm_hostname: test-kitchen
...<snip>...

##########

# vars/vpn-secrets-prod.yml - A Vault-encrypted file
$ANSIBLE_VAULT;1.1;AES256
34336333316361306432303864336464623165316461396266626562393232316565383263663234
3963633535363737613136656535343436613335636663380a373766653966663337666539613166
32313738303263303130353665333031373930353938653766653732623061326462633065393134
3135386639333637630a393439343733616439373731383932383562356164633832363639636633
64373237333661653066346566366135326539636564343632666363663866653264396564396162
62353461326435373433633034313338376265396130363965313464656332373737306462323433
34646361363065656331336337313763313939303533646138323834336330323533353239363663
...<snip>...

# vars/vpn-secrets-test-kitchen.yml
---
vpn_secret_files:
  /etc/openvpn/easy-rsa/keys/ec2-openvpn.key:
    owner: root
    group: root
    mode: "u=r,go="
    content: |
      -----BEGIN PRIVATE KEY-----
      MIIEvAIBADANBgkqhkiG9w0BAQEFAASCBKYwggSiAgEAAoIBAQD5koXgI24E360f
      nhxCfOPVORzFW1CN7u/zOQdvKoIStogF0UQifDCnY/POEjoBmzBrg/UyAmsqLIli
      xMtRIuvEhwaGEUQPoZNCaRW+1XtJ3kDvr9MVTlJTcNGOlGe/E+HyAKBq5vinxzzM
      9ba8M9Nc1PQ93B1OTUY1QGHVYRvSFYDJ5Fnz23xKeNsnY3hmRkV7CDZXSdy9nbmy
      1X9uz7z5bG7PKUVD3JZjI75CHAEDJKtscBv9ez/z16YTxwahIL3CXfqBq8peyAZ0
      n4Mzj4Lt8Cwaw2Kw3w3gMhbhf4fy284+hYqHe9uqYJC6dJJSKDIXqoLSD+e8aN+v
      BAEQcAWXAgMBAAECggEAbmHJ6HqDHJC5h3Rs11NZiWL7QKbEmCIH6rFcgmRwp0oo
      GzqVQhNfiYmBubECCtfSsJrqhbXgJAUStqaHrlkdogx+bCmSyr8R3JuRzJerMd6l
      Jd3EJHZBnzoU1VT6Fd77Xge868tASySp1ZUPv2nEoBhn9jw2kf1HgiH5o2CR53ZP
      pnL72Ng7MHpKuyoAZ9DtUU7yGG4RTCN2JuPGD6IwKoXBs1b7tqsMncz86u6Iibwk
      Np4j3vPmSLfQxvBP85T0xzSURlnP+bFCaJDPfXYIgDLROkrFAgJ2ADCm4gwfk93i
      Z/wnk8tFjnxUy2V5UbtWqqkVHmvdHHCc/6bZfcNOsQKBgQD/v94YX3vhgZRiz1kZ
      c0v2lxFZqNgMPC7EADmO34nFq7KtmVXYQfpoiooGDfQXTqfVGQsyTcpg5HLZvlyb
      qm9oaXpZY4yP/SLF6Pc00/iDTleSxGROyqhsaBotXpqSSC3rv92D9Zas/Xdz3lHD
      NSY9EVsiFId7O4OkvLuZVDvZQwKBgQD50Rs873/yUdyCwKx9/GF4yWVRg7//FTyQ
      Cj1KCBK5tDqOc+hiIS1GF0HRkcvIot71owTe+PG9OouXlUuxWrtc+fzgGSPaYjMp
      Ub69EcSNtUsK8MUS+VADbR5VDzS27OM1g+pJO7BbHpPWuEI1cjYmW/+3cCzFYnIV
      5z6OctbjHQKBgEVQWP8+EbMijXbiP4G4T+Q7OUaVjkhynzIb5X2ldA+Q41JNdoiw
      CRAATDwr1/XhKXeF3BT8JFdyUvZUs4C1BpDD1ZcYdeYocx40b5tvv7DGsNFkTNNV
      9aO76yxUsYvn6Bo22/CBxR6Ja7CJlptTclOmuo5YBggOLzWcuTNrMvVFAoGASIoV
      lK4ewuhOVZFJBRRB4Wbpiq/tEk7CVTkD7vlFJrNUxYSWl9f2Y4HhVM83Ez1n7H+3
      rF8xIrdbTVrGresguLDGYvQp2wHkxTy9W/1Ky7M25ShgsU+/kh8fTaeqsOs8Vo/F
      ehpg7TSFzTWX1Bkj7COOr19dQLuDUSTin05tY2kCgYB35ZHVDMR6TlW0Kp/l7gAx
      FQx5hojllzHr3RRv8a4rBbhsdAJGBr5QHZbzVeuw1z6NlDc/4brer3y52FnnHbD3
      fkUrvh+g1xHeXF4Yekr5Mu2D7PoQoFRRai2hjPnIHRLmHI45EPri3USoHuNPl+qB
      l23chS70zQ9VDmqEs9gjLA==
      -----END PRIVATE KEY-----
  /etc/openvpn/easy-rsa/keys/ca.key:
    owner: root
    group: root
    mode: "u=r,go="
    content: |
      -----BEGIN PRIVATE KEY-----
      MIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQDPm22e2QTeTnLN
      PT//6kyB8tM/2kE6+LsFD3TFA4XvS3gwNZLybjXpPtncF4qLxjq3c4uSBp2tuAa2
      VvWUCAyQX4EcOuCFhh1AIUHX9O4F2JhLtNH366D6LmfGE7Lck85R6bzErYJ5OzBN
      /3WSGtWmLbQWhXTvNwG5re17Ds7DLQ6/XRXCg91lAbtGqYCvw9F6X8N3VNdcovqN
      Ud+tJ4XjmGfPD8ZgSk/iVKeLzz5fuNxON+ygdUJ9IQJGu7kvJOhWD1F3p3lzuS4E
      7zyR8r9QK6lGdk2/ifmY5f+tmI92fvVl2HD2DroEVp42hCYEpNogm8BKXHFHBA9N
      0mugGMzVAgMBAAECggEAAK5a2rWNjYkmWUQFLLrBC4AXb1Mw+ZeNTYPydx7+1n0h
      5M6YL9Fqvdwl7NHq83BwCuAHKjB5XfOHmhuI7LZmDCc0DjqnN+jruaUiSSoVidFf
      Foh+U9jjC08RqhWwdYbKm3wv0VlcXzdxfiADa7pIzyXBPH2tl4dPqyNF7yxqQzum
      F42D4IExbYYkGR7bP6RePrUiaO3iU/EwDL5Dey4+93K+EaxbdxIhMLclvnQ8I0tl
      tFGn4AbbOqPqzPxWZhWk2gT//jMTtJh6FxQLQkvDoEnta5UYQ2E38r33jK+Wasga
      lGZEyNOTMq1MMdPrCzXloJSnerCXC4vTFt62AOdIQQKBgQD3SvUeaXV7Xf67vL0t
      EdBG9YL0Zz2MxxoVAth44svMzQ4gR6/pkakEhMzR51I/Skl/wzCJFHY1Z4nq9DoA
      RY5APjO63uHdZEKKYZ1MTXmO6F+IkUY5MCBvyCtLsnkAcToyuyDuhV4NBfjydw6E
      L5S1H9NI7klvaPxq5I7KkzSeJQKBgQDW6sDBvi6ctV3w8GCTUjP5Ker1FuKYL7Yn
      HI6RIGnWB2hS8NbEe8ODgzsVOVnC6x+WCNBiu/GmF8wlue7PCH7rLEa8diiM+J9/
      QYXtezfLIhPqhPJZDj5IX7bIotkvUzv+ywvUfCtJ3aCAu8DMi09x1GRgU6go/4ZK
      SCmVmj588QKBgQCNhr2gCRTuZM37nbnayF4drjajL06/eddIfRdsn8epTxWtjbl0
      gCNt7Z7W5n9gr2A/GXN2kFpSmA4LhHiJXUVbKP4sDZDQRqf6UIFYgOJ30i+SlinN
      Yui9cJ6utNahVSvMiuH/AB7iby+ZfF+3cQ+3VR5zl8Q5WalUd7fs4bB0bQKBgBI1
      x+lipO5wS6pro7M35uF41Mi5jK+ac1OzDr1rQqx46jUE5R224uUUzH/K4Tkr1PxQ
      eN+0zw/kuk6EB6ERNjfVA5VaaaswMcuFkMSDiUGz/H4Fj8dN9qcJPSKY8dAZvF6l
      c7YoYz6aAcyGnBp4v12EwpCK5he7NvS6UpOzgxHxAoGBAOjiBQtwikKLzLYwg1gF
      QYh1TLvEJIRFYEFQveVUKxmSskN4W6VQrTrcqobYHM9tOSbSe+Ib/y/khpaEz0PE
      E5gxeUbxhTj0PVvOKJmyCKWDPL8o61MGVhX1nAJarfbdP1XM9fl4S3pZH14bIhOU
      FG0e4jNsDq6vdwytV9R/GyAv
      -----END PRIVATE KEY-----

#######

# tasks/main.yml
#
# Leverage the fact that our ".kitchen.yml" file is setting the hostname of
# test VMs to "test-kitchen". Using "with_first_found" we can load the
# unencrypted "vpn-secrets-test-kitchen.yml" for test VMs, otherwise load the
# Ansible Vault-encrypted "vpn-secrets-prod.yml" file.
#
# Use "no_log: true" to keep from echoing the key contents to stdout.
# See: http://docs.ansible.com/faq.html#how-do-i-keep-secret-data-in-my-playbook
#
- name: VPN Server | Load VPN secret keys
  include_vars: "{{ item }}"
  no_log: true
  with_first_found:
    - "vpn-secrets-{{ ansible_hostname }}.yml"
    - "vpn-secrets-prod.yml"

- name: VPN Server | Copy secret files
  copy:
    dest: "{{ item.key }}"
    content: "{{ item.value.content }}"
    owner: "{{ item.value.owner }}"
    group: "{{ item.value.group }}"
    mode: "{{ item.value.mode }}"
  with_dict: "{{ vpn_secret_files }}"
  no_log: true
  notify:
    - restart openvpn

The magic lies in the “with_first_found” argument above. In the Test Kitchen environment, “vpn-secrets-{{ ansible_hostname }}.yml” will interpolate to “vpn-secrets-test-kitchen.yml” because of our well-defined hostname. Since this “vpn-secrets-test-kitchen.yml” file exists in unencrypted form under “vars/”, Ansible will grab that var file for your Test Kitchen environment. If the hostname is anything other than “test-kitchen” (i.e., production), then Ansible’s “with_first_found” will fall back to the “vpn-secrets-prod.yml” var file, which is encrypted with Vault and will require a password to unlock and proceed.

Sanity Checking Ourselves with Serverspec

Now that we have Vault working nicely with Test Kitchen, a final step would be to add automated tests to make sure that we are indeed deploying files with the correct permissions, now and in the future. For more details on using Ansible & Test Kitchen with Serverspec, see Testing Ansible Roles with Test Kitchen. Here’s what a Serverspec test for our above files would look like:

# test/integration/default/serverspec/secret_keys_spec.rb

require 'serverspec'

# Serverspec 2.x needs to be told which backend to use; ":exec" runs the
# checks directly on the machine under test.
set :backend, :exec

# Secret keys should not be world readable.
secret_keys = [
  '/etc/openvpn/dh2048.pem',
  '/etc/openvpn/ipp.txt',
  '/etc/openvpn/openvpn.key',
  '/etc/openvpn/ta.key',
  '/etc/openvpn/easy-rsa/keys/ca.key',
  '/etc/openvpn/easy-rsa/keys/ec2-openvpn.key'
]

secret_keys.each do |secret_key|
  describe file(secret_key) do
    it { should be_file }
    it { should be_mode 400 }
    it { should be_owned_by 'root' }
    it { should be_grouped_into 'root' }
  end
end

Deploying to Production with Jenkins

A final piece of the puzzle to figure out was how to actually run “ansible-playbook” with a code base that utilizes Ansible Vault within the context of a job-runner like Jenkins. In other words, how to provide Jenkins with the password to unlock the Vault. I found a couple of options here:

  • Put the Vault password into a locked-down file (mode 400) on your Jenkins slaves that run Ansible. This only works if your Jenkins slaves have some level of security around the users that Jenkins uses. I’m not crazy about passwords in text files, but in theory this shouldn’t be any worse than a locked-down, 400-mode file like those in “/etc/sudoers.d/…”.
  • Modify the Jenkins job that runs Ansible to require a Password parameter, run “ansible-playbook” within that job with that password parameter echoed in, and then use the Jenkins Mask Passwords plugin to mask the contents of that password in your build logs (a sketch of this appears after the list). The downside of this is that it complicates automated execution of the Jenkins job that invokes Ansible, as it now requires a password to be invoked.
  • Store the Ansible Vault password in another secret management system like HashiCorp’s Vault. This starts to get pretty meta 🙂
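
Here is a minimal sketch of what option #2 might look like as a Jenkins “Execute shell” build step. The “VAULT_PASSWORD” parameter name, inventory, and playbook names are assumptions for illustration only:

# Hypothetical Jenkins "Execute shell" build step for option #2
set +x                                     # don't echo commands containing the secret
VAULT_PASS_FILE="$(mktemp)"
echo "$VAULT_PASSWORD" > "$VAULT_PASS_FILE"
ansible-playbook -i production site.yml --vault-password-file "$VAULT_PASS_FILE"
rm -f "$VAULT_PASS_FILE"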

Ultimately you have to decide which of these three options fits best within your infrastructure and workflow.

Conclusion

There you have it, my two-part guide to using Ansible Vault from soup to nuts. Hopefully you’ve found these notes useful in building an end-to-end system for securely managing your infrastructure’s secrets. Please let me know in the comments if I’ve left anything out. Thanks!


Testing Ansible Roles with Test Kitchen

Recently while attending DevOps Days Austin 2015, I participated in a breakout session focused on how to test code for configuration management tools like Puppet, Chef, and Ansible. Having started to use Ansible to manage our infrastructure at Delphix, I was searching for a way to automate the testing of our configuration management code across a variety of platforms, including Ubuntu, CentOS, RHEL, and Delphix’s custom Illumos-based OS, DelphixOS. Testing across all of those platforms seemed like a daunting task, to say the least!

Intro to Test Kitchen

The conversation in that breakout session introduced me to Test Kitchen (GitHub), a tool that I’ve been very impressed by and have had quite a bit of fun writing tests for. Test Kitchen is a tool for automated testing of configuration management code written for tools like Ansible. It automates the process of spinning up test VMs, running your configuration management tool against those VMs, executing verification tests against those VMs, and then tearing down the test VMs.

What makes Test Kitchen so powerful and useful is its modular design: drivers handle VM provisioning (Vagrant, AWS, Docker, etc.), provisioners run your configuration management tool of choice (Ansible, Chef, Puppet), and pluggable test frameworks (BATS, Serverspec) verify the results.

Using Test Kitchen

After learning about Test Kitchen at the DevOps Days conference, I did some more research and stumbled across the following presentation which was instrumental in getting started with Test Kitchen and Ansible: Testing Ansible Roles with Test Kitchen, Serverspec and RSpec (SlideShare).

In summary one needs to add three files to their Ansible role to begin using Test Kitchen:

  • A “.kitchen.yml” file at the top-level. This file describes:
    • The driver to use for VM provisioning. Ex: Vagrant, AWS, Docker, etc.
    • The provisioner to use. Ex: Puppet, Chef, Ansible.
    • A list of one or more operating systems to test against. Ex: Ubuntu 12.04, Ubuntu 14.04, CentOS 6.5, or even a custom VM image specified by URL.
    • A list of test suites to run.
  • A “test/integration/test-suite-name/test-suite-name.yml” file which contains the Ansible playbook to be applied (a minimal sketch follows this list).
  • One or more test files in “test/integration/test-suite-name/test-driver-name/”. For example, when using the BATS test-runner to run a test suite named “default”: “test/integration/default/bats/my-test.bats”.
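
For illustration, a minimal sketch of such a playbook for a test suite named “default” might look like the following; the role name is a placeholder:

# test/integration/default/default.yml
---
- hosts: all
  roles:
    - my-role-under-test   # placeholder: the name of the role being tested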

Example Code

A full example of Test Kitchen w/Ansible is available via the delphix.package-caching-proxy Ansible role in Delphix’s GitHub repo.

Running Test Kitchen

Using Test Kitchen couldn’t be easier. From the directory that contains your “.kitchen.yml” file, just run “kitchen test” to automatically create your VMs, configure them, and run tests against them:

$ kitchen test
-----> Starting Kitchen (v1.4.1)
-----> Cleaning up any prior instances of 
-----> Destroying ...
 Finished destroying  (0m0.00s).
-----> Testing 
-----> Creating ...
 Bringing machine 'default' up with 'virtualbox' provider...
 ==> default: Importing base box 'opscode-ubuntu-14.04'...
==> default: Matching MAC address for NAT networking...
 ==> default: Setting the name of the VM: kitchen-ansible-package-caching-proxy-default-ubuntu-1404_default_1435180384440_80322
 ==> default: Clearing any previously set network interfaces...
 ==> default: Preparing network interfaces based on configuration...
 default: Adapter 1: nat
 ==> default: Forwarding ports...
 default: 22 => 2222 (adapter 1)
 ==> default: Booting VM...
 ==> default: Waiting for machine to boot. This may take a few minutes...

..  ...

-----> Running bats test suite
 ✓ Accessing the apt-cacher-ng vhost should load the configuration page for Apt-Cacher-NG
 ✓ Hitting the apt-cacher proxy on the proxy port should succeed
 ✓ The previous command that hit ftp.debian.org should have placed some files in the cache
 ✓ Accessing the devpi server on port 3141 should return a valid JSON response
 ✓ Accessing the devpi server via the nginx vhost should return a valid JSON response
 ✓ Downloading a Python package via our PyPI proxy should succeed
 ✓ We should still be able to install Python packages when the devpi contianer's backend is broken
 ✓ The vhost for the docker registry should be available
 ✓ The docker registry's /_ping url should return valid JSON
 ✓ The docker registry's /v1/_ping url should return valid JSON
 ✓ The front-end serer's root url should return http 204
 ✓ The front-end server's /_status location should return statistics from our web server
 ✓ Accessing http://www.google.com through our proxy should always return a cache miss
 ✓ Downloading a file that is not in the cache should result in a cache miss
 ✓ Downloading a file that is in the cache should result in a cache hit
 ✓ Setting the header 'X-Refresh: true' should result in a bypass of the cache
 ✓ Trying to purge when it's not in the cache should return 404
 ✓ Downloading the file again after purging from the cache should yield a cache miss
 ✓ The yum repo's vhost should return HTTP 200

 19 tests, 0 failures
 Finished verifying  (1m52.26s).
-----> Kitchen is finished. (1m52.49s)

And there you have it, one command to automate your entire VM testing workflow!

Next Steps

Giving individual developers on our team the ability to quickly run a suite of automated tests is a big win, but that’s only the first step. The workflow we’re planning is to have Jenkins also run these automated Ansible tests every time someone pushes to our git repo. If those tests succeed we can automatically trigger a run of Ansible against our production inventory. If, on the other hand, the Jenkins job which runs the tests is failing (red), we can use that to prevent Ansible from running against our production inventory. This would be a big win for validating infrastructure changes before pushing them to production.


Ansible Role for Package Hosting & Caching

The Operations Team at Delphix has recently published an Ansible role called delphix.package-caching-proxy. We are using this role internally to host binary packages (ex. RPMs, Python “pip” packages), as well as to locally cache external packages that our infrastructure and build process depends upon.

This role provides, among other things, a private Docker Registry, a devpi-based PyPI server, a YUM repository, an Apt-Cacher-NG proxy, and a caching Nginx front-end webserver that ties these services together.

It also provides hooks for monitoring the front-end Nginx server through collectd.

Why Is this Useful?

This sort of infrastructure can be useful in a variety of situations, for example:

  • When your organization has remote offices/employees whose productivity would benefit from having fast, local access to large binaries like ISOs, OVAs, or OS packages.
  • When your dev process depends on external dependencies from services that are susceptible to outages, ex. NPM or PyPI.
  • When your dev process depends on third-party artifacts that are pinned to certain versions and you want a local copy of those pinned dependencies in case those specific versions become unavailable in the future.

Sample Usage

While there are a variety of configuration options for this role, the default configuration can be deployed with an Ansible playbook as simple as the following:

---
- hosts: all
  roles:
    - delphix.package-caching-proxy

Underneath the Covers

This role works by deploying a front-end Nginx webserver to do HTTP caching, and also configures several Nginx server blocks (analogous to Apache vhosts) which delegate to Docker containers for the apps that run the Docker Registry, the PyPI server, etc.
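
As a rough illustration of that pattern, one such server block might look something like the following. This is a hypothetical sketch rather than the role’s actual configuration; the server name is made up, and only the devpi port (3141) comes from the role’s tests:

# Hypothetical Nginx server block: a vhost that proxies to a local Docker container
server {
    listen 80;
    server_name pypi.example.com;

    location / {
        proxy_pass http://127.0.0.1:3141;   # the devpi container's published port
        proxy_set_header Host $host;
    }
}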

Downloading, Source Code, and Additional Documentation/Examples

This role is hosted in Ansible Galaxy at https://galaxy.ansible.com/list#/roles/3008, and the source code is available on GitHub at: https://github.com/delphix/ansible-package-caching-proxy.

Additional documentation and examples are available in the README.md file in the GitHub repo, at: https://github.com/delphix/ansible-package-caching-proxy/blob/master/README.md

Acknowledgements

Shoutouts to some deserving folks:

  • My former Development Infrastructure Engineering Team at VMware, who proved this idea out by implementing a similar set of caching proxy servers for our global remote offices in order to improve developer productivity.
  • The folks who conceived the Snakes on a Plane Docker Global Hack Day project.

Ansible vs Puppet – Hands-On with Ansible

This is part 2/2 in a series. For part #1 see: Ansible vs Puppet – An Overview of the Solutions.

Notes & Findings From Going Hands-On with Ansible

After playing with Ansible for a week to Ansible-ize Graphite/Grafana (via Docker) and Jenkins (via an Ansible Galaxy role), here are my notes about Ansible:

  • “Batteries Included” and OSS Module Quality
    • While Ansible does include more modules out of the box, the “batteries included” claim is misleading. IMO an Ansible shop will have to rely heavily upon Ansible Galaxy to find community-created roles (ex: for installing Jenkins, dockerd, or ntp), just as a Puppet shop would have to rely upon PuppetForge.
    • The quality and quantity of the roles on Ansible Galaxy are about on par with what is available on PuppetForge. Just as with PuppetForge, there are multiple implementations for any given role (ex: nginx, ntp, jenkins), each with its own quirks, strengths, and deficiencies.
    • Perhaps this is a deficiency of all of the configuration management systems. Ultimately a shop’s familiarity with Python or Ruby may add some preference here.
  • Package Installations
    • Coming from Puppet-land this seemed worth pointing out: Ansible does not abstract the OS’s package manager the way that Puppet does with its “package” resource. Users explicitly call out the package manager to be used, ex. the “apt” module or the “yum” module, so Ansible provides a tad less abstraction (see the snippet after this list). FWIW, a package installed via “pip” or “gem” in Puppet still requires explicit naming of the package provider. Not saying that either approach is better or worse here; just a noticeable difference to an Ansible newbie.
  • Programming Language Constructs
  • Noop Mode
  • Agent-less
    • Ansible’s agent-less, SSH-based push workflow actually was notably easier to deal with than a Puppetmaster, slave agents, SSL certs, etc.
  • Learning Curve
    • If I use my imagination and pretend that I was starting to use a configuration management tool for the first time, I perceive that I’d have an easier time picking up Ansible. Even though I’m not a fan of YAML by any stretch of the imagination, Ansible playbooks are a bit easier to write & understand than Puppet manifests.
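
To illustrate the package-manager point above: where Puppet’s “package { 'ntp': ensure => installed }” picks a provider for you, an Ansible playbook names the package manager explicitly. A minimal sketch:

# Ansible: the package manager is named explicitly per task
- name: Install ntp on Debian/Ubuntu hosts
  apt:
    name: ntp
    state: present

- name: Install ntp on RHEL/CentOS hosts
  yum:
    name: ntp
    state: present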

Conclusions

After three years of using Puppet at VMware and Virtual Instruments, the thought of not continuing to use the market leader in configuration management tools seemed like a radical idea when it was first suggested to me. After spending several weeks researching Ansible and using it hands-on, I came to the conclusion that Ansible is a perfectly viable alternative to Puppet. I tend to agree with Lyft’s conclusion that if you have a centralized Ops team in charge of deployments then they can own a Puppet codebase. On the other hand if you want more widespread ownership of your configuration management scripts, a tool with a shallower learning curve like Ansible is a better choice.

Running Docker Containers on Windows & Mac via Vagrant

If you ever wanted to run a Docker container on a non-Linux platform (ex: Windows or Mac), here’s a “Vagrantfile” which will allow you to do that quickly and easily with Vagrant.

The Vagrantfile

For the purposes of this post, suppose that we want to run the “tutum/wordpress” Docker container on our Mac. That WordPress container comes with everything needed for a fully-functioning WordPress CMS installation, including MySQL and WordPress’s other dependencies.

VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|

  config.vm.box = "puphpet/ubuntu1404-x64"

  config.vm.provision "docker" do |d|
    d.run "tutum/wordpress", args: "-p '80:80'"
  end

  config.vm.network "forwarded_port", guest: 80, host: 8080

end

Explanation

  • This “Vagrantfile” will download the “puphpet/ubuntu1404-x64” Vagrant box which is a widely-used VirtualBox/VMware image of Ubuntu 14.04.
  • Once that Vagrant box is downloaded and the VM has booted, Vagrant will run the “docker” provisioner. The “docker” provisioner will then download and run the “tutum/wordpress” Docker container, with the “docker run” argument to expose port 80 of the container to port 80 of the Ubuntu 14.04 OS.
  • The final line of our “Vagrantfile” tells Vagrant to expose port 80 of the Ubuntu guest OS to port 8080 of our host OS (i.e., Windows or Mac OS). When we access http://localhost:8080 from our host OS, that TCP traffic will be transparently forwarded to port 80 of the guest OS which will then transparently forward the traffic to port 80 of the container. Neat!

Results

After running “vagrant up” the necessary Vagrant box and Docker container downloads will start automatically:

==> default: Waiting for machine to boot. This may take a few minutes
...
==> default: Machine booted and ready!
==> default: Forwarding ports...
 default: -- 80 => 8080
 default: -- 22 => 2222
==> default: Configuring network adapters within the VM...
==> default: Waiting for HGFS kernel module to load...
==> default: Enabling and configuring shared folders...
 default: -- /Users/tehranian/Downloads/boxes/docker-wordpress: /vagrant
==> default: Running provisioner: docker...
 default: Installing Docker (latest) onto machine...
 default: Configuring Docker to autostart containers...
==> default: Starting Docker containers...
==> default: -- Container: tutum/wordpress

Once our “vagrant up” has completed, we can access the WordPress app that is running within the Docker container by pointing our web browser to http://localhost:8080. This takes us to the WordPress setup wizard where we can finish the installation and configuration of WordPress:

Wordpress Setup Wizard

 

Voila! A Docker container running quickly and easily on your Mac or Windows PC!

Testing Puppet Code with Vagrant

At Virtual Instruments we use Vagrant boxes to locally test our Puppet changes before pushing those changes into production. Here are some details about how we do this.

Puppet Support in Vagrant

Vagrant has built-in support for using Puppet as a machine provisioner, either by contacting a Puppet master to receive modules and manifests or by running “puppet apply” with a local set of modules and manifests (aka. masterless Puppet). We chose to use masterless Puppet with Vagrant in our test environment due to its simplicity of setup.
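
Under the hood, the masterless approach amounts to Vagrant running something roughly like the following inside the guest; the staging paths vary by Vagrant version and are illustrative only:

# Approximately what Vagrant's Puppet provisioner runs in "puppet apply" mode
puppet apply --modulepath=/tmp/vagrant-puppet/modules \
             --hiera_config=/tmp/vagrant-puppet/hiera.yaml \
             /tmp/vagrant-puppet/manifests/site.pp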

Starting with a Box for the Base OS

Before we can use Puppet to provision our machine, we need to have a base OS available with Puppet installed. At Virtual Instruments our R&D infrastructure is standardized on Ubuntu 12.04, which means that we want our Vagrant base box to be an otherwise minimal installation of Ubuntu 12.04 with Puppet also installed. Luckily this is a very common configuration and there are pre-made Vagrant boxes available for download at VagrantCloud.com. We’re going to use the box named “puppetlabs/ubuntu-12.04-64-puppet”.

If you are using a different OS you can search the Vagrant Cloud site for a Vagrant base box that matches the OS of your choice. See: https://vagrantcloud.com/discover/featured

If you can find a base box for your OS but not a base box for that OS which has Puppet pre-installed, you can use one of @mitchellh‘s nifty Puppet-bootstrap scripts with a Vagrant Shell Provisioner to get Puppet installed into your base box. See the README included in that repo for details: https://github.com/hashicorp/puppet-bootstrap/blob/master/README.md
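
For example, a single shell provisioner placed before the Puppet provisioner will do the trick. This is a sketch that assumes the puppet-bootstrap repo has been checked out next to your “Vagrantfile”:

  # Bootstrap Puppet onto a bare Ubuntu box before the Puppet provisioner runs
  config.vm.provision "shell", path: "puppet-bootstrap/ubuntu.sh"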

The Vagrantfile

Having found a suitable base box, one can use the following “Vagrantfile” to start that box and invoke Puppet to provision the machine.

VAGRANTFILE_API_VERSION = "2"

# set the following hostname to a name that Puppet will match against. ex:
# "vi-cron9.lab.vi.local"
MY_HOSTNAME = "vi-nginx-proxy9.lab.vi.local"


Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  # from: https://vagrantcloud.com/search?utf8=✓&sort=&provider=&q=puppetlabs+12.04
  config.vm.box = "puppetlabs/ubuntu-12.04-64-puppet"
  config.vm.hostname = MY_HOSTNAME

  # needed to load hiera data for puppet
  config.vm.synced_folder "hieradata", "/data/puppet-production/hieradata"

  # Vagrant/Puppet docs:
  #   http://docs.vagrantup.com/v2/provisioning/puppet_apply.html
  config.vm.provision :puppet do |puppet|
    puppet.facter = {
      "is_vagrant_vm" => "true"
    }
    puppet.hiera_config_path = "hiera.yaml"
    puppet.manifest_file  = "site.pp"
    puppet.manifests_path = "manifests"
    puppet.module_path = "modules"
    # puppet.options = "--verbose --debug"
  end

end

Breaking Down the Vagrantfile

Setting Our Hostname

# Set the following hostname to a name that Puppet will match against. ex:
# "vi-cron9.lab.vi.local"
MY_HOSTNAME = "vi-nginx-proxy9.lab.vi.local"

Puppet determines which resources to apply based on the hostname of our VM. For ease of use, our “Vagrantfile” has a variable called “MY_HOSTNAME” defined at the top of the file which allows users to easily define which node they want to provision locally.

Defining Which Box to Use

# From: https://vagrantcloud.com/search?utf8=✓&sort=&provider=&q=puppetlabs+12.04
config.vm.box = "puppetlabs/ubuntu-12.04-64-puppet"

The value for “config.vm.box” is the name of the box we found on vagrantcloud.com. This allows Vagrant to automatically download the base VM image from the Vagrant Cloud service.

Puppet-Specific Configurations

  # Needed to load Hiera data for Puppet
  config.vm.synced_folder "hieradata", "/data/puppet-production/hieradata"

  # Vagrant/Puppet docs:
  #   http://docs.vagrantup.com/v2/provisioning/puppet_apply.html
  config.vm.provision :puppet do |puppet|
    puppet.facter = {
      "is_vagrant_vm" => "true"
    }
    puppet.hiera_config_path = "hiera.yaml"
    puppet.manifest_file  = "site.pp"
    puppet.manifests_path = "manifests"
    puppet.module_path = "modules"
    # puppet.options = "--verbose --debug"
  end

Here we are setting up the configuration of the Puppet provisioner. See the full documentation for Vagrant’s masterless Puppet provisioner at: https://docs.vagrantup.com/v2/provisioning/puppet_apply.html

Basically this code:

  • Sets up a shared folder to make our Hiera data available to the guest OS
  • Sets a custom Facter fact called “is_vagrant_vm” to “true”. This fact can then be used by our manifests for edge-cases around running VMs locally, like routing collectd/SAR data to a non-production Graphite server to avoid polluting the production Graphite server (a hypothetical sketch follows this list)
  • Tells the Puppet provisioner where the root Puppet manifest file is and where necessary Puppet modules can be found.
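
For illustration, a manifest might branch on that fact roughly like this. This is a hypothetical sketch: the class name and Graphite hostnames are made up and are not taken from our actual manifests:

# Hypothetical use of the "is_vagrant_vm" fact within a Puppet manifest
if $::is_vagrant_vm == 'true' {
  class { 'profiles::metrics':
    graphite_host => 'graphite-dev.example.com',  # throw-away, non-production Graphite
  }
} else {
  class { 'profiles::metrics':
    graphite_host => 'graphite.example.com',      # production Graphite
  }
}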

Conclusion

Vagrant is a powerful tool for testing Puppet code changes locally. With a simple “vagrant up” one can fully provision a VM from scratch. One can also use the “vagrant provision” command to locally test incremental updates to Puppet code as it is iteratively being developed, or to test changes to long-running mutable VMs.

Improving Developer Productivity with Vagrant

As part of improving developer productivity at Virtual Instruments during development of VirtualWisdom 4.0, I introduced Vagrant to the development team. At the time, the product was being re-architected from a monolithic Java app into a service oriented architecture (SOA). Without Vagrant, the challenge for a given Java developer working on any one of the Java services was that there was no integration environment available for that developer to test the respective service that they were working on. In other words, a developer could run their respective Java service locally, but without the other co-requisite services and databases they couldn’t do anything useful with it.

How Not To Solve

We could have documented a long set of instructions in a wiki, detailing how to setup and run each one of the Java services locally, along with instructions on how to setup and run each of the databases manually, but there would be several problems with this approach:

  1. Following such instructions would be a very manual, time-consuming, and mistake-prone process. The total time on such efforts would be multiplied by the size of the R&D team as each developer would have to duplicate this effort on their own.
  2. Such instructions would be a “living document“, continually changing over time. This means that if Jack followed the instructions on Day X, the instructions that Jane followed on Day X+Y could be potentially different and lead to two very different integration environments.
  3. All of our developers were running Mac OS or Windows laptops, but the product environment was SuSE Linux Enterprise Server 11 (SLES 11). Regardless of how complete our instructions on how to setup the environment could be, there would still be the issue of consistency of environment. If developers were to test their Java services in hand-crafted environments that were not identical to the actual environment that QA tested in or that the customer ran the product in, then we would be sure to hit issues where functionality would work in one developer’s environment, but not in QA or in the customer’s environment! (i.e., “It worked on my box!”)

A Better Approach


Turning our integration environment into a portable Vagrant box (a virtual machine) solved all of these issues. The Vagrant box was an easily distributable artifact generated by our build process that contained fully configured instances of all of the Java services and databases that comprised our product. Developers could download the Vagrant box and get it running in minutes. The process for running the Vagrant box was so simple that even managers and directors could download a “Vagrantfile” and “vagrant up” to get a recent build running locally on their laptops. Finally, the Vagrant box generated by our build process utilized the identical SLES 11 environment that QA and customers would be running with, so developers would not be running into issues related to differences in environment. I will write a follow-up post about how we use Packer in our build process to create the Vagrant box, but for now I’ll provide some details about our Vagrant box workflow.

The “Vagrantfile”

Here’s a partial sample of our “Vagrantfile” where I’d like to call a few things out:

VAGRANTFILE_API_VERSION = "2"  # Do not modify

VM_NUM_CPUS = "4"
VM_RAM_MB = "4096"
VM_SHOW_CONSOLE = false


Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|

  # Box name & URL to download from
  config.vm.box = "vw-21609-431"
  config.vm.box_url = "http://devnull.vi.local/builds/aruba-images-master/aruba-images-21609/vagrant/vmware/portal_appliance.vmware.21609-431.box"

...

  # Create a private network interface, /dev/eth1. This allows host-only access
  # to the machine using a specific IP. The host OS is available to the guest
  # at the "192.168.33.1" IP address.
  config.vm.network :private_network, ip: "192.168.33.10"

  config.vm.provider "vmware_fusion" do |v|
    v.gui = VM_SHOW_CONSOLE
    v.vmx["memsize"]  = VM_RAM_MB
    v.vmx["numvcpus"] = VM_NUM_CPUS
  end

...

end

Keep in mind that the “Vagrantfile” is executable Ruby code, so there are virtually limitless possibilities for what one can accomplish depending on your needs and desired workflow.

Private Networking and Our “services.conf”

The workflow used by the developers of our Java services is to run the service that they are modifying via the IDE in their host OS (ex: Eclipse or IntelliJ), and to have all other services and databases running within the Vagrant box (the guest OS). In order to facilitate communication between the host OS and guest OS, we direct the “Vagrantfile” to create a private network with static IP addresses for the host and guest. Here our host OS will have the IP “192.168.33.1” while the guest will be available at “192.168.33.10”:

  # Create a private network interface, /dev/eth1. This allows host-only access
  # to the machine using a specific IP. The host OS is available to the guest
  # at the "192.168.33.1" IP address.
  config.vm.network :private_network, ip: "192.168.33.10"

With private networking connectivity in place, we modified our Java services to read the configuration of where to find their peer services from a hierarchy of configuration files. Ex: When a Java service initializes, it reads the following hierarchy of configuration files to determine how to connect to the other services:

  • /etc/vi/services.conf (the default settings)
  • /vagrant/services.conf
  • ~/services.conf (highest precedence)

Sample contents for these “services.conf” files:

# /vagrant/services.conf

com.vi.ServiceA=192.168.33.1
com.vi.ServiceB=localhost
com.vi.ServiceC=192.168.33.10

The “services.conf” hierarchy allows a developer to direct the service running in their IDE/host OS to connect to the Java services running within the Vagrant box/guest OS (via “~/services.conf”), as needed. It also allows the developer to configure the services within the Vagrant box/guest OS to connect to the Java services running on the host OS via the “/vagrant/services.conf” file.

One clarification – the “/vagrant/services.conf” file actually lives on the host OS in the working directory of the “Vagrantfile” that the developer downloads. The file appears as “/vagrant/services.conf” via the default shared folder provided by Vagrant. Having the “/vagrant/services.conf” live on the host OS is especially convenient as it allows for easy editing, and more importantly it provides persistence of the developer’s configuration when tearing down and re-initializing newer versions of our Vagrant box.

Easy Downloading with “box_url”

As part of our workflow I found it to be easiest to have users not download the Vagrant .box file directly, but instead to download the small (~3KB) “Vagrantfile” which in turn contains the URL for the .box file. When the user runs “vagrant up” from the cwd of this “Vagrantfile”, Vagrant will automatically detect that the Vagrant box of the respective name is not in the local library and start to download the Vagrant box from the URL listed in the “Vagrantfile”.

  # Box name & URL to download from
  config.vm.box = "vw-21609-431"
  config.vm.box_url = "http://devnull.vi.local/builds/aruba-images-master/aruba-images-21609/vagrant/vmware/portal_appliance.vmware.21609-431.box"

More details are available in the Vagrant docs: http://docs.vagrantup.com/v2/vagrantfile/machine_settings.html

Note: Earlier this year the authors of Vagrant released a SaaS service for box distribution called Vagrant Cloud. You may want to look into using this, along with the newer functionality of Vagrant box versioning. We are not using the Vagrant Cloud SaaS service yet, as our solution pre-dates the availability of this service and there hasn’t been sufficient motivation to change our workflow.

VM Hardware Customization

In our “Vagrantfile” I wanted to make it dead-simple for people to be able to modify the hardware resources. At VI some developers had very new laptops with lots of RAM while others had older laptops. Putting the following Ruby variables at the top of the “Vagrantfile” made it easy for someone that knows absolutely nothing about Ruby to edit the hardware configuration of their Vagrant box:

VM_NUM_CPUS = "4"
VM_RAM_MB = "4096"
VM_SHOW_CONSOLE = false

...

  config.vm.provider "vmware_fusion" do |v|
    v.gui = VM_SHOW_CONSOLE
    v.vmx["memsize"]  = VM_RAM_MB
    v.vmx["numvcpus"] = VM_NUM_CPUS
  end

Conclusion

In developing an SOA application, having a Vagrant box with which developers can integrate the services they are developing has been an enormous boon for developer productivity. Downloading and running a Vagrant box is orders of magnitude faster than configuring and starting services by hand. The Vagrant box also solves the problem of “consistency of environment”, allowing developers to run their code in an environment that closely matches the QA/customer environment. In the post-mortem analysis of our VirtualWisdom 4.0 release, having Vagrant boxes for developer integration of our Java services was identified as one of the big “wins” of the release. As the Director of Engineering said, “Without the developer productivity gains from Vagrant, we would not have been able to ship VirtualWisdom 4.0 when we did.”