Ansible vs Puppet – Hands-On with Ansible

This is part 2/2 in a series. For part #1 see: Ansible vs Puppet – An Overview of the Solutions.

Notes & Findings From Going Hands-On with Ansible

After playing with Ansible for a week to Ansible-ize Graphite/Grafana (via Docker) and Jenkins (via an Ansible Galaxy role), here are my notes about Ansible:

  • “Batteries Included” and OSS Module Quality
    • While Ansible does include more modules out of the box, the “batteries included” claim is misleading. IMO an Ansible shop will have to rely heavily upon Ansible Galaxy to find community-created roles (ex: for installing Jenkins, dockerd, or ntp), just as a Puppet shop would have to rely upon PuppetForge.
    • The quality and quantity of the roles on Ansible Galaxy are about on par with what is available at PuppetForge. Just as with PuppetForge, there are multiple implementations for any given role (ex: nginx, ntp, jenkins), each with its own quirks, strengths, and deficiencies.
    • Perhaps this is a deficiency of all configuration management systems. Ultimately a shop’s familiarity with Python or Ruby may tip the preference one way or the other.
  • Package Installations
    • Coming from Puppet-land this seemed worth pointing out: Ansible does not abstract an OS’s package manager the way that Puppet does with the “package” resource. Users explicitly call out the package manager to be used, ex: the “apt” module or “yum” module, so Ansible provides a tad less abstraction (see the sketch after this list). FWIW, a package installed via “pip” or “gem” in Puppet still requires explicit naming of the package provider. Not saying that either approach is better or worse here; just a noticeable difference to an Ansible newbie.
  • Programming Language Constructs
    • Ansible handles loops and conditionals with YAML directives such as “with_items” and “when”, whereas Puppet covers the same ground with “if” and “case” statements in its own declarative DSL.
  • Noop Mode
    • Both tools support a dry run: “ansible-playbook --check” is Ansible’s rough equivalent of Puppet’s “--noop” flag, though not all Ansible modules support check mode.
  • Agent-less
    • Ansible’s agent-less, SSH-based push workflow actually was notably easier to deal with than a Puppetmaster, slave agents, SSL certs, etc.
  • Learning Curve
    • If I use my imagination and pretend that I were starting to use a configuration management tool for the first time, I suspect I’d have an easier time picking up Ansible. Even though I’m not a fan of YAML by any stretch of the imagination, Ansible playbooks are a bit easier to write & understand than Puppet manifests.
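
To make the package-manager point from the list above concrete, here’s a minimal sketch of a pair of Ansible tasks; the package name and host targeting are arbitrary placeholders:

  # Each Ansible task names the package-manager module explicitly.
  - name: Install ntp on Debian/Ubuntu hosts
    apt:
      name: ntp
      state: present

  - name: Install ntp on RHEL/CentOS hosts
    yum:
      name: ntp
      state: present

The rough Puppet equivalent is a single package { 'ntp': ensure => installed } resource, with the provider inferred from the OS.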


After three years of using Puppet at VMware and Virtual Instruments, the thought of not continuing to use the market leader in configuration management tools seemed like a radical idea when it was first suggested to me. After spending several weeks researching Ansible and using it hands-on, I came to the conclusion that Ansible is a perfectly viable alternative to Puppet. I tend to agree with Lyft’s conclusion that if you have a centralized Ops team in charge of deployments then they can own a Puppet codebase. On the other hand, if you want more widespread ownership of your configuration management scripts, a tool with a shallower learning curve like Ansible is a better choice.


Building a Better Dashboard for Virtual Infrastructure


We built a pretty sweet dashboard for our R&D infrastructure using Graphite, Grafana, collectd, and a home-made VMware VIM API data collector script.

Screenshot of Our Dashboard

Background – Tracking Build Times

The second major project I worked on after joining Virtual Instruments resulted in improving the product build time 15x, from 5 hours down to 20 minutes. Having spent a lot of time and effort to accomplish that goal, I wanted to set up tracking of build time metrics going forward by having the build time of each successful build recorded into a system where it could be visualized. Setting up the collection and visualization of this data seemed like a great intern project, and to that end, for the Summer of 2014 we brought back our awesome intern from the previous Summer, Ryan Henrick.

Within a few weeks Ryan was able to set up a Graphite instance and add a “post-build action” to each of our Jenkins jobs (via our DSL’d job definitions in SCM) in order to have Jenkins push this build time data to the Graphite server. From Graphite’s web UI we could then visualize the build time data for each of the components of our product.
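
Conceptually, each post-build step just needs to write a single line to Graphite’s plaintext protocol on port 2003: a metric path, a value, and a Unix timestamp. A minimal sketch in Python, with a hypothetical host and metric name:

  import socket
  import time

  # Graphite's plaintext protocol: "<metric.path> <value> <unix-timestamp>\n"
  # The host and metric path below are illustrative placeholders.
  duration_seconds = 1200
  message = 'build.myproduct.duration %d %d\n' % (duration_seconds, int(time.time()))
  sock = socket.create_connection(('graphite.example.com', 2003))
  sock.sendall(message.encode('ascii'))
  sock.close()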

Here’s an example of our build time data visualized in Graphite. As you can see it is functionally correct, but it’s difficult to get excited about using something like this:

Graphite Visualization of Build Times

Setting up the Graphite server with Puppet looks something like this:

  class { 'graphite':
    gr_max_creates_per_minute  => 'inf',
    gr_max_updates_per_second  => 2000,

    # This configuration reads from top to bottom, returning the first match
    # of a regex pattern. "default" catches anything that did not match.
    gr_storage_schemas         => [
      {
        name       => 'build',
        pattern    => '^build.*',
        retentions => '1h:60d,12h:5y',
      },
      {
        name       => 'carbon',
        pattern    => '^carbon\.',
        retentions => '1m:1d,10m:20d',
      },
      {
        name       => 'collectd',
        pattern    => '^collectd.*',
        retentions => '10s:20m,30s:1h,3m:6h,12m:1d,1h:7d,4h:60d,1d:2y',
      },
      {
        name       => 'vCenter',
        pattern    => '^vCenter.*',
        retentions => '1m:12h,5m:1d,1h:7d,4h:2y',
      },
      {
        name       => 'default',
        pattern    => '.*',
        retentions => '10s:20m,30s:1h',
      },
    ],
    gr_timezone                => 'America/Los_Angeles',
    gr_web_cors_allow_from_all => true,
  }

There are some noteworthy settings there:

  • gr_storage_schemas – This is how you define the stat roll-ups, i.e. “10-second data for the first 2 hours, 1-minute data for the next 24 hours, 10-minute data for the next 7 days”, etc. The rendered config is shown after this list; see the Graphite docs for details.
  • gr_max_creates_per_minute – This limits how many new “buckets” Graphite will create per minute. If you leave it at the default of “50” and then unleash all kinds of new data/metrics onto Graphite, it may take some time for the database files for each of the new metrics to be created. See the Graphite docs for details.
  • gr_web_cors_allow_from_all – This is needed for the Grafana UI to talk to Graphite (see later in this post).
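
Under the hood, the graphite Puppet module renders gr_storage_schemas into Carbon’s storage-schemas.conf; the “build” entry above would come out roughly like this:

  [build]
  pattern = ^build.*
  retentions = 1h:60d,12h:5y

Each retention pair reads “resolution:duration”, so build times are kept at 1-hour resolution for 60 days, then rolled up to 12-hour resolution for 5 years.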

Collecting More Data with “collectd”

After completing the effort to log all Jenkins build time data to Graphite, our attention turned to what other sorts of things we could measure and visualize. We did a 1-day R&D Hackathon project around collecting metrics from the various production web apps that we run within the R&D org, ex: Jira, Confluence, Jenkins, Nexus, Docker Registry, etc.

Over the course of doing our Hackathon project we realized that there was a plethora of tools available to integrate with Graphite. Most interesting to us were “collectd”, with its extensive set of plugins, and Grafana, a very rich browser-based JavaScript UI for rendering Graphite data in much more impressive ways than the static “Build Time” chart I included above.

We quickly got to work and leveraged Puppet’s collectd module to collect metrics about CPU, RAM, network, disk IO, disk space, and swap activity from the guest OSes of all of our production VMs that run Puppet. We were also able to quickly implement collectd’s postgresql and nginx plugins to measure stats about the aforementioned web apps.

Using Puppet, we can deploy collectd onto all of our production VMs and configure it to send its data to our Graphite server with something like:

  # Install collectd for SLES & Ubuntu hosts; not on OpenSuSE (yet)
  case $::operatingsystem {
    'SLES', 'Ubuntu': {
      class { '::collectd':
        purge        => true,
        recurse      => true,
        purge_config => true,
        version      => installed,
      }
      class { 'collectd::plugin::df': }
      class { 'collectd::plugin::disk':
        disks          => ['/^dm/'],
        ignoreselected => true,
      }
      class { 'collectd::plugin::swap':
        reportbydevice => false,
        reportbytes    => true,
      }
      class { 'collectd::plugin::write_graphite':
        protocol     => 'tcp',
        graphitehost => '',
      }
      collectd::plugin { 'cpu': }
      collectd::plugin { 'interface': }
      collectd::plugin { 'load': }
      collectd::plugin { 'memory': }
      collectd::plugin { 'nfs': }
    }
    default: {}
  }

For another example w/details on monitoring an HTTP server, see my previous post: Private Docker Registry w/Nginx Proxy for Stats Collection

Collecting vCenter Data with pyVmomi

After the completion of the Hackathon, I pointed the intern to the pyvmomi Python module and its sample scripts, and told him to get crackin’ on making a Python data collector that would connect to our vCenter instance, grab metrics about ESXi cluster CPU & memory usage and VMFS datastore usage, along with per-VM CPU and memory usage, and ship that data off to our Graphite server.
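
A minimal sketch of what such a collector can look like, using the datastore portion as an example and assuming placeholder credentials and host names (the metric paths line up with the “^vCenter.*” storage schema defined earlier):

  import socket
  import time

  from pyVim.connect import SmartConnect, Disconnect

  # Placeholder vCenter credentials; a real collector would read these
  # from a config file or environment variables.
  si = SmartConnect(host='vcenter.example.com', user='readonly', pwd='secret')
  try:
      content = si.RetrieveContent()
      now = int(time.time())
      lines = []
      # Walk each datacenter's datastores and record capacity/usage.
      for dc in content.rootFolder.childEntity:
          for ds in getattr(dc, 'datastore', []):
              summary = ds.summary
              name = summary.name.replace('.', '_').replace(' ', '_')
              used = summary.capacity - summary.freeSpace
              lines.append('vCenter.datastore.%s.used_bytes %d %d'
                           % (name, used, now))
              lines.append('vCenter.datastore.%s.capacity_bytes %d %d'
                           % (name, summary.capacity, now))
      # Ship everything to Graphite's plaintext listener in one shot.
      sock = socket.create_connection(('graphite.example.com', 2003))
      sock.sendall(('\n'.join(lines) + '\n').encode('ascii'))
      sock.close()
  finally:
      Disconnect(si)

Run on a cron schedule of once a minute, this lines up with the 1-minute first-tier retention configured for “^vCenter.*” above.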

Building a Beautiful Dashboard with Grafana

With all of this collectd and vCenter data now being collected in Graphite, the only thing left was to create a beautiful dashboard for visualizing what’s going on in our R&D infrastructure. We leveraged Puppet’s grafana module to set up a Grafana instance.

Grafana is a JavaScript app that talks directly to Graphite’s WWW API (bypassing Graphite’s very rudimentary static image rendering layer) and pulls in the raw series data that you’ve stored in Graphite. Grafana allows you to build dashboards suitable for viewing in your web browser or for tossing up on a TV display. It has all kinds of sexy UI features like drag-to-zoom, mouse-over-for-info, and annotation of events.

There’s a great live demo of Grafana available on the project’s website.

Setting up the Grafana server with Puppet is pretty simple. The one trick is that in order to save your dashboards you need to set up Elasticsearch:

  class { 'elasticsearch':
    package_url  => '',
    java_install => true,
  }

  elasticsearch::instance { 'grafana': }

  class { 'nginx':
    confd_purge => true,
    vhost_purge => true,
  }

  nginx::resource::vhost { 'localhost':
    www_root       => '/opt/grafana',
    listen_options => 'default_server',
  }

  class { 'grafana':
    install_dir        => '/opt',
    elasticsearch_host => $::fqdn,
  }

Closing Thoughts

One of the main benefits of this solution is that we can have our data from disparate data sources in one place, and plot those series against one another. Unlike vCenter’s “Performance” graphs, which are slow to set up and regularly time out before rendering, days or weeks of rolled-up data load very quickly in Graphite/Grafana.

Screenshot of Our Dashboard


  • Thanks to our intern Ryan Henrick for his work this past Summer in implementing these data collectors and creating the visualizations.
  • Thanks to Alan Grosskurth from Package Lab for pointing me to Grafana when I told him what we were up to with our Hackathon project.