Docker is the Heroku Killer

Docker | Heroku | - May 14, 2014 // Barry

After getting an intense look at Docker last night, I firmly believe that it is going to be the most disruptive server technology that we've seen in the last few years. It fills a much-needed gap that's currently covered by very expensive solutions, and it's being actively funded by some of the biggest players in the market.

Full discussion on Hacker News.

Last night I went to the Docker Meetup in Greenville for an in-depth look at what it is capable of doing. After taking a peek under the hood, I came away with some strong impressions.

First, though, let's look at what Docker is. Docker is classified as "an easy, lightweight virtualized environment for portable applications." What exactly does that mean?

Basically it means that Docker is actively working to replace the need for hypervisors, virtual machines (VMs) and configuration management tools like Puppet / Chef / CFEngine in MOST cases. It does that by isolating application requirements within a Linux installation so that it's as if the Docker application is running within a VM without any of the actual overhead of a VM. Docker is coded without an abstraction layer and interacts directly with the Linux kernel. Additionally, when different containers need modified parts, they reuse whatever existing libraries they can and only create independent copies of the things they need to modify.
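That "reuse what you can, copy only what you change" idea is easier to see in a toy model. The sketch below is NOT Docker's storage code; it's just Python's standard `collections.ChainMap` standing in for a layered filesystem, and the file paths, contents, and the `new_container` helper are all made up for illustration.

```python
from collections import ChainMap

# Hypothetical read-only base image, shared by every container.
base_image = {
    "/lib/libc.so": "libc v1",
    "/etc/app.conf": "default config",
}

def new_container(image):
    """Return a container view: an empty private layer stacked on the shared image."""
    # ChainMap reads fall through to the image; writes land in the first mapping only.
    return ChainMap({}, image)

web = new_container(base_image)
worker = new_container(base_image)

# Modifying a file in one container creates a private copy in that container's layer.
web["/etc/app.conf"] = "web config"

print(web["/etc/app.conf"])     # web config
print(worker["/etc/app.conf"])  # default config (still reading the shared image)
print(web.maps[0])              # {'/etc/app.conf': 'web config'}
print(worker.maps[0])           # {} (nothing copied, nothing changed)
```

Both containers keep reading `/lib/libc.so` from the one shared image, and only the file that `web` changed takes up any extra space.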

A very intensive benchmark of Docker vs KVM with OpenStack was put together a couple of weeks ago by Boden Russell. Docker wins going away with approximately a 26 to 1 performance improvement. With a VM you have a host operating system (OS), a very expensive hypervisor, and then all of the virtual machines running on top of it. The virtual machines are all booting up their own OS, managing their own processes and interacting with the layers below them. With Docker you get the majority of the benefits of a virtual machine without any of the extra overhead from running additional operating systems.

How is this disruptive to...?

I'm glad you asked.

case: Puppet/Chef/CFEngine

Configuration management tools have become a huge deal in the development stack specifically so that people can package application-specific VM configurations with their code base. This allows an entire VM to be configured to the needs of an application, and then kept up to date as those needs change.

Now, virtual machines aren't going anywhere so these tools are still going to be needed...but, where previously developers needed to be able to specify the entire configuration of a VM with one of these, Docker lets them specify just the needs of their application instead. That's much more straightforward and a whole lot simpler than locking down a Linux installation, setting up VPNs in a cloud environment, managing port access, log aggregation, etc. With Docker, "DevOps" can separate back out to Dev/Docker and Ops/Puppet. Ops will configure a single VM with Puppet/Chef/CFEngine and developers will drop their containers on it.

This is also incredibly important for developers. Tools like Vagrant that use Puppet to configure VMs for developers locally are great...unless you're dealing with an application that has a lot of moving parts and thus a lot of VMs. Running 4-5 virtual machines on your laptop is fairly resource intensive. With Docker, you can run one VM with all of your application containers deployed to it.

case: Heroku / PaaS Providers

Heroku is awesome. Let's just get that out of the way. Heroku lets you push your code up via git, analyzes it to see what type of application it is, configures your environment and downloads all of your needed packages and then runs your processes in little isolated "Dynos". They manage the entire environment and developers get to focus on their code without having to worry about the hosting aspects of things. Their support is great and there's an entire ecosystem surrounding it for simple, scalable 3rd party integrations. When you need to scale, you can do so horizontally very easily by just adding more dynos to handle web processes, background workers, etc.

But Heroku is expensive. A single dyno is basically 1 core with 512MB of RAM and an additional 1GB of swap. There's also no permanent file storage (you need to use S3) and it's locked into Amazon's datacenters. The price for that is $34.50 / month.

By comparison, Digital Ocean can give me a virtual machine with 1 core, 512MB of RAM, and a 20GB SSD for $5 / month. Heroku's high-end PX Dynos provide 8 cores and 6GB of RAM for approximately $552 / month. For $480 / month with Digital Ocean I can get a VM with 16 cores and 48GB of RAM as well as 480GB of SSD storage.
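To put those numbers side by side, here's a rough sketch that boils the prices quoted in this post down to dollars per GB of RAM per month. The plan labels and the `cost_per_gb` helper are just names I picked for illustration, and RAM is only one axis: it ignores cores, storage, and everything Heroku manages for you.

```python
# Monthly price (USD) and RAM (GB) for each plan, taken from the figures quoted above.
plans = {
    "Heroku dyno": (34.50, 0.5),
    "Digital Ocean small": (5.00, 0.5),
    "Heroku PX dyno": (552.00, 6),
    "Digital Ocean large": (480.00, 48),
}

def cost_per_gb(price, ram_gb):
    """Monthly dollars paid per GB of RAM."""
    return price / ram_gb

for name, (price, ram_gb) in plans.items():
    print(f"{name}: ${cost_per_gb(price, ram_gb):.2f} per GB of RAM per month")

# Heroku dyno: $69.00 per GB of RAM per month
# Digital Ocean small: $10.00 per GB of RAM per month
# Heroku PX dyno: $92.00 per GB of RAM per month
# Digital Ocean large: $10.00 per GB of RAM per month
```

On RAM alone that works out to roughly 7x at the low end and roughly 9x at the high end.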

Now, those prices have been like that for a while but managing VMs was still a decent amount of work compared to just pushing up to Heroku. With Digital Ocean's new Docker-ready images though, I can provision a VM and deploy my little application container to an environment that will configure itself...just like Heroku. In fact, within minutes of tweeting my initial impressions of Docker I got a reply from @tutumcloud letting me know about a tool called Buildstep that uses Heroku's very own open sourced buildpacks to create a Dockerfile for your application...automatically. There's even a tool called Dokku which provides a "Docker powered mini-Heroku in around 100 lines of Bash."

case: Hypervisors

Digital Ocean isn't the only company that's working to make Docker a first-class citizen either. Red Hat is pushing it as part of its innovative OpenShift platform. Amazon is actively funding it as well. Eventually, you're going to see native Docker deployments in every cloud hosting company out there because it will allow them to provide that Heroku-like experience AND let them get more mileage out of their hardware. If these companies are eventually able to cut out the expensive hypervisors, it will reduce their operating costs on a large scale AND provide more resources to their customers.

You can use Docker to deploy to VMs or bare metal servers, meaning that companies like Rackspace, which provide both bare metal and cloud hosting, will be able to give their customers a very clean scaling path from resource-constrained cloud servers to beastly high-end rack-mounted servers.

It will also make things easier on IT departments who want to provide a cloud-like hosting experience for their development teams but want to do so in their own datacenters.

So what are the drawbacks?

There aren't many, but the biggest drawback to be aware of is that because these applications are all deployed in isolation, you don't get the perks of deploying into a shared environment. For example, if you have a Dockerfile with a PHP system and deploy it, Docker will run a separate copy of Apache/nginx/FPM for that PHP application. That doesn't matter much if you wanted to deploy the application to its own virtual machine anyway, but if you intended to deploy many applications to one server there are more resource-efficient ways of doing that.

Application servers like JBoss / WildFly / TorqueBox fall into the same category. They let you deploy many JVM-based applications across a managed cluster that provides background processing, queueing, messaging, and load balancing. You may want to have a Dockerfile that handles configuring JBoss while still deploying multiple applications to that container.

What does a Dockerfile look like?

Here's an example of a complex Dockerfile to set up TorqueBox. This installs Java, installs TorqueBox, creates a user, gives permission to the user to run TorqueBox, sets proper environment variables, and opens all necessary ports in about 30 lines of code.

Stop drooling. Stop it. You're going to mess up your keyboard.