My next book: Managing Cloud-Native Data on Kubernetes
Since the beginning of 2021, I’ve been super-focused on the idea of running databases on Kubernetes, and specifically Cassandra and the K8ssandra open source project. Over the past few months, I’ve formed a strong opinion: we are now at a point as an industry at which stateful workloads are not just “ok” to run on Kubernetes, but we actually need to move aggressively toward running all our cloud workloads on Kubernetes. Standardizing on Kubernetes as a platform will be an accelerator, enabling us to worry less about infrastructure and more about new capabilities.

As you can probably guess from the headline, I’ve already doubled down on this position, but let’s take a step back to see how I got there. My journey can be traced in a few key articles which I’ll summarize below.
Cassandra on Kubernetes? Skeptic to Convert
Chris Bradford’s A Case for Databases on Kubernetes from a Former Skeptic is a great description of his evolving perspective on running databases first in containers, and then orchestrating those containers in Kubernetes. Over time, his point of view changed from “no way!” to “yes, definitely!”
This mirrors my own path: when I wrote the 2nd edition of Cassandra: The Definitive Guide in 2015, I echoed the prevailing wisdom of the Cassandra community at the time regarding deploying Cassandra in Docker — “don’t try this in production”. By the time I was working on the 3rd edition in early 2020, I was ready to say “OK” to Cassandra in Docker, but the operators for managing clusters at scale in K8s weren’t yet mature.
Combining efforts on a unified Cassandra Operator
Speaking of operator maturity, the Cassandra community took a major step forward by combining the best ideas and code from several parallel Kubernetes operator projects, as Rahul Singh shared in his article for Container Journal. This unified effort was a precursor to more collaboration to come.
Building a full production data layer
As the operator development was maturing, John Sanda and others started talking about bringing together the best Cassandra tooling alongside cass-operator to produce a production-grade distribution of Cassandra known as K8ssandra. At this point, I was fully committed, and helped explain the rationale for this project in my post Why K8ssandra?
Joining a community of communities
Around this time my horizons began to broaden as I was exposed to a wider ecosystem of people and organizations working toward the common goal of running stateful workloads on Kubernetes. I started participating in the Data on Kubernetes Community (DoKC) and was pleasantly surprised to discover it had already been running for months. You can read about the vision for the DoKC in A Call for Collaboration: Announcing the Data On Kubernetes Community.
Defining cloud-native for databases
The conversations in the DoKC got me thinking about what it meant for a database to be “cloud-native”, that is, to truly embody the characteristics required to maximize the advantages of the cloud. Further discussions with Cedrick Lunven led to a couple of posts. In The Search for a Cloud-Native Database we proposed some definitions for cloud-native data concepts, and then built on those definitions in A Maturity Model for Cloud-Native Databases to discuss steps along the path to becoming fully cloud-native.
A Line in the Sand
Ultimately, the time came for me to admit I was “all in” on this idea of databases and other stateful workloads running on Kubernetes. I started noticing that many of the design patterns and principles embodied in K8s were really advantageous for databases seeking this “cloud-native” status. While the headline Why a Cloud-Native Database Must Run on K8s is maybe a bit stronger than I put it in the article, it’s not too much of an exaggeration.
Doubling down on data on K8s
Now, it’s time to put some weight behind my opinions and help push this conversation forward. I’m excited to share that Patrick McFadin and I have contracted with O’Reilly to write a new book which we’re calling “Managing Cloud-Native Data on Kubernetes”.
In this book, we plan to explore how the cloud commodities of compute, networking and storage can be combined to produce new cloud-native infrastructure for databases, streaming, and analytics, enabling new use cases that will drive the next decade of innovation.
How you can help
We are looking for input from the developers, operators, and data professionals this book is written for. The book outline is available on GitHub so you can see what will be included. Are we missing key topics? Just submit an issue or pull request.
We also intend to include several case studies in the book, with inputs from experts in a few key areas, so stay tuned as we develop those opportunities.
While the book isn’t scheduled to publish until next year, look for chapters to start appearing in the second half of 2021.
Thanks for reading this far, and for all your support of communities like Cassandra, K8ssandra and Data on Kubernetes. Take care!