When you first start learning about Kubernetes (or K8s, as we affectionately call it), etcd pops up pretty early, since it is a component of the Kubernetes control plane. It is among the top 3 CNCF (Cloud Native Computing Foundation) projects by GitHub stars, and we selected it as one of the top 10 CNCF projects of the year. Do you want to know why it is so special?
What is etcd?
Etcd is an open-source distributed key-value store that was first released in 2013 by CoreOS, a software company that was later acquired by Red Hat. It was the early container days – Docker was first released in 2013 and Kubernetes in 2014. CoreOS was developing its own container platform and it needed a distributed configuration store. Etcd became a central component of this platform, as it provided a way for containerized applications to share configuration information across a cluster of machines.
Etcd was designed to be a simple and reliable way for applications to store and retrieve information in a distributed system. Kelsey Hightower, an early advocate of the project, was already raving about etcd as a distributed configuration store back in 2013. As he has noted, it was inspired by Apache ZooKeeper, another distributed coordination system. However, etcd focused on being simple, secure, fast, and reliable.
In 2018, after acquiring CoreOS, Red Hat donated etcd to the CNCF, where the project graduated in 2020. By then, etcd was already a critical component of Kubernetes, and more and more cloud-native projects relied on it for distributed coordination and data storage.
It has also been adopted by many other companies and projects, which use it to store configuration data, service discovery information, and other key-value data in a distributed system.
Why is it called etcd?
The name “etcd” combines two ideas: the Unix “/etc” directory, where a single machine keeps its configuration files, and a “d” for “distributed”. In other words, etcd is a distributed /etc: a place to store configuration for a whole cluster instead of a single machine. (The “/etc” directory itself takes its name from “et cetera”, Latin for “and other things”.) True to its name, etcd is a general-purpose key-value store: besides configuration data, it can hold service discovery information, feature flags, and other data that needs to be shared across a distributed system.
What is the difference between Kubernetes and etcd?
You cannot really compare Kubernetes and etcd because they are two different technologies that serve separate purposes.
Kubernetes is a container orchestration platform. It automates the deployment, scaling, and operation of containerized applications at scale, and provides load balancing, service discovery, and more.
On the other hand, etcd is a strongly consistent, distributed key-value store that provides a reliable way to store data that needs to be accessed by a distributed system or cluster of machines. It is a key part of Kubernetes, which uses it to store and manage configuration data and maintain a consistent view of the state of the cluster.
So in fact, Kubernetes and etcd are complementary technologies that work together to provide a powerful and flexible platform for building and deploying distributed systems.
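The strong consistency mentioned above comes from the Raft consensus algorithm: a write is committed only once a majority (a quorum) of the voting members has accepted it, so the cluster keeps serving requests as long as a majority is reachable. The sketch below only does that arithmetic; it is a simplified model (it ignores, for example, non-voting learner members), not code taken from etcd.

```python
# Quorum arithmetic behind Raft-based replication (simplified illustration).

def quorum(members: int) -> int:
    """Smallest number of voting members that forms a majority."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """How many members can fail while a majority is still available."""
    return members - quorum(members)


for n in (1, 3, 4, 5, 7):
    print(f"{n} members: quorum={quorum(n)}, tolerates {failure_tolerance(n)} failure(s)")
```

Note that a 4-member cluster tolerates no more failures than a 3-member one, which is why odd cluster sizes such as 3 or 5 are the usual recommendation.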
Does Kubernetes still use etcd?
Yes! Etcd is Kubernetes’ primary data store. Kubernetes uses it to store and manage configuration data and even Secrets, as well as to maintain a consistent view of the cluster: it stores the state of Pods, Services, Deployments, and other resources. You could say etcd is the only stateful part of the Kubernetes control plane. In fact, periodically backing up the etcd data allows you to recover a Kubernetes cluster in disaster scenarios, such as losing all control plane nodes.
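To get a feel for how that state is laid out, it helps to know that the Kubernetes API server stores objects under a key prefix that defaults to /registry (configurable with the kube-apiserver --etcd-prefix flag), typically as /registry/<resource>/<namespace>/<name> for namespaced objects and /registry/<resource>/<name> for cluster-scoped ones. The registry_key helper below is hypothetical and only models that common pattern: some resources deviate from it (Services, for instance, live under /registry/services/specs/...), and the stored values are serialized objects rather than plain text.

```python
from typing import Optional

# Default key prefix used by kube-apiserver (configurable with --etcd-prefix).
DEFAULT_PREFIX = "/registry"


def registry_key(resource: str, name: str, namespace: Optional[str] = None,
                 prefix: str = DEFAULT_PREFIX) -> str:
    """Hypothetical helper: build the etcd key for a Kubernetes object using
    the common /registry/<resource>/[<namespace>/]<name> layout.
    Some resources (e.g. Services) use a different path in practice."""
    parts = [prefix.rstrip("/"), resource]
    if namespace:  # namespaced objects include their namespace in the key
        parts.append(namespace)
    parts.append(name)
    return "/".join(parts)


print(registry_key("pods", "nginx", namespace="default"))        # /registry/pods/default/nginx
print(registry_key("secrets", "db-password", namespace="prod"))  # /registry/secrets/prod/db-password
print(registry_key("namespaces", "prod"))                        # /registry/namespaces/prod
```

On a cluster where you have access to the control plane's etcd (and its client certificates), running etcdctl get /registry --prefix --keys-only lets you compare this model with the real keys.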
“Architecturally speaking, the [Kubernetes] API server is a CRUD application that’s not fundamentally different from, say, WordPress — most of what it does is storing and serving data. And like WordPress, it needs a database to store its persisted data, which is where etcd fits into the picture.”
The article this quote comes from is worth reading to learn more about how etcd works, why K8s uses etcd rather than an SQL database, real-life examples of etcd in action, and what k3s uses instead of etcd (yikes!).
Is etcd encrypted?
We have just said that etcd stores secrets. These secrets are certainly not your crush’s name or your credit card PIN. Right from the Kubernetes documentation, “a Secret is an object that contains a small amount of sensitive data such as a password, a token, or a key. […] Using a Secret means that you don’t need to include confidential data in your application code.” So if etcd stores this crucial information, what security measures protect it from unauthorized access?
Well, out of the box, none at all. By default, Secrets are stored in etcd unencrypted. Anyone with API access, or with access to etcd itself, can retrieve or modify a Secret. What is more, anyone who is authorized to create a Pod in a namespace can use that access to read any Secret in that namespace. This is no small risk, considering that more than two out of three insider threat incidents are caused by negligence. So be sure to follow the Good Practices for Kubernetes Secrets guide to protect the sensitive information stored in etcd.
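A related misconception is that Secret manifests are protected because their values look scrambled. In the Secret API, the data field only holds base64-encoded values, and base64 is an encoding, not encryption: anyone who can read the object can decode it without any key. The sketch below uses a made-up password and Python's standard library to show this.

```python
import base64

# A made-up password, encoded the way it would appear in a Secret's `data` field.
plaintext = "s3cr3t-p@ssw0rd"
encoded = base64.b64encode(plaintext.encode("utf-8")).decode("ascii")
print("In the manifest:", encoded)

# Decoding needs no key at all, so base64 offers no confidentiality.
decoded = base64.b64decode(encoded).decode("utf-8")
print("Recovered:", decoded)
```

Real protection comes from measures such as tight RBAC rules and enabling encryption at rest for Secrets in the API server, both of which are covered in the Kubernetes guide on good practices for Secrets.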
Who is using etcd?
For starters, virtually all Kubernetes users, since standard Kubernetes clusters use etcd as their primary data store. This means etcd users include Pokémon Go, Box, CoreOS, Ticketmaster, Salesforce, and many more. Its simplicity, reliability, and consistency have made it a popular choice for managing distributed systems, and its adoption beyond K8s continues to grow in the cloud-native ecosystem. Let’s look at some use cases!
- Rook – It is an open-source, cloud-native storage orchestrator for Kubernetes. It provides a variety of storage solutions, including block, file, and object storage. Rook uses etcd as its primary storage backend for metadata about the resources it manages, such as information about the storage clusters, nodes, and volumes.
- CoreDNS – This is a graduated CNCF project. It is an extensible DNS server meant for a multitude of environments, and it was designed to integrate easily with cloud-native environments such as Kubernetes. CoreDNS can use etcd as a backend for storing DNS zone data and other configuration information, so that the data can be easily accessed by multiple instances of CoreDNS running in a K8s cluster.
- Uber – The ride-hailing platform needs to be able to quickly store and access billions of metrics on its back-end systems at any given time. For this reason, they built a metrics platform named M3. Etcd plays a critical role in the architecture of M3DB, the scalable storage backend for M3, as it is used as a coordination and configuration service, as well as for distributed locking.
You can find many other production users on the etcd site.
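To make the CoreDNS case above more concrete: its etcd plugin follows the older SkyDNS convention, where a DNS name is stored under a key built by reversing the name's labels below a path prefix (by default /skydns), and the value is a JSON document describing the record. The skydns_key and a_record_value helpers below are a hypothetical sketch of that mapping, not part of CoreDNS; check the plugin documentation for the exact fields your version supports.

```python
import json

# Default path prefix of the CoreDNS etcd plugin (SkyDNS convention).
DEFAULT_PATH = "/skydns"


def skydns_key(fqdn: str, path: str = DEFAULT_PATH) -> str:
    """Hypothetical helper: map a DNS name to an etcd key by reversing its labels,
    e.g. www.example.com -> /skydns/com/example/www."""
    labels = [label for label in fqdn.lower().rstrip(".").split(".") if label]
    return "/".join([path.rstrip("/")] + list(reversed(labels)))


def a_record_value(ip: str, ttl: int = 60) -> str:
    """Hypothetical helper: JSON value for a simple address record (SkyDNS field names)."""
    return json.dumps({"host": ip, "ttl": ttl})


key = skydns_key("www.example.com.")
value = a_record_value("10.0.0.10")
print(key)    # /skydns/com/example/www
print(value)  # {"host": "10.0.0.10", "ttl": 60}
```

Writing that value under that key (for example with etcdctl put) should make the name resolvable by every CoreDNS instance that serves the zone from the same etcd cluster, which is exactly the sharing described above.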
In summary, etcd is a distributed key-value database whose feature set is a perfect match for Kubernetes and other use cases. It is strongly consistent, so it can act as a central coordination point for the cluster, yet it is also highly available thanks to the Raft consensus algorithm, which keeps it working as long as a majority of its members are up. In addition, its interface is very simple (you can use standard HTTP tools such as curl), and clients can watch keys to have changes streamed to them in real time.
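As an illustration of that HTTP interface: etcd v3 serves its API over gRPC and also exposes a JSON gateway (under /v3/ in etcd 3.4 and later) in which keys and values must be base64-encoded. The sketch below only builds the URLs and JSON bodies for a put and a watch with Python's standard library; the endpoint http://localhost:2379 is an assumption standing in for a local, unsecured test member, and nothing is sent over the network.

```python
import base64
import json
from typing import Tuple

# Assumed address of a local, unauthenticated etcd member (test setups only).
ENDPOINT = "http://localhost:2379"


def b64(text: str) -> str:
    """The JSON gateway expects keys and values as base64 strings."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def put_request(key: str, value: str) -> Tuple[str, str]:
    """URL and JSON body for storing a key-value pair."""
    return f"{ENDPOINT}/v3/kv/put", json.dumps({"key": b64(key), "value": b64(value)})


def watch_request(key: str) -> Tuple[str, str]:
    """URL and JSON body for a watch that streams changes to `key`."""
    return f"{ENDPOINT}/v3/watch", json.dumps({"create_request": {"key": b64(key)}})


url, body = put_request("foo", "bar")
print(url)   # http://localhost:2379/v3/kv/put
print(body)  # {"key": "Zm9v", "value": "YmFy"}
```

With a member running, the same body could be sent with curl, e.g. curl -L http://localhost:2379/v3/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}'. The watch request keeps the connection open and should return a JSON event each time the key changes.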
In case you have not yet tried Napptive, we encourage you to sign up for free and discover how we are helping propel the development of cloud-native apps.