Beyond YAML: Core IT for Kubernetes
When it comes to learning Kubernetes, many people focus on getting the YAML files deployed and think, "Great, it worked. I'm good to go."
However, if you want to learn Kubernetes the right way, it's important to have a solid understanding of core IT fundamentals.
While it's possible to learn Kubernetes without this knowledge, as DevOps engineers, it's important to go beyond merely deploying YAML files. A comprehensive understanding of the entire system is essential for effective troubleshooting and designing more robust solutions.
I'm not saying you can't get by without this foundation, but knowing these areas will give you a significant edge.
In this guide, I'll provide a high-level overview of the following core concepts:
- Containers
- Distributed Systems
- Authentication & Authorization
- Key-Value Store
- API
- YAML
- Service Discovery
- Networking Basics
- Linux Concepts
While it's not possible to cover every topic in detail, you can use this as a basic roadmap to further expand your knowledge using the links added in each section.
1. Containers
Kubernetes orchestrates containers, so understanding container concepts is fundamental.
It's essential to understand container basics and gain hands-on experience with container tools like Docker or Podman. Additionally, I recommend exploring the Open Container Initiative (OCI).
2. Distributed Systems
As the name indicates, a distributed system is a group of servers that work together to perform tasks as if they were a single system.
These servers, called nodes, share the workload and communicate with each other to get things done. The goal is to make the system scalable, reliable, and fault-tolerant, so it can handle more tasks and recover from failures.
Here is how Kubernetes relates to distributed systems:
- Scalability: Kubernetes lets you easily scale your applications by adding or removing containers across multiple servers in the cluster, which is a key feature of distributed systems.
- Fault Tolerance: If one machine in the cluster fails, Kubernetes automatically reschedules its containers onto other healthy machines, keeping your applications running smoothly.
- Load Balancing: Kubernetes spreads incoming traffic across multiple containers, preventing any single container from being overloaded, much like how distributed systems manage resources.
- Decentralization: In a distributed system, no single machine is in charge of everything. Similarly, Kubernetes distributes management tasks across different cluster components, all working together to keep your applications running as expected.
- Shared Storage: Kubernetes works with distributed storage systems, allowing containers to share data across the cluster.
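To make the scalability and fault-tolerance points concrete, here is a minimal sketch of a toy cluster that spreads containers across nodes and reschedules them when a node fails. The `ToyCluster` class, its methods, and the node and pod names are hypothetical; the real Kubernetes scheduler and controllers are far more involved.

```python
class ToyCluster:
    """A toy model of a cluster; not how Kubernetes is implemented."""

    def __init__(self, nodes):
        # Map each healthy node name to the list of pods it runs.
        self.nodes = {name: [] for name in nodes}

    def _least_loaded(self):
        # Pick the healthy node with the fewest pods (ties broken by name).
        return min(self.nodes, key=lambda n: (len(self.nodes[n]), n))

    def schedule(self, pod):
        # Place the pod on the least loaded node and report where it went.
        node = self._least_loaded()
        self.nodes[node].append(pod)
        return node

    def fail_node(self, name):
        # Drop the failed node, then reschedule its pods on the survivors.
        orphaned = self.nodes.pop(name)
        for pod in orphaned:
            self.schedule(pod)


cluster = ToyCluster(["node-a", "node-b", "node-c"])
for i in range(6):
    cluster.schedule(f"web-{i}")

cluster.fail_node("node-b")
print(cluster.nodes)
# {'node-a': ['web-0', 'web-3', 'web-1'], 'node-c': ['web-2', 'web-5', 'web-4']}
```

All six pods are still running after the failure, just on fewer machines. That is the core promise of the fault-tolerance bullet above.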
3. Authentication & Authorization
These are fundamental IT concepts, but it's easy for engineers just starting out to confuse them. So, take the time to really understand the difference. You'll encounter both terms frequently in Kubernetes.
- Authentication is the process of verifying the identity of a user or system. It answers the question, "Who are you?" by requiring credentials like usernames and passwords, tokens, or certificates.
- Authorization determines what an authenticated user or system is allowed to do. It answers the question, "What are you allowed to do?" by checking permissions and roles assigned to the user or system.
A solid understanding of authentication and authorization is important for securing and managing access within a Kubernetes cluster, ensuring that only the right people and systems have the necessary permissions to access resources in the cluster.
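The two steps can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the `TOKENS` and `ROLES` tables and the `handle_request` function are hypothetical, and a real cluster relies on mechanisms such as certificates, tokens, and role-based access control rather than hard-coded dictionaries.

```python
# Hypothetical token and role tables, used only for illustration.
TOKENS = {"token-alice": "alice", "token-bob": "bob"}
ROLES = {
    "alice": {("pods", "get"), ("pods", "delete")},
    "bob": {("pods", "get")},
}


def authenticate(token):
    """Authentication: who are you? Returns a username or None."""
    return TOKENS.get(token)


def authorize(user, resource, verb):
    """Authorization: what are you allowed to do?"""
    return (resource, verb) in ROLES.get(user, set())


def handle_request(token, resource, verb):
    user = authenticate(token)
    if user is None:
        return 401  # Unauthorized: identity could not be verified
    if not authorize(user, resource, verb):
        return 403  # Forbidden: identity is known but lacks permission
    return 200


print(handle_request("token-alice", "pods", "delete"))  # 200
print(handle_request("token-bob", "pods", "delete"))    # 403
print(handle_request("bad-token", "pods", "get"))       # 401
```

Note the order: authentication always runs first, and the 401 versus 403 status codes map directly onto the two questions above.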
Refer: Authentication vs. Authorization
4. Key-Value Store
A key-value store is a type of NoSQL database. Learn just enough of the basics and the common use cases.
Kubernetes uses etcd, a distributed key-value store, to hold cluster state and configuration.
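To get a feel for the data model, here is a minimal in-memory sketch of a key-value store with a revision counter and prefix listing. The `ToyKVStore` class is hypothetical, and the key layout is only illustrative; it is not etcd's API, and it leaves out everything that makes etcd distributed (replication, consensus, watches).

```python
class ToyKVStore:
    """An in-memory key-value store; a teaching aid, not etcd."""

    def __init__(self):
        self._data = {}
        self.revision = 0  # Incremented on every change to the store.

    def put(self, key, value):
        self.revision += 1
        self._data[key] = value
        return self.revision

    def get(self, key):
        # Returns None when the key does not exist.
        return self._data.get(key)

    def delete(self, key):
        if key in self._data:
            self.revision += 1
            del self._data[key]
            return True
        return False

    def list_prefix(self, prefix):
        # Hierarchical-looking keys make "list everything under X" easy.
        return {k: v for k, v in sorted(self._data.items()) if k.startswith(prefix)}


store = ToyKVStore()
store.put("/registry/pods/default/web", {"image": "nginx"})
store.put("/registry/pods/default/db", {"image": "postgres"})
store.put("/registry/services/default/web", {"port": 80})

print(list(store.list_prefix("/registry/pods/")))
# ['/registry/pods/default/db', '/registry/pods/default/web']
print(store.revision)  # 3
```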
Refer: What is a Key Value Store
5. API
Kubernetes is fundamentally an API-driven system (REST and gRPC).
This means that all operations, from deploying applications to managing cluster resources, are performed through API calls. Understanding APIs is important for effectively working with and extending Kubernetes.
If you have never worked with APIs, create a simple API using Python Flask to understand it practically.
RESTful API: It uses HTTP requests to perform CRUD (Create, Read, Update, Delete) operations on resources.
gRPC: gRPC is a high-performance, open-source framework for remote procedure calls that enables communication between services in different languages efficiently.
While the main Kubernetes API is RESTful, gRPC is primarily used for internal communications within the cluster.
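The mapping between HTTP methods and CRUD operations can be sketched without any framework. The example below uses only the Python standard library to stay dependency-free; the `handle` function and the `/items` path are hypothetical, and in a Flask app each branch would typically become its own route.

```python
import json

# In-memory "database" of resources, keyed by name.
_items = {}


def handle(method, path, body=None):
    """Route an HTTP-style request to a CRUD operation.

    Returns a (status_code, payload) tuple.
    """
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] != "items":
        return 404, {"error": "not found"}
    name = parts[1] if len(parts) > 1 else None

    if method == "POST" and name is None:        # Create
        item = json.loads(body)
        if item["name"] in _items:
            return 409, {"error": "already exists"}
        _items[item["name"]] = item
        return 201, item
    if method == "GET" and name is None:         # Read (list)
        return 200, list(_items.values())
    if method == "GET" and name in _items:       # Read (single item)
        return 200, _items[name]
    if method == "PUT" and name in _items:       # Update
        _items[name] = {**json.loads(body), "name": name}
        return 200, _items[name]
    if method == "DELETE" and name in _items:    # Delete
        return 200, _items.pop(name)
    return 404, {"error": "not found"}


print(handle("POST", "/items", '{"name": "web", "replicas": 1}'))
# (201, {'name': 'web', 'replicas': 1})
print(handle("PUT", "/items/web", '{"replicas": 3}'))
# (200, {'replicas': 3, 'name': 'web'})
print(handle("DELETE", "/items/web"))
# (200, {'replicas': 3, 'name': 'web'})
print(handle("GET", "/items/web"))
# (404, {'error': 'not found'})
```

The Kubernetes API follows the same pattern at a much larger scale: resources live at URL paths, and HTTP verbs create, read, update, and delete them.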
6. YAML
YAML is the language Kubernetes speaks.
It's very easy to learn.
In Kubernetes, YAML is the primary language for defining and configuring resources.
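Under the hood, a manifest is just nested maps and lists. The sketch below builds a Deployment-shaped structure as a Python dictionary and prints it as JSON, which expresses the same data as the YAML form. The field names follow the usual Kubernetes manifest layout, while the `web` name and `nginx` image are placeholders chosen for illustration.

```python
import json

# The same data in YAML would start like this:
#   apiVersion: apps/v1
#   kind: Deployment
#   metadata:
#     name: web
manifest = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        "replicas": 2,
        # The selector must match the labels on the pod template.
        "selector": {"matchLabels": {"app": "web"}},
        "template": {
            "metadata": {"labels": {"app": "web"}},
            "spec": {"containers": [{"name": "web", "image": "nginx"}]},
        },
    },
}

text = json.dumps(manifest, indent=2)
print(text)
```

Thinking of YAML as "maps, lists, and scalars with indentation" makes most manifests much easier to read and debug.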
Refer: YAML Tutorial
7. Service Discovery
Service discovery is a mechanism that allows services in a distributed system to find and communicate with each other dynamically.
It's important in environments where services can be added, removed, or scaled up and down frequently.
DNS-based service discovery is a method used in Kubernetes to allow services within a cluster to locate and communicate with each other using DNS names instead of IP addresses.
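As a rough sketch, the DNS name of a Service is usually built from its name and namespace. The helper below assumes the common `<service>.<namespace>.svc.<cluster-domain>` pattern with `cluster.local` as the cluster domain, which can differ per cluster; the `payments` service and `shop` namespace are hypothetical.

```python
import socket


def service_dns_name(service, namespace="default", cluster_domain="cluster.local"):
    """Build the in-cluster DNS name for a Service (assumed naming pattern)."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def resolve(name, port):
    """Resolve a DNS name to a sorted list of IP addresses.

    For a Service name this only succeeds from inside a cluster,
    where the cluster DNS server is reachable.
    """
    infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


print(service_dns_name("payments", "shop"))
# payments.shop.svc.cluster.local
```

A client inside the cluster would call something like `resolve(service_dns_name("payments", "shop"), 443)` and never needs to know the pod IPs behind the Service, which may change at any time.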
8. Networking Basics
Networking is a key part of Kubernetes. To understand Kubernetes networking (primarily the Services concept), you need a fair knowledge of the following topics.
CIDR Notation & Types of IP Addresses: CIDR (Classless Inter-Domain Routing) notation is a way to define IP address ranges. Knowing the difference between public and private IP addresses and how IP ranges are defined helps you manage network resources in Kubernetes.
IPv4 & IPv6: Kubernetes supports dual-stack networking.
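Python's standard `ipaddress` module is a handy way to experiment with CIDR ranges. The ranges below are example values picked for illustration, not defaults you should expect in every cluster.

```python
import ipaddress

# An example pod network range (an assumption for this sketch).
pod_cidr = ipaddress.ip_network("10.244.0.0/16")
print(pod_cidr.num_addresses)                           # 65536
print(pod_cidr.is_private)                              # True
print(ipaddress.ip_address("10.244.3.7") in pod_cidr)   # True

# Split the range into /24 blocks, for example one per node.
node_subnets = list(pod_cidr.subnets(new_prefix=24))
print(len(node_subnets))                                # 256
print(node_subnets[1])                                  # 10.244.1.0/24

# A public address, for contrast.
print(ipaddress.ip_address("8.8.8.8").is_private)       # False

# The same module handles IPv6 ranges.
v6_cidr = ipaddress.ip_network("fd00:10:244::/56")
print(v6_cidr.version)                                  # 6
print(v6_cidr.is_private)                               # True
```

The number after the slash is the count of fixed leading bits, so a /16 leaves 16 bits (65,536 addresses) free and a /24 leaves 8 bits (256 addresses).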
OSI Layers (L2, L3, L4, L7):
- L2 (Data Link): Handles direct data transfer between devices on the same network.
- L3 (Network): Manages data routing between different networks (like using IP addresses).
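- L4 (Transport): Handles end-to-end delivery between applications on different hosts using ports (like TCP/UDP). TCP delivers data reliably and in order; UDP is lighter but gives no such guarantees.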
- L7 (Application): Involves data that's meaningful to applications (like HTTP, DNS).
SSL/TLS (One-way & Mutual TLS):
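- One-way TLS: Only the server presents a certificate. The client verifies the server's identity, and the traffic between them is encrypted.
- Mutual TLS (mTLS): Both the client and the server present certificates, so each end verifies the other before communicating.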
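The difference shows up clearly in how a server is configured. The sketch below uses Python's standard `ssl` module and only sets the verification mode; it loads no certificates, so it is a configuration illustration rather than a working server, and the file names in the comments are hypothetical.

```python
import ssl

# Client side: verifies the server's certificate in both modes.
client_ctx = ssl.create_default_context()

# One-way TLS server: presents its certificate, asks nothing of the client.
one_way_server = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)

# Mutual TLS server: additionally requires and verifies a client certificate.
mtls_server = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
mtls_server.verify_mode = ssl.CERT_REQUIRED

# A real setup would also load certificates, for example (hypothetical paths):
#   one_way_server.load_cert_chain("server.crt", "server.key")
#   mtls_server.load_cert_chain("server.crt", "server.key")
#   mtls_server.load_verify_locations("client-ca.crt")
#   client_ctx.load_cert_chain("client.crt", "client.key")  # needed for mTLS

print(client_ctx.verify_mode == ssl.CERT_REQUIRED)      # True
print(one_way_server.verify_mode == ssl.CERT_NONE)      # True
print(mtls_server.verify_mode == ssl.CERT_REQUIRED)     # True
```

In both modes the client checks the server. Mutual TLS simply adds the same check in the other direction.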
Proxy: A proxy acts as an intermediary between a client and a server, handling requests and responses. In Kubernetes, proxies help manage traffic, load balancing, and access control.
DNS (Domain Name System): DNS translates domain names (like google.com) into IP addresses. In Kubernetes, DNS helps services within the cluster find and connect to each other using names instead of IP addresses.
IPVS/iptables/nftables: These are tools used for routing and managing network traffic within Linux systems. Kubernetes uses these tools to manage how traffic is routed and load balanced inside the cluster.
Virtual Interfaces: Virtual interfaces are software-based network interfaces that allow multiple network connections to run over a single physical connection. In Kubernetes, virtual interfaces are used to connect pods to the network.
Overlay Networking: Overlay networking creates a virtual network on top of the existing physical network, allowing pods across different hosts to communicate as if they were on the same network.
This is essential in Kubernetes for connecting pods across different nodes in a cluster.
9. Linux Concepts
You need a fair amount of knowledge of the following Linux concepts.
- IPTables: Manages internal load balancing and routing.
- Filesystems: Handles data storage and management.
- Mount points: Directories where storage volumes attach to pods.
- Swap: Disk space used to supplement RAM (Used in specific use cases).
- Systemd: Manages services, including Kubernetes components.
- journalctl, syslog: Tools for accessing system logs.
- SELinux: Enforces security policies on pods.
- AppArmor: Restricts pod access to system resources.