SE Radio 591: Yechezkel Rabinovich on Kubernetes Observability
Yechezkel Rabinovich, CTO of Groundcover, speaks with host Philip Winston about observability and eBPF as it applies to Kubernetes. Rabinovich was previously the chief architect at the healthcare security company CyberMDX and spent eight years in the cyber security division of the Israeli Prime Minister’s Office. This episode explores the three pillars of observability, extending the Linux kernel with eBPF, the basics of Kubernetes, and how Groundcover uses eBPF as the basis for its observability platform.
Show Notes
- Yechezkel Rabinovich
- Twitter – https://twitter.com/yechezkel__
- Groundcover – https://www.groundcover.com/
- Groundcover Blog – https://www.groundcover.com/blog
- eBPF – https://ebpf.io/
- eBPF Cilium – https://cilium.io/
- eBPF Verifier – https://docs.kernel.org/bpf/verifier.html
- Kubernetes – https://kubernetes.io/
- APM Golden Signals – https://sre.google/sre-book/monitoring-distributed-systems/
- Prometheus – https://prometheus.io/
- Grafana – https://grafana.com/
- Commonground Case Study – https://www.groundcover.com/customer-stories/commonground
Related Episodes
- Episode 455: Jamie Riedesel on Software Telemetry
- Episode 446: Nigel Poulton on Kubernetes Fundamentals
- Episode 334: David Calavera on Zero-downtime Migrations and Rollbacks with Kubernetes
- Episode 319: Nicole Hubbard on Migrating from VMs to Kubernetes
- Episode 445: Thomas Graf on eBPF (extended Berkeley Packet Filter)
Transcript
Transcript brought to you by IEEE Software magazine and IEEE Computer Society. This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number.
Philip Winston 00:00:18 Welcome to Software Engineering Radio. My guest is the CTO and co-founder at Groundcover, which provides full-stack observability for Kubernetes. He was previously the chief architect at the healthcare security company CyberMDX, and spent eight years in the cybersecurity division of the Israeli Prime Minister’s Office. He holds degrees in electrical engineering, physics, and biomedical engineering. First, I’d like to ask you to pronounce your name and if there’s anything you’d like to add to your bio.
Yechezkel Rabinovich 00:00:49 Hey, so it’s Rabinovich, but you can call me Chaz. It’ll be easier. Yeah, I’ve mainly worked around Linux embedded software and distributed systems in the last 10 years or so.
Philip Winston 00:01:01 Great. This episode is going to be at the intersection of three technologies: observability, eBPF, and Kubernetes. I’m not going to go into too much depth on any one of these, so I’m going to list three episodes here that covered them in more detail: Episode 455, Jamie Riedesel on Software Telemetry; Episode 446, Nigel Poulton on Kubernetes Fundamentals; and Episode 445, Thomas Graf on eBPF. Let’s start with observability. What is observability?
Yechezkel Rabinovich 00:01:34 So it’s common to think about observability as three pillars of data. One is logging: the text messages we’re creating from our applications. The other one is metrics, which are basically counters and gauges that our applications are creating. You can think about it like the speed of a car, right? That’s a gauge, or the amount of fuel you have left in your car. And the third one is tracing, which are samples of data that represent interactions between two services. So if you’re making an HTTP request, that will be a trace, or a span, which is part of a trace. And observability is the ability to query all three in a meaningful way to troubleshoot or understand the state of the application. It could be for security, it could be for performance investigations. Basically everything that developers are interested in.
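To make the three pillars concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not how Groundcover or any particular tool implements them: the `Counter`, `Gauge`, and `Span` classes, the `checkout` service name, and the `handle_request` function are all hypothetical names invented for this example.

```python
import logging
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional

# Pillar 1: logs, the text messages an application emits.
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("checkout")  # hypothetical service name


# Pillar 2: metrics. A counter only goes up; a gauge is a current reading
# (like the speed of a car or the fuel left in the tank).
class Counter:
    def __init__(self) -> None:
        self.value = 0

    def inc(self, amount: int = 1) -> None:
        self.value += amount


class Gauge:
    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value


# Pillar 3: traces. A span records one interaction, such as an HTTP
# request; spans that share a trace_id belong to the same trace.
@dataclass
class Span:
    trace_id: str
    name: str
    start: float = field(default_factory=time.monotonic)
    end: Optional[float] = None

    def finish(self) -> None:
        self.end = time.monotonic()


requests_total = Counter()
in_flight = Gauge()


def handle_request(path: str) -> Span:
    """Handle one (simulated) HTTP request and emit all three signals."""
    span = Span(trace_id=uuid.uuid4().hex, name=f"GET {path}")
    requests_total.inc()
    in_flight.set(in_flight.value + 1)
    log.info("handling %s trace_id=%s", path, span.trace_id)
    # Real request handling would happen here.
    in_flight.set(in_flight.value - 1)
    span.finish()
    return span
```

Calling `handle_request("/cart")` writes one log line, increments the counter, moves the gauge up and back down, and returns a finished span. Putting the trace ID in the log line is one common way to let the three kinds of data be queried together.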
Philip Winston 00:02:24 From my experience, observability can make a huge difference to sort of the quality of life of the developer trying to get to the bottom of some problem. Can you give an example of a system that didn’t have good observability that was a struggle to work with, and contrast that with something that had sort of full observability and how that accelerated the debugging or the investigation?
Yechezkel Rabinovich 00:02:49 Oh yeah, actually I have a good example. I think that was the trigger for me to start Groundcover. So we had a problem with our platform where customers experienced data loss, and we had a complex data pipeline. So, you know, 30 microservices all talking to each other with message queues and Redis and API calls and everything you can think of from a modern application. And so, you know, where do you start, right? We got this lead from our customers saying, you know, we are missing data, but where do you start? At that time, we had a lot of logs that we paid a lot for, but you know, you can’t really read through 20 or 30 million log lines and understand what’s going on. So we decided to instrument our application with a lot of counters that would represent the pipeline of the system.
Yechezkel Rabinovich 00:03:45 So it took us somewhere around two months to instrument. We had somewhere around 1 million counters, and then we could finally detect where the leakage was and start solving it. But this process was a nightmare, right? So we worked two months just to see where the problem is. On the other hand, you know, when we started Groundcover, we knew that we had to set an example of how you observe and monitor production. So we built our entire stack from day one to have performance monitors and meaningful logs and traces that would help us troubleshoot and investigate any performance issue. So, you know, it feels like the difference between using a paper map and a navigation app. That’s the difference for me. You know, it’s something guided; you don’t need to search, you just see answers.
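The counter-based investigation described above can be sketched as follows: count records as they enter the pipeline and after each stage, then compare neighboring counts to see where records disappear. The episode does not describe the actual instrumentation, so everything here is an assumption made for illustration, including the stage names, the simulated bug in `enrich`, and the `run` and `find_leaks` helpers.

```python
from collections import Counter


def parse(record):
    return record


def enrich(record):
    # Simulated bug: records without a "user" key are silently dropped.
    return record if "user" in record else None


def store(record):
    return record


# Hypothetical pipeline: each stage returns the record, or None if it drops it.
PIPELINE = [("parse", parse), ("enrich", enrich), ("store", store)]


def run(records):
    """Push records through the pipeline, counting survivors per stage."""
    counts = Counter()
    for record in records:
        counts["received"] += 1
        for name, stage in PIPELINE:
            record = stage(record)
            if record is None:
                break  # the record was lost at this stage
            counts[name] += 1
    return counts


def find_leaks(counts):
    """Return (stage, lost) pairs where fewer records came out than went in."""
    order = ["received"] + [name for name, _ in PIPELINE]
    leaks = []
    for upstream, downstream in zip(order, order[1:]):
        lost = counts[upstream] - counts[downstream]
        if lost > 0:
            leaks.append((downstream, lost))
    return leaks
```

With three input records, one of which lacks a `"user"` key, `find_leaks(run(records))` points at the `enrich` stage. In a real system each counter would live in a different service and be exported to a metrics backend, which is part of why instrumenting 30 microservices by hand took so long.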
Philip Winston 00:04:36 Yeah, it’s a major difference. Let’s move on to eBPF. eBPF is a technology that’s used for many different purposes. We’re going to talk about observability, but let’s just talk a little bit about the technology first. So what is eBPF?
[...]