
SE Radio 610: Phillip Carter on Observability for Large Language Models

Phillip Carter, Principal Product Manager at Honeycomb and open source software developer, talks with host Giovanni Asproni about observability for large language models (LLMs). The episode explores similarities and differences for observability with LLMs versus more conventional systems. Key topics include: how observability helps in testing parts of LLMs that aren't amenable to automated unit or integration testing; using observability to develop and refine the functionality provided by the LLM (observability-driven development); using observability to debug LLMs; and the importance of incremental development and delivery for LLMs and how observability facilitates both. Phillip also offers suggestions on how to get started with implementing observability for LLMs, as well as an overview of some of the technology's current limitations. This episode is sponsored by WorkOS.





Show Notes

SE Radio

  • Episode 594 – Sean Moriarity on Deep Learning with Elixir and Axon
  • Episode 582 – Leo Porter and Daniel Zingaro on Learning to Program with LLMs
  • Episode 556 – Alex Boten on OpenTelemetry
  • Episode 534 – Andy Dang on AI / ML Observability
  • Episode 522 – Noah Gift on MLOps

Links


Transcript

Transcript brought to you by IEEE Software magazine and IEEE Computer Society. This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number.

Giovanni Asproni 00:00:18 Welcome to Software Engineering Radio. I’m your host Giovanni Asproni, and today I will be discussing observability for large language models with Phillip Carter. Phillip is a product manager and open-source software developer, and he’s been working on developer tools and experiences his entire career, building everything from compilers to high-level IDE tooling. Now he’s working out how to give developers the best experience possible with observability tooling. Phillip is the author of Observability for Large Language Models, published by O’Reilly. Phillip, welcome to Software Engineering Radio. Is there anything I missed that you’d like to add?

Phillip Carter 00:00:53 No, I think that about covers it. Thanks for having me.

Giovanni Asproni 00:00:56 Thank you for joining us today. Let’s start with some terminology and context to introduce the subject. So first of all, can you give us a quick refresher on observability in general, not specifically for large language models?

Phillip Carter 00:01:10 Yeah, absolutely. So observability is, well, unfortunately in the market it’s kind of a word that every company that sells observability tools sort of has their own definition for, and it can be a little bit confusing. Observability can sort of mean anything that a given company says that it means, but there is actually a real definition and a real set of problems being solved that I think it’s better to root such a definition in. So the general principle is that when you’re debugging code and it’s easy to reproduce something on your own local machine, that’s great. You just have the code there, you run the application, you have your debugger, maybe you have a fancy debugger in your IDE or something that helps you with that and gives you more information. But that’s sort of it. What if you can’t do that?

Phillip Carter 00:01:58 Or what if the problem is some interconnectivity issue between other components of your systems and your own system? Or what if it is something that you could pull down on your machine, but you can’t necessarily debug it and reproduce the problem that you’re observing, because there are maybe 10 or 15 factors that all go into a particular behavior that an end user is experiencing but that you can’t seem to actually reproduce yourself? How do you debug that? How do you actually make progress, given that you can’t just have that poor behavior exist in production in perpetuity? Your business is probably going to go away if that’s the case; people are going to move on. So that’s what observability is trying to solve. It’s about being able to determine what is happening, the ground truth of what is going on when your users are using things that are live, without needing to change that system or debug it in the traditional sense.

Phillip Carter 00:02:51 And so the way that you accomplish that is by gathering signals, or telemetry, that capture important information at various stages of your application, and you have a tool that can then take that data and analyze it. Then you can say, okay, we are observing, let’s say, a spike in latency or something like that, but where is that coming from? What are the factors that go into that? What are the things happening on the output that can give us a little bit better signal as to why something is happening? And you’re really answering two fundamental questions: where is something occurring, and, to the extent that you can, why is it occurring in that way? And depending on the observability tool that you have and the richness of the data that you have, you may be able to get to very fine-grained detail: this specific user ID, in this specific region, in this specific availability zone where you’ve deployed into the cloud, is what is most correlated with the spike in latency.
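The workflow Carter describes, emitting attribute-rich telemetry per request and then asking which attribute value is most correlated with a latency spike, can be sketched in a few lines of plain Python. This is only an illustrative toy, not Honeycomb’s product or the OpenTelemetry API: the event fields (`user_id`, `region`, `duration_ms`) and the `slowest_attribute` helper are hypothetical names invented for this example, and a real system would emit these events through an instrumentation library rather than hand-built dicts.

```python
from collections import defaultdict

def slowest_attribute(events, threshold_ms, min_count=2):
    """Return the (attribute, value) pair whose events most often
    exceed threshold_ms, considering only pairs seen min_count+ times."""
    slow = defaultdict(int)   # (attr, value) -> count of slow requests
    total = defaultdict(int)  # (attr, value) -> count of all requests
    for event in events:
        is_slow = event["duration_ms"] > threshold_ms
        for key, value in event.items():
            if key == "duration_ms":
                continue  # the measurement itself is not a candidate attribute
            total[(key, value)] += 1
            if is_slow:
                slow[(key, value)] += 1
    # Rank each attribute value by the fraction of its requests that were slow.
    candidates = [(slow[k] / total[k], k) for k in total if total[k] >= min_count]
    return max(candidates)[1] if candidates else None

# One "wide event" per request, carrying whatever context might matter later.
events = [
    {"user_id": "u1", "region": "us-east-1", "duration_ms": 40},
    {"user_id": "u2", "region": "eu-west-1", "duration_ms": 900},
    {"user_id": "u3", "region": "eu-west-1", "duration_ms": 850},
    {"user_id": "u1", "region": "us-east-1", "duration_ms": 55},
]

print(slowest_attribute(events, threshold_ms=500))  # → ('region', 'eu-west-1')
```

The design point is the one made in the episode: because each event carries many attributes, the same data can answer questions you didn’t anticipate when you instrumented the code; the analysis step, not the emission step, decides which dimension matters.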

Phillip Carter 00:03:46 And that allows you to very narrowly isolate something that’s going on. There is a more academic definition of observability that comes from control theory, which is that you can understand the state of a system without having to change that system. I find that to be less helpful, though, because most developers, I think, care about problems that they observe in the real world, sort of what I mentioned, and what they can do about those problems. And so that’s what I try to keep a definition of observability rooted in. It’s about asking questions about what’s going on and continually getting answers that help you narrow down behavior that you’re seeing, whether that’s an error or a spike in latency, or maybe something is actually fine but you’re just curious how things are actually performing and what healthy performance even means for your system.

[...]

