
SE Radio 623: Michael J. Freedman on TimescaleDB

Michael J. Freedman, the Robert E. Kahn Professor in the Computer Science Department at Princeton University, as well as the co-founder and CTO of Timescale, spoke with SE Radio host Gavin Henry about TimescaleDB. They revisit what time series data means in 2024, the history of TimescaleDB, how it integrates with PostgreSQL, and they take the listeners through a complete setup. Freedman discusses the types of data well-suited for a timeseries database, the types of sectors that have these requirements, why PostgreSQL is the best, Pg callbacks, Pg hooks, C programming, Rust, their open source contributions and projects, data volumes, column-data, indexes, backups, why it is common to have one table for your timeseries data, when not to use timescaledb, IoT data formats, Pg indexes, how Pg works without timescaledb, sharding, and how to manage your upgrades if not using Timescale Cloud. Brought to you by IEEE Computer Society and IEEE Software magazine.

Show Notes

Related Episodes

  • SE Radio 484: Audrey Lawrence on Timeseries Databases
  • SE Radio 583: Lukas Fittl on Postgres Performance
  • SE Radio 511: Ant Wilson on Supabase (Postgres as a Service)
  • SE Radio 362: Simon Riggs on Advanced Features of PostgreSQL

From IEEE

From the Show


Transcript

Transcript brought to you by IEEE Software magazine and IEEE Computer Society. This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number.

Gavin Henry 00:00:18 Welcome to Software Engineering Radio. I’m your host Gavin Henry, and today my guest is Michael J. Freedman. Michael J. Freedman is the Robert E. Kahn Professor in the Computer Science department at Princeton University, as well as the co-founder and CTO of Timescale, building a category-defining relational database and cloud platform for time series data. His work broadly focuses on distributed systems, networking, and security, and has led to commercial products and deployed systems reaching millions of users daily. Honors include (right, here’s a big list) the ACM Grace Murray Hopper Award, the ACM SIGOPS Mark Weiser Award, I think, ACM Fellow, the Presidential Early Career Award (that sounds like a good one), Sloan Fellow, and DARPA CSSG member. Mike, welcome to Software Engineering Radio. Is there anything I missed in that impressive bio that you’d like to add?

Mike Freedman 00:01:08 Thanks for having me, Gavin. I think we academics like to give ourselves accolades, so there you go.

Gavin Henry 00:01:14 Very impressive. So we’re going to have a chat about what time series data means in 2024, why TimescaleDB is needed, and how it integrates with PostgreSQL. Lastly, we’ll close off discussing a full example use case. So Mike, let’s start with time series data. How would you define it today, in 2024?

Mike Freedman 00:01:34 We think of time series data as any type of metric or event information that you generally want to collect over time because it changes. This is a type of workload that often has a pattern that makes it look append-mostly. So you are often collecting these either regular or irregular streams of data, and you want to collect it over time because what’s valuable to you, for your business, for your product, is not only what’s happening now but the ability to look back on how that information changes over time, and to use that to understand the past and also predict the future.
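The append-mostly pattern Freedman describes can be sketched in a few lines of plain Python. The `TimeSeries` class below is purely illustrative (it is not TimescaleDB's API): readings are only ever appended, never overwritten, so the history stays available for looking back.

```python
from datetime import datetime, timedelta

class TimeSeries:
    """Toy append-mostly store: new readings are appended, and old
    readings are kept so past behavior can still be queried."""

    def __init__(self):
        self.readings = []  # (timestamp, value) pairs, append-only

    def append(self, ts, value):
        self.readings.append((ts, value))

    def window(self, start, end):
        """Look back over a half-open time range [start, end)."""
        return [v for ts, v in self.readings if start <= ts < end]

# Collect a regular stream of readings, one per minute
series = TimeSeries()
t0 = datetime(2024, 6, 1, 12, 0)
for i in range(10):
    series.append(t0 + timedelta(minutes=i), 20.0 + i)

# Historical values stay queryable alongside the latest one
recent = series.window(t0, t0 + timedelta(minutes=5))
latest = series.readings[-1][1]
```

The essential point is that the write path is a cheap append while the read path spans arbitrary time windows, which is what distinguishes this workload from update-heavy transactional tables.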

Gavin Henry 00:02:21 What sort of common timeframes do you see in what we understand time series data to be?

Mike Freedman 00:02:26 I think it depends a lot on the application. We have users who really care about what happened in the last 24 hours. We often have users in different domains; for example, in the energy space, in renewables, in things like oil and gas, there are often compliance reasons and historical reasons to understand seasonality, where they store it going back 10 years and use that, also to understand what seasonal trends are and how the data changes over time. That often translates to features that you end up building, or that we end up building as part of our database, to recognize that how you might access information from the last few minutes or hours might be very different from how you want to access data, or the derived roll-ups you have on that data, stretching back months or years.
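The "derived roll-ups" Freedman mentions can be illustrated with a small sketch: raw readings are bucketed by hour and averaged, the kind of pre-aggregated view you would query for months-back analysis instead of scanning raw rows. This is a hand-rolled analogue, not TimescaleDB's continuous aggregates; the `hourly_rollup` function is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def hourly_rollup(readings):
    """Roll raw (timestamp, value) readings up into per-hour averages,
    a derived aggregate suited to queries over long time ranges."""
    buckets = defaultdict(list)
    for ts, value in readings:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: sum(vs) / len(vs) for hour, vs in buckets.items()}

# Raw readings every 15 minutes across two hours
t0 = datetime(2024, 6, 1, 12, 0)
raw = [(t0 + timedelta(minutes=15 * i), float(i)) for i in range(8)]

rollup = hourly_rollup(raw)
# The 12:00 bucket averages values 0..3; the 13:00 bucket averages 4..7
```

In a real deployment the roll-up would be maintained incrementally as data arrives, so recent queries hit the raw data while historical queries hit the much smaller aggregate.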

Gavin Henry 00:03:22 And what in particular makes this type of data difficult to work with? For example, the volumes we get, the time spans, or…

Mike Freedman 00:03:31 Yeah, I think when we set out to build Timescale, we did it really because of the scale and performance that this domain needs. Both, and you even alluded to this, the volume and the velocity of the data are often different than you might see in a traditional e-commerce or transactional application. You often have data arriving at a very fast rate, and because you are also keeping the historical data, you don’t just delete it after a short period of time. So that translates to large data volumes, and yet at the same time you want to actually run operational workloads. And by that I mean you are using this data; you don’t throw all this data in some data lake or data warehouse and never look at it again. When people are coming to Timescale, they’re doing so because they’re building operational applications.

Mike Freedman 00:04:24 And by that I mean we actually have developers who are using this to build products on top of. They’re showing those products; they’re typically customer-facing, they’re mission critical, and so they also need responsiveness and good performance as they continue to build those applications, run the APIs, and show the live dashboards to their customers, as opposed to somewhere the data just sits around for a while with occasional ad hoc reporting on top of it. So it puts a lot of demand on what you technically need from a database in order to satisfy those requirements.

Gavin Henry 00:04:56 And I know you mentioned, Mike, that they’re operational. So in my head I presume the intervals of the data we’re working with need to be consumed, processed, and available in the ops tools and systems, alarming, monitoring, whatever it is you’re using that data and consuming it for, pretty close to the intervals it’s arriving at. Or are there lags that you can accept? How does that look?

[...]

