SE Radio 572: Gregory Kapfhammer on Flaky Tests
Gregory Kapfhammer, associate professor at Allegheny College, discusses the common problem of ‘flaky tests’ with SE Radio’s Nikhil Krishna. Flaky tests are test cases that unreliably pass or fail even when no changes are made to the source code under test or to the test suite itself, which means that developers can’t tell whether the failures indicate bugs that need to be resolved. Flaky tests can hinder continuous integration and continuous development by undermining trust in the CI/CD environment. This episode examines sources of flaky tests, including physical factors such as CPU or memory changes, as well as program-related factors such as performance issues. Gregory also describes some common areas that are prone to flaky tests and ways to detect them. They discuss tooling to detect and automatically mark flaky tests, as well as how to tackle these issues to make tests more reliable and even ways to write code so that it’s less susceptible to flaky tests.
Show Notes
References
- “A Survey of Flaky Tests,” Transactions on Software Engineering and Methodology, 31:1, 2022, https://www.gregorykapfhammer.com/research/papers/Parry2022/
- “Evaluating Features for Machine Learning Detection of Order- and Non-order-dependent Flaky Tests,” Proc. the 15th Int’l Conference on Software Testing, Verification and Validation, 2022, https://www.gregorykapfhammer.com/research/papers/Parry2022a/
- “Surveying the Developer Experience of Flaky Tests,” Proc. the 44th Int’l Conference on Software Engineering – Software Engineering in Practice Track, 2022, https://www.gregorykapfhammer.com/research/papers/Parry2022b/
- “What do developer-repaired flaky tests tell us about the effectiveness of automated flaky test detection?” Proc. the 3rd Int’l Conference on Automation of Software Test, 2022, https://www.gregorykapfhammer.com/research/papers/Parry2022c/
Links
Related SE Radio Episodes
- 474 – Paul Butcher on Fuzz Testing
- 461 – Michael Ashburne and Maxwell Huffman on Quality Assurance
- 322 – Bill Venners on Property-Based Tests
- 283 – Alexander Tarlinder on Developer Testing
- 256 – Jay Fields on Working Effectively with Unit Tests
Transcript
Transcript brought to you by IEEE Software magazine.
This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number and URL.
Nikhil 00:00:16 Hello and welcome to Software Engineering Radio. I’m your host, Nikhil, and in today’s Software Engineering Radio episode, we welcome Gregory Kapfhammer, an associate professor at Allegheny College with a PhD from the University of Pittsburgh. His research focuses on software engineering and testing, particularly on flaky tests. Gregory is involved in various academic roles, including associate editor, program committee member, and reviewer. He also contributes to open-source software testing and analysis tools on GitHub. Our discussion will center around the common problem of flaky tests encountered in software development. So welcome to the show, Gregory, and was there anything in the bio that you might want to add that I missed?
Greg Kapfhammer 00:01:05 First of all, let me say thank you for inviting me to chat today about flaky tests. I’m excited to be a guest on Software Engineering Radio, and I think everything that you shared was wonderful. Let’s dive into the conversation.
Nikhil 00:01:17 Great, let’s start with the basics, right? So what is a flaky test and why as a software developer should I care about flaky tests?
Greg Kapfhammer 00:01:26 So a flaky test is a test case that passes or fails even when you don’t change the source code of the program under test or the test suite itself. This is a real challenge for software developers because the quality of the test case signal is diminished. This means that a test might pass sometimes and then fail sometimes, and the developer won’t be able to know whether the failures actually indicate that there’s a bug in the program that they need to resolve.
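[Editor’s note: The definition above can be illustrated with a minimal Python sketch. This example is not from the episode; the function `get_response_time_ms` and the latency range are hypothetical stand-ins for a nondeterministic dependency such as a network call.]

```python
import random

# Hypothetical function under test: simulates measuring a server's
# response time in milliseconds. The value varies from run to run,
# standing in for real nondeterminism like network latency.
def get_response_time_ms():
    return random.uniform(50, 250)

# Flaky: the assertion threshold (200 ms) lies inside the range of
# possible values, so the test passes on some runs and fails on others
# even though neither the code nor the test has changed.
def test_response_is_fast_flaky():
    assert get_response_time_ms() < 200

# More reliable: the test controls its source of nondeterminism
# (here, by seeding the random generator) so every run is identical.
def test_response_is_fast_reliable():
    random.seed(42)  # pin the simulated latency for repeatability
    assert get_response_time_ms() < 250
```

The flaky version fails intermittently with no code change, which is exactly the loss of “test case signal” described above; the reliable version makes the nondeterministic input deterministic inside the test.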
Nikhil 00:01:59 That sounds kind of annoying, obviously. That’s not great, right? So what exactly does it kind of hinder, right? So obviously one is basically it’s not great to have a test that sometimes passes and sometimes fails, but are there any other process-driven things that might get impacted?
Greg Kapfhammer 00:02:17 Yeah, that’s a really good point. So when you have a flaky test in your test suite, it often hinders continuous integration and continuous development. So you might have a build that fails in continuous integration and you wonder why it fails, and you also wonder why it sometimes passes and sometimes fails. So if that’s due to a flaky test, then it causes you as a developer to stop trusting the CI and CD environment and therefore limits your ability to add new features or to introduce bug fixes.
Nikhil 00:02:52 Yeah, that sounds like something that I would do as well. If I had a CI/CD environment that’s flaky and there’s something to be delivered, I’d be like, oh yeah, it fails half the time. I’m going to just throw the dice, as they say. Maybe we can talk about a real-world example that you might have encountered where this is common.
Greg Kapfhammer 00:03:12 Yeah, that’s a really good question. So I myself have implemented a number of software testing or automated assessment tools that I have released to GitHub. One of the tools that I have built is an automated assessment tool that I use when I am checking my students’ work. There were some order-dependent test cases within that test suite, and so if I ran the tests in a different order than the standard order, the test cases would sometimes fail in a flaky fashion. This made it really difficult for me as a developer to know whether or not the new feature that I had added was correct, and the test case was failing in a flaky fashion, or there was actually a real bug inside of my program.
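[Editor’s note: The order-dependent pattern Gregory describes can be sketched in a few lines of Python. This is an illustrative reconstruction, not code from his tool; the `scores` list and `record_score` function are hypothetical.]

```python
# Shared mutable state across tests -- the root cause of the
# order dependence. Real examples include module globals, caches,
# files on disk, or database rows left behind by earlier tests.
scores = []

def record_score(value):
    """Hypothetical grading helper: record a score, return the running average."""
    scores.append(value)
    return sum(scores) / len(scores)

def test_first_average():
    # Passes only when it runs before any other test mutates `scores`.
    assert record_score(90) == 90

def test_second_average():
    # Silently assumes `scores` already holds the 90 recorded by the
    # previous test, so (90 + 70) / 2 == 80.
    assert record_score(70) == 80
```

Run in the standard order, both tests pass; run `test_second_average` first (or alone), it fails, because it depends on state left behind by its neighbor. The usual repair is to reset shared state before each test (for example, in a pytest fixture or `setup_function`) so every test starts from a known baseline.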
Nikhil 00:03:59 Right. I’m sure you must have given your student an earful about that. So, okay, so you talked about order-dependent and non-order-dependent. So is that a way to categorize flaky tests, as order-dependent or not? And maybe you could give us some examples of what a non-order-dependent flaky test looks like, if you can.
[...]