You need quality engineers to turn AI into ROI
Pete Johnson, Field CTO, Artificial Intelligence at MongoDB, joins the podcast to say that looking at AI’s impact as a job killer is a flawed metric.
January 7, 2026
Pete Johnson, Field CTO, Artificial Intelligence at MongoDB, joins the podcast to talk about a recent OpenAI paper on the impact that AI will have on jobs and overall GDP. Pete, who reads the papers (and datasets) so you don’t have to, says that looking at AI’s impact as a job killer is a flawed metric. Instead, he and Ryan talk about how AI will be a collaborator for actual human workers, how embeddings and vectorization will move the productivity needle, and the five decisions you need to make to realize ROI on AI.
Episode notes:
If you’re curious, read the OpenAI blog post and paper yourself.
For those of you looking for inspiration, check out Werner Vogels' keynote from re:Invent 2025.
MongoDB provides a flexible and dynamic database that excels with AI data.
Connect with Pete on LinkedIn.
Congrats to Populist badge winner Scheff's Cat for dropping a banger of an answer on error: non-const static data member must be initialized out of line.
TRANSCRIPT
[Intro Music]
Ryan Donovan: Hello everyone, and welcome to the Stack Overflow Podcast, a place to talk all things software and technology. I'm your humble host, Ryan Donovan, and today we have a podcast sponsored by the fine folks at MongoDB, talking about the race to prove out agentic value. My guest today is MongoDB Field CTO, Pete Johnson. Welcome to the show, Pete.
Pete Johnson: Hi, Ryan. Thanks so much for having me.
Ryan Donovan: Of course. So, before we get into talking about this OpenAI paper, tell us a little bit about yourself. How did you get into software and technology?
Pete Johnson: I wrote my first line of code as a sixth grader in 1981.
Ryan Donovan: Wow.
Pete Johnson: And I'm one of those lucky people that was able to turn a childhood hobby into a, now, what is it, 31-plus-year career after college. So, I know that's a common story for a lot of people, but I asked for an Intellivision for Christmas of 1981, and if you know, you know.
Ryan Donovan: Yep.
Pete Johnson: I instead received a TRS-80 Color Computer, the 4K version, not the 16K. That came with a variant of the Microsoft BASIC interpreter called Color BASIC at the time, and I used it to write a little program that tracked rebounding and scoring stats for my sixth-grade basketball team.
Ryan Donovan: Nice. I think I also got the old Intellivision switcheroo, except with a Commodore 64.
Pete Johnson: Well, with the C64, you had a real disk drive. I had cassette tape storage on the TRS-80 Color.
Ryan Donovan: Right.
Pete Johnson: Or the CoCo, as people called it back then.
Ryan Donovan: So, obviously it's been a long journey from then. You've turned a hobby into a career.
Pete Johnson: Yeah, I did 20 years at HP, 17 of that in HP IT, where I wrote my first web application. It went into production in January of '96, about 13 months after the first W3C meeting. I became HP.com Chief Architect at the end of that HP IT tenure, and then I was one of the founding members of HP Cloud Services, which was HP's attempt to compete directly with AWS on top of OpenStack. And while that didn't work out for the company, it sure worked out for me personally. I moved out of engineering and into sales and marketing and went on to a couple of different startups. One was acquired by Cisco. My last stint prior to MongoDB, I was a field CTO at the services arm of CDW, and then I've been here since June.
Ryan Donovan: All right. Well, a lot has changed since the old TRS-80 days. Today everybody's talking about AI and agents, and as people try to get this to have real-world impact—I think I saw the stat that 95% of projects fail—people are asking, what's the ROI of this? And OpenAI had an interesting paper talking about the GDP impact, how they could evaluate the impact of agents and agentic tasks. Can you tell us a little bit more about this paper?
Pete Johnson: Yeah, sure. So, that paper, the GDPval paper. There was a blog article, there was a white paper, and then there was a dataset. And I'm the kind of guy that'll read everything to see where the goodness, or where the hidden stuff, might be, 'cause there's always some hiding that goes on in white papers. If you just look at the blog article, what that'll tell you is they looked at 44 occupations across different vertical sectors of the economy. They then went and hired experts with at least 14 years' experience in each one of those occupations, and had those people define 30 common tasks for each occupation. They then took a subset of those, five per occupation, and ran it through a version of a Turing test: they did a one-shot prompt to try to complete the task and fed that to an LLM. Then they found a person with a decent amount of experience in that occupation and asked them to complete the same task. Then they had an independent third party, a human being, evaluate which one was better, and from that they established a win rate between the human being and the different LLMs. And to their credit, OpenAI didn't just test OpenAI LLMs; they tested some of their competitors as well. That was the basic structure of the testing behind that white paper.
Ryan Donovan: Right. And like you said, you went through all three layers of this, down to the dataset. What was the interesting takeaway? What's the stuff that's hidden in there?
Pete Johnson: Well, if you just look at the blog article, sort of the glory graphic that was part of it showed the scores for each of the individual LLMs. I've got some notes here; I'll read 'em off real quick. At the time they were testing, Claude Opus 4.1 did the best with a score of 47.6, which meant it either won or tied, according to the human evaluator, on the tasks it was graded against. GPT-4o was the lowest-scoring of the seven they tested, at 12.4. So the way they laid it out in the blog article, they showed GPT-4o at 12.4, Grok 4 at 24.3, Gemini 2.5 Pro at 25.5, o4-mini-high at 27.9, o3-high at 34.1, and GPT-5-high at 38.8, before Claude Opus 4.1 at 47.6. And that was, like I said, kind of the glory diagram from the blog article. But if you look at the white paper, there was an even more interesting diagram, I thought, and I'll tell you, it's figure seven, on page seven.
Ryan Donovan: Right.
Pete Johnson: And it showed, in addition to the main testing, they also did some analysis of what happened when the AI and the people worked together. And that's when they saw really big gains. They showed, just with GPT-5-high, an improvement of about one and a half times on both speed and cost. And I think, you know, the glory statistic was about 'how close are we to AGI?' But really, when I read through the paper, it turned me into an AGI skeptic. It made me think we're entering an era where everybody's gonna be AI-enhanced and see cost and speed improvements similar to what they found in that figure seven.
[...]