AI attention span so good it shouldn’t be legal
We have another two-for-one special this week, with two more interviews from the floor of re:Invent.
February 6, 2026
Credit: Alexandra Francis
First, Ryan welcomes Pathway CEO Zuzanna Stamirowska and CCO Victor Szczerba to dive into their development of Baby Dragon Hatchling, the first post-transformer frontier model, from how continual learning and memory will transform AI to the real-world use cases for longer LLM attention span.
In the second part of this episode, Ryan is joined by Rowan McNamee, co-founder and COO of Mary Technology, to discuss bringing AI into the carefully governed world of litigation and how LLMs are helping lawyers manage and interpret the vast amounts of legal evidence that pass across their desks every day.
Pathway is building the first post-transformer frontier model that solves for attention span and continual learning.
Mary Technology is an AI for attorneys that turns evidentiary documents into structured, easy-to-review facts.
Connect with Zuzanna on LinkedIn and Twitter.
Reach out to Victor at his email: victor@pathway.com
Connect with Rowan on LinkedIn.
We want to know what you're using to upskill and learn in the age of AI. Take this five-minute survey on learning and AI to have your voice heard in our next Stack Overflow Knows Pulse Survey.
TRANSCRIPT
[Intro Music]
Ryan Donovan: Hello, and welcome to the Stack Overflow Podcast, a place to talk all things software and technology. I'm your host, Ryan Donovan, and today we have two recordings from AWS re:Invent, recorded on the floor. We have interviews with Pathway and Mary Technology, so please enjoy.
Ryan Donovan: I'm here at re:Invent talking about models other than transformers, the next level, and I'm here with Zuzanna Stamirowska, CEO of Pathway, and Victor Szczerba, Chief Commercial Officer of Pathway. So, welcome to the show. Can you tell me a little bit about what Pathway is doing?
Zuzanna Stamirowska: Yeah, absolutely. Hey, thank you very much for having us.
Ryan Donovan: Of course.
Zuzanna Stamirowska: So, Pathway is building the first post-transformer frontier model, which resolves the fundamental problem of current LLMs: the question of memory. The models that we are actually training right now will be capable of continual learning, capable of long-term reasoning, and of adaptation; imagine live AI. So, this is what we're building, and this is innovation which is very deep. We took a first-principles view on how intelligence works, and how the transformer works; rolled back a bit in history and rethought all of it from that first-principles view; and then looked a little bit at the brain, how it works, and found a link between transformers and the brain. And we published bits of what we were doing already. We published the BDH (Baby Dragon Hatchling) architecture, which was trending on Hugging Face in October. And yes, this is the beginning of the post-transformer era.
Ryan Donovan: The model, is it still a neural net? Would it be familiar if somebody looked at the weights and biases of a transformer model? Would it be familiar, or is it something completely different?
Zuzanna Stamirowska: Yes and no.
Ryan Donovan: Yeah.
Zuzanna Stamirowska: So, first of all, maybe a little bit of background. Right now, almost all of the models that we see out there feel the same because they are the same. They're transformer-based, and there was a brute-force approach: we put in more data, more compute, more layers, more everything, and then this will just get better. We've seen there won't be enough energy to actually power all the inferences, and we see that LLMs (specifically LLMs seen as just scaling with more data, et cetera) won't get us to AGI. So, right now, it's even OpenAI researchers saying that openly. And then, in terms of what we've done, we looked a little bit at– we even rethought attention. So, yes, it's very different. The way our model works is way closer to the brain. So, [the] brain is a beautiful structure where you have neurons. Neurons can be viewed as small computational entities, simply this, small computational entities. And neurons are connected to each other, forming a network of connections. It's a physical system with local activations. So, we have a brain, which is pretty large. Specifically, we have 100 billion neurons and a thousand trillion synaptic connections, which are still packed in a very efficient structure, because our heads [have] to be light. [The brain] has to fit into our frame. The head has to be light; we have to be able to walk on two feet and not fall over. The brain is super efficient. It is capable of generalizing over time, and it is capable of lifelong learning. We're born, we learn, we taste soap once, and we know we shouldn't be eating soap. We don't need to see all the soap data, or taste soap thousands of times, before we understand that soap is not good for you. The brain is a physical system that exists, that has all the required properties that we would love to have in an AGI. So, what we did is we kind of looked at the brain a little bit.
Scientists looked at birds to design planes, and in the same way we looked at the transformer and asked, 'Okay, what is missing in the transformer to get us closer to the brain?' So, we found that link, and the model and the architecture work in the following way. We have neurons, we have synapses. When a new token of information arrives (it may arrive at any time), we have neurons that fire up. So, we have one neuron, for example, that fires up and then sends a message to its neighbors, to whom it's connected by wire, right? By the synapse. So, it passes the message. Let's say a certain threshold of importance is reached for the neighbor, and the neighbor fires up as well. This is a basic principle of something that's called Hebbian learning. It's actually a very simple brain model, really. But these interactions are local. So, this means it's a very simple rule: I have a message, I send you a letter, and if you care enough, you fire up as well. If you fire up, the synapse, the connection between us, becomes stronger. And since the connection becomes stronger, the association becomes stronger as well. Ultimately, it gives us intrinsic memory. So, we actually do have memory inside of the architecture itself, and we have local dynamics. And this locality gives a lot of nice features. One is the fact that it's extremely computationally efficient, because we don't fire up huge matrices; we literally just apply all the principles of distributed computing. It distributes nicely because you can shard easily. So, we can distribute it differently than with a transformer. [When] you have this energy efficiency, memory is a given.
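The local "fire and strengthen" rule described above can be sketched in a few lines of NumPy. This is a hypothetical illustration of plain Hebbian learning (neurons whose incoming signal crosses a threshold fire, and synapses between co-firing neurons grow stronger); all names, sizes, and thresholds here are illustrative, not Pathway's actual BDH implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons = 8

# Synapse weights between neurons; zero on the diagonal (no self-connections).
w = rng.uniform(0.0, 0.5, size=(n_neurons, n_neurons))
np.fill_diagonal(w, 0.0)

threshold = 0.3      # how much incoming signal a neighbor needs to fire
learning_rate = 0.1  # how much a co-firing synapse strengthens

def step(active):
    """One local update: active neurons pass messages along their synapses;
    neighbors whose weighted input crosses the threshold fire too, and every
    existing synapse between two co-firing neurons is strengthened."""
    signal = w.T @ active                              # messages arriving at each neuron
    fired = ((signal >= threshold) | (active > 0)).astype(float)
    # Hebbian rule: strengthen connections between pairs that fired together.
    w_update = learning_rate * np.outer(fired, fired)
    np.fill_diagonal(w_update, 0.0)
    w[:] = w + w_update * (w > 0)                      # only existing synapses grow
    return fired

activity = np.zeros(n_neurons)
activity[0] = 1.0        # a new token of information activates neuron 0
for _ in range(3):
    activity = step(activity)
```

Note that every operation is local to a neuron and its neighbors, which is what makes this kind of dynamics easy to shard across machines, as described above; the strengthened weights in `w` act as the intrinsic memory that persists between tokens.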
Ryan Donovan: It sounds like– we talked about the synapses and nodes. How do you represent that in a sort of storage and computation manner? With a neural net, it's just a series of floats, a thousand things in an array, with billions and billions of parameters. Is this the same sort of vector math, the same sort of array, or something completely different?
[...]