SE Radio 611: Ines Montani on Natural Language Processing
Ines Montani, co-founder and CEO of Explosion, speaks with host Jeremy Jung about solving problems using natural language processing (NLP). They cover generative vs predictive tasks, creating a pipeline and breaking down problems, labeling examples for training, fine-tuning models, using LLMs to label data and build prototypes, and the spaCy NLP library.
Show Notes
Conference talks
SE Radio
- 391 – Jeremy Howard on Deep Learning and fast.ai
- 493 – Ram Sriharsha on Vectors in Machine Learning
Transcript
Transcript brought to you by IEEE Software magazine and IEEE Computer Society. This transcript was automatically generated. To suggest improvements in the text, please contact [email protected] and include the episode number.
Jeremy Jung 00:00:46 Hey, this is Jeremy Jung for Software Engineering Radio. Today I’m talking to Ines Montani. She’s the co-founder and CEO of Explosion, which is a company specializing in developer tools for machine learning and natural language processing. She’s also a core developer of the spaCy NLP library and the Prodigy annotation tool. Ines, welcome to Software Engineering Radio.
Ines Montani 00:01:08 Hi, thanks for having me. Very excited.
Jeremy Jung 00:01:11 So I think the first thing we could start with is defining NLP. So what is NLP for people who aren’t familiar?
Ines Montani 00:01:17 Yeah, I mean it’s a good question because actually, if you look around on the internet or follow the field, you might actually find slightly different definitions. I would describe it as processing large volumes of text. So you have text, and you want to find something out about that text. And more recently a lot of people have also included more general natural language understanding in the term NLP. So even if you have a chatbot that generates text, something like ChatGPT, which most people are familiar with, that’s usually also defined under that umbrella even though it’s a slightly different task than really just processing the text. But basically, I think the underlying question is there’s text and you want to use a computer to do something with it. That’s NLP.
Jeremy Jung 00:02:04 And in some of your talks, there are sort of two categories I think you put problems into. One is generative and the other is predictive.
Ines Montani 00:02:11 Yeah, and I think that was also something kind of in response to, yeah, maybe people even being a bit confused about what’s NLP and maybe mostly thinking about the generative part and not so much about the predictive part where you extract structured data from text even though I would say it’s actually still probably in industry and in production, the main area of NLP that is used, especially in companies because there’s just so much unstructured text and there’s so many cases where you want to get the text into a format that you can compute with or work with. So I thought that distinction was very important.
Jeremy Jung 00:02:46 Can you give some specific examples of what are some generative things and what are some predictive examples?
Ines Montani 00:02:53 Yes, generative is of course something classic like talking to a dialogue system such as a chatbot, question answering, or translation. That’s also a task where text goes in, text comes out. And then predictive is really more things along the lines of information extraction. You have a text, or you have emails that are coming in, and you want to decide: are these emails spam, is it about billing? That’s what usually would be referred to as text classification. So you assign one label to the whole text. Then there are other tasks where you’re really extracting spans of text, like person names, organizations, phrases and so on. That’s also an area of information extraction and really an area where you predict something, namely very structured information based on unstructured text.
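Editor's sketch (not part of the conversation): the two predictive tasks described above differ mainly in the shape of their output. The Python below is a minimal illustration under stated assumptions. The keyword set, the list of organization names, and the label names "BILLING", "OTHER" and "ORG" are all made up for this example, and the keyword and pattern lookups merely stand in for trained models.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    start: int  # character offset where the span begins
    end: int    # character offset just past the end of the span
    label: str  # hypothetical label name, e.g. "ORG"


# Toy stand-ins for trained models (assumptions for illustration only).
BILLING_WORDS = {"invoice", "billing", "payment"}
KNOWN_ORGS = ["Explosion", "Apple"]


def classify(text: str) -> str:
    """Text classification: assign one label to the whole text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return "BILLING" if words & BILLING_WORDS else "OTHER"


def extract_spans(text: str) -> List[Span]:
    """Span extraction: labeled character ranges that map back into the text."""
    spans = []
    for org in KNOWN_ORGS:
        for match in re.finditer(re.escape(org), text):
            spans.append(Span(match.start(), match.end(), "ORG"))
    return sorted(spans, key=lambda span: span.start)


text = "Apple sent us an invoice last week."
label = classify(text)       # one label for the entire text
spans = extract_spans(text)  # zero or more labeled character ranges
```

The classifier returns a single value per text, while the extractor returns offsets, so each result can be sliced back out of the original string with `text[span.start:span.end]`.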
Jeremy Jung 00:03:44 And so it sounds like generative is something where you’re creating new text, I suppose, and predictive is more that you want to know something or learn something about text that’s already there. Is that kind of accurate?
Ines Montani 00:03:58 Yeah, you can put it like that. And often for predictive what’s important is that what you want to find out about a text, you want to map that back into the text. You want to know where are these person names, what are they, how are they related to each other, how are they used? Often you also want to stack these things on top of each other, like you want to start by maybe deciding what’s spam and then for everything that’s not spam, you want to extract what department it is about. And then based on that, if it’s about that department and it’s about billing, what’s the invoice number mentioned in the email, and so on. So there often really is a pipeline of steps that you want to apply that can depend on each other and that each have different requirements and different difficulties. Some of them you can maybe use rules for, you can connect it to your database, and other things are much more complex and nuanced. And then other things you actually don’t even need machine learning for because you can just use a regular expression or just write it in code. So yeah, there’s like a lot of different things people are trying to do.
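Editor's sketch (not part of the conversation): the stacked steps described above, spam filtering, then department routing, then pulling out an invoice number, can be written as a small pipeline in which each step runs only if the previous one allows it. Everything here is a hypothetical placeholder. The function names, the "free money" spam check, the department keywords and the `INV-1234` invoice format are assumptions chosen for illustration, and the two classifier functions stand in for trained models. Only the last step is a plain regular expression, as mentioned in the answer.

```python
import re
from typing import Dict, Optional

# Hypothetical invoice number format, e.g. "INV-1042" (an assumption).
INVOICE_RE = re.compile(r"\bINV-\d+\b")


def is_spam(text: str) -> bool:
    # Placeholder for a trained spam classifier.
    return "free money" in text.lower()


def department(text: str) -> str:
    # Placeholder for a second classifier that routes non-spam email.
    lowered = text.lower()
    if "invoice" in lowered or "billing" in lowered:
        return "billing"
    return "other"


def process_email(text: str) -> Optional[Dict[str, Optional[str]]]:
    """Run the steps in order; later steps depend on earlier results."""
    if is_spam(text):
        return None  # spam is dropped before any further processing
    result: Dict[str, Optional[str]] = {
        "department": department(text),
        "invoice_number": None,
    }
    if result["department"] == "billing":
        # This step needs no machine learning: a regular expression suffices.
        match = INVOICE_RE.search(text)
        if match:
            result["invoice_number"] = match.group(0)
    return result
```

Because each step is a separate function, a rule-based step can later be swapped for a trained model (or the reverse) without touching the rest of the pipeline.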
Jeremy Jung 00:05:08 Since your company started as a consultancy for NLP work, can you give some examples of projects that your company took on just so that people can get a sense of what are NLP problems I suppose?
Ines Montani 00:05:22 Yeah, so I mean there’s a lot. We actually still do consulting occasionally because we feel like it’s actually very important to stay close to the use cases because if you’re developing tools for people that should solve a problem for them, you want to make sure that you’re actually solving your own problems with the technology. So we still have projects like that. And to give you examples, one common topic or use case is extracting certain information from news and then feeding that into an internal knowledge base. So you might have a company that wants to find out whether something in their supply chain or in someone else’s supply chain might be impacted and cause them a lot of problems. That might sound like a boring use case, but if you think about it, it’s insanely valuable.
Ines Montani 00:06:11 So then what do you do? You want to scrape news and find out that hey, there’s a strike in this small town here and that might later come back to us via these channels. Something like that. Or you want to analyze financial documents about acquisitions and mergers, who was bought, what amounts, and then at the end of it you might want to compute things like which emerging areas are there, how many acquisitions did Apple do in that timeframe and so on. So that’s like really classic information extraction work. And that’s also actually a lot of what we are seeing in the projects that people are trying to solve.
[...]