In this podcast, we talk to Boris Bialek, vice-president and field chief technology officer (CTO) at MongoDB, about how artificial intelligence (AI) can help with discovery and management of unstructured data.

Bialek sets out how AI can help bring together different classes of information that an organisation might hold about customers to make processes much quicker and more efficient. He also talks about how multiple AI agents can operate together to make these processes work in an agentic fashion.

How can AI help with discovery and management of unstructured data?

The recovery and identification of unstructured data is one of the oldest tasks in IT. It started with scanning papers and trying to make pictures out of them, and then people actually typed the stuff out.

Imagine you get a handwritten document describing an accident and you try to make sense of it. Today, AI can do that for you in almost no time. And beyond that, it can understand and reason about the data.

It can lift the intellectual level from “I have a picture” to “I have a text and I can extract phrases such as ‘accident’, ‘bicycle’, ‘street’ and ‘the mountain was steeper than I thought’”. So, this is where AI really can help. The input can be pictures, text or sound.

The classic database model, the RDBMS [relational database management system] from the 1970s, is great for structured data. But this so-called structured data mostly means textual data, which can include numbers: essentially anything in a structure that we could put in a spreadsheet. Anything else is considered unstructured, which is a little bit unfair.

What we’re doing now with AI is lifting this data to the next level and being able to interpret it in a sensible way.

What approaches exist for customers that want to use AI to discover and manage unstructured data?

If you ask any startup, they will tell you they’re the only answer for that one. But when we take a more intelligent view, there are two major ways.
One is to look at what kind of data you have and build a solution around it. Most important is the combination of fresh data, where I get unstructured data (video, sound, things like that) and put it into context with other known information. For example, Boris has an insurance number, and Boris has a contract with Antony’s insurance company.

So, those kinds of mashups between, for example, operational data, metadata and reference data, together with what we call “signals”, are the first approach to bringing these things together.

But the other option is to ask how we do this more intelligently and break it up into a horses-for-courses approach, the best horse for the best racetrack. There are solutions here. One is EncoreCloudAI, or PurpleFabricAI from a different vendor.

Those solutions allow us to put the data into an intelligent form, so I don’t need to start from scratch. I can get my data, bring it into an operational data store, get my legacy data out, and lift data from there, which could be, for example, documents or physical papers. These could be in legacy document archives or document management systems. That, in my opinion, is the fastest way to do it.

That said, there are enough good reasons to build your own. This applies in many cases where you have specific needs, such as specific video information that you need to process in a very specific form. For example, somebody is driving through a toll gate on a highway and you want to make sure they pay the toll. There are specific cases where writing your own code makes a lot of sense.

But it’s all about bringing together the existing data and the new data, the unstructured data. That’s really what makes intelligence work.

What are the key benefits of applying these types of techniques to the data?

The key benefits are that I can build a completely different picture of my environment.
In the classical relational database world, an ERP [enterprise resource planning] system knows your sales numbers, so you know how much you sell. You might have a CRM [customer relationship management] system and it tells you, “Boris is a great client” and “Boris is on my website right now”. But what does Boris really want?

I could take the classical approach of a BI [business intelligence] system and say, “Boris falls into the category of white male, middle-aged person, and maybe he is looking for a new bicycle. Let’s offer him a bicycle.” But that falls short of what you could potentially know about Boris. Boris may have bought a bicycle from you last week and is maybe now looking for a new helmet.

So, when you bring these things together, you want to drive more intelligence towards your consumers in the retail space. I mean that in the positive sense: you want to be relevant, and you want to help them. You don’t want them to say, “Why is he showing me this stuff? I’m not interested in this.”

Also, let’s say we have an insurance case. Somebody bumped my bicycle while it was parked in front of the house, and now I have a repair case. So, I go to my insurance company.

If the insurer is able to make sense of the information I provide very quickly, it can have a very quick turnaround in claims management. And if it does that, it helps me to be a happy client who is not worried about the damaged bicycle, who pays for it, and so on. Now I get an answer an hour later: “Yes, the bicycle is insured. We will fix this, don’t worry.”

So, these are the reasoning parts which were not possible before. You could not put so much data into context.

Secondly, there is natural language processing. Boris can talk to the insurance company and say, “My bike got damaged. My bike was parked in