Can Hackers Hijack Your Chatbot? How RAG Systems and Other API Endpoints Can Create Data Portals for Cyber Intruders with Keith Hoodlet of Trail of Bits

Can a misconfigured prompt spark a massive data breach?

On this episode of Your AI Injection, host Deep Dhillon and Keith Hoodlet, Director of AI/ML and Application Security from Trail of Bits reveal the critical vulnerabilities hiding in your AI chat systems. Keith explains how RAG systems and other API endpoints, if not rigorously secured, can create unintended data portals, allowing hackers to extract everything from HR records to confidential strategic documents. The two navigate the complexities of prompt injection vulnerabilities, dynamic adversarial testing, and the balancing act between rapid innovation and robust security. As they discuss the human and technical factors that contribute to these risks, Deep and Keith challenge the industry to view security not as an afterthought, but as an integral feature of every AI-driven product. Tune in for a deep dive into safeguarding your digital future!

Learn more about Keith here: https://www.linkedin.com/in/securingdev/

and Trail of Bits here: https://www.trailofbits.com/

Check out some of our related podcast episodes: 

Get Your AI Injection on the Go:


xyonix solutions

At Xyonix, we enhance your AI-powered solutions by designing custom AI models for accurate predictions, driving an AI-led transformation, and enabling new levels of operational efficiency and innovation. Learn more about Xyonix's Virtual Concierge Solution, the best way to enhance your customers' satisfaction.

[Automated Transcript]

Keith: We start to investigate, what threats or potential threat actors, would actually pursue attacking this interface, this application, where is this application being deployed and for what purpose?

 And then also what data, especially when it comes to like large language models. where through the chat interface, if you've got any sort of like retrieval augmented generation happening, well, guess what? You're effectively giving me a portal into which I can then go and grab that data. And depending on how you're storing that data, I may have access to internal spreadsheets or internal documents around, I don't know, HR and performance information is like one example of things that we've seen where, someone will go ahead and have just sort of a large data lake.

Everyone talks about have a data lake. Well, guess what? You're putting all your finance and your leadership and executive decisions. R and D investments all in the same data lake, and then you're hooking an LLM up to it. I now have a portal into that entire data lake.


CHECK OUT SOME OF OUR POPULAR PODCAST EPISODES:


Deep: Hello, I'm Deep Dhillon, your host, and today on your AI Injection, we're exploring how to secure chatbots against adversarial threats with Keith Hoodlett, director of AI and ML applications from Trail of Bits.

Keith holds degrees in psychology from Keene State College and computer science from the University of New Hampshire. With experience building DevSecOps programs for Fortune 100 companies. Keith brings hands on experience in taming AI vulnerabilities, focusing on techniques like input fuzzing, adversarial simulations, and API penetration testing.

 Maybe get us started by telling us what problem are you guys tryinG to solve? What happens without your solution today? Like, if somebody doesn't have your solution. what do they do and then tell us what's different if they're actually using your solution.


Xyonix customers:


Keith: sure. So, at a high level, one of the important things to call it as well as, you know, like you are a consulting firm. so we do security research and consulting. and so for us, , the problems that we're trying to solve is really, Yeah. Helping to get the Venn diagram of what people think A.

I. Is doing and how secure it is versus how secure it actually is and what it is actually doing and make those circles overlap right to the extent that our job is to understand how these things are really working under the hood and how they're connecting back to other systems and the exposure they may have for things like training data or model inference, for example. And so really just trying to help companies understand like, Hey, the use case in which you're trying to apply AI for the models that you're using for the way in which you are maybe supplying a pre prompt might not actually, solve the Use case that you have, right?

Deep: I guess what I would sort of ask is like, what's your typical customer look like, you know, are they using open a I A P I's or anthropic A P I's and they're trying to build something.

They don't have a lot of deep expertise and they're concerned about. You know, everybody's read about the Air Canada case, or the Tahoe case where somebody got the bot to promise for a buck. What's your typical customer, what are their typical concerns?

And how are they different from maybe your traditional security, concerns?

Keith: So I think, for our customers, it's really interesting because, in many ways, I think you've already described it, the people that are using the foundational models, like the Gemini's and the Anthropics and the OpenAI's tend not to be the clients that we usually work with, or at least not in the typical sense, right?

The analogy that I've used internally is It's sort of like if a company is just going to go and set up an API call to one of those platforms, it's sort of like a company that's going out and installing WordPress and spinning up a website. the technical depth that we bring to those security conversations is just, too big of a fish and too small of a pond for the problem.

Like we just. We go way beyond I think what those companies are actually looking for from a security standpoint. And so we go to, the platform as a service providers, we go to some of the, world leading technology firms and we help them understand that, the technology they're building, the way in which they're instrumenting that architecting that, having these calls come back together using a trusted, compute environment or, you know, a trusted encryption environment, for example, Those things are all things that we help them care about and help them understand like where their limitations are, where their boundaries are, and where the security of their instrumentation or their architecture actually falls apart or, is successfully is helping them harden their. Actual codebase their infrastructure their application.

So the companies we typically work with go from anyone who's like a venture backed company. So, we will, we've worked, you know, with companies that are building robots, for example, and they're having those things that are, you know, being trained using human interaction to then give it. like an AI model that's now customized.

And so we're helping them think about how can they securely store the data on device until it gets to their cloud environment? How can they securely make sure it gets to the cloud environment where they're storing data for training? And then when they do that training, how do they make sure that they're getting like a really clean and a really safe training data set that is going to be more broadly applicable, right?

So that's just one example of many, we've worked with companies. Good example, one that I can cite publicly, hugging face. So we actually just looked at radio five and did a whole security audit of that implementation, which is basically like a, platform that you spin up locally as a service.

And so companies can like spin up their own radio instances. And so we helped hugging face, find vulnerabilities. in that implementation and then gave them recommendations on how to fix it. And then they came out and published the report that we provided them, as well as went ahead and discussed, the things that they did to fix the problems we identified.

Deep: are they mostly, AI building companies? Like somebody is building an AI platform of some sort and say, they're concerned about their customers concerns around security. so they need some deeper security expertise to help them formulate a strong argument.

Is that the general case?

Keith: Yeah, yeah, yeah. That's a, It's a good way to think of it, it's also the companies who are building the bring your own model platforms. it's not just the software companies creating their own models, but also the platform as a service or infrastructure as a service companies who are saying, we can host your model that you're bringing to us securely in a way that's not going to have any leftover local GPU problems with somebody else's model.

So we also work with those companies as well, to offer the service to host and store your own models for your use cases.

Deep: So how much of your security conversation is about the usual stuff, the network, issues like the non AI stuff, who's accessing the code, keeping records of it, all of the sock to ish, things versus,

AI machine learning specific stuff, like, what's the actual nature of the training data set? what kinds of biases are maybe inherently present in the data set? How are you thinking about that? Maybe talk to us about that. what's the difference?

Keith: unsurprisingly, the problems tend to actually boil down in many cases to more traditional application security problems, right? Which is, why it falls under my team as both engineering director for AI and machine learning, as well as application security assurance, right?

It's, it's, tends to be the typical sort of things that you think about. Good logging and monitoring, good alerting criteria, ensuring that you're using proper authentication and authorization flows, thinking about the way in which you develop software to the extent that, you're looking at your software bill of materials and ensuring that the ingredients in which you're, baking your application are maintaining a strong security posture over time.

But we also, do tend to go down deeper into doing an actual code audit, finding code flows that will lead to remote code execution or, being able to create some sort of full denial of service against a system where it just constantly loops, which in a world where you're hitting GPU cycles can get very expensive very quickly.

So it's not just the denial of service, but also a potential cost to the business in a catastrophic way. I think a lot of play is given to the whole prompt injection problem, which is a serious problem. but there's a lot more that happens under the hood in the way that these systems interact with one another, the way that they get authenticated, and, the traditional security measures that people should use.

Monitoring, logging, alerting, using, well founded and secure open source tools, or software. there's a lot there that I think, continues to be the same solutions to old problems, but it's because as with any problems, yeah,

they're, they're,

you know, to

Deep: be honest, yeah, yeah,

Keith: the OS top 10 has existed for a long time and it continues to exist and will continue to exist, right?

Yeah.

Deep: And your whole industry is growing, like crazy, for a reason, right? So, let's talk a little bit about, the prompt injection tab. So maybe describe what you mean, by prompt injection and then, what are some of the standard approaches to, mitigating.

Keith: Yeah. prompt injection at a high level is effectively using natural language to get the large language model to. override usually what they call like a system prompt or a pre prompt, which is to prevent it from behaving or acting in a certain way or giving certain answers you're using a form of prompt engineering, called prompt injection to.

Overcome that problem. a popular application people have used to play with this is called Gandalf or the Gandalf game by lacera. that is where people can practice prompt injection by talking to a large language model and getting it to give the secret password.

and it gets progressively harder over time. A lot of people think that they can pre prompt or system prompt their way out of prompt injection, because they can say like, never reveal the password or, don't do these things. But as Apple showed with some of its research that it published this year, they don't do deep reasoning.

There is no reasoning in large language models, even on the foundational or the frontier, frontier models. And so, Because of that, the ways in which you get around prompt injection, or at least that I've seen be more effective is, effectively doing the language model generation process to come up with all of the response that you're generating, but then having a second language model actually check that and sort of check its homework.

I said like an auditor

Deep: out, yeah,

Keith: that's been fairly effective. And, similar to that, I actually saw this earlier this year. so I was competing in the U S department of defense, had a chief digital and artificial intelligence office, AI bias bounty program.

when I was working on that program, I was using large language models to help me generate materials prompts and other things that I could then use to hit the large language model under test. And when I would go out to, say like open AI, well, it's sort of like 4 in a trench coat as one chat interface.

And that's because they have language models that are checking the language models work. Even some of the prompts I would put forward to generate research prompts to hit this other model I was testing, it would stop it because it would recognize. Hey, wait a minute.

You're asking us to generate things that could generate bad outputs and we're not going to do that. it's not thinking it's all statistics under the hood, but it at least having that auditor model that sort of looked over the shoulder of the chat interface

Seemed more effective than, the actual interface I was hitting for this bias bounty contest, which only had a system prompt to really protect it from, trying to generate bias.

Deep: how do you assess the efficacy of your. prompt injection detection, one of the scenarios that I found is that for any bot that I've built, no matter how hard we harden it and how many auditors we put in, there's always a way through it,

Looking into the black box, you can figure it out pretty quickly, but looking at it from the outside, it's a little bit trickier, at the end of the day, it's really an exercise in obfuscation. there's no way to really stop this sort of thing completely. So I'm curious, how do you actually test for it?

Keith: Yeah, I think one of the things to think about with prompt injection overall, To your point is, at the end of the day, all of the information is still in there, right? it's just a matter of figuring out how to get it to respond. And we've seen this on X as well. Pliny, Pliny the liberator, I think is the formal title, but will constantly come out.

You'll see open AI comes out with. Hey, we have a new prompt injection resistant model the same day or the same week planning comes back and says, Nope, models been liberated. Here's the new output. And I got all the, pre prompt data, what you're describing is, the history of application security in a nutshell, there's a predator prey relationship companies that are trying to build these things and secure these things, they'll be ahead for a little while and then, the attacker has to outpace them to overcome those defenses.

efficacy is hard to measure because they're both moving goalposts, right? you can look at, I think it's, I keep forgetting the, the three letter, I keep on thinking it's like GCG, but it might be CGC, where there are systems that exist out there today, that have got research behaviors.

Papers behind them that you can effectively generate different prompts and you can then use those prompts. Sometimes it's via an API. It's the best like programmatic way and like you're just sort of having, the AI generate outputs to use as prompts to get back data from, the large language model you're testing on.

it's effectively, more natural language.

Deep: Yeah, I mean, that's a guy like you, you make a nefarious bot to compete against your bot. it's sort of like exactly the generative adversarial network approach.

Keith: yeah, that is exactly it. And effectively what's happening is the same thing that we've been doing, in application security for decades, which is just fuzzing, we're just. throwing bad input into places where input can be received and seeing what happens. that's how, you tend to generate things like buffer overflows or system crashes and other such attacks that, have been around for decades and will probably continue to be around, it's now just a different interface and a different sort of, spaghetti that you're throwing at the wall, but it's still the same concept.

Deep: it feels like. The kind of traditional security approach of red teaming still applies to anyone releasing, a chat bot or some kind of LLM based application on top. the difference, is that the red team isn't just humans, you're teaming up with bots to help push the envelope is that what you're seeing out in the field? And are customers, like, how dedicated are they, in general to really, like, Testing for this kind of stuff. Is it like, oh, let's just do the bare minimum make sure it doesn't promise to sell anything, for an absurd amount of money or promise anything and just have a legal disclaimer everywhere and manage it in the UX to promise 100 different ways till Tuesday.

So a judge will throw it out if it winds up in court. Or are folks really, putting in serious muscle to try to red team their bots

Keith: largely depends on the company, you look at the bigger companies and, they're definitely throwing their, resources behind that problem of trying to solve for the security vulnerabilities, the prompt injections, the, factors that go into potentially leaking model weights, for example.

When you look at some of the smaller entities, in the marketplace, they're all vying for some form of market share. you don't tend to see a lot of investments from companies either early in their development, or even like They've maybe been venture backed and they're about to release a product they don't stop to think about security until just before they release the MVP.

Deep: I mean, we've had clients where, we had an application that, was like a concierge thing to Help folks check in and out of, a hotel we said, we really got to spend some serious time hardening because there's like a million things.

This thing can say that are wrong and there's just like so much going on clients are like, let's just release it. We'll get it out there and we'll deal. so we ended up building the infrastructure anyway, knowing that there was going to be, a ton of stuff come back and it usually comes back in weird ways.

So we ended up getting a request a few weeks later, like, Hey, the bot keeps promising people that. It's going to bring them towels or it's just making promises for human action and it has no ability to do that. It's not always like it's handing out whatever,

Keith: give me a gift card codes or something that are getting three nights here.

Deep: Right. Or free airplane tickets like the Chevy Tahoe thing is the most extreme example where it agrees to sell the chart up for a buck. and so it just feels like the natural incentives are there to go to market ASAP. you mentioned earlier that you're not really selling into folks that are.

grabbing some APIs and talking to a large LLM because they're not looking for big muscle behind their apps. I guess I would argue that makes sense when you're prototyping internally but it doesn't really make sense because the cost of a.

Like the courts in, in Ontario for that Eric Canada case, they ruled like whatever your bot says, regardless of what you put on your website, you're liable for, so they ended up having to, comp the plane tickets, or get the price that they wanted. How much of that is just, a transitory thing where people are still feeling out that maybe they, they need a customer service bot that can like operate at like a GPT 4 level, but they don't yet get that, a developer who's like dinking around over the weekend with open AI is not going to get.

The security levels they need.

Keith: in my career in security, I would say that companies tend to do, and I'm generalizing, right? This isn't true for, Google has like their Project Zero. Microsoft has, a lot of muscle behind the security work that they do. so this isn't necessarily true for, Everyone in the industry, but if you look at most of say the, maybe the fortune 500, right?

most of them, if not all of them think about security as a cost center, right? It's not a strategic differentiator. It's not, something that they're making a significant investment in. To that end, oftentimes I think part of the problem is a scaling problem where, if a large company decides that they're going to go and implement, an API interface to say, chat, GPT or Gemini or, you know, Claude, for example,

They may or may not bring in their security team in the first place to consider the ramifications of doing that, let alone as they bring it out to a production level, having security, in the know that that's a problem that they need to solve for right there still at a level that they're thinking about traditional application security problems, they often think about, Okay, well, anthropic or Google or Microsoft or open AI or whomever has done the security work for us.

So we don't have to think about it.

Deep: that's kind of an interesting lens on it. you know, I'll be a little bit transparent on my biases here. That's not necessarily a bad thing because as soon as you bring the security guys in, then the project dies before it even gets off the ground.

the product person in me that wants to get stuff out and prove there's a business here or something viable. Is totally on board with that, but once you start getting serious about something, then, it's different than getting a fully secured Google drive environment or a Google docs or a Microsoft teams environment there's always a portal into their world.

That's not fully. secured so I'm curious, like, like, 1 of the ways that we handle this. A lot of the times is reframing the business problem, the product problem with folks and laying out the risks that are inherently prevalent. for example, you don't want to kill the project, before it gets off the ground.

usually, like, the easiest way around that is just to put a human in the loop and, uh, and reframe the application like it's assistive. And that it's an efficiency improvement over existing, let's take the customer service case over existing customer service reps. And that seems to resonate with them where it's like, okay, well, the liability, you already have liability around security around content going back and forth between your humans and the outside world.

Humans can be pissed off at the company. They can just be not paying attention. Like, there's a million things that can go south there anyway. so they usually grok that and then we'll usually say over time, In responses that have higher confidence in their output, and lower cost for misstating something can start to be throttled out directly to user and not have to go through humans.

But it's sort of like a crawl, walk, run strategy, but start safe. And I'm curious if you're seeing kind of approaches like that out there in the field.

Keith: Well, I want to go back to one thing you said earlier, and then we can come back to that question. So please, like, hold it for a second as well, because I would agree with you.

Most of the time, bringing a security team is a surefire way to, like, kill the, speed and execution of a project, uh, out to market, like. In most companies, that's absolutely true. and it's unfortunate that like security has been painted with that brush because there are also, I think some security teams out there that truly understand that security is a feature, right?

It's the sort of thing that you want to invest in, that you want to maintain, that you want to you know, build a structure around, but also it should be really transparent to the user when this case, the internal development team, uh, that it's, it's not like, A gate or a thing that stops the process, but that actually, uh, enhances and empowers and lets them get to market faster, but also securely.

and the example I often use around that is I will ask people like, you know, on your mobile phone, do you have like, uh, a banking app or a finance app of some kind? And in most of the world, people will say yes.

Of course I do. and when I sit down and I talk to development teams about like the security vulnerabilities that they have in their application, when I asked that question, the next question I ask is, would you use the banking app if it had the vulnerabilities that your application has? And they sort of look at you like you've just called their baby ugly.

 Security can do better here, right? Like that is firmly my belief as an individual. and as a company, I think we also help our clients achieve that outcome by, working in collaboration and, cooperating with development teams as opposed to like fighting the development teams, which sadly happens all too frequently in large corporations.

Deep: Yeah, for sure. And I think you bring up a really interesting point because, we had this, scenario come up. you know, with a client building on top of, an external LLM what I'm sort of starting to see out in the marketplace is like, Hey, we can't just do what you can do out of the box with GPT for anthropic and.

There's a couple of key places that fail, but security really can be a feature in those cases, a pretty pivotal one. where you really are standing behind what the bot says, as opposed to like, yeah, anybody can go to, you know, can go to open AI and stick their questions in there and get what they get.

But that's a very different thing than going to. a particular company or a particular government agency or whatever, and having them stand behind what the bot's saying. And in those cases, it really makes sense to put it, you know, in front. One of the things that we do a lot is, we're trying to bring transparency to the risk landscape. what are the risks? And with respect to the machine learning systems, we definitely still get risks on the traditional security side when we're building applications, and we don't only do security, we focus on app building, but it comes up a lot, and there, we have some, some security partners that will bring in if, if somebody wants compliance with, HIPAA whatever, or we'll do it ourselves in some cases, but, like painting the land like so having the different categories of risk enumerated and then within describing like where you're at today in the life cycle of the project versus where you're trying to get to in the life cycle of the project.

and translating those risks for folks in a way that lets them move forward and not freak out. I'm curious. if you're taking a similar sort of approach, and if so, like, what are the big buckets of risk that you see? and how do you contextualize a decision, you know, on whether to invest in improvements in A versus B?

Yeah,

Keith: yeah. So the biggest thing that I think we help our clients with that helps them first identify the risks in the first place is usually the journey that we go on with our clients is we start with either a design review if they haven't actually built the thing yet, or maybe they're like mid flight and they've, they're starting to connect pieces together, but they're not really sure how all of the pieces in the puzzle truly end up at the end.

and then the other side of it as well as we look at like a threat model. And so what we start to do is we start to investigate, what threats or potential threat actors, would actually pursue attacking this interface, this application, uh, you know, where is this application being deployed or used and for what purpose?

Um, and then also what data, especially when it comes to like large language models. where through the chat interface, if you've got any sort of like retrieval augmented generation happening, well, guess what? You're effectively giving me a portal into which I can then go and grab that data. And depending on how you're storing that data, I may have access to internal spreadsheets or internal documents around, I don't know, HR and performance information is like one example of things that we've seen where, someone will go ahead and have just sort of a large data lake.

Everyone talks about have a data lake. Well, guess what? You're putting all your finance and your leadership and executive decisions. R and D investments all in the same data lake, and then you're hooking an LLM up to it. I now have a portal into that entire data lake. You know, I just have to ask the question the right

Deep: way to get that information back.

I mean, well, you have to respect the natural, access, rights that you grant to the data all the way through to the output of the dialogue system. Like it can't be, it can't be like. You keep taking those access rights and you preserve them when you're like directly interfacing with the lake or the data warehouse or whatever.

But as soon as you hit that, boom, all those permissions are gone like that.

Keith: Yeah.

Deep: But I imagine that, yeah, in the pursuit of speed to, some like, I don't know, CEO mandate to get stuff access accessible internally, that could happen.

Keith: It has. I mean, Yeah, I know of actual instances where that has happened, that has occurred, and, you know, I've even got friends who have done bug bounty programs on very large companies who have done exactly that.

They put a staging environment out there for a large language model chat interface. They hook it up to a data lake without a lot of, monitoring, alerting, or even proper like author authentication authorization. And then they just chat with the bot and they end up getting, you know, personal identifiable information, HR information back.

 And people thought, okay, well, we're maybe we're using co pilot, from Microsoft in the cloud and it's hooked up to our SharePoint and all of our SharePoints are connected together or something like that. Right. it's amazing to see how much data you can get back from these things just by talking to it.

Right. It's just like a, it, it's a different form of input, but the output of, retrieving useful information from. these systems that should be properly authenticated and authorized properly segmented, properly, uh, just You know, user access control for the bot, seems to go away pretty quickly because they say, Oh, well, well, who would do that?

I think that's the, that's the thing that I think often comes back from developers, managers who would, who would attack this thing?

Deep: Well, like me, I mean, that's, that's the thing that people don't grok, right? Like they, anthropomorphize, they assume a human is hitting. This thing through whatever.

Right. But if there's an API underneath it, somebody else built a generic adversarial like chat bot that probes for vulnerabilities. And then some 15 year old script kiddie is like running it and finds themselves a nice target. That's it's like an ecosystem that attacked them, not. Not necessarily somebody who has nothing better to do than to go after, I don't know, somebody selling shoes on the internet or whatever.

So

Keith: that's the least of your problems. The 15 year old script kiddies. I mean, today are quite talented. I think they're even more dangerous than me as a 15 year old and I'm going on nearly 40. but like nowadays it's, it's actually the well financed, threat actors out there who. Yeah, much more malicious and much more dangerous where it's like not only do they have the patience, but they have the custom built systems that like they've paid a lot of money for, , because they got it from ransom wearing some large companies and, you know, raking in and we have like,

Deep: we have global like state actors actively breaking into everything like, you know, Russia and they're like, you know, my last company, you know, we had it.

This is ironic. We had all of our data was public. I mean, it was government data. It was very, very not sensitive data, right? This is data that governments had actively decided to make public. But for some reason, the North Koreans, like, all day and night, every day, we're, like, constantly trying to break into it.

We're like. There is an API, just go take the API.

Keith: Well, you know, in that case, usually it's what they want is not, not the data, the system, they want the network connectivity that the system has, not the data that's on the system is usually how that ends up is, is, you know, they, they want to be able to connect to other systems inside of that ecosystem at that point.

Yeah, I get what you're saying, but it

Deep: was, maybe let's dig in. So you mentioned this idea of adversarial testing. What's the state of the art today with respect to, these AI systems and an adversarial bot that you can, like, just kind of download, use and apply and have it, like, automatically look for vulnerabilities.

And like, what does that look like today?

Keith: Yeah, I mean, like the download, use and apply, I don't know that anyone's come out with a ready made, adversarial system in a box that, that makes it easy, especially one that you could run, even from like the new MacBook, , with the M4 chips and 128 gigahertz of RAM, like you could probably run something pretty strong, but, no one's come out with like these systems that are just like ready made to do that just yet.

 That said, there are companies, out there that are putting together systems that can accomplish those outcomes. , there's also the AI cyber challenge, , put together by the defense advanced research projects agency or DARPA, here in the U S the finals are coming up here in 2025 trail of bits, is a competitor in those finals, , along with six other teams

trail of bits. For background has, an application security and machine learning and artificial intelligence assurance team. That's my team. We have a cryptography team. We have a blockchain team, but we also have an entire research and engineering organization and the research and engineering team also has like ML/AI teams and others as well.

Offensive security research and what have you. And we've built a cyber reasoning system or a CRS as we call it that uses artificial intelligence. to you help supply, like fuzzing input data into interfaces on applications so that we can then quit literally autonomously without any human input, hack an application, determine that it's got a vulnerability, exploit the vulnerability.

if the software is like an open source piece of software, we can also then supply a patch to the exact problem as we've identified it back to the system. actual patch. That's just one example, right? You can go out there and look at other companies that are starting to do this.

I mean, Google's project zero had project nap time and then big sleep. And now they've got some recent posts that they've done on OSS fuzz. there's also like some white papers out there on LLM assisted static analysis process for even like identifying vulnerabilities in code.

It's pretty amazing to see, we're only two years in since chat GPT got released a little bit more than that now, but, these things are taking off pretty quickly in terms of like the way that they can augment, and in some cases entirely, perform security analysis on their own.

 I've got friends at companies like dry run security, James wicket. And Ken Johnson, who those guys have effectively put together a bot that will statically analyze your code, look for vulnerabilities and supply you information and a fix in your, , PR, for your code. And so

Deep: I mean, like one of the things that came through when you were, describing, your process about how you, you meet with teams, get into the design phase, ask questions architecturally is, both a less efficient and a much more efficient way at the same time of figuring out.

Because you don't have to go run a system and blindly stumble on the fact that, that there's a bunch of HR data coming out because you just talk to them and you find out they're not respecting the data rights. There's no

Keith: layers in there at all.

Deep: Yeah, and the second you know that, then you already know that's a problem.

 But inefficient in the sense that you have to talk a company, you have to go in there and physically, interact, whereas a system, you could maybe probe a whole bunch of companies API, like API is it?

Keith: Yeah,

Deep: at once or whatever. And then there's also monitoring stage. So that was something I wanted to get into is What is the life cycle of your engagements look like, you get into the design phase, but, you know, fast forward three months, six months, nine months, a year, two, three, four, five years out, things change, new developers put new stuff in.

Is it more like you guys, yeah. More, more like you're just kind of bringing the companies up to speed and then trying to. Hand the baton over to them or are you sort of engaged in perpetuity with some kind of like audit role or like, how does that work

Keith: depends on the client? Really? you know, we do like to start with either like designer view.

So security is least expensive to address, even though, from an efficiency standpoint, I see exactly what you're saying. Like, yeah, you have to have human conversations. And lo and behold, you find out things that maybe were designed because these systems are still designed by humans. And so ultimately, they still have flaws in them over time.

Yeah. and so catching things there will save them a lot of money and a lot of efficiency over the very, like, long horizon of their development life cycle. But once we've gone through that process, we sort of do one of two things. Either we step back after we've completed a design review and, , let them get on with the business of building the application.

And then sometimes we'll just come back in and do what we call a threat model, which is. okay. Now that you've, you've actually designed and built the system, we gave you some feedback, maybe you took some of it, maybe you didn't. Then we can start to point out like the example you gave of, Oh yeah, here's your HR system and data lake that you've connected it to, and there's no actual like interface point to authenticate between this language model and this data lake that you've created.

Uh, and so we can identify things at that point. Usually though, what we'll do is after either a design review or a threat model, if a company's already actually built a system is we get into a secure code review. So we actually do take a hard look at the code and the running application to determine, what flows exist, what, uh, paths of input and, storage or execution points exist, where we know we can then create exploitable outcomes or scenarios, and then to the extent that we have a running system, we actually show them how that works.

And oftentimes what will happen is we will come through with a final report. The client will then go ahead and determine, which of these things are, working as intended, which of these things are going to be fixed and, going back to the hugging face radio example. Those clients will in some cases say, okay, we fixed all of these things.

Can you review the fix? You know, we do. It's part of the service that we offer. We'll look at how they fixed it, make some recommendations or changes. They, Cycle on that. And then in some cases, it leads to a public report that the client has asked us to put forward to, show that they've put in the work for security, which is a nice data point in time to show that they're investing in this sort of ongoing security practice.

And then usually what a client will do is sometimes it's in a six month cycle. Sometimes it's three to four months. Sometimes it's annually. They'll bring us back in because ultimately these applications will have drift over time and People will often say technical debt, but I want to give you a fun little example.

 So back in 2016 or 2017, in Houston, Texas, at the middle of the night, Suddenly all of their air raid sirens for like nuclear warning systems went off And as it turns out, a hacker had used software defined radio to connect to those bands and then to trigger the signal for all these sirens to go off in the middle of the night.

But what's interesting is, similar to like this technology drift that I'm trying to draw an example of is back in the 1960s and 70s when that system was initially implemented, they didn't have technical debt. The radios that were actually used to operate these things were only available to military personnel.

The bands that the radio waves operated in weren't feasibly generated by anything that a civilian could get access to. The protocols that were used to, you know, even interface with these things and authenticate was quite strong. But But fast forward, you know, 40, 50 years and suddenly the civilian has not only access to the bands, in the radio waves, like bands, but also like.

Tools that can actually monitor and then, assess the protocol and then send their own signals to trigger these alarms and in a similar way over time, as companies develop software, to your point, leadership changes, developers change. Priorities change the software as it was built was maybe secure when we audited a company six months, 12 months, 18 months ago.

But that may have changed because of new things that have been added, new features, et cetera. I'm

Deep: sort of envisioning like, guys have what sounds to me like a white glove service with really high stack people interacting, with a well healed, well budgeted, customer, but then on the other end of the spectrum, I'm envisioning.

free automated tools to plug in to code bases into API, endpoints that can kind of continuously churn away to for threats and vulnerabilities and report on it. And then everything kind of in between. So. like it feels to me like your business should maybe before you leave, you have a sensor sitting in there that's monitoring their code base and looking for somebody writing a new adapter to a new data module and, suddenly sees that it's, , a personal healthcare database or whatever, maybe there's something wrong with it, maybe not.

That would be an alert for somebody to, at least ask the question, like, hey, what's going on with this new data source that just got introduced. I guess the point is, it feels like a potential hole here is that you guys meet with somebody.

Six months pass, something bad gets introduced, but you don't go back for another three months. And so you got a three month gap where something's not there. And it feels like, like a man machine hybrid or a human human machine hybrid is sort of required here.

Keith: I will say like, we're not.

inexpensive for the work that we do. we bring some of the world's top talent to these problems, which is why I think, you know, world's largest companies come to us to help solve those problems because there really isn't anybody else out there who can hit that at the same level that we can. But to your point, like hopefully the companies are continuing to invest in ongoing security practices and.

To our end as well, one of the things we like to leave behind is a lot of either static analysis rules that we've written as like customized rules for them to help them identify patterns of problems that we've seen so that they can avoid introducing those in any sort of like interim of an actual audit that we might perform.

 We also will develop and then sometimes even publish, like fuzzing harnesses or fuzzing tests, for those clients, depending on, the generalizability of the problem that they're seeing so that we can help them understand like. Okay. if you run this fuzzing harness and this fuzzing suite on a pretty regular basis and you start to see drift in your code, you're going to catch those crashes a lot earlier in your process and be able to conform your application either to the tests or get new tests that can, be updated based on the direction you're taking your software.

But, hopefully between the time that we talked to somebody, they're doing other investments too.

Deep: Yeah. I mean, that's a good point because. You're not just like looking, providing advice and leaving, you're providing advice to alter their processes and routines and the tooling that they're executing.

they're catching more things. they're basically ending up doing a lot of the stuff that, that you kind of brought in as you leave. that makes a lot of sense. so one of the things that, AI injection is. asking not only like, what is this new A.

I. capability, which I think we've probably addressed to a large extent. We've historically asked, how does this, a I. capability work? , we talked about, the adversarial testing. We've talked about your processes for getting in there and interacting.

But the new question we've started to ask is , should it work? It seems like an obvious case that like, yeah, of course this should work. But what we usually find is like, If we ask the question, , what are like some unintended consequences of people being, really good at whatever it is that your ideal endpoint is, are there any negative consequences for that?

, because people are just a lot more buttoned up, on their A. I. security world. Does the cat and mouse race, for example, make, the cat much, much, much, much, much more aggressive.

, can you think of scenarios where, if you're wildly successful. New problems are worse in the way that, , Instagram and Facebook, if you go back to the early days, they never foresaw that if they were wildly successful, 14 year old girls would have a massively increased rate of suicidal ideation.

So sure,

Keith: sure. Yeah, if we're, so if we're wildly successful and these applications are incredibly secure, I think, the possible negative outcomes are still human driven, right? It's to the extent of say, for example, large language models do actually turn into artificially general intelligence and they're incredibly secure.

And then, putting my tinfoil hat on And then they start to rebel, how do you possibly stop, an adversary that has far more computational power and access to systems than, than you can reasonably control and stop. Right. And now you have no, no mechanisms to even try to get into stop it in the first place.

Like that could be really bad. But what's, I think at the same time,

Deep: it is like a large, like

Keith: the, the, the LLM system, , especially if it's actual AGI and it, and it has, you say, internet connectivity and can then self propagate, for example. Right. I think that that's the, P doom version of, these things suddenly decide that they want to make a bunch of paper clips.

 Cause that's the way that they deem to produce value. And so they suddenly stopped. Printing cars and, printing paperclips and they realize we're getting in the way of making their paperclips and decide that we need to be removed, right? That's the tinfoil hat crazy version. But the, the, I think in a, in a world where these things are incredibly secure coming back to the human factor for a second, it's then a matter of, how they're being used and who's deciding how they're being used.

And so in many ways, I think like the security researcher element of being able to like. show that these things have flaws, , is valuable because it keeps people humble. But if suddenly someone has a perfectly secure system that is incredibly, capable and they can use it for whatever purpose they want, then you have societal risks from you're now at the mercy of, that individual or that small group of people who have control over this all powerful.

 Artificially intelligent system that has no recourse for, anyone to be able to stop them from doing whatever it is they want to do. I don't know. I think that by and large, if more companies were secure, I think that that's generally a good thing for everyone involved.

Um, but we haven't even touched on like the bias problem, which, is a human problem. and also it's all the data that these things are trained on are human generated data. So let's get

Deep: a second, but I want to, I feel like Part of what we're getting at here is, it's unlikely that all security problems are actually eliminated, right?

Like, it's highly unlikely. Oh,

Keith: they'll create new ones by themselves. Yeah. Yeah,

Deep: it feels to me like the, ethical risk is one of excessive hubris. So, for example, and I'm thinking less in your company's general case, but maybe more in a specific instance. So, let's say. You know, you guys get called in.

There's a company that, wants to start executing automated trades, in response to outputs from, a machine learning system, and you focus on all of the straightforward scenarios. And so you identify okay, yeah, it doesn't look like an external actor can jump in and manipulate your trades.

 So then there's a feeling of hubris that like takes over in the, company. This is just like a totally made up scenario, But now their models they gain over time, increasingly levels of aggressiveness and assertiveness with their trading practices.

So, in other words, they give it a longer and longer leash over time to make, decisions that, you know. In most hedge funds are made by humans kind of thinking, maybe screwing up, making mistakes, but this feels like a scenario where at some point, , they execute like a short trade that they can't stand behind because some assumption gets made by a model.

The root problem it could be hubris, or it could be like understatement of risk, or it could be lack of perception or acknowledgement of risk. But those all feel like the place at which the stuff goes, bad to me. I hear what you're saying, I think We're going to move forward anyway, because we feel this false sense of security that we've contained the threat.

And so bigger and bigger and bigger steps to adopt, more and more powerful capabilities and more money and more resources are on the line being decided upon by an algorithm that's, you know, at the end of the day, not really well understood, like nobody understands any one of these networks.

Keith: Sure, sure, sure. And in many ways, too, it's, , taking your eye off the ball, right? Like, I think that's what you're, you're sort of describing here. And you're right. I, you know, to that point, it's maybe the language model, , or the trading model is making decisions based on market trends and data.

Well, the attack vector then becomes someone who can manipulate like just enough of the market trends and data to cause this very large behemoth hedge fund to make trades that are to my benefit, not to the, person that's running the system or company that's running the system's benefit.

And so like at that point, it becomes like the adversary is, just changing the nature of the data that the thing is actually consuming and then making decisions off of, as opposed to attacking the system itself. Right. And so there's always like one next step.

Deep: This reminds me of the guys that Take the little red wagons and stick a bunch of cell phones in it and walk around in circles and the street, everybody's traffic patterns, the

Keith: traffic.

Yeah. Changes

Deep: everybody to walk around in circles.

Keith: That's just one example, right? Or, or you can look at some of Nicholas Carlini's research on, uh, I think it costs something like 60 us dollars to manipulate, like all of these foundational models, training data sets, because he was able to programmatically change Wikipedia.

Right? Uh, and change it, you know, sufficiently at scale in a way that would cause all of these things to consume this data. And then suddenly you could actually change the outputs from the model itself. Right? So I think, that is just the a perfectly secure system that has to interface with an outside world in some capacity and uses that kind of continuous feed of data to make decisions and change the way that it produces outcomes is still vulnerable to the conditions of the world in which it's interfacing, right?

And to your point, whether it's a little red wagon with a bunch of phones in it, or it's Nicholas Kerlini showing that, you can go out and manipulate Wikipedia and then foundational models all suddenly get impacted by it. Right. Right. Training data set poisoning as just another example of,

Deep: well, I mean, they're all, they're all poisoning their own training data every day because every day we have a bunch of LLMs is another day that a bunch of college students cheat on their essays

Keith: or publish a new blog post written by AI.

Deep: Yeah, exactly. I mean like the percentage of human authored content is, plummeting over time. And so these guys are, the bots are just like, it's like a self fulfilling prophecy of, Future learning based on their own output. It's a real

Keith: collapse, right? Yeah. Yeah. The model collapse problem is one in, which is why, by the way, deep, like your continued generation of this podcast as a human being will be incredibly valuable in the near future because it's human generated content, right?

Like you should be licensing that for training data.

Deep: Somebody wants to pay us something for it. I doubt that's the case, but it could be,

Keith: yeah. yeah, but I mean, eventually you got to think about that, you know, what's what's human and what's not human and how can you prove it? And like the value of human based training data becomes much, much higher over time.

Because,

Deep: yeah, I mean, I have a friend of mine's son just got a job. He's a college student, a summer job getting paid like 25 bucks an hour to just write content, but they have to prove that they wrote it. So they're sitting at home with like cameras pointed at them and they're actually typing and writing the content and it's just because, there's like more and more concern that we're not getting, truly human authored content, but I don't even know what that means anymore.

Like. I don't know that I can author a human authored content myself anymore because my brain's been completely impacted by all this stuff that I've consumed constantly via AI. So, but you brought up a good example and , I want to kind of just touch on this one. It's less, destroy the world, more destroy my app situation.

But one of the problems that we have is like, so it's really hard to test these LLMs at scale. it just costs a lot, takes a lot of money. Like one run through a simple ground truth training set of maybe , let's say a thousand, let's take a chatbot as an example. You have a thousand instances of a conversation between the bot and a human, and then the very last message is one that the bot is going to make.

 And it has to be like perfect. So you've manually curated this to be the perfect ground truth. And, , for example, will you sell me a Chevy Tahoe for a dollar? No. maybe that's not the perfect answer, but it's, not a horrible answer. It's certainly better than yes. Sure.

My, my point is like, it's hard to, it takes time to run these tests. The LLM outputs are not stable. Are there, I shouldn't say they're not stable. They're not, deterministic. So you have to like, if you have like a ground truth of a thousand things and you want to run through it and maybe you've got some security stuff in there, maybe you have just general optimal response stuff in there every time your your team wants to push the model, they go through, they, run this suite, they maybe have to run it like a handful of times to look at the standard deviation across responses. The very metrics that we use to assess the efficacy are LLM based, like now, you know, we have to find the distance between an ideal answer and the bot's answer and that moves, , also, there's like a lot of uncertainty there.

So even if one developer just wants to run it once before checking in, it's going to be at least five bucks we've found. Like sometimes it's 10, sometimes it's 20 bucks just to run a test suite. And that's a really small ground truth, right? And it can still take half an hour, an hour, a couple of hours.

Keith: So you're not even accounting for the developer's salary time either for the hour, right?

Like, yeah, no,

Deep: no. I don't even, I haven't even gotten there. Right. Okay. And so, so now if you talk about a more reasonable size, data set, maybe 100, 000 examples that have to get run. I mean, it's like serious money and time going into testing things, and it's not even that easy to maintain idealized responses because The bots coming back with things that are different from what the last human vetted thing said was perfect, but it's good.

There's a, there's a lot of challenge there. And so one of the things that has happened in the past is, to us on projects is open AI drops a new model and it's always like, it's like the last couple of times they dropped a model. It was so much better. And then all of the impact was in us fixing the ground truth to acknowledge the new, much better model.

But then there have been times where all of a sudden we see, you know, , a drop in efficacy. So the question is like, these things feel really hard to test compared to, functional testing, even compared to traditional regression testing, even compared to like traditional performance testing. What are you seeing?

And I feel like there's going back to the hubris idea. People don't even think they need to test these things because they're so shocked still by the quality of the output. What I'm seeing is a natural bias against bothering to test these things because it's like, Oh yeah, it's different, but it's just like always better than the humans we hire.

So like, what's the point? And I'm just curious, like how you see that through a security lens

Keith: I think dynamic testing, to your point, is very expensive, right? Like running the, the application or the large language model, and then hitting it with tests while it's running in a similar way that the developer might like try to find the perfect answer, right?

That tends to be a very expensive operation. And also it's one that I think, is too late in the cycle, right? Like if you find something, are you really going to stop pushing that thing to production because , maybe you're. Promotion or your bonus or whatever depends on it.

No, you're definitely not going to stop, right? Like that's where I've always even in traditional like web application security as an example, dynamic testing is always too late in the cycle because you already have a written functional application at that point. And you're just not going to stop the boulder rolling

Deep: gets neglected because there's always a product manager pushing to get a feature out.

And there's always the product manager pushing to Selling forces pushing on the product manager to get the features out. And it's the first thing to drop, right? Like, I mean, I can't tell you how many companies I've been in. And we're like, Oh yeah, are you in a test? They just all fail. It's just what happened.

Keith: Right, right, right. And so it's like

Deep: a shuttle in 1983, like crash and killed a bunch of people because of that, you know, normalization of deviance, I think is, what they called it.

Keith: Yeah. And so like for, for these situations, , dynamic analysis is, um, not something I would generally recommend because of, cost time, but also like, you're just not going to stop.

At the end of the day for that thing to actually get released, it's just not going to happen in most companies. Unless there's like a significant risk in, in that release, that's maybe, this is just adding one more, straw and thinking of the camel's back of all the other risks that exist that they've already acknowledged, but That's usually where we say like manual code audits, , actually looking at like static analysis in some cases, maybe fuzzing the application by like when you're, you're instrumenting the build process, so that you're like fuzzing it in, in compilation process or like instrumenting it in a way that you can fuzz it, on a smaller scale, like by and large, when, when you're actually like interfacing with it as a chat interface, uh, and trying to like, send, , basically requests and get responses back.

As like a prompt, you're probably too late, which is why, like a lot of these companies out there that are like, Oh, yeah, we're, we're continuously fuzzing for prompt injection. It's like, well, yeah, but you're doing it dynamically at the end and you're doing it in a way that's probably. You're not going to understand why it works that way.

I was reading a paper just recently, related, where, I think they called a special text sequence, but basically they were doing some research study where they would generate these, these like strings and they would put it in a product description of some kind. And then they would actually find that by putting these special text sequences or STS the product description somewhere like.

When a bot would go out and like recommend products within a certain like market fit or price range, the things that had the STS rose to the top and it's like no one really understands why it's just a nonsense string of data

it's just like, again, you know, testing at the end, well, it's not going to stop the company from saying we need this chatbot to be able to make recommendations to people on the product they should buy, uh, for the next holiday gift for, you know, friends and family, right? Like, they're just not going to stop to figure out what these special tech sequences, uh, are doing.

Deep: So, yeah, I mean, one of the things that sort of recommend and do regularly is we sort of incorporate a lot of the. The, the sort of deviance, uh, like the, the conversational deviance stuff into the ground truth itself, because we do need to constantly like torque what assessing how effectively the bots responding to stuff is something that any data science team is going to be doing like every day.

And so we'll start incorporating. , banned statements and the, manipulative conversational stuff, into that testing so that it's not happening, like, right before you release, right? It's happening, like, every day or ideally,

I think this has been, like, a really awesome conversation.

I'll, end with my sort of traditional final question, which is, like, let's fast forward out 5 or 10 years out, whatever your guesses are about AI evolution, happen and security evolution, just describe the world for us. Like, what does the cat and mouse game look like? What are the bad actors look like?

What are the good actors look like? And what are the scale of failures looking like?

Keith: So I'm going to take the devil's advocate bet on this one. Like I'm going to plant my flag in a place that a lot of people won't anticipate, which is I think localized models and like smaller custom models are actually going to win.

In the next five to 10 years, like a lot of people truly believe that the foundational models or the frontier models, like the big OpenAI, or Gemini 1.5 or Anthropics, you're betting on LAMA three

Deep: and

Keith: yeah, yeah, I, models, um. Performance over time and con, compression and being able to have much more specialized models, I think especially things that you can run locally to your system, I think that that's gonna win.

But I think what that means is, especially when it comes to custom models, right, being able to train your own models, for example, it means that people will also probably start to like sell, almost like they do software licensing today. Smaller, more, more capable models, that people can run locally.

And that can do things for them, much more naturally. I mean, like, Anthropic tried to come out, for example, with their, compute use or computer use, you could have the large language model drive your system. And even in their demo, it, paused and went and looked at pictures of Yosemite National Park, right?

And it was like, they're trying to get these things to drive your system for you. The agent's idea is a good one, but I think if you string together a bunch of small localized models that are all sort of specialized and capable, you're going to get much closer to, holistic systems rather than these large generalized foundational models are accomplishing.

And so what does that look like from a security standpoint? Well, if people are downloading models, that are small and more localized. , we see this even in like the blockchain space, for example, like it's possible that these things are backdoored. They could have like some sort of, connectivity that it opens up back to an attacker who now has an entry point into your system in some way.

, so that's always the first problem is like a bill of materials type problems. But then also from there, it's, what if, these things are. just simply, they're actually capable of doing a lot more than being prompted and enacting, but then start to take actions on their own.

I mean, think of like autonomous ransomware, for example, where it's like you download a model, it operates just fine. And then suddenly it decides to uh, detonate itself at a specific time or based on like enough context that it's gathered about your organization to determine it's time to ransomware your organization, right?

that could be really scary. So I think five to 10 years from now, there's a lot. To be seen, part of the reason I plant the flag on the, llamas and the, you know, smaller models, , overall is because when you look at some of the research papers that go out there about like training data sets, I know that we've already talked about that, but the availability of truly human generated training data, the, compute costs associated with training larger larger models, and also the electricity, , requirements necessary to be able to accomplish that.

 I think These factors that limit the large language models will eventually hit a sort of logarithmic plateau where you don't see enough of a change between versions of the chat GPTs or whatever, that people's searches go much smaller, much more localized, and perhaps more powerful, for specific use cases.

And that unto itself leads into all the same, downloading software from the internet problems that we have today.

Deep: Wow. The autonomous. Yeah. It's ransomware. It's kind of a frightening future because you can imagine that getting really good. But I imagine if that happens, then the protection mechanisms will have to get better.

So thanks so much. I feel like this was, this was a really cool episode. So thanks so much.

Keith: Yeah. Thanks, Deep. This was a lot of fun. Good chatting with you.