Is coding still worth learning?
In this episode of Your AI Injection, Deep Dhillon sits down with Austin Vance, CEO of Focused Labs, to explore the radical future of software development. Austin explains why AI might soon bypass programming languages like Python, generating bytecode or AI-native logic directly. The two discuss how developers today are pairing with AI copilots to navigate legacy systems, optimize frameworks, and build faster -- but what happens when AI handles everything except the requirements? Will human developers move from writing code to directing machines? Tune in to find out if programming languages are on the brink of obsolescence.
Learn more about Austin here: https://www.linkedin.com/in/austinbv/
and Focused here: https://www.linkedin.com/company/build-with-focused/
Check out some more of our related podcast episodes:
Expert Tips for AI Implementation and Data Strategy with Paul Lewis
Speak Directly to Your Data, No Coding Required with Sarah Nagy
Transforming Workforce Management with AI and People Analytics with Adam Binnie
Xyonix Solutions
At Xyonix, we enhance your AI-powered solutions by designing custom AI models for accurate predictions, driving an AI-led transformation, and enabling new levels of operational efficiency and innovation. Learn more about Xyonix's Virtual Concierge Solution, the best way to enhance your customers' satisfaction.
[Automated Transcript]
Austin: It feels really inefficient for a generative AI to write Python so that a human can understand it. Like, at what point are we going to skip Python? An LLM could write bytecode, or some LLM-native language that it can hold in context better, have referential integrity with, better than a human language.
And we're communicating with it exclusively in requirements. That seems like a natural progression of the field.
Deep: Hey, I'm Deep Dhillon, your host. Today on Your AI Injection, I'll be talking to Austin Vance, CEO of Focused Labs.
Austin, thanks so much for coming on.
Austin: Thanks for having me, Deep.
Deep: Cool. I'm really excited to get going. Start off by telling us a little bit about what kinds of AI projects you're seeing out there, maybe pre- and post-LLM world. What kinds of things are you seeing actually get built and have some success in the marketplace that we haven't heard about?
Austin: One of my favorite AI projects we've been working on for a little while now is one I don't know how I would have used traditional software to solve. A lot of times we see AI as a feature flag, or as a marketing tool -- "we use AI" -- but it's really just a chatbot, a wrapper over OpenAI.
This one is really cool. It's a company called Hamlet, a customer of ours. They use large language models to watch city government meetings -- think planning commission meetings, county-level meetings -- and pull out decisions happening in real time.
And then they provide transparency to local governments, real estate developers, or political action committees. Some of what they do could never have happened before: understanding how a city council member changes their views over time after hearing commentary from constituents, or even just identifying speakers out of a YouTube video shot C-SPAN style --
the camera's way over there, people are talking, and the audio kind of sucks. It's been a really cool project that I don't think we could have done until large language models and gen AI got to where they are now.
Deep: Yeah, it's funny that you bring that one up. I have a theory on all the meeting time-saving apps that came out post-COVID -- everybody got sick of being in Zoom meetings during COVID. We actually helped a pretty awesome note-taker company
do something very similar: automatically extract the action items from a meeting. This actually started pre-LLM, or at least pre-GPT -- it was really early days. I personally know four friends of mine who do this.
It really took off. It makes a lot of sense when you tightly verticalize it like that, in a really particular context -- and that municipal context is interesting.
Austin: I loved the idea because after COVID, all these local governments decided to stream their meetings online or post their videos on YouTube or a government website.
But still, if you want to get involved in local government, you really have to go find this stuff, and you have to sit through some pretty boring material. If an LLM can summarize what's going on in your city and help bring transparency to how decisions are getting made,
it involves the community in a way that I haven't seen government be transparent in a long time. I love the product. And I thought it was such a cool application of an LLM, rather than it just being a chatbot, a summarization tool, or a sentiment analysis tool -- which is a lot of what our projects are -- or RAG, right?
We'll have a whole bunch of corporate docs or financial docs, and we'll figure out how to parse those and search over them.
Deep: Yeah, that's a pretty common one -- having some dialogue engine go on top of a collection of content, a corpus.
We could probably even take that template up a level beyond meetings. We've had multiple projects, and seen a lot of this kind of activity ourselves, where certain things have happened in the last couple of years -- like transcription's just gotten really good.
So you wind up with a lot of text you can work with, off of whatever it is: a video, a meeting. We're even seeing phone conversation stuff happening. The audio gets transcribed with a reasonably high level of accuracy.
And now you can go in with the LLM's deeper understanding of the text and start to pull out a lot of stuff -- like I mentioned, action identification in meetings, but also a lot of summarization, trying to figure out the most salient snippets,
so you can cut them into really crisp little 15-, 30-, 60-second video snippets from, say, a long lecture online.
Austin: Yeah. So the coolest problem I think the company solved -- and we can talk about some of the other organizations we've worked with -- but transcription's incredible now, just so good.
But if you have one YouTube video with one audio stream, you can still do speaker identification, but most of these models will give you speaker one, speaker two, speaker three -- they don't know who the speakers actually are. And for this use case, knowing who's talking and assigning a name is super important for transparency about who's making decisions in your government. So we use the large language model to identify speakers through the transcript. Someone might say, "Councilwoman Mary, what do you think about this?" Or, "Let's do a roll call. City council member Jim, are you here?"
Then the LLM can say: oh, I can associate the next speaker with Jim, and I know that's this person. It can also look at an attendee list and match names. So you can actually pull names out of the conversation and assign them back to the speaker labels a transcription engine gives you.
And then you can build that into a larger piece of software, so you can start to track how a real person talks over time. When you do a Zoom call, a lot of times the large language model gets multiple audio tracks -- it knows you're speaking because the audio is coming from you versus coming from me --
and it can watch that. That was a really fun problem. I don't know how I would have solved it with traditional programming.
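[To make the idea concrete, here's a minimal sketch of that post-processing step -- not Hamlet's actual pipeline. The complete() helper is a hypothetical stand-in for whatever chat-model API you use:]

```python
import json

def complete(prompt: str) -> str:
    """Hypothetical helper -- wire this to your LLM provider of choice."""
    raise NotImplementedError

def name_speakers(diarized_transcript: str) -> dict:
    """Ask an LLM to map anonymous diarization labels to real names,
    using cues inside the dialogue itself (roll calls, direct address)."""
    prompt = (
        "Below is a meeting transcript with anonymous labels "
        "(Speaker 1, Speaker 2, ...). Using cues like roll calls "
        "('City council member Jim, are you here?') and direct address "
        "('Councilwoman Mary, what do you think?'), return a JSON object "
        "mapping each label to a name, or null if unknown.\n\n"
        + diarized_transcript
    )
    return json.loads(complete(prompt))

# Expected shape: {"Speaker 1": "Councilwoman Mary", "Speaker 2": null, ...}
```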
Deep: Yeah, it's interesting that you brought up the diarization problem, because that was one we were struggling with too, since we were trying to be agnostic to the
meeting platform environment. Most diarization APIs are not as evolved as transcription APIs. And different humans of similar build, age, and gender can have really similar vocal patterns,
and those models can really struggle to figure out who's saying what. So it makes sense to go to the text itself to stitch it together. But ultimately, what do you think -- are we going to start to see more powerful multimodal models that can handle some of these diarization challenges?
Austin: I'd hope so.
You're starting to see a little bit of it. But text is just so nice to work with across all the other parts of software. Even if a multimodal model can do diarization better, eventually I want text so I can put it in a Postgres database and do full-text search -- so getting to the transcript with better speaker labeling is what matters. These models just continue to get more powerful, and any prompting seems to help a lot. We cracked the code using DSPy to drive the prompt engineering around pulling out the speaker labels and putting that into the data.
We have this AI data-extraction pipeline, and DSPy is a big part of it. It auto-generates prompts, using an evolutionary algorithm against a grading set, to eventually find what it thinks are the best speaker labels. It's gotten to a really good spot.
I don't think we could have done that ourselves with manual prompt engineering.
Deep: Tell me a little bit more about that. I haven't heard of this tool. So, DSPy -- what's going on there?
Austin: Essentially, you give a starting prompt and a set of expected answers to an algorithm, and it runs through using that prompt and grades itself.
It uses an LLM to alter the prompt, decides whether that was better or worse, and then alters another part of it. So it keeps iterating on the prompt, trying to get to more and more accurate answers. You have to supply the "this is what I would expect out of it" part.
What we did is hand it a handful of transcripts where we knew the labels already -- we knew who was speaking where -- and as it iterated, the prompt grew from probably a paragraph to something fairly significant: look for when someone is identified by name, like "Mrs. Chairman," or identified by role.
It kept adding that kind of thing into the prompt itself until it eventually got really substantial.
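[For flavor, here's roughly what that setup looks like in DSPy. A hedged sketch, not Focused Labs' code: labeled_transcripts is an assumed list of hand-labeled (transcript, speaker map) pairs, and DSPy's API moves between versions, so treat the exact names as approximate:]

```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any supported model

class LabelSpeakers(dspy.Signature):
    """Map anonymous speaker labels in a transcript to real names."""
    transcript: str = dspy.InputField()
    speaker_map: str = dspy.OutputField(desc='JSON like {"Speaker 1": "Mary"}')

program = dspy.Predict(LabelSpeakers)

# Training set: transcripts where the true speaker labels are already known.
trainset = [
    dspy.Example(transcript=t, speaker_map=m).with_inputs("transcript")
    for t, m in labeled_transcripts  # assumption: your hand-labeled pairs
]

def metric(example, prediction, trace=None):
    return example.speaker_map == prediction.speaker_map  # exact-match grading

# MIPROv2 proposes prompt variants, scores them against the metric, and keeps
# the winners -- the "evolutionary algorithm against grading" described above.
optimizer = dspy.MIPROv2(metric=metric, auto="light")
optimized_program = optimizer.compile(program, trainset=trainset)
```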
Deep: And when you were done, did you find that you wound up with a prompt you could use in all contexts, or did you need that adaptive iteration for different contexts in the runtime environment?
Austin: In this case, we found a prompt that worked really well until we changed models. The models have enough variance that we rerun the DSPy algorithm if we switch from, say, Sonnet to GPT-4o.
Deep: That's a really interesting concept -- structuring a learning process around prompt optimization. That makes a lot of sense. It seems like a natural thing that's starting to happen with the newer models; they're building a lot of that into the models themselves. But until that's fully fleshed out, it's fascinating. And doing it with training data makes a lot of sense, because otherwise you'd have to build a fine-tuned model to tap into that.
Austin: It works really well. What was surprising to us -- and it's always the thing when you have a program iterating over a model, especially top-tier models -- is that it can get expensive if you're doing thousands of calls and your context windows get fairly big.
It can get really expensive, but eventually you end up with a prompt that works pretty well. Pull that out, drop it in, and you're good.
Deep: Until the next model rev or whatever, right?
Austin: Exactly. You go again.
Deep: Yeah. Tell me -- this is a landscape that's changing so fast, right?
There's so much going on. And I noticed some of your projects are doing a lot of -- you mentioned RAG models too. Now the whole landscape's changing: you have the Assistants API taking a lot of those RAG details out and black-boxing them, making things easier to use but more opaque. What's it like from your vantage point, dealing with the rate of change? It's no longer an OpenAI-only world -- there's Claude, there's Gemini, there's a lot going on. How do you stay on top of developments? How do you react to them? How do you deal with the fact that whatever you built eight months ago is probably not how you would do it today?
Austin: I remember the first front-end single-page app I built with Backbone.js. By the time we were feeling pretty confident in prod, Ember.js was out and we were like, oh my gosh, we should have used Ember for this.
Software moves fast if you're on the bleeding edge, and AI is that right now. Sometimes you just say: okay, we're going to put a stake in the ground, and that's the legacy we're going to stick with, even if it's not the most modern implementation down the road.
Most of what we're building isn't foundational models or anything like that. We're integrating with models.
Austin: So, if you follow good software practices -- I decouple myself from the integration by building adapter layers using abstraction.
And if I have good testing and accuracy analysis built into my deployment processes, then actually switching models becomes fairly trivial. If you have some eval process built into your continuous integration, you can swap a model and it can tell you: oh, you eval worse with this new model -- roll back, or okay, let's spend some time on that. Or: Sonnet and GPT are just the same in this case, so we'll stick with the one we know. A lot of it is good software development -- abstraction, and writing code to make swapping easy.
The eval stuff is a little different because of the non-deterministic portion of it.
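[A sketch of that abstraction layer, with assumptions flagged: the ChatModel seam and the model names are illustrative, not Focused Labs' code, though the two generate() bodies follow the current OpenAI and Anthropic Python clients:]

```python
from typing import Protocol

class ChatModel(Protocol):
    """The one seam the rest of the app depends on; vendors live behind it."""
    def generate(self, system: str, user: str) -> str: ...

class OpenAIModel:
    def __init__(self, client, model: str = "gpt-4o"):
        self.client, self.model = client, model

    def generate(self, system: str, user: str) -> str:
        resp = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": user}],
        )
        return resp.choices[0].message.content

class AnthropicModel:
    def __init__(self, client, model: str = "claude-3-5-sonnet-latest"):
        self.client, self.model = client, model

    def generate(self, system: str, user: str) -> str:
        resp = self.client.messages.create(
            model=self.model, max_tokens=1024, system=system,
            messages=[{"role": "user", "content": user}],
        )
        return resp.content[0].text

# Swapping vendors is now a one-line change wherever a ChatModel is injected.
```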
Deep: Maybe let's pull on that thread a little, because I've found that to be a challenging scenario. Traditionally with machine learning, you define ground truth with a dataset, and you have some kind of distance metric -- some way of determining whether or not you hit a target variable correctly. That gets harder in these LLM contexts because you have to use heavy reasoning to figure out how far a generated response is from an optimal response. A lot of times you might have multiple optimal responses. And then, like you mentioned, it's non-deterministic.
So you rerun your thing a bunch of times, you get different outputs from the bot, and each run can take a while -- so running through thousands of permutations can be hard. I'm curious how you approach the efficacy-assessment challenge with LLMs.
Austin: It depends on the work. RAG over reviews of a product is a good example. If it's RAG over product reviews, or sentiment analysis on product returns -- people like the red Nikes better than the green Nikes -- and it's off three percent of the time, I'm not super worried about that.
And our customers don't seem to be super worried about that. If it's a legal framework, or an employee handbook that needs to be correct about how many vacation days you get -- RAG we've done -- you want it to always be accurate.
We built in answer-time checking, and then a lot of logging. LangSmith's gotten really powerful -- there are other tools too, but I've always liked them -- and we love the ability to go back and look at what the LLM is responding with and why, and to put grading into the responses actual users are getting.
And then we always have a standard set: with this question, I should get something close to this response. We have grading happen around that as part of our continuous integration. So if we're iterating on a prompt, for example, we'll have the QA eval set.
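[A minimal sketch of such a CI eval gate. Everything here is assumed for illustration -- HANDBOOK_PROMPT, the questions, and the substring checks are made up, and model is the ChatModel seam from the sketch above; real suites often swap the substring check for an LLM-as-judge grader:]

```python
HANDBOOK_PROMPT = "Answer strictly from the employee handbook context."  # assumed

EVAL_SET = [  # (question, snippets a correct answer must contain)
    ("How many vacation days do new hires get?", ["15 days"]),
    ("Can unused PTO roll over to next year?", ["up to 5 days"]),
]
BASELINE = 0.90  # accuracy of the currently shipped model + prompt

def run_evals(model) -> float:
    """Score a candidate model against the QA eval set."""
    hits = 0
    for question, must_contain in EVAL_SET:
        answer = model.generate(system=HANDBOOK_PROMPT, user=question)
        if all(s.lower() in answer.lower() for s in must_contain):
            hits += 1
    return hits / len(EVAL_SET)

def gate(candidate_model) -> None:
    """Fail CI when a model or prompt swap evals worse than the baseline."""
    score = run_evals(candidate_model)
    assert score >= BASELINE, f"eval regressed: {score:.0%} < {BASELINE:.0%}"
```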
Deep: I wanted to pull on the thread you mentioned, where some customers are okay with the model making mistakes in certain scenarios. We've run into the scenario where customers don't always know what they think is okay and not okay.
Do you ever find it's hard to explain why certain things -- which require a lot of work and testing -- need to be put in place? Because these bots work really well out of the box, it's not obvious that in three months your bot's going to be promising all kinds of actions that aren't real or are never going to happen.
Because they like to please. They're either going to sell the Chevy Tahoe for a dollar, or the Canadian airline tickets are going to get promised a reimbursement.
There's a million things the bots can do on their own, because they like to please. And the customer gets a little bit of bandwidth to look at a few results, and they're like: looks good, ship it. And you're like, yeah, that's not representative of the millions of results that are going to come in, and the anomalies you're going to see. How do you deal with that?
Austin: It's not a solved problem, that's for sure. And I don't know if there's a silver bullet. In some cases there are just deep human-in-the-loop cycles -- that's still the answer. Over the last two years of us doing some of these more complex retrieval applications, or LLM-based applications, the human-in-the-loop steps have gotten lighter, but there's still a human doing something.
I don't know that I'd be comfortable today selling my services to build a chatbot that could sell a Chevy Tahoe and actually close that deal. Like you said, they aim to please too much. I'd want it to give all the correct answers and then kick you over to a rep to close it out.
Deep: There's a way to set that context up, and one way is to have it recommend answers to customer service humans, who then send the response over.
But usually that gets pedantic pretty fast, at least for the subset of responses that have high confidence. Sometimes it can just be phrased carefully -- maybe there's some disclaimer, some way to present it to users so they know they're being given something to think about rather than a definitive answer. That's another approach we've seen deployed.
Austin: The disclaimer idea is good. I just think customers see that disclaimer and think: okay, I'm just not going to trust anything that's here -- if the AI is deployed in a way where it's supposed to feel like a full customer service agent.
Deep: Yeah.
Austin: And then you have a little asterisk at the bottom of the chatbot saying everything this thing tells you could be a lie.
I think customers don't love that. And neither do government entities.
Deep: Right -- I think in Canada there was a provincial ruling that said: I don't really care, your bot said they get their money back, so they get their money back.
Austin: One call center use case we've seen: instead of presenting information to the customer directly, we listen to the conversation in real time, search what's going on, and augment the context the customer service rep already has in their head.
So they have all this knowledge, and they could search a CS tool or something like that, but the LLM is doing the search, understanding what's going on, and presenting them with other relevant information: we have these deals going on, you have this coupon code you could offer if you want. But judgment is still left to a human.
You don't get the economy of scale where an AI bot answers millions of calls, but you get substantially less training required for your CS team, you get the ability to move people around more easily, and you start to build a corpus. As you record and listen to how these interactions work, you see what your CS people actually use out of what's presented, and then you can present that information more often, and so on.
Deep: Yeah, there's a common pattern I've been noticing myself -- I'm still trying to wrap my head around the whole life cycle of it, but there are multiple things going on. One thing in that customer service context is just training the customer service people better. And as part of training, assessing the actual human-provided answers against a rubric can be really valuable. I built a system like this -- it was basically a virtual college advisor scenario,
talking to students. We ran the questions through the humans, took the human responses, and benchmarked them against a rubric: you've got to be empathetic -- what was your empathy score? What was your encouragement-of-autonomy score?
That serves as a nice way to get the human customer service person to think about all the different things required in a good response. If somebody asks a yes/no question, in the past they might have just given a yes/no answer. But if they're being scored on their empathy and their encouragement and a few other things, they might say: hey, good job for asking that question. Some of that can be really valuable, because it teases out more dialogue.
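[A sketch of that rubric-grading step. The rubric dimensions come from the conversation; the JSON shape and the complete() helper are hypothetical, as in the earlier sketches:]

```python
import json

RUBRIC = """Score the advisor's reply to the student, 1-5 on each dimension:
- empathy: acknowledges the student's feelings and situation
- encouragement_of_autonomy: invites the student's own thinking
Return JSON: {"empathy": n, "encouragement_of_autonomy": n,
"feedback": "one sentence of coaching"}"""

def score_reply(student_message: str, advisor_reply: str) -> dict:
    """Grade a human-written reply against the rubric with an LLM judge."""
    prompt = f"{RUBRIC}\n\nStudent: {student_message}\nAdvisor: {advisor_reply}"
    return json.loads(complete(prompt))  # complete() as in the earlier sketch

# A bare "yes" to an anxious student scores low on empathy -- exactly the
# coaching signal described above.
```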
Austin: I haven't seen this, but you made me think of something. I worked as a telemarketer in high school for a very short period of time, and it was awful -- I'm sure that wasn't a career highlight. One of the things that would happen when you're on the floor -- you're in a little cube and there's an auto-dialer -- is some manager could just listen in on your call. Then the call would hang up, and
a person would pop on that wasn't the next call: oh, I'm your manager. I heard you say this; you were off script here, and you should have said this instead. Just real-time feedback. I haven't seen this in call centers with AI yet, but there's some real-time feedback potential there that would be cool to see.
Deep: That's a perfect analogy and context. I think I saw a movie where they did that. It was wild.
Austin: It was the worst job I've ever had, I'm not going to lie.
Deep: That's got to feel just so weird.
Austin: It was soulless. Yeah, soulless.
Deep: So there's that training piece. But not all customer service reps are answering easy questions, right?
I've talked to folks who have fleets of engineers responding to questions -- really hard questions that aren't even answered in one location in the documentation. So building up fluidity and ease of access to knowledge across that fleet of folks is a really big deal.
The other thing is --
Austin: CS is built to do that already, though. Think of it as AI being able to replace L1 support, but not L2 -- the easy-to-answer questions.
We've all called, say, United for help: the first person has you try this and that, then they move you to the next person.
Eventually you get to someone who has enough context or system access to actually help you -- or you don't need to go that far and you're just good. So maybe AI just scrapes off that first surface layer -- probably the lowest-paid, most junior, lowest-training-required kind of customer support -- and elevates the human up to the next layer.
Deep: Yeah, which ends up creating a need for a different kind of human, right? Because then you need a human who's more thoughtful and able to handle more sophisticated scenarios than just looking stuff up in a pile of info and reasoning against it. The truth is, a couple of years ago that was a hard thing to do. But now, with GPT-4o
and this O1 model, you suddenly have a rather smart entity that's able to do that. So then maybe this is the question: what do we think that next level up is? What are humans actually uniquely good at that we think the models are going to keep struggling with?
Austin: Yeah, I think about that a lot. Civilization constantly abstracts away the most expensive or the simplest forms of labor.
Deep: Yeah.
Austin: But population has always grown. So we've always found jobs for everybody.
Eventually it might be hard for some people for a little while. But I have not seen -- even with the O1 models, and who knows -- I have not seen them be incredible at reasoning. Applying real judgment between two good answers, they're not great at. And when you come to customer support and interactions with real people, that comes up a lot.
I mean, even in your sales cycles, you might have a customer you really love who's in a tough spot. They're going to come out of it, and you say: okay, I'll cut you a break, we'll keep working with you for a little while. Or a customer support rep says: I know this is tough. I'm not supposed to do this, but I'll just upgrade you to first class and move on.
Deep: You're almost talking about those intuition-slash-sense scenarios, where a human can just feel that this person's pissed, or whatever.
Austin: Maybe AI can get there eventually, but it's not there now, and it feels pretty far from it. There was a lot more judgment in CS even ten years ago, right? You'd call customer support, and they had more ability -- a mainframe where they could do anything.
Now they have a SaaS app, a very specific set of credentials, and they can't do anything except say: I'm sorry, sir, I can't do anything here. Is AI trapped in that place right now? Maybe eventually it gets to the judgment that's further out.
Deep: It's funny that you say that, because I have the opposite belief.
I sat down when O1 came out to figure out: how good is this thing? It took me a while to come up with something that GPT-4o was bad enough at, and that was objective enough, that I felt I could really assess how well the reasoning has evolved -- at least with respect to one particular problem.
I've brought this up before on the podcast, so listeners might be getting sick of this example, but I was a bit of a math nerd in high school and I used to go to these competitions. At a state math competition they asked a question along these lines: using the number four and any mathematical operators, assemble an expression that produces a particular number.
So that's the first thing I did. I said: okay, using the number four and any mathematical operator, create an expression such that its execution yields the number 37. And GPT-4o, I guess, succeeded in interpreting that I wanted a mathematical expression.
It succeeded in understanding it needed to compute the result. It failed at actually adding and subtracting and multiplying the numbers together -- which isn't a big surprise. So I told it: yeah, dude, it's not there; whatever you just said evaluates to 26, or 22. And we'd go back and forth three or four times,
and it started getting closer. I did the same with O1, and it gets it right away. Then I repeat and say: okay, you can only use four fours -- not five or six or seven. You can use the number four exactly four times: four plus four divided by four minus four plus four factorial, whatever.
This was interesting: both models failed to reliably register that I only ever wanted four fours. O1 got it most of the time. Then I made the problem harder: compose every number between 37 and 50, say.
So now it has to come up with an expression every time, and some of these are actually pretty hard -- you need to be clever and creative, you need some square roots and divisions and decimal points. It ends up being a bit of an eyebrow-bender even for a human. I'd rate O1 as pretty remarkable on it. I wouldn't say that about 4o -- though I would have called 4o remarkable a year ago. Now I'm spoiled.
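[For readers who want to play along, here's a toy grader for the puzzle -- a sketch, not anything from the episode. It leans on eval() only because this is a throwaway checker:]

```python
import math
import re

def check_four_fours(expr: str, target: float) -> bool:
    """True if expr uses the digit 4 exactly four times (and no other
    digits) and evaluates to target."""
    if re.findall(r"\d", expr) != ["4", "4", "4", "4"]:
        return False
    try:
        value = eval(expr, {"sqrt": math.sqrt, "factorial": math.factorial})
        return abs(value - target) < 1e-9
    except Exception:
        return False

print(check_four_fours("4 + 4 / 4 + 4", 9))   # True
print(check_four_fours("44 - 4 - 4", 36))     # True
print(check_four_fours("4 * 4 + 4", 20))      # False: only three 4s
```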
Austin: Do you think, say, if you were to pull Llama or Sonnet and build a chain-of-thought application on top of it, you could get the same results? Is O1 special, or have we just made the LLM more intentional about how it breaks down and then executes a problem?
Deep: Yeah, I don't know exactly what's in O1, but from what I've read, there's a lot of chain-of-thought stuff going on. So I'll go out on a limb and say: yeah, probably. At least with respect to 4o, I was able to coax it --
with the exception that it doesn't have a compute engine, so it can't seem to add and subtract; really basic math, it fails at. Whereas with O1, I think they must have spent a lot of energy on that. Maybe they didn't -- I don't know what they're doing exactly -- but if I were them, one approach would be to generate a crap-ton of math problems with the chained steps in them, so you're basically teaching it how to add and subtract and all that.
Another would be to build in a symbolic interpreter. I don't know.
Austin: Or a code executor, which I've found is a really nice way to do that too. They're so good at writing code. If you can convince the thing, in the background -- it doesn't have to be visible to the user -- to just write a bit of code, execute it, and say: I recognize you're asking me math, we'll just do that.
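[A sketch of that pattern -- again with the hypothetical complete() helper standing in for the model call. In production the generated code would run in a proper sandbox, not a bare subprocess:]

```python
import re
import subprocess
import sys
import tempfile

def solve_with_code(question: str) -> str:
    """Have the model answer arithmetic by writing Python, then run the
    Python -- instead of trusting the model's own mental math."""
    reply = complete(
        "Answer the following by writing a short Python program that "
        "prints only the final answer. Reply with one fenced Python "
        "code block.\n\n" + question
    )
    match = re.search(r"```python\n(.*?)```", reply, re.DOTALL)
    if not match:
        return reply  # the model answered directly; nothing to execute
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(match.group(1))
    result = subprocess.run([sys.executable, f.name],
                            capture_output=True, text=True, timeout=10)
    return result.stdout.strip()
```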
Deep: That might be your approach to getting simulated O1-level efficacy on this problem. But that in itself brings up a question. Okay, you're the machine learning developer. In case one, you give it slightly ill-formed instructions and it nails it maybe 80 percent of the time. In case two, it nails it maybe 20 percent, and you're forced to spend time hacking the prompt so it really follows your chain of thought -- you coax it into thinking the problem through properly, the way a middle school math teacher does with their students.
Sometimes I feel like we're building houses of cards. In one case you're relying on this big muscle to figure it out. In the other, you're putting in a little more effort, and as a result you understand the problem better -- or at least how this thing thinks about it.
And it feels like there's an inherent trade-off there. I checked you out -- you're clearly a developer, and I am too. Don't you think there's something disturbing about it? It seems like we're going to have a new generation of the world's most horrible bugs, and we're going to have to figure out a whole new way of debugging.
Austin: It's definitely a tool you have to learn how to use. I hand it to my parents, like, look how cool this is. And they're like: that's a cool parlor trick. And they move on.
I used to say to junior developers or college students going into software development, when they asked what they should learn: learn how to Google. If you can learn how to do research effectively, you learn over a career that if I ask Google certain things in a certain way, I get the right responses.
I'm not getting the website with bad or outdated answers; I'm getting the one that's right. Right now, the LLMs feel like they're in that stage. As I work my way through, I learn how to interact with the thing and I get better and better responses.
And it becomes this tool I lean on. The first time I use it, it's a parlor trick. Then I build up efficacy with the tool and feel like I can actually work with it. I've learned to drip-feed context to it through a conversation versus plowing it all out in one message -- that's even more apparent now.
You said I'm a developer -- I love using code assistants.
Deep: Yeah.
Austin: Sometimes they're not very good. Sometimes they're great. And learning how to use them was a pretty big uphill battle. It felt like I was picking things up for the first time again. I felt
slower because of this thing. But as I started to understand where it's good, where it's bad, how to ask it stuff, how to drop in context, how much context it can hold -- now I can have it generate pretty substantial amounts of code pretty quickly.
The big holy-crap moment for me: a big part of my business is legacy systems. We work with old infrastructure, whether it's a mainframe or an old Java app or something like that. And often these things don't have documentation.
So developers have to spend a ton of time reading code just to figure out how to interface with the old system. Now I can drop a pretty significant amount of a code base into a large-context-capable model like Gemini and say: tell me the API. How do I interact with this thing?
Oh, it's a SOAP API -- we figured that much out. Here's a WSDL; let me generate a WSDL for you so you can actually start interacting with it. Or: it's an RPC API, and here's how you interact with the RPC calls built into this legacy system.
You're like: oh, wow. That would have taken us weeks to figure out before. And it's just reading code, figuring it out, and then we can write against it. It's cool.
Deep: I think you really honed in on something important there. I feel like I know how to use this stuff pretty well, because I've been working with these things really intimately for a while, like yourself. But then I'll interact with somebody else and they'll just be like: oh, GPT sucks.
And I'm like: what are you trying to do? "Oh, I have to give a commencement speech tomorrow." Okay, what would you type to get your answer? And they're like: I'd say, write my commencement speech. I'm like, okay, that seems terrible.
Austin: Yeah, of course.
Deep: It's going to be the most boring speech ever.
And he's like: what would you do? And I said: here, I'll just do it. So I sat down and typed, write a commencement speech as if you were a combination of Sylvia Plath and -- I put in some comedian. And then I gave it a little more insight, and said: describe why you're making the decisions you're making, but separate that from the actual speech.
And then I gave it to him. He's like: holy shit, this is unbelievable. And I'm saying: yeah, because Sylvia Plath is interesting, and not telling it who you want is boring -- it's going to put you in corporate-BS land. It's going to come up with the most banal, the hotel equivalent of a restaurant -- stuff that's non-offensive.
And that's not interesting.
Austin: Yeah. A hundred percent.
Deep: Yeah. And he's like: okay, I would have never thought to mix a poet with a comedian. And I'm like: but how else are you going to get it to say something interesting, right? You've got to. And then I realized: oh, I know how to use these things.
Austin: There's creativity in how we interact with these tools.
Deep: Yeah.
Austin: Yeah. And I'm guessing you've seen people Google stuff and you're like: hold on, let me just Google that for you. I'll show you how to create an effective search that will actually get you an answer. You'll see developers do it:
they'll copy-paste a whole error message that has their own path in it, like /Users/austinvance. And of course Google's not going to find that -- it's not on the internet. Delete that part. The same thing happens with the LLMs.
People are like: if I just highlight this and copy-paste it in... Or: I say, write me a blog post about the future of real estate development. You get nothing.
Deep: You'll get boring output. It was exciting a year ago; it's not exciting now, because everyone's realizing these things can hit the minimal threshold of output pretty quickly.
But to get it to say something interesting about modern politics or economics -- to get it to be like an Ezra Klein or something -- you can't use them naively. The output can get really interesting, though, if you're clever about it.
Austin: I've stopped using them for writing almost entirely. Even when I ask LLMs to proofread for me -- and maybe my prompts aren't good enough -- it feels like they remove any of the edginess from the writing.
Deep: Part of that's true, for sure. Even if you tell it: don't change it at all --
Austin: Just don't change anything; just make sure it's coherent. And "coherent" to it means: don't have a strong opinion, hedge everything you say. So if you're trying to write something interesting, the LLM is not going to create that, I don't think.
Deep: I disagree again. I think it's in the prompting. I've got a marketing team, and they're always asking me questions, and at some point I'm like: I'm just going to make a bot to deal with you guys. Here -- I need you to write this, I need you to write that.
So I taught it all the nuances of my writing: hey, the semicolon is your friend -- use it, but don't drive people nuts with it. All this really detailed stuff, and then all kinds of examples of my writing in different contexts:
this is what was asked, this is what I produced; this is what was asked, this is what I produced. You can be quite expansive with prompts now. And then I'll even go into the different scenarios: when I'm writing tech writing, it's very straight-edged -- short sentences, boring.
When I'm writing marketing, it's fluffy and hyperbolic, and I have to let it go a little bit. It took me a while to get just the right little custom GPT, but once I did, it was like: okay, that's good enough. It's 90 percent of what I'd say.
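[A sketch of that kind of style bot -- all names and example pairs here are hypothetical; the idea is just style rules plus before/after examples of your own writing, keyed by context:]

```python
STYLE_RULES = {
    "tech": "Straight-edged. Short sentences. No hyperbole.",
    "marketing": ("Fluffy and hyperbolic is fine; let it go a little. "
                  "The semicolon is your friend -- use it, but don't "
                  "drive people nuts with it."),
}

EXAMPLES = {  # (what was asked, what I actually wrote) pairs, per context
    "tech": [("Summarize the outage.",
              "Root cause: a stale cache key. Fix shipped at 09:40.")],
    "marketing": [("Announce the new feature.",
                   "It's here; and it's fast.")],
}

def build_system_prompt(context: str) -> str:
    """Assemble a custom-GPT-style system prompt from style rules plus
    few-shot examples of the author's own writing."""
    shots = "\n\n".join(f"ASKED: {q}\nWROTE: {a}"
                        for q, a in EXAMPLES[context])
    return (f"Write in my voice. Rules for {context} writing: "
            f"{STYLE_RULES[context]}\n\nExamples of my writing:\n\n{shots}")
```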
Austin: Yeah.
Deep: But it is work.
Austin: For writing, even 90 percent isn't me. That's why I tend to shy away. Maybe you can go back and re-edit yourself into it, but I'm just like: I would never have written it this way.
Deep: Yeah, it depends what I'm writing. Fair enough -- a response to a lead that came in is one thing.
Austin: I love that; that's different. I'm talking about when I contribute to Fast Company or write something like that. And you can read a lot of that stuff now and think: oh, this is definitely AI-edited, or AI-generated to a significant degree.
I know they check for some of it, but they definitely don't catch it all.
Deep: I think that's an interesting one, right? This was part of my challenge when I was trying to evaluate O1: I couldn't really use writing tasks, because they're too subjective. It's hard to compare. And part of my problem is I just don't like most other people's writing. I'm really picky with writing -- I don't know how to describe it; I'm just super picky. I was a creative writing minor in college,
so I got obsessed with writing and I have a very specific style. So I hear you: if I were writing something like a thought piece, yeah, I'd use the LLM for outlining and brainstorming, and for having a good conversation about what I want to write about. I might have it generate little pieces -- but my fingers would have been all over it.
Austin: I use it like a continuity guide. Hey, I have a draft -- help me out. How are my transitions? What's working here? Are there any points you think are unfounded? And it'll say: oh, you made this claim and you don't really back it up. Or: there's kind of a jarring transition from this paragraph to that paragraph.
It might have some suggestions; sometimes I'll use them, sometimes I won't. That's where I've found it most successful. For generating content, or even for proofreading and editing, they tend to remove too much of me from the writing.
Deep: That's a really good scenario. What it makes me think of: both my kids went through a pretty rigorous high school writing curriculum, and the thing that was incredibly frustrating to both of them is that they almost never got feedback from their teachers -- almost never. They'd write these huge essays, and the teachers would give them a score and maybe a sentence.
And I was just thinking: why are teachers so against LLMs? It makes no sense. You're not giving these kids any feedback at all, due to whatever constraints teachers are under -- and I get it, they have too many students -- but it seems like a beautiful use case for an LLM, to at least get some feedback.
Austin: I remember doing my thesis, working with my thesis advisor, and then I went into the defense
and just got destroyed. I was like: how could you not have given me this feedback ahead of time? And I guess it makes sense -- I'm writing an 80-page document with research and numbers in it. He's reviewing it, but he's got other stuff on his mind. Then at the defense,
they sit down and their job is to be hypercritical. I would have loved to have had an LLM back then. I dropped it in recently, and the LLM ripped me apart too. I was like: oh my goodness, this could have been so much better. So yeah, I agree -- that's the use case I've found is awesome, for writing at least.
And for code, it actually works well too.
Deep: Yeah, that critique-this-code scenario.
Austin: Right -- how do you feel about this? Is this abstraction at the right level? Could I name this better? I do think there's a future -- I talked about this a little at the top -- a future in which
programming languages are a human-computer interface: they exist to make it easier for humans to write bytecode. It feels really inefficient for a generative AI to write Python so that a human can understand it. At what point are we going to skip Python? An LLM could write bytecode, or some LLM-native language that it can hold in context better, have referential integrity with, better than a human language.
And we're communicating with it exclusively in requirements. That seems like a natural progression of the field.
Deep: I don't know how you feel about this, but having written a lot of code over the years, when I'm talking to another developer I've always felt that English is just really inefficient.
There's usually a moment where I'm like: this is a dumb conversation. Let's just go look at the code.
And to some extent, yes -- the only purpose of code is not execution; it's also so we can talk to each other. So the way you said that is interesting: code is how we communicate with the LLMs about code.
Whereas if we were talking in English, we'd be like: I interpreted what you said as this, and did that -- and sometimes it just gets really dumb. I'm sure you've been in conversations where you're like: English doesn't work for this.
Austin: But then you get to multimodal -- maybe way in the future, maybe it's really close and it'll just show up one day at a product launch. In a multimodal world -- I don't often find myself saying "let's go to the code" when I'm trying to explain something. I say: this conversation is really difficult, let's go to a whiteboard.
And if I could have an LLM take user stories or product requirements, plus a whiteboard conversation, interpret all of that, and then generate or alter software for me on the fly -- why would it ever un-minify the JavaScript, right?
It could just work inside that minified language. It doesn't need to.
Deep: Or maybe it's rendering it just to talk to you.
Austin: Exactly. It doesn't need a human-readable variable name. They always say the hard parts of computing are naming things and such. The LLM doesn't necessarily need to name anything until it has to communicate back to you what the code does --
to itself, it doesn't need that.
Deep: That's an interesting concept. I'm used to thinking of code as a way to communicate with a computer, but I'm not used to thinking of it as a way to communicate with humans regardless of the computer.
It's kind of a weird concept, but I hear what you're saying, and it makes sense -- at some point we'll get there. Right now, I do use LLMs to write code. I'm almost where you are with writing: sometimes it's just not worth my energy to deal with the machine to generate my code.
But I do it, and I find my brain gets into a different loop. I go into prompt, tweak, give-me-something, throw-it-in, execute-it mode, and I find it hard to determine the point at which I need to bail on the thing -- or commit to whatever the last rendering is and just start actually writing code.
That decision point is not always obvious to me. Sometimes I sink too much time babbling with the LLM, and sometimes I jump into the code too early and waste too much time there.
Austin: I grew up pair programming -- I worked at companies that always did pair programming -- and it reminds me of pairing with a mid-to-junior developer. I don't know any of the frameworks they know, but we're supposed to be working in those frameworks. I know how to write software; they know the docs, the APIs, that kind of stuff.
So we're having a conversation, and I'm only pulling what I want from it. They might write some code, but then I say: okay, let's change it to this. It feels very iterative when I work that way, versus asking it to generate a whole application or a whole component.
It's my pair. And it's actually really cool as a CEO, because I don't get to pair all the time -- I'm doing CEO-y things -- so I can sit down and code and actually have someone to bounce code off of. Well, not someone, but something.
Deep: It's interesting that you mention this idea of going back and forth on a small thing, because I feel like we do that in both programming and language. The way I get the LLM to write well is not "write this article" --
it's broken down into a whole chain of specific steps. There are multiple dialogues just to understand the concept, a dialogue to figure out what the rough outline looks like, and then a dialogue to really nail
just this one idea. And usually I'm like: oh, that's actually three ideas. So it's very much a back-and-forth. And I do the same thing with code. If I give it too much -- I can get all that boilerplate crap, but that's not my issue.
My issue is this one little thing. Let's just talk about that.
Austin: It's like that for me too. I grew up pair programming and doing really rigorous test-driven development -- red, green, refactor -- and those two skills seem to translate really well into how I pair with AI. I write a test and we iterate on it: how do we make that test pass? Then we talk about refactoring -- the AI can think of abstractions, or I can -- and then we move on. You're not saying "generate a component." You're saying: we have a user story; let's break it down into these tests; let's start working through them -- red, green, refactor -- going from integration down to unit and then back up.
Deep: Do you find that people are cutting back on the whole pair-programming thing now that they've got the bot as their pair? Or do you see the bot showing up alongside them? How is that playing out?
Austin: My teams have the bot showing up with them.
So it's like a third pair. We pair a lot as a company -- we do remote pairing, we pair in person, or our clients will send their developers into our offices to pair with us. There's still no replacement for that, but they add this copilot as part of the conversation. Where people have said it's awesome is that it has this incredible knowledge of the working frameworks. The worst thing in the world is doing, say, Spring Boot auth. Every app needs it. You implement it once, and the next time you have to do it, it's incredibly painful.
You get it in there, and then you're done with it -- you never think about it again. With an LLM it's like: oh, it's not painful this time. We can just drop it in, and the LLM helps me get it through. It has this deep knowledge of the framework, the context, all that stuff that just makes it easier.
And I've loved that.
Deep: It's interesting -- I haven't committed to using Copilot and the like. I've mostly been cut-pasting chunks of code into prompt windows and going back and forth that way. I've been told I need to commit to an IDE with native AI built in, but I haven't done that yet.
Austin: We use copilots -- though I say "Copilot" because it's the brand; I just mean a copilot generically. Copilot itself is probably our least favorite.
Deep: What's that -- what's your favorite developer tool?
Austin: For Python? Cursor is really great -- the VS Code fork -- and with Sonnet it tends to be really great.
O1 gives really good coding answers, but it's just so slow -- it has to go through that whole chain of thought -- so people don't like it. Sonnet's faster and gives close-enough answers that are just as good. Some of our customers require us to work behind a firewall or in an air-gapped environment for any of their code,
so we can't use Sonnet or any of those hosted models. We'll deploy Llama or DeepSeek Coder, run it on our own infrastructure behind the air gap, and then we use Continue.dev. It's a plugin for IntelliJ and the JetBrains IDEs, and it works for VS Code too. It has the same autocomplete and chat built in.
And you can @-mention files -- you reference a file, say "@ this file and this file," and it shoves those into the context window as it talks. So it can reason about both of them, change both at the same time, generate a diff, and so on.
That's what I've liked.
Deep: I usually like ending on a future note. So if we jump out five or ten years into the future, where do you think all this stuff is heading?
Austin: I have a talk I've given on this a few times, about where I think we're going with LLMs, or AI generally, especially in the developer space -- so this is very specific to developers. All of the best practices we have in software development -- SOLID principles, abstraction layers, programming languages that are nice to use for this reason or that -- exist to make it easier for people to reason about infinitely complex systems.
And I think within a decade we will have an entirely new set of best practices for interfacing with a computer and programming it. SOLID principles will be a thing of the past. Part of me thinks that even five years from now, I'll be writing Python and remembering how great Python was -- but in my job, there will be no language that looks like Python anymore.
Deep: And what do you think it will be replaced with? Some kind of natural language, or --
Austin: I don't know if it's natural language, or a more logic-based language that doesn't have the esoteric syntax and if-statements and all the other stuff that has to exist in a programming language because the language is so unmalleable.
Deep: Something a little more pseudocode-oriented, right?
Austin: Exactly. Or even a no-code editor where you're dragging and dropping things together, and it can generate much more powerful machine code than we can now. All the no-code solutions today are pretty bad, but I think they could get substantially better.
Deep: Here's a related question to that. When I rewind -- I've been doing this for a while -- two or three decades ago there was this whole hoo-ha about visual coding environments: you're going to drag and drop, connect things together, make these little circuit-diagrammy things, and you're not going to have to deal with all this code stuff.
And every time we come up with this concept -- the goal being to let folks with less coding experience interact with this stuff -- the complexity is always still there, and you still have to have a very programmery mindset to deal with it.
Do you think that will be the case in this world?
Austin: I think you'll still have to have engineer brains managing the complexity and the logic trees and so on. They'll have to take what is a human's -- a PM's -- requirements or user stories and turn those into a series of logical decisions.
But dealing with complexity and summarizing it is one of an LLM's greatest abilities. So presenting a simpler context to a future-state programmer is something it could do really well -- presenting only the picture that's important to them, and then having that continually reveal itself as the programmer instructs the system to do more and different things.
I still think logic -- if this, then that -- will have to exist, but it'll be more pseudocode-y. It can be like --
Deep: Like the way very few of us work in machine code or assembly anymore. And actually, relatively few of us even work in C, or manage memory anymore. It feels like we're just going up a layer.
Austin: But it feels like a big jump up. It does, and I think it will be, because it does feel so much more different. The introduction of more memory and more processing power let us do things like build better languages to interact with. But this is not more memory, more processing power --
this is a totally different interface through which we can interact with these things.
Deep: It feels like a really hot PhD topic for somebody, right? Designing a language that works well with LLMs, that's very high-level -- that feels interesting to me. And the other thing we talked a lot about was the whole human-in-the-loop, customer service side of things.
Where do you think that is five to ten years out?
Austin: That one's tougher for me. There's a tweet I really like by an old friend who's at Grok, and he essentially says that call centers are effectively token factories.
Deep: Okay -- what does that mean?
Austin: A call center is producing a lot of tokens. The tokens they produce are: we have a bunch of knowledge, we have a bunch of people, and they're spitting out that knowledge in a relevant way to other people. And we're trying to do that at the lowest cost possible.
We find geographies, lower-cost-of-living areas, to put those factories in. I think AI is a new step in that -- it's part of the token factory. It generates and pushes out tokens the same way a lot of low-skill, thought-oriented labor does.
So it's not going to replace a construction worker building a building. But email-based customer service, I think, will very quickly disappear -- especially that L1, lowest level of support.
Deep: Yeah, I think that's the right way to think about it. We're moving up the chain of human skills that are required. Humans do a lot of things really well that machines are terrible at: we can reason off relatively little information, we can learn off relatively little information. No human competes with an LLM on recall of what it's read, because it read everything -- and we don't do that. And we're also really good empathy machines -- well, not all of us; some of us are the opposite --
and so much of making somebody feel connected to you is in the tone and how we interact. But I don't know -- part of me feels like I can get the LLMs to be more empathetic than most people I know. So I don't know where it's going; sometimes I wonder. Thanks so much for coming on. This was really fun.
Austin: This was so fun. Thank you.