Nailing down the metrics on my AI Agent workflow (AI changing work)

Build in Public · Feb 02 · Episode 37
All right, this is my Build in Public log. It's January 27th, 2025. Quick update for today; this will be a short and sweet one. Yesterday I took almost the entire day off. I just needed a day to rest and recover. I did work a little bit, one little experiment, but nothing serious.

I've been making good progress on this last little bit of my AI agent: the part that collects the metrics. First of all, what metrics do you want to collect? There are the really obvious ones: how many tokens you're using on the input and output, whether the run was successful, whether it fell back and had to use normal code to try to figure it out, and whether it failed. You want to be able to look and see, "Oh, this agent is failing 50% of the time; it's useless."

I think I have a good pattern for this. I spent some time thinking about how you, as a user, would want to interface with it. Let's say you're going to drop this agent into your project. How would you want to record, for example, errors? Where do all the errors live? In a database? Do you want to ignore them? Do you want to track them in Sentry? What do you want to update? The same goes for all the metrics.
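The obvious metrics listed above could be captured per run with something like the following minimal sketch. It records input and output tokens and whether a run succeeded, fell back to plain code, or failed. `RunMetrics`, `Outcome`, `failure_rate`, and `total_tokens` are hypothetical names, not the framework's actual API. The sketch also assumes that every run ends in exactly one of those three outcomes.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    # Hypothetical outcome categories; a real framework may track more.
    SUCCESS = "success"    # the model produced a usable answer
    FALLBACK = "fallback"  # the agent gave up on the model and used plain code
    FAILURE = "failure"    # neither path produced a result


@dataclass(frozen=True)
class RunMetrics:
    input_tokens: int
    output_tokens: int
    outcome: Outcome


def failure_rate(runs: list[RunMetrics]) -> float:
    """Fraction of runs that failed outright (0.0 when there are no runs)."""
    if not runs:
        return 0.0
    failures = sum(1 for r in runs if r.outcome is Outcome.FAILURE)
    return failures / len(runs)


def total_tokens(runs: list[RunMetrics]) -> tuple[int, int]:
    """Summed (input, output) token counts across runs."""
    return (sum(r.input_tokens for r in runs), sum(r.output_tokens for r in runs))


if __name__ == "__main__":
    # Made-up sample runs, purely for illustration.
    runs = [
        RunMetrics(120, 40, Outcome.SUCCESS),
        RunMetrics(200, 0, Outcome.FAILURE),
        RunMetrics(150, 10, Outcome.FALLBACK),
        RunMetrics(180, 0, Outcome.FAILURE),
    ]
    print(failure_rate(runs))  # 0.5
    print(total_tokens(runs))  # (650, 50)
```

A dashboard or alert could compare `failure_rate` against a threshold such as the 50% mentioned above. That kind of check is how you would spot an agent that is "useless" in its current form.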
I'm building it in a way that puts that in your hands: you write the method that's actually responsible for doing something with the data. It will log it if you have the logger turned on, but beyond that it's up to you. I think that's a good middle ground, because what else can I do? I don't know what the user is using or wants to use. A database? Maybe you want to store it in Redis, maybe you want to log to Grafana. There are a million different ways to do it. So part of my philosophy when building a library, or for example an AI agent framework, is to make it as extendable as possible for the end user.
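One reading of "you write the method that's actually responsible" is a set of overridable hook methods. The default hooks only log, and only when a logger was supplied. Here is a minimal sketch under that assumption. `Agent`, `run`, `on_error`, `on_success`, and `MyAgent` are hypothetical names, and `call_model` stands in for a real LLM call.

```python
import logging
from typing import Callable, Optional


class Agent:
    """Hypothetical agent with overridable hooks (not the framework's real API)."""

    def __init__(self, call_model: Callable[[str], str],
                 logger: Optional[logging.Logger] = None) -> None:
        self.call_model = call_model  # stand-in for a real LLM call
        self.logger = logger

    def run(self, prompt: str) -> Optional[str]:
        try:
            result = self.call_model(prompt)
        except Exception as exc:  # hand the failure to the hook, report no result
            self.on_error(exc)
            return None
        self.on_success(prompt, result)
        return result

    # Hooks: the defaults only log, and only if a logger was provided.
    def on_error(self, error: Exception) -> None:
        if self.logger is not None:
            self.logger.error("agent run failed: %r", error)

    def on_success(self, prompt: str, result: str) -> None:
        if self.logger is not None:
            self.logger.info("agent run succeeded (%d chars)", len(result))


class MyAgent(Agent):
    """A user's override: you decide where errors go."""

    def __init__(self, call_model: Callable[[str], str]) -> None:
        super().__init__(call_model)
        self.errors: list[str] = []

    def on_error(self, error: Exception) -> None:
        # Swap this for your own storage: a database table, Redis, Sentry, ...
        self.errors.append(str(error))


def broken_model(prompt: str) -> str:
    raise RuntimeError("rate limited")


agent = MyAgent(broken_model)
print(agent.run("hello"))  # None
print(agent.errors)        # ['rate limited']
```

The framework only decides *when* a hook fires. Where the data ends up is left entirely to the subclass, which matches the middle ground described above.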
Whether you override a method or listen for an event to fire, there are so many ways to do this, but I believe leaving it in the developer's hands is the best way: just make it as sane, simple, and easy to override, hook into, and interface with as possible. That's what I'm trying to do.

I've made good progress. Right now I'm working on testing and recording, and by testing I mean actual automated tests, so unit tests. I've got more work to do after this video, but I'm happy with my approach so far, so we'll see. Right now it's about getting this whole thing done and working, then dropping it into my projects, using it, iterating, and maybe making changes.
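The other hook style mentioned above is listening for an event to fire. A minimal event emitter sketch for it follows. `EventHooks`, the event names `"run_complete"` and `"error"`, and their payload fields are all hypothetical. They illustrate the pattern, not the framework's actual interface.

```python
from collections import defaultdict
from typing import Any, Callable


class EventHooks:
    """Hypothetical minimal event emitter: listeners subscribe by event name."""

    def __init__(self) -> None:
        self._listeners: dict[str, list[Callable[..., Any]]] = defaultdict(list)

    def on(self, event: str, listener: Callable[..., Any]) -> None:
        self._listeners[event].append(listener)

    def emit(self, event: str, **payload: Any) -> None:
        # Call listeners in registration order; events nobody listens to are a no-op.
        for listener in self._listeners.get(event, []):
            listener(**payload)


hooks = EventHooks()
seen = []
hooks.on("run_complete",
         lambda input_tokens, output_tokens, outcome: seen.append(outcome))
hooks.on("error", lambda error: print(f"send to your error tracker: {error}"))

# The agent side would emit events like these after each run.
hooks.emit("run_complete", input_tokens=120, output_tokens=40, outcome="success")
hooks.emit("error", error="timeout")
print(seen)  # ['success']
```

Overriding a method gives one subclass full control. Listeners, by contrast, let several independent consumers (a logger, a metrics store, an error tracker) react to the same event without subclassing the agent.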
For now I'm working on this last piece. I shared a diagram of it in a previous video, so check that out if you're curious about the whole flow of the agent and the workflow.

Really quickly, my random story of the day. I was remembering the first time I saw ChatGPT drop. Funnily enough, I was on top of this before the people at the day job I had at the time. I saw some demos on Twitter and shared a bunch, and I remember being in a meeting with four or five developers, no managers, so it was a really casual conversation. I remember saying, "Look at this demo, you guys. As of today, you have to use AI to code or you'll just be left behind, because everybody else will be using it and you'll be really slow and really inefficient. From this day forward, you're going to have to learn how to interface with AI every day to do your job." A couple of people on the call kind of rolled their eyes, like, "Okay, okay, Steve, whatever. Yeah, right. I don't believe you."

And yeah, I was dead right. It's gotten more and more intense, what I call interfacing with AI. At first, like a lot of people, I went to ChatGPT, copied and pasted some code, tried and tried, and thought, "This is stupid. It's faster to just think through the problem and write the code the way we always have. Trying to coax an answer out of AI and copy-pasting all this crap is so much slower." So at first I wasn't sold on it. I'd only go to AI if I was stuck on something, and I kept coding as normal for the longest time.

But now there's Cursor, Copilot, and agents, especially the agents built into Cursor. Cursor is killing it, by the way. If you code, and especially if you're coding AI stuff, you've got to get on Cursor; the hype is real. At first even I thought the Cursor hype was nonsense, because when I tried it out it was a little step above copying and pasting. Then they added the apply feature: okay, cool. But when the agent got released, I was like, "Okay, holy... this is something different. This is pretty magical."

So take my iteration loop. Let's say I want to test recording errors to this thing I pass into the class. Now I write some comments and a little bit of code, then tell the agent, and it changes the code and the unit tests. Then it says, "Run the unit tests." It can't run anything on its own; it lets me click the button, I click go, it runs the tests, sees the output, goes, "Oh, yeah," and fixes it. I press go again, it runs the tests, changes the code, and around the loop we go. The time it takes to iterate with the AI has shrunk so much in just, what, a year or two? It's crazy. It's only increasing, and it's only going to get better.
you know this is the future unfortunately I know this is the future unfortunately I know this is the future unfortunately I sometimes hate it sometimes they just sometimes hate it sometimes they just sometimes hate it sometimes they just turn the ai ai off you know I’ve been I turn the ai ai off you know I’ve been I turn the ai ai off you know I’ve been I I’ve been coding professionally like I’ve been coding professionally like I’ve been coding professionally like getting paid to code for 18 years yeah getting paid to code for 18 years yeah getting paid to code for 18 years yeah I’ve had a lot of breaks and sabatical I’ve had a lot of breaks and sabatical I’ve had a lot of breaks and sabatical and and and but I I just you know I want to just but I I just you know I want to just but I I just you know I want to just code freely you know and uh I I I know code freely you know and uh I I I know code freely you know and uh I I I know software engineers and they’re like we software engineers and they’re like we software engineers and they’re like we can’t do that at work we have to use the can’t do that at work we have to use the can’t do that at work we have to use the AI we have to go really fast so pretty AI we have to go really fast so pretty AI we have to go really fast so pretty crazy stuff and I I’ll never forget that crazy stuff and I I’ll never forget that crazy stuff and I I’ll never forget that day when I told them on the call I was day when I told them on the call I was day when I told them on the call I was like this it’s it our job is different like this it’s it our job is different like this it’s it our job is different after today onwards and yeah sure I after today onwards and yeah sure I after today onwards and yeah sure I was I was right I mean it seemed pretty was I was right I mean it seemed pretty was I was right I mean it seemed pretty obvious to me why some of them were obvious to me why some of them were obvious to me why some of them were 
obviously disagreeing with me some obviously disagreeing with me some obviously disagreeing with me some people were even hostile to when I when people were even hostile to when I when people were even hostile to when I when they made that comment I made that they made that comment I made that they made that comment I made that comment outside of the call like more comment outside of the call like more comment outside of the call like more generally and some people were pretty generally and some people were pretty generally and some people were pretty hostile I just rolled my eyes at them I hostile I just rolled my eyes at them I hostile I just rolled my eyes at them I was like yeah right you’re you’re was like yeah right you’re you’re was like yeah right you’re you’re delusional if you don’t see what what’s delusional if you don’t see what what’s delusional if you don’t see what what’s coming ahead like how do you how do you coming ahead like how do you how do you coming ahead like how do you how do you not see like what’s GNA happen here in not see like what’s GNA happen here in not see like what’s GNA happen here in one year two year and we’re almost two one year two year and we’re almost two one year two year and we’re almost two years out now yeah the world looks a lot years out now yeah the world looks a lot years out now yeah the world looks a lot different now so this is all I do is AI different now so this is all I do is AI different now so this is all I do is AI stuff now so let’s hope I can get paid stuff now so let’s hope I can get paid stuff now so let’s hope I can get paid for it pretty soon that’s all I got for for it pretty soon that’s all I got for for it pretty soon that’s all I got for today uh see you tomorrow

Description

Today’s struggles:

  • Self-hosting configuration issues
  • Troubleshooting trigger.dev
  • Infrastructure challenges

Summary

Summary of the Video: Build in Public log. The speaker provides an update on his AI agent framework, specifically the part that collects metrics such as input and output token usage, successes, fallbacks to plain code, and failures. He explains its extensible design, in which developers override methods or listen for events to send errors and metrics wherever they choose (for example Sentry, Redis, or Grafana), and describes his unit-testing approach. He also reflects on the evolution of AI coding tools from ChatGPT to Cursor's agent and the growing necessity of AI in development.

AI with Steve · Build in Public
