Using Mux and GPT-4 for AI-Driven Video Uploads and Title Generation

A video file is selected and uploaded directly to Mux rather than S3. During the upload, the Inngest development server receives events as the asset is created. The asset's details can be inspected, including its video text track, which is the transcript. That transcript is sent to the GPT-4 writer, which begins generating a title and a descriptive summary.
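As a rough illustration of the wiring (not the project's actual code), the sketch below shows an Inngest function that listens for a hypothetical "mux/asset.ready" event, reads the transcript off the event payload, and runs a writer step. The event name, payload shape, and the writeTitleAndSummary helper are assumptions.

```typescript
// Hedged sketch: an Inngest function that reacts to a (hypothetical) Mux
// asset-ready event and hands the transcript to a GPT-4 "writer" step.
import { Inngest } from "inngest";

// Placeholder for the GPT-4 write/edit chain sketched in a later example.
declare function writeTitleAndSummary(transcript: string): Promise<string>;

const inngest = new Inngest({ id: "video-uploads" });

export const titleWriter = inngest.createFunction(
  { id: "gpt-4-title-writer" },
  { event: "mux/asset.ready" }, // hypothetical event name
  async ({ event, step }) => {
    // Assume the transcript (the video's text track) arrives on the event payload.
    const transcript: string = event.data.transcript;

    // Each step shows up individually in the Inngest dev server, which is
    // what makes the per-step output visible in the walkthrough below.
    const draft = await step.run("write-first-draft", () =>
      writeTitleAndSummary(transcript)
    );

    return draft;
  }
);
```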

The generation runs as a chain: a first draft is written, an editor step provides feedback, and that feedback is then applied to produce a revised draft. Every prompt used in the writing request appears in the Inngest dev server's output, which is useful for hands-on evaluation, and the prompts could also be stored for later review.
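A minimal sketch of that write → edit → revise chain, assuming the OpenAI Node SDK is called directly; the prompt text and step boundaries here are illustrative, not the prompts the project actually uses.

```typescript
// Hedged sketch of the writer → editor → revision chain.
import OpenAI from "openai";

const openai = new OpenAI();

async function chat(messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[]) {
  const res = await openai.chat.completions.create({ model: "gpt-4", messages });
  return res.choices[0].message.content ?? "";
}

export async function writeTitleAndSummary(transcript: string) {
  const writerSystem =
    "You write a title and a short descriptive summary for a video transcript.";

  // Step 1: first draft from the writer persona.
  const draft = await chat([
    { role: "system", content: writerSystem },
    { role: "user", content: transcript },
  ]);

  // Step 2: a separate editor persona critiques the draft.
  const feedback = await chat([
    { role: "system", content: "You are an editor. Give concrete feedback on this draft." },
    { role: "user", content: draft },
  ]);

  // Step 3: the writer applies the feedback to produce the revised copy.
  const final = await chat([
    { role: "system", content: writerSystem },
    { role: "assistant", content: draft },
    { role: "user", content: `Revise the draft using this feedback:\n${feedback}` },
  ]);

  return final;
}
```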

In the source Inngest writer code, the instructions can be overridden, and a defined set of steps tells the writer what to follow when creating the first draft. The editor prompt supplies an 'opinion', and the past prompts, such as the original system prompt and the AI's response to it, are passed along as history.
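Something like the following could assemble that history for the editor pass. The writerConfig shape, the prompt strings, and the message roles are assumptions for illustration, not the project's actual prompts.

```typescript
// Hedged sketch: build the message history (writer system prompt, the AI's
// first draft, then the editor's "opinion") that gets sent on the edit pass.
import type OpenAI from "openai";

type Msg = OpenAI.Chat.Completions.ChatCompletionMessageParam;

// Instructions the writer follows for the first draft; callers can override them.
const writerConfig = {
  systemPrompt: "You are a technical writer producing a video title and summary.",
  editorPrompt: "You are an opinionated editor. Say what to cut and what to sharpen.",
};

export function buildEditorHistory(
  firstDraft: string,
  overrides?: Partial<typeof writerConfig>
): Msg[] {
  const config = { ...writerConfig, ...overrides };
  return [
    { role: "system", content: config.systemPrompt }, // original writer system prompt
    { role: "assistant", content: firstDraft },       // the AI's response to it
    { role: "user", content: config.editorPrompt },   // the editor's "opinion"
  ];
}
```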

The chain can include additional edit and check steps, such as picking the best title from the earlier drafts or formatting the output as JSON and validating it. If the process needs to change, for example to write for a different audience, the prompts can be edited and the run replayed at either the Mux asset level or the GPT-4 writer level.
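One plausible shape for the "format as JSON and check it" step, sketched with zod for validation; the schema, model call, and prompt are assumptions rather than the project's code.

```typescript
// Hedged sketch: ask the model for JSON, then validate it before storing,
// so a malformed response fails the step visibly instead of saving bad data.
import OpenAI from "openai";
import { z } from "zod";

const TitleSchema = z.object({
  title: z.string().min(1),
  summary: z.string().min(1),
});

export async function formatAsJson(finalDraft: string) {
  const openai = new OpenAI();
  const res = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      { role: "system", content: "Return only valid JSON with keys `title` and `summary`." },
      { role: "user", content: finalDraft },
    ],
  });

  // Throws if the model returned malformed or mis-shaped JSON.
  return TitleSchema.parse(JSON.parse(res.choices[0].message.content ?? ""));
}
```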

The next step is to embed every title ever produced on Egghead and search those embeddings by topic to find examples. Well-liked titles can then be fed back to the writer as examples, improving the titles it generates.
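A hedged sketch of that embeddings idea: embed the back catalog of titles, embed a topic query, and return the nearest titles as few-shot examples for the writer. The model name and in-memory cosine search are assumptions; a real setup would more likely store the vectors in a database.

```typescript
// Hedged sketch: find past titles closest to a topic to use as examples.
import OpenAI from "openai";

const openai = new OpenAI();

const embed = async (input: string[]) => {
  const res = await openai.embeddings.create({ model: "text-embedding-3-small", input });
  return res.data.map((d) => d.embedding);
};

// Cosine similarity between two vectors of equal length.
const cosine = (a: number[], b: number[]) => {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
};

export async function exampleTitlesForTopic(topic: string, pastTitles: string[], k = 5) {
  const [queryVector, ...titleVectors] = await embed([topic, ...pastTitles]);
  return pastTitles
    .map((title, i) => ({ title, score: cosine(queryVector, titleVectors[i]) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((t) => t.title);
}
```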

Transcript

[00:01] Alright, so on this side I have the playground. This is shadcn/ui. I have this stuff, but I'm actually not going to use that. I'm going to use this file input.

[00:12] And what I'm going to do here is pick a video, and this one is of reasonable size, and I'm going to upload that. It's actually uploading to Mux, not to S3. If I wanted to see progress, I can only see that here. But in the background, over here in the Inngest dev server, we're actually starting to get some events as this upload gets done. So we see over here that we have an asset created, we come in there and check that out and see all the details about that. Then you get the video asset track, so that's actually the transcript, and then it sends to the GPT-4 writer, which you can see actually starts kicking in over here because I have PartyKit going. And that's running through a chain where we write the first one.

[01:03] And its goal is to write the title and a descriptive summary. It goes to an editor that provides feedback, so this is a different step. Then that edit is applied, and you can see it does the things that I've asked it to in the background also. Over here on the Inngest dev server, the output of this AI writing request is all of the prompts that actually got used. So that's really nice if you're doing kind of hands-on evaluation.

[01:32] You could actually store those so you could fully review them, which is pretty smart if you're doing this kind of stuff. You also end up with each of the steps. So here's the step that actually gives us our transcript. So if I go over here, you can see the transcript here. That looks great.

[01:55] And one of the cool things that I can do if I feel like it is come back over here, and this is in the code, so I'm in the source Inngest writer here, and we have some instructions that come in that we can override, and then here's the stack that we're telling the writer to follow. It doesn't always listen to this, but what's cool to me is, you know, you do this, you send that to the writer for the first draft, and then we have a second prompt that's the editor, and the editor has an opinion, and it comes in here, and we're able to give it history. So we have all of the past prompts: our original system prompt, the primary system writer prompt, the AI response to that, which is another message, and then the editor prompt. And then it comes down here, and that's the final.

[02:51] I think this one should be sent to the writer for… And one thing I have done in the past is actually have another editor step, right? So you send it to the writer, you send it to the editor, you send it to the writer, and then maybe you have it say, hey, look at all the past ones, choose the best one, and give us the title. And then you can also do a step where it's like, hey, now format this in JSON and check it, make sure it's valid and all that. And this is where those prompts come back.

[03:21] What's really cool to me is if I come in here and I'm like, I want to make some changes: instead of writing for a technical audience, you're writing for a clown college, I don't know. I'll come back over here, and what I can do then, at the Mux asset level or the GPT-4 writer level, is replay that. And as soon as I hit replay, you'll see, this is PartyKit again, so it's sending over a socket so I can get these updates in the UI. But in the background, we have a new Inngest function that's executing.

[04:01] So you can go through here. I don't know if it'll actually get into the clown stuff; probably not. Anyway, so that's done. And you can make changes and go back and forth, and you could add steps or remove steps or even do things like load data in. So my next step here is to actually take embeddings: I want to embed every single title that we've ever produced on Egghead and use those to search through topically, like, hey, give me some topics, search those embeddings, and use those titles as example input that the writer can then use.

[04:41] So then I'll have some examples of titles that we like and that sort of thing. Should be pretty cool.