MKZ Forge: Part 2 -- Current State & Pipeline

Sometimes, a humble Python script is all you need.

We’re going to talk about “Markizano’s Video Forge.”

This is a video editing project that started as a hack and evolved into something I use all the time to get my videos posted. For more info about the history, you can check out the video here.

MKZ Forge has several features, including:

  • Video concatenation
  • Silence detection and removal (for those dead-space moments in the video)
  • Subtitle generation
  • Thumbnail image generation
  • Title and Description generation via AI
  • Built-in web interface so you can kick off the process from a browser (mobile or desktop)

In this video, we’ll step thru the source code a bit to understand the differences between what it was as of my last video and what it is today.

As a bonus, I’ll also walk you thru the two system prompts I use to refine my thoughts and produce higher-quality videos: AI helps me write my video copy and structure my ideas, giving you a better experience than my brain bouncing around from subject to subject.

Let’s get into it.

FFmpeg, A History

First things first, starting off with the premise:

I used to make a video on my phone, send it to my laptop, and use ffmpeg to compress it, clip out the parts I didn’t like, and generally workshop my video process before publishing. At the time, I didn't realize this was what I was doing, so it's not like I had some grand plan from the start either.

At some point, I got into a groove and found myself making the same changes to a video frequently. So I started automating that. It was really crude at first, using a combination of custom Python scripts, Makefiles, and a very strange process.

When I had done enough, I decided to take it more seriously and wrote up a proper project for it. At first, it was called “YouTube FFmpeg” or ytffmpeg. I didn’t have a better name for it because I was originally going to use it to process my videos before posting on YouTube.

Boy, was I wrong!

I ended up posting on TikTok way more often!

The Rewrite: v2.0

Anyways, after using this for some time and continuing to build additional functionality into it, I started having issues maintaining it.

I’ll be honest: my first revision was a ton of spaghetti code, as you saw in the previous video!

At some point, I decided to completely rewrite it, almost from scratch, and in the process it got a new name: MKZ Forge! Markizano’s Video Forge!

I also reduced the code down to a toolbox of functions called from the final CLI entrypoint, rather than a bunch of class objects that all depended on one another thru class inheritance.

This made it much, much easier to work with: if something broke in the pipeline, I could load up a Python interpreter and call the methods to get the video processed, while making edits to the code at the same time for the next release.

The result: videos I could run thru the pipeline to produce pristine content within minutes and post to video platforms like TikTok, YouTube, and LinkedIn. If something broke or didn’t turn out quite right, I could easily inspect the logs, make the changes, and still get the video out the door (and the code updated) fairly quickly, without the headache.

You’re afraid of AI taking over jobs?

Just wait until you meet an automation engineer who can automate entire jobs away 😏

By the way, if all of this sounds like majik and you want to know how to cast kewl spells like this, you should subscribe, because I’m going to keep posting technical deep dives like this as I get into projects and solve real-world problems.

The Code

So, with that said, I’m going to jump into the code here and review some of the high-level things that make this project unique and useful. If you think it could work better, drop a comment, or better yet, submit a pull request with what you think will help out!

videos.py

This is probably the most crucial part of the entire project. It contains the meat and potatoes of modifying videos with the world-renowned video editing software ffmpeg. I use a thin Python library that makes interfacing with it slightly easier than the subprocess module. There are also some really simple methods in here to get video information so I can use it elsewhere. If you notice, there’s little to no boilerplate and no useless functionality. This project contains as little “what-if” code as possible because it has a very tightly bounded scope that handles a very specific task: process my videos for me in as few steps as possible.
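To give a flavor of how thin that layer is, here’s a minimal sketch of the pattern. I’m assuming the popular ffmpeg-python package for illustration; the actual wrapper and function names in the repo may differ:

    import ffmpeg  # pip install ffmpeg-python (assumed wrapper; the repo's may differ)

    def get_video_info(path: str) -> dict:
        """Probe a video and return its stream/container metadata (wraps ffprobe)."""
        return ffmpeg.probe(path)

    def compress(src: str, dst: str, crf: int = 28) -> None:
        """Re-encode a video with x264 at the given quality level."""
        ffmpeg.input(src).output(dst, vcodec='libx264', crf=crf).run(overwrite_output=True)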

cli.py

This is a kewl pattern I will continue to use in other projects. It handles the heavy lifting of reading command line arguments and serves as a great entrypoint to the application. All of the packaging revolves around this as the starting point to the code, making it easy to follow semantically.

The arguments and configuration extracted here are then passed as a simple dictionary to all of the command entrypoints called from this function.
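A minimal sketch of that entrypoint pattern (the subcommand and function names are hypothetical, apart from serve, which you’ll see below):

    import argparse

    def main() -> int:
        parser = argparse.ArgumentParser(prog='mkzforge')
        sub = parser.add_subparsers(dest='command', required=True)
        compile_p = sub.add_parser('compile', help='run a video thru the pipeline')
        compile_p.add_argument('input')
        sub.add_parser('serve', help='start the built-in web server')

        config = vars(parser.parse_args())  # one plain dict, handed to every entrypoint
        if config['command'] == 'compile':
            return do_compile(config)       # hypothetical entrypoint
        return do_serve(config)             # hypothetical entrypoint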

subtitles.py

Code here revolves specifically around generating and manipulating subtitles. Until I get OpenAI’s Whisper trained up on my custom dictionary, I have a series of regular expressions that fix a few oddities around things like “Markizano Draconus”, “Tanninovian”, and other words that aren’t in the English dictionary but that I say all the time because they are part of the lore of the story.
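That fix-up pass amounts to something like the sketch below; the mis-hearings in this table are made up for illustration, and the real substitution list lives in subtitles.py:

    import re

    # Whisper mangles lore words it has never seen; map common mis-hearings
    # back to the canonical spelling. (These patterns are hypothetical.)
    FIXES = {
        r'mark\s*is\s*ano': 'Markizano',
        r'tan+inovian': 'Tanninovian',
    }

    def fix_subtitles(text: str) -> str:
        for pattern, replacement in FIXES.items():
            text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
        return text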

By the way, if you’re interested in learning more about the 12-book sci-fi/fantasy series I’m writing, check out story.markizano.net; link in description.

genimg.py

This is where I abstract away some of Google’s libraries for image generation. I tried my best to keep it pinned to OUI (OpenWebUI), but OUI just wasn’t there at the time of writing, so I went with Google’s image generation API. This works great since I already have a Google Workspace account with billing integrated. So yes, each video thumbnail costs me a few cents to generate (sometimes a few more if I don’t like it and need to re-generate it). For me, it works well!
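As a rough sketch, a thumbnail call against Google’s API looks something like this. I’m assuming the google-genai SDK and an Imagen model name here; the actual client and model in the repo may differ:

    from google import genai
    from google.genai import types

    def generate_thumbnail(prompt: str, out_path: str) -> None:
        client = genai.Client()  # reads GOOGLE_API_KEY from the environment
        response = client.models.generate_images(
            model='imagen-3.0-generate-002',  # assumed model name
            prompt=prompt,
            config=types.GenerateImagesConfig(number_of_images=1),
        )
        with open(out_path, 'wb') as fh:
            fh.write(response.generated_images[0].image.image_bytes)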

grive.py

I will admit, this project is mostly hand-coded. However, this is one of the few pieces I delegated to AI. I didn’t want to read the Google docs in that much detail; I understood them well enough at a high level to just ask for this module to be written, and it did a somewhat decent job. This module handles the rough edges of interfacing with Google Drive, since I had direct-upload issues when testing browser-to-Python integration. This way, I can upload from my phone to my Google Drive, then when I kick off the process from any browser, it downloads from my G-Drive to the local machine to process the video.

For this, I will admit, I debated using a locally mounted FUSE drive via rclone vs. native Python libraries to talk to the Google APIs. I decided to integrate directly to reduce the number of non-Python dependencies required to get this up and running.
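The download half of that flow, sketched with the official google-api-python-client (auth setup elided; the function name is illustrative):

    import io
    from googleapiclient.discovery import build
    from googleapiclient.http import MediaIoBaseDownload

    def download_from_drive(creds, file_id: str, dest: str) -> None:
        service = build('drive', 'v3', credentials=creds)
        request = service.files().get_media(fileId=file_id)
        with io.FileIO(dest, 'wb') as fh:
            downloader = MediaIoBaseDownload(fh, request)
            done = False
            while not done:
                _status, done = downloader.next_chunk()  # stream the file in chunks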

metadata.py

This is fairly straightforward: Manage video metadata.

One of the things about this module is that it connects to a configured LLM endpoint and sends the generated subtitles (in TXT format) to the LLM with instructions to generate the title or description. It’s not perfect; sometimes it’ll generate a multi-paragraph summary of the video for a title when I explicitly ask for 2-5 words; tho for the most part it works great!
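The title call boils down to something like this sketch, assuming an OpenAI-compatible chat endpoint; the endpoint shape, model name, and prompt here are illustrative:

    import requests

    def generate_title(subtitles_txt: str, endpoint: str, api_key: str) -> str:
        resp = requests.post(
            f'{endpoint}/v1/chat/completions',
            headers={'Authorization': f'Bearer {api_key}'},
            json={
                'model': 'gpt-4o-mini',  # assumed: whatever model the endpoint serves
                'messages': [
                    {'role': 'system', 'content': 'Generate a 2-5 word title for this video.'},
                    {'role': 'user', 'content': subtitles_txt},
                ],
            },
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()['choices'][0]['message']['content'].strip()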

utils.py

Generic utilities go here. If I used something repeatedly in more than one module, I put that code here.

webserv.py

This is where the baked-in server lives and how you can run mkzforge serve to get a locally running web server instance. I use CherryPy because I was most familiar with it, but I’m open to Bottle, Flask, or whatever the hottest Pythonic web server is these days. The core flow: the upload handler blocks until the file is fully received, then the rest of the video processing happens asynchronously in a subprocess. I chose a subprocess over async/await because:

  1. I hate how the normies screwed up Python by bringing JavaScript nonsense into it.
  2. I don’t understand async as well as I should and probably never will.
  3. I wanted a process, not a thread. Python isn’t the greatest at threads anyway; the GIL keeps them pinned to effectively one core at a time, while a separate process can use multiple cores.
  4. I wanted the video currently in the queue to keep processing even if the server was stopped.

It’s a fire-and-forget type of setup. An AWS SNS message is sent upon completion (error or success) so I know when a video is ready for me to download or investigate.
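Sketched out, that fire-and-forget pattern looks roughly like this with CherryPy (the handler and subcommand names are illustrative):

    import subprocess
    import cherrypy

    class Forge:
        @cherrypy.expose
        def upload(self, video):
            # Block until the whole file is received...
            with open(video.filename, 'wb') as fh:
                while chunk := video.file.read(65536):
                    fh.write(chunk)
            # ...then detach the heavy lifting into its own OS process. It can
            # use multiple cores and keeps running even if the server stops.
            subprocess.Popen(['mkzforge', 'compile', video.filename])
            return 'queued'

    if __name__ == '__main__':
        cherrypy.quickstart(Forge())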

notify.py

Handles the rough edges of authenticating to AWS and sending the SNS notification.

Open to other notification methods here as well!
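With boto3, the whole thing is barely more than a publish call (the topic ARN and function name here are illustrative):

    import boto3

    def notify(topic_arn: str, subject: str, message: str) -> None:
        """Send a completion (or failure) notice via AWS SNS."""
        sns = boto3.client('sns')  # credentials come from the usual AWS env/config chain
        sns.publish(TopicArn=topic_arn, Subject=subject, Message=message)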

types.py & const.py

These are just ancillary Pythonisms I use: store types and environment constants in their own modules so I can reference them from wherever I am in the code.
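For example, const.py amounts to resolving environment-driven values once, in one place (the names here are illustrative):

    import os

    # Resolve once at import time; import these from anywhere in the code.
    CACHE_DIR = os.environ.get('MKZFORGE_CACHE', '/tmp/mkzforge')
    LLM_ENDPOINT = os.environ.get('MKZFORGE_LLM_ENDPOINT', 'http://localhost:8080')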


So that's a summary of the code itself. There's really not much to it: only ~2700 lines of code, more or less. The stuff under the cli/ folder combines these interfaces in a way that makes it possible to call them from the command line.

What follows next is stuff that I created outside of the code. I still had to create videos. So my pipeline includes steps that go from idea to recording. Perhaps at some point, I'll bake these into the web interface so video copy can be generated all in one solution.

For now, these are system prompts I created to put into OpenWebUI to help organize and package my thoughts before producing a video. Sometimes I read them, sometimes I riff from them.

The Muse

I was also facing an internal circumstance: The inner critic!

I have so much content sitting in draft: partially recorded thoughts, musings, things I think about when I don't have a shirt on, and none of it gets posted. My inner critic is real, so I personified him.

The Muse is a system prompt I generated and put in a folder in OUI. The result: I get questions instead of criticism. My thoughts get battle-tested before proceeding to the next step. Instead of videos dying in draft, they get thought thru into a more complete and coherent structure.

The Muse outputs a structure I take to The Packager.

The Packager

This is a second system prompt I generated to help me stay on track. It "packages" my video into something I can actually record. I spend less time thinking about my next words and just say them.

It also structures my thoughts in a way that lets you follow me. Instead of me jumping around inside some half-baked idea, I now have a refined idea packaged in a way that gets more views than I ever got on my own.

Because of this pipeline, I get more content produced more regularly than ever before.

Before having these two system prompts, I would record, draft, judge, delete.

Now, I write, refine, package and produce.

Result

The result is incredible… I get both CLI and Pythonic interfaces that are simple and easy to use for manipulating videos.

There was a time when I was working on the video concat piece and found myself writing up the commands in the interpreter nearly from memory, instead of leaning on my IDE auto-complete or asking AI to help me do it. It was very empowering!

The Takeaway

What I want you to walk away from this video with is the sense that you can write your own stuff. Code is not that complicated. As you can see, this library is just under 3k lines of code and handles 90% of the use cases I have with regard to video processing.

You don’t need a fleet of AI agents. You don’t require millions upon billions of tokens. Sometimes, just a humble Python script is enough.

If you found this helpful, don't forget to like on your way out and I'll see you on the next one.
