Building a Video Engine from Scratch with FFmpeg and Lua
How I built FURIOUS, a custom video editor and effects engine for music-synchronized content, using C++, FFmpeg, and a Lua scripting pipeline.
- C++
- FFmpeg
- Lua
- Video Processing
By Jenn Barosa

FURIOUS started because I was making YTPMVs and otomads and kept fighting my editor. The workflow for this kind of content is really specific: you're placing hundreds of short clips on a timeline, each one snapped to a musical beat. Doing that by hand in Premiere is painful. So I wrote my own thing.
The Pattern System
I took the core idea from FL Studio. In FL, you don't place every note on a giant timeline. You build patterns, then arrange the patterns. FURIOUS works the same way but with video clips. You define a pattern, map clips to beat positions within it, and then arrange patterns in a sequence.
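The data model for this can be sketched roughly like so. These type and field names are my own illustration, not FURIOUS's actual internals:

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of the pattern model. A ClipEvent places a source clip
// at a beat offset inside a pattern; a Pattern is a reusable bundle of events.
struct ClipEvent {
    std::string source;   // path to the source file
    double beat;          // beat offset within the pattern
    double lengthBeats;   // clip duration, in beats
};

struct Pattern {
    double lengthBeats;
    std::vector<ClipEvent> events;
};

// The arrangement places pattern instances at absolute beat positions,
// FL Studio style.
struct PatternInstance {
    const Pattern* pattern;
    double startBeat;
};

// Flatten an arrangement into absolute-beat clip events for the renderer.
std::vector<ClipEvent> flatten(const std::vector<PatternInstance>& arrangement) {
    std::vector<ClipEvent> out;
    for (const auto& inst : arrangement)
        for (ClipEvent ev : inst.pattern->events) {
            ev.beat += inst.startBeat;  // pattern-local beat -> absolute beat
            out.push_back(ev);
        }
    return out;
}
```

The payoff is that a pattern edited once updates everywhere it's placed, exactly like editing an FL Studio pattern.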
Internally everything is stored as beat offsets, not frame numbers. The engine converts beats to frames at render time using the project BPM. This means you can change the tempo of a project and all the clips shift correctly. In a normal editor you'd have to manually move everything.
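The beat-to-frame conversion itself is just arithmetic. A minimal sketch (the real function signature in FURIOUS may differ):

```cpp
#include <cmath>

// Convert a beat position to a frame index at render time.
// beat * (60 / bpm) gives seconds; multiply by fps and round to the
// nearest frame.
long long beatToFrame(double beat, double bpm, double fps) {
    double seconds = beat * 60.0 / bpm;
    return std::llround(seconds * fps);
}
```

At 150 BPM and 30 fps, one beat is 0.4 seconds, so `beatToFrame(1.0, 150.0, 30.0)` lands on frame 12. Change the BPM and every clip's frame position moves with it, which is the whole point of storing beats instead of frames.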
FFmpeg for Decoding
I used FFmpeg's libraries (libavcodec, libavformat, libswscale) for all the actual video decoding. Writing a decoder from scratch would have been a waste of time when FFmpeg already handles every format.
The engine keeps a pool of decoder contexts, one per source file, with a frame cache on top. The annoying part was seeking. FFmpeg seeks to the nearest keyframe by default, which isn't good enough when you need frame-accurate positioning for beat-synced content. I ended up doing a two-pass seek: jump to the keyframe before your target, then decode forward frame by frame until you land on the right one. It's slower but correct.
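The two-pass logic can be shown without a real decoder context. Here it's simulated over a sorted list of keyframe indices: pass one finds the last keyframe at or before the target (which is what FFmpeg's `av_seek_frame` with `AVSEEK_FLAG_BACKWARD` gives you), pass two is the number of frames you then decode forward to land exactly on target. The `SeekPlan` type is illustrative, not FURIOUS's actual code:

```cpp
#include <algorithm>
#include <vector>

struct SeekPlan {
    long long keyframe;       // frame index to seek the demuxer to
    long long decodeForward;  // frames to decode past the keyframe
};

// keyframes must be sorted ascending; target is the exact frame we need.
SeekPlan planSeek(const std::vector<long long>& keyframes, long long target) {
    // Pass 1: last keyframe <= target.
    auto it = std::upper_bound(keyframes.begin(), keyframes.end(), target);
    long long kf = (it == keyframes.begin()) ? 0 : *(it - 1);
    // Pass 2: decode forward frame by frame until we hit the target.
    return {kf, target - kf};
}
```

With keyframes every 120 frames, seeking to frame 150 means jumping to keyframe 120 and decoding 30 frames forward, so the worst-case cost is one GOP of decoding per seek.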
Lua for Effects
All visual effects in FURIOUS are Lua scripts that get loaded at runtime. An effect script receives a frame buffer, the current time in beats, and a parameter table, then writes back to the buffer. You can stack effects and each one gets the previous one's output.
I went with Lua because I wanted people to be able to write and share effects without needing a C++ toolchain. The actual pixel-heavy work still happens in C++ helper functions that are exposed to the Lua VM, so performance is fine.
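The chaining itself is simple to sketch. Here it is in C++ (in FURIOUS the effect bodies are Lua scripts, but the composition works the same way); the `Effect` signature is my guess at the shape, not the engine's real API:

```cpp
#include <cstdint>
#include <functional>
#include <vector>

using Frame = std::vector<uint8_t>;  // RGBA bytes
// An effect mutates the frame buffer in place, given the current beat time.
using Effect = std::function<void(Frame&, double beat)>;

// Apply a stack of effects in order; each one sees the previous one's output.
void applyChain(const std::vector<Effect>& chain, Frame& frame, double beat) {
    for (const auto& fx : chain)
        fx(frame, beat);
}
```

Because each effect just mutates the shared buffer, stacking is free: the order of the list is the order of the chain, with no intermediate copies.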
Memory Pressure
This is where I learned the most. A single 1080p RGBA frame is about 8MB. When you're playing back multiple sources with effects chains, memory adds up fast. I built a frame cache with LRU eviction and added pre-fetch logic that looks ahead in the pattern schedule to load frames before they're needed.
There were a lot of bugs early on where the cache would evict something that was about to be used, causing a stutter while it re-decoded. Getting the heuristics right took more iteration than writing the actual decoding layer.
Keyframing
Effect parameters can be automated with keyframe curves. The system supports linear, bezier, and step interpolation. Curves get evaluated per-frame during render and the interpolated values are passed into the Lua scripts. This is how you do things like ramping a distortion effect in sync with a drop.