Jan 21, 2026

It's time for agentic video editing

Justine Moore (@venturetwins)


2025 was unquestionably the year of video. AI-generated ads went mainstream, launch videos from seed-stage startups got millions of views, and video podcasts and interviews exploded.

What you didn’t see was all the editing work behind the scenes. Cutting 90 minutes of footage into a three-minute short. Correcting lighting and audio in post-production. Searching for the right music and sound effects.

A common rule of thumb in video production is that you’ll spend 80% of your time & energy on editing, and only 20% on filming (or now, generating). Crafting compelling video is a long and tedious process – and few people have the “taste” to do it right.

We now have the technology to hand over some of this work to AI agents. These agents will blow out the supply curve for quality video – the kind of content that requires days (or weeks) from professional editors today. What Cursor did for coding, these agents will do for video production.

Why now?

There’s immense demand for agents that give anyone the skills (and taste) of a professional video editor. So why don’t these products already exist? There have been a few recent developments that have unlocked progress:

Vision models can process long video. You have to understand video before you can edit it. This is a non-trivial challenge. We’ve seen a lot of progress with recent models like Gemini 3, GPT-5.2, Molmo 2, and Vidi2, which are inherently multimodal and have longer context windows. Gemini 3 can now process up to an hour of video! You can upload it as an input and ask the model to generate timestamped labels, find a specific moment, or just summarize what’s happening.
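
To make that concrete, here’s a rough sketch of the workflow using the google-genai Python SDK: upload a clip, wait for it to finish processing, then ask for timestamped labels. The model name, file path, and prompt are placeholders.

```python
# A rough sketch of long-video understanding with the google-genai SDK
# (pip install google-genai). Model name and file path are placeholders.
import time
from google import genai

client = genai.Client()  # reads the API key from the environment

# Upload the raw footage and wait for server-side processing to finish.
video = client.files.upload(file="interview_raw.mp4")
while video.state.name == "PROCESSING":
    time.sleep(5)
    video = client.files.get(name=video.name)

response = client.models.generate_content(
    model="gemini-2.5-pro",  # substitute whichever model you have access to
    contents=[
        video,
        "Generate timestamped labels for each topic discussed, "
        "and flag the three most quotable moments.",
    ],
)
print(response.text)
```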

Models can now use tools. AI video editors need to be able to take action - not just describe what’s happening or suggest changes. We’re starting to see meaningful progress around LLMs as real agents that can use tools. One of my favorite examples of this is Claude using Blender (a notoriously tricky product that many humans haven’t mastered). You can imagine how this evolves as you give agents access to more tools.
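
As a toy illustration of what “tool use” means here, this sketch uses the anthropic Python SDK to expose a single (invented) trim_clip action to the model. A real editing agent would wire up a whole toolbox and loop on the results.

```python
# A toy sketch of LLM tool use with the anthropic Python SDK. The trim_clip
# tool and its schema are invented for illustration; the model name is a
# placeholder.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tools = [{
    "name": "trim_clip",
    "description": "Cut a source clip down to the segment between two timestamps.",
    "input_schema": {
        "type": "object",
        "properties": {
            "source": {"type": "string", "description": "Path to the source clip"},
            "start": {"type": "string", "description": "Start time, e.g. 00:01:30"},
            "end": {"type": "string", "description": "End time, e.g. 00:02:10"},
        },
        "required": ["source", "start", "end"],
    },
}]

message = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder model name
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user",
               "content": "Trim interview_raw.mp4 down to the strongest 40-second answer."}],
)

# The model replies with tool_use blocks; an agent loop would execute them
# (e.g. run ffmpeg, return the result) and continue the conversation.
for block in message.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```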

Image and video generation models have improved. I’m a big believer that many video production pipelines will be hybrid - a mix of AI and filmed content. Imagine filming interviews for a documentary, but generating B-roll or historical footage with AI. Or using a motion transfer model to take a reference animation and apply it to a real character. For any of these things to work, models needed to reach a level of quality & consistency to be valuable. Now, that’s finally happening.

What will these agents do?

Process - whether you’re filming or generating, you’ll likely end up with much more footage than you need (sometimes by a factor of hundreds). It’s time-consuming to sort through all this video, organize it, and decide what to use. Products like Eddie AI can take hours of footage and identify A- vs. B-roll, process multiple camera angles, and compare takes.

Orchestrate - if we assume many videos will include some element of AI in the future, we’ll need agents that orchestrate all of the models. Imagine you want to add an AI animation to an educational video. You’ll need an agent that can generate the images, send them to a video model, and stitch the outputs together. Products like Glif are launching agents that coordinate between multiple models on a user’s behalf.
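
Here’s roughly what that orchestration could look like in code. The generate_image and animate_image functions are hypothetical stand-ins for whatever models you call; only the ffmpeg stitching step is concrete.

```python
# A rough orchestration sketch. generate_image() and animate_image() are
# hypothetical stand-ins for whatever image / image-to-video models you call.
# Stitching uses ffmpeg's concat demuxer and assumes the clips share a codec
# and resolution.
import subprocess
from pathlib import Path

def generate_image(prompt: str, out_path: str) -> str:
    """Hypothetical: call an image model and save the frame to out_path."""
    raise NotImplementedError("plug in your image generation API here")

def animate_image(image_path: str, out_path: str) -> str:
    """Hypothetical: call an image-to-video model and save the clip to out_path."""
    raise NotImplementedError("plug in your video generation API here")

def stitch(clips: list[str], out_path: str) -> None:
    """Concatenate clips without re-encoding, via ffmpeg's concat demuxer."""
    manifest = Path("clips.txt")
    manifest.write_text("".join(f"file '{c}'\n" for c in clips))
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", str(manifest), "-c", "copy", out_path],
        check=True,
    )

shots = ["a hand-drawn diagram of photosynthesis", "chlorophyll absorbing sunlight"]
clips = []
for i, prompt in enumerate(shots):
    frame = generate_image(prompt, f"frame_{i}.png")
    clips.append(animate_image(frame, f"clip_{i}.mp4"))
stitch(clips, "animation_sequence.mp4")
```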

Polish - small details take a video from good to great. But if you’re not a pro, you may be overwhelmed by the flood of tasks needed to polish a video. For example - adjusting lighting between clips, cleaning noise out of the audio track, or taking out filler words (“ummms” and “uhhhs”) during an interview. Products like Descript’s Underlord agent can take a video, make all these changes for you, and deliver the final version.
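
To take one of those tasks, here’s a sketch of automated filler-word removal: start from a word-level transcript (the one below is invented - any speech-to-text model with word timestamps would produce it), compute the segments to keep, and cut them with an ffmpeg select filter.

```python
# A sketch of filler-word removal. The transcript below is invented; in
# practice it would come from a speech-to-text model that returns word-level
# timestamps. Requires ffmpeg on the system.
import subprocess

FILLERS = {"um", "umm", "uh", "uhh"}

# (word, start_seconds, end_seconds)
transcript = [
    ("So", 0.0, 0.4), ("um", 0.4, 0.9), ("the", 0.9, 1.1),
    ("main", 1.1, 1.5), ("idea", 1.5, 2.0), ("uh", 2.0, 2.6),
    ("is", 2.6, 2.8), ("simple", 2.8, 3.4),
]

def keep_segments(words, clip_end):
    """Return the (start, end) ranges that remain after dropping filler words."""
    segments, cursor = [], 0.0
    for word, start, end in words:
        if word.lower().strip(",.") in FILLERS:
            if start > cursor:
                segments.append((cursor, start))
            cursor = end
    segments.append((cursor, clip_end))
    return segments

segments = keep_segments(transcript, clip_end=3.4)
expr = "+".join(f"between(t,{s:.2f},{e:.2f})" for s, e in segments)

# Keep only the selected time ranges in both the video and audio streams.
subprocess.run([
    "ffmpeg", "-y", "-i", "interview_raw.mp4",
    "-vf", f"select='{expr}',setpts=N/FRAME_RATE/TB",
    "-af", f"aselect='{expr}',asetpts=N/SR/TB",
    "polished.mp4",
], check=True)
```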

Adapt - when you make a good video, you should adapt it for more reach. A common workflow is cutting a YouTube podcast into short clips with different aspect ratios to post on your X, Instagram, and TikTok accounts. Or even translating a video into other languages (and re-dubbing the speakers) to reach an international audience. Platforms like Overlap allow you to set up node workflows for these adaptation tasks.
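
The mechanical half of that adaptation is easy to sketch - here’s a centered 16:9-to-9:16 crop with ffmpeg. The judgment half (which moments to clip, how to re-frame each shot) is where the agent earns its keep.

```python
# A sketch of reformatting a 16:9 clip into a 9:16 vertical short with a
# centered crop. Requires ffmpeg; a smarter agent would track the speaker and
# re-frame each shot instead of always cropping the center.
import subprocess

def to_vertical(src: str, dst: str) -> None:
    """Scale to 1920px tall, then center-crop to 1080x1920 (crop centers by default)."""
    subprocess.run([
        "ffmpeg", "-y", "-i", src,
        "-vf", "scale=-2:1920,crop=1080:1920",
        "-c:a", "copy",
        dst,
    ], check=True)

to_vertical("podcast_clip.mp4", "podcast_clip_vertical.mp4")
```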

Optimize - the ultimate goal isn’t just replacing manual tasks with AI. It’s agents with taste that can make your videos better. There’s a reason people hire professional video editors! They spend years learning how to hook viewers, pace a storyline, and use music to build an emotional reaction. YouTuber Emma Chamberlain famously said that she used to spend 30-40 hours editing a ~15-minute vlog.

What if an AI agent could watch your footage, ask about your objectives, and then craft a few draft versions of a video for you to iterate on? You review and direct - “The opening is too slow.” “Cut the middle section.” “Make the ending hit harder” - and the agent executes.
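
One way to picture that loop: the agent keeps the edit as a structured cut list, turns each note into a revision of that list, and re-renders a draft. The sketch below is purely illustrative - revise_cut_list and render are hypothetical stand-ins for the agent’s model call and the stitching step.

```python
# A purely illustrative sketch of the review-and-direct loop. The cut list
# format, revise_cut_list(), and render() are hypothetical placeholders.
cut_list = [
    {"source": "interview_raw.mp4", "start": 120.0, "end": 150.0},  # opening
    {"source": "interview_raw.mp4", "start": 610.5, "end": 655.0},  # key story
    {"source": "broll_city.mp4",    "start": 0.0,   "end": 8.0},    # closing B-roll
]

def revise_cut_list(cuts: list[dict], feedback: str) -> list[dict]:
    """Hypothetical: ask the agent's LLM to rewrite the cut list given a note."""
    raise NotImplementedError("plug in your model call here")

def render(cuts: list[dict], out_path: str) -> None:
    """Hypothetical: trim each entry and stitch the clips into a new draft."""
    raise NotImplementedError("e.g. ffmpeg trim + concat")

for feedback in ["The opening is too slow.", "Make the ending hit harder."]:
    cut_list = revise_cut_list(cut_list, feedback)
    render(cut_list, "draft.mp4")
```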

In conclusion...

I couldn't be more excited for the future of AI editing agents, as both a creator and consumer. They're going to dramatically increase the quality and quantity of video we see. When you give everyone the skills of a professional editor, you're going to get a LOT more stories that may not have otherwise been told.

I've been spending a ton of time testing all of the video agents that exist today - if you want to see demos of these products, check out my full post on the @a16z Substack.

And if you're building something here, please reach out (@venturetwins or jmoore@a16z.com). Bonus points if your agent can edit my video ⬆️

By Justine Moore