Photo & Video Production · 2026-05-07 · 13 min read
Photo and Video Production May 2026: Sora 2, Veo 3 and the Hybrid Workflow for the Mid-Market

David George
Co-Founder & Creative Director, Vincency
In May 2026, the key question in brand production for mid-market clients is no longer whether generative AI works. It does. The question is where in the pipeline it delivers savings — and where it harms the brand if you deploy it. Drawing on 60 brand productions over the past 18 months, I'll share here what the mid-market workflow actually looks like in May 2026 and where its limits lie.
Up front: the answer "always hybrid production" sounds boring. It's correct nonetheless. Anyone betting exclusively on generative AI in May 2026 builds visuals that look good and sell nothing. Anyone producing exclusively the classic way pays for pixels that a machine now delivers too. The value-creating position is the middle — and it can be described in concrete terms.
What the generative video models can actually do in May 2026
Three platforms are currently relevant in our production practice: Sora 2 (OpenAI, publicly available from Q1 2026), Veo 3 (Google, since late 2025), and Runway Gen-4 (since early 2026). Based on that practice, four statements hold now that did not hold in early 2024.
First: cinematic quality on short clips is a reality. That means 8 to 12 seconds, 4K, with consistent lighting and believable camera movement. The tells that still read as "AI" in 2024 (the hands, the eyes, the unrealistic fabric physics) have vanished in most cases. On static scenes with a clear subject, the output of the models in May 2026 is barely distinguishable from real camera footage.
Second: consistency across multiple clips remains the main problem. One and the same person, the same product, the same setting — if that has to hold across ten cuts in a campaign, the models still break down. Reference conditioning helps (all three platforms support it), but consistency degrades with every generation within a sequence. An advertising brand film with recognizable people cannot yet be produced purely generatively in May 2026 without the brand recognition value suffering.
Third: real spaces remain genuinely superior. When a campaign shows a specific, actually existing place — a particular practice in Munich, a specific law firm in Frankfurt's Westend, a Hamburg penthouse with a view of the Alster — the camera almost always wins. Generative models can produce "a German law firm," but not "this law firm." For premium marketing in the mid-market, this specificity is often the real selling point.
Fourth: brand voiceover is human again. Generated voices are technically excellent in May 2026, but in the premium B2B segment we saw a clear audience backlash against recognizable AI voices in 2025. A human German voice talent with brand character costs 600 to 2,500 euros for a brand spot voiceover. This investment pays off for brands that treat authenticity as a pillar in the DACH market, and in May 2026 that is nearly all premium mid-market companies.
The hybrid workflow as of May 2026
What does a concrete production look like when we at Vincency set up a mid-market brand campaign in May 2026? Take the example of a premium real estate brand with an average property value of 2.4 million euros in Hamburg — comparable to one of our 2024 client cases.
Phase 1: concept and storyboard. In May 2026 we use generative storyboarding tools (Midjourney 7, occasionally Krea or Imagine) to generate around 80 variations per key scene within two to three days. Where in 2022 we still worked with pencil sketches, a broad exploration now runs at daily-rate cost. From the 80 variations, we narrow down with the client in a 90-minute session to three to five storyboard drafts that go into production.
Phase 2: real production on location. For premium real estate campaigns, this is the central pillar that cannot be replaced. We film the real properties with a cinema setup — typically an ARRI Alexa Mini LF or a RED Komodo, depending on budget — over two shooting days. Plus our own drone footage for establishing shots of the city and the location. This phase delivers the material that sells the brand: specific spaces, specific lighting situations, specific atmosphere. Generative tools still cannot replace this in May 2026.
Phase 3: augmentation with generative AI. This is where the generative layer comes in, but in a very focused way. Examples from the Hamburg real estate campaign: a real shot of the penthouse terrace is combined with a generatively extended sky plate, because the real shot was captured on a grey day and the client wanted a clear summer atmosphere. A drone shot gets an additional, generatively produced bird sequence that adds movement and emotion. An interior shot is extended with a generative 4-second transition to a different camera position, which saves two additional shooting days; the transition is consistent because both endpoints are real.
Phase 4: postproduction. Here the stack is the classic DaVinci Resolve plus selective AI tools — Topaz Video AI for upscaling and stabilization, RunwayML for targeted element removal, Gemini Vision for automatic logging of the shoot-day material. The creative decisions — color grading, editing rhythm, sound design — remain human. This is where the emotional substance of the brand lies.
Phase 5: delivery across multiple channels. For the 2024 real estate campaign, we still had one main video and three cut-downs. In May 2026 we typically produce between 8 and 14 variations per campaign: platform-specific (LinkedIn, Instagram Reels, YouTube, German B2B portals), format-specific (16:9, 9:16, 1:1, 4:5), and language-specific where the client targets an international market. Generative tools accelerate this variation step massively: from one master cut we produce eight format variants in a fraction of the time it took in 2024.
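To make the format step concrete, here is a minimal sketch of the geometry behind the four aspect ratios named above. It assumes a UHD 16:9 master (3840 x 2160) and a plain centered crop; both are illustrative assumptions, not a description of the tools or settings Vincency uses, and generative reframing would choose the crop position per shot rather than always centering it.

```python
from fractions import Fraction

# Assumption: the master cut is UHD 16:9. The article does not state a delivery resolution.
MASTER_W, MASTER_H = 3840, 2160

# Aspect ratios named in the text, as width / height.
FORMATS = {
    "16:9": Fraction(16, 9),
    "9:16": Fraction(9, 16),
    "1:1": Fraction(1, 1),
    "4:5": Fraction(4, 5),
}

def center_crop(master_w, master_h, ratio):
    """Return (x, y, w, h) of the largest centered crop with the given ratio.

    Width and height are rounded down to even numbers, which is commonly
    required when encoding with 4:2:0 chroma subsampling.
    """
    if ratio >= Fraction(master_w, master_h):
        # Target is at least as wide as the master: keep full width.
        w = master_w
        h = int(master_w / ratio)
    else:
        # Target is narrower than the master: keep full height.
        h = master_h
        w = int(master_h * ratio)
    w -= w % 2
    h -= h % 2
    return ((master_w - w) // 2, (master_h - h) // 2, w, h)

crops = {name: center_crop(MASTER_W, MASTER_H, r) for name, r in FORMATS.items()}
for name, box in crops.items():
    print(name, box)
```

The 9:16 crop keeps only about a third of the master's width, which is why vertical variants are usually planned at the shoot rather than rescued in post.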
What has changed in budget terms
The following figures come from the last ten brand campaigns that Vincency delivered in early 2026, all for German mid-market companies, all in the premium segment.
A comparable brand film production that cost around 35,000 euros in 2022 typically costs 22,000 to 28,000 euros in May 2026 for the same or better output quality. The saving does not come from fewer shooting days, which stay largely constant, but from three sources: a faster concept phase through generative storyboarding (three days instead of seven), smaller postproduction teams through AI-augmented workflows, and broader format variations included at the same price (instead of as an extra-cost line item).
For photo productions, the shift is different. A premium brand photo set (15 to 25 final images, classically one to two shooting days) cost 8,000 to 14,000 euros in 2022. In May 2026 the same output costs 6,500 to 11,000 euros. The saving is smaller here because the human component remains indispensable in premium photography — the photographer's eye, the posing direction, the material selection. AI augmentation comes into play mainly in retouching and background extensions.
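Expressed as percentages, the film and photo figures above look like this. The sketch uses only the euro amounts quoted in the text; comparing the low end of the 2022 photo range with the low end of the 2026 range (and high with high) is an assumption made for illustration.

```python
def saving_pct(old_eur, new_eur):
    """Relative saving in percent, rounded to two decimals."""
    return round((old_eur - new_eur) / old_eur * 100, 2)

# Brand film: one 2022 reference price against the 2026 range.
film_saving = (saving_pct(35_000, 28_000), saving_pct(35_000, 22_000))

# Photo set: low end against low end, high end against high end (assumption).
photo_saving = (saving_pct(8_000, 6_500), saving_pct(14_000, 11_000))

print("film saving %:", film_saving)
print("photo saving %:", photo_saving)
```

On these numbers the film saving spans roughly 20 to 37 percent, while the photo saving stays near 19 to 21 percent, which matches the point that the human share in premium photography limits the effect.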
Important for clients: the savings are only accessible if the workflow was conceived as hybrid from the start. A classically tendered production brief that gets AI layers added afterward typically costs 10 to 20 percent more than a purely classic production, because the integration friction of the two pipelines eats up the efficiency gains. Hybrid has to be hybrid from the very first briefing hour.
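A rough calculation shows how large the gap can become. It assumes that the 2022 price of 35,000 euros stands in for a purely classic production; the text does not give a 2026 classic price, so treat the result as an order of magnitude only.

```python
# Assumption: the 2022 figure from the text stands in for a purely classic production.
CLASSIC_EUR = 35_000
HYBRID_EUR = (22_000, 28_000)       # conceived as hybrid from the start, per the text
RETROFIT_SURCHARGE_PCT = (10, 20)   # AI layers added afterward, per the text

# Cost of a classic brief with AI layers bolted on later.
retrofit_eur = tuple(CLASSIC_EUR * (100 + pct) // 100 for pct in RETROFIT_SURCHARGE_PCT)

# Smallest and largest gap between a retrofitted and a natively hybrid production.
gap_eur = (retrofit_eur[0] - HYBRID_EUR[1], retrofit_eur[1] - HYBRID_EUR[0])

print("retrofit:", retrofit_eur)
print("gap to native hybrid:", gap_eur)
```

Under that assumption the retrofitted route lands at 38,500 to 42,000 euros, between 10,500 and 20,000 euros above a production planned as hybrid from the first briefing.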
Premium cases: BMW iX and Hamburg Real Estate as a comparison pair
Two Vincency cases from 2024 illustrate the two endpoints of the spectrum.
The BMW iX Charging Campaign 2024 — produced before the Sora 2 wave — was a classic cinematic setup. ARRI Alexa Mini LF, three shooting days, a postproduction phase of four weeks. We deliberately used generative AI minimally, because the BMW brand places value on traceable, documentable production authenticity. The result: 2.4 million video views, +38 percent engagement, +27 percent test drive requests. The added effort of the classic production translated measurably into impact.
The Hamburg premium real estate campaign 2024, produced when augmentation with Sora 1 was just beginning, was a hybrid setup. Two real shooting days in the properties, three generative augmentation layers in postproduction (sky replacement, motion elements, format variations), six weeks of production time overall. Result: 14 qualified buyer leads in the first month after launch, +95 percent brand presence in the local Hamburg market, 10 weeks time-to-go-live. In this case, hybrid improved efficiency over a purely classic production by an estimated 25 percent, with no loss of impact.
The lesson: there is no single right workflow. There is a client-specific mix that optimally hits the brand reality, the budget, and the impact goals. At Vincency we decide this in a 60-minute briefing session at the start of every production assignment — and hold to it consistently afterward.
Three recommendations for mid-market companies planning brand production in 2026
First: brief the hybrid question in the first 30 minutes. What share of the final visuals should be authentically real, and what share can be augmented? This question decides the workflow, the team composition, and the budget. If it stays open until postproduction, production gets more expensive, not cheaper.
Second: invest in the real setups, save on the variations. Three cleanly filmed master scenes on location are the investment that creates impact in 2026. From these three master scenes, 12 to 20 asset variations can be produced with AI augmentation — without diluting the brand substance.
Third: document the production authenticity. In the anti-AI-slop climate of May 2026, the "behind the scenes" of a production has itself become a marketing asset. With every Vincency production assignment we deliver a 90-second behind-the-scenes reel that the client can post as a trust signal on their LinkedIn channel. Over the past twelve months, this has performed consistently well.
Conclusion
Brand production in May 2026 is not an either-or between human and machine. It is a mix to be composed concretely, re-balanced for each client. Anyone who answers the hybrid question strategically produces visuals that are honest, atmospherically dense, and budget-efficient. Anyone who ignores the question risks generic output that weakens the brand.
If you're planning a photo or video campaign in 2026 — whether for real estate, private medicine, a law firm, or industrial mid-market — talk to us about the right mix. An initial assessment costs you 30 minutes and delivers a concrete recommendation.
Frequently asked questions about photo and video production
Which generative video models are relevant in May 2026?
Three platforms are relevant in our production practice: Sora 2 (OpenAI, publicly available from Q1 2026), Veo 3 (Google, since late 2025) and Runway Gen-4 (since early 2026). On short static scenes with a clear subject (8 to 12 seconds, 4K), their output is barely distinguishable from real camera footage.
When is generative AI enough and when do you need real camera production?
Generative AI is enough for short, static clips and for fast variation production. Real camera production remains necessary as soon as a campaign shows a specific, actually existing place — the models can produce "a German law firm," but not "this law firm." For premium marketing in the mid-market, this specificity is often the real selling point.
What is the biggest limitation of the generative video models?
Consistency across multiple clips remains the main problem. One and the same person, the same product, the same setting across ten cuts still breaks down. Reference conditioning helps, but consistency degrades with every generation within a sequence — an advertising brand film with recognizable people cannot yet be produced purely generatively in May 2026 without the brand recognition value suffering.
What does a hybrid brand film production cost in May 2026?
A comparable brand film production that cost around 35,000 euros in 2022 typically costs 22,000 to 28,000 euros in May 2026 at the same or better output quality. A human German voice talent with brand character adds 600 to 2,500 euros for a brand spot voiceover. The savings are only accessible, however, if the workflow was conceived as hybrid from the start.