Luma's Uni-1 agents beat Nano Banana 2

Luma presents Uni-1, a natively multimodal architecture. The new Luma Agents handle the creation of text, images, and video.

Andreas Becker, 6 March 2026

Summary

Luma has introduced Uni-1, a new AI model trained natively on text, image, video, and audio.
In RISEBench, Uni-1 beats established models such as Nano Banana 2 and GPT Image 1.5 in the overall ranking.
The Luma Agents platform built on it enables the automated creation of complete advertising campaigns.
The agents can control external AI models such as Google's Veo 3 and evaluate their own results independently in an internal feedback loop.

With Uni-1, Luma has presented a new AI model that unites text, image, video, and audio in a single architecture. "Luma Agents" launches at the same time. Built on the model, the new platform runs complete creative workflows fully autonomously. Examples with prompts can be found at the very bottom.

Swapping people

Replace the rightmost man in IMAGE 1 with the man in IMAGE 3, and replace the seated man in IMAGE 1 with the man in IMAGE 2.

One native system for all media

Previous AI systems often combine several specialized models for text, image, or video. Luma takes a different path with the new architecture. The Uni-1 model was trained from the ground up on five modalities: audio, video, image, language, and spatial reasoning.

This integrated approach lets the AI reason consistently across media. Uni-1 first plans complex ideas in text and then converts them directly into visual or audio output. The system picks up emotional nuances in a text prompt and translates them into the color palette of an image, the rhythm of a video, or the tone of a voice. Luma is thereby moving away from simply stitching together isolated software components.

Strong results in RISEBench

Uni-1's technical capability shows clearly in current comparisons. In RISEBench, a benchmark that measures the cognitive and generative abilities of AI models, Luma currently holds the top spot.

With an overall score of 0.51, Uni-1 edges out Nano Banana 2 (0.50) and its Pro version (0.49). The lead is clearer over competitors such as GPT Image 1.5, which scores 0.46. GPT Image (0.32) and Qwen-Image-2 (0.31) trail far behind.
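To put the margins in perspective, the overall scores quoted above can be ranked and compared with a few lines of Python. This is only a sketch over the six figures reported in the article; the dictionary keys are labels chosen here, and the benchmark's own scoring method is not reproduced.

```python
# Overall RISEBench scores as quoted in the article (no other data is used).
overall_scores = {
    "Uni-1": 0.51,
    "Nano Banana 2": 0.50,
    "Nano Banana 2 (Pro version)": 0.49,  # the article only says "its Pro version"
    "GPT Image 1.5": 0.46,
    "GPT Image": 0.32,
    "Qwen-Image-2": 0.31,
}

# Sort from best to worst score.
ranking = sorted(overall_scores.items(), key=lambda item: item[1], reverse=True)
leader_name, leader_score = ranking[0]

# Gap between the leader and every model, rounded to hide float noise.
gaps = {name: round(leader_score - score, 2) for name, score in ranking}

for name, score in ranking:
    print(f"{name:<30} {score:.2f}  (gap to {leader_name}: {gaps[name]:.2f})")
```

The gap to Nano Banana 2 is a single hundredth of a point, while the gap to GPT Image 1.5 is five hundredths.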

The new Luma model stands out in particular in the "Spatial" category. In spatial understanding, its score of 0.58 leaves the entire competition well behind. Only in the "Logical" category does Uni-1 still trail models such as Nano Banana 2.

Autonomous agents for agencies

Alongside the new model, the company is launching the Luma Agents platform. The software is aimed primarily at marketing teams and design studios. The agents use Uni-1's deep understanding to manage projects independently, from the first idea to the finished campaign.

A key advantage is that context is preserved. The AI remembers all requirements across the different work steps. Users enter a short text briefing and a reference image. From these, the system independently develops various ad visuals, videos, and the matching audio. The agents evaluate their own results and improve them in an internal feedback loop, without human users having to keep entering new prompts.
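Luma has not published how this feedback loop works internally. As a rough illustration of the general pattern (generate, score, revise, repeat), here is a minimal Python sketch. Every name in it (`refine_until_good`, `toy_generate`, `toy_evaluate`, the threshold, and the round limit) is hypothetical, and the generator and evaluator are toy stand-ins for what would be model calls.

```python
from typing import Callable, List, Tuple

def refine_until_good(
    brief: str,
    generate: Callable[[str, List[str]], str],
    evaluate: Callable[[str, str], float],
    threshold: float = 0.8,
    max_rounds: int = 5,
) -> Tuple[str, float, int]:
    """Generate a draft, score it, and retry with accumulated notes until it passes."""
    notes: List[str] = []  # feedback carried across rounds (the preserved context)
    best_draft, best_score = "", float("-inf")
    rounds_used = 0
    for rounds_used in range(1, max_rounds + 1):
        draft = generate(brief, notes)
        score = evaluate(brief, draft)
        if score > best_score:
            best_draft, best_score = draft, score
        if score >= threshold:
            break  # good enough, no further prompting needed
        notes.append(f"round {rounds_used}: score {score:.2f} is below {threshold}")
    return best_draft, best_score, rounds_used

# Toy stand-ins (assumptions): each feedback note makes the draft cover one more
# word of the brief, and the score is the fraction of the brief that is covered.
def toy_generate(brief: str, notes: List[str]) -> str:
    return " ".join(brief.split()[: len(notes) + 1])

def toy_evaluate(brief: str, draft: str) -> float:
    return len(draft.split()) / len(brief.split())

draft, score, rounds = refine_until_good("red shoes summer sale", toy_generate, toy_evaluate)
print(draft, score, rounds)
```

The loop keeps the best draft seen so far, so a weaker later round cannot overwrite a stronger earlier one, and the round limit guarantees that it terminates even if the threshold is never reached.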

Luma demonstrated this capability on a real advertising campaign. The system converted an elaborate campaign into localized versions for several countries in around 40 hours.

Source: Luma

A conductor for external AIs

The platform is not limited to Luma's own technology. When needed, the Luma Agents act as an orchestrator for external AI services.

When a particular task requires a specialized model, the system accesses external providers seamlessly via an API. The agents can, for example, control Google's video model Veo 3, ByteDance's Seedream, or ElevenLabs' speech technology. Users work in a single interface throughout, while in the background the best-suited AI takes on each task.
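How such an orchestrator chooses a provider is not described by Luma. One common way to structure it is a routing table that maps a task type to a provider and falls back to the in-house model otherwise. The Python sketch below uses stub handlers instead of real API calls. The `Provider` class, `dispatch` function, and the assignment of task types to providers are assumptions made for illustration, not Luma's actual design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

@dataclass(frozen=True)
class Provider:
    name: str
    handler: Callable[[str], str]  # would wrap a real API client in practice

def make_stub(name: str) -> Provider:
    """Build a placeholder provider that only tags the prompt with its name."""
    def handler(prompt: str) -> str:
        return f"[{name}] {prompt}"
    return Provider(name, handler)

# Hypothetical routing table: which provider serves which task type is assumed.
DEFAULT_PROVIDER = make_stub("Uni-1")
ROUTES: Dict[str, Provider] = {
    "video": make_stub("Veo 3"),
    "image": make_stub("Seedream"),
    "speech": make_stub("ElevenLabs"),
}

def dispatch(task_type: str, prompt: str) -> Tuple[str, str]:
    """Route a task to its specialized provider, or to the default model."""
    provider = ROUTES.get(task_type, DEFAULT_PROVIDER)
    return provider.name, provider.handler(prompt)

print(dispatch("video", "15-second product teaser"))
print(dispatch("storyboard", "three-shot outline"))
```

Because callers only ever see `dispatch`, providers can be swapped or added in the table without changing the interface the user works in.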

Luma also integrates specific features for professional use. The platform documents every step the AI takes and, if desired, enforces human approval before content is published. Companies retain full intellectual property rights to the content they generate.
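The combination of step logging and an optional approval gate can be pictured with a small Python sketch. It is purely illustrative: the `Asset` class, the `ApprovalRequired` exception, and the function names are invented here and say nothing about how Luma implements these features.

```python
from dataclasses import dataclass, field
from typing import List, Optional

class ApprovalRequired(Exception):
    """Raised when publishing is attempted without a human sign-off."""

@dataclass
class Asset:
    name: str
    approved_by: Optional[str] = None
    log: List[str] = field(default_factory=list)  # audit trail of every step

def record(asset: Asset, step: str) -> None:
    asset.log.append(step)

def approve(asset: Asset, reviewer: str) -> None:
    asset.approved_by = reviewer
    record(asset, f"approved by {reviewer}")

def publish(asset: Asset, require_approval: bool = True) -> None:
    """Publish the asset, blocking if approval is required but missing."""
    if require_approval and asset.approved_by is None:
        record(asset, "publish blocked: no human approval")
        raise ApprovalRequired(asset.name)
    record(asset, "published")

poster = Asset("campaign-poster")
record(poster, "generated by agent")
try:
    publish(poster)
except ApprovalRequired:
    approve(poster, "art director")
    publish(poster)
print(poster.log)
```

Blocked attempts are logged as well, so the audit trail shows not only what was published but also what was held back and who released it.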

5 examples from Luma, including prompts

Paper infographic

Create an image from the following text, expanding the description with more details first, if needed: A square-format infographic arranged as a physical flat-lay on a light green cutting mat board. Materials include folded paper origami animals, real dried leaves, acorns, and thick white yarn acting as connection lines. Text is scrawled in green marker on small, round manila shipping tags attached to the board with shiny brass brads. Bottom level: Real dried oak leaves, a pinecone, and a paper-cut mushroom. A handwritten tag reads "PRODUCERS & DECOMPOSERS: Base of the web." Middle level (Primary Consumers): An origami paper rabbit and a small wooden carved mouse. Pieces of white yarn connect them to the leaves and acorns below. Handwritten tag: "HERBIVORES: Eat the plants." Upper level (Secondary Consumers): A folded paper fox and a felt snake. Yarn connects the fox to the rabbit, and the snake to the mouse. Hand-lettered tag: "CARNIVORES: Predators." Top level (Apex Predator): A detailed origami owl perched on a real twig at the top center, connected by yarn to the mouse and snake. Handwritten tag: "APEX PREDATOR: Top of the food chain." The white yarn creates a tangled, overlapping network indicating a web rather than a straight chain. The title, "THE FOREST FOOD WEB," is written in messy block letters on a strip of tree-bark-textured paper at the very top.

Piano player

Generate a sequence of images that are each from a different shot, based on the following storyboard description:
A fixed, unchanging camera frames the same upright piano in a quiet room as a young boy begins learning to play, his movements tentative; without any shift in angle, time flows forward as he grows into a confident teenager, then a passionate young man, then a gentle parent playing for a child at his side, then a reflective middle-aged figure pausing between phrases, and finally an elderly man whose slow, deliberate notes carry the weight of a lifetime, the worn piano and aging room silently marking the passage of years. The camera never changes angle throughout each frame - only the person's physical appearance, background, and characters around the person
Generate the first frame of the sequence.
A young boy sits at playing on a piano, carefully pressing each key as sunlight spills across the room. His mother is next to him, watching him play. The camera is facing the boy, as if on the piano.

4 times of day

A panoramic cityscape of New York showing a complete 24-hour cycle in one image. The far left shows sunrise with pink sky over Brooklyn Bridge, gradually transitioning through bright midday in Manhattan center, then golden hour, sunset, and deep blue starry night on the far right. Street lights progressively turn on from right to left. The city architecture stays consistent while only the sky and lighting change. Architectural photography, seamless gradient.

2 cats

Two cats sitting across from each other at a tiny cafe table, one sipping from an espresso cup, the other judging, a croissant between them, Parisian morning light, vintage Leica street photography, the energy of two old friends who disagree about everything.

Oktoberfest poster

Please generate an image based on the following text, expanding the description with additional details first, if needed: A vintage-style event poster for Oktoberfest, printed on textured ivory paper with a faded parchment effect around the edges. At the top, a decorative banner in Bavarian blue and white diamond check pattern spans the full width, with 'OKTOBERFEST' in large blackletter (Fraktur) font centered within the banner in dark brown ink. Below the banner, the year 'MÜNCHEN 2025' appears in bold sans-serif gold text. The central illustration, rendered in a retro lithograph style with visible halftone dots, depicts a smiling woman in a traditional dirndl holding two large beer steins overflowing with foam; behind her, the silhouette of the Frauenkirche's twin onion domes rises against a warm amber sky. Below the illustration, text in black serif font reads: 'Das größte Volksfest der Welt' followed by 'Theresienwiese · 20. September – 5. Oktober'. At the bottom, in smaller text: 'Eintritt frei · Alle sind willkommen!' flanked by small pretzel and wheat hop illustrations. A thin decorative border with corner rosettes frames the entire poster.