Start Your First Project

The easiest way to connect the concepts above is through a concrete example. This walkthrough takes you from opening a project to exporting the final video.

> Scenario: you have an image in mind. On a rooftop in the middle of the night, a lonely robot waters a dying potted plant. You want to turn that into a 45-second mood piece with no dialogue, only visuals and music. You already have an AI-generated concept image of the robot that you like and want to continue from.

Step 1: Create the Story Through Chat

In the chat panel, you say:

> "Help me make a 45-second emotional short. The story is simple: late-night rooftop, an old robot quietly taking care of a dying potted plant. No dialogue, only visuals and music. The image I uploaded should be used as the visual reference for the robot. Use that style. Vertical, 9:16."

The Planner starts working:

Accept multimodal input: the image and text enter the Planner together in one turn. The analysis agent automatically extracts visual features from the uploaded reference image, such as the weathered metal shell, the dim warm glow in the eyes, and the slightly hunched posture, and also tags the overall visual style. The image simultaneously appears in the Files panel for later binding and reuse.
Match a Skill: based on the request, the Planner automatically selects the Story-Driven Short Video Skill. That Skill tells Flova to organize the narrative as a three-act emotional arc rather than a dense information sequence.
Update documents: the document-maintenance agent writes duration, aspect ratio, "no dialogue, visual-only storytelling," and visual-style keywords into Final Video Spec.
Design the storyboard: guided by the Skill's narrative rules, the storyboard designer outputs a structure:

Key elements: three key elements are created. Element_Chibi_Robot describes the old chibi-style robot, visually grounded in the uploaded image. Element_Rooftop_Night describes the nighttime rooftop with blurred city lights and deep sky. Element_Dying_Plant defines the nearly dead plant in a cracked old clay pot.
Shots: three long shots of roughly 15 seconds each. Shot_Rooftop_Intro establishes the rooftop and slowly pushes in toward the robot and plant. Shot_Caring_Plant focuses on the plant, then the robot watering it, then an extreme close-up of water landing on the dry leaves. Shot_Companion_Night shows the robot sitting beside the plant, then pulls back so both face the quiet city skyline together before fading to black.
Audio layer: a single background music track Audio_Melancholy_Piano, starting with minimal piano, later joined by faint strings and a slightly mechanical texture, moving from loneliness into warm, quiet care.
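
To make the shape of that output concrete, here is a minimal sketch of how such a storyboard could be modeled as plain data. It is not Flova's internal format: the `Storyboard` and `Shot` classes, their field names, the exact 15-second durations, and the per-shot element lists are all assumptions made for illustration. Only the element, shot, and audio names come from the walkthrough.

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    name: str
    duration_s: float      # assumed exact; the walkthrough only says "roughly 15 seconds"
    elements: list[str]    # names of the key elements this shot depends on (assumed)


@dataclass
class Storyboard:
    aspect_ratio: str
    target_duration_s: float
    elements: dict[str, str]                       # element name -> short description
    shots: list[Shot] = field(default_factory=list)
    audio_tracks: list[str] = field(default_factory=list)

    def total_duration(self) -> float:
        """Sum of all shot durations in seconds."""
        return sum(shot.duration_s for shot in self.shots)

    def undefined_references(self) -> list[str]:
        """Element names used by shots but missing from the element table."""
        known = set(self.elements)
        return sorted({e for s in self.shots for e in s.elements if e not in known})


storyboard = Storyboard(
    aspect_ratio="9:16",
    target_duration_s=45.0,
    elements={
        "Element_Chibi_Robot": "old chibi-style robot, grounded in the uploaded image",
        "Element_Rooftop_Night": "nighttime rooftop, blurred city lights, deep sky",
        "Element_Dying_Plant": "nearly dead plant in a cracked old clay pot",
    },
    shots=[
        Shot("Shot_Rooftop_Intro", 15.0,
             ["Element_Rooftop_Night", "Element_Chibi_Robot", "Element_Dying_Plant"]),
        Shot("Shot_Caring_Plant", 15.0,
             ["Element_Dying_Plant", "Element_Chibi_Robot"]),
        Shot("Shot_Companion_Night", 15.0,
             ["Element_Chibi_Robot", "Element_Dying_Plant", "Element_Rooftop_Night"]),
    ],
    audio_tracks=["Audio_Melancholy_Piano"],
)

# Sanity checks a reader might run before generating any media.
duration_matches_spec = storyboard.total_duration() == storyboard.target_duration_s
missing_elements = storyboard.undefined_references()
```

Seen this way, three shots of about 15 seconds add up to the 45-second target, and every shot draws only on elements that were defined first.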

Step 2: Adjust the Storyboard

After reviewing the output, you decide the third shot should describe the robot sitting down in more detail. You double-click the shot description and revise it to include a soft mechanical creak in the joints and a slightly forward-leaning posture, which adds emotional tension.

You do not need to explain these manual changes to Flova. It will receive them automatically on the next turn.

Step 3: Generate Key Images

You say:

> "Generate all key element images first. I want to lock in the core character and scene before moving on."

The media generator starts by working on key elements rather than shot keyframes. It uses the uploaded reference image as a strict anchor and expands around the same character concept, keeping the chibi shape, rusted shell, warm glowing eyes, and hand-drawn feel consistent.

After a few minutes, several candidate images appear under the key elements. Some emphasize city lights more. Some bring out the starry sky. Some are warmer overall.

You browse them in the preview panel:

One scene image feels close, but not quite deep and lonely enough.
You type feedback directly in the preview panel: "This still doesn't feel deep and lonely enough. Generate a few more versions so I can compare."
The media generator adds more versions under the same asset group.
Among the new versions, some are darker, some reduce the neon glow, and some weaken the warm wall light to increase the feeling of solitude.
You compare them side by side and like the one that best matches your intended feeling.

The goal here is not a single perfect generation. It is to lock in the character and atmosphere of one asset through focused iteration, then select the best version from a meaningful set of options.

Step 4: Generate the Video

You say:

> "Now generate the video for each shot based on the key elements we've locked in."

The media generator works through the storyboard shot by shot. Each shot automatically references the current versions of Element_Chibi_Robot and Element_Rooftop_Night, so the robot's appearance, the environmental mood, and the global color palette remain consistent across the piece.
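
One way to picture "references the current version" is an asset group that holds several candidates plus a marker for the one you liked. The sketch below is hypothetical: the `AssetGroup` class, its methods, the version labels, and the rule that the liked version wins (falling back to the newest one) are assumptions for illustration, not a description of how Flova resolves versions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AssetGroup:
    name: str
    versions: list[str] = field(default_factory=list)   # hypothetical version labels
    liked: Optional[int] = None                          # index of the liked version

    def add_version(self, label: str) -> int:
        """Append a new candidate and return its index."""
        self.versions.append(label)
        return len(self.versions) - 1

    def like(self, index: int) -> None:
        """Mark one candidate as the preferred version."""
        if not 0 <= index < len(self.versions):
            raise IndexError(f"{self.name} has no version {index}")
        self.liked = index

    def current(self) -> str:
        """Assumed rule: the liked version if there is one, else the newest."""
        if not self.versions:
            raise ValueError(f"{self.name} has no versions yet")
        if self.liked is not None:
            return self.versions[self.liked]
        return self.versions[-1]


robot = AssetGroup("Element_Chibi_Robot")
robot.add_version("robot_v1")

rooftop = AssetGroup("Element_Rooftop_Night")
for label in ("rooftop_v1", "rooftop_v2_darker", "rooftop_v3_less_neon"):
    rooftop.add_version(label)
rooftop.like(1)   # the darker candidate was liked in Step 3

# Every shot resolves the same names, so all shots see the same look.
shot_inputs = {group.name: group.current() for group in (robot, rooftop)}
```

Because each shot looks up elements by name rather than copying an image, changing the liked version in one place would change what later generations reference.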

After a few rounds, each shot has multiple candidate video versions. You review them one by one. One version of Shot_Companion_Night has the strongest pacing and emotional weight, so you like it. In another shot, the robot's movement feels a little too stiff, so you leave comments requesting two more revisions and then pick the better one.

Step 5: Generate the Music

You say:

> "Generate the background music."

The media generator creates two music versions under the same asset group according to the audio-layer definition. After listening, you decide the first version is too full, while the second has more space and breathes better with the visuals, so you like the second.

Step 6: Assemble the Timeline

You say:

> "Assemble the timeline."

The video assembler combines the storyboard, the selected key element versions, the generated shot media, and the music into the timeline. You switch to the timeline panel, preview it, and feel that the final hold in the third shot should last one more second so the robot and plant can sit with the city a little longer. You drag the timing manually to adjust it.

While in timeline mode, you can also open the Files panel and drag alternative media directly onto the timeline tracks.
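
The timing effect of the manual drag described above can be worked through with a small sketch. The `Clip` class, the `layout` helper, and the exact 15-second durations are assumptions for illustration; the walkthrough only says each shot is roughly 15 seconds.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    shot: str
    duration_s: float


def layout(clips: list[Clip]) -> list[tuple[str, float, float]]:
    """Place clips back to back and return (shot, start, end) in seconds."""
    placed = []
    cursor = 0.0
    for clip in clips:
        placed.append((clip.shot, cursor, cursor + clip.duration_s))
        cursor += clip.duration_s
    return placed


clips = [
    Clip("Shot_Rooftop_Intro", 15.0),
    Clip("Shot_Caring_Plant", 15.0),
    Clip("Shot_Companion_Night", 15.0),
]

before = layout(clips)

# Extend the final hold by one second, as in the manual drag.
clips[-1].duration_s += 1.0
after = layout(clips)

total_before = before[-1][2]
total_after = after[-1][2]
```

Under these assumed durations the total grows from 45 to 46 seconds, while the start times of the first two shots stay unchanged because only the last clip was lengthened.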

Step 7: Explore a Branch

After previewing the result, you wonder whether the ending would feel more resonant if the plant's leaf moved very slightly before the fade instead of fading out immediately.

You go back into the chat history, find the point before the robot-image generation started, and click Branch from here. In the new branch, you edit the description of Shot_Companion_Night and ask Flova to regenerate it. Now the two endings exist side by side and can be compared, selected, or developed independently.
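
Conceptually, a branch behaves like a second path in a tree of conversation turns: both paths share everything up to the branch point and diverge after it. The sketch below illustrates that idea only. The `Turn` class, the `extend` helper, and the turn labels are hypothetical and do not describe how Flova stores history.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    message: str
    parent: Optional["Turn"] = None

    def history(self) -> list[str]:
        """Messages from the first turn down to this one."""
        node: Optional[Turn] = self
        messages = []
        while node is not None:
            messages.append(node.message)
            node = node.parent
        return messages[::-1]


def extend(tip: Turn, *messages: str) -> Turn:
    """Append turns after `tip` and return the new tip."""
    for message in messages:
        tip = Turn(message, parent=tip)
    return tip


# Main line of the walkthrough (labels are illustrative).
story = Turn("Create the story")
main_tip = extend(
    story,
    "Generate key element images",
    "Generate the video",
    "Generate the music",
    "Assemble the timeline",
)

# Branch from the point before image generation started.
branch_tip = extend(
    story,
    "Edit Shot_Companion_Night: the leaf moves slightly before the fade",
    "Regenerate Shot_Companion_Night",
)

shared_prefix = [
    a for a, b in zip(main_tip.history(), branch_tip.history()) if a == b
]
```

Neither path overwrites the other: the main line keeps its five turns, the branch adds its own, and the only shared part is the history before the branch point.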

Step 8: Export

In the end, you choose the more restrained version. There is no explicit "revival," only the robot quietly accompanying the plant until the night ends. Once the timeline is confirmed, you click export, and the 45-second vertical emotional short is complete.
