How to actually make a music video on CloneViral: a Marcus + Taylor walkthrough

Two agents, one song, zero After Effects. Here's the workflow that gets you something postable instead of another cursed AI clip.

By the Sampled desk

Heads up: this post contains affiliate links to CloneViral. If you sign up through them, we may earn a commission at no extra cost to you. We haven't been paid to write this; opinions are ours.

Most AI video tools dump you into a single prompt box and pray. CloneViral takes a different route: you chat with a specialist agent for the thing you're trying to make. For a music video that actually cuts together, you only need two of them: Marcus (the film director) and Taylor (the music video producer). Here's how to run the play.

Step 0: have a song first

Taylor can generate music from scratch, but if you're a real artist you almost never want that. You want visuals cut to your track, not the other way around. Bounce a clean WAV or MP3 of the song (or the hook + a verse if you're just testing) before you open the app. 30–60 seconds is plenty for a first pass. Trying to do a full 3-minute video on attempt one is how you burn credits.
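
If all you have is the full-length bounce, you can cut a test clip without reopening the session. Here's a minimal sketch using Python's standard-library wave module. It assumes an uncompressed PCM WAV (not MP3), and the function name trim_wav and the file names are ours for illustration, not anything CloneViral provides.

```python
import wave


def trim_wav(src: str, dst: str, start_s: float, length_s: float) -> None:
    """Copy `length_s` seconds of audio from `src`, starting at `start_s`, into `dst`."""
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        rate = reader.getframerate()
        # Clamp the start position so we never seek past the end of the file.
        start_frame = min(int(start_s * rate), reader.getnframes())
        reader.setpos(start_frame)
        # readframes returns fewer frames if the clip runs past the end.
        frames = reader.readframes(int(length_s * rate))

    with wave.open(dst, "wb") as writer:
        # Same channels, sample width, and sample rate as the source.
        writer.setparams(params)
        # The frame count in the header is corrected when the file closes.
        writer.writeframes(frames)
```

Something like trim_wav("full_song.wav", "hook_test.wav", start_s=45, length_s=40) would give you a 40-second test clip starting at 0:45. Both file names are placeholders.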

Step 1: open Taylor, paste the lyrics

Sign in, pick Taylor — Music Video Producer from the agent list. The opening prompt should give Taylor three things and nothing else:

  1. The mood in two words ("foggy, anxious" / "warm, nostalgic" / "neon, mean")
  2. The setting ("empty parking garage at 3am", "Tokyo back alley in rain")
  3. The lyrics for the section you're scoring

Don't ask for camera moves yet. Don't ask for a treatment. Let Taylor break the lyrics into beats first and give you a rough scene list. You're QC-ing the structure before you spend compute on pixels.
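
If you test a lot of sections, it helps to build that opening prompt the same way every time. Here's a small sketch that assembles the three parts into one block of text to paste into the chat. The function name and the "Mood / Setting / Lyrics" labels are our own convention. Nothing here talks to CloneViral, and we're not assuming Taylor needs this exact layout.

```python
def build_opening_prompt(mood: str, setting: str, lyrics: str) -> str:
    """Assemble the three-part opening prompt: mood, setting, lyrics."""
    # Accept "foggy, anxious" or "foggy anxious"; enforce the two-word rule.
    mood_words = mood.replace(",", " ").split()
    if len(mood_words) != 2:
        raise ValueError("mood should be exactly two words, e.g. 'foggy, anxious'")
    return (
        f"Mood: {', '.join(mood_words)}\n"
        f"Setting: {setting.strip()}\n"
        f"Lyrics:\n{lyrics.strip()}"
    )


if __name__ == "__main__":
    print(build_opening_prompt(
        "foggy, anxious",
        "empty parking garage at 3am",
        "(paste the lyrics for this section here)",
    ))
```

The two-word check is deliberate: if you can't name the mood in two words, the scene list usually comes back unfocused too.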

Step 2: hand the scene list to Marcus

This is the move most people miss. Taylor is good at sync and pacing; Marcus is better at shots. Copy Taylor's scene breakdown into a new chat with Marcus and ask him to rewrite each beat as a one-sentence shot description with a camera move. Wide push-in. Handheld over-the-shoulder. Locked-off medium. Real film-school language: Marcus understands it.

You're now holding a shot list. Save it somewhere outside the chat (Notes, a doc, whatever) because you'll re-use it for revisions.

Step 3: lock the look with one reference

Before you generate anything, give Marcus one reference image — a frame from a video you love, a photo, even a still you generated elsewhere. Tell him: "match the color, grain, and lens character of this image across every shot." This is what stops your video from looking like 8 different AI tools fighting each other. Consistency is the whole game.

Step 4: generate the hardest shot first

Counter-intuitive but right. Start with the shot you're least sure about: the close-up on a face, the complicated camera move, the one with text on a sign. If it works, the easy shots will too. If it doesn't, you saved yourself from rendering ten clips around a centerpiece that was never going to land.

Step 5: bring it back to Taylor for sync

Once you have your clips, drop them back into Taylor with the audio. Taylor's job here is timing: which cut hits on the snare, which shot holds through the vocal phrase, where the beat drop wants a hard cut vs. a slow dissolve. This is where the "music video producer" framing earns its keep; most AI video tools have no concept of a downbeat.
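
It's worth sanity-checking the suggested cut points against the song's actual grid. If you know the tempo, the downbeats are simple arithmetic: one bar lasts beats-per-bar × 60 / BPM seconds. The sketch below assumes a constant tempo and 4/4 unless you say otherwise, and the function name is ours. This is a way to check timings yourself, not a description of how Taylor works internally.

```python
def downbeat_times(bpm: float, duration_s: float,
                   beats_per_bar: int = 4, offset_s: float = 0.0) -> list[float]:
    """Return the timestamp (in seconds) of each bar's first beat.

    offset_s is where the first downbeat lands in the audio file,
    e.g. 0.5 if the bounce has half a second of lead-in.
    """
    if bpm <= 0 or beats_per_bar <= 0:
        raise ValueError("bpm and beats_per_bar must be positive")
    bar_length_s = beats_per_bar * 60.0 / bpm
    times = []
    bar = 0
    # Walk bar by bar until we run past the end of the clip.
    while offset_s + bar * bar_length_s < duration_s:
        times.append(round(offset_s + bar * bar_length_s, 3))
        bar += 1
    return times


if __name__ == "__main__":
    # 120 BPM in 4/4 means a bar every 2 seconds.
    print(downbeat_times(120, 10))  # [0.0, 2.0, 4.0, 6.0, 8.0]
```

If a cut lands more than a frame or two off one of these timestamps and it wasn't a deliberate off-beat choice, that's the one to nudge in your editor.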

Step 6: export, then fix it in a real editor

Be honest about what AI is for. CloneViral gets you 80% of the way to a postable video in an afternoon. The last 20% (color match between clips, audio ducking, a title card, captions for the muted scroll) is faster in CapCut or Premiere than fighting another round of prompts. Pull the assets out and finish like a human.

What this is actually good for

  • Lyric videos with real atmosphere instead of stock footage
  • Singles you're not budgeting a video for: the album cuts that still need a Reel
  • Mood pieces for pre-release teasers (15–30 seconds, no narrative pressure)
  • TikTok/Reels content where the bar is "did it stop my thumb"

What it's still bad at

  • Continuous performance shots of you: character consistency across long clips is still rough
  • Anything requiring readable text in-frame
  • Hands. Always hands.

Verdict

The agent-per-job structure isn't a gimmick; it's the reason this workflow holds up. Treating Marcus and Taylor as two different people on your crew (one for shots, one for sync) gets you a finished cut faster than wrestling a single mega-prompt. Try it on a song you already have before you commit to a treatment for the real video. Worst case, you get a Reel out of it.