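Midjourney + Veo3 = ? 🤫 Keep this to yourself | Veo3 secrets "stolen" from an expert on a mysterious platform. Read this and you're halfway to being a director. (Full prompts included) - 课多多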

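I know: your feed is probably stuffed with AI-generated videos lately, to the point where you're numb to them. But has it occurred to you that a lot of those impressive-looking shorts were simply "computed" by Veo3?

That's right, today's post is about Veo3.

But this isn't one of those garbage "one-click viral video" tutorials. What we're going to talk about is how to actually produce the details in your head: a line of dialogue at a specific second, a particular ambient sound, even a fleeting expression.

There are plenty of so-called "ASMR video" and "AI digital human" templates floating around online. But frankly, with a template you're always circling inside someone else's framework, and you never make anything of your own.

What I want to share is a method that lets you freely realize almost any shot you can imagine.

Put another way: how to design prompts that actually work and that generalize from one case to many.

When I write, I don't like handing you a single fish that's gone after one meal. If a technique can only generate one specific scene, there isn't much room to play. I'm fairly confident this article will be of some use to you.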

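Step 1: Generate the "source" image

Before we start, let's use our old friend Midjourney to generate a base image.

Why Midjourney? Simple: its output quality is consistently high. Of course, generating the image in Veo3 itself or in any other AI tool is perfectly fine too; go with whatever you're used to.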

[中]

在昏暗的夜店里偷拍的一张快照。一位18岁的中国女孩，留着可爱的粉色波波头和刘海，正以时尚的Y2K风格跳舞，散发出一种时尚俏皮的气息。灯光柔和，淡淡的霓虹灯在周围反射，营造出一种轻松亲密的派对氛围。背景中只有几个人，构图捕捉到了昏暗夜店环境中自然而青春的瞬间。 --ar 16:9

[EN]

A candid snapshot in a dimly lit nightclub. An 18-year-old Chinese girl with a cute pink bob and bangs is dancing in a stylish Y2K fashion, exuding a chic and playful vibe. The lighting is soft, with faint neon glows reflecting around, creating a relaxed and intimate party atmosphere. Only a few people are in the background. The composition captures a natural and youthful moment in the low-light club environment. --ar 16:9

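Step 2: Generate video from the image

Once we have the image above, we drop it into Veo3 and bring it to life.

I picked the first of the images just generated as the initial frame.

Here is what I was going for:

A hard lighting cut every two seconds, alternating between light and dark (light → dark → light → dark)
The girl swaying quietly and rhythmically, in a way that pulls you in
A realistic club atmosphere (electronic music, ambient noise)
A short clip that lets you experience "urban nightlife" with both eyes and ears

Animation generated by Veo3

Animation generated by Hailuo (海螺) (thanks to group member @油炸小龙虾 for providing the video)

OK, next comes the explanation of the video generation prompt.

I know that when you first see the code below you may think: "Hmm... that looks like a hassle."

If you really feel that way, skip it. Don't force yourself. Start from whichever part interests you, and it may well click as you read on.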

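So how exactly do I specify seconds?

One of Veo3's most revolutionary features is that it can "generate video with sound, including ambient audio and dialogue", and can "pin scenes and actions down to the second".

Being able to specify detail at that level is a godsend for creative work.

But people who make videos regularly may share the doubt I had at first:

"Can a prompt alone really break instructions down second by second...?"

The answer is yes. And it's scarily accurate.

The way to do it is this: JSON prompts.

Enough talk. Here is the prompt we used this time. The method was worked out by an expert on the platform "X"; I digested it and wrote it up, and it's seriously good.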

{
  "shot": {
    "composition": "medium close-up, eye-level, woman's upper body centered with dim crowd behind",
    "camera_motion": "handheld, gentle side-to-side sway mimicking the beat",
    "frame_rate": "24fps cinematic",
    "lens": "35 mm low-light lens",
    "film_grain": "slight grain for texture"
  },
  "subject": {
    "description": "young woman with pastel pink bob hair, smoky eye makeup, soft glow on cheeks; dressed in dark clubwear",
    "pose": "swaying slowly left and right with closed eyes or subtle gaze",
    "emotion": "entranced, in-the-moment"
  },
  "scene": {
    "location": "intimate, crowded nightclub with neon tubing and exposed ceiling",
    "time_of_day": "after midnight",
    "environment_details": "people in the shadows, pulsing lights, faint fog in the air"
  },
  "visual_details": {
    "timeline": [
      {
        "t": "0-2s",
        "action": "RED LIGHT ON — spotlight from the right side illuminates her hair and face as she sways"
      },
      {
        "t": "2-4s",
        "action": "DARKNESS — lights cut to black; only vague outlines visible; she continues moving to the rhythm"
      },
      {
        "t": "4-6s",
        "action": "BLUE LIGHT ON — overhead soft strobe pulses every 0.5s; her silhouette glows in cool tones"
      },
      {
        "t": "6-8s",
        "action": "DARKNESS — full blackout again, just subtle rim-light from distant fixtures"
      }
    ]
  },
  "cinematography": {
    "lighting": "hard switch between light and darkness every 2 seconds (red → black → blue → black)",
    "style": "moody, underground club realism",
    "tone": "entrancing, hypnotic"
  },
  "audio": {
    "ambient": "continuous murmur of people talking and laughing in a crowded club; voices are layered and spatially spread",
    "music": "steady electronic synth tone with subtle pulsing, non-melodic, lasting full 8 seconds",
    "dialogue": "indistinct crowd chatter throughout, no clear individual voices"
  },
  "color_palette": "deep blacks, neon red and blue highlights, subtle metal tones",
  "visual_rules": {
    "prohibited_elements": [
      "daylight",
      "subtitles",
      "logos",
      "pop-up UI",
      "sci-fi effects"
    ]
  }
}
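The two-second rhythm in the timeline above is regular enough to generate rather than type by hand. Here is a minimal Python sketch, assuming only the `t` and `action` field names used in the prompt; `build_timeline` is a hypothetical helper name, and nothing here talks to Veo3 itself. It only produces the JSON text you would paste in.

```python
import json

def build_timeline(actions, step=2):
    """Return timeline entries covering consecutive `step`-second windows."""
    timeline = []
    for index, action in enumerate(actions):
        start, end = index * step, (index + 1) * step
        timeline.append({"t": f"{start}-{end}s", "action": action})
    return timeline

# Shortened action texts; the full descriptions from the prompt work the same way.
actions = ["RED LIGHT ON", "DARKNESS", "BLUE LIGHT ON", "DARKNESS"]
timeline = build_timeline(actions)

# Wrap the list the way the prompt nests it, then print it as JSON text.
print(json.dumps({"visual_details": {"timeline": timeline}}, indent=2))
```

Changing `step` or the list of actions gives a different rhythm without retyping every window.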

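I get it, this thing really does look like a headache (wry smile).

(※ If you're an engineer, you probably understood it at a glance.)

But don't be scared. This so-called "JSON format" is really just a very clearly structured data recipe prepared for the AI.

What is JSON?

Simply put, it's a format for writing data down in an orderly, clearly structured way.

It has two key traits:

Humans can read it.
Machines can parse it even more easily.

It's a bit like how English has more or less become the world's common language.

In other words, it's a way of expressing things that is friendly to both humans and AI.

So what does that actually mean?

In one sentence:

"Natural language can lead to misunderstandings with an AI, but JSON lets it grasp your intent more precisely."

For example, say you write "an 18-year-old Chinese girl with a bob".

We humans get it immediately. But the AI has to parse every element, and that's where ambiguity can creep in.

Written as JSON instead: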

{
  "age": 18,
  "gender": "female",
  "nationality": "Chinese",
  "hairstyle": "bob"
}

Now the machine can pick out exactly what the age, gender, nationality, and hairstyle are.
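To see what "unambiguous to a machine" means in practice, here is a small Python sketch using the standard `json` module. It only shows that a parser recovers each field with its type intact; how a video model reads the same text internally is not something this demonstrates.

```python
import json

# The same four attributes as a Python dict.
girl = {"age": 18, "gender": "female", "nationality": "Chinese", "hairstyle": "bob"}

# Serialize to JSON text, then parse it back as a program would.
text = json.dumps(girl, indent=2)
parsed = json.loads(text)

# The age comes back as a number, not as a word buried in a sentence.
print(type(parsed["age"]).__name__, parsed["hairstyle"])  # prints: int bob
```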

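The precision JSON makes possible

Take another sentence: "A red sports car is parked in downtown Shanghai."

Sounds like perfectly normal natural language, right? But look closer:

Exactly which red?
What make of sports car?
Where exactly in the city center?

All sorts of questions come up.

With JSON, you can pin "red" down to a specific color code, such as #800000 (maroon).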

// Chinese version
{
  "subject": {
    "object": "法拉利 F8 Tributo",
    "color": "栗红色",
    "hex": "#800000",
    "status": "停放中"
  },
  "setting": {
    "location": "上海市中心的十字路口",
    "time_of_day": "正午",
    "lighting": "自然日光",
    "crowd": "熙熙攘攘"
  },
  "composition": {
    "angle": "低角度，前3/4视角",
    "focus": "前景是车，背景是人群和城市景观"
  }
}
// English
{
  "subject": {
    "object": "Ferrari F8 Tributo",
    "color": "maroon red",
    "hex": "#800000",
    "status": "parked"
  },
  "setting": {
    "location": "an intersection in downtown Shanghai",
    "time_of_day": "noon",
    "lighting": "natural daylight",
    "crowd": "bustling"
  },
  "composition": {
    "angle": "low-angle, front 3/4 view",
    "focus": "car in foreground, crowd and cityscape in background"
  }
}

That makes it much easier for the AI to understand.
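A hex code is precise because it is just three numbers in disguise. As a quick illustration, here is a Python sketch that decodes one; `hex_to_rgb` is a hypothetical helper written for this article, not part of any Veo3 or Midjourney tooling.

```python
def hex_to_rgb(code):
    """Convert a '#RRGGBB' color code into a (red, green, blue) tuple."""
    digits = code.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected six hex digits, got {code!r}")
    # Each pair of hex digits is one channel in the range 0-255.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

print(hex_to_rgb("#800000"))  # prints: (128, 0, 0)
```

So "#800000" means red at half intensity with no green or blue, which is a far narrower target than the word "red".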

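Of course, today's AI is already smart enough to read most natural-language context, so in ordinary cases a description like "red sports car" or "pink hair" is perfectly adequate. I do that all the time myself.

The key is learning to use both wisely, and to bring out JSON as the heavy weapon when you need precise control.

Breaking down what the Veo 3 JSON actually says

(For easier reading, here is a Chinese reference version of the same prompt.)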

{
  "shot": {
    "composition": "中近景，与视线齐平，女性上半身居中，背后是昏暗的人群",
    "camera_motion": "手持拍摄，模仿节拍进行轻柔的左右摇晃",
    "frame_rate": "24fps 电影感",
    "lens": "35mm 低光照镜头",
    "film_grain": "轻微的胶片颗粒以增加质感"
  },
  "subject": {
    "description": "留着淡粉色波波头的年轻女性，烟熏眼妆，脸颊有柔和光泽，身穿深色夜店服装",
    "pose": "闭着眼或眼神迷离地向左右缓慢摇摆",
    "emotion": "沉醉其中，活在当下"
  },
  "scene": {
    "location": "有霓虹灯管和裸露天花板的、拥挤但氛围亲密的夜店",
    "time_of_day": "午夜之后",
    "environment_details": "阴影中的人群，脉动的灯光，空气中弥漫着淡淡的薄雾"
  },
  "visual_details": {
    "timeline": [
      {
        "t": "0-2秒",
        "action": "红灯亮 — 右侧的聚光灯照亮她的头发和脸，她随之摇摆"
      },
      {
        "t": "2-4秒",
        "action": "暗场 — 灯光切黑，只能看到模糊的轮廓，她继续随节奏移动"
      },
      {
        "t": "4-6秒",
        "action": "蓝灯亮 — 头顶柔和的频闪灯以0.5秒的间隔脉动，她的剪影在冷色调中发光"
      },
      {
        "t": "6-8秒",
        "action": "再次暗场 — 完全变黑，只有远处灯具的微弱轮廓光"
      }
    ]
  },
  "cinematography": {
    "lighting": "每2秒在亮与暗之间进行硬切换（红 → 黑 → 蓝 → 黑）",
    "style": "情绪化、地下俱乐部现实主义风格",
    "tone": "迷人、催眠般的"
  },
  "audio": {
    "ambient": "拥挤俱乐部中人们持续的交谈和笑声，声音层次分明，具有空间感",
    "music": "稳定的电子合成器音调，带有微妙的脉动，无旋律，持续整整8秒",
    "dialogue": "贯穿始终的模糊人群嘈杂声，没有清晰的个人声音"
  },
  "color_palette": "深邃的黑色，霓虹红和蓝色的高光，以及微妙的金属色调",
  "visual_rules": {
    "prohibited_elements": [
      "日光",
      "字幕",
      "标志",
      "弹出式UI",
      "科幻效果"
    ]
  }
}

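Breakdown and explanation

The prompt above consists of 8 main parts.

Part 1: shot
- composition: how the shot is framed. Here it's a medium close-up at eye level, framing the girl's upper body.
- camera_motion: a gentle, natural side-to-side sway, as if the camera were handheld.
- frame_rate: 24 frames per second, for a cinematic feel that isn't overly smooth.
- lens: simulates a 35 mm lens that can still capture the subject clearly in low light.
- film_grain: adds a little rough texture to make the image feel more real.
Part 2: subject
- description: pastel pink bob, smoky eye makeup, dark clothes.
- pose: swaying slowly from side to side with the music.
- emotion: immersed in the music, entranced and in the moment.
Part 3: scene
- location: a late-night club with neon tubing and an exposed ceiling.
- time_of_day: after midnight.
- environment_details: people around in the shadows, a faint haze in the air, pulsing lights.
Part 4: visual_details (effects along the timeline)
- 0-2s: red light illuminates the girl as she sways slowly.
- 2-4s: sudden blackout; only outlines are visible, but she keeps moving.
- 4-6s: blue light from overhead outlines her silhouette.
- 6-8s: black again, with only faint rim light from distant fixtures.
- (This alternation of light and dark conveys the club's "flashing lights" atmosphere.)
Part 5: cinematography (visual style)
- lighting: a hard switch between light and dark every 2 seconds (red → black → blue → black).
- style: a realistic nightclub atmosphere; realism, not fantasy.
- tone: immersive, as if the music were pulling you in.
Part 6: audio
- ambient: the constant background noise of people around talking and laughing.
- music: an electronic tone lasting the full 8 seconds, with no melody.
- dialogue: indistinct crowd chatter; no individual voice can be made out.
Part 7: color_palette
- A black base overall, accented with red and blue neon, plus a subtle metallic tone.
Part 8: visual_rules (things that must not appear)
- Prohibited: daylight, subtitles, logos, pop-up UI, and sci-fi effects.

That's all of it.

Using this format as a base, you can apply it to any scene you want to make (a train station, a natural landscape, a city street, a classroom, and so on).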
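When you adapt the format to a new scene, it is easy to drop a section by accident. Here is a minimal Python sketch that reports which of the eight sections are missing from a draft. The section names come from this article's prompt, not from any official Veo3 schema, and `missing_sections` is a hypothetical helper name.

```python
import json

# The eight top-level sections used in this article's prompt.
EXPECTED_SECTIONS = [
    "shot", "subject", "scene", "visual_details",
    "cinematography", "audio", "color_palette", "visual_rules",
]

def missing_sections(prompt_text):
    """Parse JSON text and list the expected sections it does not contain."""
    prompt = json.loads(prompt_text)  # raises ValueError if the text is not valid JSON
    return [name for name in EXPECTED_SECTIONS if name not in prompt]

draft = '{"shot": {}, "subject": {}, "scene": {}, "audio": {}}'
print(missing_sections(draft))
# prints: ['visual_details', 'cinematography', 'color_palette', 'visual_rules']
```

A missing section is not necessarily a mistake; the check only makes the omission a deliberate choice rather than an oversight.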

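Auto-generation template

I know, writing this out by hand every time would drive anyone crazy.

So I made a template for you (and for myself). Copy it, fill in your ideas, and hand it to an AI so that it generates the complete JSON prompt for you.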

// Template
{
  "shot": {
    "composition": "",
    "camera_motion": "",
    "frame_rate": "",
    "lens": "",
    "film_grain": ""
  },
  "subject": {
    "description": "",
    "pose": "",
    "emotion": ""
  },
  "scene": {
    "location": "",
    "time_of_day": "",
    "environment_details": ""
  },
  "visual_details": {
    "timeline": [
      {
        "t": "0-2s",
        "action": ""
      },
      {
        "t": "2-4s",
        "action": ""
      },
      {
        "t": "4-6s",
        "action": ""
      },
      {
        "t": "6-8s",
        "action": ""
      }
    ]
  },
  "cinematography": {
    "lighting": "",
    "style": "",
    "tone": ""
  },
  "audio": {
    "ambient": "",
    "music": "",
    "dialogue": ""
  },
  "color_palette": "",
  "visual_rules": {
    "prohibited_elements": []
  }
}

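How do you hand it to the AI?

Send the JSON template above to the AI together with a description of what you want to create.

What you want to create (examples):

Shot (shot):
- Composition: "close-up", "wide angle", "low angle"
- Camera motion: "fixed", "first-person POV", "handheld shake"
- Frame rate: "24fps cinematic", "60fps smooth"
- Lens: "35mm", "wide-angle phone lens"
- Grain: "digital clean", "film look"
Subject (subject):
- Description: "woman in a red dress, bob haircut"
- Pose: "walking", "smiling at the camera"
- Emotion: "looks very happy", "focused"
Scene (scene):
- Location: "late-night club", "seaside path"
- Time: "daytime", "dusk", "dawn"
- Environment details: "trees swaying in the wind", "neon lights in the background"
Visual details (visual_details): (8 seconds at most, or improvise freely)
- Timeline: "0-2s: woman smiles at the camera", "2-4s: camera pans left"
Cinematography (cinematography):
- Lighting: "backlit by the sunset", "flickering neon"
- Style: "cinematic", "realistic documentary", "anime style"
- Tone: "dreamy", "tense", "nostalgic"
Audio (audio):
- Ambient: "wind", "insects chirping", "car noise"
- Music: "Lo-fi", "hip-hop beat"
- Dialogue: "someone is shouting", "a faint voice can be heard"
Color palette (color_palette):
- "monochrome", "vivid red and blue", "soft pastels"
Visual rules (visual_rules):
- Prohibited elements: "subtitles", "logos", "daylight", "sci-fi effects"

That's all there is to it.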
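If you do this often, the "template plus notes" message can be assembled by a few lines of Python. The sketch below only builds the text to paste into whatever chat AI you use; it calls no API. The instruction wording and the `build_request` name are my own assumptions, so adjust them to taste.

```python
import json

# The empty template from this article, written as a Python dict.
TEMPLATE = {
    "shot": {"composition": "", "camera_motion": "", "frame_rate": "", "lens": "", "film_grain": ""},
    "subject": {"description": "", "pose": "", "emotion": ""},
    "scene": {"location": "", "time_of_day": "", "environment_details": ""},
    "visual_details": {
        "timeline": [{"t": t, "action": ""} for t in ("0-2s", "2-4s", "4-6s", "6-8s")]
    },
    "cinematography": {"lighting": "", "style": "", "tone": ""},
    "audio": {"ambient": "", "music": "", "dialogue": ""},
    "color_palette": "",
    "visual_rules": {"prohibited_elements": []},
}

def build_request(idea_notes):
    """Combine the empty template and free-form notes into one message."""
    notes = "\n".join(f"- {key}: {value}" for key, value in idea_notes.items())
    return (
        "Fill in this JSON template for a video prompt. Return only valid JSON.\n\n"
        f"{json.dumps(TEMPLATE, indent=2)}\n\n"
        f"What I want to create:\n{notes}"
    )

message = build_request({
    "location": "late-night club",
    "camera motion": "handheld shake",
    "music": "Lo-fi",
})
print(message)
```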

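Important!

If you're making the video from an image you've already generated, then parts 1 (shot) and 2 (subject) can be deleted unless you have specific preferences, because the image already carries that information.
To simplify further, part 3 (scene) can go too.
AI tends to prefer simple, easy-to-understand prompts; giving too much information can sometimes backfire. Use it flexibly and keep only key points such as camera motion.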
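The trimming advice above can be sketched in Python as follows. This assumes the prompt is held as a dict with the section names used in this article; `trim_for_image_start` and its parameters are hypothetical names. It drops the `shot` and `subject` sections but keeps `camera_motion`, since that is the kind of key point worth retaining.

```python
import copy
import json

def trim_for_image_start(prompt, drop=("shot", "subject"), keep_fields=("camera_motion",)):
    """Return a copy of the prompt without the dropped sections, except for kept fields."""
    trimmed = copy.deepcopy(prompt)  # leave the caller's prompt untouched
    for section in drop:
        fields = trimmed.pop(section, None)
        if not isinstance(fields, dict):
            continue
        kept = {name: value for name, value in fields.items() if name in keep_fields}
        if kept:
            trimmed[section] = kept
    return trimmed

full = {
    "shot": {"composition": "medium close-up", "camera_motion": "handheld"},
    "subject": {"description": "young woman with pastel pink bob hair"},
    "scene": {"location": "nightclub"},
}
lean = trim_for_image_start(full)
print(json.dumps(lean, indent=2))
```

Passing `drop=("shot", "subject", "scene")` gives the even leaner variant mentioned above.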

Let's look at a few more examples.

Creating another scene!

[中]

一位时尚的18岁中国女子，留着天蓝色的短发，戴着墨镜坐在BMX自行车上，自信地在岩石山顶上自拍。她穿着融合了运动和潮流元素的时尚夏季街头服饰。这张照片是从高角度拍摄的，背景中展现出广阔而令人惊叹的自然景观。在明亮的夏日阳光下，崎岖的岩石地形和一望无际的风景交相辉映。她的墨镜和鲜艳的发色在背景中格外醒目，增添了一种清凉自信和冒险精神。 --ar 16:9

[EN]

A stylish 18-year-old Chinese woman with sky-blue short hair and sunglasses sits on a BMX bike, confidently taking a selfie on a rocky mountaintop. She is dressed in fashionable summer streetwear that blends sporty and trendy elements. The shot is taken from a high angle, showcasing a vast and stunning natural landscape in the background. The rugged, rocky terrain and expansive scenery are highlighted under the bright summer sun. Her sunglasses and vibrant hair color stand out against the backdrop, adding a cool, confident, and adventurous spirit. --ar 16:9

Prompt

{
  "shot": {
    "composition": "starts with selfie-style handheld shot, then transitions to a first-person female POV using either a helmet-mounted or chest-mounted camera",
    "camera_motion": "handheld panning in the first half, then smooth forward motion with terrain-based vibration in POV",
    "frame_rate": "60fps for smooth motion capture",
    "lens": "wide-angle action cam lens (GoPro-style)",
    "film_grain": "digital clean"
  },
  "subject": {
    "description": "young woman with bright blue bob haircut and sunglasses, wearing a white long-sleeve top and casual cycling outfit, riding a mountain bike",
    "pose": "smiling and panning with the camera in the first half, then gripping handlebars tightly as she descends the trail",
    "emotion": "lighthearted and curious at the start, focused and energized during the ride"
  },
  "scene": {
    "location": "rocky mountain summit transitioning to a steep dirt trail with panoramic views",
    "time_of_day": "midday with strong sunlight and few clouds",
    "environment_details": "expansive mountainous terrain, dry rocks and shrubs, distant towns visible below"
  },
  "visual_details": {
    "timeline": [
      {
        "t": "0-3s",
        "action": "selfie-style camera footage: the woman holds the camera and slowly pans to show the surrounding landscape while smiling into the lens"
      },
      {
        "t": "3-4s",
        "action": "cut to first-person female POV from either helmet-mounted or chest-mounted camera, showing the handlebars and the trail ahead"
      },
      {
        "t": "4-8s",
        "action": "POV view as she descends rapidly on a rugged mountain trail; hands grip the handlebars firmly while the trail scrolls past at speed"
      }
    ]
  },
  "cinematography": {
    "lighting": "strong midday sunlight with occasional lens flare and natural contrast",
    "style": "cinematic GoPro-style realism",
    "tone": "free-spirited to adrenaline-filled and immersive"
  },
  "audio": {
    "ambient": "light mountain breeze in the first half, intensifying with speed during descent",
    "music": "",
    "dialogue": "",
    "foley": [
      {
        "timestamp": "0-8s",
        "sound": "gravel crunching under tires, light frame rattle, wind building as the ride accelerates"
      }
    ]
  },
  "color_palette": "sky blue, tan and rocky earth tones, contrasted with bright clothing and vivid sunlight",
  "visual_rules": {
    "prohibited_elements": [
      "third-person camera angles",
      "subtitles or text overlays",
      "logos or branding",
      "urban elements",
      "unrealistic effects"
    ]
  }
}

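(I won't provide Chinese translations of the JSON text for the examples that follow; the structure and logic are the same.)

[ASMR] Trying to eat a wobbly plant made of water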

[中]

一位18岁的中国女孩，留着铂白色波波头和刘海，坐在一间略显昏暗、如梦似幻的房间里的桌子旁。她穿着简单的夏装，面容柔和平静。桌子上放着一个麦克风，位置略微偏离中心，中间立着一株神秘的室内植物，叶子完全由缓缓漂浮的水组成。灯光昏暗而忧郁，柔和的高光打在她的脸上和那株反光的植物上。整体氛围安静、魔幻，又略带超现实，就像一个温馨的直播空间，却又带着一丝奇幻的意味。 --ar 16:9

[EN]

An 18-year-old Chinese girl with a platinum white bob and bangs sits at a table in a slightly dim, dreamlike room. She wears simple summer clothes, her expression soft and calm. On the table is a microphone, positioned slightly off-center, and in the middle stands a mysterious indoor plant with leaves made entirely of slowly floating water. The lighting is dim and moody, with soft highlights on her face and the reflective plant. The overall atmosphere is quiet, magical, and slightly surreal, like a cozy live-streaming space with a hint of fantasy. --ar 16:9

Prompt

{
  "scene": {
    "setting": "dimly lit room with a soft, intimate atmosphere",
    "subject": "Chinese woman with platinum white bob hair",
    "object": "fantastical plant with transparent water-like leaves on a table",
    "camera": "stationary frontal view",
    "lighting": "subtle, focused on the plant and her face",
    "audio_style": "high-fidelity ASMR with delicate environmental mic capture"
  },
  "visual_details": {
    "timeline": [
      {
        "t": "1-2s",
        "action": "She gently picks a single transparent leaf from the plant. The soft sound of water movement is audible along with her calm breath."
      },
      {
        "t": "2-5s",
        "action": "Water slowly trickles from the leaf into her hand, creating soft, plump, dripping sounds that ripple through the ASMR mic."
      },
      {
        "t": "5-8s",
        "action": "She eats the leaf slowly, and water begins to overflow slightly from her mouth. Gentle chewing and delicate swallowing sounds are captured in high detail."
      }
    ]
  },
  "audio": {
    "foley": [
      {
        "timestamp": "1-2s",
        "sound": "soft breathing, slight water shift from touching the leaf"
      },
      {
        "timestamp": "2-5s",
        "sound": "realistic water dripping, wet tactile ASMR textures"
      },
      {
        "timestamp": "5-8s",
        "sound": "chewing and subtle mouth sounds, water overflow and gentle gulping"
      }
    ]
  }
}
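Once timelines are written by hand, it is worth checking that the windows line up. Here is a minimal Python sketch that parses the `t` values, flags gaps or overlaps, and enforces the 8-second ceiling mentioned earlier. The "N-Ms" format is taken from the prompts in this article; `parse_window` and `timeline_issues` are hypothetical helper names.

```python
import re

def parse_window(t):
    """Turn a window such as '2-4s' into (2, 4)."""
    match = re.fullmatch(r"(\d+)-(\d+)s", t)
    if match is None:
        raise ValueError(f"unrecognized time window: {t!r}")
    return int(match.group(1)), int(match.group(2))

def timeline_issues(timeline, max_seconds=8):
    """List problems: windows that do not start where the last one ended, or run too long."""
    issues = []
    expected_start = 0
    for entry in timeline:
        start, end = parse_window(entry["t"])
        if start != expected_start:
            issues.append(f"gap or overlap before {entry['t']}")
        if end <= start:
            issues.append(f"empty window {entry['t']}")
        expected_start = end
    if expected_start > max_seconds:
        issues.append(f"timeline runs past {max_seconds}s")
    return issues

club = [{"t": "0-2s"}, {"t": "2-4s"}, {"t": "4-6s"}, {"t": "6-8s"}]
plant = [{"t": "1-2s"}, {"t": "2-5s"}, {"t": "5-8s"}]
print(timeline_issues(club))   # prints: []
print(timeline_issues(plant))  # prints: ['gap or overlap before 1-2s']
```

The plant prompt above starts at 1s rather than 0s. Whether that is deliberate is not stated, so treat the report as a prompt to double-check rather than as an error.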

[ASMR] Trying a wobbly cake

[中]

一名 18 岁的中国女孩，留着柔软的粉红色短发和刘海，正平静地坐在灯光昏暗、如梦似幻的房间里的桌子旁。她穿着简单的夏装，面容平静地面对镜头。麦克风放在桌子的一侧，桌子中央放着一块完全由水制成的半透明蛋糕，在柔和的灯光下微微闪闪发光。气氛安静而略带超现实，融合了现代网红的审美和一丝魔幻现实主义。 --ar 16:9

[EN]

An 18-year-old Chinese girl with soft pink short hair and bangs sits calmly at a table in a dimly lit, dreamlike room. She wears simple summer clothes, facing the camera with a serene expression. A microphone is placed to the side of the table, in the center of which sits a translucent cake made entirely of water, shimmering slightly under the soft light. The atmosphere is quiet and slightly surreal, blending a modern influencer aesthetic with a touch of magical realism. --ar 16:9

Prompt

{
  "scene": {
    "setting": "Kitchen counter with dreamy pink and blue lighting.",
    "subject": "Chinese woman with pink bob hair, sitting in front of an ASMR microphone.",
    "object": "A transparent jelly-like water cake with blueberries inside.",
    "camera": "Fixed front-facing camera, bust shot composition.",
    "audio_style": "Ultra high-quality ASMR recording."
  },
  "visual_details": {
    "timeline": [
      {
        "t": "0-3s",
        "action": "She gently lifts the water cake with both hands. The blueberries float and sway inside the jelly."
      },
      {
        "t": "3-8s",
        "action": "She brings the cake to her mouth and takes a bite. Water bursts out, overflowing from her mouth and dripping down her chin."
      }
    ]
  },
  "audio": {
    "foley": [
      {
        "timestamp": "0-3s",
        "sound": "The sound of hands touching the jelly, gentle wobbling water sounds, and subtle noises from moving blueberries."
      },
      {
        "timestamp": "3-8s",
        "sound": "Realistic chewing sounds of water, muffled water movements, and dripping water falling from her chin in ASMR detail."
      }
    ]
  }
}
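In both ASMR prompts, every `foley` timestamp matches a timeline window, which keeps each sound attached to the action it belongs to. Here is a small Python sketch of that consistency check. `unmatched_foley` is a hypothetical helper, the `cake` dict abbreviates the prompt text, and whether Veo3 actually needs the two lists to line up is my assumption, not something the source confirms.

```python
def unmatched_foley(prompt):
    """Return foley timestamps that have no timeline window with the same range."""
    windows = {entry["t"] for entry in prompt["visual_details"]["timeline"]}
    return [
        item["timestamp"]
        for item in prompt["audio"]["foley"]
        if item["timestamp"] not in windows
    ]

# Abbreviated version of the water cake prompt.
cake = {
    "visual_details": {
        "timeline": [
            {"t": "0-3s", "action": "lifts the water cake"},
            {"t": "3-8s", "action": "takes a bite"},
        ]
    },
    "audio": {
        "foley": [
            {"timestamp": "0-3s", "sound": "wobbling water"},
            {"timestamp": "3-8s", "sound": "chewing, dripping"},
        ]
    },
}
print(unmatched_foley(cake))  # prints: []
```

A non-empty result is not automatically wrong: the BMX prompt uses a single "0-8s" foley entry for a sound that spans the whole clip.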

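That's it for now.

Honestly, the quality of Veo3's "sound" is a little scary... it would be a waste not to use it for something weird.

Veo3 is a lot of fun to play with, and I'll keep experimenting.

Bonus: a Midjourney prompt for generating a menu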

[中]

一份编辑风格、极简主义的菜单布局，采用极简的斯堪的纳维亚风格构图。场景分为两部分：左侧，优雅地摆放着一盘美味的卡邦尼意面，盛在干净的白色陶瓷碗中，特色是奶油蛋酱、融化的奶酪和现磨黑胡椒。右侧，温暖的南瓜汤盛在哑光陶瓷碗中，搭配一杯加冰块和薄荷叶的意大利苏打水。灯光明亮柔和，阴影柔和，自然光充足。食物造型现代简约，没有杂乱或过多的装饰。拍摄时采用浅景深和柔和的背景，符合杂志编辑的审美。每道菜旁边都用精致的衬线字体居中覆盖着文字：“夏季套餐”。 --ar 2:3

[EN]

An editorial-style, minimalist menu layout in a minimalist Scandinavian composition. The scene is split into two parts: on the left, a delicious plate of Carbonara pasta is elegantly presented in a clean white ceramic bowl, featuring creamy egg sauce, melted cheese, and freshly ground black pepper. On the right, a warm pumpkin soup in a matte ceramic bowl, paired with a glass of Italian soda with ice and mint leaves. The lighting is bright and soft with gentle shadows and ample natural light. The food styling is modern and simple, with no clutter or excessive decoration. Shot with a shallow depth of field and a soft background, fitting a magazine editorial aesthetic. Overlayed next to each dish is centered serif text with a delicate font: "Summer Set Menu". --ar 2:3

Copyright belongs to the author. Please do not repost without permission.
