A Method for Perfectly Reverse-Engineering Text-to-Image Prompts - 🎉数字奇遇🎉

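Create two sub-agents in Codex and you can reverse-engineer any image you want to replicate from the web into a reusable, testable, iterable text-to-image prompt.

I recently ran a small experiment: give an AI a cover image and have it work backward to the prompt behind it.

The hard part is not guessing style words like "black-and-yellow cyberpunk manga poster"; it is locking down the structure of the image as well: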

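A giant `276` on the left
The `USE CASES / OF / HERMES AGENT` hierarchy
A manga woman wearing headphones on the right
A yellow circular halo
HUD panels on the right
Six icon labels along the bottom
A worn-print texture in black, yellow, and white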

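I did not ask a single model to guess everything in one pass.

I created two sub-agents.

A: the prompt reverse-engineering agent.

Its job is to look at the image and guess the prompt behind it.

B: the supervisor agent.

Its job is not to praise A. It generates an image from A's prompt, compares it with the original, and breaks the differences down into actionable feedback.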

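How to use it in Codex

Upload an image you want to replicate.

Create two sub-agents: a prompt reverse-engineering agent A and a supervisor agent B. A guesses the prompt behind the image. B generates an image from the guessed prompt, compares it with the original, and keeps feeding the differences back to A so that A can iterate on the prompt, until B judges the difference to be small.
Stop condition: B judges the gap to be small, or the 5th iteration is reached.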
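
The loop described above can be sketched in Python roughly as follows. This is a minimal sketch, not the actual Codex setup: `propose`, `generate`, and `critique` are hypothetical callables standing in for agent A, the image tool, and agent B, and the `Verdict` and `Round` types are names invented for illustration. The only parts taken from the post are the roles and the stop condition (B says stop, or round 5 is reached).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    score: int            # B's similarity score for this round
    feedback: List[str]   # differences B wants A to fix next
    stop: bool            # True when B judges the gap to be small


@dataclass
class Round:
    index: int
    prompt: str
    verdict: Verdict


def reverse_prompt(
    reference_image: str,
    propose: Callable[[str, Optional[str], List[str]], str],  # agent A (hypothetical)
    generate: Callable[[str], str],                           # image tool (hypothetical)
    critique: Callable[[str, str], Verdict],                  # agent B (hypothetical)
    max_rounds: int = 5,                                      # hard cap from the post
) -> List[Round]:
    """Run the A/B loop until B says stop or max_rounds is reached."""
    history: List[Round] = []
    prompt: Optional[str] = None
    feedback: List[str] = []
    for index in range(1, max_rounds + 1):
        # A guesses (or revises) the prompt using B's last feedback.
        prompt = propose(reference_image, prompt, feedback)
        # B renders the guess and compares it with the original.
        candidate = generate(prompt)
        verdict = critique(reference_image, candidate)
        history.append(Round(index, prompt, verdict))
        if verdict.stop:
            break
        feedback = verdict.feedback
    return history
```

Leaving the stop decision to B, instead of hard-coding a score threshold, mirrors the stop condition as stated; a fixed threshold would be an extra assumption.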

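Step 1: 72 points, pin down the elements first

What A did in the first round was straightforward:

Write every key element of the original image into the prompt.

Not "huge typography", but: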

giant yellow “276”
massive bone-white “USE CASES”
small centered “OF”
huge yellow “HERMES AGENT”

Not "HUD panels", but:

“$> hermes run”
“PLAN”
“RESEARCH”
“EXECUTE”
“DELIVER”
“DONE.”

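B scored the image generated in this round at 72.

B's verdict: the style direction was right and the main elements were there, but it still did not look enough like the original.

Because the original is not a pile of elements.

It is a very deliberate poster layout.

So do not keep adding style words; upgrade the prompt from a "style description" to a "locked-layout description".

This was the first turning point of the experiment.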
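
One way to make the Step 1 idea checkable is to list the exact strings the image must contain and verify that the prompt quotes each of them. The following is a minimal sketch under that assumption; `REQUIRED_TEXT` and `missing_literals` are illustrative names, and the check covers quoted text only, not layout.

```python
# Exact strings taken from the first-round element lists above.
REQUIRED_TEXT = [
    "276", "USE CASES", "OF", "HERMES AGENT",
    "$> hermes run", "PLAN", "RESEARCH", "EXECUTE", "DELIVER", "DONE.",
]


def missing_literals(prompt, required=REQUIRED_TEXT):
    """Return the required strings that the prompt does not quote verbatim.

    A string counts as quoted if it appears inside straight or curly
    double quotes.
    """
    quote_styles = ('"{}"', "“{}”")
    return [
        text
        for text in required
        if not any(style.format(text) in prompt for style in quote_styles)
    ]
```

A vague prompt such as "cyberpunk poster with huge typography and HUD panels" fails every check, while a prompt that quotes the four title strings is left with only the six HUD strings missing.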

Step 2: 92 points, start locking the layout

In round two, A added one important phrase:

LOCKED COMPOSITION

What actually made it useful were the constraints that followed:

left half of the poster is dominated by exact readable typography, occupying about 50-55% of the canvas

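In other words, it is no longer just "big text on the left".

The left 50-55% of the canvas is all text.

`276` sits at the top.

`USE CASES` sits in the middle.

`HERMES AGENT` sits at the bottom.

Below that are six icons: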

DEV, AUTOMATION, RESEARCH, BUSINESS, CREATIVE, INFRA

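B raised the score straight to 92, which was already good enough to use.

In an image-generation prompt, the most valuable words are not adjectives.

Words like `cyberpunk`, `screenprint`, and `high contrast` only get you into a style range.

Phrases like `left 55%`, `top-left`, `lower-right`, and `terminal panel sits between typography and character face` are what pin the image to the canvas.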
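
To keep positional constraints separate from style words, the layout can be held as data and rendered into a clause afterward. The sketch below is one possible way to do that, not how A actually wrote its prompt; `Region` and `locked_composition` are hypothetical names, and the opening sentence is copied from the round-two constraint quoted earlier.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Region:
    anchor: str    # where on the canvas, e.g. "top-left"
    content: str   # what must appear there


def locked_composition(regions: Iterable[Region], text_share: str = "50-55%") -> str:
    """Render region specs as a single LOCKED COMPOSITION clause."""
    parts = [
        "LOCKED COMPOSITION: left half is dominated by exact readable "
        f"typography, occupying about {text_share} of the canvas."
    ]
    for region in regions:
        # "top-left" -> "Top-left: <content>."
        parts.append(f"{region.anchor.capitalize()}: {region.content}.")
    return " ".join(parts)


clause = locked_composition([
    Region("top-left", 'enormous safety-yellow "276"'),
    Region("middle-left", 'huge bone-white "USE CASES"'),
    Region("lower-left", 'huge safety-yellow "HERMES AGENT"'),
])
```

Keeping anchors as explicit fields makes it harder for a later edit to drop a position while adding adjectives.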

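Step 3: 95 points, final fixes to the visual weight

Even at 92, B still picked out a few issues, such as:

The `276` on the left could feel more imposing.

So in round three A added no new elements and only refined existing ones. The character description became more specific: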

Long straight glossy black hair falling past the shoulders, heavy black hair mass, blunt bangs

head and upper body occupy most of the right half, cropped at the bottom-right

The `276` was changed to:

the largest element in the image, nearly touching the top and left margins

B gave it 95 and triggered STOP.

Adding more words beyond this point would only bloat the prompt.

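How the score climbed

The progression reads more like three convergences, each at a different level:

Initial prompt: style only, not scored
Round 1: 72 points, key elements filled in
Round 2: 92 points, layout structure locked
Round 3: 95 points, visual weight refined, STOP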
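
A quick calculation on the three reported scores shows where the gains came from. This is plain arithmetic on the numbers above; the `gains` helper is only an illustrative name.

```python
# Scores reported by B for each round (the initial prompt was not scored).
scores = [("round 1", 72), ("round 2", 92), ("round 3", 95)]


def gains(scored_rounds):
    """Return the score difference between each pair of consecutive rounds."""
    values = [score for _, score in scored_rounds]
    return [later - earlier for earlier, later in zip(values, values[1:])]


print(gains(scores))  # [20, 3]
```

Locking the layout was worth 20 points, while the final refinement added 3.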

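The most important change was not that the prompt got longer.

It was that what the prompt describes changed.

The first version described style, the second described elements, the third described layout, and the fourth described visual weight.

This is what I find most valuable about the experiment.

Reverse-engineering a prompt does not mean translating an image into a pile of pretty words.

It means breaking the image down into executable generation constraints.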

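Final merged prompt

This is the final merged version that B scored at 95.

I put the positive and negative parts together so it can be dropped straight into an image-generation tool.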

Ultra-wide 1500×600 / 2.5:1 X Article cover, exact locked poster layout, black safety-yellow and bone-white limited palette, distressed manga screenprint cyberpunk technical poster, matte black background, thin rounded safety-yellow border around the entire canvas, dense blueprint grid lines, circuit traces, terminal schematics, HUD markings, risograph grain, halftone dots, scratches, worn ink texture, high contrast.
LOCKED COMPOSITION: left half is dominated by exact readable typography, occupying about 50-55% of the canvas. Top-left: enormous safety-yellow “276”, the largest element in the image, nearly touching the top and left margins, each digit very wide, squared, ultra-condensed varsity / industrial block type, extremely thick strokes, tight spacing, chipped screenprint edges. Middle-left: huge bone-white “USE CASES”, ultra-condensed industrial block letters, tight tracking, stacked tightly below 276. Directly below: small centered bone-white “OF” between thin yellow horizontal rules. Lower-left: huge safety-yellow “HERMES AGENT”, tightly spaced ultra-condensed block letters, spanning the lower-left width.
Bottom-left row beneath the title: six small rounded-square icon buttons with thin yellow outlines, bone-white icons, and exact labels “DEV”, “AUTOMATION”, “RESEARCH”, “BUSINESS”, “CREATIVE”, “INFRA”, followed by small yellow ellipsis and bone-white text “AND MORE”.
Right half: large anime manga young woman AI operator, head and upper body occupy most of the right half, cropped at the bottom-right, face located around center-right and facing left in three-quarter profile. Long straight glossy black hair falling past the shoulders, heavy black hair mass, blunt bangs, hair nearly touches the top border, pale bone-white face, calm serious expression, large over-ear headphones, thick black manga linework, sharp white ink highlights, large black high-collar jacket creating a heavy black shape on the lower-right.
Behind the character: large flat safety-yellow disk halo centered behind the head, spanning from upper center to lower right, partially cropped by the frame and partially hidden behind hair and shoulder.
HUD panels: terminal panel sits between the left typography and the character face, reading “$> hermes run”, “PLAN”, “RESEARCH”, “EXECUTE”, “DELIVER”, “DONE.” Top-right small panel reads “AGENT STATUS”, “ONLINE”, “WORKING 24/7”. Far-right vertical panel reads “CATEGORIES” with “DEV WORKFLOWS”, “INTEGRATIONS”, “PERSONAL ASSISTANTS”, “BUSINESS OPS”, “CONTENT CREATION”, “RESEARCH SYSTEMS”, “INFRASTRUCTURE”, “AND MORE…”. Lower-right box reads “TOTAL”, huge yellow “276”, “REAL WORLD”, “USE CASES”.
Preserve strong readable hierarchy and clear negative spacing around the main typography; all main text must be perfectly spelled and vector-sharp, with no extra random letters or distorted glyphs. Dense, balanced, gritty poster composition, but keep the left typography clearly readable.
Avoid small timid typography, avoid shrinking the character, avoid making the girl cute/chibi, avoid generic random text, avoid misspelled title text, avoid replacing 276, avoid photorealistic face, avoid colorful cyberpunk lights, avoid neon gradients, avoid glossy corporate gradients, avoid soft 3D render, avoid clean corporate UI, avoid pastel colors.
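
Some image tools take the negative prompt in a separate field. For those, the merged prompt can be split mechanically. The sketch below assumes, as in the prompt above, that the negative part is a line starting with "Avoid" and listing comma-separated "avoid ..." items; `split_prompt` is a hypothetical helper, not part of any tool.

```python
import re


def split_prompt(merged: str):
    """Split a merged prompt into (positive_text, negative_terms).

    Lines starting with "Avoid" are treated as the negative part and
    broken into individual terms; every other non-empty line is kept
    as positive text.
    """
    positive_lines = []
    negative_terms = []
    for raw_line in merged.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.lower().startswith("avoid "):
            for item in re.split(r",\s*", line.rstrip(".")):
                # Drop the leading "avoid" from each item.
                negative_terms.append(re.sub(r"^avoid\s+", "", item, flags=re.IGNORECASE))
        else:
            positive_lines.append(line)
    return "\n".join(positive_lines), negative_terms
```

Items phrased as actions, such as "avoid making the girl cute/chibi", come out as "making the girl cute/chibi" and may need hand-editing before they work as negative terms.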

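Replicating a cool image from the web used to be mostly guesswork: guess the style, guess the keywords, or hand the image to a large model and take whatever rough prompt it returns.

What matters this time is not that the prompt got longer but that the mechanism changed.

A generates hypotheses and B monitors the differences; every round generates an image, compares, scores, and turns the differences into the next round's constraints.

This resembles a generative adversarial network: one side generates, the other discriminates.

But here the discriminator returns actionable feedback: the character is too small, the text hierarchy is not locked, the halo is in the wrong place, HUD panels are missing, the negative constraints are insufficient.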
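
"Actionable" can be made concrete by having B attach a proposed constraint to each difference it reports, so that A can apply it directly. The following is a minimal sketch of that idea; the `Diff` type and `apply_feedback` are hypothetical, and pairing each issue with the constraint text below is an illustration built from phrases quoted earlier, not B's actual output.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Diff:
    area: str        # which part of the image differs
    issue: str       # what B observed
    constraint: str  # prompt text B suggests adding or tightening


def apply_feedback(prompt: str, diffs: Iterable[Diff]) -> str:
    """Append each suggested constraint that the prompt does not already contain."""
    additions = []
    for diff in diffs:
        if diff.constraint not in prompt and diff.constraint not in additions:
            additions.append(diff.constraint)
    return ", ".join([prompt.rstrip(" ,."), *additions]) + "."


feedback = [
    Diff("character", "character too small",
         "head and upper body occupy most of the right half"),
    Diff("typography", "276 not dominant enough",
         "the largest element in the image, nearly touching the top and left margins"),
]
revised = apply_feedback("black-and-yellow manga poster.", feedback)
```

Applying the same feedback twice leaves the prompt unchanged, which keeps repeated rounds from bloating it.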

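Replication then stops being black magic and becomes an iterative system with tests, feedback, and verification.

The key to replicating a cool image is not guessing the magic incantation but building a mechanism that closes in on the answer by itself.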

Copyright belongs to the author. Do not repost without permission.