SparkScroll

About

Let Everyone Become a Comic Artist

SparkScroll (星火绘卷) is a multimodal creation platform for classic literary content. It builds a complete closed loop, "long-text understanding → plot decomposition → storyboard generation → page rendering → frontend display," that turns classics such as Journey to the West into polished digital picture-story comics (连环画).

In the name, Spark pays tribute to the NVIDIA DGX Spark platform, and Scroll stands for picture-story comics and painted scrolls. The Chinese name "星火绘卷" (literally "spark picture scroll") carries both a sense of technology and the flavor of classical literature.

Director → Writer → CharacterDesigner → Drafting → Editor → Assembler

6 collaborative agent stages

128 GB DGX Spark unified memory

128K text-model context window

0 swap · second-level response

Highlights

Four Core Capabilities

🎭

Six-Stage Industrial-Grade Workflow

Director → Writer → CharacterDesigner → Drafting → Editor → Assembler: the pipeline fully reproduces the human comic production workflow, going from a million-character novel to polished pages in one continuous run.

🎨

Character Consistency Solution

The CharacterDesigner agent pre-generates front, side, and back views of each character, plus reference images for props. Every subsequent panel injects these references into the diffusion model's conditioning channel as its baseline, keeping the style unified and the story visually coherent.

⚡

Two Heavy Models Resident in Memory

The text brain (~40 GB) and the visual core (50–60 GB) stay resident together in DGX Spark's 128 GB unified memory, with zero swap, second-level response, and fast switching between agents.

🖨️

Print-Quality Text via Pixel-Level Rendering

The Editor agent uses Python PIL to typeset text at the pixel level instead of letting the image model draw it. This eliminates AI-generated garbled characters and keeps Chinese fonts crisp, layouts precise, and output at publication quality.

Pipeline

Six-Stage Multi-Agent Pipeline

①

Director Agent

Uses the 128K context window to "compress" long texts, extracting the core plot points and character relationships and outputting a structured story outline.
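The source does not publish the Director's outline format. The sketch below assumes a hypothetical schema (`PlotNode`, `Relation`, `StoryOutline` and all their fields are illustrative names, not SparkScroll's API) to show what a structured outline of plot points and character relationships might look like once serialized to JSON.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class PlotNode:
    # Hypothetical fields: one core story beat extracted from the long text.
    node_id: int
    title: str
    summary: str
    characters: list[str]


@dataclass
class Relation:
    # Hypothetical directed relationship between two characters.
    source: str
    target: str
    kind: str


@dataclass
class StoryOutline:
    work: str
    plot_nodes: list[PlotNode] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses inside lists,
        # so the whole outline becomes plain dicts/lists for json.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


outline = StoryOutline(
    work="西游记",
    plot_nodes=[
        PlotNode(1, "石猴出世", "A stone monkey is born on Flower-Fruit Mountain.", ["孙悟空"]),
        PlotNode(2, "大闹天宫", "Sun Wukong rebels against Heaven.", ["孙悟空", "玉皇大帝"]),
    ],
    relations=[Relation("唐僧", "孙悟空", "master-disciple")],
)
print(outline.to_json())
```

Keeping the outline as typed records rather than free text is what lets the next stage split it mechanically into episodes and pages.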

②

Writer Agent

Breaks the outline into episodes and panels (4–6 panels per page) and outputs a standardized JSON script that specifies the dialogue and visual description for every panel.
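The 4–6-panels-per-page rule lends itself to a simple automated check before any images are drawn. This is a minimal sketch that assumes a hypothetical script layout (`episodes` → `pages` → `panels`, each panel with `visual` and `dialogue` keys). The real SparkScroll schema is not documented here, so treat the keys as placeholders.

```python
import json

MIN_PANELS, MAX_PANELS = 4, 6  # per-page panel range stated for the Writer agent


def validate_script(raw: str) -> list[str]:
    """Return a list of problems found in a (hypothetical) Writer JSON script."""
    script = json.loads(raw)
    problems = []
    for ep in script.get("episodes", []):
        for page in ep.get("pages", []):
            where = f"episode {ep.get('number')} page {page.get('number')}"
            panels = page.get("panels", [])
            if not MIN_PANELS <= len(panels) <= MAX_PANELS:
                problems.append(f"{where}: {len(panels)} panels (expected 4-6)")
            for i, panel in enumerate(panels, start=1):
                # Every panel must describe what is drawn; dialogue may be empty (silent panel).
                if not panel.get("visual"):
                    problems.append(f"{where} panel {i}: missing visual description")
                if "dialogue" not in panel:
                    problems.append(f"{where} panel {i}: missing dialogue field")
    return problems


example = json.dumps({
    "episodes": [{
        "number": 1,
        "pages": [
            {"number": 1, "panels": [{"visual": "花果山远景", "dialogue": ""}] * 4},
            {"number": 2, "panels": [{"visual": "石猴跃出石卵", "dialogue": "我是谁？"}] * 3},
        ],
    }]
}, ensure_ascii=False)

for problem in validate_script(example):
    print(problem)
# episode 1 page 2: 3 panels (expected 4-6)
```

Rejecting a malformed script at this point is cheaper than discovering a missing panel after the diffusion model has already rendered the page.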

③

CharacterDesigner Agent · Key Innovation

Pre-generates front, side, and back views of every character plus reference images for key props, locking in the visual-consistency baseline for all later panels.
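One way to picture this baseline is a registry that maps each character to its three reference views, plus a lookup that collects the images for the characters in a given panel so they can be passed to the diffusion model as conditioning. This is a sketch under assumptions: `ReferenceSheet`, the `<root>/<character>/<view>.png` path layout, and the panel `characters` field are all hypothetical, and prop references are left out for brevity.

```python
from dataclasses import dataclass
from pathlib import Path

VIEWS = ("front", "side", "back")  # the three views named for the CharacterDesigner agent


@dataclass
class ReferenceSheet:
    character: str
    images: dict[str, Path]  # view name -> image path (hypothetical layout)


def build_registry(root: Path, characters: list[str]) -> dict[str, ReferenceSheet]:
    # Hypothetical convention: <root>/<character>/<view>.png
    return {
        name: ReferenceSheet(name, {v: root / name / f"{v}.png" for v in VIEWS})
        for name in characters
    }


def references_for_panel(panel: dict, registry: dict[str, ReferenceSheet],
                         views=("front",)) -> list[Path]:
    """Collect the reference images for every character appearing in a panel."""
    refs = []
    for name in panel.get("characters", []):
        sheet = registry.get(name)
        if sheet is None:
            # Fail loudly: drawing a character without a reference breaks consistency.
            raise KeyError(f"no reference sheet for {name!r}; run CharacterDesigner first")
        refs.extend(sheet.images[v] for v in views)
    return refs


registry = build_registry(Path("refs"), ["孙悟空", "唐僧"])
panel = {"visual": "悟空护送唐僧过河", "characters": ["孙悟空", "唐僧"]}
print(references_for_panel(panel, registry))
```

Raising on an unknown character, rather than silently generating without a reference, keeps a missing sheet from quietly producing an off-model drawing.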

④

Drafting Agent

Using the storyboard JSON and the character reference images, calls the dual Diffusers + vLLM-Omni backend to generate each panel's base image, keeping character style aligned with the references.

⑤

Editor Agent

Pure-PIL, pixel-level typesetting: places the dialogue in speech bubbles with precise control over font size, line spacing, and position, avoiding AI-garbled text and producing print-sharp images.
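Chinese dialogue has no spaces, so wrapping it inside a bubble has to break between characters rather than between words. The sketch below is a greedy per-character wrapper with a pluggable width function, plus a PIL helper that draws the wrapped lines into a box. It assumes Pillow 8 or newer (for `ImageDraw.textlength`) and a CJK-capable TrueType font whose path you supply. The function names and the `(left, top, right, bottom)` box convention are illustrative rather than SparkScroll's code, and line-breaking rules such as not starting a line with "，" are omitted.

```python
from typing import Callable


def wrap_text(text: str, max_width: float, measure: Callable[[str], float]) -> list[str]:
    """Greedily wrap text one character at a time so each line fits max_width.

    `measure` returns the rendered width of a string (e.g. PIL's textlength).
    Explicit newlines in `text` force a line break.
    """
    lines, current = [], ""
    for ch in text:
        if ch == "\n":
            lines.append(current)
            current = ""
            continue
        candidate = current + ch
        if current and measure(candidate) > max_width:
            lines.append(current)  # current line is full; start a new one with ch
            current = ch
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def draw_dialogue(img, text, box, font_path, font_size=28, line_spacing=6):
    """Draw wrapped text into box=(left, top, right, bottom) on a PIL image."""
    from PIL import ImageDraw, ImageFont  # imported lazily so wrap_text works without Pillow

    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype(font_path, font_size)
    left, top, right, bottom = box
    lines = wrap_text(text, right - left, lambda s: draw.textlength(s, font=font))
    y = top
    for line in lines:
        if y + font_size > bottom:
            break  # stop rather than overflow the bubble
        draw.text((left, y), line, font=font, fill="black")
        y += font_size + line_spacing
    return img
```

With character count standing in for pixel width, `wrap_text("大圣归来齐天", 4, len)` returns `["大圣归来", "齐天"]`. Because the text is drawn by the font rasterizer instead of the diffusion model, every glyph comes out exactly as written.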

⑥

Assembler Agent

Aggregates the outputs for each page and streams them to the frontend page by page. Each page appears as soon as it is generated, so users can start reading without waiting for the whole episode.
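Page-by-page delivery can be modeled as a generator that yields each finished page immediately instead of collecting the whole episode first. This sketch assumes newline-delimited JSON (NDJSON) as the wire format and a hypothetical `render_page` callable standing in for the Drafting and Editor stages. The source does not say which transport SparkScroll uses; in a FastAPI service, a generator like this could be handed to `StreamingResponse`.

```python
import json
from typing import Callable, Iterable, Iterator


def stream_episode(pages: Iterable[dict], render_page: Callable[[dict], str]) -> Iterator[str]:
    """Yield one NDJSON line per page as soon as that page is rendered.

    `pages` are page specs from the Writer script; `render_page` is a hypothetical
    stand-in for the Drafting + Editor stages and returns an image URL or path.
    """
    pages = list(pages)  # page specs are small; materialize them to know the total
    total = len(pages)
    for index, page in enumerate(pages, start=1):
        image = render_page(page)  # the slow step; earlier pages have already been sent
        yield json.dumps({"page": index, "total": total, "image": image}) + "\n"
    yield json.dumps({"done": True, "total": total}) + "\n"


def fake_render(page: dict) -> str:
    # Placeholder renderer for the demo.
    return f"ep{page['episode']}_p{page['number']}.png"


for line in stream_episode([{"episode": 1, "number": n} for n in (1, 2, 3)], fake_render):
    print(line, end="")
```

Because rendering happens lazily inside the generator, the first page reaches the reader after one render rather than after the whole episode.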

Architecture

Why NVIDIA DGX Spark Is Essential

SparkScroll keeps two heavy models resident in memory at once, a concurrent pipeline that an ordinary single-GPU system simply cannot run.

Module	Selection	Memory Budget	Why DGX Spark Is Needed
Text & Logic Brain	Qwen3.5-9B (128K ctx) · vLLM v0.19.0	~40 GB	A 128K context can hold the full text of most novels, reliably supporting plot compression and cross-episode planning.
Visual Rendering Core	Qwen-Image-Edit-2511 · FireRed-Image-Edit-1.1	50–60 GB	Combines high-fidelity, character-consistent editing with Chinese caption rendering; forcing quantization would cost the model its layout-geometry reasoning.
Framework & Concurrency Buffer	vLLM · FastAPI · Diffusers · OS	~15 GB	Carries API traffic, the custom scheduler, and fast switching between agents, and needs a safety margin.
Total	All modules	~105–115 GB of 128 GB	Only DGX Spark delivers this zero-swap, second-level response, which supports fast multi-agent switching and streaming delivery.

An RTX 4090 has only 24 GB of VRAM, so the models must be swapped in and out constantly and performance falls off a cliff: long-context processing can slow down by as much as 64×. DGX Spark's 128 GB of unified memory, shared coherently by the CPU and GPU over NVLink-C2C, removes this bottleneck.
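The budget arithmetic can be checked directly. The figures below are the per-module estimates from the table (~40 GB, 50–60 GB, ~15 GB), and the 24 GB comparison is the RTX 4090's VRAM. This only sums the stated budgets. It does not model KV-cache growth, fragmentation, or the fact that the two machines differ in more than capacity.

```python
# (low, high) memory budgets in GB, taken from the table's per-module estimates
BUDGET_GB = {
    "text_brain": (40, 40),        # Qwen3.5-9B, 128K context, served by vLLM
    "visual_core": (50, 60),       # Qwen-Image-Edit-2511 / FireRed-Image-Edit-1.1
    "framework_buffer": (15, 15),  # vLLM, FastAPI, Diffusers, OS
}


def total_budget(budget: dict) -> tuple[int, int]:
    """Sum the low and high estimates across all modules."""
    low = sum(lo for lo, _ in budget.values())
    high = sum(hi for _, hi in budget.values())
    return low, high


def headroom(capacity_gb: int, budget: dict) -> tuple[int, int]:
    """Worst- and best-case free memory; negative means not everything can stay resident."""
    low, high = total_budget(budget)
    return capacity_gb - high, capacity_gb - low


print(total_budget(BUDGET_GB))   # (105, 115)
print(headroom(128, BUDGET_GB))  # (13, 23): fits in DGX Spark's 128 GB
print(headroom(24, BUDGET_GB))   # (-91, -81): an RTX 4090 would have to swap
```

Even the worst case leaves about 13 GB free on a 128 GB machine, while a 24 GB card falls short by more than 80 GB before any swapping strategy is considered.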

Gallery

Generated Comic Pages

Compiled from locally generated results; each project can be previewed continuously by episode and page number

3 projects · 76 finished pages

Use Cases

Five Real-World Application Scenarios

📚

Education

Turn classics such as Journey to the West into comics to improve students' knowledge retention and interest in reading.

✍️

Web Novel IP Incubation

Automatically generate 5–10 comic pages for each daily web-novel chapter update to quickly drive traffic on short-video platforms.

🏛️

Museum and Heritage Curation

Turn obscure historical documents into long scroll-style comics, making museum exhibitions more vivid and easier to follow.

🌐

Brand Globalization

Quickly turn brand stories into comics in multiple styles with multilingual adaptation, lowering the cost of localizing content for overseas markets.

🎎

Cultural Revival

Revive the picture-story comic (连环画) industry, whose annual print run once reached 8.1 billion copies, so that "小人书" (pocket picture books) are reborn in digital form.

Team

Zongguanxian (纵贯线) Team

The name "Zongguanxian" (纵贯线, "a line running the length of the country") comes from the members being based in Beijing, Nanjing, and Guangzhou, which trace a line from north to south across China. The team collaborates across the three cities, shares remote access to DGX Spark hardware, and relies on multiple OpenClaw agents to assist development.

🧑‍💼

Zhang Xiaobai (Zhang Hui)

Team Lead · Project Planning

Project planning and management, local model deployment, frontend UI development and debugging

🧪

Qin Feixiong (Feige)

Team Member · Testing & Documentation

Frontend UI development, system testing, technical documentation, competition submission materials

🏗️

Hanchen (寒晨)

Team Member · Architecture Design

Overall system architecture, technical roadmap

🤖

Codex

AI Assistant · Gateway Development

Designed and implemented the backend API services

🎨

Trae

AI Assistant · Frontend UI Development

Designed a user-friendly frontend interface with intuitive entry points for each operation