VQ-SGen: A Vector Quantized Stroke Representation for Creative Sketch Generation |
Jiawei Wang1,2,   Zhiming Cui2,   Changjian Li1 |
1University of Edinburgh    2ShanghaiTech University |
arXiv 2025 |
 |
Fig. 1. The goal of creative sketch generation is to produce vivid sketches,
e.g., the birds on the left generated by existing methods. We propose VQ-SGen for high-quality
sketch generation, built on a new vector-quantized (VQ) stroke representation and an efficient generator.
One of our results is shown on the right for comparison.
|
Abstract |
This paper presents VQ-SGen, a novel algorithm for high-quality creative sketch generation.
Recent approaches frame the task as pixel-based generation, either of the whole sketch or part by part, neglecting the intrinsic and contextual relationships among individual strokes, such as the shape and spatial positioning of both proximal and distant strokes.
To overcome these limitations, we propose treating each stroke within a sketch as an entity and introducing a vector-quantized (VQ) stroke representation for fine-grained sketch generation.
Our method follows a two-stage framework: in stage one, we decouple each stroke's shape and location information so that the VQ representation prioritizes stroke shape learning; in stage two, we feed the precise and compact representation into an auto-decoding Transformer that incorporates stroke semantics, positions, and shapes into the generation process.
By utilizing the tokenized stroke representation, our approach generates strokes with high fidelity and enables novel applications, such as text- or class-label-conditioned generation and sketch completion.
Comprehensive experiments demonstrate that our method surpasses existing state-of-the-art techniques on the CreativeSketch dataset, underscoring its effectiveness.
|
 |
|
Paper [ArXiv]
Code and Data [GitHub [Coming soon]]
Citation:
@article{wang2025vq,
title={VQ-SGen: A Vector Quantized Stroke Representation for Creative Sketch Generation},
author={Wang, Jiawei and Cui, Zhiming and Li, Changjian},
journal={arXiv preprint arXiv:2411.16446},
year={2025}
}
|
|
|
Algorithm |
 |
Fig. 2. Overview of VQ-SGen. Given an input sketch, we first divide it into a sequence of strokes.
In the first stage, we decouple the shape and location information of each stroke and obtain their discrete representations (Sec. 3.1).
In the second stage, we use a decoder-only Gen-Transformer to predict the stroke image, label, and position in an autoregressive manner (Sec. 3.2).
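To make the two stages concrete, below is a minimal PyTorch sketch of the pipeline in Fig. 2. It is an illustration under stated assumptions, not the released implementation: the module names (StrokeVQ, GenTransformer), the codebook size, the model dimensions, and the shape-token-only decoding head are all our own simplifications; the actual Gen-Transformer additionally predicts each stroke's label and position.

```python
# Minimal, illustrative PyTorch sketch of the two-stage pipeline.
# Names (StrokeVQ, GenTransformer), sizes, and the token layout are
# assumptions for exposition, not the authors' released code.
import torch
import torch.nn as nn

class StrokeVQ(nn.Module):
    """Stage one (illustrative): map a stroke's shape feature to a discrete
    code via nearest-neighbor lookup in a learned codebook. Location is
    decoupled and handled separately, so the codebook models shape only."""
    def __init__(self, num_codes=512, code_dim=128):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)

    def quantize(self, z):                        # z: (B, code_dim) encoder output
        dists = torch.cdist(z, self.codebook.weight)   # (B, num_codes)
        idx = dists.argmin(dim=1)                      # discrete shape tokens
        z_q = self.codebook(idx)                       # quantized embeddings
        # straight-through estimator: copy gradients from z_q back to z
        return idx, z + (z_q - z).detach()

class GenTransformer(nn.Module):
    """Stage two (illustrative): a decoder-only Transformer over stroke
    tokens. Only the shape-token head is shown; label and position heads
    would be analogous."""
    def __init__(self, num_codes=512, d_model=256, n_heads=8, n_layers=6):
        super().__init__()
        self.embed = nn.Embedding(num_codes + 1, d_model)  # +1 for <BOS>
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.shape_head = nn.Linear(d_model, num_codes)

    def forward(self, tokens):                    # tokens: (B, T) code indices
        T = tokens.size(1)
        causal = torch.full((T, T), float("-inf"), device=tokens.device).triu(1)
        h = self.blocks(self.embed(tokens), mask=causal)
        return self.shape_head(h)                 # (B, T, num_codes) next-token logits
```

In full VQ-VAE-style training, the quantizer above would be paired with a stroke encoder/decoder and a commitment loss; those standard components are omitted here for brevity.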
|
Results |
Result Gallery |
 |
Fig. 3. Visual comparison against competitors on both the Creative Birds and Creative Creatures datasets.
|
Applications |
 |
Fig. 4. Applications of our approach.
(a) Given the class label, our method can produce corresponding sketches with rich variations.
(b) Our method supports text-conditioned generation; the resulting sketches vividly reflect the input text.
(c) Given the initial strokes, our method completes the whole sketch, comparing favorably against competitors.
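Once strokes are tokens, all three applications reduce naturally to prefix conditioning: a class label, text embedding, or the user's initial strokes become a token prefix, and the generator samples the remaining strokes. The loop below is a hedged sketch continuing the illustrative GenTransformer above; the `prefix` layout, temperature sampling, and fixed stroke budget are our assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def complete_sketch(model, prefix, max_strokes=64, temperature=1.0):
    """Illustrative sampling loop. `prefix` (1, T0) holds condition tokens:
    e.g., a class/text token for conditioned generation, or the shape codes
    of the user's initial strokes for sketch completion."""
    tokens = prefix.clone()
    for _ in range(max_strokes):
        logits = model(tokens)[:, -1] / temperature   # logits for the next stroke
        probs = F.softmax(logits, dim=-1)
        nxt = torch.multinomial(probs, num_samples=1) # sample one shape token
        tokens = torch.cat([tokens, nxt], dim=1)
    return tokens                                     # prefix + generated strokes
```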
|
Codebook Visualization |
 |
Fig. 5. UMAP visualization of the shape codes, with overlaid strokes indicating semantic-aware clustering in the discrete code space.
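A Fig. 5-style plot can be reproduced from any trained codebook with umap-learn and matplotlib; the snippet below is a minimal sketch assuming the StrokeVQ codebook from the earlier example (overlaying stroke thumbnails is omitted).

```python
import matplotlib.pyplot as plt
import umap  # pip install umap-learn

def plot_codebook(codebook_weight):
    """Project the (num_codes, code_dim) codebook to 2D and scatter it."""
    emb = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=0).fit_transform(
        codebook_weight.detach().cpu().numpy())
    plt.scatter(emb[:, 0], emb[:, 1], s=8)
    plt.title("UMAP of VQ shape codes")
    plt.show()
```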
|
|
|
©Changjian Li. Last update: March 2025. |