VQ-SGen: A Vector Quantized Stroke Representation for
Creative Sketch Generation
Jiawei Wang1,2,   Zhiming Cui2,   Changjian Li1
1University of Edinburgh    2ShanghaiTech University
arXiv 2025
Paper teaser
Fig. 1. The goal of creative sketch generation is to produce vivid sketches, e.g., the birds generated by existing methods on the left. We propose VQ-SGen for high-quality sketch generation, with a new vector-quantized (VQ) stroke representation and an efficient generator. See one of our results on the right for comparison.
Abstract
This paper presents VQ-SGen, a novel algorithm for high-quality creative sketch generation. Recent approaches frame the task as pixel-based generation, either as a whole or part-by-part, neglecting the intrinsic and contextual relationships among individual strokes, such as the shape and spatial positioning of both proximal and distant strokes. To overcome these limitations, we propose treating each stroke within a sketch as an entity and introduce a vector-quantized (VQ) stroke representation for fine-grained sketch generation. Our method follows a two-stage framework: in stage one, we decouple each stroke's shape and location information so that the VQ representation prioritizes stroke-shape learning; in stage two, we feed the precise and compact representation into an auto-decoding Transformer to incorporate stroke semantics, positions, and shapes into the generation process. By utilizing the tokenized stroke representation, our approach generates strokes with high fidelity and facilitates novel applications, such as text- or class-label-conditioned generation and sketch completion. Comprehensive experiments demonstrate that our method surpasses existing state-of-the-art techniques on the CreativeSketch dataset, underscoring its effectiveness.
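The second stage generates a sketch stroke by stroke, each step conditioned on the tokens produced so far. A minimal, hypothetical sketch of that autoregressive loop is below; the toy `logits_fn` is purely illustrative (it just prefers the next token index) and stands in for the actual Transformer, which is not reproduced here.

```python
import numpy as np

def sample_next(tokens, logits_fn):
    """One greedy autoregressive step: score all candidate stroke
    tokens given the prefix, return the highest-scoring index."""
    logits = logits_fn(tokens)
    return int(np.argmax(logits))

# Toy stand-in "model" over a vocabulary of K stroke tokens:
# it simply favors token (last + 1) mod K.
K = 8
def toy_logits(tokens):
    logits = np.zeros(K)
    logits[(tokens[-1] + 1) % K] = 1.0
    return logits

seq = [0]                       # seed stroke token
for _ in range(4):              # grow the sketch one token at a time
    seq.append(sample_next(seq, toy_logits))
# seq → [0, 1, 2, 3, 4]
```

In the actual system each emitted token bundles stroke semantics, position, and shape; the loop structure, however, is the standard decoder-only pattern shown here.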
  Paper [ArXiv]
Code and Data [GitHub [Coming soon]]
Citation:
@article{wang2025vq,
    title={VQ-SGen: A Vector Quantized Stroke Representation for Creative Sketch Generation},
    author={Wang, Jiawei and Cui, Zhiming and Li, Changjian},
    journal={arXiv preprint arXiv:2411.16446},
    year={2025}
}
                                    

Algorithm
Fig. 2. Overview of VQ-SGen. Given an input sketch, it is first divided into a sequence of strokes. In the first stage, we begin by decoupling the shape and location information of each stroke and obtain their discrete representations (Sec. 3.1). In the second stage, we use a decoder-only Gen-Transformer to predict the stroke image, label, and position in an autoregressive manner (Sec. 3.2).
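The discrete representation in stage one amounts to a nearest-neighbour codebook lookup over stroke-shape embeddings (VQ-VAE style). The snippet below is a minimal illustration under that assumption; the `quantize` function, codebook size, and embeddings are all hypothetical, not the paper's implementation.

```python
import numpy as np

def quantize(z, codebook):
    """Map each continuous stroke-shape embedding to its nearest
    codebook entry (squared L2 distance), returning the discrete
    token indices and the quantized vectors."""
    # z: (n_strokes, d), codebook: (K, d)
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (n, K)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]

# Toy example: a 4-entry codebook in 2-D and two stroke embeddings.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([[0.1, -0.05], [0.9, 0.8]])
idx, z_q = quantize(z, codebook)
# idx → [0, 3]: each stroke shape is now a single discrete token
```

Because location is decoupled before this lookup, the codebook only has to cover stroke shapes, which is what makes the discrete code space compact and semantically clustered (cf. Fig. 5).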
Results
Result Gallery
Fig. 3. Visual comparison against competitors on both the Creative Birds and Creative Creatures datasets.
Applications
Fig. 4. Applications of our approach. (a) Given a class label, our method produces corresponding sketches with rich variations. (b) Our method supports text-conditioned generation; the resulting sketches depict the input text vividly. (c) Given initial strokes, our method can complete the whole sketch and compares favorably against competitors.
Codebook Visualization
Fig. 5. UMAP visualization of the shape codes, with the overlaid strokes indicating the semantic-aware clustering in the discrete code space.

©Changjian Li. Last update: March, 2025.