VQ-SGen: A Vector Quantized Stroke Representation for
Creative Sketch Generation
Jiawei Wang1,2,   Zhiming Cui2,   Changjian Li1
1University of Edinburgh    2ShanghaiTech University
arXiv 2025
Paper teaser
Fig. 1. The goal of creative sketch generation is to produce vivid sketches, e.g., the birds generated by existing methods on the left. We propose VQ-SGen for high-quality sketch generation, with a new vector-quantized (VQ) stroke representation and an efficient generator. See one of our results on the right for comparison.
Abstract
This paper presents VQ-SGen, a novel algorithm for high-quality creative sketch generation. Recent approaches frame the task as pixel-based generation, either as a whole or part-by-part, neglecting the intrinsic and contextual relationships among individual strokes, such as the shape and spatial positioning of both proximal and distant strokes. To overcome these limitations, we propose treating each stroke within a sketch as an entity and introduce a vector-quantized (VQ) stroke representation for fine-grained sketch generation. Our method follows a two-stage framework: in stage one, we decouple each stroke's shape and location information so that the VQ representation prioritizes stroke-shape learning; in stage two, we feed the precise and compact representation into an auto-decoding Transformer to incorporate stroke semantics, positions, and shapes into the generation process. By utilizing the tokenized stroke representation, our approach generates strokes with high fidelity and facilitates novel applications, such as text- or class-label-conditioned generation and sketch completion. Comprehensive experiments demonstrate that our method surpasses existing state-of-the-art techniques on the CreativeSketch dataset, underscoring its effectiveness.
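The second stage generates a sketch stroke by stroke, each step conditioned on the tokens produced so far. A minimal, hypothetical sketch of that autoregressive loop is below; the toy `logits_fn` is purely illustrative (it just prefers the next token index) and stands in for the actual Transformer, which is not reproduced here.

```python
import numpy as np

def sample_next(tokens, logits_fn):
    """One greedy autoregressive step: score all candidate stroke
    tokens given the prefix, return the highest-scoring index."""
    logits = logits_fn(tokens)
    return int(np.argmax(logits))

# Toy stand-in "model" over a vocabulary of K stroke tokens:
# it simply favors token (last + 1) mod K.
K = 8
def toy_logits(tokens):
    logits = np.zeros(K)
    logits[(tokens[-1] + 1) % K] = 1.0
    return logits

seq = [0]                       # seed stroke token
for _ in range(4):              # grow the sketch one token at a time
    seq.append(sample_next(seq, toy_logits))
# seq → [0, 1, 2, 3, 4]
```

In the actual system each emitted token bundles stroke semantics, position, and shape; the loop structure, however, is the standard decoder-only pattern shown here.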
  Paper [ArXiv]
Code and Data [GitHub [Coming soon]]
Citation:
@article{wang2025vq,
    title={VQ-SGen: A Vector Quantized Stroke Representation for Creative Sketch Generation},
    author={Wang, Jiawei and Cui, Zhiming and Li, Changjian},
    journal={arXiv preprint arXiv:2411.16446},
    year={2025}
}
                                    

Algorithm
Fig. 2. Overview of VQ-SGen. Given an input sketch, it is first divided into a sequence of strokes. In the first stage, we begin by decoupling the shape and location information of each stroke and obtain their discrete representations (Sec. 3.1). In the second stage, we use a decoder-only Gen-Transformer to predict the stroke image, label, and position in an autoregressive manner (Sec. 3.2).
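The discrete representation in stage one amounts to a nearest-neighbour codebook lookup over stroke-shape embeddings (VQ-VAE style). The snippet below is a minimal illustration under that assumption; the `quantize` function, codebook size, and embeddings are all hypothetical, not the paper's implementation.

```python
import numpy as np

def quantize(z, codebook):
    """Map each continuous stroke-shape embedding to its nearest
    codebook entry (squared L2 distance), returning the discrete
    token indices and the quantized vectors."""
    # z: (n_strokes, d), codebook: (K, d)
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (n, K)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]

# Toy example: a 4-entry codebook in 2-D and two stroke embeddings.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([[0.1, -0.05], [0.9, 0.8]])
idx, z_q = quantize(z, codebook)
# idx → [0, 3]: each stroke shape is now a single discrete token
```

Because location is decoupled before this lookup, the codebook only has to cover stroke shapes, which is what makes the discrete code space compact and semantically clustered (cf. Fig. 5).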
Results
Result Gallery
Fig. 3. Visual comparison against competitors on both the Creative Birds and Creative Creatures datasets.
Applications
Fig. 4. Applications of our approach. (a) Given a class label, our method produces corresponding sketches with rich variations. (b) Our method supports text-conditioned generation; the resulting sketches depict the input text vividly. (c) Given initial strokes, our method can complete the whole sketch and compares favorably against competitors.
Codebook Visualization
Fig. 5. UMAP visualization of the shape codes, with the overlaid strokes indicating the semantic-aware clustering in the discrete code space.

©Changjian Li. Last update: March, 2025.