CADTalk: An Algorithm and Benchmark for Semantic Commenting of CAD Programs

Abstract

CAD programs are a popular way to compactly encode shapes as a sequence of operations that are easy to modify parametrically. However, without sufficient semantic comments and structure, such programs can be challenging to understand, let alone modify. We introduce the problem of semantically commenting CAD programs, where the goal is to segment the input program into code blocks corresponding to semantically meaningful shape parts and to assign a semantic label to each block. We solve the problem by combining program parsing with the visual-semantic analysis afforded by recent advances in foundation language and vision models. Specifically, we execute the input programs to create shapes, from which we generate conditional photorealistic images that existing semantic annotators for such images can process. We then distill the information across the images and link it back to the original programs to semantically comment them. Additionally, we collected and annotated a benchmark dataset, CADTalk, consisting of 5,280 machine-made programs and 45 human-made programs with ground-truth semantic comments, to foster future research. We extensively evaluated our approach against a GPT-based baseline and an open-set shape segmentation baseline, PartSLIP, and report an accuracy of 83.24% on the new CADTalk dataset.

Video (Trailer)

Video (Long Presentation)

Method

Figure 1. Overview. We first parse the input program to identify commentable code blocks, marked with TBC (a). We then execute the program and render the resulting shape from several viewpoints to obtain multiview depth maps, which we convert into realistic images using image-to-image translation (b). In addition, we obtain a list of the shape's part names from ChatGPT. We use these names as labels to segment semantic parts in the images with computer vision foundation models (c). Finally, we aggregate this semantic information across views by linking the segmented parts to the code blocks that produce them (d).
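The aggregation step (d) can be illustrated with a small voting sketch. It is not the authors' implementation: the function name `label_blocks`, the representation of masks as sets of pixel coordinates, and the overlap-ratio score are all assumptions made for illustration. For every view, each code block's rendered mask is compared with each labeled part mask, and the overlap ratios are summed across views.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]


def label_blocks(
    block_masks: List[Dict[str, Mask]],
    part_masks: List[Dict[str, Mask]],
) -> Dict[str, str]:
    """Assign one label per code block by accumulating overlap over views.

    block_masks[v][block_id] is the set of pixels covered by a code block
    in view v; part_masks[v][label] is the set of pixels the segmentation
    model assigned to that label in view v. Both layouts are hypothetical.
    """
    scores: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for blocks, parts in zip(block_masks, part_masks):
        for block_id, block in blocks.items():
            if not block:
                continue  # the block is not visible in this view
            for label, part in parts.items():
                overlap = len(block & part) / len(block)
                if overlap > 0:
                    scores[block_id][label] += overlap
    # Keep the label with the highest accumulated score for each block.
    return {b: max(s, key=s.get) for b, s in scores.items()}


# Two toy views of a shape with two code blocks.
views_blocks = [
    {"b0": {(0, 0), (0, 1)}, "b1": {(1, 0), (1, 1)}},
    {"b0": {(0, 0)}, "b1": {(1, 0), (1, 1)}},
]
views_parts = [
    {"seat": {(0, 0), (0, 1), (1, 0)}, "leg": {(1, 1)}},
    {"seat": {(0, 0)}, "leg": {(1, 0), (1, 1)}},
]
labels = label_blocks(views_blocks, views_parts)
print(labels)  # {'b0': 'seat', 'b1': 'leg'}
```

In the first view the block `b1` is split evenly between the two labels; the second view resolves the tie, which is the reason for accumulating evidence over several viewpoints.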

Image Generation and Segmentation

Image Processing. Given the shapes obtained by executing the programs (left), we use ControlNet to convert their rendered depth maps into realistic images (middle), which form a valid input for detection and segmentation models trained on photographs (right).
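Before a depth-conditioned image generator can use a rendered depth map, the depth values have to be encoded as an image. The sketch below shows one plausible encoding in plain Python. The function name `depth_to_gray`, the use of `None` for background pixels, and the "nearer is brighter" convention are assumptions for illustration, not details taken from the paper.

```python
from typing import List, Optional

DepthMap = List[List[Optional[float]]]


def depth_to_gray(depth: DepthMap) -> List[List[int]]:
    """Encode camera-space depth as 8-bit gray values.

    Assumed convention: the nearest surface maps to 255, the farthest to 0,
    and background pixels (None) also map to 0.
    """
    values = [d for row in depth for d in row if d is not None]
    if not values:
        return [[0 for _ in row] for row in depth]
    near, far = min(values), max(values)
    span = far - near

    def to_gray(d: Optional[float]) -> int:
        if d is None:
            return 0  # background
        if span == 0:
            return 255  # flat shape: every surface pixel is equally near
        return int(255 * (far - d) / span)

    return [[to_gray(d) for d in row] for row in depth]


# A 2x2 depth map with one background pixel.
gray = depth_to_gray([[None, 2.0], [4.0, 3.0]])
```

With this encoding the farthest surface pixel shares the value 0 with the background, so a real pipeline may prefer to reserve a small offset for surface pixels.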

Program Parsing

Program parsing. Irreducible blocks are basic-level geometric primitives and their direct compositions (a), while commentable blocks are code blocks at different compositional levels that correspond to semantic comments (b). A downward traversal of the syntax tree identifies the irreducible blocks (c), and an upward traversal collects the commentable blocks (d). Example masks of commentable blocks are rendered in red in (c) and (d).
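The two traversals can be sketched on a toy syntax tree. This is a simplified illustration under stated assumptions, not the authors' parser: the `Node` class is hypothetical, a leaf stands for a geometric primitive, and a node counts as irreducible when it is a primitive or a composition whose children are all primitives.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A hypothetical syntax-tree node; a node without children is a primitive."""
    name: str
    children: List["Node"] = field(default_factory=list)

    @property
    def is_primitive(self) -> bool:
        return not self.children


def is_irreducible(node: Node) -> bool:
    # A primitive, or a direct composition of primitives (assumed definition).
    return node.is_primitive or all(c.is_primitive for c in node.children)


def find_irreducible(node: Node) -> List[Node]:
    """Downward traversal: stop descending at the first irreducible node."""
    if is_irreducible(node):
        return [node]
    found: List[Node] = []
    for child in node.children:
        found.extend(find_irreducible(child))
    return found


def collect_commentable(node: Node) -> List[Node]:
    """Upward (post-order) traversal: blocks are emitted before their parents."""
    if is_irreducible(node):
        return [node]
    blocks: List[Node] = []
    for child in node.children:
        blocks.extend(collect_commentable(child))
    blocks.append(node)  # the parent groups the blocks collected below it
    return blocks


chair = Node("chair", [
    Node("seat"),
    Node("legs", [Node("leg_a"), Node("leg_b")]),
    Node("back", [Node("frame", [Node("bar_a"), Node("bar_b")]), Node("cushion")]),
])

irreducible = [n.name for n in find_irreducible(chair)]
commentable = [n.name for n in collect_commentable(chair)]
print(irreducible)  # ['seat', 'legs', 'frame', 'cushion']
print(commentable)  # ['seat', 'legs', 'frame', 'cushion', 'back', 'chair']
```

In this sketch every composition above an irreducible block becomes a candidate commentable block, so `back` and `chair` appear after the blocks they contain.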

Result Gallery

CADTalk Dataset and Commenting Results. Example shapes from CADTalk (left) along with ground-truth comments (right) and predicted comments (far right). In these examples, our predictions match the ground truth, except for the Moai sculpture, where CADTalker labeled the 'head' code block as 'body'. Machine-made shapes are rendered in dark blue and placed behind the human-made shapes, which are rendered in light blue.

More Supplementary Files:

1) Example Abstract Syntax Tree: AST.pdf
2) Detailed ChatGPT Conversation: GPT-Converstation.pdf
3) More Commenting Results: Commenting-Results.pdf

BibTeX

@inproceedings{yuan2024cadtalk,
  title={CADTalk: An Algorithm and Benchmark for Semantic Commenting of CAD Programs},
  author={Yuan, Haocheng and Xu, Jing and Pan, Hao and Bousseau, Adrien and Mitra, Niloy J and Li, Changjian},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={3753--3762},
  year={2024}
}