CAD programs are a popular way to compactly encode shapes as a sequence of operations
that are easy to parametrically modify. However, without sufficient semantic comments
and structure, such programs can be challenging to understand, let alone modify.
We introduce the problem of semantically commenting CAD programs, wherein the goal is to
segment the input program into code blocks corresponding to semantically meaningful
shape parts and assign a semantic label to each block.
We solve the problem by combining program parsing with visual-semantic analysis
afforded by recent advances in foundational language and vision models.
Specifically, we execute the input programs to create shapes, generate photorealistic
images conditioned on those shapes, and apply semantic annotators to the images.
We then distill the information across the images and link it back to
the original programs to semantically comment on them.
Additionally, we collect and annotate a benchmark dataset, CADTalk,
consisting of 5,280 machine-made programs and 45 human-made programs with ground-truth
semantic comments, to foster future research.
We extensively evaluate our approach against a GPT-based baseline
and an open-set shape segmentation baseline, PartSLIP, and report an accuracy of 83.24%
on the new CADTalk dataset.
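
To make the distill-and-link-back step more concrete, the following is a minimal sketch and not the method evaluated above. It assumes that each rendered view has already been annotated and that the annotation has been mapped to a per-block label (or None when the block is not visible). The function names, the majority-vote rule, the `//` comment syntax, and the per-block accuracy measure are all illustrative assumptions.

```python
from collections import Counter


def distill_labels(per_view_labels):
    """Aggregate per-view block labels by majority vote (an assumed rule).

    per_view_labels: list of dicts, one per rendered view, mapping a
    block id to the label seen in that view, or None if not visible.
    """
    votes = {}
    for view in per_view_labels:
        for block_id, label in view.items():
            if label is not None:
                votes.setdefault(block_id, Counter())[label] += 1
    # Keep the most frequent label for every block that received a vote.
    return {b: c.most_common(1)[0][0] for b, c in votes.items()}


def comment_program(blocks, labels, unknown="unlabeled"):
    """Prefix each code block with its label as a comment line.

    blocks: list of (block_id, code) pairs in program order.
    The '//' comment syntax is an assumption about the CAD language.
    """
    lines = []
    for block_id, code in blocks:
        lines.append(f"// {labels.get(block_id, unknown)}")
        lines.append(code)
    return "\n".join(lines)


def block_accuracy(predicted, ground_truth):
    """Fraction of ground-truth blocks whose predicted label matches."""
    correct = sum(predicted.get(b) == l for b, l in ground_truth.items())
    return correct / len(ground_truth)
```

A hypothetical usage: with three views of a two-block chair program, `distill_labels` keeps the label that most views agree on for each block, `comment_program` writes those labels back into the source, and `block_accuracy` compares them with ground-truth comments.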