Abstract
We introduce BrickGPT, the first approach for generating physically stable toy brick models from text prompts. To achieve this, we construct a large-scale, physically stable dataset of brick designs, along with their associated captions, and train an autoregressive large language model to predict the next brick to add via next-token prediction. To improve the stability of the resulting designs, we employ an efficient validity check and physics-aware rollback during autoregressive inference, which prunes infeasible token predictions using physics laws and assembly constraints. Our experiments show that BrickGPT produces stable, diverse, and aesthetically pleasing brick designs that align closely with the input text prompts. We also develop a text-based brick texturing method to generate colored and textured designs. We show that our designs can be assembled manually by humans and automatically by robotic arms. We also release our new dataset, StableText2Brick, containing over 47,000 brick structures of over 28,000 unique 3D objects accompanied by detailed captions, along with our code and models.
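The abstract describes pruning infeasible next-brick predictions and rolling back to a stable prefix during autoregressive inference. The following is a minimal sketch of such an inference loop, under assumptions: `sample_brick` stands in for the language model's next-brick prediction, the `(x, y, z, width, depth)` brick encoding is illustrative, and `is_valid`/`is_stable` are simplified placeholders for the paper's assembly and physics checks; none of these names come from the released BrickGPT code.

```python
# Sketch of validity checking and physics-aware rollback during autoregressive
# inference. All names and the brick format are illustrative assumptions.
import random
from typing import Callable, List, Tuple

Brick = Tuple[int, int, int, int, int]  # (x, y, z, width, depth) on a fixed grid


def is_valid(brick: Brick, structure: List[Brick]) -> bool:
    """Assumed validity check: brick stays inside the build volume and is not
    a duplicate of an already-placed brick (collision handling simplified)."""
    x, y, z, w, d = brick
    inside = 0 <= x and 0 <= y and 0 <= z and x + w <= 20 and y + d <= 20 and z < 20
    return inside and brick not in structure


def is_stable(structure: List[Brick]) -> bool:
    """Stand-in for a physics-based stability analysis of the whole structure:
    every brick must rest on the ground or have some brick one layer below."""
    return all(
        z == 0 or any(b[2] == z - 1 for b in structure)
        for (_, _, z, _, _) in structure
    )


def generate(sample_brick: Callable[[List[Brick]], Brick],
             max_bricks: int = 50, max_retries: int = 10) -> List[Brick]:
    structure: List[Brick] = []
    checkpoints: List[int] = [0]  # lengths of known-stable prefixes
    for _ in range(max_bricks):
        for _ in range(max_retries):
            brick = sample_brick(structure)       # next-brick prediction
            if not is_valid(brick, structure):
                continue                          # prune infeasible predictions
            structure.append(brick)
            if is_stable(structure):
                checkpoints.append(len(structure))
                break
            structure.pop()                       # reject an unstable addition
        else:
            # Physics-aware rollback: discard recent bricks and resume
            # generation from the previous stable prefix.
            if len(checkpoints) > 1:
                checkpoints.pop()
            structure = structure[:checkpoints[-1]]
    return structure


if __name__ == "__main__":
    rng = random.Random(0)

    def random_brick(structure: List[Brick]) -> Brick:
        # Toy stand-in for the model: propose a random 2x2 brick.
        return (rng.randrange(18), rng.randrange(18), rng.randrange(5), 2, 2)

    print(len(generate(random_brick)), "bricks placed")
```

In this sketch, rejection sampling handles locally infeasible bricks, while the checkpoint stack lets generation back up to an earlier stable prefix when no valid continuation is found; the actual system may interleave these checks with token-level decoding differently.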
StableText2Brick dataset

BrickGPT pipeline

Step-by-step generation of brick structures from text

Automated assembly of generated brick structures using robots (8x speed)

Generated brick structures assembled by humans