GenesisTex2: Stable, Consistent and High-Quality Text-to-Texture Generation

Jiawei Lu1,2#, Yingpeng Zhang2#*, Zengjun Zhao2, He Wang3, Kun Zhou1, Tianjia Shao1*
#Equal contribution
*Corresponding author


1State Key Lab of CAD&CG, Zhejiang University
2Tencent IEG
3AI Centre, Computer Science, University College London

AAAI 2025

Texturing results with GenesisTex2

Abstract

Large-scale text-guided image diffusion models have demonstrated remarkable results in text-to-image (T2I) generation. However, applying these models to synthesize textures for 3D geometries remains challenging due to the domain gap between 2D images and textures on a 3D surface. Early works that used a projecting-inpainting approach managed to preserve generation diversity, but often resulted in noticeable artifacts and style inconsistencies. While recent methods have attempted to address these inconsistencies, they often introduce other issues, such as blurring, over-saturation, or over-smoothing. To overcome these challenges, we propose a novel text-to-texture synthesis framework that takes advantage of pre-trained diffusion models. We introduce a local attention reweighting mechanism in the self-attention layers to guide the model in focusing on spatially correlated patches across different views, thereby enhancing local details while preserving cross-view consistency. Additionally, we propose a novel latent space merge pipeline, which further ensures consistency across different viewpoints without sacrificing too much diversity. Our method significantly outperforms existing state-of-the-art techniques in terms of texture consistency and visual quality, while delivering results much faster than distillation-based methods. Importantly, our framework does not require additional training or fine-tuning, making it highly adaptable to a wide range of models available on public platforms.
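The abstract does not spell out how the latent space merge works. One simple way to picture a merge of this kind, under assumptions that are ours rather than the paper's, is to scatter each view's latent pixels into a shared texel buffer, take a per-texel weighted average, and write the averaged values back into every view so that overlapping regions agree. In the NumPy sketch below, `merge_latents`, the precomputed `texel_ids` rasterization map and the per-pixel `weights` (for instance a view-angle confidence) are all hypothetical.

```python
import numpy as np


def merge_latents(latents, texel_ids, weights, num_texels):
    """Hypothetical texture-space merge of per-view latents (not the paper's exact pipeline).

    latents:   list of (P, C) arrays, one flattened latent image per view
    texel_ids: list of (P,) int arrays mapping each latent pixel to a texel (-1 = background)
    weights:   list of (P,) non-negative arrays, e.g. an assumed view-angle confidence
    """
    channels = np.asarray(latents[0]).shape[1]
    acc = np.zeros((num_texels, channels))   # weighted sum of latents per texel
    wsum = np.zeros(num_texels)              # total weight per texel
    for z, ids, w in zip(latents, texel_ids, weights):
        z = np.asarray(z, dtype=float)
        w = np.asarray(w, dtype=float)
        ids = np.asarray(ids)
        mask = ids >= 0                      # skip background pixels
        # np.add.at accumulates correctly even when several pixels hit the same texel
        np.add.at(acc, ids[mask], z[mask] * w[mask, None])
        np.add.at(wsum, ids[mask], w[mask])
    # Per-texel weighted average; texels seen by no view stay zero
    texture = np.where(wsum[:, None] > 0, acc / np.maximum(wsum, 1e-8)[:, None], 0.0)
    merged = []
    for z, ids in zip(latents, texel_ids):
        out = np.array(z, dtype=float)       # copy so the caller's array is untouched
        ids = np.asarray(ids)
        mask = ids >= 0
        out[mask] = texture[ids[mask]]       # write the shared texel value back into this view
        merged.append(out)
    return texture, merged
```

Because every view reads back the same averaged texel, pixels that see the same surface point become identical after the merge, which is the consistency property. How strongly this kind of averaging erodes diversity depends on when and how often it is applied during denoising, which the sketch does not model.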


How does it work?

Texturing Pipeline

Given a mesh and a text prompt, we aim to produce textures that depict the prompt well and suit the shape. To achieve this, we propose a local attention technique that enhances local details by reweighting the original self-attention layers according to the 3D shape. In addition, we introduce a consistent texture synthesis framework that stably generates consistent, high-quality textures.
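The page does not give the reweighting formula. As a rough illustration only, the NumPy sketch below assumes that every latent token from every view is tagged with the 3D surface point it sees (`pos`, hypothetical) and adds a Gaussian proximity bonus to the attention logits before the softmax, so that patches landing near each other on the mesh attend to each other more strongly. The function name and the `sigma` and `strength` parameters are hypothetical, not GenesisTex2's implementation.

```python
import numpy as np


def local_attention_reweight(q, k, v, pos, sigma=0.1, strength=2.0):
    """Hypothetical 3D-aware self-attention over tokens pooled from all views.

    q, k, v: (N, d) query/key/value arrays for N tokens
    pos:     (N, 3) assumed 3D surface position seen by each token
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)                        # standard scaled dot-product logits
    # Pairwise squared 3D distances between the surface points of all token pairs
    dist2 = np.sum((pos[:, None, :] - pos[None, :, :]) ** 2, axis=-1)
    locality = np.exp(-dist2 / (2.0 * sigma ** 2))       # 1 for coincident points, near 0 far away
    logits = logits + strength * locality                # boost spatially correlated pairs
    logits = logits - logits.max(axis=-1, keepdims=True) # subtract row max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ v, weights
```

Setting `strength=0` recovers ordinary scaled dot-product attention. Adding the bias to the logits rather than to the probabilities keeps each row a valid distribution after the softmax, so the layer's output scale is unchanged.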

Check out the paper to learn more 🤓