Customizing 360-Degree Panoramas through Text-to-Image Diffusion Models

1University College London, 2Meta Reality Labs

Abstract

Personalized text-to-image (T2I) synthesis based on diffusion models has attracted significant attention in recent research. However, existing methods primarily concentrate on customizing subjects or styles, neglecting the exploration of global geometry. In this study, we propose an approach that focuses on the customization of 360-degree panoramas, which inherently possess global geometric properties, using a T2I diffusion model. To achieve this, we curate a paired image-text dataset specifically designed for the task and subsequently employ it to fine-tune a pre-trained T2I diffusion model with LoRA. Nevertheless, the fine-tuned model alone does not ensure the continuity between the leftmost and rightmost sides of the synthesized images, a crucial characteristic of 360-degree panoramas. To address this issue, we propose a method called StitchDiffusion. Specifically, we perform pre-denoising operations twice at each time step of the denoising process on the stitch block consisting of the leftmost and rightmost image regions. Furthermore, a global cropping is adopted to synthesize seamless 360-degree panoramas. Experimental results demonstrate the effectiveness of our customized model combined with the proposed StitchDiffusion in generating high-quality 360-degree panoramic images. Moreover, our customized model exhibits exceptional generalization ability in producing scenes unseen in the fine-tuning dataset.

Synthesized 360-degree Panoramas with Given Text Prompts

To make the continuity (or discontinuity) between the leftmost and rightmost sides of a generated image easy to see, we copy the leftmost area indicated by the red dashed box and paste it onto the rightmost side of the image.

Method


Overview of our proposed StitchDiffusion for generating 360-degree panoramas. (a) At each time step t of the denoising process, the H×W stitch block, formed by the leftmost (H×½W) and rightmost (H×½W) regions of the image Jt, undergoes pre-denoising operations twice; here W is twice H. (b) The global cropping, denoted by the blue dashed box, is applied to the final denoised result J0 to obtain the 360-degree panorama Jsyn. Note that if the image J0 (H×4H) is split down the middle into two equal halves, the left half (H×2H) of J0 is identical to the right half (H×2H), which ensures continuity between the leftmost and rightmost sides of the Jsyn obtained by global cropping.
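The caption above can be illustrated with a minimal sketch of the latent-space loop: at each step, overlapping windows over the extended canvas are denoised and averaged (MultiDiffusion-style), the wrap-around stitch block built from the leftmost and rightmost regions is denoised in two extra passes, and a 2H-wide global crop of J0 yields the panorama. This is a simplified illustration, not the authors' implementation: `denoise_fn` is a hypothetical stand-in for the diffusion U-Net, arrays replace VAE latents, and the "twice" pre-denoising is modeled as two passes on the stitch block.

```python
import numpy as np

def stitch_denoise(latent, denoise_fn, timesteps, window=8, stride=4):
    """Sketch of StitchDiffusion's denoising loop (illustrative only).

    latent     -- (H, W) array standing in for the extended canvas Jt, W = 4H.
    denoise_fn -- dummy per-window denoiser, denoise_fn(x, t) -> array like x
                  (hypothetical signature replacing the real diffusion model).
    """
    H, W = latent.shape
    for t in timesteps:
        acc = np.zeros_like(latent)   # accumulated window predictions
        cnt = np.zeros_like(latent)   # per-pixel overlap counts for averaging
        # Regular overlapping windows across the canvas.
        for x0 in range(0, W - window + 1, stride):
            acc[:, x0:x0 + window] += denoise_fn(latent[:, x0:x0 + window], t)
            cnt[:, x0:x0 + window] += 1
        # Stitch block: rightmost and leftmost half-windows concatenated so the
        # model sees the seam as one continuous region; pre-denoised twice
        # per time step (simplified here as two consecutive passes).
        half = window // 2
        block = np.concatenate([latent[:, -half:], latent[:, :half]], axis=1)
        for _ in range(2):
            block = denoise_fn(block, t)
        acc[:, -half:] += block[:, :half]; cnt[:, -half:] += 1
        acc[:, :half] += block[:, half:]; cnt[:, :half] += 1
        latent = acc / cnt            # average overlapping predictions
    return latent

def global_crop(j0):
    # J0 has width 4H with identical left and right halves, so any 2H-wide
    # window (the blue dashed box) gives a seamless panorama J_syn.
    H = j0.shape[0]
    return j0[:, H // 2 : H // 2 + 2 * H]
```

Averaging overlapping windows keeps neighboring regions consistent, while the wrapped stitch block is what lets content flow across the left/right seam before the global crop is taken.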

Visual Comparison

Text Prompt: 360-degree panoramic image, steampunk architecture, futuristic

BibTeX

@inproceedings{wang2024customizing,
  title={Customizing 360-Degree Panoramas through Text-to-Image Diffusion Models},
  author={Wang, Hai and Xiang, Xiaoyu and Fan, Yuchen and Xue, Jing-Hao},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={4933--4943},
  year={2024}
}