MultiEditor: Controllable Multimodal Object Editing for Driving Scenarios Using 3D Gaussian Splatting Priors

1School of Automotive Studies, Tongji University
2Mach Drive

*Project leader · Corresponding author
Figure 8: MultiEditor demonstrates high controllability and flexible editing of complex-shaped vehicles. A roller vehicle is inserted into the scene at 45° heading intervals, showcasing consistent and precise object editing.

Abstract

Autonomous driving systems rely heavily on multimodal perception data to understand complex environments. However, the long-tailed distribution of real-world data hinders generalization, especially for rare but safety-critical vehicle categories. To address this challenge, we propose MultiEditor, a dual-branch latent diffusion framework designed to jointly edit images and LiDAR point clouds in driving scenarios. At the core of our approach is the introduction of 3D Gaussian Splatting (3DGS) as a structural and appearance prior for target objects. Leveraging this prior, we design a multi-level appearance control mechanism—comprising pixel-level pasting, semantic-level guidance, and multi-branch refinement—to achieve high-fidelity reconstruction across modalities. We further propose a depth-guided deformable cross-modality condition module that adaptively enables mutual guidance between modalities using 3DGS-rendered depth, significantly enhancing cross-modality consistency. Extensive experiments demonstrate that MultiEditor achieves superior performance in visual and geometric fidelity, editing controllability, and cross-modality consistency. Furthermore, generating rare-category vehicle data with MultiEditor substantially improves the detection accuracy of perception models on underrepresented classes.
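To give an intuition for the first stage of the multi-level appearance control described above, the sketch below illustrates pixel-level pasting: a 3DGS-rendered object patch is alpha-composited into the scene image to provide a coarse appearance prior for the diffusion branches. This is a hypothetical, simplified illustration; the function name, data layout, and shapes are our own assumptions, not the authors' implementation.

```python
# Illustrative sketch (NOT the authors' code) of the pixel-level pasting
# stage: alpha-composite a 3DGS-rendered object patch into a scene image.

def paste_rendered_object(scene, patch, alpha, top, left):
    """Alpha-composite `patch` onto `scene` at row `top`, column `left`.

    scene: nested lists [rows][cols][3], RGB values in [0, 1]
    patch: nested lists [h][w][3], the 3DGS rendering of the object
    alpha: nested lists [h][w], rendered opacity in [0, 1]
    Returns a new image; pixels outside the scene bounds are skipped.
    """
    out = [[list(px) for px in row] for row in scene]  # deep copy
    for i, (prow, arow) in enumerate(zip(patch, alpha)):
        for j, (pval, a) in enumerate(zip(prow, arow)):
            y, x = top + i, left + j
            if 0 <= y < len(out) and 0 <= x < len(out[0]):
                # standard "over" compositing: a*foreground + (1-a)*background
                out[y][x] = [a * p + (1 - a) * s
                             for p, s in zip(pval, out[y][x])]
    return out

# Toy example: paste a 1x1 fully opaque red "object" into a 2x2 gray image.
scene = [[[0.5, 0.5, 0.5] for _ in range(2)] for _ in range(2)]
patch = [[[1.0, 0.0, 0.0]]]
alpha = [[1.0]]
edited = paste_rendered_object(scene, patch, alpha, 0, 1)
# edited[0][1] is now red; all other pixels are unchanged.
```

In the full pipeline described in the abstract, this coarse composite would then be refined by the semantic-level guidance and multi-branch refinement stages, so pasting artifacts need not be perfect at this step.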

The Proposed Method

Qualitative Visualization

Downstream Task Benefits

BibTeX

@misc{lu2025multieditorcontrollablemultimodalobject,
  title={MultiEditor: Controllable Multimodal Object Editing for Driving Scenarios Using 3D Gaussian Splatting Priors},
  author={Shouyi Lu and Zihan Lin and Chao Lu and Huanran Wang and Guirong Zhuo and Lianqing Zheng},
  year={2025},
  eprint={2507.21872},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2507.21872}
}