Rec2Real: Semantics-Guided Photo-Realistic Image Synthesis Using Rough Urban Reconstruction Models

Miao, Hui; Lu, Feixiang; Xu, Tiancheng; Zhang, Liangjun; Zhou, Bin

doi:10.1007/978-3-031-23473-6_29

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13443)

Included in the following conference series:

Computer Graphics International Conference

Abstract

We present a novel and effective pipeline for photo-realistic image generation that uses rough 3D reconstruction models from the Google Earth 3D map. Our goal is to transfer images rendered from the 3D models from the reconstruction style (rec-style) to the photo-realistic style (real-style). To achieve this, we propose a bidirectional transfer approach guided by semantics. Specifically, we first design an unpaired patch-to-patch image translation method that transfers images from real-style to rec-style, which generates paired training data and introduces supervision. Then, we fine-tune an auto-encoder network to transfer images from rec-style to real-style. Our approach automatically generates images from arbitrary camera views together with ground-truth annotations, which can be used in autonomous driving (AD) perception tasks such as 2D detection and instance segmentation. Experiments show that our approach generates diverse and photo-realistic images.
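The first stage described in the abstract, turning real-style images into (rec-style, real-style) training pairs, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `split_into_patches`, `toy_real_to_rec`, and `build_paired_dataset` are hypothetical helpers, the translator is a trivial stand-in (patch averaging) rather than the learned unpaired translation network, and the semantic guidance is omitted entirely.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # grayscale image: rows of floats in [0, 1]


def split_into_patches(image: Image, size: int) -> List[Image]:
    """Cut an image into non-overlapping size x size tiles (ragged borders are dropped)."""
    rows, cols = len(image), len(image[0])
    patches = []
    for top in range(0, rows - size + 1, size):
        for left in range(0, cols - size + 1, size):
            patches.append([row[left:left + size] for row in image[top:top + size]])
    return patches


def toy_real_to_rec(patch: Image) -> Image:
    """Hypothetical stand-in for a learned real-to-rec translator:
    it flattens texture by replacing every pixel with the patch mean."""
    values = [v for row in patch for v in row]
    mean = sum(values) / len(values)
    return [[mean for _ in row] for row in patch]


def build_paired_dataset(
    real_images: List[Image],
    translator: Callable[[Image], Image],
    size: int,
) -> List[Tuple[Image, Image]]:
    """Translate each real-style patch to rec-style and keep both as a pair.
    The pairs are ordered (rec_patch, real_patch), i.e. (input, target)
    for a second-stage model trained in the rec-to-real direction."""
    pairs = []
    for image in real_images:
        for real_patch in split_into_patches(image, size):
            pairs.append((translator(real_patch), real_patch))
    return pairs


# Tiny 4x4 example image, split into four 2x2 patches.
real_image = [
    [0.0, 1.0, 0.2, 0.2],
    [1.0, 0.0, 0.2, 0.2],
    [0.5, 0.5, 0.9, 0.1],
    [0.5, 0.5, 0.1, 0.9],
]
pairs = build_paired_dataset([real_image], toy_real_to_rec, size=2)
```

Because each pair stores the translated patch next to its source, the second stage has pixel-aligned supervision even though the two styles were never captured together. This is the role the abstract assigns to the real-to-rec direction.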

Acknowledgement

We thank the reviewers for their comments, which improved the paper, and Yuexin Ma for her suggestions. This work was supported in part by the National Key Research and Development Program of China (2019YFF0302902) and the National Natural Science Foundation of China (61932003).

Author information

Authors and Affiliations

State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
Hui Miao, Tiancheng Xu & Bin Zhou
Robotics and Autonomous Driving Laboratory, Baidu Research, Nashville, USA
Feixiang Lu & Liangjun Zhang
Peng Cheng Laboratory, Shenzhen, China
Bin Zhou

Corresponding author

Correspondence to Bin Zhou.

Editor information

Editors and Affiliations

University of Geneva, Geneva, Switzerland
Nadia Magnenat-Thalmann
Bournemouth University, Poole, UK
Jian Zhang
University of Sydney, Sydney, NSW, Australia
Jinman Kim
University of Crete, Heraklion, Greece
George Papagiannakis
Shanghai Jiao Tong University, Shanghai, China
Bin Sheng
Swiss Federal Institute of Technology, Lausanne, Switzerland
Daniel Thalmann
University of Calgary, Calgary, AB, Canada
Marina Gavrilova

About this paper

Cite this paper

Miao, H., Lu, F., Xu, T., Zhang, L., Zhou, B. (2022). Rec2Real: Semantics-Guided Photo-Realistic Image Synthesis Using Rough Urban Reconstruction Models. In: Magnenat-Thalmann, N., et al. Advances in Computer Graphics. CGI 2022. Lecture Notes in Computer Science, vol 13443. Springer, Cham. https://doi.org/10.1007/978-3-031-23473-6_29

DOI: https://doi.org/10.1007/978-3-031-23473-6_29
Published: 01 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23472-9
Online ISBN: 978-3-031-23473-6
eBook Packages: Computer Science, Computer Science (R0)
