Abstract
We present a photo-realistic image generation pipeline that uses the rough 3D reconstruction models from the Google Earth 3D map. Our goal is to transfer images rendered from these 3D models from the reconstruction style (rec-style) to a photo-realistic style (real-style). To achieve this, we propose a bidirectional transfer approach guided by semantics. Specifically, we first design an unpaired patch-to-patch image translation method that transfers images from real-style to rec-style; this produces paired training data and thereby introduces supervision. We then fine-tune an auto-encoder network on these pairs to transfer images from rec-style to real-style. Our approach automatically generates images from arbitrary camera views together with ground-truth annotations, which can be used in autonomous driving (AD) perception tasks such as 2D detection and instance segmentation. Experiments show that our approach generates diverse and photo-realistic images.
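The core idea of the bidirectional scheme is that the unpaired direction (real-style to rec-style) is used only to manufacture training pairs, after which the useful direction (rec-style to real-style) can be learned with ordinary supervised regression. The sketch below illustrates only that data flow, under strong simplifying assumptions: `real_to_rec` is a hypothetical hand-written degradation (box blur plus colour quantisation) standing in for the unpaired patch-to-patch translation network, `fit_rec_to_real` fits a linear map over 3x3 pixel neighbourhoods by least squares in place of the fine-tuned auto-encoder, and semantic guidance is omitted. None of these functions come from the paper.

```python
import numpy as np


def real_to_rec(real: np.ndarray, levels: int = 8) -> np.ndarray:
    """HYPOTHETICAL stand-in for the unpaired real->rec translator.

    Mimics the blurry, low-detail look of rough reconstructions with a
    3x3 box blur followed by colour quantisation. Not a learned network.
    """
    h, w = real.shape[:2]
    padded = np.pad(real, ((1, 1), (1, 1), (0, 0)), mode="edge")
    blurred = sum(padded[dy:dy + h, dx:dx + w]
                  for dy in range(3) for dx in range(3)) / 9.0
    return np.round(blurred * (levels - 1)) / (levels - 1)


def build_pairs(real_images):
    """Stage 1: translate each real image to rec-style, yielding (rec, real) pairs."""
    return [(real_to_rec(real), real) for real in real_images]


def patches(img: np.ndarray) -> np.ndarray:
    """Stack each pixel's edge-padded 3x3 neighbourhood (plus a bias) into one row."""
    h, w, c = img.shape
    padded = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    feats = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    x = np.concatenate(feats, axis=-1).reshape(h * w, 9 * c)
    return np.hstack([x, np.ones((h * w, 1))])


def fit_rec_to_real(pairs) -> np.ndarray:
    """Stage 2: supervised rec->real fit on the synthesized pairs (least squares).

    HYPOTHETICAL stand-in for fine-tuning an auto-encoder.
    """
    X = np.vstack([patches(rec) for rec, _ in pairs])
    Y = np.vstack([real.reshape(-1, real.shape[-1]) for _, real in pairs])
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def rec_to_real(rec: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Apply the learned rec->real mapping and clip to the valid [0, 1] range."""
    h, w, c = rec.shape
    return np.clip(patches(rec) @ W, 0.0, 1.0).reshape(h, w, c)


# Usage: smooth synthetic "real" images (random 8x8 colours upsampled 4x).
rng = np.random.default_rng(0)
real_images = [np.kron(rng.random((8, 8, 3)), np.ones((4, 4, 1))) for _ in range(4)]
pairs = build_pairs(real_images)
W = fit_rec_to_real(pairs)
restored = rec_to_real(pairs[0][0], W)
```

Because the identity mapping lies inside this linear model and clipping can only move predictions toward targets in [0, 1], the fitted restorer is guaranteed not to do worse than the raw rec-style input on the training pairs; a practical system would replace both stand-ins with learned networks.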
Acknowledgement
We thank the reviewers for their comments, which helped improve the paper, and Yuexin Ma for her suggestions. This work was supported in part by the National Key Research and Development Program of China (2019YFF0302902) and the National Natural Science Foundation of China (61932003).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Miao, H., Lu, F., Xu, T., Zhang, L., Zhou, B. (2022). Rec2Real: Semantics-Guided Photo-Realistic Image Synthesis Using Rough Urban Reconstruction Models. In: Magnenat-Thalmann, N., et al. Advances in Computer Graphics. CGI 2022. Lecture Notes in Computer Science, vol 13443. Springer, Cham. https://doi.org/10.1007/978-3-031-23473-6_29
DOI: https://doi.org/10.1007/978-3-031-23473-6_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23472-9
Online ISBN: 978-3-031-23473-6
eBook Packages: Computer Science, Computer Science (R0)