Abstract
In this paper, we present a novel approach to image generation using two convolutional neural networks (CNNs) operating in a complementary manner. One CNN extracts features from a dataset of images, while the other synthesizes a random image based on the extracted features. The resulting image, distinct from every image in the dataset, retains the overall characteristics of the original set. To demonstrate the effectiveness of our approach, we employ two distinct datasets: one containing human faces and the other featuring various animal species. Our method outperforms state-of-the-art techniques in terms of accuracy, F1 score, recall, peak signal-to-noise ratio (PSNR), and mean squared error (MSE). We also provide a comprehensive review of related work on image generation and CNNs, and we analyze the advantages and disadvantages of the proposed dual-pipeline CNN method. The results indicate that our approach is a promising alternative for generating high-quality, unique images, with potential applications in various domains, including computer vision, graphics, and data augmentation. Future work will focus on improving the method's efficiency, extending its applicability to other domains, and exploring additional evaluation metrics.
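The dual-pipeline idea described above can be sketched in code: one network maps images to a feature vector, and a second network maps those features (plus noise, so the output is not a memorized sample) back to an image. This is a minimal illustrative sketch in PyTorch; the class names, layer sizes, and image resolution are assumptions for demonstration and do not reproduce the paper's actual architecture.

```python
import torch
import torch.nn as nn


class FeatureExtractor(nn.Module):
    """Pipeline 1: encode an image batch into compact feature vectors."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1),   # 64x64 -> 32x32
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),  # 32x32 -> 16x16
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                    # global pooling
            nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, x):
        return self.net(x)


class Synthesizer(nn.Module):
    """Pipeline 2: decode features plus noise into a new image."""

    def __init__(self, feat_dim: int = 128, noise_dim: int = 32):
        super().__init__()
        self.fc = nn.Linear(feat_dim + noise_dim, 64 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # 8 -> 16
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),  # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),   # 32 -> 64
            nn.Tanh(),  # pixel values in [-1, 1]
        )

    def forward(self, feats, noise):
        h = self.fc(torch.cat([feats, noise], dim=1)).view(-1, 64, 8, 8)
        return self.net(h)


extractor = FeatureExtractor()
synthesizer = Synthesizer()

images = torch.randn(4, 3, 64, 64)      # stand-in for a dataset batch
feats = extractor(images)               # pipeline 1: feature extraction
noise = torch.randn(4, 32)              # randomness -> image is unique
generated = synthesizer(feats, noise)   # pipeline 2: image synthesis
print(tuple(generated.shape))           # (4, 3, 64, 64)
```

In practice the two networks would be trained jointly, e.g. with a reconstruction or perceptual loss, so that generated images match the dataset's overall characteristics without duplicating any single sample.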
Availability of Data and Material
Not applicable.
Acknowledgements
Not applicable.
Funding
This work received no specific funding.
Author information
Authors and Affiliations
Contributions
The authors contributed significantly to the research and to the preparation of this paper; the first author is the main contributor.
Corresponding author
Ethics declarations
Conflicts of interest
The authors declare that they have no conflicts of interest to report regarding the present study.
Informed Consent
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Al-Obaidi, H.S.H., Kurnaz, S. Divergent CNN Architectures for Novel Image Generation: A Dual-Pipeline Approach. Wireless Pers Commun (2023). https://doi.org/10.1007/s11277-023-10758-w
Accepted:
Published:
DOI: https://doi.org/10.1007/s11277-023-10758-w