Search | arXiv e-print repository

Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models

Authors: Alessandro Pianese, Davide Cozzolino, Giovanni Poggi, Luisa Verdoliva

Abstract: Generalization is a main issue for current audio deepfake detectors, which struggle to provide reliable results on out-of-distribution data. Given the speed at which more and more accurate synthesis methods are developed, it is very important to design techniques that work well also on data they were not trained for. In this paper we study the potential of large-scale pre-trained models for audio deepfake detection, with special focus on generalization ability. To this end, the detection problem is reformulated in a speaker verification framework and fake audios are exposed by the mismatch between the voice sample under test and the voice of the claimed identity. With this paradigm, no fake speech sample is necessary in training, cutting off any link with the generation method at the root, and ensuring full generalization ability. Features are extracted by general-purpose large pre-trained models, with no need for training or fine-tuning on specific fake detection or speaker verification datasets. At detection time only a limited set of voice fragments of the identity under test is required. Experiments on several datasets widespread in the community show that detectors based on pre-trained models achieve excellent performance and show strong generalization ability, rivaling supervised methods on in-distribution data and largely overcoming them on out-of-distribution data.

Submitted 6 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.
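
The verification-style test described in the abstract above (compare the sample under test against a small reference set of the claimed identity) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: embeddings are assumed to be plain lists of floats already produced by some pre-trained model, and the names `identity_score`, `is_likely_fake`, the max-similarity rule, and the threshold of 0.5 are all hypothetical choices.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def identity_score(test_embedding: Sequence[float],
                   reference_embeddings: Sequence[Sequence[float]]) -> float:
    """Highest similarity between the test sample and the reference set.

    Taking the maximum is an assumption of this sketch; averaging the
    similarities would be another reasonable choice.
    """
    if not reference_embeddings:
        raise ValueError("at least one reference embedding is required")
    return max(cosine_similarity(test_embedding, ref)
               for ref in reference_embeddings)


def is_likely_fake(test_embedding: Sequence[float],
                   reference_embeddings: Sequence[Sequence[float]],
                   threshold: float = 0.5) -> bool:
    """Flag the sample when it does not match the claimed identity.

    The threshold is a hypothetical value; in practice it would be
    calibrated on real speech only.
    """
    return identity_score(test_embedding, reference_embeddings) < threshold


# Toy 3-dimensional embeddings standing in for real model outputs.
references = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
print(is_likely_fake([0.95, 0.05, 0.0], references))  # close to the references
print(is_likely_fake([0.0, 1.0, 0.0], references))    # far from the references
```

Note that no fake sample appears anywhere in this sketch: the decision depends only on real reference material for the claimed identity, which is the property the abstract emphasizes.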

arXiv:2405.00196

Synthetic Image Verification in the Era of Generative AI: What Works and What Isn't There Yet

Authors: Diangarti Tariang, Riccardo Corvi, Davide Cozzolino, Giovanni Poggi, Koki Nagano, Luisa Verdoliva

Abstract: In this work we present an overview of approaches for the detection and attribution of synthetic images and highlight their strengths and weaknesses. We also point out and discuss hot topics in this field and outline promising directions for future research.

Submitted 30 April, 2024; originally announced May 2024.

arXiv:2312.00195

Raising the Bar of AI-generated Image Detection with CLIP

Authors: Davide Cozzolino, Giovanni Poggi, Riccardo Corvi, Matthias Nießner, Luisa Verdoliva

Abstract: The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images. We develop a lightweight detection strategy based on CLIP features and study its performance in a wide variety of challenging scenarios. We find that, contrary to previous beliefs, it is neither necessary nor convenient to use a large domain-specific dataset for training. On the contrary, by using only a handful of example images from a single generative model, a CLIP-based detector exhibits surprising generalization ability and high robustness across different architectures, including recent commercial tools such as Dalle-3, Midjourney v5, and Firefly. We match the state-of-the-art (SoTA) on in-distribution data and significantly improve upon it in terms of generalization to out-of-distribution data (+6% AUC) and robustness to impaired/laundered data (+13%). Our project is available at https://grip-unina.github.io/ClipBased-SyntheticImageDetection/

Submitted 29 April, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

arXiv:2309.12428

Synthetic Image Detection: Highlights from the IEEE Video and Image Processing Cup 2022 Student Competition

Authors: Davide Cozzolino, Koki Nagano, Lucas Thomaz, Angshul Majumdar, Luisa Verdoliva

Abstract: The Video and Image Processing (VIP) Cup is a student competition that takes place each year at the IEEE International Conference on Image Processing. The 2022 IEEE VIP Cup asked undergraduate students to develop a system capable of distinguishing pristine images from generated ones. The interest in this topic stems from the incredible advances in the AI-based generation of visual data, with tools that allow the synthesis of highly realistic images and videos. While this opens up a large number of new opportunities, it also undermines the trustworthiness of media content and fosters the spread of disinformation on the internet. Recently there was strong concern about the generation of extremely realistic images by means of editing software that includes the recent technology on diffusion models. In this context, there is a need to develop robust and automatic tools for synthetic image detection.

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2309.07973

M3Dsynth: A dataset of medical 3D images with AI-generated local manipulations

Authors: Giada Zingarini, Davide Cozzolino, Riccardo Corvi, Giovanni Poggi, Luisa Verdoliva

Abstract: The ability to detect manipulated visual content is becoming increasingly important in many application fields, given the rapid advances in image synthesis methods. Of particular concern is the possibility of modifying the content of medical images, altering the resulting diagnoses. Despite its relevance, this issue has received limited attention from the research community. One reason is the lack of large and curated datasets to use for development and benchmarking purposes. Here, we investigate this issue and propose M3Dsynth, a large dataset of manipulated Computed Tomography (CT) lung images. We create manipulated images by injecting or removing lung cancer nodules in real CT scans, using three different methods based on Generative Adversarial Networks (GAN) or Diffusion Models (DM), for a total of 8,577 manipulated samples. Experiments show that these images easily fool automated diagnostic tools. We also tested several state-of-the-art forensic detectors and demonstrated that, once trained on the proposed dataset, they are able to accurately detect and localize manipulated synthetic content, even when training and test sets are not aligned, showing good generalization ability. Dataset and code are publicly available at https://grip-unina.github.io/M3Dsynth/.

Submitted 1 February, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

arXiv:2304.06408

Intriguing properties of synthetic images: from generative adversarial networks to diffusion models

Authors: Riccardo Corvi, Davide Cozzolino, Giovanni Poggi, Koki Nagano, Luisa Verdoliva

Abstract: Detecting fake images is becoming a major goal of computer vision. This need is becoming more and more pressing with the continuous improvement of synthesis methods based on Generative Adversarial Networks (GAN), and even more with the appearance of powerful methods based on Diffusion Models (DM). Towards this end, it is important to gain insight into which image features better discriminate fake images from real ones. In this paper we report on our systematic study of a large number of image generators of different families, aimed at discovering the most forensically relevant characteristics of real and generated images. Our experiments provide a number of interesting observations and shed light on some intriguing properties of synthetic images: (1) not only the GAN models but also the DM and VQ-GAN (Vector Quantized Generative Adversarial Networks) models give rise to visible artifacts in the Fourier domain and exhibit anomalous regular patterns in the autocorrelation; (2) when the dataset used to train the model lacks sufficient variety, its biases can be transferred to the generated images; (3) synthetic and real images exhibit significant differences in the mid-high frequency signal content, observable in their radial and angular spectral power distributions.

Submitted 29 June, 2023; v1 submitted 13 April, 2023; originally announced April 2023.
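
The radial spectral power distribution mentioned in point (3) of the abstract above can be illustrated with a small sketch. It is not the authors' code: the function names `dft2` and `radial_power_profile`, the integer-radius binning, and the use of a naive DFT (chosen only to keep the example dependency-free; a real analysis would use an FFT library) are all assumptions of this illustration.

```python
import cmath
import math
from typing import List, Sequence


def dft2(image: Sequence[Sequence[float]]) -> List[List[complex]]:
    """Naive 2-D discrete Fourier transform, suitable only for tiny images."""
    rows, cols = len(image), len(image[0])
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = -2.0 * math.pi * (u * y / rows + v * x / cols)
                    total += image[y][x] * cmath.exp(1j * angle)
            out[u][v] = total
    return out


def radial_power_profile(image: Sequence[Sequence[float]]) -> List[float]:
    """Mean spectral power per integer radius in the frequency plane.

    Index 0 is the DC term; higher indices correspond to higher spatial
    frequencies. Rounding the radius to the nearest integer is an
    assumption of this sketch.
    """
    rows, cols = len(image), len(image[0])
    spectrum = dft2(image)
    sums = {}
    counts = {}
    for u in range(rows):
        # Map DFT indices to signed frequencies so radius 0 is the DC term.
        fu = u if u <= rows // 2 else u - rows
        for v in range(cols):
            fv = v if v <= cols // 2 else v - cols
            radius = int(round(math.hypot(fu, fv)))
            sums[radius] = sums.get(radius, 0.0) + abs(spectrum[u][v]) ** 2
            counts[radius] = counts.get(radius, 0) + 1
    profile = []
    for radius in range(max(sums) + 1):
        if radius in counts:
            profile.append(sums[radius] / counts[radius])
        else:
            profile.append(0.0)  # no frequency sample fell in this bin
    return profile


# A pixel-level checkerboard stands in for a strong periodic artifact.
checkerboard = [[(-1.0) ** (x + y) for x in range(4)] for y in range(4)]
print(radial_power_profile(checkerboard))
```

For a constant image all power falls in bin 0, while for an even-sized square checkerboard like the one above it concentrates in the highest-radius bin. Comparing such profiles between real and generated images is one simple way to look for the mid-high frequency differences the abstract reports.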

arXiv:2212.10957

TruFor: Leveraging all-round clues for trustworthy image forgery detection and localization

Authors: Fabrizio Guillaro, Davide Cozzolino, Avneesh Sud, Nicholas Dufour, Luisa Verdoliva

Abstract: In this paper we present TruFor, a forensic framework that can be applied to a large variety of image manipulation methods, from classic cheapfakes to more recent manipulations based on deep learning. We rely on the extraction of both high-level and low-level traces through a transformer-based fusion architecture that combines the RGB image and a learned noise-sensitive fingerprint. The latter learns to embed the artifacts related to the camera internal and external processing by training only on real data in a self-supervised manner. Forgeries are detected as deviations from the expected regular pattern that characterizes each pristine image. Looking for anomalies makes the approach able to robustly detect a variety of local manipulations, ensuring generalization. In addition to a pixel-level localization map and a whole-image integrity score, our approach outputs a reliability map that highlights areas where localization predictions may be error-prone. This is particularly important in forensic applications in order to reduce false alarms and allow for a large scale analysis. Extensive experiments on several datasets show that our method is able to reliably detect and localize both cheapfakes and deepfakes manipulations outperforming state-of-the-art works. Code is publicly available at https://grip-unina.github.io/TruFor/

Submitted 25 May, 2023; v1 submitted 21 December, 2022; originally announced December 2022.

arXiv:2211.00680

On the detection of synthetic images generated by diffusion models

Authors: Riccardo Corvi, Davide Cozzolino, Giada Zingarini, Giovanni Poggi, Koki Nagano, Luisa Verdoliva

Abstract: Over the past decade, there has been tremendous progress in creating synthetic media, mainly thanks to the development of powerful methods based on generative adversarial networks (GAN). Very recently, methods based on diffusion models (DM) have been gaining the spotlight. In addition to providing an impressive level of photorealism, they enable the creation of text-based visual content, opening up new and exciting opportunities in many different application fields, from arts to video games. On the other hand, this property is an additional asset in the hands of malicious users, who can generate and distribute fake media perfectly adapted to their attacks, posing new challenges to the media forensic community. With this work, we seek to understand how difficult it is to distinguish synthetic images generated by diffusion models from pristine ones and whether current state-of-the-art detectors are suitable for the task. To this end, first we expose the forensics traces left by diffusion models, then study how current detectors, developed for GAN-generated images, perform on these new synthetic images, especially in challenging social-networks scenarios involving image compression and resizing. Datasets and code are available at github.com/grip-unina/DMimageDetection.

Submitted 1 November, 2022; originally announced November 2022.

arXiv:2210.02227

Comprint: Image Forgery Detection and Localization using Compression Fingerprints

Authors: Hannes Mareen, Dante Vanden Bussche, Fabrizio Guillaro, Davide Cozzolino, Glenn Van Wallendael, Peter Lambert, Luisa Verdoliva

Abstract: Manipulation tools that realistically edit images are widely available, making it easy for anyone to create and spread misinformation. In an attempt to fight fake news, forgery detection and localization methods were designed. However, existing methods struggle to accurately reveal manipulations found in images on the internet, i.e., in the wild. That is because the type of forgery is typically unknown, in addition to the tampering traces being damaged by recompression. This paper presents Comprint, a novel forgery detection and localization method based on the compression fingerprint or comprint. It is trained on pristine data only, providing generalization to detect different types of manipulation. Additionally, we propose a fusion of Comprint with the state-of-the-art Noiseprint, which utilizes a complementary camera model fingerprint. We carry out an extensive experimental analysis and demonstrate that Comprint has a high level of accuracy on five evaluation datasets that represent a wide range of manipulation types, mimicking in-the-wild circumstances. Most notably, the proposed fusion significantly outperforms state-of-the-art reference methods. As such, Comprint and the fusion Comprint+Noiseprint represent a promising forensics tool to analyze in-the-wild tampered images.

Submitted 5 October, 2022; originally announced October 2022.

Comments: Presented at the Workshop on MultiMedia FORensics in the WILD 2022, held in conjunction with the International Conference on Pattern Recognition (ICPR) 2022

arXiv:2209.14098

Deepfake audio detection by speaker verification

Authors: Alessandro Pianese, Davide Cozzolino, Giovanni Poggi, Luisa Verdoliva

Abstract: Thanks to recent advances in deep learning, sophisticated generation tools exist, nowadays, that produce extremely realistic synthetic speech. However, malicious uses of such tools are possible and likely, posing a serious threat to our society. Hence, synthetic voice detection has become a pressing research topic, and a large variety of detection methods have been recently proposed. Unfortunately, they hardly generalize to synthetic audios generated by tools never seen in the training phase, which makes them unfit to face real-world scenarios. In this work, we aim at overcoming this issue by proposing a new detection approach that leverages only the biometric characteristics of the speaker, with no reference to specific manipulations. Since the detector is trained only on real data, generalization is automatically ensured. The proposed approach can be implemented based on off-the-shelf speaker verification tools. We test several such solutions on three popular test sets, obtaining good performance, high generalization ability, and high robustness to audio impairment.

Submitted 28 September, 2022; originally announced September 2022.

arXiv:2204.03083

Audio-Visual Person-of-Interest DeepFake Detection

Authors: Davide Cozzolino, Alessandro Pianese, Matthias Nießner, Luisa Verdoliva

Abstract: Face manipulation technology is advancing very rapidly, and new methods are being proposed day by day. The aim of this work is to propose a deepfake detector that can cope with the wide variety of manipulation methods and scenarios encountered in the real world. Our key insight is that each person has specific characteristics that a synthetic generator likely cannot reproduce. Accordingly, we extract audio-visual features which characterize the identity of a person, and use them to create a person-of-interest (POI) deepfake detector. We leverage a contrastive learning paradigm to learn the moving-face and audio segment embeddings that are most discriminative for each identity. As a result, when the video and/or audio of a person is manipulated, its representation in the embedding space becomes inconsistent with the real identity, allowing reliable detection. Training is carried out exclusively on real talking-face video; thus, the detector does not depend on any specific manipulation method and yields the highest generalization ability. In addition, our method can detect both single-modality (audio-only, video-only) and multi-modality (audio-video) attacks, and is robust to low-quality or corrupted videos. Experiments on a wide variety of datasets confirm that our method ensures a SOTA performance, especially on low quality videos. Code is publicly available on-line at https://github.com/grip-unina/poi-forensics.

Submitted 18 May, 2023; v1 submitted 6 April, 2022; originally announced April 2022.

arXiv:2112.12606

Towards Universal GAN Image Detection

Authors: Davide Cozzolino, Diego Gragnaniello, Giovanni Poggi, Luisa Verdoliva

Abstract: The ever higher quality and wide diffusion of fake images have spawned a quest for reliable forensic tools. Many GAN image detectors have been proposed recently. In real-world scenarios, however, most of them show limited robustness and generalization ability. Moreover, they often rely on side information not available at test time, that is, they are not universal. We investigate these problems and propose a new GAN image detector based on a limited sub-sampling architecture and a suitable contrastive learning paradigm. Experiments carried out in challenging conditions prove the proposed method to be a first step towards universal GAN image detection, ensuring also good robustness to common image impairments, and good generalization to unseen architectures.

Submitted 23 December, 2021; originally announced December 2021.

arXiv:2112.08739

doi 10.1109/ACCESS.2022.3179116

Forensic Analysis of Synthetically Generated Western Blot Images

Authors: Sara Mandelli, Davide Cozzolino, Edoardo D. Cannas, Joao P. Cardenuto, Daniel Moreira, Paolo Bestagini, Walter J. Scheirer, Anderson Rocha, Luisa Verdoliva, Stefano Tubaro, Edward J. Delp

Abstract: The widespread diffusion of synthetically generated content is a serious threat that needs urgent countermeasures. As a matter of fact, the generation of synthetic content is not restricted to multimedia data like videos, photographs or audio sequences, but covers a significantly vast area that can include biological images as well, such as western blot and microscopic images. In this paper, we focus on the detection of synthetically generated western blot images. These images are largely explored in the biomedical literature and it has been already shown they can be easily counterfeited, with little hope of spotting manipulations by visual inspection or by using standard forensics detectors. To overcome the absence of publicly available data for this task, we create a new dataset comprising more than 14K original western blot images and 24K synthetic western blot images, generated using four different state-of-the-art generation methods. We investigate different strategies to detect synthetic western blots, exploring binary classification methods as well as one-class detectors. In both scenarios, we never exploit synthetic western blot images at training stage. The achieved results show that synthetically generated western blot images can be spotted with good accuracy, even though the exploited detectors are not optimized over synthetic versions of these scientific images. We also test the robustness of the developed detectors against post-processing operations commonly performed on scientific images, showing that we can be robust to JPEG compression and that some generative models are easily recognizable, even though editing might alter the artifacts they leave.

Submitted 1 June, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

arXiv:2104.02617

Are GAN generated images easy to detect? A critical analysis of the state-of-the-art

Authors: Diego Gragnaniello, Davide Cozzolino, Francesco Marra, Giovanni Poggi, Luisa Verdoliva

Abstract: The advent of deep learning has brought a significant improvement in the quality of generated media. However, with the increased level of photorealism, synthetic media are becoming hardly distinguishable from real ones, raising serious concerns about the spread of fake or manipulated information over the Internet. In this context, it is important to develop automated tools to reliably and timely detect synthetic media. In this work, we analyze the state-of-the-art methods for the detection of synthetic images, highlighting the key ingredients of the most successful approaches, and comparing their performance over existing generative architectures. We will devote special attention to realistic and challenging scenarios, like media uploaded on social networks or generated by new and unseen architectures, analyzing the impact of suitable augmentation and training strategies on the detectors' generalization ability.

Submitted 6 April, 2021; originally announced April 2021.

Comments: 7 pages, 5 figures, conference

arXiv:2012.02512

ID-Reveal: Identity-aware DeepFake Video Detection

Authors: Davide Cozzolino, Andreas Rössler, Justus Thies, Matthias Nießner, Luisa Verdoliva

Abstract: A major challenge in DeepFake forgery detection is that state-of-the-art algorithms are mostly trained to detect a specific fake method. As a result, these approaches show poor generalization across different types of facial manipulations, e.g., from face swapping to facial reenactment. To this end, we introduce ID-Reveal, a new approach that learns temporal facial features, specific to how a person moves while talking, by means of metric learning coupled with an adversarial training strategy. The advantage is that we do not need any training data of fakes, but only train on real videos. Moreover, we utilize high-level semantic features, which enables robustness to widespread and disruptive forms of post-processing. We perform a thorough experimental analysis on several publicly available benchmarks. Compared to the state of the art, our method improves generalization and is more robust to low-quality videos, which are usually spread over social networks. In particular, we obtain an average improvement of more than 15% in terms of accuracy for facial reenactment on highly compressed videos.

Submitted 20 August, 2021; v1 submitted 4 December, 2020; originally announced December 2020.

Comments: Video: https://www.youtube.com/watch?v=RsFxsOLvRdY

arXiv:2001.11847

doi 10.1109/LSP.2020.3008855

CNN-based fast source device identification

Authors: Sara Mandelli, Davide Cozzolino, Paolo Bestagini, Luisa Verdoliva, Stefano Tubaro

Abstract: Source identification is an important topic in image forensics, since it allows tracing back the origin of an image. This represents precious information to claim intellectual property but also to reveal the authors of illicit materials. In this paper we address the problem of device identification based on sensor noise and propose a fast and accurate solution using convolutional neural networks (CNNs). Specifically, we propose a 2-channel-based CNN that learns a way of comparing camera fingerprint and image noise at patch level. The proposed solution turns out to be much faster than the conventional approach and to ensure an increased accuracy. This makes the approach particularly suitable in scenarios where large databases of images are analyzed, like over social networks. In this vein, since images uploaded on social media usually undergo at least two compression stages, we include investigations on double JPEG compressed images, always reporting higher accuracy than standard approaches.

Submitted 8 July, 2020; v1 submitted 31 January, 2020; originally announced January 2020.

arXiv:2001.06440

Combining PRNU and noiseprint for robust and efficient device source identification

Authors: Davide Cozzolino, Francesco Marra, Diego Gragnaniello, Giovanni Poggi, Luisa Verdoliva

Abstract: PRNU-based image processing is a key asset in digital multimedia forensics. It allows for reliable device identification and effective detection and localization of image forgeries, in very general conditions. However, performance degrades significantly in challenging conditions involving low quality and quantity of data. These include working on compressed and cropped images, or estimating the camera PRNU pattern based on only a few images. To boost the performance of PRNU-based analyses in such conditions we propose to leverage the image noiseprint, a recently proposed camera-model fingerprint that has proved effective for several forensic tasks. Numerical experiments on datasets widely used for source identification prove that the proposed method ensures a significant performance improvement in a wide range of challenging situations.

Submitted 17 January, 2020; originally announced January 2020.

arXiv:1911.12069

SpoC: Spoofing Camera Fingerprints

Authors: Davide Cozzolino, Justus Thies, Andreas Rössler, Matthias Nießner, Luisa Verdoliva

Abstract: Thanks to the fast progress in synthetic media generation, creating realistic false images has become very easy. Such images can be used to wrap "rich" fake news with enhanced credibility, spawning a new wave of high-impact, high-risk misinformation campaigns. Therefore, there is a fast-growing interest in reliable detectors of manipulated media. The most powerful detectors, to date, rely on the subtle traces left by any device on all images acquired by it. In particular, due to proprietary in-camera processes, like demosaicing or compression, each camera model leaves trademark traces that can be exploited for forensic analyses. The absence or distortion of such traces in the target image is a strong hint of manipulation. In this paper, we challenge such detectors to gain better insight into their vulnerabilities. This is an important study in order to build better forgery detectors able to face malicious attacks. Our proposal consists of a GAN-based approach that injects camera traces into synthetic images. Given a GAN-generated image, we insert the traces of a specific camera model into it and deceive state-of-the-art detectors into believing the image was acquired by that model. Likewise, we deceive independent detectors of synthetic GAN images into believing the image is real. Experiments prove the effectiveness of the proposed method in a wide array of conditions. Moreover, no prior information on the attacked detectors is needed, but only sample images from the target camera.

Submitted 22 April, 2021; v1 submitted 27 November, 2019; originally announced November 2019.

arXiv:1901.08971 [pdf, other]

FaceForensics++: Learning to Detect Manipulated Facial Images

Authors: Andreas Rössler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, Matthias Nießner

Abstract: The rapid progress in synthetic image generation and manipulation has now come to a point where it raises significant concerns for the implications towards society. At best, this leads to a loss of trust in digital content, but could potentially cause further harm by spreading false information or fake news. This paper examines the realism of state-of-the-art image manipulations, and how difficult it is to detect them, either automatically or by humans. To standardize the evaluation of detection methods, we propose an automated benchmark for facial manipulation detection. In particular, the benchmark is based on DeepFakes, Face2Face, FaceSwap and NeuralTextures as prominent representatives for facial manipulations at random compression level and size. The benchmark is publicly available and contains a hidden test set as well as a database of over 1.8 million manipulated images. This dataset is over an order of magnitude larger than comparable, publicly available, forgery datasets. Based on this data, we performed a thorough analysis of data-driven forgery detectors. We show that the use of additional domain-specific knowledge improves forgery detection to unprecedented accuracy, even in the presence of strong compression, and clearly outperforms human observers.

Submitted 26 August, 2019; v1 submitted 25 January, 2019; originally announced January 2019.

Comments: Video: https://youtu.be/x2g48Q2I2ZQ

arXiv:1812.02510 [pdf, other]

ForensicTransfer: Weakly-supervised Domain Adaptation for Forgery Detection

Authors: Davide Cozzolino, Justus Thies, Andreas Rössler, Christian Riess, Matthias Nießner, Luisa Verdoliva

Abstract: Distinguishing manipulated from real images is becoming increasingly difficult as new sophisticated image forgery approaches come out by the day. Naive classification approaches based on Convolutional Neural Networks (CNNs) show excellent performance in detecting image manipulations when they are trained on a specific forgery method. However, on examples from unseen manipulation approaches, their performance drops significantly. To address this limitation in transferability, we introduce Forensic-Transfer (FT). We devise a learning-based forensic detector which adapts well to new domains, i.e., novel manipulation methods and can handle scenarios where only a handful of fake examples are available during training. To this end, we learn a forensic embedding based on a novel autoencoder-based architecture that can be used to distinguish between real and fake imagery. The learned embedding acts as a form of anomaly detector; namely, an image manipulated from an unseen method will be detected as fake provided it maps sufficiently far away from the cluster of real images. Comparing to prior works, FT shows significant improvements in transferability, which we demonstrate in a series of experiments on cutting-edge benchmarks. For instance, on unseen examples, we achieve up to 85% in terms of accuracy, and with only a handful of seen examples, our performance already reaches around 95%.

Submitted 27 November, 2019; v1 submitted 6 December, 2018; originally announced December 2018.

arXiv:1811.11872 [pdf, other]

doi 10.1109/TGRS.2019.2906412

Guided patch-wise nonlocal SAR despeckling

Authors: Sergio Vitale, Davide Cozzolino, Giuseppe Scarpa, Luisa Verdoliva, Giovanni Poggi

Abstract: We propose a new method for SAR image despeckling which leverages information drawn from co-registered optical imagery. Filtering is performed by plain patch-wise nonlocal means, operating exclusively on SAR data. However, the filtering weights are computed by taking into account also the optical guide, which is much cleaner than the SAR data, and hence more discriminative. To avoid injecting optical-domain information into the filtered image, a SAR-domain statistical test is preliminarily performed to reject right away any risky predictor. Experiments on two SAR-optical datasets prove the proposed method to suppress very effectively the speckle, preserving structural details, and without introducing visible filtering artifacts. Overall, the proposed method compares favourably with all state-of-the-art despeckling filters, and also with our own previous optical-guided filter.

Submitted 28 November, 2018; originally announced November 2018.

arXiv:1808.09714 [pdf, other]

Camera-based Image Forgery Localization using Convolutional Neural Networks

Authors: Davide Cozzolino, Luisa Verdoliva

Abstract: Camera fingerprints are precious tools for a number of image forensics tasks. A well-known example is the photo response non-uniformity (PRNU) noise pattern, a powerful device fingerprint. Here, to address the image forgery localization problem, we rely on noiseprint, a recently proposed CNN-based camera model fingerprint. The CNN is trained to minimize the distance between same-model patches, and maximize the distance otherwise. As a result, the noiseprint accounts for model-related artifacts just like the PRNU accounts for device-related non-uniformities. However, unlike the PRNU, it is only mildly affected by residuals of high-level scene content. The experiments show that the proposed noiseprint-based forgery localization method improves over the PRNU-based reference.

Submitted 29 August, 2018; originally announced August 2018.

arXiv:1808.08396 [pdf, other]

Noiseprint: a CNN-based camera model fingerprint

Authors: Davide Cozzolino, Luisa Verdoliva

Abstract: Forensic analyses of digital images rely heavily on the traces of in-camera and out-camera processes left on the acquired images. Such traces represent a sort of camera fingerprint. If one is able to recover them, by suppressing the high-level scene content and other disturbances, a number of forensic tasks can be easily accomplished. A notable example is the PRNU pattern, which can be regarded as a device fingerprint, and has received great attention in multimedia forensics. In this paper we propose a method to extract a camera model fingerprint, called noiseprint, where the scene content is largely suppressed and model-related artifacts are enhanced. This is obtained by means of a Siamese network, which is trained with pairs of image patches coming from the same (label +1) or different (label -1) cameras. Although noiseprints can be used for a large variety of forensic tasks, here we focus on image forgery localization. Experiments on several datasets widespread in the forensic community show noiseprint-based methods to provide state-of-the-art performance.

Submitted 25 August, 2018; originally announced August 2018.

arXiv:1803.09179 [pdf, other]

FaceForensics: A Large-scale Video Dataset for Forgery Detection in Human Faces

Authors: Andreas Rössler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, Matthias Nießner

Abstract: With recent advances in computer vision and graphics, it is now possible to generate videos with extremely realistic synthetic faces, even in real time. Countless applications are possible, some of which raise a legitimate alarm, calling for reliable detectors of fake videos. In fact, distinguishing between original and manipulated video can be a challenge for humans and computers alike, especially when the videos are compressed or have low resolution, as it often happens on social networks. Research on the detection of face manipulations has been seriously hampered by the lack of adequate datasets. To this end, we introduce a novel face manipulation dataset of about half a million edited images (from over 1000 videos). The manipulations have been generated with a state-of-the-art face editing approach. It exceeds all existing video manipulation datasets by at least an order of magnitude. Using our new dataset, we introduce benchmarks for classical image forensic tasks, including classification and segmentation, considering videos compressed at various quality levels. In addition, we introduce a benchmark evaluation for creating indistinguishable forgeries with known ground truth; for instance with generative refinement models.

Submitted 24 March, 2018; originally announced March 2018.

Comments: Video: https://youtu.be/Tle7YaPkO_k

arXiv:1709.06054 [pdf, other]

doi 10.1109/TGRS.2018.2817393

Target-adaptive CNN-based pansharpening

Authors: Giuseppe Scarpa, Sergio Vitale, Davide Cozzolino

Abstract: We recently proposed a convolutional neural network (CNN) for remote sensing image pansharpening obtaining a significant performance gain over the state of the art. In this paper, we explore a number of architectural and training variations to this baseline, achieving further performance gains with a lightweight network which trains very fast. Leveraging on this latter property, we propose a target-adaptive usage modality which ensures a very good performance also in the presence of a mismatch w.r.t. the training set, and even across different sensors. The proposed method, published online as an off-the-shelf software tool, allows users to perform fast and high-quality CNN-based pansharpening of their own target images on general-purpose hardware.

Submitted 23 February, 2018; v1 submitted 18 September, 2017; originally announced September 2017.

arXiv:1708.08754 [pdf, other]

Autoencoder with recurrent neural networks for video forgery detection

Authors: Dario D'Avino, Davide Cozzolino, Giovanni Poggi, Luisa Verdoliva

Abstract: Video forgery detection has become an important issue in recent years, because modern editing software provides powerful and easy-to-use tools to manipulate videos. In this paper we propose to perform detection by means of deep learning, with an architecture based on autoencoders and recurrent neural networks. A training phase on a few pristine frames allows the autoencoder to learn an intrinsic model of the source. Then, forged material is singled out as anomalous, as it does not fit the learned model, and is encoded with a large reconstruction error. Recursive networks, implemented with the long short-term memory model, are used to exploit temporal dependencies. Preliminary results on forged videos show the potential of this approach.

Submitted 29 August, 2017; originally announced August 2017.

Comments: Presented at IS&T Electronic Imaging: Media Watermarking, Security, and Forensics, January 2017

arXiv:1704.00275 [pdf, other]

SAR image despeckling through convolutional neural networks

Authors: G. Chierchia, D. Cozzolino, G. Poggi, L. Verdoliva

Abstract: In this paper we investigate the use of discriminative model learning through Convolutional Neural Networks (CNNs) for SAR image despeckling. The network uses a residual learning strategy, hence it does not recover the filtered image, but the speckle component, which is then subtracted from the noisy one. Training is carried out by considering a large multitemporal SAR image and its multilook version, in order to approximate a clean image. Experimental results, both on synthetic and real SAR data, show the method to achieve better performance with respect to state-of-the-art techniques.

Submitted 3 May, 2017; v1 submitted 2 April, 2017; originally announced April 2017.

Comments: Accepted at 2017 IEEE International Geoscience and Remote Sensing Symposium, Fort Worth, Texas, July 23-28, 2017

arXiv:1703.04636 [pdf, other]

A PatchMatch-based Dense-field Algorithm for Video Copy-Move Detection and Localization

Authors: Luca D'Amiano, Davide Cozzolino, Giovanni Poggi, Luisa Verdoliva

Abstract: We propose a new algorithm for the reliable detection and localization of video copy-move forgeries. Discovering well crafted video copy-moves may be very difficult, especially when some uniform background is copied to occlude foreground objects. To reliably detect both additive and occlusive copy-moves we use a dense-field approach, with invariant features that guarantee robustness to several post-processing operations. To limit complexity, a suitable video-oriented version of PatchMatch is used, with a multiresolution search strategy, and a focus on volumes of interest. Performance assessment relies on a new dataset, designed ad hoc, with realistic copy-moves and a wide variety of challenging situations. Experimental results show the proposed method to detect and localize video copy-moves with good accuracy even in adverse conditions.

Submitted 14 March, 2017; originally announced March 2017.

arXiv:1703.04615 [pdf, other]

Recasting Residual-based Local Descriptors as Convolutional Neural Networks: an Application to Image Forgery Detection

Authors: Davide Cozzolino, Giovanni Poggi, Luisa Verdoliva

Abstract: Local descriptors based on the image noise residual have proven extremely effective for a number of forensic applications, like forgery detection and localization. Nonetheless, motivated by promising results in computer vision, the focus of the research community is now shifting on deep learning. In this paper we show that a class of residual-based descriptors can be actually regarded as a simple constrained convolutional neural network (CNN). Then, by relaxing the constraints, and fine-tuning the net on a relatively small training set, we obtain a significant performance improvement with respect to the conventional detector.

Submitted 14 March, 2017; originally announced March 2017.

arXiv:1509.03453 [pdf, other]

A reliable order-statistics-based approximate nearest neighbor search algorithm

Authors: Luisa Verdoliva, Davide Cozzolino, Giovanni Poggi

Abstract: We propose a new algorithm for fast approximate nearest neighbor search based on the properties of ordered vectors. Data vectors are classified based on the index and sign of their largest components, thereby partitioning the space in a number of cones centered in the origin. The query is itself classified, and the search starts from the selected cone and proceeds to neighboring ones. Overall, the proposed algorithm corresponds to locality sensitive hashing in the space of directions, with hashing based on the order of components. Thanks to the statistical features emerging through ordering, it deals very well with the challenging case of unstructured data, and is a valuable building block for more complex techniques dealing with structured data. Experiments on both simulated and real-world data prove the proposed algorithm to provide a state-of-the-art performance.

Submitted 29 October, 2016; v1 submitted 11 September, 2015; originally announced September 2015.

arXiv:1311.6934 [pdf, other]

Image forgery detection based on the fusion of machine learning and block-matching methods

Authors: Davide Cozzolino, Diego Gragnaniello, Luisa Verdoliva

Abstract: Dense local descriptors and machine learning have been used with success in several applications, like classification of textures, steganalysis, and forgery detection. We develop a new image forgery detector building upon some descriptors recently proposed in the steganalysis field suitably merging some of such descriptors, and optimizing a SVM classifier on the available training set. Despite the very good performance, very small forgeries are hardly ever detected because they contribute very little to the descriptors. Therefore we also develop a simple, but extremely specific, copy-move detector based on region matching and fuse decisions so as to reduce the missing detection rate. Overall results appear to be extremely encouraging.

Submitted 27 November, 2013; originally announced November 2013.

Comments: 4 pages

arXiv:1311.6932 [pdf, other]

A novel framework for image forgery localization

Authors: Davide Cozzolino, Diego Gragnaniello, Luisa Verdoliva

Abstract: Image forgery localization is a very active and open research field for the difficulty to handle the large variety of manipulations a malicious user can perform by means of more and more sophisticated image editing tools. Here, we propose a localization framework based on the fusion of three very different tools, based, respectively, on sensor noise, patch-matching, and machine learning. The binary masks provided by these tools are finally fused based on some suitable reliability indexes. According to preliminary experiments on the training set, the proposed framework provides often a very good localization accuracy and sometimes valuable clues for visual scrutiny.

Submitted 27 November, 2013; originally announced November 2013.

Comments: 4 pages

Showing 1–32 of 32 results for author: Cozzolino, D