The Challenge of Detecting Synthetic Manipulations in ID Documents
(DeepID 2025)

The challenge is held in conjunction with IEEE/CVF ICCV 2025 in Honolulu, Hawaii, USA, in October 2025.

Overview

With the advancement and widespread availability of visual generative models, ID document tampering has become a pressing issue. DeepID is the first competition that challenges participants to detect synthetic manipulations (i.e., injection attacks, not presentation attacks) in ID documents. The results of the competition will be presented and discussed at the ICCV 2025 workshop of the same name. The top-performing teams in the challenge will be invited to be co-authors of the overview competition paper, which will be published in the ICCV proceedings, and will be asked to present their approaches during the workshop. The first-place team will also receive a monetary token of appreciation provided by PXL Vision.

The focus of the competition is on:

Important Dates

Competition Details

Two tracks for detecting Digital Manipulations in ID Documents

Track 1

Binary classification (bona fide vs. forged). For a given image of an ID card, submitted models return a single score between 0 and 1: a score closer to 1 means a bona fide card, and a score closer to 0 means a manipulated card. For evaluation, we use the F1 score, assuming a 0.5 decision threshold on the scores computed for all images.

Track 2

Localization (mask of manipulated regions). For a given image of an ID card, submitted models return a binary mask of the same size as the image, with value 0 for manipulated regions and value 1 for bona fide regions. For evaluation, we compute an aggregated F1 score based on image-level statistics.
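
As an illustration of the two expected output formats, here is a minimal sketch in Python (the variable names and image size are hypothetical; the exact submission API is defined by the baseline Docker code):

import numpy as np

# Hypothetical outputs for a single ID card image of height H and width W.
H, W = 480, 640

# Track 1: a single score in [0, 1]; closer to 1 means bona fide,
# closer to 0 means manipulated.
track1_score = 0.87

# Track 2: a binary mask with the same size as the image;
# 1 marks bona fide pixels, 0 marks manipulated regions.
track2_mask = np.ones((H, W), dtype=np.uint8)
track2_mask[100:150, 200:400] = 0  # a hypothetical manipulated region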

Evaluation details

Participants need to submit Docker containers with pre-trained models. A baseline model with baseline Docker code will be provided on May 30. Evaluation will be performed offline, on a standalone GPU machine (RTX 3090 with 24 GB) with no internet connection, using a private dataset. For each track, the results of the evaluations on the private dataset will be updated once per day in the leaderboard. All submissions will be evaluated under equal conditions and on the same evaluation data.

Dataset

We provide participants with the Fantasy ID Dataset for tuning or training their detection and localization models. They may also use any other public datasets of ID documents, such as MIDV-2020 or BID, as well as private datasets, for training/tuning their submitted models. The download link to the dataset will be provided upon registration.

Fantasy ID Dataset (for training/tuning)

We provide 262 Fantasy ID cards designed to resemble ID documents of 10 different countries and languages. The design of the cards mimics real ID documents and the corresponding cultural and language elements. The cards contain fantasy personal information (not of real people) but use faces of real people, taken from the following face datasets: American Multiracial Face Database (AMFD), Face London Research Dataset, and the High-Quality Wide Multi-Channel Attack (HQ-WMCA) dataset. The cards were printed with an Evolis Primacy 2 card printer and scanned with three devices (iPhone 15 Pro, Huawei Mate 30, and Kyocera TASKalfa 2554ci office scanner), resulting in 786 images. These constitute the bona fide samples. Manipulated versions were generated from the bona fide samples using two face-swapping methods and two text-inpainting methods (changing fields such as name, date, ID number, etc.). In the dataset, each attack type combines one face-swapping method with one text-inpainting method, so in the end there are 786*2 = 1572 manipulated samples. The full dataset (including more attacks and an additional test set) will be publicly released after the competition (target date for public release: August 2025). The majority of the data will be available under the CC BY 4.0 license.

In-domain test dataset

This test dataset is based on a different set of Fantasy ID cards and will be used for testing the submitted models. It also includes manipulations that are not present in the training data, which means this test set contains in-domain bona fide samples but unknown attacks. This test dataset will be publicly released after the competition. The leaderboard with the evaluation results on this test dataset will be updated every day during the competition.

Private out-of-domain test dataset (hidden testing)

This private, undisclosed set consists of bona fide images of ID documents from real individuals, provided by the ID verification company PXL Vision, and their forged versions. This data represents an out-of-domain set and will be used for evaluation only; no samples will be released or shown. The leaderboard with the evaluation results on these private samples will be updated every day during the competition.

Below are some examples of Fantasy ID cards: the original generated cards, the printed-and-scanned bona fide versions, and the forged versions, which were manipulated and then printed and re-scanned. Participants need to detect manipulated/forged IDs either as a classification task (is the card manipulated?) or as a localization task (which regions are manipulated?).

[Example images: original, bona fide (printed and scanned), and forged Fantasy ID cards for Portugal and Turkiye]

Register to Participate

To participate in the competition, please fill out the registration form.

Register

Leaderboard

Evaluation results are posted for each track (detection and localization) separately. Submissions are sorted by the Aggregate F1 score. The 'baseline' team in the tables is the baseline TruFor model that we provide in the baseline Docker code. If your submission is not listed, it means we could not process the Docker container: we could not read it (please submit the tarball in .tgz format, not zip), it was invalid, or it did not adhere to the API. Please use our baseline Docker code to prepare the submission and adhere to the suggested naming convention for the Docker files, <team_name>_<track_name>_<algorithm_name>_<version>.tgz, because we rely on the names to extract the team, algorithm, and version.
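
For illustration, a hypothetical Python helper for parsing this naming convention (the function name is ours; the actual parsing in the evaluation pipeline may differ):

from pathlib import Path

def parse_submission_name(filename):
    # e.g. "myteam_detection_mymodel_v1.tgz" ->
    #      ("myteam", "detection", "mymodel", "v1")
    stem = Path(filename).name
    if stem.endswith(".tgz"):
        stem = stem[:-len(".tgz")]
    parts = stem.split("_")
    team, track, version = parts[0], parts[1], parts[-1]
    algorithm = "_".join(parts[2:-1])  # algorithm names may contain underscores
    return team, track, algorithm, version

print(parse_submission_name("myteam_detection_mymodel_v1.tgz"))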

Best per team (detection track) (updated on July 15, 2025 at 11:23:43 Swiss time)
This is the final ranking.

Rank Team Algorithm Version Runtime-fantasy F1 on fantasy Runtime-private F1 on private Aggregate F1
1 Sunlight unetc v9 0:26:08 0.9906416660367655 1:50:57 0.7191297839379392 0.800583
2 Incode IccvDeepID v023 0:45:35 0.8679166343284155 4:10:16 0.7533954606111526 0.787752
3 AG edgedoc v15 0:56:39 0.9580710412020603 1:27:56 0.7108249627328226 0.784999
4 UAMBiDALab trufor v1.1 0:51:19 0.7116720789267821 1:18:41 0.7882604496971795 0.765284
5 hardik tuned v1 0:59:31 0.8388464501314228 1:28:16 0.6580754423848288 0.712307
6 Reagvis tuned_vlm v1 1:09:15 0.8158094677480847 2:12:05 0.663301034569992 0.709054
7 baseline trufor v0.1 0:44:46 0.8067118967592016 1:09:49 0.662222601003402 0.705569
8 LatentVibesOnly trufor v0.0 0:52:48 0.8067118967592016 1:26:10 0.662222601003402 0.705569
9 Deepakto baseline v0.1 0:47:44 0.8067118967592016 1:09:59 0.662222601003402 0.705569
10 VISION trufor v1 1:02:18 0.570987139473393 1:30:46 0.7506756800369373 0.696769
11 UNJ clip v6 0:02:25 0.6882444934886508 0:23:26 0.6553133084277257 0.665193
12 Asmodeus trufor v1.1 0:46:24 0.6882444934886508 1:11:21 0.6553133084277257 0.665193
13 KU-Forgeye forgerydetection v7 0:07:11 0.6038971637387054 1:00:18 0.6710191125688072 0.650883
14 SuperIdolSmile unet v0.2 0:12:04 0.8975010342736072 0:52:34 0.5386412954031168 0.646299
15 UVersumAI ensembled v4 0:02:56 0.537161310967704 0:22:38 0.656495936014569 0.620696
16 LJ resnet v2 0:09:33 0.500008301989212 2:08:23 0.6581210504898414 0.610687
17 DUCS ensemble v2 0:02:18 0.4576437408874124 0:22:51 0.6751082341269501 0.609869
18 ens-epfl peunet v2.25 0:22:38 0.32179688651450933 0:41:36 0.7147045144271646 0.596832
19 VCL-ITI swinface v1 0:16:09 0.31769049627017437 3:10:27 0.7010807403250812 0.586064
20 Aphi dpadb v2 0:03:32 0.2989340686324383 0:40:29 0.706648362562519 0.584334
21 IDNT docauth v0.11 0:05:49 0.37037163127979356 0:44:12 0.6553114962632904 0.569830
22 e0nia mmfusion v0 0:02:35 0.410882392561378 0:21:49 0.6157795362699562 0.554310
23 VisGen scu-net v1 0:05:42 0.48938910752959736 0:06:12 0.5506675117272216 0.532284
24 idvc st_bf_comp_no_sc v1.1 0:12:00 0.5632905869632292 0:45:39 0.5171312556765423 0.530979
25 Fake-Hunters bionet v3 0:01:48 0.32187103766106395 0:15:17 0.21610810639615832 0.247837
Full table with all detection results

Best per team (localization track) (updated on July 15, 2025 at 11:23:39 Swiss time)
This is the final ranking.

Rank Team Algorithm Version Runtime-fantasy F1 on fantasy Runtime-private F1 on private Aggregate F1
1 Sunlight unetc v7 0:24:31 0.78391260410437 1:51:43 0.7162494760503502 0.736548
2 UAMBiDALab trufor v1.1 0:51:19 0.6204511200997092 1:18:41 0.7569658860449113 0.716011
3 VISION trufor v1 1:02:18 0.6117324838968435 1:30:46 0.7378285553501286 0.700000
4 AG trudoc v5 8:30:49 0.6862783664612659 11:34:53 0.6618887933602785 0.669206
5 baseline trufor v0.1 0:44:46 0.5898009453750437 1:09:49 0.6274328723240761 0.616143
6 Reagvis trufor v2 0:58:16 0.5898009453750437 1:22:18 0.6274328723240761 0.616143
7 LatentVibesOnly trufor v0.0 0:52:48 0.5898009453750437 1:26:10 0.6274328723240761 0.616143
8 hardik tuned v1 0:59:31 0.5898009453750437 1:28:16 0.6274328723240761 0.616143
9 Deepakto baseline v0.1 0:47:44 0.5898009453750437 1:09:59 0.6274328723240761 0.616143
10 ens-epfl peunet v2.25 0:22:38 0.5634039904480761 0:41:36 0.6195108908471354 0.602679
11 Fake-Hunters bionet v2 0:04:26 0.5493777434348279 0:28:35 0.5896946730916813 0.577600
12 SuperIdolSmile unet v0.1 0:12:17 0.5635784925443205 0:52:13 0.5693047971614335 0.567587
13 IDNT docauth v0.12 0:06:16 0.5339141142472696 0:53:13 0.5504448403227907 0.545486
14 KU-Forgeye forgerydetection v5 0:06:04 0.5202362995180636 0:45:41 0.5381362843556258 0.532766
15 IDCH trufor_augmented v0.1 0:06:39 0.2629827806996036 0:47:18 0.1714349184560774 0.198899
16 VisGen scu-net v1 0:05:42 0.035723586318595124 0:06:12 0.16439219695195906 0.125792
Full table with all localization results

How we computed the tables

We ran all valid dockers on the dataset. At the moment, we have results for the test set of Fantasy ID cards, which contains 1385 cards. The Algorithm and Version columns are parsed from the file names of your submissions. The Runtime-<db_name> columns show how long the docker took to finish. Note that if a docker ran longer than 2 hours on the Fantasy dataset (1385 samples), it was not using the GPU, even though it was running on a machine with one. We may stop running dockers that do not use the GPU, because the private dataset contains 20K images and evaluation would otherwise take too long.
The F1 detection score is computed on each dataset from the predicted scores of the images using the f1_score function from scikit-learn: f1_score(labels, pred_labels, average='weighted'). This means the F1 score is weighted per class (bona fide and attack), which is a better way to compute the F1 score when the data is unbalanced, as in our case. We used a decision threshold of 0.5 to compute the predicted labels, assuming 1 is bona fide and 0 is an attack.
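
For illustration, a minimal sketch of this computation with hypothetical scores and labels:

import numpy as np
from sklearn.metrics import f1_score

# Hypothetical model scores in [0, 1] and ground-truth labels
# (1 = bona fide, 0 = attack).
scores = np.array([0.92, 0.11, 0.74, 0.03, 0.61])
labels = np.array([1, 0, 1, 0, 0])

pred_labels = (scores >= 0.5).astype(int)  # 0.5 decision threshold
detection_f1 = f1_score(labels, pred_labels, average='weighted')
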
The F1 localization score is computed for each image independently as follows:

import numpy as np

def localization_f1(mask, gt_mask, is_bonafide):
    # mask:    predicted binary mask (1 = bona fide pixel, 0 = manipulated pixel)
    # gt_mask: ground-truth mask with the same convention
    mask = np.asarray(mask, dtype=bool)
    gt_mask = np.asarray(gt_mask, dtype=bool)
    if is_bonafide:
        # Fully bonafide image: the whole image is 1 (no altered regions),
        # bonafide pixels are positives, zeros are negatives.
        # We want high F1 if the model predicted mostly 1s (bonafide).
        tp = np.sum(mask)    # all pixels should be 1s in the mask
        fn = np.sum(~mask)   # any zeros are falsely detected as the negative class
        tn = 0
        fp = 0
    else:
        # Manipulated image: the positive class is the manipulated region (0s).
        tn = np.sum(mask & gt_mask)      # predicted bonafide, truly bonafide
        tp = np.sum(~mask & ~gt_mask)    # predicted manipulated, truly manipulated
        fp = np.sum(~mask & gt_mask)     # predicted manipulated, truly bonafide
        fn = np.sum(mask & ~gt_mask)     # predicted bonafide, truly manipulated
    precision = tp / (tp + fp + 1e-8)
    recall = tp / (tp + fn + 1e-8)
    f1 = 2 * precision * recall / (precision + recall + 1e-8)
    return f1
The final F1 score is the mean of two per-class means, computed separately for the bona fide and attack classes: (mean(bonafide_f1_scores) + mean(attack_f1_scores)) / 2, so that the class with the larger number of samples does not dominate the final F1 score.
The Aggregate F1 is computed as a weighted average of the two F1 scores, the one on the Fantasy ID cards test set and the one on the private set of real documents: f1_fantasy*0.3 + f1_private*0.7. This weighting puts more importance on the results from the private set.
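
For illustration, a minimal sketch of both aggregation steps, using hypothetical per-image F1 lists and a hypothetical private-set F1:

import numpy as np

# Hypothetical per-image localization F1 scores.
bonafide_f1_scores = [0.95, 0.90, 0.97]
attack_f1_scores = [0.60, 0.72]

# Class-balanced mean, so the larger class does not dominate.
f1_fantasy = (np.mean(bonafide_f1_scores) + np.mean(attack_f1_scores)) / 2

# Aggregate F1 across the two test sets, weighting the private set more heavily.
f1_private = 0.66  # hypothetical value
aggregate_f1 = 0.3 * f1_fantasy + 0.7 * f1_private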

Workshop details

The workshop will feature two prominent keynote speakers who are well known in media forensics, deepfake detection, and face and ID document presentation attack detection. The overview and results of the challenge will be presented at the workshop. The top teams, who will be invited as co-authors of the overview paper, will also be asked to present the approaches they used in the challenge.

Workshop schedule: TBD

Keynote Speakers

Prof. Luisa Verdoliva
University Federico II of Naples
Digital forensics and deepfake detection

Dr. Juan Tapia
ATHENE National Research Center
Presentation/morphing attack detection

Organizers

Pavel Korshunov
Idiap Research Institute

Nevena Shamoska
CTO, PXL Vision

Magdalena Połać
PXL Vision

Vedrana Krivokuca
Idiap Research Institute

Vidit Vidit
Idiap Research Institute

Amir Mohammadi
Idiap Research Institute

Christophe Ecabert
Idiap Research Institute

Sébastien Marcel
Idiap Research Institute