With the advancement and widespread availability of visual generative models, ID document tampering has become a pressing issue. DeepID is the first competition that challenges participants to detect synthetic manipulations (i.e., injection attacks, not presentation attacks) in ID documents. The results of the competition will be presented and discussed at the ICCV 2025 workshop of the same name. The top-performing teams in the challenge will be invited to be co-authors of the competition overview paper, which will be published in the ICCV proceedings, and will be asked to present their approaches during the workshop. The first-place team will also receive a monetary token of appreciation provided by PXL Vision.
Two tracks for detecting Digital Manipulations in ID Documents
Binary classification (bona fide vs. forged). For a given image of an ID card, submitted models return one score between 0 and 1: the closer to 1, the more likely the card is bona fide; the closer to 0, the more likely it is manipulated. For evaluation, we use the F1-score, assuming a 0.5 decision threshold on the scores computed for all images.
Localization (mask of manipulated regions). For a given image of an ID card, submitted models return a binary mask of the same size as the image, with value 0 for manipulated regions and value 1 for bona fide regions. For evaluation, we compute an aggregated F1-score based on image-level statistics. A rough sketch of both output formats is given below.
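The sketch below only illustrates the two expected output formats; the exact submission API is defined by the baseline Docker code, and the function names here are placeholders, not the official interface.

import numpy as np

# Placeholder interface, not the official API:
# the detection track expects one score per image,
# the localization track expects a binary mask with the same height and width as the image.
def detect(image: np.ndarray) -> float:
    score = 0.5  # closer to 1 -> bona fide, closer to 0 -> manipulated
    return score

def localize(image: np.ndarray) -> np.ndarray:
    mask = np.ones(image.shape[:2], dtype=np.uint8)  # 1 = bona fide pixel, 0 = manipulated pixel
    return mask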
Participants need to submit Docker containers with pre-trained models. A baseline model, together with baseline Docker code, will be provided on May 30. Evaluation will be performed offline on a standalone GPU machine (RTX 3090 with 24 GB), with no internet connection, on a private dataset. For each track, the results of the evaluations on the private dataset will be updated once per day on the leaderboard. All submissions will be evaluated under equal conditions and on the same evaluation data.
We provide participants with the Fantasy ID Dataset for tuning or training their detection and localization models. They can also use any other public datasets (such as MIDV-2020 or BID) or private datasets of ID documents for training/tuning their submitted models. The download link to the dataset will be provided upon registration.
We provide 262 Fantasy ID cards designed to resemble ID documents of 10 different countries and languages. The design of the cards mimics real ID documents and the corresponding cultural and language elements. The cards contain fantasy personal information (not of real people) but have faces of real people, which we took from these face datasets: the American Multiracial Face Database (AMFD), the Face London Research Dataset, and the High-Quality Wide Multi-Channel Attack (HQ-WMCA) dataset. The cards were printed with an Evolis Primacy 2 card printer and scanned with three devices (iPhone 15 Pro, Huawei Mate 30, and Kyocera TASKalfa 2554ci office scanner), resulting in 786 images. These constitute the bona fide samples. Manipulated versions were generated from the bona fide samples using two face-swapping methods and two text-inpainting methods (changing fields such as the name, date, and ID number). Each attack type in the dataset combines one face-swapping method with one text-inpainting method. In the end, there are 786 × 2 = 1572 manipulated samples. The full dataset (including more attacks and an additional test set) will be publicly released after the competition (target date for public release: August 2025). The majority of the data will be available under the CC BY 4.0 license.
This test dataset is based on a different set of Fantasy ID cards and will be used for testing the submitted models. It also includes manipulations that are not present in the training data, which means the test set contains in-domain bona fide samples but unknown attacks. This test dataset will be publicly released after the competition. The leaderboard with the evaluation results on this test dataset will be updated every day during the competition.
This private, undisclosed set consists of bona fide images of ID documents from real individuals, provided by the ID verification company PXL Vision, together with their forged versions. This data represents an out-of-domain set and will be used for evaluation only; no samples will be released or shown. The leaderboard with the evaluation results on these private samples will be updated every day during the competition.
Here are some examples of Fantasy ID cards: the original generated cards, the printed and scanned bona fide versions, and the forged versions, which were manipulated and then printed and re-scanned. Participants need to detect manipulated/forged IDs either as a classification task (is the document manipulated?) or as a localization task (where is it manipulated?).
To participate in the competition, please fill out the registration form.
Evaluation results are posted for each track (detection and localization) separately.
The submissions are sorted by the aggregate F1 score. The 'baseline' team in the tables is the baseline TruFor model that we provide in the baseline Docker code.
If your submission is not listed, it means we could not process the Docker image: either we could not read it (please submit the tarball in .tgz format, not zip), it was invalid, or it did not adhere to the API.
Please use our baseline Docker code when preparing the submission and adhere to the suggested naming convention for the Docker files, <team_name>_<track_name>_<algorithm_name>_<version>.tgz (e.g., myteam_detection_mymodel_v1.tgz), because we rely on the file names to extract the team, algorithm, and version.
Rank | Team | Algorithm | Version | Runtime (Fantasy) | F1 (Fantasy) | Runtime (Private) | F1 (Private) | Aggregate F1 |
---|---|---|---|---|---|---|---|---|
1 | Sunlight | unetc | v9 | 0:26:08 | 0.9906416660367655 | 1:50:57 | 0.7191297839379392 | 0.800583 |
2 | Incode | IccvDeepID | v023 | 0:45:35 | 0.8679166343284155 | 4:10:16 | 0.7533954606111526 | 0.787752 |
3 | AG | edgedoc | v15 | 0:56:39 | 0.9580710412020603 | 1:27:56 | 0.7108249627328226 | 0.784999 |
4 | UAMBiDALab | trufor | v1.1 | 0:51:19 | 0.7116720789267821 | 1:18:41 | 0.7882604496971795 | 0.765284 |
5 | hardik | tuned | v1 | 0:59:31 | 0.8388464501314228 | 1:28:16 | 0.6580754423848288 | 0.712307 |
6 | Reagvis | tuned_vlm | v1 | 1:09:15 | 0.8158094677480847 | 2:12:05 | 0.663301034569992 | 0.709054 |
7 | baseline | trufor | v0.1 | 0:44:46 | 0.8067118967592016 | 1:09:49 | 0.662222601003402 | 0.705569 |
8 | LatentVibesOnly | trufor | v0.0 | 0:52:48 | 0.8067118967592016 | 1:26:10 | 0.662222601003402 | 0.705569 |
9 | Deepakto | baseline | v0.1 | 0:47:44 | 0.8067118967592016 | 1:09:59 | 0.662222601003402 | 0.705569 |
10 | VISION | trufor | v1 | 1:02:18 | 0.570987139473393 | 1:30:46 | 0.7506756800369373 | 0.696769 |
11 | UNJ | clip | v6 | 0:02:25 | 0.6882444934886508 | 0:23:26 | 0.6553133084277257 | 0.665193 |
12 | Asmodeus | trufor | v1.1 | 0:46:24 | 0.6882444934886508 | 1:11:21 | 0.6553133084277257 | 0.665193 |
13 | KU-Forgeye | forgerydetection | v7 | 0:07:11 | 0.6038971637387054 | 1:00:18 | 0.6710191125688072 | 0.650883 |
14 | SuperIdolSmile | unet | v0.2 | 0:12:04 | 0.8975010342736072 | 0:52:34 | 0.5386412954031168 | 0.646299 |
15 | UVersumAI | ensembled | v4 | 0:02:56 | 0.537161310967704 | 0:22:38 | 0.656495936014569 | 0.620696 |
16 | LJ | resnet | v2 | 0:09:33 | 0.500008301989212 | 2:08:23 | 0.6581210504898414 | 0.610687 |
17 | DUCS | ensemble | v2 | 0:02:18 | 0.4576437408874124 | 0:22:51 | 0.6751082341269501 | 0.609869 |
18 | ens-epfl | peunet | v2.25 | 0:22:38 | 0.32179688651450933 | 0:41:36 | 0.7147045144271646 | 0.596832 |
19 | VCL-ITI | swinface | v1 | 0:16:09 | 0.31769049627017437 | 3:10:27 | 0.7010807403250812 | 0.586064 |
20 | Aphi | dpadb | v2 | 0:03:32 | 0.2989340686324383 | 0:40:29 | 0.706648362562519 | 0.584334 |
21 | IDNT | docauth | v0.11 | 0:05:49 | 0.37037163127979356 | 0:44:12 | 0.6553114962632904 | 0.569830 |
22 | e0nia | mmfusion | v0 | 0:02:35 | 0.410882392561378 | 0:21:49 | 0.6157795362699562 | 0.554310 |
23 | VisGen | scu-net | v1 | 0:05:42 | 0.48938910752959736 | 0:06:12 | 0.5506675117272216 | 0.532284 |
24 | idvc | st_bf_comp_no_sc | v1.1 | 0:12:00 | 0.5632905869632292 | 0:45:39 | 0.5171312556765423 | 0.530979 |
25 | Fake-Hunters | bionet | v3 | 0:01:48 | 0.32187103766106395 | 0:15:17 | 0.21610810639615832 | 0.247837 |
Rank | Team | Algorithm | Version | Runtime (Fantasy) | F1 (Fantasy) | Runtime (Private) | F1 (Private) | Aggregate F1 |
---|---|---|---|---|---|---|---|---|
1 | Sunlight | unetc | v7 | 0:24:31 | 0.78391260410437 | 1:51:43 | 0.7162494760503502 | 0.736548 |
2 | UAMBiDALab | trufor | v1.1 | 0:51:19 | 0.6204511200997092 | 1:18:41 | 0.7569658860449113 | 0.716011 |
3 | VISION | trufor | v1 | 1:02:18 | 0.6117324838968435 | 1:30:46 | 0.7378285553501286 | 0.700000 |
4 | AG | trudoc | v5 | 8:30:49 | 0.6862783664612659 | 11:34:53 | 0.6618887933602785 | 0.669206 |
5 | baseline | trufor | v0.1 | 0:44:46 | 0.5898009453750437 | 1:09:49 | 0.6274328723240761 | 0.616143 |
6 | Reagvis | trufor | v2 | 0:58:16 | 0.5898009453750437 | 1:22:18 | 0.6274328723240761 | 0.616143 |
7 | LatentVibesOnly | trufor | v0.0 | 0:52:48 | 0.5898009453750437 | 1:26:10 | 0.6274328723240761 | 0.616143 |
8 | hardik | tuned | v1 | 0:59:31 | 0.5898009453750437 | 1:28:16 | 0.6274328723240761 | 0.616143 |
9 | Deepakto | baseline | v0.1 | 0:47:44 | 0.5898009453750437 | 1:09:59 | 0.6274328723240761 | 0.616143 |
10 | ens-epfl | peunet | v2.25 | 0:22:38 | 0.5634039904480761 | 0:41:36 | 0.6195108908471354 | 0.602679 |
11 | Fake-Hunters | bionet | v2 | 0:04:26 | 0.5493777434348279 | 0:28:35 | 0.5896946730916813 | 0.577600 |
12 | SuperIdolSmile | unet | v0.1 | 0:12:17 | 0.5635784925443205 | 0:52:13 | 0.5693047971614335 | 0.567587 |
13 | IDNT | docauth | v0.12 | 0:06:16 | 0.5339141142472696 | 0:53:13 | 0.5504448403227907 | 0.545486 |
14 | KU-Forgeye | forgerydetection | v5 | 0:06:04 | 0.5202362995180636 | 0:45:41 | 0.5381362843556258 | 0.532766 |
15 | IDCH | trufor_augmented | v0.1 | 0:06:39 | 0.2629827806996036 | 0:47:18 | 0.1714349184560774 | 0.198899 |
16 | VisGen | scu-net | v1 | 0:05:42 | 0.035723586318595124 | 0:06:12 | 0.16439219695195906 | 0.125792 |
We ran all valid Dockers on the data. At the moment, we have results for the Fantasy ID card test set, which contains 1385 cards.
The algorithm name and version columns are parsed from the file names of your submissions. The Runtime (Fantasy) and Runtime (Private) columns show how long the Docker took to finish on each dataset. Note that if a Docker ran for longer than 2 hours on the Fantasy test set (1385 samples), it was not using the GPU, even though it was running on a machine with one.
We may stop running Dockers that do not use the GPU, because the private dataset contains 20K images and evaluation would take too long in that case.
The F1 detection score is computed per dataset from the predicted scores of each image, using the f1_score function from scikit-learn:
f1_score(labels, pred_labels, average='weighted')
This means the F1-score is weighted per class (bona fide and attack), which is a more appropriate way to compute the F1-score when the data is unbalanced, as in our case. We use a decision threshold of 0.5 to turn scores into predicted labels, where 1 is bona fide and 0 is an attack.
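For reference, here is a minimal sketch of this computation; the score and label arrays are illustrative placeholders, not real competition data.

import numpy as np
from sklearn.metrics import f1_score

# illustrative per-image scores in [0, 1] and ground-truth labels (1 = bona fide, 0 = attack)
scores = np.array([0.92, 0.10, 0.67, 0.03])
labels = np.array([1, 0, 1, 0])

# 0.5 decision threshold (how a score of exactly 0.5 is handled is an assumption here)
pred_labels = (scores >= 0.5).astype(int)

# class-weighted F1, as used for the detection track
detection_f1 = f1_score(labels, pred_labels, average='weighted')
print(detection_f1)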
The F1 localization score is computed for each image independently as follows:
import numpy as np

# mask: predicted binary mask, gt_mask: ground-truth mask;
# both use 1 (True) for bona fide pixels and 0 (False) for manipulated pixels
mask = mask.astype(bool)
gt_mask = gt_mask.astype(bool)

if sample.is_bonafide:
    # Fully bona fide image: the whole image is 1 (no altered regions),
    # so bona fide pixels are positives and zeros are negatives.
    # We want a high F1 if the model predicted mostly 1s (bona fide).
    tp = np.sum(mask == 1)  # all pixels should be 1 in the mask
    fn = np.sum(mask == 0)  # any zeros are false negatives
    tn = 0
    fp = 0
else:
    # Manipulated image: manipulated pixels (0s) are the positive class
    tn = np.sum(mask * gt_mask)
    tp = np.sum((~mask) * (~gt_mask))
    fp = np.sum((~mask) * gt_mask)
    fn = np.sum(mask * (~gt_mask))

precision = tp / (tp + fp + 1e-8)
recall = tp / (tp + fn + 1e-8)
f1 = 2 * precision * recall / (precision + recall + 1e-8)
The final localization F1-score is the mean of two per-class means, computed separately over the bona fide and attack images:
(mean(bonafide_f1_scores) + mean(attack_f1_scores)) / 2
so that the class with the larger number of samples does not dominate the final F1-score. The aggregate F1 score shown in the leaderboards combines the two test sets as
f1_fantasy * 0.3 + f1_private * 0.7
which puts more importance on the results from the private set.
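A minimal sketch of this aggregation, with placeholder per-image localization F1 values (the variable names and numbers are illustrative only):

import numpy as np

# placeholder per-image localization F1 values for bona fide and attacked images
bonafide_f1_scores = [0.99, 0.97, 0.98]
attack_f1_scores = [0.72, 0.55, 0.81, 0.64]

# class-balanced mean so the more numerous class does not dominate
f1_fantasy = (np.mean(bonafide_f1_scores) + np.mean(attack_f1_scores)) / 2

# leaderboard aggregate: the private set is weighted more heavily than the Fantasy test set
f1_private = 0.66  # placeholder value
aggregate_f1 = 0.3 * f1_fantasy + 0.7 * f1_private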
The workshop will feature two prominent keynote speakers who are well known in media forensics, deepfake detection, and face and ID document presentation attack detection. The overview and results of the challenge will be presented at the workshop. The top teams, who will be invited as co-authors of the overview paper, will also be asked to present the approaches they used in the challenge.