GitPedia

Retinaface

A reimplementation of RetinaFace using C++ and TensorRT

From clancylian · Updated December 2, 2025

Based on the Python reference implementation of [RetinaFace](https://github.com/deepinsight/insightface/tree/master/RetinaFace) in insightface. The project is written primarily in C++, distributed under the MIT License, and was first published in 2019. Key topics include: caffe, int8, mxnet2caffe, retinaface, tensorrt.

RetinaFace C++ Reimplementation

source

Based on the Python reference implementation of RetinaFace in insightface.

model transformation tool

MXNet2Caffe

You need to add some layers yourself. Caffe has no upsample layer, so you can replace it with a deconvolution layer, possibly at a slight loss of accuracy.
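To illustrate why a deconvolution can stand in for a missing upsample layer, the sketch below assumes a 2x nearest-neighbor upsample (an assumption; the mode is not stated here) and compares it, for a single channel, with a transposed convolution that uses a 2x2 kernel of ones and stride 2. The names `upsampleNearest2x` and `deconv2d` are illustrative and are not taken from this repository or from Caffe. In Caffe this would roughly correspond to a Deconvolution layer with kernel size 2, stride 2, one group per channel, constant weights of 1, and no bias. A deconvolution with different or learned weights would not match exactly, which is one way a small accuracy difference can arise.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Image = std::vector<std::vector<float>>;  // single channel, row-major

// 2x nearest-neighbor upsample: every input pixel becomes a 2x2 block.
Image upsampleNearest2x(const Image& in) {
    assert(!in.empty() && !in[0].empty());
    const std::size_t h = in.size(), w = in[0].size();
    Image out(2 * h, std::vector<float>(2 * w, 0.0f));
    for (std::size_t y = 0; y < 2 * h; ++y)
        for (std::size_t x = 0; x < 2 * w; ++x)
            out[y][x] = in[y / 2][x / 2];
    return out;
}

// Single-channel transposed convolution (deconvolution), no padding, no bias.
// Each input pixel scatters a scaled copy of the kernel into the output.
Image deconv2d(const Image& in, const Image& kernel, std::size_t stride) {
    assert(!in.empty() && !in[0].empty());
    assert(!kernel.empty() && !kernel[0].empty());
    const std::size_t h = in.size(), w = in[0].size();
    const std::size_t kh = kernel.size(), kw = kernel[0].size();
    Image out((h - 1) * stride + kh,
              std::vector<float>((w - 1) * stride + kw, 0.0f));
    for (std::size_t y = 0; y < h; ++y)
        for (std::size_t x = 0; x < w; ++x)
            for (std::size_t ky = 0; ky < kh; ++ky)
                for (std::size_t kx = 0; kx < kw; ++kx)
                    out[y * stride + ky][x * stride + kx] += in[y][x] * kernel[ky][kx];
    return out;
}
```

Calling `deconv2d(in, ones, 2)` with a 2x2 kernel of ones on a small test image should give the same result as `upsampleNearest2x(in)`.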

The original model is based on mobilenet25, and I have retrained it.

Demo

$ mkdir build
$ cd build/
$ cmake ../
$ make

You need to modify the dependency paths in the CMakeLists.txt file.

Speed

Test hardware: 1080Ti

test1:

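| model | speed | input size | preprocess time | inference | postprocess time |
|---|---|---|---|---|---|
| mxnet | 44.8ms | 1280x896 | 19.0ms | 8.0ms | 16.0ms |
| caffe | 46.9ms | 1280x896 | 5.8ms | 24.1ms | 16.0ms |
| tensorrt | 29.3ms | 1280x896 | 6.9ms | 5.4ms | 15.0ms |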

test2:

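| model | speed | input size | preprocess time | inference | postprocess time |
|---|---|---|---|---|---|
| mxnet | 6.4ms | 320x416 | 1.3ms | 0.1ms | 4.2ms |
| caffe | 30.8ms | 320x416 | 1.2ms | 27ms | 2.3ms |
| tensorrt | 4.7ms | 320x416 | 0.7ms | 1.9ms | 1.8ms |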

tensorrt batch test:

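| batch size | input size | max batch size | preprocess time | inference | postprocess time | total | GPU |
|---|---|---|---|---|---|---|---|
| 1 | 448x448 | 8 | 1.0ms | 2.3ms | 2.6ms | 6.7ms | 35% |
| 2 | 448x448 | 8 | 2.5ms | 3.3ms | 5.2ms | 11.8ms | 33% |
| 4 | 448x448 | 8 | 4.1ms | 4.6ms | 10.0ms | 21.8ms | 28% |
| 8 | 448x448 | 8 | 8.7ms | 7.0ms | 20.3ms | 40.7ms | 23% |
| 16 | 448x448 | 32 | 28.1ms | 14.7ms | 38.7ms | 92.0ms | - |
| 32 | 448x448 | 32 | 36.2ms | 26.3ms | 75.7ms | 163.5ms | - |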

Note: a larger batch size gives some advantage in inference but does not speed up preprocessing or postprocessing.
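This note can be checked with a little arithmetic on the batch table above: dividing each stage time by the batch size gives a per-image cost. The sketch below uses the batch-1 and batch-8 rows of that table. The `StageTimes` struct and the `perImage` helper are illustrative names, not part of this repository.

```cpp
#include <cassert>
#include <cmath>

// Stage times in milliseconds for one whole batch (values copied from the table).
struct StageTimes {
    int batch;
    double preprocess;
    double inference;
    double postprocess;
};

// Per-image cost of each stage: stage time divided by the batch size.
StageTimes perImage(const StageTimes& t) {
    return {1, t.preprocess / t.batch, t.inference / t.batch, t.postprocess / t.batch};
}

const StageTimes kBatch1 = {1, 1.0, 2.3, 2.6};
const StageTimes kBatch8 = {8, 8.7, 7.0, 20.3};
```

With these numbers, `perImage(kBatch8)` gives about 0.88 ms of inference per image, down from 2.3 ms at batch size 1, while preprocessing (about 1.09 ms) and postprocessing (about 2.54 ms) per image stay close to their batch-1 values.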

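With optimized postprocessing:

| batch size | input size | max batch size | preprocess time | inference | postprocess time | total | GPU |
|---|---|---|---|---|---|---|---|
| 1 | 448x448 | 8 | 1.0ms | 2.3ms | 0.09ms | 3.5ms | 70% |
| 2 | 448x448 | 8 | 2.2ms | 2.8ms | 0.2ms | 5.3ms | 60% |
| 4 | 448x448 | 8 | 3.7ms | 5.0ms | 0.3ms | 8.4ms | 55% |
| 8 | 448x448 | 8 | 7.5ms | 6.5ms | 0.67ms | 14.9ms | 50% |
| 16 | 448x448 | 32 | 26ms | 13ms | 1.3ms | 41ms | 40% |
| 32 | 448x448 | 32 | 32ms | 22ms | 2.7ms | 56.6ms | 50% |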

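Using the NVIDIA NPP library to speed up preprocessing:

| batch size | input size | max batch size | preprocess time | inference | postprocess time | total | GPU |
|---|---|---|---|---|---|---|---|
| 1 | 448x448 | 8 | 0.2ms | 2.3ms | 0.1ms | 2.6ms | 91% |
| 2 | 448x448 | 8 | 0.3ms | 3.0ms | 0.2ms | 3.5ms | 85% |
| 4 | 448x448 | 8 | 0.5ms | 4.1ms | 0.32ms | 5.0ms | 82% |
| 8 | 448x448 | 8 | 1.2ms | 6.3ms | 0.77ms | 8.3ms | 79% |
| 16 | 448x448 | 32 | 2.2ms | 14ms | 1.3ms | 16.7ms | 80% |
| 32 | 448x448 | 32 | 5.0ms | 22ms | 2.8ms | 29.3ms | 77% |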

INT8 inference

The INT8 calibration table can be generated with INT8-Calibration-Tool.
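For context, an INT8 calibration table essentially records a scale per tensor that maps float values to 8-bit integers. The sketch below illustrates symmetric quantization with such a scale. Picking the scale from the maximum absolute value is a simplification assumed here for brevity (calibration tools typically derive it from activation statistics over a calibration dataset), and the function names are illustrative, not TensorRT or INT8-Calibration-Tool APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Scale chosen so the largest magnitude maps to 127 (simplified max-abs rule).
float maxAbsScale(const std::vector<float>& values) {
    float maxAbs = 0.0f;
    for (float v : values) maxAbs = std::max(maxAbs, std::fabs(v));
    return maxAbs > 0.0f ? maxAbs / 127.0f : 1.0f;
}

// Symmetric quantization: round(x / scale), clamped to [-127, 127].
std::int8_t quantize(float x, float scale) {
    float q = std::round(x / scale);
    q = std::min(127.0f, std::max(-127.0f, q));
    return static_cast<std::int8_t>(q);
}

// Map an INT8 value back to an approximate float.
float dequantize(std::int8_t q, float scale) {
    return static_cast<float>(q) * scale;
}
```

Values outside the calibrated range saturate at 127 or -127, and the round trip through `quantize` and `dequantize` loses at most about one scale step for in-range values, which is where INT8 accuracy differences come from.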

Accuracy

https://raw.githubusercontent.com/clancylian/retinaface/master/data/retinaface-widerface%E6%B5%8B%E8%AF%95.png

This article is auto-generated from clancylian/retinaface via the GitHub API. Last fetched: 6/28/2026