Latency and throughput

CPU release

By default, IDLive Doc SDK for CPU is configured for the lowest latency. To switch the SDK to throughput optimization mode, set the DOCSDK_OPTIMIZE_THROUGHPUT environment variable to 1.
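As a minimal sketch, the variable can be set from Python before the SDK is loaded. The helper name `configure_cpu_mode` is hypothetical, and the sketch assumes the SDK reads the variable when it is loaded or initialized, so the call should come before any SDK import. Removing the variable is assumed to return the SDK to its default latency mode.

```python
import os


def configure_cpu_mode(optimize_throughput: bool) -> None:
    """Select the CPU performance mode via the environment (hypothetical helper)."""
    if optimize_throughput:
        # Documented switch: 1 enables throughput optimization mode.
        os.environ["DOCSDK_OPTIMIZE_THROUGHPUT"] = "1"
    else:
        # Assumption: with the variable unset, the SDK uses its default latency mode.
        os.environ.pop("DOCSDK_OPTIMIZE_THROUGHPUT", None)


# Call this before importing or initializing the SDK in the same process.
configure_cpu_mode(optimize_throughput=True)
```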

GPU release

IDLive Doc SDK for GPU doesn't provide different performance modes. However, the available GPU resources are distributed fairly between all created LivenessPipeline instances. This means that the more instances you create within a single process, the fewer concurrent liveness check requests each instance can process in parallel, which results in lower per-pipeline TPS.
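The effect of this sharing can be illustrated with simple arithmetic. The sketch below assumes a perfectly even split and uses a hypothetical figure of 8 concurrent requests for the whole GPU; it is only an illustration, not a model of the SDK's actual scheduler, and real numbers depend on the pipeline and GPU model.

```python
def per_pipeline_share(total_concurrent_requests: int, num_pipelines: int) -> float:
    """Even split of GPU concurrency across pipelines (illustrative assumption)."""
    if num_pipelines < 1:
        raise ValueError("num_pipelines must be at least 1")
    return total_concurrent_requests / num_pipelines


# Hypothetical GPU that can serve 8 concurrent requests in total.
for num_pipelines in (1, 2, 4):
    share = per_pipeline_share(8, num_pipelines)
    print(f"{num_pipelines} pipeline(s): {share} concurrent requests each")
```

Under this assumption, total capacity stays the same while each additional instance reduces what any single instance can handle.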

Faster JPEG image processing

IDLive Doc SDK can use a GPU-based JPEG image decoder. To enable it, set the DOCSDK_USE_GPU_JPEG_DECODER environment variable to 1. The expected latency decrease is 30-100%, depending on the input image resolution and GPU model.

Warning

Using the GPU-based decoder with CUDA versions lower than 12.1 Update 1 may lead to crashes (SIGSEGV) when processing images with invalid EXIF metadata.
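One way to respect this warning is to enable the decoder only when the installed CUDA version is known to be recent enough. The sketch below is hedged: the helper `maybe_enable_gpu_jpeg_decoder` is hypothetical, the caller is expected to supply the CUDA version (how it is detected is out of scope), and CUDA 12.1 Update 1 is assumed to correspond to version 12.1.1.

```python
import os

# Assumption: CUDA 12.1 Update 1 is toolkit version 12.1.1.
MIN_SAFE_CUDA = (12, 1, 1)


def maybe_enable_gpu_jpeg_decoder(cuda_version: tuple) -> bool:
    """Enable the GPU JPEG decoder only on CUDA >= 12.1 Update 1 (hypothetical helper)."""
    if tuple(cuda_version) >= MIN_SAFE_CUDA:
        os.environ["DOCSDK_USE_GPU_JPEG_DECODER"] = "1"
        return True
    # Older CUDA: leave the GPU decoder off to avoid the crash described above.
    os.environ.pop("DOCSDK_USE_GPU_JPEG_DECODER", None)
    return False


# Call before the SDK is loaded, passing the version found on the host.
enabled = maybe_enable_gpu_jpeg_decoder((12, 2, 0))
```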

Lower liveness check latency

The latency of a liveness pipeline can be reduced significantly by using FP16 precision on the GPU. To enable it, set the DOCSDK_TRT_FP16_ENABLED environment variable to 1. The expected latency reduction is 40-50%, depending on the pipeline and GPU model.
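When the SDK runs in a separate worker process, the variable can be passed through that process's environment instead of changing the parent's. A minimal sketch follows; the helper `run_with_fp16` is hypothetical, and the child command here only echoes the variable as a stand-in for a real application that uses the SDK.

```python
import os
import subprocess
import sys


def run_with_fp16(command: list) -> subprocess.CompletedProcess:
    """Run a command with FP16 enabled in its environment only (hypothetical helper)."""
    # Copy the current environment and add the documented switch.
    env = dict(os.environ, DOCSDK_TRT_FP16_ENABLED="1")
    return subprocess.run(command, env=env, capture_output=True, text=True, check=True)


# Stand-in child process: prints the value it sees for the variable.
child_code = "import os; print(os.environ['DOCSDK_TRT_FP16_ENABLED'])"
result = run_with_fp16([sys.executable, "-c", child_code])
print(result.stdout.strip())
```

Scoping the variable to the child keeps other processes started from the same parent unaffected.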

Lower GPU initialization time

The initialization time of the GPU release can be reduced significantly by enabling the GPU engine cache. When enabled, the SDK caches compiled GPU engines to disk, avoiding the time-consuming compilation process on subsequent initializations. To enable the cache, set the DOCSDK_TRT_CACHE_DIR environment variable to the path of the directory where the engine cache will be stored and loaded from. The expected initialization time reduction is ~50%.

Note

The engine cache is exclusive to the specific GPU device and compute capability. Cache files cannot be shared across different GPU devices or compute capabilities and must be regenerated for each unique hardware configuration.
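Because cache files are tied to the hardware, one possible approach is to keep a separate cache directory per GPU configuration. The sketch below is an assumption-laden illustration: `use_engine_cache` and the `hardware_label` argument are hypothetical, the label is supplied by the caller rather than detected, and creating the directory in advance is a precaution rather than a documented requirement.

```python
import os
import re
from pathlib import Path


def use_engine_cache(base_dir: str, hardware_label: str) -> Path:
    """Point the SDK at a per-hardware engine cache directory (hypothetical helper)."""
    # Make the caller-supplied label safe to use as a directory name.
    safe_label = re.sub(r"[^A-Za-z0-9._-]+", "_", hardware_label)
    cache_dir = Path(base_dir) / safe_label
    # Assumption: creating the directory up front is harmless if the SDK
    # would otherwise create it itself.
    cache_dir.mkdir(parents=True, exist_ok=True)
    os.environ["DOCSDK_TRT_CACHE_DIR"] = str(cache_dir)
    return cache_dir
```

For example, `use_engine_cache("/var/cache/docsdk", "Tesla T4 / sm_75")` (both arguments are placeholders) would select a directory used only by hosts with that label, so a machine with a different GPU builds its own cache instead of loading incompatible files.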