AmusementClub/vs-mlrt

Efficient CPU/GPU ML Runtimes for VapourSynth (with built-in support for waifu2x, DPIR, RealESRGANv2/v3, Real-CUGAN, RIFE, SCUNet, ArtCNN and more!)

v16.1.test1 (Latest, Pre-release)

github-actions[bot] · June 25, 2026

📦 TRT

Upgraded to TensorRT [11.1](https://docs.nvidia.com/deeplearning/tensorrt/11.1.0/getting-started/release-notes-11/11.1.0.html).

📦 General

Upgraded to CUDA 13.3.0.
Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v16.test1...v16.1.test1

v16.test1 (Pre-release)

github-actions[bot] · June 6, 2026

📦 vsmlrt.py

__Breaking__: For the `TRT` backend, fp16 inference of built-in models requires either the [`onnxconverter-common`](https://pypi.org/project/onnxconverter-common/) or the [`nvidia-modelopt`](https://github.com/nvidia/Model-Optimizer#install) Python package to be installed. bf16 inference is not supported yet.
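
As a rough pre-flight check, a script could confirm that at least one of the two packages is importable before requesting fp16 with the `TRT` backend. The sketch below is only an illustration: the helper name `available_fp16_converters` is made up here, and the import names `onnxconverter_common` and `modelopt` are assumed to correspond to the PyPI packages named above.

```python
from importlib.util import find_spec

# Assumption: the PyPI package `onnxconverter-common` is imported as
# `onnxconverter_common`, and `nvidia-modelopt` is imported as `modelopt`.
FP16_CONVERTERS = {
    "onnxconverter-common": "onnxconverter_common",
    "nvidia-modelopt": "modelopt",
}


def available_fp16_converters(candidates=FP16_CONVERTERS):
    """Return the package names whose top-level module can be located."""
    return [
        package
        for package, module in candidates.items()
        if find_spec(module) is not None
    ]


if __name__ == "__main__":
    found = available_fp16_converters()
    if found:
        print("fp16 conversion packages found:", ", ".join(found))
    else:
        print("No fp16 conversion package found; install one of:",
              ", ".join(FP16_CONVERTERS))
```

The check only locates the modules without importing them, so it stays cheap and has no side effects.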

📦 TRT

Upgraded to TensorRT [11.0](https://docs.nvidia.com/deeplearning/tensorrt/11.0.0/getting-started/release-notes-11/11.0.0.html).

📦 TRT-RTX

Upgraded to TensorRT-RTX [1.5](https://docs.nvidia.com/deeplearning/tensorrt-rtx/latest/getting-started/release-notes-1/1.5.html).

📦 General

Upgraded to CUDA 13.2.1.
Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v15.16...v16.test1

v15.16: latest TensorRT libraries

github-actions[bot] · March 26, 2026

📦 TRT

Upgraded to TensorRT [10.16.0](https://docs.nvidia.com/deeplearning/tensorrt/10.16.0/getting-started/release-notes-10/10.16.0.html).

📦 TRT-RTX

Upgraded to TensorRT-RTX [1.4](https://docs.nvidia.com/deeplearning/tensorrt-rtx/1.4/getting-started/release-notes-1/1.4.html).

📦 General

Upgraded to CUDA 13.2.0.
Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v15.15...v15.16

v15.15: latest TensorRT libraries

github-actions[bot] · February 5, 2026

📦 TRT

Upgraded to TensorRT [10.15.1](https://docs.nvidia.com/deeplearning/tensorrt/10.15.1/getting-started/release-notes-10/10.15.1.html).

📦 General

Upgraded to CUDA 13.1.1 and cuDNN [9.19.0](https://docs.nvidia.com/deeplearning/cudnn/backend/v9.19.0/release-notes.html#cudnn-9-19-0).

📦 vsmlrt.py

Added support for ArtCNN v1.5.0 models.
Added bf16 I/O support for the `MIGX` backend.
Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v15.14...v15.15

v15.14.rtx: latest TensorRT-RTX libraries (Pre-release)

github-actions[bot] · November 11, 2025

📦 TRT-RTX

Upgraded to TensorRT-RTX [1.2](https://docs.nvidia.com/deeplearning/tensorrt-rtx/v1.2/getting-started/release-notes.html#tensorrt-rtx-1-2).
Added engine validity check for debugging invalid engines.
Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v15.14...v15.14.rtx

v15.14: latest TensorRT libraries

github-actions[bot] · November 8, 2025

📦 TRT

Upgraded to TensorRT [10.14.1](https://docs.nvidia.com/deeplearning/tensorrt/10.14.1/getting-started/release-notes.html#tensorrt-10-14-1).

📦 General

Upgraded to CUDA 13.0.2 and cuDNN [9.13.0](https://docs.nvidia.com/deeplearning/cudnn/backend/v9.13.0/release-notes.html#cudnn-9-13-0).

📦 ORT

Upgraded to ONNX Runtime 1.23.0 ([`ecb26fb`](https://github.com/microsoft/onnxruntime/tree/ecb26fb7754d7c9edf24b1844ea807180a2e3e23)).

📦 NCNN_VK

Upgraded to the latest ncnn ([`86efe80`](https://github.com/Tencent/ncnn/tree/86efe80b50408bfeca79761edcb3fa4b4e513331)) to fix hangs with NVIDIA 565 or later drivers.
Added support for fp16 I/O, similar to other existing supported backends.

📦 vsmlrt.py

Added support for ArtCNN R16F96 Chroma model.
Added `output_format` parameter to non-cuda ort backends.
Added fp16 I/O support for the `TRT_RTX` backend.
Added optional support for fp16 conversion using [TensorRT model optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer) for `TRT_RTX`.
Attempt to regenerate the engine after engine compilation fails for `TRT`, `MIGX` and `TRT_RTX`.
Remove extraneous plugin check by @Rukario in https://github.com/AmusementClub/vs-mlrt/pull/135
Improve `TRT_RTX` in handling fp16 conversion and standalone usage by @abihf in https://github.com/AmusementClub/vs-mlrt/pull/140
fix: use correct path for checking alter engine size by @shssoichiro in https://github.com/AmusementClub/vs-mlrt/pull/144

v15.13.ncnn (Pre-release)

github-actions[bot] · September 26, 2025

📦 NCNN_VK

Upgraded to the latest ncnn ([`86efe80`](https://github.com/Tencent/ncnn/tree/86efe80b50408bfeca79761edcb3fa4b4e513331)) to fix hangs with NVIDIA 565 or later drivers.
Added support for fp16 I/O, similar to other existing supported backends.

📦 vsmlrt.py

Added support for ArtCNN [v1.4.0](https://github.com/Artoriuz/ArtCNN/releases/tag/v1.4.0) models.

📦 Known issue

Using the `NCNN_VK(fp16=True)` backend on the ArtCNN R8F64 chroma model may exhibit chroma shift with irregular input resolutions.
Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v15.13.cu13...v15.13.ncnn

v15.13.cu13: latest TensorRT libraries (Pre-release)

github-actions[bot] · September 6, 2025

📦 TRT

Upgraded to TensorRT [10.13.3](https://docs.nvidia.com/deeplearning/tensorrt/10.13.3/getting-started/release-notes.html#tensorrt-10-13-3).

📦 General

Upgraded to CUDA 13.0.1 and cuDNN [9.13.0](https://docs.nvidia.com/deeplearning/cudnn/backend/v9.13.0/release-notes.html#cudnn-9-13-0).

📦 vsmlrt.py

Attempt to regenerate the engine after engine compilation fails for `TRT`, `MIGX` and `TRT_RTX`.
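
One possible shape for such a retry is sketched below. It is not the wrapper's actual code: `build_with_retry` and the `build` callable are hypothetical stand-ins for whatever compiles an engine to a given path, and the choice of two attempts is an assumption.

```python
import os
from typing import Callable


def build_with_retry(
    engine_path: str,
    build: Callable[[str], None],
    attempts: int = 2,
) -> str:
    """Call `build(engine_path)`, retrying after a failed compilation.

    `build` is a hypothetical callable that writes an engine file to
    `engine_path` and raises an exception when compilation fails.
    """
    last_error = None
    for _ in range(attempts):
        try:
            build(engine_path)
            return engine_path
        except Exception as exc:
            last_error = exc
            # Discard any partial output so the next attempt starts clean.
            if os.path.exists(engine_path):
                os.remove(engine_path)
    raise RuntimeError(
        f"engine build failed after {attempts} attempts"
    ) from last_error
```

Removing the partial file before retrying prevents a half-written engine from being picked up as a valid cache entry later.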

📦 ORT

Upgraded to ONNX Runtime 1.23.0 ([`ecb26fb`](https://github.com/microsoft/onnxruntime/tree/ecb26fb7754d7c9edf24b1844ea807180a2e3e23)).
Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v15.13.ort...v15.13.cu13

v15.13.ort: latest ONNX Runtime libraries (Pre-release)

github-actions[bot] · August 31, 2025

📦 ORT

Upgraded to ONNX Runtime 1.23.0 ([`4754a1d`](https://github.com/microsoft/onnxruntime/tree/4754a1d64e5920a715b0396906f339e6c15742a0)) and added support for Nvidia RTX 50-series GPUs.
Support for attention operations in ONNX Runtime for LLMs is disabled.
Support for 900 and 10-series GPUs is dropped from `ORT_CUDA`.

📦 General

Upgraded to cuDNN [9.12.0](https://docs.nvidia.com/deeplearning/cudnn/backend/v9.12.0/release-notes.html#cudnn-9-12-0).

📦 vsmlrt.py

Added optional support for fp16 conversion using [TensorRT model optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer) for `TRT_RTX`.

📦 Community contributions

`TRT_RTX` improvements by @abihf in https://github.com/AmusementClub/vs-mlrt/pull/140

📦 Known issues

fp16 inference for RIFE v2 and SAFA models, as well as fp32/fp16 inference for some SwinIR models, are not currently working in `TRT_RTX`.
The old cudnn v8 installation should be removed; otherwise, DLL loading may not work.
Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v15.13.rtx...v15.13.ort

v15.13.rtx: experimental TensorRT-RTX backend (Pre-release)

github-actions[bot] · August 15, 2025

📦 TRT-RTX

Upgraded to TensorRT-RTX [1.1](https://docs.nvidia.com/deeplearning/tensorrt-rtx/v1.1/getting-started/release-notes.html#tensorrt-rtx-1-1).

📦 vsmlrt.py

Added support for ArtCNN R16F96 Chroma model.
Added `output_format` parameter to non-cuda ort backends.
Added fp16 I/O support for the `TRT_RTX` backend.

📦 Community contributions

Remove extraneous plugin check for RIFE by @Rukario in https://github.com/AmusementClub/vs-mlrt/pull/135

📦 Known issues

fp16 inference for RIFE v2 and SAFA models is currently not supported in the `TRT_RTX` backend.
Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v15.13...v15.13.rtx

v15.13: latest TensorRT libraries

github-actions[bot] · July 24, 2025

📦 TRT

Upgraded to TensorRT [10.13.0](https://docs.nvidia.com/deeplearning/tensorrt/10.13.0/getting-started/release-notes.html#tensorrt-10-13-0) and CUDA 12.9.1.

📦 vsmlrt.py

Fix input name.
Fix error handling for `Expr`.

📦 TRT-RTX

Added support for dynamic shapes.
Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v15.12...v15.13
![](https://img.shields.io/github/downloads/AmusementClub/vs-mlrt/v15.13/vsmlrt-windows-x64-cuda.v15.13.7z.002?label=downloads)

v15.12: latest TensorRT libraries

github-actions[bot] · June 13, 2025

📦 TRT

Upgraded to TensorRT [10.12.0](https://docs.nvidia.com/deeplearning/tensorrt/10.12.0/getting-started/release-notes.html#tensorrt-10-12-0).

📦 vsmlrt.py

Added support for the [SAFA](https://github.com/hzwer/Practical-RIFE/tree/9aff2a278b1fb5085e137b4f4b748e518bf7ab26?tab=readme-ov-file#video-enhancement) v0.5 models.
Prioritize the use of `akarin.Expr`.
Fix tile size check in `SAFA()`.

📦 misc

Fix tile size check in `vsort` and `vsov`.
Added __experimental__ support for the [TensorRT-RTX](https://developer.nvidia.com/tensorrt-rtx) library. This `TRT_RTX` backend is under development, check pre-releases with the `.rtx` suffix.
Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v15.11...v15.12
![](https://img.shields.io/github/downloads/AmusementClub/vs-mlrt/v15.12/vsmlrt-windows-x64-cuda.v15.12.7z.002?label=downloads)

v15.11.rtx: experimental TensorRT-RTX backend (Pre-release)

github-actions[bot] · June 12, 2025

📦 Known issues

Dynamic shape is not supported.
For the vsmlrt.py wrapper, fp16 processing currently requires the [onnxconverter-common](https://pypi.org/project/onnxconverter-common/) package, and fp16 input/output is not supported.
Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v15.11...v15.11.rtx

v15.11: latest TensorRT libraries

github-actions[bot] · May 15, 2025

📦 TRT

Upgraded to TensorRT [10.11.0](https://docs.nvidia.com/deeplearning/tensorrt/10.11.0/getting-started/release-notes.html#tensorrt-10-11-0).
Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v15.10...v15.11
![](https://img.shields.io/github/downloads/AmusementClub/vs-mlrt/v15.11/vsmlrt-windows-x64-cuda.v15.11.7z.002?label=downloads)

v15.10: latest TensorRT libraries

github-actions[bot] · May 1, 2025

📦 TRT

Upgraded to TensorRT [10.10.0](https://docs.nvidia.com/deeplearning/tensorrt/10.10.0/getting-started/release-notes.html#tensorrt-10-10-0).

📦 NCNN_VK

Added `DeviceProperties()` to query information about a Vulkan device.

📦 General

Upgraded to CUDA 12.9.0.
Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v15.9...v15.10
![](https://img.shields.io/github/downloads/AmusementClub/vs-mlrt/v15.10/vsmlrt-windows-x64-cuda.v15.10.7z.002?label=downloads)

v15.9: latest TensorRT libraries

github-actions[bot] · March 5, 2025

📦 TRT

Upgraded to TensorRT [10.9.0](https://docs.nvidia.com/deeplearning/tensorrt/10.9.0/getting-started/release-notes.html#tensorrt-10-9-0).

📦 vsmlrt.py

Better overlap defaults for SCUNet and DPIR by @damster101 in https://github.com/AmusementClub/vs-mlrt/pull/126
Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v15.8...v15.9
![](https://img.shields.io/github/downloads/AmusementClub/vs-mlrt/v15.9/vsmlrt-windows-x64-cuda.v15.9.7z.002?label=downloads)

v15.8: Blackwell support, latest TensorRT and MIGraphX libraries

github-actions[bot] · January 24, 2025

📦 TRT

Upgraded to TensorRT [10.8.0](https://docs.nvidia.com/deeplearning/tensorrt/10.8.0/getting-started/release-notes.html#tensorrt-10-8-0), which adds support for Blackwell GPUs including RTX 50-series.
The release archive is split into 2GB volumes (`.7z.001`, `.7z.002`).
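
7-Zip can normally open the `.7z.001` volume directly when all volumes sit in the same folder. If a single file is needed instead, the volumes can be joined by plain concatenation, assuming they are a byte-wise split of one archive. The helper below is a minimal sketch under that assumption; `join_volumes` is a name chosen for illustration and is not shipped with vs-mlrt.

```python
import shutil
from pathlib import Path


def join_volumes(first_volume: str, output: str) -> int:
    """Concatenate `<name>.001`, `<name>.002`, ... into `output`.

    Returns the number of volumes joined. Assumes the volumes are a
    plain byte-wise split of a single archive.
    """
    first = Path(first_volume)
    if first.suffix != ".001":
        raise ValueError("expected the first volume, ending in .001")
    stem = first.with_suffix("")  # e.g. archive.7z.001 -> archive.7z
    count = 0
    with open(output, "wb") as dst:
        while True:
            part = Path(f"{stem}.{count + 1:03d}")
            if not part.exists():
                break
            with open(part, "rb") as src:
                shutil.copyfileobj(src, dst)
            count += 1
    return count
```

The joined file can then be extracted like any other `.7z` archive.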

📦 MIGX

Upgraded to MIGraphX 2.12.0 [`6acc1f9`](https://github.com/ROCm/AMDMIGraphX/commit/6acc1f957bab2d2b23b3adffccd29f7e10178986).

📦 General

Upgraded to CUDA 12.8.0 and HIP 6.2.4.

📦 vsmlrt.py

Added support for [ArtCNN](https://github.com/Artoriuz/ArtCNN) v1.2.0 models.
Add `tiling_optimization_level` and `l2_limit_for_tiling` options to the `TRT` backend for memory-bandwidth-limited models ([docs](https://docs.nvidia.com/deeplearning/tensorrt/10.8.0/inference-library/advanced.html#tiling-optimization)).
Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v15.7...v15.8
![](https://img.shields.io/github/downloads/AmusementClub/vs-mlrt/v15.8/vsmlrt-windows-x64-cuda.v15.8.7z.002?label=downloads)

v15.7: latest TensorRT libraries, ONNX Runtime and MIGraphX interface improvements

github-actions[bot] · December 3, 2024

📦 TRT

Upgraded to TensorRT [10.7.0](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-1070/release-notes/index.html#rel-10-7-0).

📦 ORT_DML

Fixed blank output for the first returned frame reported by @Mr-Z-2697 in https://github.com/AmusementClub/vs-mlrt/issues/117

📦 MIGX

Allow `num_streams` > 1 by @abihf in https://github.com/AmusementClub/vs-mlrt/pull/113

📦 ORT_COREML

Add support for ML program by @yuygfgg in https://github.com/AmusementClub/vs-mlrt/pull/116

📦 General

Upgraded to CUDA 12.6.3.

📦 vsmlrt.py

Added support for RIFE v4.26 heavy model.
Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v15.6...v15.7
![](https://img.shields.io/github/downloads/AmusementClub/vs-mlrt/v15.7/vsmlrt-windows-x64-cuda.v15.7.7z?label=downloads)

v15.6: latest TensorRT and OpenVINO libraries

github-actions[bot] · November 1, 2024

📦 TRT

Upgraded to TensorRT [10.6.0](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-1060/release-notes/index.html#rel-10-6-0).

📦 OV

Upgraded to OpenVINO [2024.5.0](https://github.com/openvinotoolkit/openvino/tree/5833781ddbc476d77cf5593f1f8b34758988b9a8), which adds support for Xe2 GPU and NPU 4 on Lunar Lake.

📦 MIGX

Fix missing precision check.

📦 General

Upgraded to CUDA 12.6.2.

📦 vsmlrt.py

Added support for RIFE v4.25 lite and heavy models.
Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v15.5...v15.6
![](https://img.shields.io/github/downloads/AmusementClub/vs-mlrt/v15.6/vsmlrt-windows-x64-cuda.v15.6.7z?label=downloads)

v15.5: latest TensorRT library, CoreML backend

github-actions[bot] · October 1, 2024

📦 TRT

Upgraded to TensorRT [10.5.0](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-1050/release-notes/index.html#rel-10-5-0).
Volta GPUs (TITAN V, V100) are no longer supported.

📦 ORT

Fix macOS CoreML support for vsort by @yuygfgg in https://github.com/AmusementClub/vs-mlrt/pull/106.
This pull request also added the `ORT_COREML` backend to vsmlrt.py.

📦 General

Upgraded to CUDA 12.6.1.

📦 vsmlrt.py

Added support for RIFE v4.25 and v4.26 models.
Added automatic batch inference support via `batch_size` option in `inference()` and `flexible_inference()`, which may improve device utilization for inference on small inputs using some small models.
On the one hand, batching improves utilization by creating more work for each kernel invocation and reducing the quantization inefficiency of kernel tiles in bulk parallelism. It also reduces the average kernel launch and synchronization overhead per unit of work.
On the other hand, batching causes cache misses and inserts bubbles in the pipeline, which may degrade performance.
This feature requires flexible output support starting with vs-mlrt v15 and is inspired by https://github.com/styler00dollar/VSGAN-tensorrt-docker/commit/ac47012b9313fe76f37914601dafea82df0e94e6.
Note that not all onnx models are supported.
Future RIFE v2 models will be fixed to support batch inference.
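
The tile-quantization point can be made concrete with a toy calculation. The model below is an assumption for illustration only, not a description of any real kernel: the work of a batch is flattened into one dimension and covered by fixed-size tiles, and the padding in the last tile counts as wasted work. The numbers 72 and 64 are arbitrary.

```python
import math


def tile_utilization(work_per_input: int, tile: int, batch_size: int = 1) -> float:
    """Fraction of launched tile capacity that does useful work.

    Toy model: `work_per_input * batch_size` items are covered by tiles
    of `tile` items each; the final tile is padded if it is not full.
    """
    total = work_per_input * batch_size
    tiles = math.ceil(total / tile)
    return total / (tiles * tile)


if __name__ == "__main__":
    for batch in (1, 2, 4, 8):
        print(batch, round(tile_utilization(72, 64, batch), 4))
    # prints:
    # 1 0.5625
    # 2 0.75
    # 4 0.9
    # 8 1.0
```

In this model a larger batch amortizes the padded tail over more useful work, which is one reason small inputs on small models may benefit most. It says nothing about the cache and pipeline costs mentioned above, which can outweigh the gain.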

v15.4: latest TensorRT library

github-actions[bot] · September 7, 2024

📦 TRT

Upgraded to TensorRT [10.4.0](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-1040/release-notes/index.html#rel-10-4-0).

📦 General

Upgraded to CUDA 12.6.0.

📦 vsmlrt.py

Added support for Ani4K-v2 model by @srk24 in https://github.com/AmusementClub/vs-mlrt/pull/105
Added support for RIFE v4.23 and v4.24 models.
Add `max_tactics` option to the `TRT` backend, which can reduce engine build time by limiting the number of tactics to time.
By default, TensorRT will determine the number of tactics based on its own heuristic.
---

📦 Batch Inference (Preview)

This feature requires flexible output support starting with vs-mlrt v15 and is inspired by https://github.com/styler00dollar/VSGAN-tensorrt-docker/commit/ac47012b9313fe76f37914601dafea82df0e94e6.
Note that not all onnx models are supported.
Preliminary benchmark:
NVIDIA GeForce RTX 4090
driver 560.94
Windows Server 2019
python 3.12.6, vapoursynth-classic R57.A10
input: 720x480 RGBS

v15.3: MIGraphX on Windows

github-actions[bot] · August 21, 2024

📦 MIGX

Add __experimental__ [MIGraphX](https://github.com/ROCm/AMDMIGraphX) support on Windows. MIGraphX is AMD's graph optimization engine to accelerate machine learning model inference.
[Supported GPUs](https://rocm.docs.amd.com/projects/install-on-windows/en/docs-6.1.2/reference/system-requirements.html#windows-supported-gpus):
gfx1030: Radeon RX 6950 XT, Radeon RX 6900 XT, Radeon RX 6800 XT, Radeon RX 6800, ...
gfx1100: Radeon RX 7900 XTX, Radeon RX 7900 XT, ...
gfx1101: Radeon RX 7700 XT, ...
gfx1102: Radeon RX 7600
Relevant archives include:
[`vsmlrt-windows-x64-migraphx.<version>.7z`](https://github.com/AmusementClub/vs-mlrt/releases/download/v15.3/vsmlrt-windows-x64-migraphx.v15.3.7z): the all-in-one archive, contains `vsmlrt.py` Python wrapper, some built-in ONNX models, `vsmigx`/`vsov`/`vsort`/`vsncnn` plugins and runtime.

📦 Known limitation

The `MIGX` backend in the vsmlrt.py wrapper does not support device selection and will always use the default device (`device_id=0`).

📦 vsmlrt.py

Added support for RIFE v4.22 (lite) models.
Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v15.2...v15.3
![](https://img.shields.io/github/downloads/AmusementClub/vs-mlrt/v15.3/vsmlrt-windows-x64-cuda.v15.3.7z?label=downloads)

v15.2: latest TensorRT library

github-actions[bot] · August 7, 2024

📦 TRT

Upgraded to TensorRT [10.3.0](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-1030/release-notes/index.html#rel-10-3-0).
Fixed performance regression of RIFE and SAFA models starting with vs-mlrt v14.test4. This version may still be slightly slower than vs-mlrt [v14.test3](https://github.com/AmusementClub/vs-mlrt/releases/tag/v14.test3) under some conditions, however.

📦 General

Upgraded to CUDA 12.5.1.

📦 vsmlrt.py

Added support for RIFE v4.19 ~ v4.21 models.
Added support for ArtCNN R8F64 (chroma) models.
Deprecated ArtCNN C4F32 models based on developer's request, but compatibility at the vsmlrt.py level will be guaranteed.
Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v15.1...v15.2
![](https://img.shields.io/github/downloads/AmusementClub/vs-mlrt/v15.2/vsmlrt-windows-x64-cuda.v15.2.7z?label=downloads)

v15.1: latest TensorRT library

github-actions[bot] · July 4, 2024

📦 TRT

Upgraded to TensorRT [10.2.0](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-1020/release-notes/index.html#rel-10-2-0).
Add TensorRT release package (`vsmlrt-windows-x64-tensorrt`). https://github.com/AmusementClub/vs-mlrt/issues/102
This package is a strict subset of the CUDA release package, with cuDNN, cuBLAS libraries and support for `ORT_CUDA` backend removed.
It supports `TRT`, `OV_*`, `ORT_CPU`, `ORT_DML` and `NCNN_VK` backends.

📦 known issue

According to the [documentation](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-1020/release-notes/index.html#rel-10-2-0),
`There is an up to 4x performance regression for networks containing "GridSample" ops compared to TensorRT 9.2.`
This affects RIFE and SAFA models.
vs-mlrt [v14.test3](https://github.com/AmusementClub/vs-mlrt/releases/tag/v14.test3) is the latest one that is not affected. This will be fixed in the next release by TensorRT 10.3.0.

📦 General

Upgraded to CUDA 12.5.0.

📦 vsmlrt.py

Added support for [RIFE v4.17 lite and v4.18](https://github.com/hzwer/Practical-RIFE/tree/e992041754c9a81e0566a0adce7589f79e12f0e8?tab=readme-ov-file#model-list) models.
Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v15...v15.1

v15: latest TensorRT library

github-actions[bot] · June 15, 2024

📦 plugins

Added parameter `flexible_output_prop` for flexible output:
Traditionally, all plugins can only support onnx models with one or three output channels, due to vapoursynth's limitation.
By using the new flexible output feature, plugins can support onnx models with arbitrary number of output planes.
```python3
from typing import TypedDict

import vapoursynth as vs

class Output(TypedDict):
    clip: vs.VideoNode
    num_planes: int
```

📦 vsmlrt.py

Added support for [RIFE v4.17](https://github.com/hzwer/Practical-RIFE/tree/f3e48ceb02e4c21bc8868b03994b98f3402ffb3d?tab=readme-ov-file#model-list) models.
Added support for [ArtCNN](https://github.com/Artoriuz/ArtCNN) models optimised for anime content. The chroma variants are not supported on previous versions of vs-mlrt, because they require the flexible output feature.
Added function `flexible_inference` for flexible output:
The above sample is simplified as
```python3
output_planes = flexible_inference(src, network_path) # type: list[vs.VideoNode]
```

📦 TRT

Upgraded to TensorRT [10.1.0](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-1010/release-notes/index.html#rel-10-1-0).

📦 known issue

According to the [documentation](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-1010/release-notes/index.html#rel-10-1-0),
`There is an up to 4x performance regression for networks containing "GridSample" ops compared to TensorRT 9.2.`
This affects RIFE and SAFA models.
vs-mlrt [v14.test3](https://github.com/AmusementClub/vs-mlrt/releases/tag/v14.test3) is the latest one that is not affected.

📦 Community contributions

Fix `multiple flexible_output_prop keyword argument` error by @LightArrowsEXE in https://github.com/AmusementClub/vs-mlrt/pull/97
Fix missing spaces in exceptions by @LightArrowsEXE in https://github.com/AmusementClub/vs-mlrt/pull/98
Full Changelog: https://github.com/AmusementClub/vs-mlrt/compare/v14...v15

v14: latest libraries

github-actions[bot] · April 25, 2024

📦 General

[External models](https://github.com/AmusementClub/vs-mlrt/releases/tag/external-models) are no longer packaged.

📦 vsmlrt.py

Plugin invocation order in the `get_plugin_path()` function is sorted to reduce memory consumption.
Added support for [RIFE v4.7 ~ v4.16](https://github.com/hzwer/Practical-RIFE/tree/6f9dc10493b9f15391b68a5002f8e5201159634e?tab=readme-ov-file#model-list) (lite, ensemble) models.
Added support for [SCUNet](https://github.com/cszn/SCUNet/tree/52e440a80a655b01e0b41e9dd9bfe599bc11625e) models for image denoising.

📦 plugin and runtime libraries

Upgraded to TensorRT [10.0.1](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-1001/release-notes/index.html#rel-10-0-1).
Maxwell and Pascal GPUs are no longer supported. Other backends still support these GPUs.
Reduce GPU memory usage for dynamically shaped engines when the actual tile size is smaller than the maximum tile size set during engine building.
Reduced engine build time.
Added [long path](https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry#enable-long-paths-in-windows-10-version-1607-and-later) support for engines on Windows.
cuDNN is no longer a strict runtime dependency.

📦 vsmlrt.py

The cuDNN tactic is no longer enabled by default.
TF32 acceleration is disabled by default.
The maximum workspace defaults to `None`, which means the total memory size of the GPU.
Add parameters `builder_optimization_level`, `max_aux_streams`, `bf16` (https://github.com/AmusementClub/vs-mlrt/issues/64), `custom_env`, `custom_args`, `short_path` and `engine_folder` (https://github.com/AmusementClub/vs-mlrt/issues/90):
`builder_optimization_level`: "adjust how long TensorRT should spend searching for tactics with potentially better performance" [link](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-1001/developer-guide/index.html#opt-builder-optimization-level)
`max_aux_streams`: Within-inference multi-streaming, "if enabled, TensorRT will run some layers on the auxiliary streams in parallel to the layers running on the main stream, ..., may increase the memory consumption, ..." [link](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-1001/developer-guide/index.html#within-inference-multi-streaming)
`bf16`: "TensorRT supports the bfloat16 (brain float) floating point format on NVIDIA Ampere and later architectures ... Note that not all layers support bfloat16." [link](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-1001/developer-guide/index.html#bf16)
`custom_env`, `custom_args`: custom environment variable and arguments for trtexec engine build.

📦 known issues

According to the [documentation](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-1001/release-notes/index.html#rel-10-0-1), `There is an up to 4x performance regression for networks containing "GridSample" ops compared to TensorRT 9.2.` This affects RIFE and SAFA models.
trtexec may report errors like:
`[E] Error[9]: Skipping tactic 0xded5318b4a444b84 due to exception Cask convolution execution`
`[E] Error[2]: [virtualMemoryBuffer.cpp::nvinfer1::StdVirtualMemoryBufferImpl::resizePhysical::140] Error Code 2: OutOfMemory (no further information)`
This issue has been submitted to NVIDIA.

📦 ORT

Upgraded to ONNX Runtime [v1.18.0](https://github.com/microsoft/onnxruntime/tree/e5947f57293045c60a7f44c5557bbc022be2f9e7).

📦 interface

The `ORT_*` backends now support fp16 I/O. The semantics of the `fp16` flag in these backends is as follows:
Enabling `fp16` will use a built-in quantization that converts a fp32 onnx to a fp16 onnx. If the input video is of half-precision floating-point format, the generated fp16 onnx will use fp16 input. The output format can be controlled by the `output_format` option (`0 = fp32, 1 = fp16`).
Disabling `fp16` will not use the built-in quantization. However, if the onnx file itself uses fp16 for computation, the actual computation will be done in fp16. In this case, the input video format should match the input format of the onnx, and the output format is inferred from the onnx.
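
The two cases above can be summarized as a small decision function. This is a sketch of the described semantics only, not code from vs-mlrt: `Plan`, `plan_precision` and all parameter names except `fp16` and `output_format` are hypothetical, and raising an error on a format mismatch is an assumption about how "should match" might be enforced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    converted: bool      # whether the built-in fp32 -> fp16 conversion runs
    compute: str         # "fp16" or "fp32"
    input_format: str    # "fp16" or "fp32"
    result_format: str   # "fp16" or "fp32"


def plan_precision(
    fp16: bool,
    clip_is_half: bool,
    output_format: int = 0,      # 0 = fp32, 1 = fp16 (only used when fp16=True)
    onnx_compute: str = "fp32",  # precision the onnx file itself computes in
    onnx_input: str = "fp32",    # input format declared by the onnx file
    onnx_output: str = "fp32",   # output format declared by the onnx file
) -> Plan:
    clip_format = "fp16" if clip_is_half else "fp32"
    if fp16:
        # Built-in conversion: the input follows the clip, the output
        # follows the output_format option.
        result = "fp16" if output_format == 1 else "fp32"
        return Plan(True, "fp16", clip_format, result)
    # No conversion: everything follows the onnx file, and the clip
    # format is expected to match the model's input format.
    if clip_format != onnx_input:
        raise ValueError(
            f"clip is {clip_format} but the onnx expects {onnx_input} input"
        )
    return Plan(False, onnx_compute, onnx_input, onnx_output)
```

For example, `plan_precision(fp16=True, clip_is_half=True, output_format=1)` describes a fully half-precision path, while `plan_precision(fp16=False, clip_is_half=False)` leaves an fp32 model untouched.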

📦 CUDA

Reduced execution overhead.
Added support for TF32 acceleration. This is disabled by default.
Added experimental `prefer_nhwc` flag to reduce the number of layout transformations when using tensor cores. This is disabled by default.

📦 OV

Upgraded to OpenVINO [2024.2.0](https://github.com/openvinotoolkit/openvino/tree/4655dd6ce3fa7947d5f70c3552f31fdda50582d0).
Added experimental `OV_NPU` backend for Intel NPUs.

📦 MIGX

Added support for [MIGraphX](https://github.com/ROCm/AMDMIGraphX) backend for AMD GPUs. Currently this backend is Linux only.

📦 Community contributions

`scripts/vsmlrt.py`: update esrgan janai models by @hooke007 in https://github.com/AmusementClub/vs-mlrt/pull/53
`scripts/vsmlrt.py`: add more esrgan janai models by @hooke007 in https://github.com/AmusementClub/vs-mlrt/pull/82
`vsmigx`: allow fp16 input & output by @abihf in https://github.com/AmusementClub/vs-mlrt/pull/86
`scripts/vsmlrt.py`: fix fp16 precision issues of RIFE v2 representations by @charlessuh in https://github.com/AmusementClub/vs-mlrt/issues/66#issuecomment-1791986979

📦 Benchmark

NVIDIA GeForce RTX 3090, 10496 shaders @ 1695 MHz, driver 552.22, Windows Server 2022, Python 3.11.9, vapoursynth-classic R57.A8
1920x1080 RGBS, TRT backend, CUDA graphs enabled, fp16
Measurements: FPS / Device Memory (MB)
| model | 1 stream | 2 streams | 3 streams |
|:---------------------------------------------:|:----------------:|:-----------------:|:-----------------:|
| dpir color | 10.99 / 1715.172 | 11.62 / 3048.540 | 11.64 / 4381.912 |
| waifu2x upconv_7_{anime_style_art_rgb, photo} | 22.38 / 2016.352 | 32.66 / 3734.880 | 32.54 / 5453.404 |
| waifu2x cunet / cugan | 12.41 / 4359.284 | 15.53 / 8363.392 | 15.47 / 12367.504 |

v14.test4: latest TensorRT and ONNX Runtime libraries (Pre-release)

github-actions[bot] · March 27, 2024

📋 Changes

__The `TRT` backend no longer supports Maxwell and Pascal GPUs__. Other backends still support these GPUs. Same as those releases, the current release requires driver version >= 525.
Added support for [SwinIR](https://github.com/JingyunLiang/SwinIR/tree/6545850fbf8df298df73d81f3e8cba638787c8bd) models for image restoration, which are only supported by the `TRT` backend and the `ORT_CPU` backend from vs-mlrt v14.test4 or later. SwinIR-M and SwinIR-L models exhibit precision issues with the fp16 implementation; this is under investigation.
Added support for [SCUNet](https://github.com/cszn/SCUNet/tree/52e440a80a655b01e0b41e9dd9bfe599bc11625e) models for image denoising, which are only supported by the `TRT` backend and the `ORT_CPU` backend from vs-mlrt v14.test4 or later.
Added `engine_folder` argument to the `TRT` backend in vsmlrt.py to specify custom directory for engines.
Starting with this pre-release, for dynamically shaped engines, the trt runtime allocates gpu memory based on the actual tile size, whereas in previous releases, the runtime would have to allocate gpu memory based on the maximum tile size set at engine compile time. This feature requires TensorRT 10 or later.
The `ORT_*` backends now support fp16 I/O. The semantics of the `fp16` flag is as follows:
Enabling `fp16` will use a built-in quantization that converts a fp32 onnx to a fp16 onnx. If the input video is of half-precision floating-point format, the generated fp16 onnx will use fp16 input. The output format can be controlled by the `output_format` option (`0 = fp32, 1 = fp16`).
Disabling `fp16` will not use the built-in quantization. However, if the onnx file itself uses fp16 for computation, the actual computation will be done in fp16. In this case, the input video format should match the input format of the onnx, and the output format is inferred from the onnx.

📦 benchmark 1

[previous benchmark](https://github.com/AmusementClub/vs-mlrt/releases/tag/v14.test3)
RTX 4090
processor clock @ 2520 MHz
Intel Icelake server @ 2100 MHz
Driver 551.86
Windows 10 21H2 (19044.1415)
TensorRT 10.0.0
VapourSynth-Classic R57.A8, vapoursynth-plugin v0.96g3

📦 general

| model | 1 stream | 2 streams | 3 streams |
|:---------------------------------------------:|:-----------------:|:----------------:|:-----------------:|
| dpir gray | 22.05 / 1818.796 | 25.30 / 3111.114 | 25.33 / 4403.488 |
| dpir color | 18.30 / 1851.632 | 25.13 / 3176.808 | 25.17 / 4501.984 |
| | | | |
| waifu2x upconv_7_{anime_style_art_rgb, photo} | 20.45 / 2148.716 | 41.22 / 3867.240 | 61.21 / 5585.764 |
| waifu2x upresnet10 | 17.91 / 1716.588 | 34.53 / 2941.540 | 42.33 / 4166.492 |
| waifu2x cunet / cugan | 13.89 / 4391.292 | 25.74 / 8346.248 | 25.96 / 12301.202 |

📦 rife

v2, fp16 i/o
| version | 1 stream | 2 streams | 3 streams | 4 streams | 5 streams |
|:-----------------------------------------------:|:--------------:|:--------------:|:---------------:|:---------------:|:---------------:|
| v4.4-v4.5 | 136.92/778.432 | 273.80/1149.204 | 414.80/1522.028 | 553.70/1892.796 | 574.31/2263.568 |
| v4.6 | 136.01/800.960 | 275.26/1192.212 | 411.01/1585.516 | 544.30/1979.764 | 550.01/2368.020 |
| v4.7-v4.9 | 98.20/1302.724 | 195.78/2187.548 | 210.12/3074.420 | 210.45/3957.196 | 210.66/4844.068 |
| v4.10-v4.15 | 84.41/1595.592 | 160.93/2773.280 | 161.96/3953.020 | 162.04/5132.760 | 162.07/6310.448 |
| {v4.12, v4.13, v4.15, v4.16}_lite | 93.39/1333.444 | 187.32/2255.132 | 197.71/3178.872 | 198.01/4098.508 | 197.95/5022.248 |

📦 benchmark 2

[previous benchmark](https://github.com/AmusementClub/vs-mlrt/releases/tag/v13.2)
NVIDIA GeForce RTX 3090, 10496 shaders @ 1695 MHz, driver 552.22, Windows Server 2022, Python 3.11.9, vapoursynth-classic R57.A8
Measurements: (1080p, fp16) FPS / Device Memory (MB)
| model | ORT_CUDA NCHW | ORT_CUDA NHWC | ORT_DML |
|:---------------------------------:|:----------------:|:---------------:|:--------------:|
| dpir color | 4.54 / 2573.3 | 5.98 / 2470.9 | 8.45 / 2364.5 |
| dpir color (2 streams) | 4.66 / 4854.9 | 6.30 / 4680.8 | 9.48 / 4630.9 |
| waifu2x upconv7 | 10.98 / 5432.5 | 3.18 / 3017.8 | 12.48 / 4493.0 |
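To compare backends programmatically, the table can be parsed directly from its markdown form. The following is a minimal sketch using only the three rows shown above; `parse_table` and `fastest_backend` are hypothetical helpers, not part of vs-mlrt.

```python
TABLE = """\
| model | ORT_CUDA NCHW | ORT_CUDA NHWC | ORT_DML |
|:---:|:---:|:---:|:---:|
| dpir color | 4.54 / 2573.3 | 5.98 / 2470.9 | 8.45 / 2364.5 |
| dpir color (2 streams) | 4.66 / 4854.9 | 6.30 / 4680.8 | 9.48 / 4630.9 |
| waifu2x upconv7 | 10.98 / 5432.5 | 3.18 / 3017.8 | 12.48 / 4493.0 |
"""


def parse_table(text: str) -> dict[str, dict[str, tuple[float, ...]]]:
    """Parse a markdown table into {model: {backend: (fps, memory_mb)}}."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    rows = {}
    for ln in lines[2:]:  # skip the header and the alignment row
        cells = [c.strip() for c in ln.strip("|").split("|")]
        rows[cells[0]] = {
            backend: tuple(float(x) for x in cell.split("/"))
            for backend, cell in zip(header[1:], cells[1:])
        }
    return rows


def fastest_backend(row: dict[str, tuple[float, ...]]) -> str:
    """Return the backend with the highest FPS (first value of each cell)."""
    return max(row, key=lambda backend: row[backend][0])


rows = parse_table(TABLE)
for model, row in rows.items():
    print(model, "->", fastest_backend(row))
```

For the three rows shown, `ORT_DML` has the highest FPS in each case. Note that for waifu2x upconv7 the NHWC layout is much slower than NCHW (3.18 vs 10.98 FPS), the opposite of what dpir shows.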

📦 benchmark 3

NVIDIA GeForce RTX 2080 Ti, 4352 shaders @ 1700 MHz, driver 552.22, Windows 10 LTSC 21H2 (19044.1415), Python 3.11.9, vapoursynth-classic R57.A8
Measurements: (1080p, fp16) FPS / Device Memory (MB)
| model | TRT | ORT_CUDA | ORT_DML | ORT_CUDA NHWC |
|:------------------------------------------:|:-------------:|:--------------:|:------------:|:-------------:|
| dpir color (1 stream) | 7.08 / 1899 | 3.10 / 2602 | 4.99 / 2341 | 4.26 / 2411 |
| dpir color (2 streams) | 8.06 / 3376 | 3.30 / 5016 | 5.85 / 4619 | 4.74 / 4650 |
| | | | | |
| waifu2x upconv7 (1 stream) | 11.47 / 2014 | 7.01 / 4949 | 7.45 / 4501 | 1.59 / 2923 |

v14.test3: latest TensorRT, MIGraphX backend (v14.test3, Pre-release)

github-actions[bot]·2y ago·December 3, 2023


📋 Changes

As with the earlier `v14.test` and `v14.test2` releases, this release requires Pascal GPUs or later (10 series+) and driver version >= 525. Support for Kepler 2.0 and Maxwell GPUs is dropped.
TensorRT 9.2.0 is officially documented as `for Large Language Models (LLMs) on NVIDIA A100, A10G, L4, L40, L40S, H100 GPUs, and NVIDIA GH200 Grace Hopper™ Superchip only`. The Windows build is downloaded from [here](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/9.2.0/tensorrt-9.2.0.5.windows10.x86_64.cuda-12.2.llm.beta.zip), and can be used on other GPU models.
Users should use the same version of TensorRT as provided (9.2.0) because runtime version checking is [disabled](https://github.com/AmusementClub/vs-mlrt/commit/09a26ecec48c1b015bc1537f968db19473ea69b8) in this release.
Added support for [AnimeJaNai V3](https://github.com/the-database/mpv-upscale-2x_animejanai/releases/tag/3.0.0) models, contributed by @hooke007 in https://github.com/AmusementClub/vs-mlrt/pull/82.
Added support for [RIFE v4.13 ~ v4.16](https://github.com/hzwer/Practical-RIFE/tree/6f9dc10493b9f15391b68a5002f8e5201159634e?tab=readme-ov-file#model-list) (lite, ensemble) models, which are also available for previous vs-mlrt releases (simply download the new model file [here](https://github.com/AmusementClub/vs-mlrt/releases/tag/external-models) and update `vsmlrt.py`).
The v4.13 ~ v4.15 models should have the same execution speed as the v4.10 ~ v4.12 models.
The v4.13 lite model, the v4.15 lite model and the v4.16 lite model should all have the same execution speed as the v4.12 lite model, while the v4.14 lite model may run slower.
Added support for fractional video frame interpolation in RIFE.
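Fractional interpolation means the output frame rate is a non-integer multiple of the input rate, so each output frame falls at a rational position between two source frames. The following is a generic sketch of that arithmetic, not the vs-mlrt implementation; the function name `source_position` and the use of `fractions.Fraction` for the rate multiplier are assumptions made for illustration.

```python
from fractions import Fraction
from math import floor


def source_position(n: int, multi: Fraction) -> tuple[int, Fraction]:
    """Map output frame n to (left source frame index, timestep in [0, 1))."""
    pos = Fraction(n) / multi  # position of the output frame on the source timeline
    left = floor(pos)          # source frame at or before that position
    return left, pos - left    # the fractional part is the interpolation timestep


multi = Fraction(5, 2)  # e.g. 24 fps -> 60 fps
for n in range(6):
    left, t = source_position(n, multi)
    print(f"output {n}: source {left} -> {left + 1}, timestep {t}")
```

A timestep of 0 means the output frame coincides with a source frame and needs no interpolation. Exact rationals avoid the drift that accumulates when the position is tracked in floating point.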

📦 benchmark

RTX 4090
processor clock @ 2520 MHz
Intel Icelake server @ 2100 MHz
Driver 551.86
Windows 10 21H2 (19044.1415)
TensorRT 9.2.0
VapourSynth-Classic R57.A8, vapoursynth-plugin v0.96g3
1920x1080 rgbs, CUDA graphs enabled, fp16

📦 general

| model | 1 stream | 2 streams | 3 streams |
|:-----------------------------------------------:|:--------------:|:--------------:|:---------------:|
| dpir gray | 21.93/1757.352 | 25.48/3049.696 | 25.31/4342.044 |
| dpir color | 18.24/1790.184 | 25.11/3115.360 | 25.22/4440.540 |
| | | | |
| waifu2x upconv_7_{anime_style_art_rgb, photo} | 19.58/2148.716 | 39.87/3867.240 | 59.94/5585.768 |
| waifu2x upresnet10 | 17.40/1655.144 | 34.22/2880.096 | 42.78/4105.048 |
| waifu2x cunet / cugan | 13.64/4391.292 | 25.09/8346.248 | 25.19/12301.208 |

📦 rife

v2, fp16 i/o
| version | 1 stream | 2 streams | 3 streams | 4 streams | 5 streams |
|:-----------------------------------------------:|:--------------:|:--------------:|:---------------:|:---------------:|:---------------:|
| v4.4-v4.5 | 150.20/622.784 | 301.05/835.860 | 448.90/1053.024 | 615.84/1268.152 | 787.57/1481.224 |
| v4.6 | 147.63/624.832 | 294.53/837.904 | 452.26/1055.072 | 603.63/1270.200 | 764.31/1485.320 |
| v4.7-v4.9 | 132.06/747.712 | 268.63/1075.476 | 403.54/1405.284 | 494.98/1737.152 | 496.41/2064.908 |
| v4.10-v4.15 | 119.09/862.400 | 238.68/1304.852 | 346.98/1749.352 | 349.48/2195.904 | 349.80/2638.356 |
| {v4.12, v4.13, v4.15, v4.16}_lite | 123.72/782.528 | 250.81/1151.252 | 377.27/1522.020 | 403.14/1894.844 | 403.79/2263.568 |

v14.test2: latest TensorRT library (v14.test2, Pre-release)

github-actions[bot]·2y ago·October 23, 2023


📋 Changes

As with the `v14.test` release, this release requires Pascal GPUs or later (10 series+) and driver version >= 525. Support for Kepler 2.0 and Maxwell GPUs is dropped.
TensorRT 9.1.0 is officially documented as `for Large Language Models (LLMs) on NVIDIA A100, A10G, L4, L40, L40S, H100 GPUs, and NVIDIA GH200 Grace Hopper™ Superchip only` on Linux. The Windows build is downloaded from [here](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/secure/9.1.0/tars/tensorrt-9.1.0.4.windows10.x86_64.cuda-12.2.llm.beta.zip), and can be used on other GPU models.
~On Windows, some users have reported crashes when using it in mpv~ ([#65](https://github.com/AmusementClub/vs-mlrt/discussions/65)). This problem occurred in an earlier build of this release and has since been fixed.
Add parameters `bf16` (https://github.com/AmusementClub/vs-mlrt/issues/64), `custom_env` and `custom_args` to the `TRT` backend.
fp16 execution of `Waifu2xModel.swin_unet_art` is more accurate, faster and uses less GPU memory than bf16 execution ([benchmark](https://github.com/AmusementClub/vs-mlrt/wiki/NVIDIA-GeForce-RTX-4090#waifu2xswin_unet_art))
Device memory usage of model `Waifu2xModel.swin_unet_art` is reduced compared to TensorRT 9.0.1 on A10G with 1080p input (at 2.66 fps with 7.0GB VRAM usage) with the default auxiliary stream heuristic.
With that heuristic, TensorRT 9.0.1 uses 7 auxiliary streams while TensorRT 9.1.0 uses 3; the additional streams cost significantly more device memory with no performance gain.
Setting `max_aux_streams=3` lowers device memory usage of TensorRT 9.0.1 to ~8.9GB, and `max_aux_streams=0` corresponds to ~7.3GB usage.

v14.test: latest TensorRT library (v14.test, Pre-release)

github-actions[bot]·3y ago·March 13, 2023


📋 Changes

It requires Pascal GPUs or later (10 series+) and driver version >= 525. Support for Kepler 2.0 and Maxwell GPUs is dropped.
Add parameters `builder_optimization_level` and `max_aux_streams` to the `TRT` backend.
`builder_optimization_level`: "adjust how long TensorRT should spend searching for tactics with potentially better performance" [link](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#opt-builder-optimization-level)
`max_aux_streams`: Within-inference multi-streaming, "if enabled, TensorRT will run some layers on the auxiliary streams in parallel to the layers running on the main stream, ..., may increase the memory consumption, ..." [link](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#within-inference-multi-streaming)
__It is advised to lower `max_aux_streams` to 0 on heavy models like `Waifu2xModel.swin_unet_art` to reduce memory usage.__ Check the benchmark data at the bottom.
Following TensorRT 8.6.1, the `cudnn` tactic source of the `TRT` backend is disabled by default. `tf32` is also disabled by default in vsmlrt.py.
Add parameter `short_path` to the `TRT` backend, which shortens the engine path and is enabled on Windows by default.
Model `Waifu2xModel.swin_unet_art` does not seem to work with `builder_optimization_level=5` in the `TRT` backend before TRT 9.0. Use `builder_optimization_level=4` or lower instead.
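The advice above can be summarized as a small decision rule. This is a minimal sketch in plain Python: `trt_overrides` and `HEAVY_MODELS` are hypothetical names, and only the keyword names `max_aux_streams` and `builder_optimization_level` come from the notes.

```python
from typing import Optional

# Hypothetical set of models treated as "heavy"; the notes name only this one.
HEAVY_MODELS = frozenset({"swin_unet_art"})


def trt_overrides(
    model_name: str,
    trt_version: tuple[int, int],
    requested_level: Optional[int] = None,
) -> dict[str, int]:
    """Return keyword overrides for the TRT backend following the notes above."""
    overrides: dict[str, int] = {}
    if model_name in HEAVY_MODELS:
        # Lower auxiliary streams to 0 on heavy models to reduce memory usage.
        overrides["max_aux_streams"] = 0
    if requested_level is not None:
        level = requested_level
        if model_name == "swin_unet_art" and trt_version < (9, 0):
            # Level 5 is reported not to work for this model before TRT 9.0.
            level = min(level, 4)
        overrides["builder_optimization_level"] = level
    return overrides


print(trt_overrides("swin_unet_art", (8, 6), requested_level=5))
print(trt_overrides("swin_unet_art", (9, 1), requested_level=5))
print(trt_overrides("some_light_model", (8, 6)))
```

Assuming these parameters are exposed as keyword arguments of the `TRT` backend, as the notes state, the returned dict could be expanded into the backend constructor in a script that uses `vsmlrt.py`, for example `Backend.TRT(**overrides)`.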
