trt ¶

Type Aliases:

Shape –

Classes:

TRT –

TensorRT backend for NVIDIA GPUs using the core.trt plugin.
TRT_RTX –

TensorRT RTX backend for NVIDIA RTX GPUs using the core.trt_rtx plugin.

Functions:

cuda_device –

Set the CUDA device ID within this context.

Attributes:

LOGGING_VERBOSITY_MAP –
logger –

LOGGING_VERBOSITY_MAP `module-attribute` ¶

LOGGING_VERBOSITY_MAP = {DEBUG: 0, INFO: 1, WARNING: 2, ERROR: 3, CRITICAL: 4}

logger `module-attribute` ¶

logger = getLogger(__name__)

Shape ¶

Shape = tuple[int, int]

TRT `dataclass` ¶

TRT(
    *,
    device_id: int = 0,
    num_streams: int = 1,
    use_cuda_graph: bool = True,
    verbosity: SupportsInt | Severity | Severity | None = None,
    fp16: bool | None = None,
    fp16_blacklist_ops: Collection[str] | None = None,
    bf16: bool | None = None,
    tf32: bool = False,
    strict_nans: bool = False,
    static_shape: bool = True,
    min_shapes: Shape = (0, 0),
    opt_shapes: Shape | None = None,
    max_shapes: Shape | None = None,
    edge_mask_convolutions: bool = True,
    jit_convolutions: bool = True,
    sparse_weights: bool = False,
    workspace: int | None = None,
    builder_optimization_level: int = 3,
    max_aux_streams: int | None = None,
    max_num_tactics: int | None = None,
    tiling_optimization_level: SupportsInt
    | TilingOptimizationLevel
    | TilingOptimizationLevel = 0,
    l2_limit_for_tiling: int = -1,
    avg_timing_iterations: int = 1,
    tactic_dram: int | None = None,
    weight_streaming: bool = False,
    force_rebuild: bool = False,
    max_threads: int | None = None,
)

Bases: Backend

TensorRT backend for NVIDIA GPUs using the core.trt plugin.

Methods:

autoselect –

Try to select the best backend for the current system.
build –
build_engine –

Build or retrieve a cached TensorRT engine.
configure_builder_config –
configure_optimization_settings –
configure_tactic_sources –
get_args –

Return backend plugin arguments derived from this configuration.
get_identity –
inference –

Run inference with this backend.
setup_optimization_profile –

Attributes:

avg_timing_iterations (int) –

Number of averaging iterations when timing tactics. Higher values produce more stable tactic selection.
bf16 (bool | None) –

Convert the ONNX model to BF16 before building. Defaults to False. Ignored for conversion when fp16 is also enabled, since FP16 conversion takes precedence.
builder_optimization_level (int) –

TensorRT builder optimization level.
device_id (int) –

CUDA device index.
edge_mask_convolutions (bool) –

Enable TensorRT edge-mask convolution tactics.
flexible_output_prop (str) –
force_rebuild (bool) –

Force a full engine rebuild, ignoring any cached engine.
fp16 (bool | None) –

Convert the ONNX model to FP16 before building. Defaults to True.
fp16_blacklist_ops (Collection[str] | None) –

ONNX node or op names to keep in FP32 during FP16 conversion.
jit_convolutions (bool) –

Enable TensorRT JIT convolution tactics.
l2_limit_for_tiling (int) –

L2 cache usage hint for tiling optimization.
logger (ILogger) –
max_aux_streams (int | None) –

Maximum auxiliary streams used by TensorRT kernels.
max_num_tactics (int | None) –

Maximum number of tactics considered per layer.
max_shapes (Shape | None) –

Maximum dynamic input tile size as (width, height). Defaults to the inference tile size.
max_threads (int | None) –

Maximum number of builder threads. Limits CPU usage during engine build.
min_shapes (Shape) –

Minimum dynamic input tile size as (width, height).
num_streams (int) –

Number of parallel plugin inference streams.
opt_shapes (Shape | None) –

Optimal input tile size as (width, height). Defaults to the inference tile size.
plugin (Plugin) –
sparse_weights (bool) –

Allow the builder to exploit structured sparsity in weights.
static_shape (bool) –

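When True, build a fixed-shape engine for the optimal tile size (opt_shapes, or the inference tile size). When False, build a dynamic-shape engine covering min_shapes to max_shapes.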
strict_nans (bool) –

Disable float optimizations (0*x => 0, x-x => 0, x/x => 1) to preserve NaN/Inf propagation.
tactic_dram (int | None) –

DRAM limit in bytes available to the optimizer during tactic selection. Lowering it can help avoid out-of-memory failures when building engines on memory-constrained systems.
tf32 (bool) –

Allow TensorRT TF32 tactics.
tiling_optimization_level (SupportsInt | TilingOptimizationLevel | TilingOptimizationLevel) –

TensorRT tiling optimization search level.
use_cuda_graph (bool) –

Enable CUDA graph execution for compatible engines to improve performance and reduce CPU overhead.
verbosity (SupportsInt | Severity | Severity | None) –

TensorRT/plugin logging severity.
version (tuple[int, int, int]) –
weight_streaming (bool) –

Stream weights from host to device to reduce GPU memory at the cost of performance.
workspace (int | None) –

Workspace memory pool limit in bytes.

avg_timing_iterations `class-attribute` `instance-attribute` ¶

avg_timing_iterations: int = 1

Number of averaging iterations when timing tactics. Higher values produce more stable tactic selection.

bf16 `class-attribute` `instance-attribute` ¶

bf16: bool | None = None

Convert the ONNX model to BF16 before building. Defaults to False. Ignored for conversion when fp16 is also enabled, since FP16 conversion takes precedence.

builder_optimization_level `class-attribute` `instance-attribute` ¶

builder_optimization_level: int = 3

TensorRT builder optimization level.

device_id `class-attribute` `instance-attribute` ¶

device_id: int = 0

CUDA device index.

edge_mask_convolutions `class-attribute` `instance-attribute` ¶

edge_mask_convolutions: bool = True

Enable TensorRT edge-mask convolution tactics.

flexible_output_prop `class-attribute` ¶

flexible_output_prop: str = 'MlrtFlexible'

force_rebuild `class-attribute` `instance-attribute` ¶

force_rebuild: bool = field(default=False, repr=False)

Force a full engine rebuild, ignoring any cached engine.

fp16 `class-attribute` `instance-attribute` ¶

fp16: bool | None = None

Convert the ONNX model to FP16 before building. Defaults to True.

fp16_blacklist_ops `class-attribute` `instance-attribute` ¶

fp16_blacklist_ops: Collection[str] | None = None

ONNX node or op names to keep in FP32 during FP16 conversion.

jit_convolutions `class-attribute` `instance-attribute` ¶

jit_convolutions: bool = True

Enable TensorRT JIT convolution tactics.

l2_limit_for_tiling `class-attribute` `instance-attribute` ¶

l2_limit_for_tiling: int = -1

L2 cache usage hint for tiling optimization.

logger `property` ¶

logger: ILogger

max_aux_streams `class-attribute` `instance-attribute` ¶

max_aux_streams: int | None = None

Maximum auxiliary streams used by TensorRT kernels.

max_num_tactics `class-attribute` `instance-attribute` ¶

max_num_tactics: int | None = None

Maximum number of tactics considered per layer.

max_shapes `class-attribute` `instance-attribute` ¶

max_shapes: Shape | None = None

Maximum dynamic input tile size as (width, height). Defaults to the inference tile size.

max_threads `class-attribute` `instance-attribute` ¶

max_threads: int | None = field(default=None, repr=False)

Maximum number of builder threads. Limits CPU usage during engine build.

min_shapes `class-attribute` `instance-attribute` ¶

min_shapes: Shape = (0, 0)

Minimum dynamic input tile size as (width, height).

num_streams `class-attribute` `instance-attribute` ¶

num_streams: int = 1

Number of parallel plugin inference streams.

opt_shapes `class-attribute` `instance-attribute` ¶

opt_shapes: Shape | None = None

Optimal input tile size as (width, height). Defaults to the inference tile size.

plugin `class-attribute` ¶

plugin: Plugin = core.lazy.trt

sparse_weights `class-attribute` `instance-attribute` ¶

sparse_weights: bool = False

Allow the builder to exploit structured sparsity in weights.

static_shape `class-attribute` `instance-attribute` ¶

static_shape: bool = True

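When True, build a fixed-shape engine for the optimal tile size (opt_shapes, or the inference tile size). When False, build a dynamic-shape engine covering min_shapes to max_shapes.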

strict_nans `class-attribute` `instance-attribute` ¶

strict_nans: bool = False

Disable float optimizations (0*x => 0, x-x => 0, x/x => 1) to preserve NaN/Inf propagation.
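The rewrites that strict_nans turns off are only exact for finite operands. Python floats follow IEEE 754, so a few lines of plain Python show what each shortcut would hide when the operand is infinite or NaN (this is a standalone illustration, not TensorRT code):

```python
import math

inf = float("inf")
nan = float("nan")

# What the unsimplified expressions actually evaluate to.
zero_times_x = 0.0 * inf   # the simplified form would give 0
x_minus_x = inf - inf      # the simplified form would give 0
x_over_x = inf / inf       # the simplified form would give 1
nan_minus_nan = nan - nan  # NaN inputs propagate through x - x as well

for label, value in [("0*inf", zero_times_x), ("inf-inf", x_minus_x),
                     ("inf/inf", x_over_x), ("nan-nan", nan_minus_nan)]:
    print(f"{label} = {value}")  # each line prints nan
```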

tactic_dram `class-attribute` `instance-attribute` ¶

tactic_dram: int | None = None

DRAM limit in bytes available to the optimizer during tactic selection. Lowering it can help avoid out-of-memory failures when building engines on memory-constrained systems.

tf32 `class-attribute` `instance-attribute` ¶

tf32: bool = False

Allow TensorRT TF32 tactics.

tiling_optimization_level `class-attribute` `instance-attribute` ¶

tiling_optimization_level: (
    SupportsInt | TilingOptimizationLevel | TilingOptimizationLevel
) = 0

TensorRT tiling optimization search level.

use_cuda_graph `class-attribute` `instance-attribute` ¶

use_cuda_graph: bool = True

Enable CUDA graph execution for compatible engines to improve performance and reduce CPU overhead.

verbosity `class-attribute` `instance-attribute` ¶

verbosity: SupportsInt | Severity | Severity | None = field(
    default=None, repr=False
)

TensorRT/plugin logging severity.

version `property` ¶

version: tuple[int, int, int]

weight_streaming `class-attribute` `instance-attribute` ¶

weight_streaming: bool = False

Stream weights from host to device to reduce GPU memory at the cost of performance.

workspace `class-attribute` `instance-attribute` ¶

workspace: int | None = None

Workspace memory pool limit in bytes.
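Both workspace and tactic_dram take raw byte counts, not megabytes. Plain shift arithmetic keeps GiB-sized limits readable; the 4 GiB and 2 GiB values below are arbitrary examples, not recommendations:

```python
GIB = 1 << 30  # bytes per gibibyte (1024 ** 3)

workspace = 4 * GIB    # e.g. passed as TRT(workspace=workspace)
tactic_dram = 2 * GIB  # e.g. passed as TRT(tactic_dram=tactic_dram)

print(workspace)    # 4294967296
print(tactic_dram)  # 2147483648
```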

autoselect `staticmethod` ¶

autoselect(device_id: int = 0, **kwargs: Any) -> Backend

Try to select the best backend for the current system.

Parameters:

device_id ¶
(int, default: 0 ) –

The GPU device id.
**kwargs ¶
(Any, default: {} ) –

Additional arguments to pass to the backend.

Returns:

Backend –

The selected backend.

Source code in vsscale/mlrt/backend/base.py

@staticmethod
def autoselect(device_id: int = 0, **kwargs: Any) -> Backend:
    """
    Try to select the best backend for the current system.

    Args:
        device_id: The GPU device id.
        **kwargs: Additional arguments to pass to the backend.

    Returns:
        The selected backend.
    """

    gpu = get_gpu(device_id)
    vendor = None if not gpu else str(gpu.vendor).strip()

    match vendor:
        # Windows & Linux
        case "NVIDIA Corporation":
            if hasattr(core, "trt"):
                backend = UserBackend.TRT
            elif hasattr(core, "trt_rtx"):
                backend = UserBackend.TRT_RTX
            elif platform.system().lower() == "windows" and hasattr(core, "ort"):
                backend = UserBackend.ORT_DML
            elif hasattr(core, "ort"):
                backend = UserBackend.ORT_CUDA
            elif hasattr(core, "ncnn"):
                backend = UserBackend.NCNN
            else:
                backend = UserBackend.OV_CPU
        # Windows & Linux
        case "Advanced Micro Devices, Inc.":
            if platform.system().lower() == "windows" and hasattr(core, "ort"):
                backend = UserBackend.ORT_DML
            elif hasattr(core, "migx"):
                backend = UserBackend.MIGX
            elif hasattr(core, "ncnn"):
                backend = UserBackend.NCNN_VK
            else:
                backend = UserBackend.OV_CPU
        # Windows & Linux
        case "Intel(R) Corporation":
            if hasattr(core, "ov"):
                backend = UserBackend.OV_GPU
            elif platform.system().lower() == "windows" and hasattr(core, "ort"):
                backend = UserBackend.ORT_DML
            elif hasattr(core, "ncnn"):
                backend = UserBackend.NCNN_VK
            else:
                backend = UserBackend.OV_CPU
        # macOS ARM64 & x86_64
        case "Apple":
            if hasattr(core, "ncnn"):
                backend = UserBackend.NCNN_VK
            elif hasattr(core, "ort"):
                backend = UserBackend.ORT_COREML
            else:
                backend = UserBackend.OV_CPU
        case _:
            backend = UserBackend.OV_CPU

    del gpu

    if hasattr(backend, "device_id"):
        kwargs["device_id"] = device_id

    return backend(**kwargs)
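The vendor-to-backend priority in autoselect reads as a pure decision table. The sketch below is a hypothetical, dependency-free restatement (pick_backend is not part of vsscale): it takes the vendor string, the set of plugin namespaces that hasattr(core, ...) would find, and the OS name, and returns the name of the UserBackend member the code above would pick.

```python
from __future__ import annotations


def pick_backend(vendor: str | None, plugins: set[str], system: str) -> str:
    """Hypothetical mirror of autoselect()'s priority order."""
    windows = system.lower() == "windows"

    if vendor == "NVIDIA Corporation":
        if "trt" in plugins:
            return "TRT"
        if "trt_rtx" in plugins:
            return "TRT_RTX"
        if windows and "ort" in plugins:
            return "ORT_DML"
        if "ort" in plugins:
            return "ORT_CUDA"
        if "ncnn" in plugins:
            return "NCNN"
        return "OV_CPU"
    if vendor == "Advanced Micro Devices, Inc.":
        if windows and "ort" in plugins:
            return "ORT_DML"
        if "migx" in plugins:
            return "MIGX"
        if "ncnn" in plugins:
            return "NCNN_VK"
        return "OV_CPU"
    if vendor == "Intel(R) Corporation":
        if "ov" in plugins:
            return "OV_GPU"
        if windows and "ort" in plugins:
            return "ORT_DML"
        if "ncnn" in plugins:
            return "NCNN_VK"
        return "OV_CPU"
    if vendor == "Apple":
        if "ncnn" in plugins:
            return "NCNN_VK"
        if "ort" in plugins:
            return "ORT_COREML"
        return "OV_CPU"
    # Unknown vendor or no GPU detected: fall back to the CPU backend.
    return "OV_CPU"


print(pick_backend("NVIDIA Corporation", {"trt", "trt_rtx", "ort"}, "Linux"))  # TRT
print(pick_backend("NVIDIA Corporation", {"ort"}, "Windows"))                  # ORT_DML
```

The table makes the ordering explicit: on NVIDIA hardware core.trt always wins over core.trt_rtx when both plugins are loaded, so TRT_RTX is only auto-selected when the plain TensorRT plugin is absent.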

build ¶

build(
    network_path: Path,
    engine_path: Path,
    channels: int,
    tilesize: Shape,
    input_name: str,
) -> None

Source code in vsscale/mlrt/backend/trt.py

def build(
    self,
    network_path: Path,
    engine_path: Path,
    channels: int,
    tilesize: Shape,
    input_name: str,
) -> None:
    with cuda_device(self.device_id):
        trt_logger = self.logger
        builder = self.trt.Builder(trt_logger)

        if self.max_threads is not None:
            builder.max_threads = self.max_threads

        network = builder.create_network()
        parser = self.trt.OnnxParser(network, trt_logger)

        if not parser.parse_from_file(str(network_path)):
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise CustomRuntimeError(f"Failed to parse ONNX model: {network_path}\n" + "\n".join(errors))

        config = builder.create_builder_config()

        # Delegate builder setup
        self.configure_builder_config(config, network)
        self.setup_optimization_profile(builder, network, config, channels, input_name, tilesize)

        # Timing Cache
        timing_cache_path = Path(f"{engine_path}.cache")
        timing_cache_data = b""
        if timing_cache_path.exists():
            timing_cache_data = timing_cache_path.read_bytes()

        timing_cache = config.create_timing_cache(timing_cache_data)
        config.set_timing_cache(timing_cache, ignore_mismatch=True)

        # Build
        logger.info(f"Building TensorRT {self.__class__.__name__} engine from {network_path}...")
        serialized = builder.build_serialized_network(network, config)

        if not serialized:
            raise CustomRuntimeError(f"TensorRT engine build failed for {network_path}")

        engine_path.write_bytes(serialized)

        # Save Cache
        updated_cache = config.get_timing_cache()
        timing_cache_path.write_bytes(updated_cache.serialize())

    logger.info(f"Engine saved to {engine_path}")
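build() keeps TensorRT's timing cache in a sibling file of the engine, so a later rebuild of the same engine (for example with force_rebuild) can reuse previously measured tactic timings. The file-handling part of that step, isolated from TensorRT, looks roughly like this; the three helper names are hypothetical.

```python
import tempfile
from pathlib import Path


def timing_cache_path(engine_path: Path) -> Path:
    # build() appends ".cache" to the full engine file name.
    return Path(f"{engine_path}.cache")


def load_timing_cache(engine_path: Path) -> bytes:
    # Mirrors build(): an empty buffer when no cache file exists yet.
    path = timing_cache_path(engine_path)
    return path.read_bytes() if path.exists() else b""


def save_timing_cache(engine_path: Path, data: bytes) -> None:
    # build() overwrites the cache after every successful engine build.
    timing_cache_path(engine_path).write_bytes(data)


with tempfile.TemporaryDirectory() as tmp:
    engine = Path(tmp) / "123456789.engine"
    print(timing_cache_path(engine).name)  # 123456789.engine.cache
    print(load_timing_cache(engine))       # b''
    save_timing_cache(engine, b"\x01\x02")
    print(load_timing_cache(engine))       # b'\x01\x02'
```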

build_engine ¶

build_engine(
    network_path: Path,
    channels: int,
    tilesize: Shape,
    input_name: str = "input",
) -> Path

Build or retrieve a cached TensorRT engine.

Parameters:

network_path ¶
(Path) –

Path to the ONNX model.
channels ¶
(int) –

Number of model input channels.
tilesize ¶
(Shape) –

Inference tile size as (width, height).
input_name ¶
(str, default: 'input' ) –

Name of the model input tensor.

Returns:

Path –

Path to the serialized engine file.

Source code in vsscale/mlrt/backend/trt.py

def build_engine(self, network_path: Path, channels: int, tilesize: Shape, input_name: str = "input") -> Path:
    """
    Build or retrieve a cached TensorRT engine.

    Args:
        network_path: Path to the ONNX model.
        channels: Number of model input channels.
        tilesize: Inference tile size as `(width, height)`.
        input_name: Name of the model input tensor.

    Returns:
        Path to the serialized engine file.
    """
    if self.fp16:
        network_path = self._convert_onnx_fp16(network_path)
    elif self.bf16:
        network_path = self._convert_onnx_bf16(network_path)

    identity = self.get_identity(network_path, channels, tilesize)
    engine_path = get_artifact_path(f"{identity}.engine", fallback=not self.force_rebuild)

    if not self.force_rebuild and engine_path.is_file() and engine_path.stat().st_size >= 1024:
        return engine_path

    engine_path.parent.mkdir(parents=True, exist_ok=True)

    self.build(
        network_path=network_path,
        engine_path=engine_path,
        channels=channels,
        tilesize=tilesize,
        input_name=input_name,
    )

    return engine_path
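Two decisions in build_engine are easy to miss: FP16 conversion is checked before BF16, and a cached engine is reused only if it exists and is at least 1024 bytes (presumably to skip truncated or empty files). A hypothetical, dependency-free restatement of both checks:

```python
from __future__ import annotations

import tempfile
from pathlib import Path

MIN_ENGINE_BYTES = 1024  # build_engine() rebuilds engine files smaller than this


def precision_pass(fp16: bool | None, bf16: bool | None) -> str | None:
    # fp16 is tested first, so it wins when both flags are truthy.
    if fp16:
        return "fp16"
    if bf16:
        return "bf16"
    return None


def can_reuse_engine(engine_path: Path, force_rebuild: bool) -> bool:
    # Same condition build_engine() checks before skipping self.build().
    return (
        not force_rebuild
        and engine_path.is_file()
        and engine_path.stat().st_size >= MIN_ENGINE_BYTES
    )


with tempfile.TemporaryDirectory() as tmp:
    engine = Path(tmp) / "42.engine"
    print(can_reuse_engine(engine, force_rebuild=False))  # False (missing)
    engine.write_bytes(b"\0" * 2048)
    print(can_reuse_engine(engine, force_rebuild=False))  # True
    print(can_reuse_engine(engine, force_rebuild=True))   # False
```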

configure_builder_config ¶

configure_builder_config(
    config: IBuilderConfig, network: INetworkDefinition
) -> None

Source code in vsscale/mlrt/backend/trt.py

def configure_builder_config(self, config: trt.IBuilderConfig, network: trt.INetworkDefinition) -> None:
    if self.workspace is not None:
        config.set_memory_pool_limit(self.trt.MemoryPoolType.WORKSPACE, self.workspace)

    if self.tactic_dram is not None:
        config.set_memory_pool_limit(self.trt.MemoryPoolType.TACTIC_DRAM, self.tactic_dram)

    if not self.tf32:
        config.flags &= ~(1 << self.trt.BuilderFlag.TF32.value)

    if self.sparse_weights:
        config.flags |= 1 << self.trt.BuilderFlag.SPARSE_WEIGHTS.value

    if self.strict_nans:
        config.flags |= 1 << self.trt.BuilderFlag.STRICT_NANS.value

    if self.weight_streaming:
        config.flags |= 1 << self.trt.BuilderFlag.WEIGHT_STREAMING.value

    self.configure_tactic_sources(config)
    self.configure_optimization_settings(config)
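configure_builder_config treats config.flags as an integer bitmask: each BuilderFlag's .value is a bit position, `1 << value` is its mask, `|=` sets it and `&= ~mask` clears it. TensorRT enables TF32 in a fresh builder config by default, which is why the code only ever clears that flag. The sketch below reproduces the logic with a stand-in IntEnum whose bit positions are made up for illustration.

```python
from enum import IntEnum


class BuilderFlag(IntEnum):
    # Made-up bit positions; the real ones come from tensorrt.BuilderFlag.
    TF32 = 0
    SPARSE_WEIGHTS = 1
    STRICT_NANS = 2
    WEIGHT_STREAMING = 3


def apply_builder_flags(
    flags: int,
    *,
    tf32: bool = False,
    sparse_weights: bool = False,
    strict_nans: bool = False,
    weight_streaming: bool = False,
) -> int:
    # TF32 is only ever cleared: tf32=True keeps whatever state came in.
    if not tf32:
        flags &= ~(1 << BuilderFlag.TF32)
    # The remaining options are only ever set, never cleared.
    for enabled, flag in (
        (sparse_weights, BuilderFlag.SPARSE_WEIGHTS),
        (strict_nans, BuilderFlag.STRICT_NANS),
        (weight_streaming, BuilderFlag.WEIGHT_STREAMING),
    ):
        if enabled:
            flags |= 1 << flag
    return flags


default_flags = 1 << BuilderFlag.TF32  # TF32 on, as in a fresh config
print(bin(apply_builder_flags(default_flags)))                                # 0b0
print(bin(apply_builder_flags(default_flags, tf32=True, strict_nans=True)))  # 0b101
```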

configure_optimization_settings ¶

configure_optimization_settings(config: IBuilderConfig) -> None

Source code in vsscale/mlrt/backend/trt.py

def configure_optimization_settings(self, config: trt.IBuilderConfig) -> None:
    config.builder_optimization_level = self.builder_optimization_level
    config.avg_timing_iterations = self.avg_timing_iterations

    if self.max_aux_streams is not None:
        config.max_aux_streams = self.max_aux_streams

    if self.max_num_tactics is not None:
        config.max_num_tactics = self.max_num_tactics

    if int(self.tiling_optimization_level) != 0:
        config.tiling_optimization_level = self.trt.TilingOptimizationLevel(self.tiling_optimization_level)
        config.l2_limit_for_tiling = self.l2_limit_for_tiling
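In configure_optimization_settings, None-valued options leave TensorRT's own defaults in place, and l2_limit_for_tiling is applied only together with a non-zero tiling level. A hypothetical helper that returns the attributes the method would assign makes that visible without TensorRT installed:

```python
from __future__ import annotations

from typing import Any, SupportsInt


def optimization_settings(
    builder_optimization_level: int = 3,
    avg_timing_iterations: int = 1,
    max_aux_streams: int | None = None,
    max_num_tactics: int | None = None,
    tiling_optimization_level: SupportsInt = 0,
    l2_limit_for_tiling: int = -1,
) -> dict[str, Any]:
    """Return the IBuilderConfig attributes configure_optimization_settings would set."""
    settings: dict[str, Any] = {
        "builder_optimization_level": builder_optimization_level,
        "avg_timing_iterations": avg_timing_iterations,
    }
    # None means "do not touch TensorRT's default".
    if max_aux_streams is not None:
        settings["max_aux_streams"] = max_aux_streams
    if max_num_tactics is not None:
        settings["max_num_tactics"] = max_num_tactics
    # Tiling settings are skipped entirely at level 0; the real method wraps
    # the level in trt.TilingOptimizationLevel instead of storing an int.
    if int(tiling_optimization_level) != 0:
        settings["tiling_optimization_level"] = int(tiling_optimization_level)
        settings["l2_limit_for_tiling"] = l2_limit_for_tiling
    return settings


print(optimization_settings(l2_limit_for_tiling=1 << 20))  # no tiling keys
print(optimization_settings(tiling_optimization_level=2, l2_limit_for_tiling=1 << 20))
```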

configure_tactic_sources ¶

configure_tactic_sources(config: IBuilderConfig) -> None

Source code in vsscale/mlrt/backend/trt.py

def configure_tactic_sources(self, config: trt.IBuilderConfig) -> None:
    tactic_sources = config.get_tactic_sources()

    if self.edge_mask_convolutions:
        tactic_sources |= 1 << self.trt.TacticSource.EDGE_MASK_CONVOLUTIONS.value
    else:
        tactic_sources &= ~(1 << self.trt.TacticSource.EDGE_MASK_CONVOLUTIONS.value)

    if self.jit_convolutions:
        tactic_sources |= 1 << self.trt.TacticSource.JIT_CONVOLUTIONS.value
    else:
        tactic_sources &= ~(1 << self.trt.TacticSource.JIT_CONVOLUTIONS.value)

    config.set_tactic_sources(tactic_sources)
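Unlike the builder flags, both tactic-source options are forced to match their field in either direction, overriding whatever get_tactic_sources() returned while leaving other sources alone. The per-bit pattern reduces to one hypothetical helper; the bit positions below are placeholders, not TensorRT's real TacticSource values.

```python
def toggle_bit(mask: int, bit: int, enabled: bool) -> int:
    """Force one bit to match `enabled`, leaving all other bits untouched."""
    if enabled:
        return mask | (1 << bit)
    return mask & ~(1 << bit)


# Placeholder bit positions standing in for TacticSource members.
EDGE_MASK_CONVOLUTIONS = 3
JIT_CONVOLUTIONS = 4

sources = 0b00110  # pretend result of config.get_tactic_sources()
sources = toggle_bit(sources, EDGE_MASK_CONVOLUTIONS, True)
sources = toggle_bit(sources, JIT_CONVOLUTIONS, False)
print(bin(sources))  # 0b1110
```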

get_args ¶

get_args(clips: VideoNode | Sequence[VideoNode]) -> dict[str, Any]

Return backend plugin arguments derived from this configuration.

Source code in vsscale/mlrt/backend/trt.py

def get_args(self, clips: vs.VideoNode | Sequence[vs.VideoNode]) -> dict[str, Any]:
    return {
        "device_id": self.device_id,
        "use_cuda_graph": self.use_cuda_graph,
        "num_streams": self.num_streams,
        "verbosity": self.verbosity,
    }

get_identity ¶

get_identity(network_path: Path, channels: int, tilesize: Shape) -> int

Source code in vsscale/mlrt/backend/trt.py

def get_identity(self, network_path: Path, channels: int, tilesize: Shape) -> int:
    checksum = zlib.crc32(network_path.read_bytes())

    command = [
        "nvidia-smi",
        "-i",
        str(self.device_id),
        "--query-gpu=name,driver_version",
        "--format=csv,noheader,nounits",
    ]
    res = subprocess.run(command, capture_output=True, text=True, check=True)
    device = [d.strip().replace(" ", "_") for d in res.stdout.split(",")]

    components = (
        str(self),
        str(self.version),
        str(sys.version_info[:2]),
        network_path.name,
        f"{checksum:x}",
        str(channels),
        str(tilesize),
        *device,
    )
    return zlib.crc32(bytes("|".join(components), "utf-8"))
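get_identity hashes a "|"-joined string of the backend's str(), the TensorRT version, the Python version, the model name and checksum, channels, tile size and the GPU name and driver version reported by nvidia-smi. Assuming TRT relies on the repr generated by @dataclass, fields declared with repr=False (force_rebuild, max_threads and verbosity in the listing above) never change the identity. The sketch below uses a hypothetical ToyBackend and takes the device strings as arguments instead of calling nvidia-smi.

```python
from __future__ import annotations

import sys
import zlib
from dataclasses import dataclass, field


@dataclass
class ToyBackend:
    # Hypothetical stand-in for TRT; repr=False fields are left out of str().
    fp16: bool | None = None
    tf32: bool = False
    force_rebuild: bool = field(default=False, repr=False)


def engine_identity(
    backend: object,
    trt_version: tuple[int, int, int],
    model_name: str,
    model_bytes: bytes,
    channels: int,
    tilesize: tuple[int, int],
    device: tuple[str, ...],
) -> int:
    checksum = zlib.crc32(model_bytes)
    components = (
        str(backend),
        str(trt_version),
        str(sys.version_info[:2]),
        model_name,
        f"{checksum:x}",
        str(channels),
        str(tilesize),
        *device,  # GPU name and driver version, spaces replaced by "_"
    )
    return zlib.crc32("|".join(components).encode("utf-8"))


device = ("Example_GPU", "555.55")  # made-up nvidia-smi values
args = ((10, 0, 0), "model.onnx", b"onnx bytes", 3, (256, 256), device)
print(engine_identity(ToyBackend(), *args) == engine_identity(ToyBackend(force_rebuild=True), *args))  # True
print(engine_identity(ToyBackend(), *args) == engine_identity(ToyBackend(fp16=True), *args))  # False
```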

inference ¶

inference(
    clips: VideoNode | Sequence[VideoNode],
    network_path: str | PathLike[str],
    /,
    overlap: tuple[int, int],
    tilesize: tuple[int, int],
    *,
    flexible: bool = False,
    **kwargs: Any,
) -> VideoNode | list[VideoNode]

Run inference with this backend.

Parameters:

clips ¶
(VideoNode | Sequence[VideoNode]) –

Input clip or clips passed to the backend model.
network_path ¶
(str | PathLike[str]) –

Path to the model file or backend artifact.
overlap ¶
(tuple[int, int]) –

Horizontal and vertical tile overlap in pixels.
tilesize ¶
(tuple[int, int]) –

Horizontal and vertical tile size in pixels.
flexible ¶
(bool, default: False ) –

Return each flexible output plane as a separate clip.
**kwargs ¶
(Any, default: {} ) –

Additional backend plugin arguments forwarded unchanged.

Returns:

VideoNode | list[VideoNode] –

A single output clip, or a list of output clips when flexible is enabled.

Source code in vsscale/mlrt/backend/trt.py

@copy_signature(Backend.inference)
def inference(
    self,
    clips: vs.VideoNode | Sequence[vs.VideoNode],
    network_path: str | os.PathLike[str],
    /,
    overlap: tuple[int, int],
    tilesize: tuple[int, int],
    *,
    flexible: bool = False,
    **kwargs: Any,
) -> vs.VideoNode | list[vs.VideoNode]:
    UnsupportedSampleTypeError.check(clips, vs.FLOAT, self.__class__)

    clips = to_arr(clips)
    channels = sum(c.format.num_planes for c in clips)
    bitdepth = max(c.format.bits_per_sample for c in clips)

    engine_path = self.build_engine(Path(network_path), channels, tilesize)

    if self.fp16 or self.bf16:
        # Clips must be 16-bit float when fp16 or bf16 mode is enabled,
        # otherwise the TRT plugins error out.
        clips = [depth(c, 16, sample_type=vs.SampleType.FLOAT) for c in clips]
    else:
        clips = [depth(c, 32) for c in clips]

    res = super().inference(clips, engine_path, overlap, tilesize, flexible=flexible, **kwargs)

    return (
        depth(res, bitdepth, sample_type=vs.FLOAT)
        if isinstance(res, vs.VideoNode)
        else [depth(r, bitdepth, sample_type=vs.FLOAT) for r in res]
    )
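inference derives three numbers from its input clips: the model channel count (planes of all clips added together), the float bit depth fed to the engine (16 in fp16/bf16 mode, else 32), and the bit depth the result is converted back to (the highest input depth). A hypothetical sketch with a minimal stand-in for vs.VideoFormat:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ToyFormat:
    # Hypothetical stand-in for vs.VideoFormat with only the fields inference() reads.
    num_planes: int
    bits_per_sample: int


def plan_inference(
    formats: list[ToyFormat], fp16: bool | None, bf16: bool | None
) -> tuple[int, int, int]:
    """Return (channels, engine input bits, output bits) as TRT.inference derives them."""
    channels = sum(f.num_planes for f in formats)          # planes of all clips stacked
    output_bits = max(f.bits_per_sample for f in formats)  # restored after inference
    input_bits = 16 if (fp16 or bf16) else 32              # half or single float
    return channels, input_bits, output_bits


# An RGB clip plus a one-plane guide clip, run in fp16 mode.
print(plan_inference([ToyFormat(3, 32), ToyFormat(1, 16)], fp16=True, bf16=None))  # (4, 16, 32)
```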

setup_optimization_profile ¶

setup_optimization_profile(
    builder: Builder,
    network: INetworkDefinition,
    config: IBuilderConfig,
    channels: int,
    input_name: str,
    tilesize: Shape,
) -> None

Source code in vsscale/mlrt/backend/trt.py

def setup_optimization_profile(
    self,
    builder: trt.Builder,
    network: trt.INetworkDefinition,
    config: trt.IBuilderConfig,
    channels: int,
    input_name: str,
    tilesize: Shape,
) -> None:
    profile = builder.create_optimization_profile()
    opt_shapes = self.trt.Dims(self.opt_shapes or tilesize)
    max_shapes = self.trt.Dims(self.max_shapes or tilesize)

    input_names = [network.get_input(i).name for i in range(network.num_inputs)]
    if input_name not in input_names:
        logger.debug("input_name %r not found in network inputs %r", input_name, input_names)
        if network.num_inputs == 1:
            input_name = input_names[0]
        else:
            raise CustomValueError(f"Input name '{input_name}' not found in network inputs: {input_names}")

    if self.static_shape:
        shape = self.trt.Dims((1, channels, opt_shapes[1], opt_shapes[0]))

        for i in range(network.num_inputs):
            input_tensor = network.get_input(i)
            if input_tensor.name == input_name:
                input_tensor.shape = shape

        profile.set_shape(input_name, shape, shape, shape)
    else:
        profile.set_shape(
            input_name,
            self.trt.Dims((1, channels, self.min_shapes[1], self.min_shapes[0])),
            self.trt.Dims((1, channels, opt_shapes[1], opt_shapes[0])),
            self.trt.Dims((1, channels, max_shapes[1], max_shapes[0])),
        )

    config.add_optimization_profile(profile)
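setup_optimization_profile converts (width, height) Shape values into NCHW dimensions with a batch of 1, falls back to the only input when the requested name is missing, and uses a single shape for min, opt and max in static mode. A hypothetical pure-Python restatement returning plain tuples instead of trt.Dims:

```python
from __future__ import annotations

Shape = tuple[int, int]           # (width, height), as in the module's Shape alias
Dims = tuple[int, int, int, int]  # (batch, channels, height, width)


def to_nchw(channels: int, shape: Shape) -> Dims:
    width, height = shape
    return (1, channels, height, width)


def resolve_input_name(requested: str, available: list[str]) -> str:
    if requested in available:
        return requested
    if len(available) == 1:
        # A single-input network is used regardless of its tensor name.
        return available[0]
    raise ValueError(f"Input name {requested!r} not found in network inputs: {available}")


def profile_dims(
    channels: int,
    tilesize: Shape,
    *,
    static_shape: bool = True,
    min_shapes: Shape = (0, 0),
    opt_shapes: Shape | None = None,
    max_shapes: Shape | None = None,
) -> tuple[Dims, Dims, Dims]:
    """Return the (min, opt, max) dims passed to profile.set_shape."""
    opt = to_nchw(channels, opt_shapes or tilesize)
    if static_shape:
        return opt, opt, opt
    return to_nchw(channels, min_shapes), opt, to_nchw(channels, max_shapes or tilesize)


print(profile_dims(3, (256, 128)))  # ((1, 3, 128, 256), (1, 3, 128, 256), (1, 3, 128, 256))
```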

TRT_RTX `dataclass` ¶

TRT_RTX(
    *,
    device_id: int = 0,
    num_streams: int = 1,
    use_cuda_graph: bool = True,
    verbosity: SupportsInt | Severity | Severity | None = None,
    fp16: bool | None = None,
    fp16_blacklist_ops: Collection[str] | None = None,
    bf16: bool | None = None,
    tf32: bool = False,
    strict_nans: bool = False,
    static_shape: bool = True,
    min_shapes: Shape = (0, 0),
    opt_shapes: Shape | None = None,
    max_shapes: Shape | None = None,
    edge_mask_convolutions: bool = True,
    jit_convolutions: bool = True,
    sparse_weights: bool = False,
    workspace: int | None = None,
    builder_optimization_level: int = 3,
    max_aux_streams: int | None = None,
    max_num_tactics: int | None = None,
    tiling_optimization_level: SupportsInt
    | TilingOptimizationLevel
    | TilingOptimizationLevel = 0,
    l2_limit_for_tiling: int = -1,
    avg_timing_iterations: int = 1,
    tactic_dram: int | None = None,
    weight_streaming: bool = False,
    force_rebuild: bool = False,
    max_threads: int | None = None,
)

Bases: TRT

TensorRT RTX backend for NVIDIA RTX GPUs using the core.trt_rtx plugin.

Methods:

autoselect –

Try to select the best backend for the current system.
build –
build_engine –

Build or retrieve a cached TensorRT engine.
configure_builder_config –
configure_optimization_settings –
configure_tactic_sources –
get_args –

Return backend plugin arguments derived from this configuration.
get_identity –
inference –

Run inference with this backend.
setup_optimization_profile –

Attributes:

avg_timing_iterations (int) –

Number of averaging iterations when timing tactics. Higher values produce more stable tactic selection.
bf16 (bool | None) –

Convert the ONNX model to BF16 before building. Defaults to False. Ignored for conversion when fp16 is also enabled, since FP16 conversion takes precedence.
builder_optimization_level (int) –

TensorRT builder optimization level.
device_id (int) –

CUDA device index.
edge_mask_convolutions (bool) –

Enable TensorRT edge-mask convolution tactics.
flexible_output_prop (str) –
force_rebuild (bool) –

Force a full engine rebuild, ignoring any cached engine.
fp16 (bool | None) –

Convert the ONNX model to FP16 before building. Defaults to True.
fp16_blacklist_ops (Collection[str] | None) –

ONNX node or op names to keep in FP32 during FP16 conversion.
jit_convolutions (bool) –

Enable TensorRT JIT convolution tactics.
l2_limit_for_tiling (int) –

L2 cache usage hint for tiling optimization.
logger (ILogger) –
max_aux_streams (int | None) –

Maximum auxiliary streams used by TensorRT kernels.
max_num_tactics (int | None) –

Maximum number of tactics considered per layer.
max_shapes (Shape | None) –

Maximum dynamic input tile size as (width, height). Defaults to the inference tile size.
max_threads (int | None) –

Maximum number of builder threads. Limits CPU usage during engine build.
min_shapes (Shape) –

Minimum dynamic input tile size as (width, height).
num_streams (int) –

Number of parallel plugin inference streams.
opt_shapes (Shape | None) –

Optimal input tile size as (width, height). Defaults to the inference tile size.
plugin (Plugin) –
sparse_weights (bool) –

Allow the builder to exploit structured sparsity in weights.
static_shape (bool) –

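When True, build a fixed-shape engine for the optimal tile size (opt_shapes, or the inference tile size). When False, build a dynamic-shape engine covering min_shapes to max_shapes.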
strict_nans (bool) –

Disable float optimizations (0*x => 0, x-x => 0, x/x => 1) to preserve NaN/Inf propagation.
tactic_dram (int | None) –

DRAM limit in bytes available to the optimizer during tactic selection. Lowering it can help avoid out-of-memory failures when building engines on memory-constrained systems.
tf32 (bool) –

Allow TensorRT TF32 tactics.
tiling_optimization_level (SupportsInt | TilingOptimizationLevel | TilingOptimizationLevel) –

TensorRT tiling optimization search level.
use_cuda_graph (bool) –

Enable CUDA graph execution for compatible engines to improve performance and reduce CPU overhead.
verbosity (SupportsInt | Severity | Severity | None) –

TensorRT/plugin logging severity.
version (tuple[int, int, int]) –
weight_streaming (bool) –

Stream weights from host to device to reduce GPU memory at the cost of performance.
workspace (int | None) –

Workspace memory pool limit in bytes.

avg_timing_iterations `class-attribute` `instance-attribute` ¶

avg_timing_iterations: int = 1

Number of averaging iterations when timing tactics. Higher values produce more stable tactic selection.

bf16 `class-attribute` `instance-attribute` ¶

bf16: bool | None = None

Convert the ONNX model to BF16 before building. Defaults to False. Ignored for conversion when fp16 is also enabled, since FP16 conversion takes precedence.

builder_optimization_level `class-attribute` `instance-attribute` ¶

builder_optimization_level: int = 3

TensorRT builder optimization level.

device_id `class-attribute` `instance-attribute` ¶

device_id: int = 0

CUDA device index.

edge_mask_convolutions `class-attribute` `instance-attribute` ¶

edge_mask_convolutions: bool = True

Enable TensorRT edge-mask convolution tactics.

flexible_output_prop `class-attribute` ¶

flexible_output_prop: str = 'MlrtFlexible'

force_rebuild `class-attribute` `instance-attribute` ¶

force_rebuild: bool = field(default=False, repr=False)

Force a full engine rebuild, ignoring any cached engine.

fp16 `class-attribute` `instance-attribute` ¶

fp16: bool | None = None

Convert the ONNX model to FP16 before building. Defaults to True.

fp16_blacklist_ops `class-attribute` `instance-attribute` ¶

fp16_blacklist_ops: Collection[str] | None = None

ONNX node or op names to keep in FP32 during FP16 conversion.

jit_convolutions `class-attribute` `instance-attribute` ¶

jit_convolutions: bool = True

Enable TensorRT JIT convolution tactics.

l2_limit_for_tiling `class-attribute` `instance-attribute` ¶

l2_limit_for_tiling: int = -1

L2 cache usage hint for tiling optimization.

logger `property` ¶

logger: ILogger

max_aux_streams `class-attribute` `instance-attribute` ¶

max_aux_streams: int | None = None

Maximum auxiliary streams used by TensorRT kernels.

max_num_tactics `class-attribute` `instance-attribute` ¶

max_num_tactics: int | None = None

Maximum number of tactics considered per layer.

max_shapes `class-attribute` `instance-attribute` ¶

max_shapes: Shape | None = None

Maximum dynamic input tile size as (width, height). Defaults to the inference tile size.

max_threads `class-attribute` `instance-attribute` ¶

max_threads: int | None = field(default=None, repr=False)

Maximum number of builder threads. Limits CPU usage during engine build.

min_shapes `class-attribute` `instance-attribute` ¶

min_shapes: Shape = (0, 0)

Minimum dynamic input tile size as (width, height).

num_streams `class-attribute` `instance-attribute` ¶

num_streams: int = 1

Number of parallel plugin inference streams.

opt_shapes `class-attribute` `instance-attribute` ¶

opt_shapes: Shape | None = None

Optimal input tile size as (width, height). Defaults to the inference tile size.

plugin `class-attribute` ¶

plugin: Plugin = core.lazy.trt_rtx

sparse_weights `class-attribute` `instance-attribute` ¶

sparse_weights: bool = False

Allow the builder to exploit structured sparsity in weights.

static_shape `class-attribute` `instance-attribute` ¶

static_shape: bool = True

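When True, build a fixed-shape engine for the optimal tile size (opt_shapes, or the inference tile size). When False, build a dynamic-shape engine covering min_shapes to max_shapes.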

strict_nans `class-attribute` `instance-attribute` ¶

strict_nans: bool = False

Disable float optimizations (0*x => 0, x-x => 0, x/x => 1) to preserve NaN/Inf propagation.

tactic_dram `class-attribute` `instance-attribute` ¶

tactic_dram: int | None = None

DRAM limit in bytes available to the optimizer during tactic selection. Lowering it can help avoid out-of-memory failures when building engines on memory-constrained systems.

tf32 `class-attribute` `instance-attribute` ¶

tf32: bool = False

Allow TensorRT TF32 tactics.

tiling_optimization_level `class-attribute` `instance-attribute` ¶

tiling_optimization_level: (
    SupportsInt | TilingOptimizationLevel | TilingOptimizationLevel
) = 0

TensorRT tiling optimization search level.

use_cuda_graph `class-attribute` `instance-attribute` ¶

use_cuda_graph: bool = True

Enable CUDA graph execution for compatible engines to improve performance and reduce CPU overhead.

verbosity `class-attribute` `instance-attribute` ¶

verbosity: SupportsInt | Severity | Severity | None = field(
    default=None, repr=False
)

TensorRT/plugin logging severity.

version `property` ¶

version: tuple[int, int, int]

weight_streaming `class-attribute` `instance-attribute` ¶

weight_streaming: bool = False

Stream weights from host to device to reduce GPU memory at the cost of performance.

workspace `class-attribute` `instance-attribute` ¶

workspace: int | None = None

Workspace memory pool limit in bytes.

autoselect `staticmethod` ¶

autoselect(device_id: int = 0, **kwargs: Any) -> Backend

Try to select the best backend for the current system.

Parameters:

device_id ¶
(int, default: 0 ) –

The GPU device id.
**kwargs ¶
(Any, default: {} ) –

Additional arguments to pass to the backend.

Returns:

Backend –

The selected backend.

Source code in vsscale/mlrt/backend/base.py

@staticmethod
def autoselect(device_id: int = 0, **kwargs: Any) -> Backend:
    """
    Try to select the best backend for the current system.

    Args:
        device_id: The GPU device id.
        **kwargs: Additional arguments to pass to the backend.

    Returns:
        The selected backend.
    """

    gpu = get_gpu(device_id)
    vendor = None if not gpu else str(gpu.vendor).strip()

    match vendor:
        # Windows & Linux
        case "NVIDIA Corporation":
            if hasattr(core, "trt"):
                backend = UserBackend.TRT
            elif hasattr(core, "trt_rtx"):
                backend = UserBackend.TRT_RTX
            elif platform.system().lower() == "windows" and hasattr(core, "ort"):
                backend = UserBackend.ORT_DML
            elif hasattr(core, "ort"):
                backend = UserBackend.ORT_CUDA
            elif hasattr(core, "ncnn"):
                backend = UserBackend.NCNN
            else:
                backend = UserBackend.OV_CPU
        # Windows & Linux
        case "Advanced Micro Devices, Inc.":
            if platform.system().lower() == "windows" and hasattr(core, "ort"):
                backend = UserBackend.ORT_DML
            elif hasattr(core, "migx"):
                backend = UserBackend.MIGX
            elif hasattr(core, "ncnn"):
                backend = UserBackend.NCNN_VK
            else:
                backend = UserBackend.OV_CPU
        # Windows & Linux
        case "Intel(R) Corporation":
            if hasattr(core, "ov"):
                backend = UserBackend.OV_GPU
            elif platform.system().lower() == "windows" and hasattr(core, "ort"):
                backend = UserBackend.ORT_DML
            elif hasattr(core, "ncnn"):
                backend = UserBackend.NCNN_VK
            else:
                backend = UserBackend.OV_CPU
        # macOS ARM64 & x86_64
        case "Apple":
            if hasattr(core, "ncnn"):
                backend = UserBackend.NCNN_VK
            elif hasattr(core, "ort"):
                backend = UserBackend.ORT_COREML
            else:
                backend = UserBackend.OV_CPU
        case _:
            backend = UserBackend.OV_CPU

    del gpu

    if hasattr(backend, "device_id"):
        kwargs["device_id"] = device_id

    return backend(**kwargs)

build ¶

build(
    network_path: Path,
    engine_path: Path,
    channels: int,
    tilesize: Shape,
    input_name: str,
) -> None

Source code in vsscale/mlrt/backend/trt.py

def build(
    self,
    network_path: Path,
    engine_path: Path,
    channels: int,
    tilesize: Shape,
    input_name: str,
) -> None:
    with cuda_device(self.device_id):
        trt_logger = self.logger
        builder = self.trt.Builder(trt_logger)

        if self.max_threads is not None:
            builder.max_threads = self.max_threads

        network = builder.create_network()
        parser = self.trt.OnnxParser(network, trt_logger)

        if not parser.parse_from_file(str(network_path)):
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise CustomRuntimeError(f"Failed to parse ONNX model: {network_path}\n" + "\n".join(errors))

        config = builder.create_builder_config()

        # Delegate builder setup
        self.configure_builder_config(config, network)
        self.setup_optimization_profile(builder, network, config, channels, input_name, tilesize)

        # Timing Cache
        timing_cache_path = Path(f"{engine_path}.cache")
        timing_cache_data = b""
        if timing_cache_path.exists():
            timing_cache_data = timing_cache_path.read_bytes()

        timing_cache = config.create_timing_cache(timing_cache_data)
        config.set_timing_cache(timing_cache, ignore_mismatch=True)

        # Build
        logger.info(f"Building TensorRT {self.__class__.__name__} engine from {network_path}...")
        serialized = builder.build_serialized_network(network, config)

        if not serialized:
            raise CustomRuntimeError(f"TensorRT engine build failed for {network_path}")

        engine_path.write_bytes(serialized)

        # Save Cache
        updated_cache = config.get_timing_cache()
        timing_cache_path.write_bytes(updated_cache.serialize())

    logger.info(f"Engine saved to {engine_path}")
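`build` keeps a TensorRT timing cache next to the engine, at the engine path with `.cache` appended, and passes an empty buffer when no cache file exists yet. The file handling alone can be sketched with `pathlib`; `timing_cache_path`, `load_timing_cache` and `save_timing_cache` are hypothetical helpers, and the bytes stand in for what the serialized timing cache would contain.

```python
from pathlib import Path


def timing_cache_path(engine_path: Path) -> Path:
    # Same naming rule as build(): "<engine file name>.cache" in the same directory.
    return Path(f"{engine_path}.cache")


def load_timing_cache(engine_path: Path) -> bytes:
    # As in build(), fall back to an empty buffer when no cache file exists.
    path = timing_cache_path(engine_path)
    return path.read_bytes() if path.exists() else b""


def save_timing_cache(engine_path: Path, data: bytes) -> None:
    # Overwrite the cache with the updated, serialized timing data.
    timing_cache_path(engine_path).write_bytes(data)
```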

build_engine ¶

build_engine(
    network_path: Path,
    channels: int,
    tilesize: Shape,
    input_name: str = "input",
) -> Path

Build or retrieve a cached TensorRT engine.

Parameters:

network_path ¶
(Path) –

Path to the ONNX model.
channels ¶
(int) –

Number of model input channels.
tilesize ¶
(Shape) –

Inference tile size as (width, height).
input_name ¶
(str, default: 'input' ) –

Name of the model input tensor.

Returns:

Path –

Path to the serialized engine file.

Source code in vsscale/mlrt/backend/trt.py

def build_engine(self, network_path: Path, channels: int, tilesize: Shape, input_name: str = "input") -> Path:
    """
    Build or retrieve a cached TensorRT engine.

    Args:
        network_path: Path to the ONNX model.
        channels: Number of model input channels.
        tilesize: Inference tile size as `(width, height)`.
        input_name: Name of the model input tensor.

    Returns:
        Path to the serialized engine file.
    """
    if self.fp16:
        network_path = self._convert_onnx_fp16(network_path)
    elif self.bf16:
        network_path = self._convert_onnx_bf16(network_path)

    identity = self.get_identity(network_path, channels, tilesize)
    engine_path = get_artifact_path(f"{identity}.engine", fallback=not self.force_rebuild)

    if not self.force_rebuild and engine_path.is_file() and engine_path.stat().st_size >= 1024:
        return engine_path

    engine_path.parent.mkdir(parents=True, exist_ok=True)

    self.build(
        network_path=network_path,
        engine_path=engine_path,
        channels=channels,
        tilesize=tilesize,
        input_name=input_name,
    )

    return engine_path
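`build_engine` reuses a cached engine only when `force_rebuild` is off and the file exists with at least 1024 bytes, which presumably guards against empty or truncated files left by a failed build. A minimal sketch of that check; `can_reuse_engine` is a hypothetical name, and the 1024-byte threshold is taken from the source above.

```python
from pathlib import Path

MIN_ENGINE_SIZE = 1024  # bytes, same threshold as build_engine


def can_reuse_engine(engine_path: Path, force_rebuild: bool = False) -> bool:
    """Return True if an existing engine file may be used as-is."""
    return (
        not force_rebuild
        and engine_path.is_file()
        and engine_path.stat().st_size >= MIN_ENGINE_SIZE
    )
```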

configure_builder_config ¶

configure_builder_config(
    config: IBuilderConfig, network: INetworkDefinition
) -> None

Source code in vsscale/mlrt/backend/trt.py

def configure_builder_config(self, config: trt.IBuilderConfig, network: trt.INetworkDefinition) -> None:
    if self.workspace is not None:
        config.set_memory_pool_limit(self.trt.MemoryPoolType.WORKSPACE, self.workspace)

    if self.tactic_dram is not None:
        config.set_memory_pool_limit(self.trt.MemoryPoolType.TACTIC_DRAM, self.tactic_dram)

    if not self.tf32:
        config.flags &= ~(1 << self.trt.BuilderFlag.TF32.value)

    if self.sparse_weights:
        config.flags |= 1 << self.trt.BuilderFlag.SPARSE_WEIGHTS.value

    if self.strict_nans:
        config.flags |= 1 << self.trt.BuilderFlag.STRICT_NANS.value

    if self.weight_streaming:
        config.flags |= 1 << self.trt.BuilderFlag.WEIGHT_STREAMING.value

    self.configure_tactic_sources(config)
    self.configure_optimization_settings(config)
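The builder flags are stored as a bitmask in which each `BuilderFlag` occupies bit `1 << flag.value`. The sketch below reproduces the set/clear logic with a stand-in `IntEnum`; `FakeBuilderFlag`, its numeric values and `apply_flags` are made up for illustration and do not match TensorRT's real `BuilderFlag` values.

```python
from enum import IntEnum


class FakeBuilderFlag(IntEnum):
    # Hypothetical bit positions; TensorRT defines its own values.
    TF32 = 0
    SPARSE_WEIGHTS = 1
    STRICT_NANS = 2
    WEIGHT_STREAMING = 3


def apply_flags(flags: int, *, tf32: bool, sparse_weights: bool,
                strict_nans: bool, weight_streaming: bool) -> int:
    """Apply the same flag updates as configure_builder_config."""
    if not tf32:
        # Only clears TF32; tf32=True leaves the incoming bit untouched.
        flags &= ~(1 << FakeBuilderFlag.TF32.value)
    if sparse_weights:
        flags |= 1 << FakeBuilderFlag.SPARSE_WEIGHTS.value
    if strict_nans:
        flags |= 1 << FakeBuilderFlag.STRICT_NANS.value
    if weight_streaming:
        flags |= 1 << FakeBuilderFlag.WEIGHT_STREAMING.value
    return flags
```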

configure_optimization_settings ¶

configure_optimization_settings(config: IBuilderConfig) -> None

Source code in vsscale/mlrt/backend/trt.py

def configure_optimization_settings(self, config: trt.IBuilderConfig) -> None:
    config.builder_optimization_level = self.builder_optimization_level
    config.avg_timing_iterations = self.avg_timing_iterations

    if self.max_aux_streams is not None:
        config.max_aux_streams = self.max_aux_streams

    if self.max_num_tactics is not None:
        config.max_num_tactics = self.max_num_tactics

    if int(self.tiling_optimization_level) != 0:
        config.tiling_optimization_level = self.trt.TilingOptimizationLevel(self.tiling_optimization_level)
        config.l2_limit_for_tiling = self.l2_limit_for_tiling

configure_tactic_sources ¶

configure_tactic_sources(config: IBuilderConfig) -> None

Source code in vsscale/mlrt/backend/trt.py

def configure_tactic_sources(self, config: trt.IBuilderConfig) -> None:
    tactic_sources = config.get_tactic_sources()

    if self.edge_mask_convolutions:
        tactic_sources |= 1 << self.trt.TacticSource.EDGE_MASK_CONVOLUTIONS.value
    else:
        tactic_sources &= ~(1 << self.trt.TacticSource.EDGE_MASK_CONVOLUTIONS.value)

    if self.jit_convolutions:
        tactic_sources |= 1 << self.trt.TacticSource.JIT_CONVOLUTIONS.value
    else:
        tactic_sources &= ~(1 << self.trt.TacticSource.JIT_CONVOLUTIONS.value)

    config.set_tactic_sources(tactic_sources)
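Unlike the builder flags, both tactic-source options are forced in one direction or the other: the bit is set when the option is true and cleared when it is false. That pattern reduces to a small helper; `set_bit` and the bit positions below are hypothetical stand-ins for `TacticSource` values.

```python
def set_bit(mask: int, bit: int, enabled: bool) -> int:
    """Return mask with the given bit set (enabled=True) or cleared."""
    if enabled:
        return mask | (1 << bit)
    return mask & ~(1 << bit)


# Hypothetical bit positions standing in for TacticSource members.
EDGE_MASK_CONVOLUTIONS = 3
JIT_CONVOLUTIONS = 4

sources = 0b01000  # pretend default: edge-mask on, JIT off
sources = set_bit(sources, EDGE_MASK_CONVOLUTIONS, enabled=False)
sources = set_bit(sources, JIT_CONVOLUTIONS, enabled=True)
print(bin(sources))  # 0b10000
```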

get_args ¶

get_args(clips: VideoNode | Sequence[VideoNode]) -> dict[str, Any]

Return backend plugin arguments derived from this configuration.

Source code in vsscale/mlrt/backend/trt.py

def get_args(self, clips: vs.VideoNode | Sequence[vs.VideoNode]) -> dict[str, Any]:
    return {
        "device_id": self.device_id,
        "use_cuda_graph": self.use_cuda_graph,
        "num_streams": self.num_streams,
        "verbosity": self.verbosity,
    }

get_identity ¶

get_identity(network_path: Path, channels: int, tilesize: Shape) -> int

Source code in vsscale/mlrt/backend/trt.py

def get_identity(self, network_path: Path, channels: int, tilesize: Shape) -> int:
    checksum = zlib.crc32(network_path.read_bytes())

    command = [
        "nvidia-smi",
        "-i",
        str(self.device_id),
        "--query-gpu=name,driver_version",
        "--format=csv,noheader,nounits",
    ]
    res = subprocess.run(command, capture_output=True, text=True, check=True)
    device = [d.strip().replace(" ", "_") for d in res.stdout.split(",")]

    components = (
        str(self),
        str(self.version),
        str(sys.version_info[:2]),
        network_path.name,
        f"{checksum:x}",
        str(channels),
        str(tilesize),
        *device,
    )
    return zlib.crc32(bytes("|".join(components), "utf-8"))
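`get_identity` derives the engine cache key by CRC32-hashing a `|`-joined list of everything that should invalidate an engine: the backend's string form, the TensorRT version, the Python version, the model file name and content checksum, the channel count, the tile size, and the GPU name and driver version reported by `nvidia-smi`. The sketch below keeps that structure but takes the `nvidia-smi` CSV line as a string so it runs without a GPU; `parse_gpu_line`, `engine_identity` and their parameters are hypothetical.

```python
import sys
import zlib


def parse_gpu_line(csv_line: str) -> list[str]:
    # Same normalisation as get_identity: split on commas, strip, spaces -> underscores.
    return [d.strip().replace(" ", "_") for d in csv_line.split(",")]


def engine_identity(config_repr: str, trt_version: tuple[int, int, int],
                    model_name: str, model_bytes: bytes, channels: int,
                    tilesize: tuple[int, int], gpu_csv: str) -> int:
    checksum = zlib.crc32(model_bytes)
    components = (
        config_repr,
        str(trt_version),
        str(sys.version_info[:2]),
        model_name,
        f"{checksum:x}",
        str(channels),
        str(tilesize),
        *parse_gpu_line(gpu_csv),
    )
    return zlib.crc32(bytes("|".join(components), "utf-8"))
```

Because only the file name and a content checksum of the model are hashed, the same model stored at a different location maps to the same identity.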

inference ¶

inference(
    clips: VideoNode | Sequence[VideoNode],
    network_path: str | PathLike[str],
    /,
    overlap: tuple[int, int],
    tilesize: tuple[int, int],
    *,
    flexible: bool = False,
    **kwargs: Any,
) -> VideoNode | list[VideoNode]

Run inference with this backend.

Parameters:

clips ¶
(VideoNode | Sequence[VideoNode]) –

Input clip or clips passed to the backend model.
network_path ¶
(str | PathLike[str]) –

Path to the model file or backend artifact.
overlap ¶
(tuple[int, int]) –

Horizontal and vertical tile overlap in pixels.
tilesize ¶
(tuple[int, int]) –

Horizontal and vertical tile size in pixels.
flexible ¶
(bool, default: False ) –

Return each flexible output plane as a separate clip.
**kwargs ¶
(Any, default: {} ) –

Additional backend plugin arguments forwarded unchanged.

Returns:

VideoNode | list[VideoNode] –

A single output clip, or a list of output clips when flexible is enabled.

Source code in vsscale/mlrt/backend/trt.py

@copy_signature(Backend.inference)
def inference(
    self,
    clips: vs.VideoNode | Sequence[vs.VideoNode],
    network_path: str | os.PathLike[str],
    /,
    overlap: tuple[int, int],
    tilesize: tuple[int, int],
    *,
    flexible: bool = False,
    **kwargs: Any,
) -> vs.VideoNode | list[vs.VideoNode]:
    UnsupportedSampleTypeError.check(clips, vs.FLOAT, self.__class__)

    clips = to_arr(clips)
    channels = sum(c.format.num_planes for c in clips)
    bitdepth = max(c.format.bits_per_sample for c in clips)

    engine_path = self.build_engine(Path(network_path), channels, tilesize)

    if self.fp16 or self.bf16:
        # Clips must be in 16-bit float format if fp16 or bf16 mode is enabled,
        # otherwise the TRT plugins error out.
        clips = [depth(c, 16, sample_type=vs.SampleType.FLOAT) for c in clips]
    else:
        clips = [depth(c, 32) for c in clips]

    res = super().inference(clips, engine_path, overlap, tilesize, flexible=flexible, **kwargs)

    return (
        depth(res, bitdepth, sample_type=vs.FLOAT)
        if isinstance(res, vs.VideoNode)
        else [depth(r, bitdepth, sample_type=vs.FLOAT) for r in res]
    )
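Before running the engine, `inference` counts the input channels as the total number of planes across all clips, remembers the highest input bit depth, and converts every clip to 16-bit float when `fp16` or `bf16` is enabled, or to 32-bit float otherwise; outputs are converted back to the remembered bit depth. The arithmetic can be sketched without VapourSynth using a stand-in format record; `ClipFormat` and `plan_precision` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipFormat:
    # Hypothetical stand-in for the fields of vs.VideoFormat used here.
    num_planes: int
    bits_per_sample: int


def plan_precision(formats: list[ClipFormat], fp16: bool = False,
                   bf16: bool = False) -> tuple[int, int, int]:
    """Return (channels, engine input bit depth, output bit depth)."""
    channels = sum(f.num_planes for f in formats)
    output_bits = max(f.bits_per_sample for f in formats)
    # Half-precision engines get 16-bit float input; otherwise 32-bit float.
    input_bits = 16 if (fp16 or bf16) else 32
    return channels, input_bits, output_bits
```

For example, a 32-bit float RGB clip plus a 16-bit float GRAY clip gives 4 channels; with `fp16=True` both are fed as 16-bit float, while the result comes back as 32-bit float.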

setup_optimization_profile ¶

setup_optimization_profile(
    builder: Builder,
    network: INetworkDefinition,
    config: IBuilderConfig,
    channels: int,
    input_name: str,
    tilesize: Shape,
) -> None

Source code in vsscale/mlrt/backend/trt.py

def setup_optimization_profile(
    self,
    builder: trt.Builder,
    network: trt.INetworkDefinition,
    config: trt.IBuilderConfig,
    channels: int,
    input_name: str,
    tilesize: Shape,
) -> None:
    profile = builder.create_optimization_profile()
    opt_shapes = self.trt.Dims(self.opt_shapes or tilesize)
    max_shapes = self.trt.Dims(self.max_shapes or tilesize)

    input_names = [network.get_input(i).name for i in range(network.num_inputs)]
    if input_name not in input_names:
        logger.debug("input_name %r isn't in the input network", input_name)
        if network.num_inputs == 1:
            input_name = input_names[0]
        else:
            raise CustomValueError(f"Input name '{input_name}' not found in network inputs: {input_names}")

    if self.static_shape:
        shape = self.trt.Dims((1, channels, opt_shapes[1], opt_shapes[0]))

        for i in range(network.num_inputs):
            input_tensor = network.get_input(i)
            if input_tensor.name == input_name:
                input_tensor.shape = shape

        profile.set_shape(input_name, shape, shape, shape)
    else:
        profile.set_shape(
            input_name,
            self.trt.Dims((1, channels, self.min_shapes[1], self.min_shapes[0])),
            self.trt.Dims((1, channels, opt_shapes[1], opt_shapes[0])),
            self.trt.Dims((1, channels, max_shapes[1], max_shapes[0])),
        )

    config.add_optimization_profile(profile)

cuda_device ¶

cuda_device(id: int) -> Generator[None]

Set cuda device id within this context.

Source code in vsscale/mlrt/backend/trt.py

@contextmanager
def cuda_device(id: int) -> Generator[None]:
    """Set cuda device id within this context."""

    import cuda.core  # type: ignore[import-untyped]

    current_id = cuda.core.Device().device_id
    logger.debug("Current cuda device id is %s", current_id)
    try:
        cuda.core.Device(id).set_current()
        logger.debug("Set cuda device id to %s", id)
        yield
    finally:
        cuda.core.Device(current_id).set_current()
        logger.debug("Restore cuda device id to %s", current_id)

