
NVIDIA DALI Setup

Install and use NVIDIA DALI for GPU-accelerated data loading. Speed up image preprocessing and data augmentation with DALI pipelines for PyTorch and TensorFlow deep learning.


Overview

NVIDIA DALI (Data Loading Library) is a GPU-accelerated library for data loading and preprocessing. This guide covers:

  • Installation for different CUDA versions
  • TensorFlow plugin setup
  • Common compatibility issues
  • Performance considerations

:::caution[Version Compatibility] DALI has strict version requirements with TensorFlow and PyTorch. Always check compatibility before installing. :::


What is DALI?

DALI accelerates data preprocessing by offloading operations to the GPU, potentially improving training throughput when data loading is a bottleneck.

Key Features:

  • GPU-accelerated image decoding and augmentation
  • Integration with TensorFlow and PyTorch
  • Pipeline-based data processing

Official Installation Guide


Installation

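CUDA 12.x:

```bash
# Core DALI library
pip install --extra-index-url https://pypi.nvidia.com nvidia-dali-cuda120

# TensorFlow plugin (if using TensorFlow; install TensorFlow first)
pip install --extra-index-url https://pypi.nvidia.com nvidia-dali-tf-plugin-cuda120

# PyTorch: no separate plugin package is needed;
# nvidia.dali.plugin.pytorch ships with the core wheel above
```

CUDA 11.x:

```bash
# Core DALI library
pip install --extra-index-url https://pypi.nvidia.com nvidia-dali-cuda110

# TensorFlow plugin (if using TensorFlow; install TensorFlow first)
pip install --extra-index-url https://pypi.nvidia.com nvidia-dali-tf-plugin-cuda110

# PyTorch: no separate plugin package is needed;
# nvidia.dali.plugin.pytorch ships with the core wheel above
```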
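The wheel names above follow one pattern: the suffix encodes the CUDA major version (`cuda120` for CUDA 12.x, `cuda110` for CUDA 11.x). The helper below is a hypothetical sketch (`dali_wheels_for_cuda` is not part of DALI) that derives the package names from a CUDA version string such as `"12.4"`; it only knows the two CUDA lines listed on this page.

```python
def dali_wheels_for_cuda(cuda_version: str) -> dict:
    """Map a CUDA version string such as "12.4" to DALI wheel names.

    Hypothetical helper: it only encodes the naming pattern used on this
    page (cuda120 for CUDA 12.x, cuda110 for CUDA 11.x).
    """
    major = int(cuda_version.split(".")[0])
    if major not in (11, 12):
        raise ValueError(f"no DALI wheel listed on this page for CUDA {cuda_version}")
    suffix = f"cuda{major}0"  # 12 -> "cuda120", 11 -> "cuda110"
    return {
        "core": f"nvidia-dali-{suffix}",
        "tf_plugin": f"nvidia-dali-tf-plugin-{suffix}",
    }


if __name__ == "__main__":
    print(dali_wheels_for_cuda("12.4"))
```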

TensorFlow Compatibility Issues

Known Issues (as of April 2025)

:::caution[TensorFlow 2.16.2 Recommended] DALI has compatibility issues with several TensorFlow versions:

  • TensorFlow > 2.16: Not yet supported (known bug tracked on GitHub)
  • TensorFlow < 2.16: Compatibility issues
  • Recommended: TensorFlow 2.16.2 + DALI 1.47 + the matching TensorFlow plugin (also 1.47)

Working Configuration:

```bash
pip install tensorflow==2.16.2
pip install oauthlib==3.2.2
pip install --extra-index-url https://pypi.nvidia.com nvidia-dali-cuda120==1.47
# Pin the plugin to the same version as the core library
pip install --extra-index-url https://pypi.nvidia.com nvidia-dali-tf-plugin-cuda120==1.47
```

:::
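To confirm that an environment actually matches the working configuration above, you can compare the installed versions using `importlib.metadata` from the standard library. This is a sketch: `WORKING_CONFIG` and `check_working_config` are illustrative names, and it assumes the TensorFlow plugin should carry the same version as the core DALI wheel.

```python
from importlib.metadata import PackageNotFoundError, version

# Illustrative pins taken from the working configuration above.
# Versions are matched as prefixes, so "1.47" also accepts "1.47.0".
WORKING_CONFIG = {
    "tensorflow": "2.16.2",
    "nvidia-dali-cuda120": "1.47",
    "nvidia-dali-tf-plugin-cuda120": "1.47",  # assumed to match the core DALI version
}


def check_working_config(expected=WORKING_CONFIG):
    """Return (package, expected, installed) tuples for every mismatch.

    `installed` is None when the package is not installed at all.
    """
    mismatches = []
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        matches = installed is not None and (
            installed == wanted or installed.startswith(wanted + ".")
        )
        if not matches:
            mismatches.append((package, wanted, installed))
    return mismatches


if __name__ == "__main__":
    for package, wanted, installed in check_working_config():
        print(f"{package}: expected {wanted}, found {installed or 'not installed'}")
```

An empty result means every pinned package is present at the expected version.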

Installation Tips

  1. Install TensorFlow with pip, not conda, when using DALI
  2. Check the DALI GitHub releases for the latest compatibility information
  3. Verify that the DALI path configuration is correct
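For tip 3, a quick check is to import DALI and its TensorFlow plugin from the same interpreter that runs your training script. The function below is a hypothetical helper, not part of DALI; it assumes the module names `nvidia.dali` and `nvidia.dali.plugin.tf` and reports either the module's version or the error raised while importing it.

```python
import importlib


def check_dali_imports():
    """Try the imports a TensorFlow + DALI setup needs and report the outcome.

    Returns a dict mapping each module name to its __version__, "ok" if it
    imported but has no __version__, or a "FAILED: ..." message.
    """
    results = {}
    for module_name in ("nvidia.dali", "nvidia.dali.plugin.tf"):
        try:
            module = importlib.import_module(module_name)
            results[module_name] = getattr(module, "__version__", "ok")
        except Exception as exc:  # ImportError, or a plugin library that fails to load
            results[module_name] = f"FAILED: {type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    for name, status in check_dali_imports().items():
        print(f"{name}: {status}")
```

Running it with the same `python` executable you use for training rules out the common case of DALI being installed into a different environment.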

TensorFlow Integration Guide


Performance Considerations

When DALI May Not Help

:::note[DALI is not always faster] DALI doesn't guarantee faster performance in every scenario:

Network Storage Limitations:

  • DALI uses a single thread for file reads
  • PyTorch DataLoader can read multiple files in parallel
  • Network latency may not be hidden by DALI's pipeline

Workarounds:

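```python
import nvidia.dali.fn as fn
from nvidia.dali import pipeline_def

data_path = "/path/to/dataset"  # placeholder: your dataset root

@pipeline_def(batch_size=32, num_threads=4, device_id=0)
def file_reader_pipeline():
    # dont_use_mmap=True: plain file reads instead of memory mapping (may help with network drives)
    jpegs, labels = fn.readers.file(file_root=data_path, dont_use_mmap=True)
    return jpegs, labels
```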

When DALI Helps Most:

  • Local SSD storage
  • Heavy preprocessing (augmentations, decoding)
  • Large batch sizes
  • GPU has idle time during data loading :::
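The network-storage limitation above comes down to whether file reads overlap. The sketch below is plain Python (it is not DALI's or PyTorch's loader code) that reads the same files sequentially and with a thread pool; the function names are illustrative. Replacing the temporary files with paths on your network mount gives a rough idea of how much latency parallel reads would hide there.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def read_file(path):
    with open(path, "rb") as f:
        return f.read()


def read_sequential(paths):
    # One file at a time: per-file latency adds up.
    return [read_file(p) for p in paths]


def read_parallel(paths, workers=8):
    # Several reads in flight at once: per-file latency overlaps.
    # executor.map returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(read_file, paths))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        paths = []
        for i in range(64):
            path = os.path.join(root, f"sample_{i}.bin")
            with open(path, "wb") as f:
                f.write(os.urandom(64 * 1024))
            paths.append(path)
        for reader in (read_sequential, read_parallel):
            start = time.perf_counter()
            data = reader(paths)
            print(f"{reader.__name__}: {len(data)} files in {time.perf_counter() - start:.4f}s")
```

On a local SSD the two timings are likely to be close; a large gap on a network mount suggests that single-threaded reads would be a bottleneck there.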

Benchmark First

Always benchmark your specific use case:

```python
# Test with and without DALI
# Measure: samples/sec, GPU utilization, data loading time
```
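The samples/sec measurement can be turned into a small harness. `measure_throughput` is a hypothetical helper that times iteration over any batch iterable, so the same function can wrap a PyTorch `DataLoader` and a DALI iterator; it skips a few warm-up batches so worker start-up and pipeline prefetching are not counted. It measures data-loading throughput only; GPU utilization still needs a tool such as `nvidia-smi`.

```python
import itertools
import time


def measure_throughput(batches, batch_size_of=len, warmup=5, max_batches=100):
    """Return samples/sec over up to `max_batches` batches after `warmup` batches.

    batches: any iterable of batches (e.g. a PyTorch DataLoader or a DALI iterator).
    batch_size_of: callable returning the number of samples in one batch.
    """
    iterator = iter(batches)
    # Warm-up: consume a few batches so start-up cost is not timed.
    for _ in itertools.islice(iterator, warmup):
        pass

    samples = 0
    start = time.perf_counter()
    for timed, batch in enumerate(iterator, start=1):
        samples += batch_size_of(batch)
        if timed >= max_batches:
            break
    elapsed = time.perf_counter() - start

    if samples == 0:
        raise ValueError("no batches left to time after warm-up")
    return samples / elapsed


if __name__ == "__main__":
    # Stand-in loader: 50 batches of 32 samples, each taking ~2 ms to "load".
    def fake_loader():
        for _ in range(50):
            time.sleep(0.002)
            yield [0] * 32

    print(f"{measure_throughput(fake_loader()):.0f} samples/sec")
```

For a `DataLoader` yielding `(images, labels)` tuples, pass `batch_size_of=lambda b: len(b[0])`. A `DALIGenericIterator` from `nvidia.dali.plugin.pytorch` yields a list with one dictionary per pipeline, so with `output_map=["data", "label"]` something like `lambda b: b[0]["data"].shape[0]` should work; check this against your DALI version.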

Performance Discussion on GitHub
