Complete UVR5 / audio-separator troubleshooting guide (GPU not used, CUDA, OOM, installation)

How to use this article

UVR5 / MDX-Net and audio-separator run without fuss in a working environment but can stall indefinitely in a broken one. The sticking points are almost always the same: the GPU is not used, CUDA out of memory, cuDNN errors, missing ffmpeg, and the model being downloaded on every run.

This article is a troubleshooting reference you can look up by symptom. For each symptom it shows what to run first to diagnose, the cause, and a concrete fix, based on the official documentation of ONNX Runtime, PyTorch, and audio-separator.

About the author (reliability disclosure): I single-handedly designed, implemented, and now operate in production an AI video-localization platform whose first stage is source separation. The fixes in this article record problems I actually hit and resolved in local, Colab, container, and production GPU environments. The production scaling design is covered in the GPU-worker-foundation article, and the overall picture of the tool in the UVR5 guide.

Symptom quick reference (narrow it down in 30 seconds)

Symptom	Most likely cause	Go to
Painfully slow despite having a GPU	`onnxruntime` and `onnxruntime-gpu` installed together / CUDA·cuDNN mismatch	① CPU fallback
CUDA out of memory	`segment_size` / `batch_size` too large, long clip	② OOM
cuDNN / CUDA error	ORT and CUDA/cuDNN versions incompatible	③ version mismatch
ffmpeg-related error	ffmpeg not installed	④ ffmpeg
Slow because the model is downloaded every time	`/tmp` is volatile, cache not persisted	⑤ model download
Slow or not working on Mac	Apple Silicon doesn't support CUDA (MPS/CPU only)	⑥ Apple Silicon

First, diagnose: confirm the environment mechanically

Before fixing anything by intuition, gather the facts. Run these commands first.

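# audio-separator's official environment diagnostic (run this first). Reports CUDA/ffmpeg availability in one shot
audio-separator --env_info
# Expected output (example):
#   "ONNXruntime has CUDAExecutionProvider available, enabling acceleration"
#   "FFmpeg installed"

# Check that the GPU driver itself is visible
nvidia-smi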

Next, check both ONNX Runtime and PyTorch from Python. audio-separator runs on two backends, ONNX Runtime (.onnx models such as MDX-Net) and PyTorch (.ckpt and similar checkpoints, for Demucs etc.), so you need to look at both.

# diagnose.py: mechanically confirm whether the GPU is really in use
import onnxruntime as ort

print("ORT version :", ort.__version__)
print("ORT device  :", ort.get_device())                 # 'GPU' is good; 'CPU' means the GPU is not used
print("ORT providers:", ort.get_available_providers())   # should contain 'CUDAExecutionProvider'

try:
    import torch
    print("torch       :", torch.__version__)
    print("torch.cuda  :", torch.cuda.is_available())     # PyTorch side (Demucs etc.)
    if torch.cuda.is_available():
        print("device name :", torch.cuda.get_device_name(0))
except ImportError:
    print("torch not installed (not needed if you only use MDX-Net/.onnx models)")

If 'CUDAExecutionProvider' isn't in ort.get_available_providers(), the GPU isn't being used (CPU execution, which is painfully slow).
If ort.get_device() returns 'CPU', the same applies.

If either check says "not used," the cause is almost certainly symptom ①.
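
The two checks above describe the installed package. As a complementary check, the sketch below asks a real session which providers it ended up with, since a session can still fall back to CPU when the CUDA libraries fail to load. It assumes `onnxruntime` is installed and that you pass the path of an .onnx model you already have; the function names are illustrative and not part of audio-separator.

```python
from typing import Sequence


def uses_cuda(active_providers: Sequence[str]) -> bool:
    """True if CUDA is among the providers a session actually ended up with."""
    return "CUDAExecutionProvider" in active_providers


def session_uses_cuda(model_path: str) -> bool:
    """Create a real session and report whether the CUDA provider was applied."""
    import onnxruntime as ort  # imported lazily so uses_cuda() has no dependency

    sess = ort.InferenceSession(
        model_path,
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    # get_providers() lists the providers registered for this session.
    return uses_cuda(sess.get_providers())
```

Call `session_uses_cuda(...)` with a model file that is already on disk; a result of False on a GPU machine points at symptom ①.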

[Most common] GPU not used, painfully slow on CPU

This is the most common problem and the hardest to notice: there is no error at all, processing is just tens of times slower.

Why it silently falls back to CPU

ONNX Runtime works from a priority list of Execution Providers. As the official documentation explains:

['CUDAExecutionProvider', 'CPUExecutionProvider'] means execute a node using CUDAExecutionProvider if capable, otherwise execute using CPUExecutionProvider.

In other words, if CUDA can't be used, execution silently moves to the CPU. This is what is really behind "it works, but it's slow."
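
To make the fallback concrete, here is a toy model of the priority-list behaviour. It is a simplification written for this article, not ONNX Runtime's actual implementation (the real runtime decides per node); `pick_provider` is a hypothetical helper.

```python
from typing import Sequence


def pick_provider(preferred: Sequence[str], usable: Sequence[str]) -> str:
    """Return the first preferred provider that is usable in this environment."""
    for name in preferred:
        if name in usable:
            return name
    raise RuntimeError(f"none of {list(preferred)} is usable")


preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]

# Healthy GPU environment: CUDA is chosen.
print(pick_provider(preferred, ["CUDAExecutionProvider", "CPUExecutionProvider"]))
# CUDA could not be loaded: CPU is chosen, and nothing raises.
print(pick_provider(preferred, ["CPUExecutionProvider"]))
```

The second call is the silent case: the caller gets a valid answer and no exception, which is why the slowdown goes unnoticed.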

Cause A: coexistence of `onnxruntime` and `onnxruntime-gpu` (most likely)

If both the CPU package onnxruntime and the GPU package onnxruntime-gpu are installed, they unpack into the same directory and whichever was installed last overwrites the other, which can silently remove CUDAExecutionProvider.

🔎 ONNX Runtime's official documentation does not warn about this explicitly; it is a widely known gotcha in the community, documented in issues on microsoft/onnxruntime. The resulting behavior (CPU fallback) follows from the providers specification.

Fix: remove the CPU package, then reinstall the GPU package.

pip uninstall -y onnxruntime          # always remove the CPU package first
pip install --force-reinstall onnxruntime-gpu
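
Before uninstalling anything, you can confirm the coexistence from Python. This is a minimal sketch using only the standard library's `importlib.metadata`; the helper names are illustrative, and it only looks at the two package names discussed here.

```python
from importlib import metadata
from typing import Iterable, Optional

CPU_PACKAGE = "onnxruntime"
GPU_PACKAGE = "onnxruntime-gpu"


def normalize(name: str) -> str:
    """Lower-case a distribution name and unify '_' and '-'."""
    return name.lower().replace("_", "-")


def coexisting(names: Iterable[Optional[str]]) -> bool:
    """True when both the CPU and the GPU package are installed."""
    normalized = {normalize(n) for n in names if n}
    return CPU_PACKAGE in normalized and GPU_PACKAGE in normalized


installed_names = [dist.metadata["Name"] for dist in metadata.distributions()]
if coexisting(installed_names):
    print("Both onnxruntime and onnxruntime-gpu are installed: remove the CPU package")
else:
    print("No onnxruntime / onnxruntime-gpu coexistence detected")
```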

Cause B: CUDA / cuDNN version mismatch

If the CUDA/cuDNN versions that your onnxruntime-gpu build requires don't match what the environment provides, the CUDA EP can't load and execution falls back to CPU. See ③ version mismatch.

audio-separator's official reinstall procedure

audio-separator's README shows a clean reinstall procedure for when the GPU doesn't work (verbatim).

pip uninstall torch onnxruntime
pip cache purge
pip install --force-reinstall torch torchvision torchaudio
pip install --force-reinstall onnxruntime-gpu

If audio-separator --env_info reports that CUDAExecutionProvider is available after this, the problem is resolved.

CUDA out of memory (crashes)

This is the case where RuntimeError: CUDA out of memory appears with a long clip or a high-resolution setting.

Fixes, in priority order

Lower segment_size (most effective). The chunk placed on the GPU gets smaller, which saves VRAM.
Set batch_size to 1. Throughput drops, but VRAM usage reliably goes down.
Switch to a smaller, lighter model (RoFormer → MDX-Net, etc.; see the model-selection guide).
Move up to a GPU with more VRAM (for example, g4dn with 16 GB → g5/g6 with 24 GB).

# OOM mitigation: use smaller chunks to keep VRAM usage down
from audio_separator.separator import Separator

sep = Separator(mdx_params={
    "segment_size": 128,   # lowered from the default 256 (the most effective OOM fix)
    "overlap": 0.25,
    "batch_size": 1,       # no batching, to save VRAM
    "hop_length": 1024,
    "enable_denoise": False,
})
sep.load_model(model_filename="UVR-MDX-NET-Inst_HQ_3.onnx")
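
If you don't know in advance which segment_size fits, one option is to retry with progressively smaller values. The sketch below is a generic retry helper, not an audio-separator feature: `job` is a hypothetical callable that you write (it would build a Separator with the given segment_size and run the separation), and it assumes the out-of-memory failure surfaces as a RuntimeError whose message contains "out of memory", as in the error quoted above.

```python
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def segment_sizes(start: int = 256, floor: int = 32) -> Iterator[int]:
    """Yield start, start/2, start/4, ... while the value is at least floor."""
    size = start
    while size >= floor:
        yield size
        size //= 2


def run_with_smaller_segments(
    job: Callable[[int], T], start: int = 256, floor: int = 32
) -> T:
    """Call job(segment_size), halving segment_size after each out-of-memory error."""
    last_error: Optional[BaseException] = None
    for size in segment_sizes(start, floor):
        try:
            return job(size)
        except RuntimeError as err:
            if "out of memory" not in str(err).lower():
                raise  # unrelated failure: do not hide it
            last_error = err
    raise RuntimeError(f"still out of memory down to segment_size={floor}") from last_error
```

Halving keeps the number of retries small (256, 128, 64, 32), at the cost of possibly settling on a value lower than strictly necessary.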

Beware the misconception about `empty_cache()`

When OOM appears on the PyTorch side (Demucs, etc.), it is tempting to call torch.cuda.empty_cache(), but don't rely on it. According to the official PyTorch documentation:

Calling empty_cache() releases all unused cached memory from PyTorch ... However, the occupied GPU memory by tensors will not be freed so it can not increase the amount of GPU memory available for PyTorch.

In other words, only the unused cache is freed; VRAM held by live tensors is not returned. The real solution to OOM is to reduce usage in the first place, that is, to make the chunk or batch smaller. If fragmentation is suspected, tuning the environment variable PYTORCH_CUDA_ALLOC_CONF is also an option.
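
As an illustration of the PYTORCH_CUDA_ALLOC_CONF option, the sketch below sets the variable from Python before PyTorch allocates anything on the GPU. The value `expandable_segments:True` is one allocator option described in PyTorch's CUDA memory-management notes; whether it helps in your workload is an assumption to verify, and `configure_allocator` is a hypothetical helper.

```python
import os


def configure_allocator(setting: str = "expandable_segments:True") -> str:
    """Set PYTORCH_CUDA_ALLOC_CONF unless the operator already chose a value."""
    return os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", setting)


# Run this before the first CUDA allocation; doing it before `import torch` is the safe order.
active = configure_allocator()
print("PYTORCH_CUDA_ALLOC_CONF =", active)
```

Using `setdefault` means a value already set in the container or shell environment wins, so the code never overrides an operator's explicit choice.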

CUDA/cuDNN version errors

Typical symptoms: LoadLibrary failed, libcudnn.so.X not found, the CUDA EP fails to load, and so on.

ONNX Runtime's version requirements (official table)

For ONNX Runtime's CUDA EP, the required CUDA/cuDNN versions are fixed per ORT version (see the official table). Key points:

ONNX Runtime	CUDA	cuDNN	Note
1.20.x / 1.19.x	12.x	9.x	PyPI default (the PyPI default from 1.19 is CUDA 12.x)
1.18.1	12.x	9.x	cuDNN 9 required
1.18.0	12.x	8.x	—
1.20/1.19/1.18.x	11.8	8.x	(the CUDA 11.8 builds of 1.19/1.20 aren't provided on PyPI)

The biggest pitfall is this (quoted verbatim from the official documentation):

ONNX Runtime built with cuDNN 8.x is not compatible with cuDNN 9.x, and vice versa.

A newer ORT (1.19+) requires cuDNN 9, so if the environment has only cuDNN 8, the CUDA EP can't load and execution falls back to CPU.
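
The table can be encoded as a small lookup so that a startup script can state which cuDNN major version it expects. This sketch covers only the rows listed above (ORT 1.18 to 1.20) and only plain numeric versions such as "1.19.2"; the function names are illustrative, and the official table remains the source of truth.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Turn '1.19.2' into (1, 19, 2); a missing patch number counts as 0."""
    parts = [int(p) for p in version.split(".")[:3]]
    while len(parts) < 3:
        parts.append(0)
    return parts[0], parts[1], parts[2]


def required_cudnn_major(ort_version: str, cuda_major: int = 12) -> int:
    """cuDNN major version required, according to the rows of the table above."""
    version = parse_version(ort_version)
    if not (1, 18, 0) <= version < (1, 21, 0):
        raise ValueError(f"ORT {ort_version} is outside the table above")
    if cuda_major == 11:
        return 8  # the CUDA 11.8 builds use cuDNN 8.x
    if cuda_major == 12:
        # 1.18.0 was built against cuDNN 8.x; 1.18.1 and later need cuDNN 9.x
        return 8 if version == (1, 18, 0) else 9
    raise ValueError(f"CUDA {cuda_major} is not in the table above")


print(required_cudnn_major("1.19.2"))                 # CUDA 12 build
print(required_cudnn_major("1.19.2", cuda_major=11))  # CUDA 11.8 build
```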

Per-environment remedies

Colab (when CUDA 12 is the default but ORT requires the CUDA 11 libraries; quoted verbatim from the README):

apt update && apt install -y nvidia-cuda-toolkit   # supplies the missing CUDA libraries

ONNX Runtime for a CUDA 12 environment (the nightly the README guides to):

python -m pip install ort-nightly-gpu \
  --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ort-cuda-12-nightly/pypi/simple/

The basic policy: use the table to match the onnxruntime-gpu version to the environment's CUDA/cuDNN. If they can't be matched, either install the corresponding CUDA/cuDNN or change the ORT version. audio-separator supports CUDA 11.8 and 12.2.
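
To see which cuDNN major version the system loader can find, a rough check on Linux is shown below. It is a sketch under assumptions: `ctypes.util.find_library` only sees libraries on the loader path, so a cuDNN that lives elsewhere (for example inside a Python package directory) may not be reported, and the helper name is hypothetical.

```python
import ctypes.util
import re
from typing import Optional


def cudnn_major_from_soname(soname: Optional[str]) -> Optional[int]:
    """Extract 9 from 'libcudnn.so.9'; return None if there is no version suffix."""
    if not soname:
        return None
    match = re.search(r"libcudnn\.so\.(\d+)", soname)
    return int(match.group(1)) if match else None


# Returns a name such as 'libcudnn.so.9', or None if nothing is on the loader path.
found = ctypes.util.find_library("cudnn")
print("cuDNN library:", found, "-> major version:", cudnn_major_from_soname(found))
```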

ffmpeg not found (audio I/O failure)

Symptoms: "ffmpeg not found," or a crash on audio read/write. audio-separator requires ffmpeg for audio I/O.

# Debian / Ubuntu
apt-get update && apt-get install -y ffmpeg
# macOS
brew install ffmpeg

You're set once audio-separator --env_info reports FFmpeg installed. In a container, always bundle ffmpeg into the image (running apt at runtime is slow and unreliable).
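
The same check can be done from Python at worker startup with the standard library. This is a minimal sketch; `find_ffmpeg` and `require_ffmpeg` are illustrative names, and the check only confirms that a binary is on PATH, not that it supports every codec you need.

```python
import shutil
from typing import Optional


def find_ffmpeg(binary: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the binary if it is on PATH, otherwise None."""
    return shutil.which(binary)


def require_ffmpeg(binary: str = "ffmpeg") -> str:
    """Fail fast with a clear message instead of crashing later during audio I/O."""
    path = find_ffmpeg(binary)
    if path is None:
        raise RuntimeError(f"{binary} not found on PATH; install it before starting")
    return path


print("ffmpeg:", find_ffmpeg() or "NOT FOUND")
```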

The model is downloaded every time (slow)

"A download of a few hundred MB runs on every startup." This is common with containers and serverless.

The cause

The model is downloaded automatically on first use, and the default save location is /tmp/audio-separator-models/ (quoted from the README). Because /tmp is volatile, the model is downloaded again on every cold start.

The fix

Persist the save location: point model_file_dir at a persistent volume.
Bake it into the image: download once at build time and include the model in an image layer (fastest and most reliable in production).

from audio_separator.separator import Separator

sep = Separator(model_file_dir="/models")   # persistent volume, or a path baked into the image
sep.load_model(model_filename="UVR-MDX-NET-Inst_HQ_3.onnx")

# Dockerfile (excerpt): bake the model in at build time so nothing is re-downloaded at startup
ENV MODEL_DIR=/models
RUN python -c "from audio_separator.separator import Separator; \
  s=Separator(model_file_dir='${MODEL_DIR}'); \
  s.load_model(model_filename='UVR-MDX-NET-Inst_HQ_3.onnx')"

Keeping the model resident and baking it into the image cuts both production cold-start time and egress charges at once. The scaling design is detailed in the GPU-worker-foundation article.
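
To catch a missing or volatile model directory before the first request, a startup check along these lines can help. It is a sketch with hypothetical helper names; it only inspects paths and does not call audio-separator.

```python
from pathlib import Path, PurePosixPath
from typing import Iterable, List


def is_volatile(model_dir: str) -> bool:
    """True if model_dir is /tmp or lives under it (lost on every cold start)."""
    return PurePosixPath(model_dir).parts[:2] == ("/", "tmp")


def missing_models(model_dir: str, filenames: Iterable[str]) -> List[str]:
    """Return the model files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in filenames if not (root / name).is_file()]


model_dir = "/models"  # the same path passed as model_file_dir above
if is_volatile(model_dir):
    print(f"[WARN] {model_dir} is under /tmp and will not persist")
absent = missing_models(model_dir, ["UVR-MDX-NET-Inst_HQ_3.onnx"])
if absent:
    print(f"[WARN] not baked in, will be downloaded at runtime: {absent}")
```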

Doesn't work or is slow on Apple Silicon (Mac)

Apple Silicon (M1 and later) doesn't support NVIDIA CUDA. UVR5 itself supports MPS (GPU) acceleration, but with the audio-separator library, CPU execution (the [cpu] extra) is the baseline.

pip install "audio-separator[cpu]"   # use the CPU build on Apple Silicon

On a Mac, "the GPU isn't used" is expected behavior, not a fault (there is no CUDA). It runs on the CPU but slowly, so the realistic answer for high-volume processing is a Linux environment with an NVIDIA GPU (cloud included).
Choosing a light model (MDX-Net) and tuning segment_size to the machine gets it close to practical even on CPU.
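
An install script can pick the extra automatically. The sketch below is illustrative: it assumes a native arm64 Python on the Mac (a Python running under Rosetta reports x86_64), and `has_nvidia_gpu` is a flag you supply yourself, for example from whether nvidia-smi succeeds.

```python
import platform


def is_apple_silicon(system: str, machine: str) -> bool:
    """True for macOS on an arm64 machine."""
    return system == "Darwin" and machine == "arm64"


def recommended_extra(system: str, machine: str, has_nvidia_gpu: bool) -> str:
    """Choose the audio-separator extra: 'gpu' only with an NVIDIA GPU off Apple Silicon."""
    if is_apple_silicon(system, machine) or not has_nvidia_gpu:
        return "cpu"
    return "gpu"


# has_nvidia_gpu=False is a placeholder; set it from your own detection.
extra = recommended_extra(platform.system(), platform.machine(), has_nvidia_gpu=False)
print(f'pip install "audio-separator[{extra}]"')
```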

Prevent "silent degradation" in production: a startup guard

Finally, a preventive measure so you never hit the most common problem (①) again. In production, verify the GPU at startup and fail fast if execution has fallen back to CPU, because unknowingly running a slow production system is the most expensive outcome.

# guard.py: verify at startup that CUDA is active; in production, refuse to start on CPU fallback
import os
import onnxruntime as ort

def assert_gpu(require: bool = True) -> None:
    providers = ort.get_available_providers()
    ok = "CUDAExecutionProvider" in providers and ort.get_device() == "GPU"
    if not ok:
        msg = (f"GPU is not enabled (providers={providers}, "
               f"device={ort.get_device()}). Check onnxruntime-gpu and CUDA/cuDNN.")
        if require:
            raise RuntimeError(msg)     # guarantees you notice in CI and at production startup
        print(f"[WARN] {msg}")

if __name__ == "__main__":
    assert_gpu(require=os.environ.get("REQUIRE_GPU", "true").lower() == "true")

Adding this to Docker's HEALTHCHECK or to service startup structurally prevents the worst accident: discovering too late that everything has been running painfully slowly on CPU.

Frequently asked questions (FAQ)

Q. I have a GPU but --env_info doesn't report CUDAExecutionProvider.
A. First run pip uninstall -y onnxruntime to remove the CPU package, then reinstall onnxruntime-gpu (coexistence is the most likely cause). If that doesn't fix it, check the ORT and CUDA/cuDNN version table and suspect a cuDNN 8/9 mix-up.

Q. The GPU works locally but the container runs on CPU.
A. Check that the CUDA/cuDNN runtime is correctly installed in the container and that onnxruntime (the CPU package) hasn't slipped in. Also check that nvidia-smi works inside the container (--gpus all / GPU runtime).

Q. Can I fix CUDA out of memory with empty_cache?
A. Basically no, because empty_cache() only frees unused cache. Lowering segment_size/batch_size is the right approach. If VRAM is still insufficient, move up to a larger GPU.

Q. Why is ffmpeg needed?
A. It's used for audio read/write (decode/encode). Without it, audio I/O crashes. Install it with apt-get install -y ffmpeg or the equivalent.

Q. I want to use the GPU on a Mac.
A. Apple Silicon doesn't support CUDA. UVR5 itself supports MPS, but the library basically runs on CPU. For high-volume processing, use Linux or a cloud instance with an NVIDIA GPU.

Conclusion: have the pattern of symptom → diagnose → fix

UVR5 / audio-separator trouble is a swamp if you fix by guesswork, but once you gather the facts with diagnostic commands, the cause is almost always pinned down.

First, gather the GPU/ffmpeg facts with audio-separator --env_info and diagnose.py.
The most common problem is the CPU fallback: remove the onnxruntime coexistence and match CUDA/cuDNN to the table.
For OOM, lower the chunk/batch size (empty_cache is not the real solution).
In production, structurally prevent "silent degradation" with a startup GPU guard.

Avoiding these "environment swamps" by design is where outsourcing pays off. If you want to build voice/video AI, including source separation, at production quality, get in touch and see the case study. Working as one person with generative AI, I provide end-to-end support from environment setup to production operation.

Sources / official resources

ONNX Runtime CUDA EP (requirement table, cuDNN compatibility): CUDA Execution Provider
ONNX Runtime Python API: API summary (get_available_providers / get_device) / Install
PyTorch CUDA memory management: CUDA semantics (empty_cache / PYTORCH_CUDA_ALLOC_CONF)
audio-separator: README (--env_info / reinstall procedure / CUDA 11.8·12.2 / model DL location)
The "onnxruntime coexistence" issue: microsoft/onnxruntime issue #7748

Library version requirements change over time. Always confirm primary sources before implementing. The "coexistence of onnxruntime and onnxruntime-gpu" problem is not stated in official documentation; the explanation here is based on widely known behavior described in the issue.

Related articles

How to choose a source-separation tool: selecting Demucs / UVR5(MDX-Net) / Spleeter / Open-Unmix by requirements

Scaling audio source separation in production on AWS: a GPU batch-processing platform (SQS × ECS/Batch × S3)

Complete guide to BS-RoFormer / Mel-Band RoFormer: using 2026's highest-quality source separation in production

Demucs v4 Complete Guide: Running Meta's Source-Separation Model (HT Demucs) in Production, Faithful to the Official Docs

Also worth reading

Complete MuseTalk installation walkthrough — solving the mmcv/mmdet/mmpose dependency hell, CUDA mismatches, new-GPU support, and every common error

MuseTalk Complete Guide: Operating Realtime Lip Sync (Latent-Space Inpainting) in Production, Faithful to Official Sources

MuseTalk Production Deployment in Practice — Docker, GPU Serving, Autoscaling, Cost Optimization, Observability
