友田 陽大

Complete UVR5 / audio-separator troubleshooting guide (GPU not used, CUDA, OOM, installation)

'GPU isn't used and it's painfully slow,' 'CUDA out of memory,' 'cuDNN errors,' 'ffmpeg is missing,' 'the model is downloaded every time': this guide works through the problems people most often hit when running source separation with UVR5 and audio-separator, symptom by symptom, from diagnostic commands to concrete fixes, grounded in the official ONNX Runtime and PyTorch documentation.


How to use this article

UVR5 / MDX-Net and audio-separator work instantly in a correctly configured environment and can block you indefinitely in a broken one. The sticking points are almost always the same: GPU not used, CUDA out of memory, cuDNN errors, ffmpeg missing, and the model being downloaded every time.

This article is a troubleshooting reference you can look up by symptom. For each symptom it shows what to run first to diagnose, the cause, and the concrete fix, based on the official documentation of ONNX Runtime, PyTorch, and audio-separator.

About the author (reliability disclosure): I single-handedly designed, implemented, and operate in production an AI video-localization platform whose first stage is source separation. The fixes in this article are a record of problems I actually ran into and resolved in local, Colab, container, and production GPU environments. The production scaling design is covered in the GPU-worker-foundation article, and the overall picture of the tool in the UVR5 guide.


Symptom quick reference (narrow it down in 30 seconds)

Symptom | Most likely cause | Immediate move
Painfully slow despite having a GPU | onnxruntime and onnxruntime-gpu installed together / CUDA·cuDNN mismatch | ① CPU fallback
CUDA out of memory | segment_size / batch_size too large, long clip | ② OOM
cuDNN / CUDA error | ORT and CUDA/cuDNN version incompatibility | ③ version mismatch
ffmpeg-related error | ffmpeg not installed | ④ ffmpeg
Slow, model downloaded every time | /tmp is volatile, cache not persisted | ⑤ model DL
Slow or won't work on Mac | Apple Silicon doesn't support CUDA (MPS/CPU) | ⑥ Apple Silicon

First, diagnose: confirm the environment mechanically

Before fixing by intuition, gather the facts. These are the first commands to run.

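# audio-separator's official environment diagnostic (run this first). Reports CUDA and ffmpeg availability in one shot
audio-separator --env_info
# Example of the expected output:
#   "ONNXruntime has CUDAExecutionProvider available, enabling acceleration"
#   "FFmpeg installed"

# Check that the GPU driver itself is visible
nvidia-smi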

Next, check both ONNX Runtime and PyTorch from Python. audio-separator runs on two backends, ONNX Runtime (.onnx models such as MDX-Net) and PyTorch (.ckpt models, Demucs, and others), so you need to look at both.

# diagnose.py: mechanically confirm whether the GPU is really in use
import onnxruntime as ort

print("ORT version :", ort.__version__)
print("ORT device  :", ort.get_device())                 # 'GPU' is good; 'CPU' means the GPU is unused
print("ORT providers:", ort.get_available_providers())   # should include 'CUDAExecutionProvider'

try:
    import torch
    print("torch       :", torch.__version__)
    print("torch.cuda  :", torch.cuda.is_available())     # the PyTorch side (Demucs, etc.)
    if torch.cuda.is_available():
        print("device name :", torch.cuda.get_device_name(0))
except ImportError:
    print("torch not installed (not needed if you only use MDX-Net/.onnx)")

  • If 'CUDAExecutionProvider' isn't in ort.get_available_providers(), the GPU isn't being used (CPU execution = painfully slow).
  • If ort.get_device() is 'CPU', the same applies.

If these two indicate "not used," the cause is almost certainly symptom ①.


① [Most common] GPU not used, painfully slow on CPU

This is the most common problem and the hardest to notice. There is no error at all; processing is simply tens of times slower.

Why it silently falls back to CPU

ONNX Runtime selects from a priority list of Execution Providers. As the official documentation explains:

['CUDAExecutionProvider', 'CPUExecutionProvider'] means execute a node using CUDAExecutionProvider if capable, otherwise execute using CPUExecutionProvider.

In other words, if CUDA can't be used, ONNX Runtime silently executes on the CPU. This is what is really behind "it works but it's slow."
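The fallback rule quoted above can be modeled in a few lines. This is a simplified sketch, not ONNX Runtime's implementation: the real runtime assigns providers per graph node, and `pick_provider` is a hypothetical helper that only illustrates why a missing CUDA provider produces a silent CPU run rather than an error.

```python
def pick_provider(requested, available):
    """Return the first requested provider that is actually available.

    Simplified model of the priority-list rule; hypothetical helper, not an ORT API.
    """
    for name in requested:
        if name in available:
            return name
    raise RuntimeError("no requested execution provider is available")


requested = ["CUDAExecutionProvider", "CPUExecutionProvider"]

# Healthy GPU install: CUDA is offered, so it wins.
healthy = pick_provider(requested, ["CUDAExecutionProvider", "CPUExecutionProvider"])

# Broken install: CUDA is missing, so the same request quietly lands on CPU.
broken = pick_provider(requested, ["CPUExecutionProvider"])

print(healthy, broken)
```

With the real library, the providers a session actually ended up with can be read back from an `InferenceSession` via `get_providers()`, which is a more direct check than timing a run.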

Cause A: coexistence of onnxruntime and onnxruntime-gpu (most likely)

If you install both the CPU package onnxruntime and the GPU package onnxruntime-gpu, they unpack into the same directory and whichever was installed last overwrites the other, which can silently remove CUDAExecutionProvider.

🔎 ONNX Runtime's official documentation does not warn about this explicitly; it is a widely known gotcha in the community, reported in issues on microsoft/onnxruntime. The resulting behavior (CPU fallback) follows from the providers spec.

Fix: remove the CPU version, then reinstall the GPU version.

pip uninstall -y onnxruntime          # always remove the CPU build first
pip install --force-reinstall onnxruntime-gpu
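Before uninstalling anything, it can help to confirm that both packages really are installed side by side. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper names `conflicting_ort_packages` and `installed_distribution_names` are made up for illustration and are not part of audio-separator or ONNX Runtime.

```python
from importlib import metadata


def conflicting_ort_packages(installed_names):
    """Return both ONNX Runtime package names if they coexist, else an empty list."""
    normalized = {name.lower().replace("_", "-") for name in installed_names}
    found = sorted(normalized & {"onnxruntime", "onnxruntime-gpu"})
    return found if len(found) > 1 else []


def installed_distribution_names():
    """List the names of every distribution visible to this interpreter."""
    names = (dist.metadata["Name"] for dist in metadata.distributions())
    return [name for name in names if name]


clash = conflicting_ort_packages(installed_distribution_names())
if clash:
    print("Both installed, remove the CPU build:", clash)
else:
    print("No onnxruntime / onnxruntime-gpu coexistence detected")
```

Run it with the same interpreter (and virtual environment) that runs audio-separator, since a different environment can have a different set of packages.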

Cause B: CUDA / cuDNN version mismatch

If the CUDA/cuDNN versions that your onnxruntime-gpu build requires don't match those in the environment, the CUDA EP fails to load and execution falls back to CPU. See ③ version mismatch.

audio-separator's official reinstall procedure

audio-separator's README shows a clean reinstall procedure for when the GPU doesn't work (verbatim).

pip uninstall torch onnxruntime
pip cache purge
pip install --force-reinstall torch torchvision torchaudio
pip install --force-reinstall onnxruntime-gpu

If audio-separator --env_info outputs CUDAExecutionProvider available after this, it's resolved.


② CUDA out of memory (crashes)

This is the case where RuntimeError: CUDA out of memory appears with a long clip or a high-resolution setting.

The priority of fixes

  1. Lower segment_size (most effective). The chunk put on the GPU becomes smaller, saving VRAM.
  2. Set batch_size to 1. Throughput drops but it reliably reduces VRAM.
  3. Switch to a smaller/lighter model (RoFormer → MDX-Net, etc. See the model-selection guide).
  4. Move up to a GPU with larger VRAM (g4dn 16GB → g5/g6 24GB).

# OOM mitigation: use smaller chunks to keep VRAM usage down
from audio_separator.separator import Separator

sep = Separator(mdx_params={
    "segment_size": 128,   # lowered from the default 256 (the most effective OOM fix)
    "overlap": 0.25,
    "batch_size": 1,       # stop batching to save VRAM
    "hop_length": 1024,
    "enable_denoise": False,
})
sep.load_model(model_filename="UVR-MDX-NET-Inst_HQ_3.onnx")
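If you don't know in advance which segment_size fits, one option is to step it down automatically. The sketch below halves the size after each out-of-memory failure. It rests on assumptions: `separate` is a hypothetical callable you supply (it would build a Separator with the given segment_size and run one separation), the floor of 32 is an arbitrary illustrative value, and the OOM is assumed to surface as a `RuntimeError` whose message contains "out of memory", which may differ by backend.

```python
def segment_sizes(start=256, minimum=32):
    """Yield start, start/2, start/4, ... down to minimum (inclusive)."""
    size = start
    while size >= minimum:
        yield size
        size //= 2


def run_with_backoff(separate, start=256, minimum=32):
    """Call separate(segment_size) with shrinking sizes until it stops hitting OOM."""
    last_error = None
    for size in segment_sizes(start, minimum):
        try:
            return size, separate(size)
        except RuntimeError as exc:
            if "out of memory" not in str(exc).lower():
                raise  # not an OOM: do not hide unrelated failures
            last_error = exc
    raise RuntimeError(f"still out of memory at segment_size={minimum}") from last_error


def fake_separate(segment_size):
    # Stand-in for a real run: pretend anything above 64 exhausts VRAM.
    if segment_size > 64:
        raise RuntimeError("CUDA out of memory")
    return "ok"


print(run_with_backoff(fake_separate))  # (64, 'ok')
```

In practice you would log the size that finally worked and pin it in configuration, rather than paying for the failed attempts on every job.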

Beware the misconception about empty_cache()

When OOM appears on the PyTorch side (Demucs, etc.), it's tempting to call torch.cuda.empty_cache(), but don't rely on it too much. Per the official PyTorch documentation:

Calling empty_cache() releases all unused cached memory from PyTorch ... However, the occupied GPU memory by tensors will not be freed so it can not increase the amount of GPU memory available for PyTorch.

That is, only the unused cache is freed; VRAM held by live tensors is not returned. The real fix for OOM is to reduce usage in the first place, that is, to make the chunk and batch smaller. If you suspect fragmentation, tuning the PYTORCH_CUDA_ALLOC_CONF environment variable is also an option.
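For the fragmentation case, PYTORCH_CUDA_ALLOC_CONF takes comma-separated `key:value` options such as `max_split_size_mb` and `expandable_segments`. The sketch below only builds and sets that string; `alloc_conf` is a hypothetical helper and the value 128 is an illustrative assumption, not a recommendation from the PyTorch documentation.

```python
import os


def alloc_conf(max_split_size_mb=None, expandable_segments=False):
    """Build a PYTORCH_CUDA_ALLOC_CONF value from the chosen options."""
    parts = []
    if max_split_size_mb is not None:
        parts.append(f"max_split_size_mb:{max_split_size_mb}")
    if expandable_segments:
        parts.append("expandable_segments:True")
    return ",".join(parts)


# The allocator reads this when it initializes, so set it before importing torch.
# setdefault keeps any value already provided by the deployment environment.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", alloc_conf(max_split_size_mb=128))
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

This only changes how the caching allocator carves up memory; it does not reduce what the model itself needs, so it complements smaller chunks rather than replacing them.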


③ CUDA/cuDNN version errors

Typical symptoms: LoadLibrary failed, libcudnn.so.X not found, or the CUDA EP failing to load.

ONNX Runtime's version requirements (official table)

Each ONNX Runtime release pins the CUDA/cuDNN versions its CUDA EP requires (official table). Key points:

ONNX Runtime | CUDA | cuDNN | Note
1.20.x / 1.19.x | 12.x | 9.x | PyPI default (from 1.19 onward, the PyPI default is CUDA 12.x)
1.18.1 | 12.x | 9.x | cuDNN 9 required
1.18.0 | 12.x | 8.x |
1.20 / 1.19 / 1.18.x | 11.8 | 8.x | The CUDA 11.8 builds of 1.19/1.20 aren't provided on PyPI

The biggest pitfall is this (quoted verbatim from the official documentation).

ONNX Runtime built with cuDNN 8.x is not compatible with cuDNN 9.x, and vice versa.

A newer ORT (1.19+) requires cuDNN 9, so if the environment has only cuDNN 8, the CUDA EP fails to load and execution falls back to CPU.
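The version rows listed in this section can be turned into a small lookup so a startup script can report which cuDNN major version to expect. This is a sketch that encodes only the rows shown here (ORT 1.18.x to 1.20.x); `required_cudnn_major` is a hypothetical helper, and for any other version the official table remains the source of truth.

```python
def required_cudnn_major(ort_version, cuda_major):
    """Return the cuDNN major version for an ORT build, per the rows listed in this section.

    Covers only ORT 1.18.x to 1.20.x; returns None for anything else.
    """
    parts = ort_version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    patch = int(parts[2]) if len(parts) > 2 and parts[2].isdigit() else 0
    if major != 1 or minor not in (18, 19, 20):
        return None
    if cuda_major == 11:
        return 8          # CUDA 11.8 builds use cuDNN 8.x
    if cuda_major == 12:
        if minor == 18 and patch == 0:
            return 8      # 1.18.0 is the CUDA 12 build still on cuDNN 8.x
        return 9          # 1.18.1 and 1.19.x / 1.20.x need cuDNN 9.x
    return None


for version in ("1.18.0", "1.18.1", "1.19.2", "1.20.1"):
    print(version, "CUDA 12 -> cuDNN", required_cudnn_major(version, 12))
```

Feeding it `onnxruntime.__version__` gives the cuDNN major version to look for when a `libcudnn.so.X not found` error appears.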

Per-environment remedies

  • Colab (when CUDA 12 is the default but ORT requires the CUDA 11 libraries; command quoted from the README):
apt update && apt install -y nvidia-cuda-toolkit   # supplies the missing CUDA libraries
  • ONNX Runtime for a CUDA 12 environment (the nightly build the README points to):
python -m pip install ort-nightly-gpu \
  --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ort-cuda-12-nightly/pypi/simple/
  • The basic policy: use the table to match the onnxruntime-gpu version to the environment's CUDA/cuDNN. If they can't be matched, either install the corresponding CUDA/cuDNN or change the ORT version. audio-separator supports CUDA 11.8 and 12.2.

④ ffmpeg not found (audio I/O failure)

Symptoms: an "ffmpeg not found" message, or a crash when reading or writing audio. audio-separator requires ffmpeg for audio I/O.

# Debian / Ubuntu
apt-get update && apt-get install -y ffmpeg
# macOS
brew install ffmpeg

You're set once audio-separator --env_info outputs FFmpeg installed. In a container, always bundle ffmpeg into the image (installing it with apt at runtime is slow and unreliable).
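The same check can be done from Python, which is handy in a container entrypoint. This is a minimal sketch using only the standard library; `ffmpeg_path` and `ffmpeg_version_line` are hypothetical helper names, and it checks the PATH of the current process, which may differ from your interactive shell.

```python
import shutil
import subprocess


def ffmpeg_path(which=shutil.which):
    """Return the resolved ffmpeg executable path, or None if it is not on PATH."""
    return which("ffmpeg")


def ffmpeg_version_line(path):
    """Return the first line of `ffmpeg -version`, or an empty string if there is none."""
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


path = ffmpeg_path()
print(ffmpeg_version_line(path) if path else "ffmpeg not found on PATH")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use, call `ffmpeg_path()` with no arguments.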


⑤ The model is downloaded every time (slow)

"A download of a few hundred MB runs on every startup" is a frequent complaint with containers and serverless.

The cause

The model is auto-downloaded on first use, and the default save location is /tmp/audio-separator-models/ (README verbatim). Because /tmp is volatile, it's re-downloaded on every cold start.

The fix

  • Persist the save location: point model_file_dir at a persistent volume.
  • Bake it into the image: download once at build time and include it in a layer (fastest and most reliable in production).

from audio_separator.separator import Separator

sep = Separator(model_file_dir="/models")   # persistent volume, or a path bundled into the image
sep.load_model(model_filename="UVR-MDX-NET-Inst_HQ_3.onnx")

# Dockerfile (excerpt): bake the model in at build time so startup never re-downloads it
ENV MODEL_DIR=/models
RUN python -c "from audio_separator.separator import Separator; \
  s=Separator(model_file_dir='${MODEL_DIR}'); \
  s.load_model(model_filename='UVR-MDX-NET-Inst_HQ_3.onnx')"

This combination of a persistent model location and image baking cuts both production cold-start time and egress charges at once. The scaling design is detailed in the GPU-worker-foundation article.
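To confirm that baking or persistence actually worked, a startup check can verify that the expected files are already on disk before any separation runs. This is a sketch with a hypothetical helper, `missing_models`; the file names in the demo are placeholders, and it only checks that each file exists and is non-empty, not that its contents are valid.

```python
import tempfile
from pathlib import Path


def missing_models(model_dir, filenames):
    """Return the model files that are absent or empty in model_dir."""
    root = Path(model_dir)
    missing = []
    for name in filenames:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


# Demo against a throwaway directory with placeholder file names.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "present.onnx").write_bytes(b"\0")
    result = missing_models(tmp, ["present.onnx", "absent.onnx"])

print(result)  # ['absent.onnx']
```

In a real service you would call it with your model directory and the model file names you load, and fail the startup if the list is not empty.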


⑥ Doesn't work or is slow on Apple Silicon (Mac)

Apple Silicon (M1 and later) doesn't support NVIDIA CUDA. UVR5 itself supports MPS (GPU) acceleration, but with the audio-separator library, CPU execution (the [cpu] extra) is the baseline.

pip install "audio-separator[cpu]"   # use the CPU build on Apple Silicon

  • Seeing "the GPU isn't used" on a Mac is expected (there is no CUDA). It runs on CPU but slowly, so for high-volume processing the realistic answer is a Linux environment with an NVIDIA GPU (cloud included).
  • Choosing a light model (MDX-Net) and tuning segment_size to the machine gets it close to practical even on CPU.
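If one setup script has to serve both Macs and GPU servers, the choice of extra can be made programmatically. The sketch below is an assumption-laden illustration: `is_apple_silicon` and `recommended_extra` are hypothetical helpers, Apple Silicon is detected as Darwin with an arm64 machine string (a Python running under Rosetta reports x86_64 instead), and whether an NVIDIA GPU is present is passed in by the caller rather than probed.

```python
import platform


def is_apple_silicon(system, machine):
    """True for macOS on an ARM CPU, as reported by the platform module."""
    return system == "Darwin" and machine == "arm64"


def recommended_extra(system, machine, has_nvidia_gpu):
    """Pick the audio-separator extra ("cpu" or "gpu") to install."""
    if is_apple_silicon(system, machine) or not has_nvidia_gpu:
        return "cpu"   # no CUDA available: use the CPU build
    return "gpu"       # NVIDIA GPU with CUDA: use the GPU build


# has_nvidia_gpu=False is a conservative placeholder; set it from your own probe.
extra = recommended_extra(platform.system(), platform.machine(), has_nvidia_gpu=False)
print(f'pip install "audio-separator[{extra}]"')
```

A check such as whether `nvidia-smi` succeeds is one way to supply `has_nvidia_gpu` on Linux.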

Prevent "silent degradation" in production: a startup guard

Finally, a preventive measure so you never hit the most common problem (①) again. In production, verify the GPU at startup and fail fast if it has fallen back to CPU, because running a slow production system without noticing is the most expensive outcome.

# guard.py: verify at startup that CUDA is active. In production, refuse to start on CPU fallback
import os
import onnxruntime as ort

def assert_gpu(require: bool = True) -> None:
    providers = ort.get_available_providers()
    ok = "CUDAExecutionProvider" in providers and ort.get_device() == "GPU"
    if not ok:
        msg = (f"GPU is not enabled (providers={providers}, "
               f"device={ort.get_device()}). Check onnxruntime-gpu and CUDA/cuDNN.")
        if require:
            raise RuntimeError(msg)     # guarantees you notice in CI or at production startup
        print(f"[WARN] {msg}")

if __name__ == "__main__":
    assert_gpu(require=os.environ.get("REQUIRE_GPU", "true").lower() == "true")

Simply wiring this into Docker's HEALTHCHECK or your service startup structurally prevents the worst accident: finding out too late that you've been running painfully slowly on CPU.


Frequently asked questions (FAQ)

Q. I have a GPU but --env_info doesn't output CUDAExecutionProvider.
A. First run pip uninstall -y onnxruntime to remove the CPU build, then reinstall onnxruntime-gpu (coexistence is the most likely cause). If that doesn't fix it, check the ORT and CUDA/cuDNN version table and suspect a cuDNN 8/9 mix-up.

Q. The GPU works locally but it's CPU in a container.
A. Check whether the CUDA/cuDNN runtime is correctly installed in the container and whether onnxruntime (the CPU build) has slipped in. Also check whether nvidia-smi works inside the container (--gpus all / GPU runtime).

Q. Can I fix CUDA out of memory with empty_cache?
A. Basically no, because empty_cache() only frees unused cache. Lowering segment_size/batch_size is the right path. If VRAM is still insufficient, move up to a larger GPU.

Q. Why is ffmpeg needed?
A. It's used for audio read/write (decode/encode). Without it, audio I/O crashes. Install it with apt-get install -y ffmpeg or the equivalent for your platform.

Q. I want to use the GPU on a Mac.
A. Apple Silicon doesn't support CUDA. UVR5 itself supports MPS, but the library basically runs on CPU. For high-volume processing, use Linux or a cloud instance with an NVIDIA GPU.


Conclusion: make symptom → diagnose → fix a habit

UVR5 / audio-separator trouble is a swamp if you fix by guesswork, but once you gather the facts with diagnostic commands, the cause is almost always pinned down.

  1. First, gather the GPU/ffmpeg facts with audio-separator --env_info and diagnose.py.
  2. The most common problem is the CPU fallback: remove the onnxruntime coexistence and match CUDA/cuDNN to the table.
  3. For OOM, lower the chunk/batch size (empty_cache isn't the real fix).
  4. In production, structurally prevent "silent degradation" with a startup GPU guard.

Avoiding these environment swamps by design is where outsourcing makes a difference. If you want to build voice/video AI, including source separation, at production quality, get in touch and take a look at the case study. Working solo with generative AI, I provide end-to-end support from environment setup to production operation.


Sources / official resources

  • Library version requirements change over time. Always confirm primary sources before implementing. The "coexistence of onnxruntime and onnxruntime-gpu" problem is not described in official documentation; the explanation here is based on widely known behavior reported in issues.
友田 陽大

Developer of a METI Minister's Award–winning product. With TypeScript + Python + AWS, I deliver SaaS, industry DX, and production-grade generative AI (RAG) end to end — from requirements to infrastructure and operations — single-handedly.
