How to use this article
UVR5 / MDX-Net and audio-separator run instantly in a working environment but can stall indefinitely in a broken one. The sticking points are nearly always the same: GPU not used, CUDA out of memory, cuDNN errors, ffmpeg missing, and the model downloaded on every run.
This article is a troubleshooting reference you can search by symptom. For each symptom it gives "what to run first to diagnose → cause → concrete fix," based on the official documentation of ONNX Runtime, PyTorch, and audio-separator.
About the author (reliability disclosure): I single-handedly designed, implemented, and now operate in production an AI video-localization foundation whose first stage is source separation. The fixes in this article are a record of problems I actually hit and resolved in local, Colab, container, and production GPU environments. The production scaling design is covered in the GPU-worker-foundation article, and an overview of the tool in the UVR5 guide.
Symptom quick reference (narrow it down in 30 seconds)
| Symptom | Most likely cause | Immediate move |
|---|---|---|
| Painfully slow despite GPU | coexistence of onnxruntime and onnxruntime-gpu / CUDA·cuDNN mismatch | ① CPU fallback |
| CUDA out of memory | segment_size / batch_size too large, long clip | ② OOM |
| cuDNN / CUDA error | ORT and CUDA/cuDNN version incompatibility | ③ version mismatch |
| ffmpeg-related error | ffmpeg not installed | ④ ffmpeg |
| Slow with model DL every time | /tmp volatile, cache not persisted | ⑤ model DL |
| Slow/won't work on Mac | Apple Silicon doesn't support CUDA (MPS/CPU) | ⑥ Apple Silicon |
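The table can also be applied mechanically to a log excerpt. The following is a minimal sketch, not part of audio-separator: the name classify_symptom and the substring patterns are assumptions for illustration, taken from the error strings quoted in this article, and real messages vary by version.

```python
from typing import Optional

# Hypothetical helper: map a log excerpt to the section number (②, ③, ④) of this article.
# The substrings are illustrative; actual error text varies by version.
PATTERNS = [
    ("out of memory", 2),       # ② OOM
    ("libcudnn", 3),            # ③ version mismatch
    ("loadlibrary failed", 3),  # ③ version mismatch
    ("ffmpeg", 4),              # ④ ffmpeg
]


def classify_symptom(log_text: str) -> Optional[int]:
    """Return the section number of the first matching pattern, or None."""
    lowered = log_text.lower()
    for needle, section in PATTERNS:
        if needle in lowered:
            return section
    return None  # no error text at all: suspect the silent CPU fallback (①)


print(classify_symptom("RuntimeError: CUDA out of memory"))
```

A None result is itself a hint: the CPU fallback produces no error, so "no match but slow" points to ①.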
First, diagnose: confirm the environment mechanically
Before fixing by intuition, gather the facts. Run these commands first.
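# audio-separator's official environment check (run this first). Reports CUDA and ffmpeg availability in one shot
audio-separator --env_info
# Expected output (example):
# "ONNXruntime has CUDAExecutionProvider available, enabling acceleration"
# "FFmpeg installed"
# Check that the GPU driver itself is visible
nvidia-smi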
Next, check both ONNX Runtime and PyTorch from Python. audio-separator runs on two backends, ONNX Runtime (.onnx models such as MDX-Net) and PyTorch (.ckpt models such as Demucs), so you need to look at both.
# diagnose.py: mechanically confirm whether the GPU is really in use
import onnxruntime as ort

print("ORT version  :", ort.__version__)
print("ORT device   :", ort.get_device())  # 'GPU' is good; 'CPU' means the GPU is not used
print("ORT providers:", ort.get_available_providers())  # should include 'CUDAExecutionProvider'

try:
    import torch
    print("torch        :", torch.__version__)
    print("torch.cuda   :", torch.cuda.is_available())  # PyTorch side (Demucs, etc.)
    if torch.cuda.is_available():
        print("device name  :", torch.cuda.get_device_name(0))
except ImportError:
    print("torch not installed (not needed if you only use MDX-Net/.onnx)")
- If 'CUDAExecutionProvider' isn't in ort.get_available_providers(), the GPU isn't being used (CPU execution = painfully slow).
- If ort.get_device() is 'CPU', the same applies.
If these two indicate "not used," the cause is almost certainly symptom ①.
[Most common] GPU not used, painfully slow on CPU
This is the most common problem and the hardest to notice: no error at all, just tens of times slower.
Why it silently falls back to CPU
ONNX Runtime runs on a priority list of Execution Providers. As the official documentation explains:
['CUDAExecutionProvider', 'CPUExecutionProvider'] means execute a node using CUDAExecutionProvider if capable, otherwise execute using CPUExecutionProvider.
That is, if CUDA can't be used, it silently executes on CPU. This is what is really happening when "it works but is slow."
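As a mental model, the fallback behaves like picking the first requested provider that the runtime can actually use. The sketch below is a simplification in plain Python: ONNX Runtime makes this decision per node, and pick_provider is a hypothetical name, not an ORT API.

```python
def pick_provider(requested, usable):
    """Return the first requested provider that is usable, mimicking the priority list."""
    for name in requested:
        if name in usable:
            return name
    raise RuntimeError("no usable execution provider")


requested = ["CUDAExecutionProvider", "CPUExecutionProvider"]

# CUDA loads correctly: the first choice wins.
print(pick_provider(requested, {"CUDAExecutionProvider", "CPUExecutionProvider"}))
# CUDA cannot load: no exception is raised, the CPU provider is used instead.
print(pick_provider(requested, {"CPUExecutionProvider"}))
```

The second call is the problematic case: it succeeds, which is exactly why nothing in the logs looks like a failure.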
Cause A: coexistence of onnxruntime and onnxruntime-gpu (most likely)
If you install both the CPU package onnxruntime and the GPU package onnxruntime-gpu, they unpack into the same directory and whichever was installed last overwrites the other, sometimes silently removing CUDAExecutionProvider.
🔎 ONNX Runtime's official documentation carries no explicit warning about this; it is a widely known community gotcha, reported in issues on microsoft/onnxruntime. The resulting behavior (CPU fallback) follows from the providers spec.
Fix: remove the CPU version, then reinstall the GPU version.
pip uninstall -y onnxruntime # always remove the CPU build first
pip install --force-reinstall onnxruntime-gpu
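Before reinstalling, you can confirm whether both packages are actually present. This is a minimal sketch using only the standard library; installed_ort_packages and has_conflict are hypothetical helper names, and the check only looks at installed package metadata, not at which files ended up on disk.

```python
from importlib import metadata


def installed_ort_packages(names=None):
    """Return the ONNX Runtime distributions present in this environment."""
    if names is None:
        names = [dist.metadata.get("Name") for dist in metadata.distributions()]
    normalized = {str(n).lower().replace("_", "-") for n in names if n}
    return sorted(n for n in normalized if n.startswith("onnxruntime"))


def has_conflict(packages):
    """True when the CPU and GPU wheels are installed side by side."""
    return "onnxruntime" in packages and "onnxruntime-gpu" in packages


found = installed_ort_packages()
print("installed:", found)
if has_conflict(found):
    print("conflict: uninstall onnxruntime, then reinstall onnxruntime-gpu")
```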
Cause B: CUDA / cuDNN version mismatch
If the CUDA/cuDNN versions that the onnxruntime-gpu build requires don't match the ones in the environment, the CUDA EP can't load and it falls back to CPU. → See ③ version mismatch.
audio-separator's official reinstall procedure
audio-separator's README shows a clean reinstall procedure for when the GPU doesn't work (verbatim).
pip uninstall torch onnxruntime
pip cache purge
pip install --force-reinstall torch torchvision torchaudio
pip install --force-reinstall onnxruntime-gpu
If audio-separator --env_info outputs CUDAExecutionProvider available after this, it's resolved.
CUDA out of memory (crashes)
This is the case where RuntimeError: CUDA out of memory appears with a long clip or a high-resolution setting.
Fixes in priority order
- Lower segment_size (most effective). The chunk put on the GPU becomes smaller, saving VRAM.
- Set batch_size to 1. Throughput drops but it reliably reduces VRAM.
- Switch to a smaller/lighter model (RoFormer → MDX-Net, etc. See the model-selection guide).
- Move up to a GPU with larger VRAM (g4dn 16GB → g5/g6 24GB).
# OOM mitigation: use smaller chunks to reduce VRAM usage
from audio_separator.separator import Separator

sep = Separator(mdx_params={
    "segment_size": 128,      # lowered from the default 256 (the most effective OOM fix)
    "overlap": 0.25,
    "batch_size": 1,          # no batching, to save VRAM
    "hop_length": 1024,
    "enable_denoise": False,
})
sep.load_model(model_filename="UVR-MDX-NET-Inst_HQ_3.onnx")
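Lowering segment_size can also be automated as a retry ladder. This is a minimal sketch, assuming the separation call raises a RuntimeError whose message contains "out of memory" (ONNX Runtime may raise a different exception type, so adjust the except clause for your stack). run_with_backoff and fake_separate are hypothetical names, and fake_separate only simulates limited VRAM.

```python
def run_with_backoff(separate, segment_size=256, min_size=32):
    """Call separate(segment_size), halving the size after each OOM error."""
    while True:
        try:
            return separate(segment_size), segment_size
        except RuntimeError as exc:
            is_oom = "out of memory" in str(exc).lower()
            if not is_oom or segment_size <= min_size:
                raise  # not an OOM, or nothing left to reduce
            segment_size //= 2
            print(f"OOM, retrying with segment_size={segment_size}")


# Stand-in for a real separation call: fails above 64 to simulate limited VRAM.
def fake_separate(segment_size):
    if segment_size > 64:
        raise RuntimeError("CUDA out of memory")
    return f"ok@{segment_size}"


print(run_with_backoff(fake_separate))
```

In real use, separate would be a function that builds a Separator with the given segment_size in mdx_params, loads the model, and runs the separation.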
Beware the misconception about empty_cache()
When OOM appears on the PyTorch side (Demucs, etc.), it is tempting to call torch.cuda.empty_cache(), but don't rely on it. Per the official PyTorch documentation:
Calling empty_cache() releases all unused cached memory from PyTorch ... However, the occupied GPU memory by tensors will not be freed so it can not increase the amount of GPU memory available for PyTorch.
That is, what's freed is only "unused cache," and the VRAM held by tensors doesn't return. The essential solution to OOM is to "reduce the usage in the first place" = make the chunk/batch smaller. If fragmentation is suspected, tuning the environment variable PYTORCH_CUDA_ALLOC_CONF is also an option.
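The quoted behavior can be observed directly. The sketch below assumes PyTorch with a CUDA device and returns None otherwise; show_empty_cache_effect is a hypothetical name. memory_allocated() counts bytes held by live tensors, while memory_reserved() counts those plus PyTorch's cache.

```python
def show_empty_cache_effect():
    """Return ((allocated, reserved) before, (allocated, reserved) after empty_cache)."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    size = 64 * 1024 * 1024
    live = torch.empty(size, dtype=torch.uint8, device="cuda")   # stays referenced
    freed = torch.empty(size, dtype=torch.uint8, device="cuda")
    del freed  # the tensor is gone, but its block stays in PyTorch's cache
    before = (torch.cuda.memory_allocated(), torch.cuda.memory_reserved())
    torch.cuda.empty_cache()
    after = (torch.cuda.memory_allocated(), torch.cuda.memory_reserved())
    return before, after


result = show_empty_cache_effect()
print(result if result else "CUDA-enabled PyTorch not available")
```

On a GPU machine the reserved figure may drop after the call, while the allocated figure stays the same because the live tensor is still in use.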
CUDA/cuDNN version errors
Typical symptoms: LoadLibrary failed, libcudnn.so.X not found, the CUDA EP can't load, etc.
ONNX Runtime's version requirements (official table)
ONNX Runtime's CUDA EP requires specific CUDA/cuDNN versions for each ORT release (official table). Key points:
| ONNX Runtime | CUDA | cuDNN | Note |
|---|---|---|---|
| 1.20.x / 1.19.x | 12.x | 9.x | PyPI default (the PyPI default from 1.19 is CUDA 12.x) |
| 1.18.1 | 12.x | 9.x | cuDNN 9 required |
| 1.18.0 | 12.x | 8.x | — |
| 1.20/1.19/1.18.x | 11.8 | 8.x | (the CUDA 11.8 builds of 1.19/1.20 aren't provided on PyPI) |
The biggest pitfall is this (quoted verbatim from the official documentation).
ONNX Runtime built with cuDNN 8.x is not compatible with cuDNN 9.x, and vice versa.
cuDNN 8.x and 9.x are not compatible. A newer ORT (1.19+) requires cuDNN 9, so if the environment has only cuDNN 8, the CUDA EP can't load and it falls back to CPU.
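The table above can be encoded as a small lookup for use in setup scripts. This sketch covers only the rows shown in this article (ORT 1.18 to 1.20); required_cudnn_major is a hypothetical helper, and for any other release the official table remains the source of truth.

```python
def required_cudnn_major(ort_version: str, cuda_major: int) -> int:
    """Return the cuDNN major version from the table above (ORT 1.18 to 1.20 only)."""
    major, minor, *rest = (int(part) for part in ort_version.split("."))
    patch = rest[0] if rest else 0
    if major != 1 or minor not in (18, 19, 20):
        raise ValueError("version not covered by the table; check the official docs")
    if cuda_major == 11:
        return 8          # CUDA 11.8 builds use cuDNN 8.x
    if cuda_major == 12:
        if minor == 18 and patch == 0:
            return 8      # 1.18.0 is the exception on CUDA 12
        return 9          # 1.18.1 and 1.19/1.20 need cuDNN 9.x
    raise ValueError("unsupported CUDA major version")


for version in ("1.18.0", "1.18.1", "1.20.1"):
    print(version, "CUDA 12 -> cuDNN", required_cudnn_major(version, 12))
```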
Per-environment remedies
- Colab (when CUDA 12 is the default but ORT requires the CUDA 11 libraries, README verbatim):
apt update && apt install -y nvidia-cuda-toolkit # install the missing CUDA libraries
- ONNX Runtime for a CUDA 12 environment (the nightly the README guides to):
python -m pip install ort-nightly-gpu \
--index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ort-cuda-12-nightly/pypi/simple/
- The basic policy: use the table to match the version of onnxruntime-gpu to the environment's CUDA/cuDNN. If they can't be matched, install the corresponding CUDA/cuDNN or change the ORT version. audio-separator supports CUDA 11.8 and 12.2.
ffmpeg not found (audio I/O failure)
Symptoms: ffmpeg is not found, and the tool crashes on audio read/write. audio-separator requires ffmpeg for audio I/O.
# Debian / Ubuntu
apt-get update && apt-get install -y ffmpeg
# macOS
brew install ffmpeg
You're done if audio-separator --env_info outputs FFmpeg installed. In a container, always bundle ffmpeg into the image (installing it with apt at runtime is slow and unreliable).
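The same check can be done from Python, for example in a startup script. This is a minimal sketch using only the standard library; ffmpeg_version is a hypothetical helper name and it only verifies that an ffmpeg binary is on PATH and responds.

```python
import shutil
import subprocess


def ffmpeg_version(which=shutil.which):
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on PATH."""
    path = which("ffmpeg")
    if path is None:
        return None
    out = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    return out.stdout.splitlines()[0] if out.stdout else path


print(ffmpeg_version() or "ffmpeg not found on PATH")
```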
The model is downloaded every time (slow)
"A download of a few hundred MB runs on every startup." This is common with containers and serverless.
The cause
The model is auto-downloaded on first use, and the default save location is /tmp/audio-separator-models/ (README verbatim). Because /tmp is volatile, it's re-downloaded on every cold start.
The fix
- Persist the save location: point model_file_dir at a persistent volume.
- Bake it into the image: download once at build time and include it in the layer (fastest and most reliable in production).
from audio_separator.separator import Separator

sep = Separator(model_file_dir="/models")  # persistent volume or a path baked into the image
sep.load_model(model_filename="UVR-MDX-NET-Inst_HQ_3.onnx")
# Dockerfile (excerpt): bake the model in at build time so startup never re-downloads it
ENV MODEL_DIR=/models
RUN python -c "from audio_separator.separator import Separator; \
    s=Separator(model_file_dir='${MODEL_DIR}'); \
    s.load_model(model_filename='UVR-MDX-NET-Inst_HQ_3.onnx')"
This "resident model + image baking" cuts both production cold starts and egress billing at once. The scale design is detailed in the GPU-worker-foundation article.
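Whether the cache is actually hit can be checked at startup before the model loads. A minimal sketch, assuming the model is stored under its own filename directly in the model directory; model_is_cached is a hypothetical helper, and the MODEL_DIR variable mirrors the Dockerfile excerpt above.

```python
import os
from pathlib import Path


def model_is_cached(model_dir, filename):
    """True if the model file already exists in model_dir and is non-empty."""
    path = Path(model_dir) / filename
    return path.is_file() and path.stat().st_size > 0


model_dir = os.environ.get("MODEL_DIR", "/tmp/audio-separator-models/")
name = "UVR-MDX-NET-Inst_HQ_3.onnx"
if model_is_cached(model_dir, name):
    print("cache hit: no download needed")
else:
    print(f"cache miss: {name} will be downloaded into {model_dir}")
```

Logging a cache miss on every cold start is a quick way to confirm that the volume or image layer is not being used.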
Doesn't work or is slow on Apple Silicon (Mac)
Apple Silicon (M1+) doesn't support NVIDIA CUDA. UVR5 itself supports MPS (GPU) acceleration, but with the audio-separator library, CPU execution (the [cpu] extra) is the baseline.
pip install "audio-separator[cpu]" # use the CPU build on Apple Silicon
- On a Mac, "the GPU isn't used" is normal (there is no CUDA). It runs on CPU but slowly, so for high-volume processing the realistic answer is a Linux environment with an NVIDIA GPU (including cloud).
- Choosing a light model (MDX-Net) and tuning segment_size to the environment makes it close to practical even on CPU.
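A setup script can pick the extra automatically. This is a minimal sketch: is_apple_silicon and recommended_extra are hypothetical helpers, and returning "gpu" for every other platform is an assumption that the host has an NVIDIA GPU with CUDA installed.

```python
import platform


def is_apple_silicon(system=None, machine=None):
    """True on macOS running natively on an arm64 (M-series) CPU."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return system == "Darwin" and machine == "arm64"


def recommended_extra(system=None, machine=None):
    """Return the audio-separator extra to install on this machine."""
    if is_apple_silicon(system, machine):
        return "cpu"  # no CUDA on Apple Silicon
    return "gpu"      # assumption: an NVIDIA GPU with CUDA is present


print(f'pip install "audio-separator[{recommended_extra()}]"')
```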
Prevent "silent degradation" in production: a startup guard
Finally, a preventive measure so you never hit the most common problem, ①, again. In production, verify the GPU at startup and fail fast if it has fallen back to CPU, because running a slow production system unnoticed is the most expensive outcome.
# guard.py: verify at startup that CUDA is active. In production, abort startup on CPU fallback
import os
import onnxruntime as ort


def assert_gpu(require: bool = True) -> None:
    providers = ort.get_available_providers()
    ok = "CUDAExecutionProvider" in providers and ort.get_device() == "GPU"
    if not ok:
        msg = (f"GPU is not enabled (providers={providers}, "
               f"device={ort.get_device()}). Check onnxruntime-gpu and CUDA/cuDNN.")
        if require:
            raise RuntimeError(msg)  # guarantees that CI or production startup notices
        print(f"[WARN] {msg}")


if __name__ == "__main__":
    assert_gpu(require=os.environ.get("REQUIRE_GPU", "true").lower() == "true")
Simply adding this to a Docker HEALTHCHECK or to service startup structurally prevents the worst accident: "painfully slow on CPU before you knew it."
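One caveat: get_available_providers() reports the providers built into the installed wheel, so it can still list CUDAExecutionProvider when the CUDA/cuDNN libraries fail to load at session creation. A stricter check, sketched here under the assumption that a small .onnx file is available at startup, inspects the providers of an actual session. MODEL_PATH, cuda_active, and check_session are hypothetical names.

```python
import os


def cuda_active(session_providers):
    """True if the session actually registered the CUDA provider."""
    return "CUDAExecutionProvider" in session_providers


def check_session(model_path):
    """Create a real session and fail if it did not end up on CUDA."""
    import onnxruntime as ort  # imported lazily so cuda_active stays usable without ORT

    session = ort.InferenceSession(
        model_path,
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    active = session.get_providers()
    if not cuda_active(active):
        raise RuntimeError(f"CUDA provider not active; session uses {active}")
    return active


if __name__ == "__main__":
    model_path = os.environ.get("MODEL_PATH")  # hypothetical variable name
    if model_path:
        print(check_session(model_path))
```

This costs one model load at startup, which is usually acceptable for a worker that loads the model anyway.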
Frequently asked questions (FAQ)
Q. I have a GPU but --env_info doesn't output CUDAExecutionProvider.
A. First, pip uninstall -y onnxruntime to remove the CPU version and reinstall onnxruntime-gpu (coexistence is most likely). If that doesn't fix it, confirm the version table of ORT and CUDA/cuDNN and suspect a cuDNN 8/9 mix-up.
Q. The GPU works locally but it's CPU in a container.
A. Confirm whether the CUDA/cuDNN runtime is correctly installed in the container, and whether onnxruntime (the CPU version) has slipped in. Also check whether nvidia-smi works inside the container (--gpus all / GPU runtime).
Q. Can I fix CUDA out of memory with empty_cache?
A. Basically no. Because empty_cache() only frees unused cache. Lowering segment_size/batch_size is the right path. If VRAM is insufficient, move up the GPU.
Q. Why is ffmpeg needed?
A. It's used for audio read/write (decode/encode). Without it installed, it crashes on audio I/O. Install it with apt-get install -y ffmpeg, etc.
Q. I want to use the GPU on a Mac.
A. Apple Silicon doesn't support CUDA. UVR5 itself supports MPS, but library operation is basically CPU. For high-volume processing, use a Linux/cloud with an NVIDIA GPU.
Conclusion: have the pattern of symptom → diagnose → fix
Troubleshooting UVR5 / audio-separator is a swamp if you fix by guesswork, but once you gather the facts with diagnostic commands, the cause is almost always uniquely determined.
- First, get the GPU/ffmpeg facts with audio-separator --env_info and diagnose.py.
- The most common is the CPU fallback: remove the onnxruntime coexistence and match CUDA/cuDNN to the table.
- For OOM, lower the chunk/batch (empty_cache isn't the essential solution).
- In production, structurally prevent "silent degradation" with a startup GPU guard.
Avoiding these "environment swamps" by design is where outsourcing makes a difference. If you want to build voice/video AI, including source separation, at production quality, get in touch and see the case study. Working solo with generative AI, I provide end-to-end support from environment setup to production operation.
Sources / official resources
- ONNX Runtime CUDA EP (requirement table, cuDNN compatibility): CUDA Execution Provider
- ONNX Runtime Python API: API summary (get_available_providers / get_device) / Install
- PyTorch CUDA memory management: CUDA semantics (empty_cache / PYTORCH_CUDA_ALLOC_CONF)
- audio-separator: README (--env_info / reinstall procedure / CUDA 11.8·12.2 / model DL location)
- The "onnxruntime coexistence" issue: microsoft/onnxruntime issue #7748
- Library version requirements change over time. Always confirm primary sources before implementing. The "coexistence of onnxruntime and onnxruntime-gpu" issue is not stated in official documentation; the explanation here is based on widely known behavior reported in the issue.