# How to choose a source-separation tool: selecting Demucs / UVR5(MDX-Net) / Spleeter / Open-Unmix by requirements

> A side-by-side comparison of the major open-source music source separation tools (Demucs v4, UVR5 (MDX-Net), Spleeter, Open-Unmix) by quality, speed, license, setup difficulty, and memory. Using real code, it presents a decision framework that maps requirements to a tool ('which one for which project') and covers the license pitfalls you must check before commercial use.

- Published: 2026-06-25
- Author: 友田 陽大
- Tags: Source separation, Demucs, UVR5, Technology selection, Audio processing, Python, Generative AI
- URL: https://tomodahinata.com/en/blog/music-source-separation-tool-selection-demucs-uvr-spleeter
- Category: Audio source separation & preprocessing

## Key points

- No single source-separation OSS does everything. Choose by requirement: Demucs v4 for multi-stem separation at the highest quality, UVR5 (MDX-Net) for a 2-way vocal/instrumental split, Spleeter for fast bulk processing, and Open-Unmix for a lightweight research baseline.
- Quality rule of thumb: Demucs v4 is SOTA-class among public models at 9.0–9.20 dB SDR on MUSDB HQ. Spleeter is fast but a notch lower in quality; Open-Unmix (UMX) is a reference implementation. The highest accuracy comes from a Demucs + MDX ensemble (implemented in UVR).
- The biggest pitfall in commercial projects is that the code license and the weights license are separate. Open-Unmix's UMXL weights are non-commercial (CC BY-NC-SA 4.0). Demucs, Spleeter, and UVR are MIT, but the master rights of the separated audio are always a separate matter.
- SDR is a relative value that depends on the dataset, model version, and evaluation method. The iron rule is to measure on your own material rather than take benchmark numbers at face value. Quantify the evaluation with museval (BSSEval v4).
- For a non-engineer working by hand, use the UVR5 GUI; for server automation, use the Demucs API or python-audio-separator. The production design (GPU workers × job queue × idempotency) is covered in a separate article.

---

## The goal of this article

"I want to split audio into voice and instrumental," "I want to pull just the drums" — when you try to start with music source separation (MSS), the first wall you hit is the problem of **too many tools.** Demucs, UVR5, MDX-Net, Spleeter, Open-Unmix… all claim to be "cutting-edge" and tout benchmark numbers.

This piece provides **a decision framework that compares the tools side by side and lets you work backward from your requirements to "which one to choose for your project."** Individual tool usage is left to each dedicated article ([Demucs](/blog/demucs-v4-music-source-separation-production-guide) / [UVR5・MDX-Net](/blog/uvr5-mdx-net-vocal-separation-production-guide)); this piece concentrates on **the axes of selection.** By the end, you should be able to do these three things:

1. **Explain in one line the strengths/weaknesses** of the four major OSS (Demucs / UVR5・MDX-Net / Spleeter / Open-Unmix).
2. From your own requirements (quality, speed, stem count, budget, operations setup), **narrow to one without hesitation.**
3. In a commercial project, preemptively eliminate **license pitfalls that could become litigation risk.**

> **About the author (reliability disclosure)**: I have **single-handedly designed, implemented, and run in production an AI video-localization platform** that fully automates, just by uploading a video, "**audio separation → transcription → translation → multilingual dubbing → lip-sync.**" **Which tool to use in its first stage (audio separation)** was an actual decision that determined the quality and cost of everything downstream. The selection axes here aren't a patchwork of catalog specs but **the judgment criteria I gained while switching tools in real operations.**

---

## 30-second summary (requirement → recommended tool)

| Your requirement | Recommendation | Reason |
| --- | --- | --- |
| **4-way split of drums/bass/vocals/other at highest quality** | **Demucs v4** (`htdemucs` / `htdemucs_ft`) | SOTA-class among public models (9.0–9.20 dB SDR) |
| **2-way vocal/instrumental split is the main goal** (karaoke, a cappella) | **UVR5 (MDX-Net)** | Rich vocal/instrumental-specialized models with little residue |
| **Speed and volume above all; decent quality is enough** | **Spleeter** | 100× real-time on GPU. Ideal as a first-pass preprocessing step |
| **A research baseline / lightweight reference implementation** | **Open-Unmix (UMX)** | A straightforward BiLSTM. High reproducibility and readability |
| **A non-engineer using it by hand** | **UVR5 GUI** | Drag & drop. Win/Mac/Linux supported |
| **Automating on a server / making it an API** | **Demucs API** / **python-audio-separator** | Callable from code in 3 lines |
| **Absolutely the highest quality** (delivery, mastering) | **Demucs + MDX ensemble** | Combines multiple models' estimates. UVR implements it |

Setup procedures for the individual tools are left to each dedicated article. The rest of this piece digs one level deeper into **the basis for selection.**

---

## The big picture of the four major OSS (one line each)

First, grasp "what each tool is" in the shortest form.

- **Demucs v4 (Meta)**: a hybrid model that looks at **both waveform and spectrogram** and bridges them with a Transformer. The **overall No.1** at high-quality separation of **4 stems (+ a 6-stem version with guitar/piano).** MIT license. [→ details](/blog/demucs-v4-music-source-separation-production-guide)
- **UVR5 (Ultimate Vocal Remover) / MDX-Net**: a GUI plus a family of models **specialized in 2-way vocal/instrumental separation.** MDX-Net is a two-stream architecture combining the frequency and time domains, with a reputation for **little residue.** Anyone can use it via the GUI, and it can also be automated from code with `python-audio-separator`. MIT license. [→ details](/blog/uvr5-mdx-net-vocal-separation-production-guide)
- **Spleeter (Deezer)**: **built on TensorFlow.** It bundles pre-trained 2/4/5-stem models, and its main strength is raw speed: **about 100× real-time on GPU.** Its quality is a step behind the latest generation, but it's ideal for **bulk first-pass processing.** MIT license.
- **Open-Unmix (UMX, sigsep)**: **a PyTorch reference implementation.** A simple structure that estimates the mask with a 3-layer bidirectional LSTM, widely used as **a research baseline.** Lightweight and readable, but its quality falls short of the more heavily optimized tools above.

---

## Cross-comparison table

The numbers are "rules of thumb at writing time, based on official information." **SDR is a relative value that changes with evaluation conditions**, so always make the final judgment by [measuring on your own material](#dont-swallow-the-benchmark-numbers-the-sdr-trap).

| Aspect | **Demucs v4** | **UVR5 / MDX-Net** | **Spleeter** | **Open-Unmix** |
| --- | --- | --- | --- | --- |
| Development | Meta | Anjok07 et al. (OSS) | Deezer | sigsep |
| Method | Waveform + spectrum + Transformer | Two-stream frequency + time | CNN (spectrogram) | BiLSTM (spectrogram) |
| Stems | **4 / 6** | Mainly 2 (Vocals/Inst) | 2 / 4 / 5 | 4 |
| Quality (rule of thumb) | **Highest (9.0–9.20 dB)** | High (vocal separation especially strong) | Mid (speed-prioritized) | Mid (reference implementation) |
| Speed | Mid (GPU recommended, CPU possible) | Mid (GPU recommended) | **Fastest (100× real-time on GPU)** | Fairly fast |
| Setup | `pip install demucs` | GUI / `pip install audio-separator` | `pip install spleeter` | `pip install openunmix` |
| GUI | None (CLI/API) | **Yes** | None | None |
| Code license | MIT | MIT | MIT | MIT |
| **Weights license** | MIT | Confirm per model | MIT | **UMXL is non-commercial (CC BY-NC-SA)** |
| Suited use | Multi-stem, overall quality | Vocal/instrumental separation | Bulk, fast prep | Research, baseline |

---

## A decision flowchart that selects from requirements

I've shaped "which one in the end" so you can **narrow it down just by answering questions.**

```text
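Q1. What do you want to separate?
├─ Only vocals and accompaniment (karaoke, a cappella, dubbing preprocessing)
│    └─ Q2. Will an engineer automate it?
│         ├─ NO (manual work is fine)      → UVR5 GUI
│         └─ YES (built into a server/CI)  → UVR5 models via python-audio-separator
│                                            / or Demucs --two-stems=vocals
│
└─ 4-way split into drums / bass / vocals / other (remixing, transcription by ear, education)
     └─ Q3. Which takes priority, quality or speed?
          ├─ Quality first (delivery, lead material) → Demucs v4 (htdemucs_ft)
          ├─ Balanced                                → Demucs v4 (htdemucs)
          └─ Speed and volume first                  → First pass with Spleeter (4stems)
                                                       → final pass with Demucs on adopted tracks only (two-tier)

* Goal is a research baseline comparison          → Open-Unmix
* Want to squeeze out the last few percent        → Demucs + MDX ensemble (UVR)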
```
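
The flowchart above can also be written as a small function, which is handy when the choice has to be made per job in a pipeline. This is a minimal sketch: the `Requirements` fields and the `recommend` function are hypothetical names introduced here for illustration, and the returned strings simply mirror the branches of the flowchart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Hypothetical container for the answers to Q1-Q3 of the flowchart."""
    stems: str                          # "two" (vocals/instrumental) or "four"
    automated: bool = False             # True if it runs on a server or in CI
    priority: str = "balance"           # "quality", "balance", or "speed"
    research_baseline: bool = False     # goal is a research baseline comparison
    squeeze_last_percent: bool = False  # delivery/mastering-grade quality


def recommend(req: Requirements) -> str:
    """Return the tool the flowchart points to for these requirements."""
    # The two footnotes of the flowchart take precedence over the main branches.
    if req.research_baseline:
        return "Open-Unmix"
    if req.squeeze_last_percent:
        return "Demucs + MDX ensemble (UVR)"

    # Q1 -> Q2: a 2-way split only depends on whether it is automated.
    if req.stems == "two":
        if req.automated:
            return "UVR5 models via python-audio-separator, or Demucs --two-stems=vocals"
        return "UVR5 GUI"

    # Q1 -> Q3: a 4-way split depends on the quality/speed priority.
    by_priority = {
        "quality": "Demucs v4 (htdemucs_ft)",
        "balance": "Demucs v4 (htdemucs)",
        "speed": "Spleeter (4stems) first pass, then Demucs on adopted tracks",
    }
    return by_priority[req.priority]


print(recommend(Requirements(stems="four", priority="quality")))
```

Encoding the branches as data keeps the selection reviewable: when a requirement changes, the diff shows exactly which branch moved.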

**There are two key points in the decision.**

- The path splits first on **"2-way" vs. "multi-way."** If you want **only voice and instrumental**, as in karaoke or dubbing preprocessing, a multi-stem model is overkill. The vocal-specialized UVR5/MDX-Net or Demucs's `--two-stems=vocals` is enough.
- **Quality and speed are a trade-off.** Running every song at the highest quality is wasteful. The two-tier setup of **"first pass on everything with Spleeter → final pass on only the adopted tracks with Demucs"** is the standard way to balance quality and cost.
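
As a rough illustration of the two-tier setup, the sketch below only builds the command lines for each tier and leaves execution to the caller. The function names and the `prep` / `final` output folders are assumptions made for this example; the flags (`spleeter separate -p spleeter:4stems -o`, `demucs -n htdemucs_ft -o`) follow each tool's documented CLI, but check them against the version you install.

```python
from typing import List


def prep_commands(tracks: List[str], out_dir: str) -> List[List[str]]:
    """Tier 1: fast first pass over every track with Spleeter (4 stems)."""
    return [
        ["spleeter", "separate", "-p", "spleeter:4stems", "-o", f"{out_dir}/prep", track]
        for track in tracks
    ]


def final_commands(adopted: List[str], out_dir: str) -> List[List[str]]:
    """Tier 2: high-quality pass with Demucs, only for the adopted tracks."""
    return [
        ["demucs", "-n", "htdemucs_ft", "-o", f"{out_dir}/final", track]
        for track in adopted
    ]


# Hypothetical file names, for illustration only.
all_tracks = ["a.mp3", "b.mp3", "c.mp3"]
for cmd in prep_commands(all_tracks, "out"):
    print(" ".join(cmd))

# After reviewing the first-pass output, only some tracks go to tier 2.
for cmd in final_commands(["b.mp3"], "out"):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution makes the plan easy to log and to hand to a job queue.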

---

## Always effective in commercial projects: license pitfalls

This is where **B2B projects are won or lost.** If you assume "it's OSS, so commercial use is fine," it will trip you up later. **There are three layers to confirm.**

### Layer 1: the code license

Demucs, UVR5, Spleeter, and Open-Unmix all have **MIT-licensed code** that supports commercial use. This is mostly safe.

### Layer 2: the weights (trained model) license ← easy to overlook

**Even if the code is MIT, the license of the distributed trained model is separate.** The most typical trap is **Open-Unmix's UMXL**: its high-quality weights are under a **non-commercial license (CC BY-NC-SA 4.0)**, so **embedding them in a commercial product is a violation.** For commercial use, look at the plain `UMX` / `UMXHQ` models instead, and confirm their weights license on the distribution page as well.

> 🔑 **A veteran's iron rule**: not only for source separation, whenever you introduce an AI model commercially, **always confirm "the code license" and "the weights license" separately.** Read not only the license field in the README but also the **license on the model distribution page (HuggingFace / each model card).** Making this a habit materially changes how safe a project is.

### Layer 3: the rights of the "separated audio itself" ← most important

Even if the tool allows commercial use, **the copyright and master rights of the input song are a separate matter.** Separating a commercial song and **distributing or selling it as a karaoke track or a cappella requires the rights holder's permission.** Freedom of the tool ≠ freedom of the material. Treat this as **a rights-clearance issue, not a technical one**, for which the user (and the client) is responsible.

| Layer to confirm | Demucs | UVR5 | Spleeter | Open-Unmix |
| --- | --- | --- | --- | --- |
| Code | MIT ✅ | MIT ✅ | MIT ✅ | MIT ✅ |
| Weights | MIT ✅ | Confirm per model ⚠️ | MIT ✅ | **UMXL is non-commercial ❌** |
| Separated audio | **Always needs separate rights clearance** ⚠️ | Same as Demucs | Same as Demucs | Same as Demucs |
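
One way to make the three-layer check hard to skip is to encode the table above as data and have a function list whatever still blocks commercial use. This is a sketch under assumptions: `WEIGHTS_STATUS`, `commercial_blockers`, and the status strings are hypothetical, the values mirror this article's table at writing time, and none of it substitutes for reading the primary license texts.

```python
from typing import Dict, List

# Weights status per tool, copied from the table above (writing-time snapshot).
# "ok" = MIT, "confirm" = check per model, "non-commercial" = CC BY-NC-SA.
WEIGHTS_STATUS: Dict[str, str] = {
    "demucs": "ok",
    "uvr5": "confirm",
    "spleeter": "ok",
    "open-unmix-umxl": "non-commercial",
}


def commercial_blockers(tool: str, material_rights_cleared: bool) -> List[str]:
    """List what still blocks commercial use; an empty list means no known blocker."""
    blockers: List[str] = []

    # Layer 1 (code) is MIT for all four tools, so it adds nothing here.

    # Layer 2: weights license. Unknown tools are treated as "confirm".
    status = WEIGHTS_STATUS.get(tool, "confirm")
    if status == "non-commercial":
        blockers.append(f"{tool}: weights are licensed for non-commercial use only")
    elif status == "confirm":
        blockers.append(f"{tool}: confirm the weights license on the model's distribution page")

    # Layer 3: rights of the input material, independent of the tool.
    if not material_rights_cleared:
        blockers.append("input material: permission from the rights holder is required")

    return blockers


for tool in WEIGHTS_STATUS:
    print(tool, commercial_blockers(tool, material_rights_cleared=False))
```

Defaulting unknown tools to "confirm" is deliberate: a model that has not been checked should never pass silently.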

---

## Don't swallow the benchmark numbers: the SDR trap

When you see an SDR comparison like "Demucs is 9.2 dB, Spleeter is…," **take a step back.** **SDR (Signal-to-Distortion Ratio) is a relative value that depends strongly on the evaluation dataset, the model version, and the evaluation method (how per-frame scores are aggregated).** It's not uncommon for paper A's 9.2 dB and article B's number to **have been measured under different conditions in the first place.**
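
To see why the aggregation method alone can move the number, here is a minimal sketch in pure Python. It uses the plain definition SDR = 10·log10(‖s‖² / ‖s − ŝ‖²), which is an assumption made for illustration and simpler than what BSSEval v4 computes; `sdr_db` and `framewise_median_sdr` are hypothetical helper names, and the signals are toy data, not measurements.

```python
import math
from statistics import median
from typing import Sequence

EPS = 1e-12  # avoids division by zero and log(0) on silent or perfect frames


def sdr_db(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Plain SDR in dB: signal energy over error energy."""
    signal = sum(s * s for s in reference)
    error = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10((signal + EPS) / (error + EPS))


def framewise_median_sdr(reference: Sequence[float],
                         estimate: Sequence[float],
                         frame: int) -> float:
    """Median of SDR computed on non-overlapping frames of `frame` samples."""
    starts = range(0, len(reference) - frame + 1, frame)
    return median(
        sdr_db(reference[i:i + frame], estimate[i:i + frame]) for i in starts
    )


# Toy signals: three nearly clean frames, then one frame where the estimate drops out.
reference = [1.0] * 8
estimate = [0.9] * 6 + [0.0] * 2

global_sdr = sdr_db(reference, estimate)                         # one score for the whole signal
median_sdr = framewise_median_sdr(reference, estimate, frame=2)  # median of per-frame scores
print(f"global: {global_sdr:.2f} dB, frame median: {median_sdr:.2f} dB")
```

On this toy input the whole-signal score comes out near 5.9 dB while the frame median is about 20 dB, from the same pair of signals. That gap is the reason numbers from different sources are only comparable when the aggregation is stated.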

In practice, the right approach is as follows.

- **Use official benchmarks as "a rule of thumb for ordering"** (you can trust the ranking Demucs > Spleeter).
- **Always make the final judgment by measuring on "your own material."** J-POP, narration, podcasts, live recordings: strengths and weaknesses change with genre and recording environment.
- Measure **with numbers, not by ear alone.** Compute SDR/SIR/SAR with `museval` (BSSEval v4) and **compare candidate tools on the same material with the same metric.** The concrete procedure is covered in the [article on measuring source-separation quality with numbers](/blog/music-source-separation-quality-evaluation-sdr-museval).

> When I choose a tool for a project, I use a staged process: **"narrow to 2–3 with official benchmarks → compare with museval on 10 of the customer's real materials → confirm by ear too."** Deciding by catalog numbers alone leads to unpleasant surprises such as "more residue than expected" on production material.
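
The comparison step of that process can be reduced to a few lines once per-track scores are available. The sketch below assumes you already have one SDR value per track per candidate (for example from museval); `rank_by_median_sdr` and the candidate labels are hypothetical, and the numbers are placeholders rather than measurements.

```python
from statistics import median
from typing import Dict, List, Tuple


def rank_by_median_sdr(scores: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank candidates by median per-track SDR (dB), best first."""
    track_counts = {len(values) for values in scores.values()}
    if len(track_counts) != 1 or 0 in track_counts:
        raise ValueError("every candidate needs scores for the same number of tracks (at least one)")
    ranked = [(tool, median(values)) for tool, values in scores.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Placeholder scores for three tracks; replace with values measured on your own material.
placeholder_scores = {
    "candidate_a": [7.1, 6.4, 8.0],
    "candidate_b": [6.9, 7.2, 7.5],
}
for tool, score in rank_by_median_sdr(placeholder_scores):
    print(f"{tool}: median SDR {score:.1f} dB")
```

The median is used so that one unusually easy or hard track does not dominate the ranking. The result is a shortlist order, and it still needs confirmation by ear.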

---

## Build vs. buy: OSS self-host or commercial API

"Whether to run it yourself in the first place, or use a SaaS API" is also part of selection.

| Aspect | OSS self-host (Demucs, etc.) | Commercial API / SaaS |
| --- | --- | --- |
| Initial cost | Environment setup, GPU procurement | Nearly zero (usable the same day) |
| Unit price | **Cheap at high volume** (keep the GPU busy) | Usage-based fees add up |
| Data sovereignty | **Stays in-house** (safe for sensitive audio) | Requires sending data to a third party |
| Customization | Free choice of models and parameters | Limited to what the provider offers |
| Operations load | **You run it yourself** (where this article series comes in) | Can be left to the provider |

**Rules of thumb for the judgment**:

- **Small / prototype / low data sensitivity** → first confirm quality with a commercial API or a local CLI.
- **Steadily high volume / high data sensitivity / want to lower the unit price** → **self-host OSS.** Run it with GPU workers. The production design for that is detailed in the [article on making source separation a production API](/blog/music-source-separation-production-api-gpu-worker-queue).

My stance, advancing development with **one person × generative AI**, is "**first confirm quality and demand with OSS locally, and once the volume is visible, put it on a self-hosted GPU worker.**" This is the realistic solution that doesn't pay wasteful fixed costs and avoids lock-in.
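
The unit-price row above comes down to a break-even calculation. The sketch below is illustrative only: `break_even_minutes` is a hypothetical helper, it assumes a flat monthly GPU cost and a per-minute API price, and the figures in the example are placeholders, not quotes from any provider.

```python
import math


def break_even_minutes(gpu_cost_per_month: float,
                       api_price_per_minute: float,
                       self_host_cost_per_minute: float = 0.0) -> float:
    """Monthly audio minutes above which self-hosting is cheaper than the API."""
    saving_per_minute = api_price_per_minute - self_host_cost_per_minute
    if saving_per_minute <= 0:
        return math.inf  # the API is never more expensive per minute
    return gpu_cost_per_month / saving_per_minute


# Placeholder figures: 400 per month for a GPU machine, 0.25 per audio minute via an API.
print(break_even_minutes(400.0, 0.25))  # 1600.0
```

Below the break-even volume the fixed GPU cost dominates, which is why confirming demand before committing to a self-hosted worker is the cheaper order of operations.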

---

## Frequently asked questions (FAQ)

**Q. In the end, what should I install first?**
A. **Demucs for multi-stem separation, UVR5 for just vocal/instrumental.** These two cover most projects. If in doubt, start with Demucs.

**Q. Which is better, Demucs or UVR5(MDX-Net)?**
A. **It depends on the purpose.** For all-round separation that includes drums/bass/other, choose Demucs; for vocal/instrumental separation with little residue, the MDX-Net family is often stronger. To aim for the highest quality, **ensemble both** (UVR implements it).

**Q. Is Spleeter already old?**
A. Its quality is behind the latest generation, but **its speed is still top-class.** It remains useful as the first tier of "process everything fast → final pass on only the adopted tracks with a high-quality model."

**Q. Is it OK to embed in a commercial product?**
A. The code is all MIT and mostly OK. But always confirm **the weights license (especially Open-Unmix UMXL is non-commercial)** and **the rights processing of the input song** separately ([the license chapter](#always-effective-in-commercial-projects-license-pitfalls)).

**Q. I only have a CPU.**
A. It works, just slowly. On CPU, Demucs takes roughly 1.5× the track's duration, and Spleeter is relatively light. A GPU is recommended for bulk processing, but a CPU is enough for small-scale verification.

**Q. Should I just choose the one with the highest SDR on the benchmark?**
A. **That's dangerous.** SDR is a relative value depending on evaluation conditions. The iron rule is to **measure on your own material with museval** and choose ([the SDR trap](#dont-swallow-the-benchmark-numbers-the-sdr-trap)).

---

## Conclusion: tool selection goes "requirement → constraint → measurement"

There's no "one all-purpose" source-separation OSS. That's exactly why selection should be done **with a framework, not by feel.**

1. **Requirement**: what you want to split (2-way or multi-way), whether you prioritize quality or speed.
2. **Constraint**: license (the three layers of code, weights, material), budget, operations setup, data sensitivity.
3. **Measurement**: narrow candidates to 2–3 and decide after **comparing on your own material with museval.**

Nail it down in this order and you avoid "install the famous one for now and regret it later." And — **this very selection is where outsourcing produces value.** Anyone can just run a tool, but **reading the requirements and constraints to choose the optimal one (or combination), and eliminating even license risk**, turns experience directly into quality.

> I've decided tools by using the selection axes here on the **AI video-localization platform I actually run in production.** If you're considering technology selection, PoC, or productionization of voice/video AI including source separation, take a look at my [track record](/case-studies/ai-video-localization-lipsync) and feel free to consult me. With **one person × generative AI**, I accompany you from decision-making to implementation — fast, cheap, and safe.

---

## Sources / official resources

- **Demucs**: [adefossez/demucs](https://github.com/adefossez/demucs) — model list, SDR, license ([explanation article](/blog/demucs-v4-music-source-separation-production-guide))
- **UVR5 / MDX-Net**: [Anjok07/ultimatevocalremovergui](https://github.com/Anjok07/ultimatevocalremovergui) / [MDX-Net paper arXiv:2111.12203](https://arxiv.org/abs/2111.12203) ([explanation article](/blog/uvr5-mdx-net-vocal-separation-production-guide))
- **Spleeter**: [deezer/spleeter](https://github.com/deezer/spleeter) — TensorFlow, 2/4/5 stems, MIT
- **Open-Unmix**: [sigsep/open-unmix-pytorch](https://github.com/sigsep/open-unmix-pytorch) — UMX/UMXHQ/UMXL (**UMXL is a non-commercial license**)
- **Evaluation tool**: [sigsep/sigsep-mus-eval (museval)](https://github.com/sigsep/sigsep-mus-eval) — BSSEval v4 (SDR/ISR/SIR/SAR)

Note: Licenses, quality, and pricing change over time. **For commercial use, always confirm primary sources (especially the weights license).** The numbers here are rules of thumb based on official information at writing time.
