# Complete MuseTalk installation walkthrough — solving the mmcv/mmdet/mmpose dependency hell, CUDA mismatches, new-GPU support, and every common error

> Get past the mmcv/mmdet/mmpose dependency hell that stalls almost every MuseTalk setup by using the combination the official README pins. This guide covers the correct install order for Python 3.10 / PyTorch 2.0.1 / CUDA 11.7 / mmcv 2.0.1, the cause and remedy for `No module named mmcv._ext`, `CUDA is not available`, the missing `libGL.so.1`, onnxruntime's CPU fallback, and new-GPU (Blackwell) support, plus how to lock in reproducibility with Docker.

- Published: 2026-06-25
- Author: 友田 陽大
- Tags: MuseTalk, troubleshooting, mmcv, CUDA, Python, GPU, environment setup, lip sync
- URL: https://tomodahinata.com/en/blog/musetalk-installation-troubleshooting-mmcv-mmdet-mmpose-cuda
- Category: Lip-sync & digital humans
- Pillar guide: https://tomodahinata.com/en/blog/ai-lip-sync-talking-head-model-selection-guide-2026

## Key points

- What makes MuseTalk hard is not the model but the mmlab-family dependencies. mmcv/mmdet/mmpose are tightly version-coupled; bump one and all break. Installing the official pinned versions via mim is the only stable answer.
- The correct install order is 'system deps → Python 3.10 env → PyTorch 2.0.1 (cu117) → requirements → mim install mmcv==2.0.1/mmdet==3.1.0/mmpose==1.1.0 → fetch weights.' Following the order is 90% of it.
- Common errors have fixed causes: No module named mmcv._ext = build mismatch, CUDA is not available = CPU torch / driver mismatch, libGL.so.1 = missing libgl1, painfully slow = onnxruntime's CPU fallback.
- New GPUs (Blackwell/RTX 50xx) sometimes won't run on the official pinned versions (cu117). Community reports say you need a newer torch + patches to Python/mediapipe. Confirm primary sources.
- To avoid repeating the setup ordeal, bake the working combination into a Docker image. Reproducibility is a prerequisite for production operation.

---

## The goal of this article

You try out [MuseTalk](/blog/musetalk-realtime-lip-sync-production-guide) and **lose days to environment setup before ever reaching the model itself**. This is so common that it is almost a rite of passage. The cause is not MuseTalk itself but the **dependency hell of the mmlab ecosystem (mmcv / mmdet / mmpose)** used for face/pose detection.

This article is a practical guide to **getting out of that hell in one pass with the officially pinned "working combination."** It shows the **correct install order**, **resolves common errors by cause**, and ends with **Docker, so you never have to repeat the setup.** The goal: someone worn down by `No module named 'mmcv._ext'` can **run inference today.**

> **About the author (disclosure)**: I **self-host and operate** multiple lip-sync models, including MuseTalk, **in production**. The error remedies in this article are not copied from the docs; they are **a record of the pitfalls I hit while rebuilding this environment many times.**

---

## 30-second summary (conclusion first)

| Point | Conclusion |
| --- | --- |
| **Why it's hard** | The problem is not MuseTalk itself: **the mmlab dependencies (mmcv/mmdet/mmpose) are tightly version-coupled** |
| **The only stable answer** | Install **the official pinned versions via `mim`.** Installing the latest with `pip install mmcv` breaks it |
| **The correct order** | system deps → Python 3.10 → **PyTorch 2.0.1 (cu117)** → requirements → mim install → fetch weights |
| **Pinned versions** | `mmcv==2.0.1` / `mmdet==3.1.0` / `mmpose==1.1.0` (these three are a set) |
| **Common errors** | missing `mmcv._ext` = build mismatch, `CUDA is not available` = CPU torch / driver, `libGL.so.1` = missing libgl1 |
| **New GPU** | Blackwell (RTX 50xx) and similar GPUs sometimes won't run on the official pinned versions. **A newer torch plus patches are needed** (confirm against primary sources) |
| **Permanent fix** | **Bake it into Docker.** Lock in reproducibility so the setup never has to be redone |

---

## Why is MuseTalk installation hard?

Internally MuseTalk uses **dwpose (face/body pose)** and **face detection/parsing**, and those depend on the **mmlab (OpenMMLab)** library family: `mmcv`, `mmdet`, `mmpose`. The versions of these three are **tightly coupled**:

- `mmcv` **builds C++/CUDA extensions matched to the PyTorch and CUDA versions** (`mmcv._ext`).
- `mmdet` / `mmpose` accept only **a specific `mmcv` version range.**

In other words, **it only runs once all five (torch ↔ CUDA ↔ mmcv ↔ mmdet ↔ mmpose) mesh.** Bump even one to the latest and the rest fall like dominoes. This is what is really behind "I ran `pip install mmcv` and now mmdet dies on import."

**Conclusion: don't try to resolve versions yourself. Install the fixed combination the official team verified, in the correct order.** That's all there is to it.

---

## The golden path: install these versions in this order

This procedure follows the official [GitHub README](https://github.com/TMElyralab/MuseTalk). **Following the order** is 90% of success.

### Step 0: system dependencies (Ubuntu family)

```bash
# libGL (required by OpenCV) and ffmpeg for video I/O
sudo apt-get update
sudo apt-get install -y libgl1 libglib2.0-0 ffmpeg
```

> If you skip `libgl1`, you will trip later on `ImportError: libGL.so.1: cannot open shared object file`. Install it first.

### Step 1: an isolated Python 3.10 environment

```bash
conda create -n MuseTalk python==3.10
conda activate MuseTalk
```

> **Stick to 3.10.** On 3.11/3.12 you may find no prebuilt mmlab-family wheels and get stuck (the new-GPU exception is covered below).

### Step 2: PyTorch 2.0.1 (explicitly the CUDA 11.7 build)

```bash
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 \
  --index-url https://download.pytorch.org/whl/cu117
```
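
To see which build pip actually resolved, you can read the local version tag that wheels from download.pytorch.org carry (`+cu117`, `+cpu`, and so on). The following is a minimal sketch, not part of the official procedure: `torch_build_tag` and `is_expected_build` are hypothetical helper names, and a version string with no tag is reported as unknown rather than guessed.

```python
from __future__ import annotations


def torch_build_tag(version: str) -> str | None:
    """Return the local build tag of a torch version string, e.g. 'cu117' or 'cpu'."""
    _, sep, local = version.partition("+")
    return local if sep and local else None  # None: no tag, build flavor unknown


def is_expected_build(version: str, expected_tag: str = "cu117") -> bool:
    """True only for torch 2.0.1 carrying the expected build tag."""
    public = version.partition("+")[0]
    return public == "2.0.1" and torch_build_tag(version) == expected_tag


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        tag = torch_build_tag(torch.__version__)
        print(torch.__version__, "-> build tag:", tag)
        print("matches the pinned build:", is_expected_build(torch.__version__))
```

Run it inside the `MuseTalk` environment. Anything other than `cu117` means Step 2 should be redone before moving on.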

> Omit `--index-url` here and pip may install **a CPU build or a different CUDA build of torch**, depending on your platform and pip configuration. Then `torch.cuda.is_available()` returns `False` and you end up with "it runs but is painfully slow" or "it doesn't use the GPU." **Make CUDA 11.7 explicit.**

### Step 3: app dependencies

```bash
pip install -r requirements.txt
```

### Step 4: install mmlab pinned versions with `mim` (most important)

```bash
pip install -U openmim
mim install "mmengine"
mim install "mmcv==2.0.1"
mim install "mmdet==3.1.0"
mim install "mmpose==1.1.0"
```
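
Once the commands above finish, it is worth confirming that the pinned set is really what ended up installed, because a later `pip install` can silently replace one of them. A minimal sketch using only the standard library (`importlib.metadata`); the `PINS` table repeats this article's versions, and the helper names are illustrative:

```python
from importlib import metadata

# Pins taken from this article; adjust if you deliberately use another set.
PINS = {"torch": "2.0.1", "mmcv": "2.0.1", "mmdet": "3.1.0", "mmpose": "1.1.0"}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def pin_mismatches(installed, pins=PINS):
    """Return one readable line per package that is missing or off-pin."""
    problems = []
    for name, wanted in pins.items():
        have = installed.get(name)
        # torch reports e.g. '2.0.1+cu117'; compare only the public version part.
        public = have.split("+")[0] if have else None
        if public != wanted:
            problems.append(f"{name}: expected {wanted}, found {have or 'not installed'}")
    return problems


if __name__ == "__main__":
    issues = pin_mismatches(installed_versions(PINS))
    print("\n".join(issues) if issues else "all pins match")
```

An empty result only says the versions line up; whether the compiled extension loads is a separate check.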

> **Why `mim`**: `mim` (short for "MIM Installs OpenMMLab Packages") resolves and installs **a prebuilt `mmcv` wheel matched to your current torch/CUDA.** Plain `pip install mmcv` tends to grab the latest version or fall back to a source build, and then fails to build `mmcv._ext`. **Always install with `mim`, with versions pinned**, and keep the `mmcv → mmdet → mmpose` order.

### Step 5: fetch the model weights

```bash
# Official script (Linux). On Windows, use download_weights.bat
sh download_weights.sh
```

The final tree (main parts):

```text
./models/
├── musetalkV15/   (unet.pth, musetalk.json)   # latest v1.5 main
├── musetalk/      (pytorch_model.bin, ...)     # v1.0
├── sd-vae/        (diffusion_pytorch_model.bin, config.json)
├── whisper/       (pytorch_model.bin, ...)      # audio features
├── dwpose/        (dw-ll_ucoco_384.pth)
├── face-parse-bisent/ (79999_iter.pth, resnet18-5c106cde.pth)
└── syncnet/       (latentsync_syncnet.pt)       # sync evaluation
```
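
An interrupted download is easy to miss, so a quick check of the tree can save a confusing `FileNotFoundError` later. The sketch below only checks the files named explicitly in the tree above (entries shown as `...` are left out), and treating an empty file as missing is my own assumption about what a broken download looks like:

```python
from pathlib import Path

# Only the files spelled out in the tree; elided entries are not checked.
EXPECTED = [
    "musetalkV15/unet.pth",
    "musetalkV15/musetalk.json",
    "musetalk/pytorch_model.bin",
    "sd-vae/diffusion_pytorch_model.bin",
    "sd-vae/config.json",
    "whisper/pytorch_model.bin",
    "dwpose/dw-ll_ucoco_384.pth",
    "face-parse-bisent/79999_iter.pth",
    "face-parse-bisent/resnet18-5c106cde.pth",
    "syncnet/latentsync_syncnet.pt",
]


def missing_weights(models_dir):
    """Return the expected files that are absent or empty under models_dir."""
    root = Path(models_dir)
    missing = []
    for rel in EXPECTED:
        path = root / rel
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    missing = missing_weights("./models")
    if missing:
        print("missing or empty:")
        for rel in missing:
            print("  " + rel)
    else:
        print("all listed weight files are present")
```

Run it from the repository root, since the path `./models` is relative to where the script is started.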

### Step 6: verify operation (always do this)

```bash
# ① Is the GPU visible? (if False, redo Step 2)
python -c "import torch; print('cuda:', torch.cuda.is_available())"

# ② Does mmcv's CUDA extension load? (if this passes, the hardest part is over)
python -c "from mmcv.ops import RoIAlign; print('mmcv._ext OK')"

# ③ Demo inference (v1.5, normal mode)
sh inference.sh v1.5 normal
```
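
If your setup relies on onnxruntime-gpu, one more check catches the silent CPU fallback early. This is a hedged sketch: `onnxruntime.get_available_providers()` reports which execution providers the installed build offers, and `gpu_provider_available` is a hypothetical helper name.

```python
def gpu_provider_available(providers):
    """True if the CUDA execution provider is among the listed providers."""
    return "CUDAExecutionProvider" in providers


if __name__ == "__main__":
    try:
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed")
    else:
        providers = ort.get_available_providers()
        print("providers:", providers)
        if not gpu_provider_available(providers):
            print("no CUDA provider in this build: sessions will run on CPU")
```

A listed `CUDAExecutionProvider` is necessary but not sufficient: a session can still fall back to CPU if the CUDA/cuDNN libraries fail to load, so calling `get_providers()` on the actual `InferenceSession` is the stronger check.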

If `cuda: True` and `mmcv._ext OK` appear, you've **broken through dependency hell.**

---

## Common-error quick reference (cause → remedy)

The errors you actually get can almost all be explained by this table.

| Error / symptom | Cause | Remedy |
| --- | --- | --- |
| `No module named 'mmcv._ext'` | mmcv is a **build mismatched** with the current torch/CUDA | after `pip uninstall mmcv mmcv-full -y`, reinstall with **`mim install "mmcv==2.0.1"`** |
| `mmdet` / `mmpose` dies on import | mmcv **version mismatch** | align the three **at pinned versions** (2.0.1 / 3.1.0 / 1.1.0). Order too: mmcv→mmdet→mmpose |
| `torch.cuda.is_available()` is `False` | a **CPU torch** got installed / driver mismatch | reinstall Step 2 with `--index-url .../cu117`. Check the driver with `nvidia-smi` |
| `ImportError: libGL.so.1` | the system has **no libgl1** | `sudo apt-get install -y libgl1 libglib2.0-0` |
| `ffmpeg: command not found` / video write fails | ffmpeg not installed or not on `PATH` | `sudo apt-get install -y ffmpeg`; on Windows pass `--ffmpeg_path` |
| Inference is **abnormally slow** (CPU-like despite a GPU) | **onnxruntime-gpu falls back to CPU**, or torch is a CPU build | check that onnxruntime-gpu matches your CUDA/cuDNN versions. Also recheck torch CUDA (check ① in Step 6) |
| `CUDA out of memory` | long clip / big batch / fp32 | add `--use_float16`, lower `--batch_size`, segment long clips |
| Weights not found (`FileNotFoundError`) | download incomplete / wrong path | rerun `download_weights.sh` and check the `models/` tree |
| Hugging Face download stops midway | network/auth | rerun (downloads resume); if needed, run `huggingface-cli login` or use a mirror |
| Gradio starts but no face is detected | side-profile face / occlusion / multiple faces / low resolution | use **frontal, single-face** footage. Guard with face detection in preprocessing ([pitfalls chapter](/blog/musetalk-realtime-lip-sync-production-guide#本番で必ず詰まる落とし穴と回復性設計)) |

### Deep dive: `No module named 'mmcv._ext'` (most common)

`mmcv._ext` is **mmcv's C++/CUDA extension.** If it is missing, **no build matched to your current torch/CUDA is installed.** The fix is always the same.

```bash
# Remove any half-installed mmcv completely, then use mim to reinstall the build that matches the current environment
pip uninstall -y mmcv mmcv-full mmcv-lite
mim install "mmcv==2.0.1"
python -c "from mmcv.ops import RoIAlign; print('OK')"
```
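
To tell "mmcv is not installed at all" apart from "mmcv is installed but has no compiled extension," you can look for the module spec without importing it. A small sketch with the standard library; `ext_status` is a hypothetical helper, and finding the file does not prove it matches your torch/CUDA, so the `RoIAlign` import above stays the real test:

```python
import importlib.util


def ext_status(package="mmcv", ext="_ext"):
    """Classify a package's compiled submodule without importing that submodule."""
    try:
        # find_spec imports the parent package, then looks for the submodule.
        spec = importlib.util.find_spec(f"{package}.{ext}")
    except ImportError:
        # The parent package is absent or fails during its own import.
        return "package-not-importable"
    return "ext-present" if spec is not None else "ext-missing"


if __name__ == "__main__":
    print("mmcv:", ext_status())
```

`ext-missing` points at a build without compiled ops (reinstall with `mim` as shown), while `package-not-importable` means the install itself has to be repeated.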

**If the install starts building from source, treat it as a danger sign** (you are entering the swamp of compilers and the CUDA toolkit). Installing the correct torch first is what lets `mim` find a **prebuilt wheel.**

### Deep dive: `CUDA is not available`

Narrow it down in this order.

```bash
nvidia-smi   # Are the driver and GPU visible? If not, the problem is on the host (driver not installed)
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# Expected output: 2.0.1+cu117 11.7 True
```
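
The two commands above amount to a fixed decision order: host driver first, then the torch build, then the driver/torch pairing. As an illustrative sketch (the labels and the `diagnose` helper are my own naming, not anything MuseTalk or PyTorch provides):

```python
from __future__ import annotations

import shutil
import subprocess


def nvidia_smi_ok() -> bool:
    """True if nvidia-smi exists and exits cleanly (driver and GPU visible)."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    return subprocess.run([exe], capture_output=True).returncode == 0


def diagnose(driver_visible: bool, torch_cuda: str | None, cuda_available: bool) -> str:
    """Apply the isolation order: host driver, then torch build, then the pairing."""
    if not driver_visible:
        return "host-driver"            # fix the NVIDIA driver on the host first
    if torch_cuda is None:
        return "cpu-torch"              # CPU-only torch: redo Step 2 with --index-url
    if not cuda_available:
        return "driver-torch-mismatch"  # CUDA build present but unusable
    return "ok"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        print(diagnose(nvidia_smi_ok(), torch.version.cuda, torch.cuda.is_available()))
```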

If `torch.version.cuda` is `None`, **a CPU build is installed**: redo Step 2 with `--index-url`. If `nvidia-smi` fails or shows no GPU, fix the host's **NVIDIA driver** first (inside a container, this is still a host-side fix).

---

## When it won't run on a new GPU (Blackwell / RTX 50xx)

The official pinned versions assume **CUDA 11.7 / PyTorch 2.0.1.** But **new GPU architectures (e.g., the Blackwell generation, RTX 50xx, compute capability 12.0 / `sm_120`)** are **not supported by the cu117 build of torch**, and you can get an error like `CUDA error: no kernel image is available for execution on the device`.
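
You can check for this mismatch before running inference by comparing the device's compute capability with the architectures the installed torch build was compiled for. A simplified sketch, assuming `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()` as the data sources; `arch_supported` is a hypothetical helper, and it ignores PTX (`compute_XY`) entries, so treat a `False` as a strong hint rather than proof:

```python
from __future__ import annotations

import re


def arch_supported(capability: tuple[int, int], arch_list: list[str]) -> bool:
    """True if a compiled sm_XY target has the device's major and a minor <= the device's."""
    major, minor = capability
    for arch in arch_list:
        match = re.fullmatch(r"sm_(\d+)", arch)
        if match is None:
            continue  # PTX ('compute_XY') and suffixed targets are skipped here
        code = int(match.group(1))
        # 'sm_86' -> major 8, minor 6; 'sm_120' -> major 12, minor 0
        if code // 10 == major and code % 10 <= minor:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            print("capability:", cap)
            print("covered by this build:", arch_supported(cap, torch.cuda.get_arch_list()))
        else:
            print("CUDA is not available; nothing to compare")
```

For example, a capability of `(12, 0)` checked against a list that tops out at `sm_86` returns `False`, which matches the `no kernel image` error.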

When that happens, you have to **depart from the official pinned versions.** According to community reports:

- Move to a **newer PyTorch built against a newer CUDA** (a cu12x build).
- That in turn requires adjustments such as moving **Python to 3.12 or similar** and **patching dependencies like mediapipe.**

> ⚠️ **A note on accuracy**: this is **outside the official procedure**, and you need to **re-resolve compatible versions of mmcv/mmdet/mmpose** (when you bump torch, mmcv has to match it). The correct versions are a **moving target**, so **check the [official repo's Issues/Discussions](https://github.com/TMElyralab/MuseTalk/issues) as the primary source.** In production, the iron rule is to **pin the working combination into Docker immediately** and never re-resolve it.

---

## Permanent fix: lock it into Docker so you never do this again

Once you reach a working combination, **bake it into Docker to ensure reproducibility.** This is the only way to escape dependency hell forever.

```dockerfile
# Pin the combination that worked (details in the production-deployment article)
FROM nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
      python3.10 python3-pip git ffmpeg libgl1 libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*
RUN pip3 install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 \
      --index-url https://download.pytorch.org/whl/cu117
COPY requirements.txt .
RUN pip3 install -r requirements.txt \
    && pip3 install -U openmim \
    && mim install "mmengine" "mmcv==2.0.1" "mmdet==3.1.0" "mmpose==1.1.0"
```

The **full picture of production deployment**, including Docker, GPU serving, autoscaling, and cost optimization, is summarized in [MuseTalk production-deployment practice](/blog/musetalk-self-host-production-deployment-docker-gpu-autoscaling).

---

## Frequently asked questions (FAQ)

**Q. Isn't `pip install mmcv` fine?**
A. Often not. It grabs the latest version or a source build and fails to build `mmcv._ext` or becomes inconsistent with mmdet/mmpose. **Always install with `mim` at a pinned version** (2.0.1).

**Q. I want to install with Python 3.11 / 3.12.**
A. The official recommendation is **3.10.** On newer Python you tend to find no prebuilt mmlab wheels and get stuck. Unless **you must bump torch for a new GPU**, sticking to 3.10 is the safe choice.

**Q. Does it work on Windows too?**
A. Yes. Use `download_weights.bat`, and pass the path to a prebuilt ffmpeg binary via `--ffmpeg_path`. But because building the mmlab packages goes more smoothly on Linux, **WSL2 or Docker** is recommended.

**Q. Can I run it on CPU only?**
A. It's not impossible but **extremely slow**. Even on a GPU, the official README notes about 5 minutes for an 8-second video on an RTX 3050 Ti in fp16, and CPU-only will be slower still. A GPU is a prerequisite for practical use. Always confirm that `torch.cuda.is_available()` is `True`.

**Q. What about macOS (Apple Silicon)?**
A. Because it assumes CUDA, it's **not straightforward.** It's realistic to verify on **Linux + NVIDIA GPU** (including a cloud GPU instance) or use a third-party API (fal.ai, etc.).

**Q. So what's the shortest way to try it?**
A. If you want to **skip** environment setup, first check the quality with a [third-party API](/blog/musetalk-realtime-lip-sync-production-guide#使い方a試すだけなら-apifalai--replicate自前gpu不要), and **Dockerize when you reach the stage of operating seriously** — that's the shortest route.

---

## Conclusion: order and pinned versions, and Docker

MuseTalk installation is a **straight road once you know the tricks.**

1. Follow the order of **system deps (libgl1/ffmpeg) → Python 3.10 → PyTorch 2.0.1 (cu117) → requirements → mim install → fetch weights.**
2. Pin **mmcv==2.0.1 / mmdet==3.1.0 / mmpose==1.1.0** with **`mim`.** Whatever `pip install mmcv` pulls in as the latest is a landmine.
3. Errors have fixed causes. Crush them with the **quick reference** (`mmcv._ext` = build mismatch, `CUDA is not available` = CPU torch).
4. New GPUs are outside the official pinned versions. **Use Issues as the primary source**, and pin as soon as it works.
5. **Bake the working combination into Docker.** Reproducibility is the premise of production operation.

Get this far and you'll no longer burn out on environment setup. Next, **actually use it** — head to the mechanism and tuning in the [complete MuseTalk guide](/blog/musetalk-realtime-lip-sync-production-guide).

> I **run the environment setup and Dockerization described in this article in an actual production GPU pipeline.** If you "always get stuck on environment setup" or "want to build a reproducible production environment," see the [case study](/case-studies/ai-video-localization-lipsync) and reach out. With **one person × generative AI**, I build end-to-end from PoC to production: fast, cheap, and safe.

---

## Sources / related resources

- **MuseTalk official**: [GitHub (README, download_weights, requirements)](https://github.com/TMElyralab/MuseTalk) / [Issues (primary source for new-GPU support, etc.)](https://github.com/TMElyralab/MuseTalk/issues)
- **OpenMMLab**: mmcv / mmdet / mmpose (installation via `mim` is officially recommended)
- **Usage / tuning**: [complete MuseTalk guide](/blog/musetalk-realtime-lip-sync-production-guide)
- **Productionization**: [MuseTalk production-deployment practice (Docker/GPU/autoscaling)](/blog/musetalk-self-host-production-deployment-docker-gpu-autoscaling)
- **Model selection**: [AI lip-sync / talking-head model selection guide 2026](/blog/ai-lip-sync-talking-head-model-selection-guide-2026)

Note: versions, supported GPUs, and dependencies change over time. **New-GPU support in particular is fluid**, so **always check the official repo's Issues/README as the primary source.** This article's pinned versions (Python 3.10 / PyTorch 2.0.1 / CUDA 11.7 / mmcv 2.0.1 / mmdet 3.1.0 / mmpose 1.1.0) follow the official README as of writing.
