Complete MuseTalk installation walkthrough — solving the mmcv/mmdet/mmpose dependency hell, CUDA mismatches, new-GPU support, and every common error

The goal of this article

Try MuseTalk and you can burn days on environment setup before you ever reach the model itself. This goes beyond a "common experience"; it is almost a rite of passage. The cause is not MuseTalk itself but the dependency hell of the mmlab ecosystem (mmcv / mmdet / mmpose), which it uses for face and pose detection.

This article is a practical guide to escaping that hell in one pass with the "working combination" that follows the official instructions. It shows the correct install order, resolves common errors by cause, and ends with Docker so that you never have to set up the environment again. The target: someone whose spirit was broken by No module named 'mmcv._ext' should be able to run inference today.

About the author (reliability disclosure): I self-host and run several lip-sync models in production, including MuseTalk. The error remedies in this article are not copied from the docs; they are a record of the landmines I stepped on while rebuilding this environment many times.

30-second summary (conclusion first)

Point	Conclusion
Why it's hard	Not MuseTalk itself: the mmlab family (mmcv/mmdet/mmpose) dependencies are tightly coupled
The only stable answer	Install the official pinned versions via `mim`. Installing the latest with `pip install mmcv` tends to break it
The correct order	system deps → Python 3.10 → PyTorch 2.0.1 (cu117) → requirements → mim install → fetch weights
Pinned versions	`mmcv==2.0.1` / `mmdet==3.1.0` / `mmpose==1.1.0` (these three are a set)
Common errors	missing `mmcv._ext` = build mismatch; `CUDA is not available` = CPU torch or driver; `libGL.so.1` = missing libgl1
New GPU	Blackwell (RTX 50xx) and similar sometimes won't run on the official pinned versions. A newer torch plus patches are needed (confirm against the official Issues)
Permanent fix	Bake it into Docker. Ensure reproducibility and never burn out again

Why is MuseTalk installation hard

Internally MuseTalk uses dwpose (face/body pose) and face detection/parsing, and those depend on the mmlab (OpenMMLab) library family: mmcv, mmdet, mmpose. The versions of these three are tightly coupled, for two reasons:

mmcv builds C++/CUDA extensions matched to the PyTorch and CUDA versions (mmcv._ext).
mmdet / mmpose accept only a specific mmcv version range.

In other words, it only runs once all five (torch ↔ cuda ↔ mmcv ↔ mmdet ↔ mmpose) mesh. Bump even one to the latest and the rest fall like dominoes. This is what is really behind "I ran pip install mmcv and now mmdet dies on import."

Conclusion: don't try to resolve versions yourself. Install the fixed combination the official team verified, in the correct order. That's all there is to it.
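To see whether an existing environment already matches the pinned set, something like the following can help. It is a minimal sketch, not part of the official tooling: the function names are made up for illustration, and it assumes the packages are registered under the distribution names torch, mmcv, mmdet, and mmpose.

```python
from importlib import metadata

# Pinned versions taken from this article; names below are illustrative only.
PINS = {
    "torch": "2.0.1",
    "mmcv": "2.0.1",
    "mmdet": "3.1.0",
    "mmpose": "1.1.0",
}


def installed_versions(names):
    """Return {package: version or None} for each distribution name."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def mismatches(installed, pins=PINS):
    """List (package, wanted, found) for every package that is off its pin.

    A local build suffix such as "+cu117" is ignored in the comparison.
    """
    problems = []
    for name, wanted in pins.items():
        found = installed.get(name)
        base = found.split("+")[0] if found else None
        if base != wanted:
            problems.append((name, wanted, found))
    return problems


if __name__ == "__main__":
    for name, wanted, found in mismatches(installed_versions(PINS)):
        print(f"{name}: want {wanted}, found {found}")
```

If it prints nothing, the four versions line up. It does not check that the mmcv build matches your CUDA; that is what the verification in Step 6 is for.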

The golden path: install these versions in this order

This procedure follows the official GitHub README. Getting the order right is 90% of success.

Step 0: system dependencies (Ubuntu family)

# libGL (required by OpenCV) and ffmpeg for video I/O
sudo apt-get update
sudo apt-get install -y libgl1 libglib2.0-0 ffmpeg

If you forget libgl1, you will trip later on ImportError: libGL.so.1: cannot open shared object file. Install it first.
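A quick way to confirm that both system dependencies are discoverable is sketched below. It uses only the Python standard library; check_system_deps is a hypothetical helper name, and ctypes.util.find_library can return None in minimal containers that lack ldconfig even when the library is present, so treat a "MISSING" result as a hint rather than proof.

```python
import ctypes.util
import shutil


def check_system_deps():
    """Report whether ffmpeg and libGL are discoverable on this machine."""
    return {
        # shutil.which looks for the executable on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # find_library returns a name such as "libGL.so.1", or None if not found
        "libGL": ctypes.util.find_library("GL") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_system_deps().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```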

Step 1: an isolated Python 3.10 environment

conda create -n MuseTalk python==3.10
conda activate MuseTalk

Stick to 3.10. On 3.11/3.12 you can get stuck because no matching mmlab-family wheels are found (the new-GPU exception is covered below).

Step 2: PyTorch 2.0.1 ("explicitly" the CUDA 11.7 build)

pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 \
  --index-url https://download.pytorch.org/whl/cu117

Omit --index-url here and a CPU build or a different CUDA build of torch can get installed. torch.cuda.is_available() then returns False, and you end up with "it runs but is painfully slow" or "it doesn't use the GPU." Make CUDA 11.7 explicit.

Step 3: app dependencies

pip install -r requirements.txt

Step 4: install mmlab pinned versions with `mim` (most important)

pip install -U openmim
mim install "mmengine"
mim install "mmcv==2.0.1"
mim install "mmdet==3.1.0"
mim install "mmpose==1.1.0"

Why mim: mim (MIM Installs OpenMMLab Packages) resolves and installs a prebuilt mmcv wheel matched to your current torch/CUDA. A plain pip install mmcv tends to grab the latest version or fall back to a source build, and then fails to build mmcv._ext. Always install with mim, with versions pinned, and keep the mmcv → mmdet → mmpose order.

Step 5: fetch the model weights

# Official script (Linux). On Windows, use download_weights.bat
sh download_weights.sh

The final tree (main parts):

./models/
├── musetalkV15/   (unet.pth, musetalk.json)   # latest v1.5 main
├── musetalk/      (pytorch_model.bin, ...)     # v1.0
├── sd-vae/        (diffusion_pytorch_model.bin, config.json)
├── whisper/       (pytorch_model.bin, ...)      # audio features
├── dwpose/        (dw-ll_ucoco_384.pth)
├── face-parse-bisent/ (79999_iter.pth, resnet18-5c106cde.pth)
└── syncnet/       (latentsync_syncnet.pt)       # sync evaluation
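
An interrupted download is easy to miss, so it can be worth checking the tree before running inference. The sketch below assumes the layout shown above and only checks the files that are named explicitly there (entries shown as "..." are not guessed); missing_weights is a hypothetical helper, not something the repository provides.

```python
from pathlib import Path

# Files named in the tree above, relative to the models/ directory.
EXPECTED = [
    "musetalkV15/unet.pth",
    "musetalkV15/musetalk.json",
    "musetalk/pytorch_model.bin",
    "sd-vae/diffusion_pytorch_model.bin",
    "sd-vae/config.json",
    "whisper/pytorch_model.bin",
    "dwpose/dw-ll_ucoco_384.pth",
    "face-parse-bisent/79999_iter.pth",
    "face-parse-bisent/resnet18-5c106cde.pth",
    "syncnet/latentsync_syncnet.pt",
]


def missing_weights(models_dir="models", expected=EXPECTED):
    """Return the expected files that are absent or empty under models_dir."""
    root = Path(models_dir)
    missing = []
    for rel in expected:
        path = root / rel
        # A zero-byte file usually means the download was cut off.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    for rel in missing_weights():
        print("missing:", rel)
```

This only checks presence and non-zero size; it does not verify checksums.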

Step 6: verify operation (always do this)

# ① Is the GPU visible? (If False, redo Step 2)
python -c "import torch; print('cuda:', torch.cuda.is_available())"

# ② Can mmcv's CUDA extension be loaded? (Once this passes, the hard part is over)
python -c "from mmcv.ops import RoIAlign; print('mmcv._ext OK')"

# ③ Demo inference (v1.5, normal mode)
sh inference.sh v1.5 normal

If cuda: True and mmcv._ext OK appear, you've broken through dependency hell.

Common-error quick reference (cause → remedy)

Almost every error you will actually hit is explained by this table.

Error / symptom	Cause	Remedy
`No module named 'mmcv._ext'`	mmcv build does not match the current torch/CUDA	run `pip uninstall mmcv mmcv-full -y`, then reinstall with `mim install "mmcv==2.0.1"`
`mmdet` / `mmpose` dies on import	mmcv version mismatch	align all three at the pinned versions (2.0.1 / 3.1.0 / 1.1.0) and keep the order mmcv→mmdet→mmpose
`torch.cuda.is_available()` is `False`	a CPU build of torch got installed / driver mismatch	redo Step 2 with `--index-url .../cu117`. Check the driver with `nvidia-smi`
`ImportError: libGL.so.1`	the system has no libgl1	`sudo apt-get install -y libgl1 libglib2.0-0`
`ffmpeg: command not found` / video write fails	ffmpeg not installed or not on the path	`apt install ffmpeg`; on Windows pass `--ffmpeg_path`
Inference is abnormally slow (CPU-like despite a GPU)	onnxruntime-gpu fell back to CPU, or torch is a CPU build	check that onnxruntime-gpu matches your CUDA/cuDNN. Also recheck torch CUDA (Step 6 ①)
`CUDA out of memory`	long clip / large batch / fp32	add `--use_float16`, lower `--batch_size`, split long clips into segments
Weights not found (`FileNotFoundError`)	download incomplete / wrong path	rerun `download_weights.sh` and check the `models/` tree
Hugging Face download stops midway	network/auth	rerun (it resumes); if needed, `huggingface-cli login` or use a mirror
Gradio starts but no face is detected	profile view / occlusion / multiple faces / low resolution	use frontal, single-face footage. Guard with face detection in preprocessing (pitfalls chapter)
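
If you triage logs often, the table above can be condensed into a small substring lookup. This is only an illustrative sketch: the REMEDIES list and the suggest function are hypothetical names, the remedies are copied from the table, and the matching is deliberately naive (FileNotFoundError in particular can have unrelated causes).

```python
# (substring to look for, remedy) pairs condensed from the quick-reference table.
REMEDIES = [
    ("No module named 'mmcv._ext'",
     'pip uninstall mmcv mmcv-full -y, then mim install "mmcv==2.0.1"'),
    ("libGL.so.1",
     "sudo apt-get install -y libgl1 libglib2.0-0"),
    ("ffmpeg: command not found",
     "apt install ffmpeg (on Windows pass --ffmpeg_path)"),
    ("CUDA out of memory",
     "add --use_float16, lower --batch_size, split long clips"),
    ("FileNotFoundError",
     "rerun download_weights.sh and check the models/ tree"),
]


def suggest(log_text):
    """Return the remedies whose error substring appears in log_text."""
    return [remedy for needle, remedy in REMEDIES if needle in log_text]


if __name__ == "__main__":
    sample = "ImportError: libGL.so.1: cannot open shared object file"
    for remedy in suggest(sample):
        print(remedy)
```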

Deep dive: `No module named 'mmcv._ext'` (most common)

mmcv._ext is mmcv's C++/CUDA extension. Its absence is a sign that no build matching your current torch/CUDA is installed. The fix is always the same.

# Remove any half-installed mmcv completely, then use mim to reinstall a build that matches the current environment
pip uninstall -y mmcv mmcv-full mmcv-lite
mim install "mmcv==2.0.1"
python -c "from mmcv.ops import RoIAlign; print('OK')"

If the install starts building from source, treat it as a danger sign (you are entering the swamp of compilers and the CUDA toolkit). Installing torch correctly first is what lets mim find a prebuilt wheel.

Deep dive: `CUDA is not available`

Isolate the cause in this order.

nvidia-smi   # Are the driver and GPU visible? If not, the problem is on the host (driver not installed)
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# Expected: 2.0.1+cu117 / 11.7 / True

If torch.version.cuda is None, a CPU build is installed: redo Step 2 with --index-url. If nvidia-smi shows nothing, fix the host's NVIDIA driver first (on the host side if you are in a container).
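
The same decision order can be written down as a small function. This is a sketch of the reasoning above rather than an official diagnostic: diagnose is a hypothetical name, and it only distinguishes the two causes this section covers.

```python
def diagnose(version_cuda, is_available):
    """Map two torch values to the likely cause described in this section.

    version_cuda: value of torch.version.cuda (None on a CPU-only build)
    is_available: value of torch.cuda.is_available()
    """
    if version_cuda is None:
        return "CPU-only torch build: redo Step 2 with --index-url"
    if not is_available:
        return "CUDA build, but no usable GPU: check nvidia-smi and the host driver"
    return "OK"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        print(diagnose(torch.version.cuda, torch.cuda.is_available()))
```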

When it won't run on a new GPU (Blackwell / RTX 50xx)

The official pinned versions assume CUDA 11.7 / PyTorch 2.0.1. However, new GPU architectures (for example the Blackwell generation, RTX 50xx, compute capability sm_120) are not supported by the cu117 build of torch, and you can get an error such as CUDA error: no kernel image is available for execution on the device.

In this case you have to depart from the official pinned versions. Per community reports:

Bump to a newer PyTorch built against a newer CUDA (a cu12x build).
Along with that, further adjustments are needed, such as moving Python to 3.12 and patching dependencies like mediapipe.

⚠️ A note for accuracy: this is outside the official procedure, and you must re-resolve compatible versions of mmcv/mmdet/mmpose (when you bump torch, mmcv has to match it). The correct versions are a moving target, so treat the official repo's Issues/Discussions as the primary source. In production, the iron rule is to pin the working combination into Docker immediately and never re-resolve it.
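
Before changing anything, you can check whether the installed torch build actually contains kernels for your GPU. The sketch below compares the device's compute capability against the list of compiled architectures; arch_is_compiled_in is a hypothetical helper, and it makes two simplifying assumptions: it relies on CUDA's rule that a binary built for sm_XY runs on devices of the same major version with an equal or higher minor version, and it ignores PTX entries (compute_XY), which can allow JIT compilation on newer GPUs in some builds.

```python
import re


def arch_is_compiled_in(capability, arch_list):
    """True if some compiled sm_XY entry can run on a GPU with this capability.

    capability: (major, minor), as returned by torch.cuda.get_device_capability()
    arch_list:  list of strings, as returned by torch.cuda.get_arch_list()
    """
    major, minor = capability
    for arch in arch_list:
        # The last digit is the minor version, everything before it the major.
        m = re.fullmatch(r"sm_(\d+)(\d)", arch)
        if m and int(m.group(1)) == major and int(m.group(2)) <= minor:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability()
            archs = torch.cuda.get_arch_list()
            print(cap, archs, arch_is_compiled_in(cap, archs))
        else:
            print("no CUDA device visible")
```

A False result is consistent with the "no kernel image is available" error and suggests that a newer torch build is needed.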

Permanent fix: never do this "again" with Docker

Once you reach a working combination, bake it into Docker to ensure reproducibility. This is the only way to escape dependency hell forever.

# Pin the combination that worked (details are in the production-deployment article)
FROM nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
      python3.10 python3-pip git ffmpeg libgl1 libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*
RUN pip3 install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 \
      --index-url https://download.pytorch.org/whl/cu117
COPY requirements.txt .
RUN pip3 install -r requirements.txt \
    && pip3 install -U openmim \
    && mim install "mmengine" "mmcv==2.0.1" "mmdet==3.1.0" "mmpose==1.1.0"

The full picture of production deployment, including Docker, GPU serving, autoscaling, and cost optimization, is covered in MuseTalk production-deployment practice.

Frequently asked questions (FAQ)

Q. Isn't pip install mmcv fine? A. Often not. It grabs the latest version or falls back to a source build, and then either fails to build mmcv._ext or ends up inconsistent with mmdet/mmpose. Always install with mim at the pinned version (2.0.1).

Q. I want to install with Python 3.11 / 3.12. A. The official recommendation is 3.10. Newer Python tends to get stuck because no matching mmlab wheels are found. Unless you must bump torch for a new GPU, sticking to 3.10 is safe.

Q. Does it work on Windows too? A. It works. Use download_weights.bat, and pass the path to a downloaded ffmpeg binary via --ffmpeg_path. Because mmlab builds go more smoothly on Linux, WSL2 or Docker is recommended.

Q. Can I run it on CPU only? A. It's not impossible but extremely slow. Even on a GPU, the official README notes about 5 minutes for an 8-second video on an RTX 3050 Ti with fp16. Treat a GPU as a prerequisite for practical use, and always confirm that torch.cuda.is_available() is True.

Q. What about macOS (Apple Silicon)? A. Because it assumes CUDA, it's not straightforward. The realistic options are to verify on Linux + NVIDIA GPU (including a cloud GPU instance) or to use a third-party API (fal.ai, etc.).

Q. So what's the shortest way to try it? A. If you want to skip environment setup, first check the quality with a third-party API, then Dockerize once you are ready to run it seriously. That is the shortest route.

Conclusion: order and pinned versions, and Docker

MuseTalk installation is a straight road once you know the tricks.

Follow the order of system deps (libgl1/ffmpeg) → Python 3.10 → PyTorch 2.0.1 (cu117) → requirements → mim install → fetch weights.
Pin mmcv==2.0.1 / mmdet==3.1.0 / mmpose==1.1.0 with mim. The latest version from pip install mmcv is a landmine.
Errors have fixed causes. Resolve them with the quick reference (mmcv._ext = build mismatch, CUDA is not available = CPU torch).
New GPUs are outside the official pinned versions. Use Issues as the primary source, and pin as soon as it works.
Bake the working combination into Docker. Reproducibility is the premise of production operation.

Get this far and you'll no longer burn out on environment setup. Next, actually use it: the mechanism and tuning are covered in the complete MuseTalk guide.

I run the environment setup and Dockerization described in this article in an actual production GPU pipeline. If you "always get stuck on environment setup" or "want to build a reproducible production environment," see the case study and reach out. As one person working with generative AI, I build end-to-end from PoC to production: fast, cheap, and safe.

Sources / related resources

MuseTalk official: GitHub (README, download_weights, requirements) / Issues (primary source for new-GPU support, etc.)
OpenMMLab: mmcv / mmdet / mmpose (installation via mim is officially recommended)
Usage / tuning: complete MuseTalk guide
Productionization: MuseTalk production-deployment practice (Docker/GPU/autoscaling)
Model selection: AI lip-sync / talking-head model selection guide 2026

Versions, supported GPUs, and dependencies change over time. New-GPU support in particular is fluid, so always confirm against the official repo's Issues/README as the primary source. This article's pinned versions (Python 3.10 / PyTorch 2.0.1 / CUDA 11.7 / mmcv 2.0.1 / mmdet 3.1.0 / mmpose 1.1.0) follow the official instructions as of writing.
