The goal of this article
You try out MuseTalk and burn days on environment setup before you ever reach the model itself. This is more than a "common experience"; it's almost a rite of passage. The cause is not MuseTalk itself but the dependency hell of the mmlab ecosystem (mmcv / mmdet / mmpose) that it uses for face/pose detection.
This article is a practical guide to escaping that hell in one pass with a "working combination" that follows the official instructions. It shows the correct install order, resolves common errors by cause, and finally uses Docker so you never have to repeat the setup. The goal: someone whose "spirit was broken by No module named 'mmcv._ext'" can run inference today.
About the author (reliability disclosure): I self-host and operate in production multiple lip-sync models including MuseTalk. The error remedies in this article aren't a copy from the docs but a record of the mines I stepped on actually rebuilding this environment many times.
30-second summary (conclusion first)
| Point | Conclusion |
|---|---|
| Why it's hard | The problem is not MuseTalk itself: the mmlab family (mmcv/mmdet/mmpose) dependencies are tightly coupled |
| The only stable answer | Install the official pinned versions via mim. Installing the latest with pip install mmcv breaks it |
| The correct order | system deps → Python 3.10 → PyTorch 2.0.1 (cu117) → requirements → mim install → fetch weights |
| Pinned versions | mmcv==2.0.1 / mmdet==3.1.0 / mmpose==1.1.0 (these three are a set) |
| Common errors | missing mmcv._ext = build mismatch, CUDA is not available = CPU torch / driver, libGL.so.1 = missing libgl1 |
| New GPU | Blackwell (RTX 50xx) and similar GPUs sometimes won't run on the official pinned versions. A newer torch plus patches are needed (check the official Issues) |
| Permanent fix | Bake it into Docker. Ensure reproducibility and never burn out again |
Why is MuseTalk installation hard
Internally MuseTalk uses dwpose (face/body pose) and face detection/parsing, and those depend on the mmlab (OpenMMLab) library family: mmcv, mmdet, mmpose. These three are tightly coupled through each other's versions:
- mmcv builds C++/CUDA extensions matched to the PyTorch and CUDA versions (mmcv._ext).
- mmdet/mmpose accept only a specific mmcv version range.
In other words, it only runs once all five ("torch ↔ cuda ↔ mmcv ↔ mmdet ↔ mmpose") mesh. Bump even one to the latest and the rest fall like dominoes. This is what is really behind "I ran pip install mmcv and mmdet dies on import."
Conclusion: don't try to resolve versions yourself. Install the fixed combination the official team verified, in the correct order. That's all there is to it.
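To make the coupling concrete, here is a minimal Python sketch of the kind of version gate a downstream package can apply to mmcv. The bounds `MMCV_MIN` and `MMCV_MAX` are assumptions chosen for illustration, not values taken from mmdet or mmpose; the real limits live in each package's own source and change between releases.

```python
import re

# Assumed bounds, for illustration only. Check the package source for real ones.
MMCV_MIN = "2.0.0"
MMCV_MAX = "2.1.0"  # exclusive upper bound


def parse_version(text: str) -> tuple:
    """Turn '2.0.1' into (2, 0, 1); suffixes such as 'rc4' are ignored."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def mmcv_in_range(installed: str, low: str = MMCV_MIN, high: str = MMCV_MAX) -> bool:
    """True when low <= installed < high."""
    return parse_version(low) <= parse_version(installed) < parse_version(high)


print(mmcv_in_range("2.0.1"))  # the pinned version, inside the assumed range
print(mmcv_in_range("2.2.0"))  # a newer release, outside the assumed range
```

Under these assumed bounds, an unpinned install that lands on a newer mmcv falls outside the window, which is the domino effect described above.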
The golden path: install these versions in this order
This procedure follows the official GitHub README. Following the order is 90% of success.
Step 0: system dependencies (Ubuntu family)
# libGL (required by OpenCV) and ffmpeg (for video I/O)
sudo apt-get update
sudo apt-get install -y libgl1 libglib2.0-0 ffmpeg
Forget to install libgl1 and you'll always trip later on ImportError: libGL.so.1: cannot open shared object file. Install it first.
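As a quick pre-flight check for this step, the following sketch reports which of the three apt packages appear to be missing. It is a heuristic built on the standard library (`ctypes.util.find_library` and `shutil.which`), and the function name `missing_system_deps` is made up for this example.

```python
import ctypes.util
import shutil


def missing_system_deps(find_library=ctypes.util.find_library, which=shutil.which) -> list:
    """Return the apt package names that appear to be missing.

    The two lookups are injectable so the logic can be exercised
    without touching the real system.
    """
    missing = []
    if find_library("GL") is None:         # libGL.so.1, needed by OpenCV
        missing.append("libgl1")
    if find_library("glib-2.0") is None:   # libglib-2.0.so.0
        missing.append("libglib2.0-0")
    if which("ffmpeg") is None:            # ffmpeg binary on PATH, for video I/O
        missing.append("ffmpeg")
    return missing


if __name__ == "__main__":
    gaps = missing_system_deps()
    print("missing:", ", ".join(gaps) if gaps else "none")
```

Library lookup behaves differently across platforms, so treat an empty result as a good sign rather than proof.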
Step 1: an isolated Python 3.10 environment
conda create -n MuseTalk python==3.10
conda activate MuseTalk
Stick to 3.10. On 3.11/3.12 you can get stuck without finding mmlab-family wheels (the new-GPU exception is below).
Step 2: PyTorch 2.0.1 ("explicitly" the CUDA 11.7 build)
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 \
--index-url https://download.pytorch.org/whl/cu117
Omit --index-url here and a CPU build or a different CUDA build of torch can get installed, so torch.cuda.is_available() later returns False and you end up with "it runs but is painfully slow" or "it doesn't use the GPU." Make CUDA 11.7 explicit.
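One way to see which wheel you actually got is to read the local build tag that follows the `+` in `torch.__version__`. The sketch below assumes wheels from the PyTorch index carry such a tag (for example `+cu117` or `+cpu`); the helper names are made up for this example.

```python
def torch_build_tag(version: str) -> str:
    """Return the local build tag of a torch version string.

    '2.0.1+cu117' -> 'cu117', '2.0.1+cpu' -> 'cpu', '2.0.1' -> '' (no tag).
    """
    _, _, tag = version.partition("+")
    return tag


def is_expected_build(version: str, expected_tag: str = "cu117") -> bool:
    """True only when the wheel carries the CUDA tag this guide pins."""
    return torch_build_tag(version) == expected_tag


if __name__ == "__main__":
    try:
        import torch  # only needed for the live check
    except ImportError:
        print("torch is not installed")
    else:
        print(torch.__version__, "->", is_expected_build(torch.__version__))
```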
Step 3: app dependencies
pip install -r requirements.txt
Step 4: install mmlab pinned versions with mim (most important)
pip install -U openmim
mim install "mmengine"
mim install "mmcv==2.0.1"
mim install "mmdet==3.1.0"
mim install "mmpose==1.1.0"
Why mim: mim (MIM Installs OpenMMLab Packages) resolves and installs a prebuilt mmcv wheel matched to your current torch/CUDA. pip install mmcv, by contrast, grabs the latest version or falls back to a source build and tends to fail building mmcv._ext. Always install with mim, with versions pinned, and follow the mmcv → mmdet → mmpose order.
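After this step it is worth confirming that the three packages really sit on their pins. The following is a minimal sketch using the standard library's `importlib.metadata`. It assumes the distributions are registered under the names `mmcv`, `mmdet`, and `mmpose`, and the helper names are made up for this example.

```python
from importlib import metadata

# The pinned set from this guide.
PINS = {"mmcv": "2.0.1", "mmdet": "3.1.0", "mmpose": "1.1.0"}


def installed_versions(names) -> dict:
    """Map each distribution name to its installed version, or None."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def pin_mismatches(installed: dict, pins: dict = PINS) -> dict:
    """Return {name: (wanted, actual)} for every package that is off its pin."""
    return {
        name: (wanted, installed.get(name))
        for name, wanted in pins.items()
        if installed.get(name) != wanted
    }


if __name__ == "__main__":
    problems = pin_mismatches(installed_versions(PINS))
    for name, (wanted, actual) in problems.items():
        print(f"{name}: want {wanted}, found {actual}")
    print("pins OK" if not problems else "pins broken")
```

Running it again after any later `pip install` is a cheap way to notice that another package has silently bumped one of the three.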
Step 5: fetch the model weights
# Official script (Linux). On Windows, use download_weights.bat
sh download_weights.sh
The final tree (main parts):
./models/
├── musetalkV15/ (unet.pth, musetalk.json) # latest v1.5 main
├── musetalk/ (pytorch_model.bin, ...) # v1.0
├── sd-vae/ (diffusion_pytorch_model.bin, config.json)
├── whisper/ (pytorch_model.bin, ...) # audio features
├── dwpose/ (dw-ll_ucoco_384.pth)
├── face-parse-bisent/ (79999_iter.pth, resnet18-5c106cde.pth)
└── syncnet/ (latentsync_syncnet.pt) # sync evaluation
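To catch an incomplete download early, a small script can compare the directory against the tree above. This sketch checks only the file names listed there; entries shown as "..." are left out rather than guessed, and the helper name `missing_weights` is made up for this example.

```python
from pathlib import Path

# File names taken from the tree above; "..." entries are deliberately omitted.
EXPECTED_WEIGHTS = [
    "musetalkV15/unet.pth",
    "musetalkV15/musetalk.json",
    "musetalk/pytorch_model.bin",
    "sd-vae/diffusion_pytorch_model.bin",
    "sd-vae/config.json",
    "whisper/pytorch_model.bin",
    "dwpose/dw-ll_ucoco_384.pth",
    "face-parse-bisent/79999_iter.pth",
    "face-parse-bisent/resnet18-5c106cde.pth",
    "syncnet/latentsync_syncnet.pt",
]


def missing_weights(models_dir, expected=EXPECTED_WEIGHTS) -> list:
    """List expected files that are absent or empty (a sign of a cut-off download)."""
    root = Path(models_dir)
    missing = []
    for rel in expected:
        path = root / rel
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    gaps = missing_weights("./models")
    print("all weights present" if not gaps else "missing: " + ", ".join(gaps))
```

The check does not validate file contents, so a truncated but non-empty file would still pass.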
Step 6: verify operation (always do this)
# ① Is the GPU visible? (If False, redo Step 2)
python -c "import torch; print('cuda:', torch.cuda.is_available())"
# ② Can mmcv's CUDA extension be loaded? (If this passes, the hardest part is over)
python -c "from mmcv.ops import RoIAlign; print('mmcv._ext OK')"
# ③ Demo inference (v1.5, normal mode)
sh inference.sh v1.5 normal
If cuda: True and mmcv._ext OK appear, you've broken through dependency hell.
Common-error quick reference (cause → remedy)
The errors you actually get can almost all be explained by this table.
| Error / symptom | Cause | Remedy |
|---|---|---|
| No module named 'mmcv._ext' | mmcv is a build mismatched with the current torch/CUDA | after pip uninstall mmcv mmcv-full -y, reinstall with mim install "mmcv==2.0.1" |
| mmdet / mmpose dies on import | mmcv version mismatch | align the three at pinned versions (2.0.1 / 3.1.0 / 1.1.0). Order too: mmcv→mmdet→mmpose |
| torch.cuda.is_available() is False | a CPU torch got installed / driver mismatch | reinstall Step 2 with --index-url .../cu117. Check the driver with nvidia-smi |
| ImportError: libGL.so.1 | the system has no libgl1 | sudo apt-get install -y libgl1 libglib2.0-0 |
| ffmpeg: command not found / video write fails | ffmpeg not installed / path unknown | apt install ffmpeg; on Windows pass --ffmpeg_path |
| Inference is abnormally slow (CPU-like despite GPU) | onnxruntime-gpu CPU fallback or torch is CPU | confirm the consistency of onnxruntime-gpu with CUDA/cuDNN. Recheck ① torch CUDA too |
| CUDA out of memory | long clip / big batch / fp32 | add --use_float16, lower --batch_size, segment long clips |
| Weights not found (FileNotFoundError) | download incomplete / wrong path | rerun download_weights.sh and check the models/ tree |
| huggingface DL stops midway | network/auth | rerun (resume), if needed huggingface-cli login / use a mirror |
| Gradio starts but no face is detected | profile/occlusion/multiple faces/low resolution | make the material frontal, single-face. Guard with face detection in preprocessing (pitfalls chapter) |
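The table above can also be applied mechanically. The following sketch matches a few of its error signatures against a captured log and returns the corresponding remedies. The signature strings and remedies are copied from the table; the function name is made up for this example, and substring matching is a deliberately crude assumption (a FileNotFoundError, for instance, can have other causes).

```python
# Signature -> remedy pairs, copied from the quick-reference table.
REMEDIES = [
    ("No module named 'mmcv._ext'",
     'pip uninstall mmcv mmcv-full -y, then mim install "mmcv==2.0.1"'),
    ("libGL.so.1", "sudo apt-get install -y libgl1 libglib2.0-0"),
    ("ffmpeg: command not found", "apt install ffmpeg (on Windows pass --ffmpeg_path)"),
    ("CUDA out of memory", "add --use_float16, lower --batch_size, segment long clips"),
    ("FileNotFoundError", "rerun download_weights.sh and check the models/ tree"),
]


def suggest_remedies(log_text: str) -> list:
    """Return the remedies whose signature appears in the log, in table order."""
    return [fix for signature, fix in REMEDIES if signature in log_text]


sample = "ModuleNotFoundError: No module named 'mmcv._ext'"
for fix in suggest_remedies(sample):
    print(fix)
```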
Deep dive: No module named 'mmcv._ext' (most common)
mmcv._ext is mmcv's C++/CUDA extension. Its absence = a sign that a build matched to your current torch/CUDA isn't installed. What to do is fixed.
# Fully remove any half-installed mmcv, then use mim to reinstall the build that matches the current environment
pip uninstall -y mmcv mmcv-full mmcv-lite
mim install "mmcv==2.0.1"
python -c "from mmcv.ops import RoIAlign; print('OK')"
If the install starts building from source, treat it as a danger sign (you are entering the swamp of compilers and the CUDA toolkit). Installing torch correctly first is what lets mim find a prebuilt wheel.
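Before reinstalling, it can help to see whether the extension module is present at all. This sketch uses the standard library's `importlib.util.find_spec` to look for `mmcv._ext` without loading it; the helper name `has_module` is made up for this example.

```python
import importlib.util


def has_module(dotted_name: str) -> bool:
    """True if the module can be located on the current import path.

    find_spec imports parent packages along the way, so a missing or
    broken parent is treated as "not available".
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    print("mmcv     :", has_module("mmcv"))
    print("mmcv._ext:", has_module("mmcv._ext"))
```

A True result only means the file was found. Whether it loads against your torch/CUDA is still decided by the `from mmcv.ops import RoIAlign` check.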
Deep dive: CUDA is not available
Isolate in this order.
nvidia-smi # Are the driver and GPU visible? If not, the problem is on the host side (driver not installed)
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# Expected: 2.0.1+cu117 11.7 True
If torch.version.cuda is None, a CPU build is installed: redo Step 2 with --index-url. If nvidia-smi fails or shows no GPU, fix the NVIDIA driver on the host first (in a container, the fix is still on the host side).
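The isolation order above can be written down as a small decision function. This is a sketch of that triage, not an official diagnostic: the three arguments are the values printed by the one-liner, and the function name and messages are made up for this example.

```python
from typing import Optional


def triage_cuda(torch_version: str, cuda_version: Optional[str], available: bool) -> str:
    """Map the three printed values to the next step described above."""
    if cuda_version is None:
        return "CPU build of torch: redo Step 2 with --index-url"
    if not available:
        return "CUDA build but no usable GPU: check nvidia-smi and the host driver"
    if torch_version != "2.0.1+cu117" or cuda_version != "11.7":
        return "GPU works, but torch is not the pinned 2.0.1+cu117 build"
    return "OK"


print(triage_cuda("2.0.1+cpu", None, False))
print(triage_cuda("2.0.1+cu117", "11.7", False))
print(triage_cuda("2.0.1+cu117", "11.7", True))
```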
When it won't run on a new GPU (Blackwell / RTX 50xx)
The official pinned versions assume CUDA 11.7 / PyTorch 2.0.1. But new GPU architectures (e.g., the Blackwell generation, RTX 50xx, compute capability sm_120, etc.) are not supported by the cu117 build of torch, and you can get an error like CUDA error: no kernel image is available for execution on the device.
In this case, you need to depart from the official pinned versions. Per community reports:
- Bump to a newer PyTorch built for a newer CUDA (a cu12x build).
- Along with that, further adjustments are needed, such as moving to Python 3.12 and patching dependencies like mediapipe.
⚠️ A note for accuracy: this is outside the official procedure, and you need to re-resolve compatible versions of mmcv/mmdet/mmpose (bump torch and match mmcv too). The latest correct versions are a moving target, so confirm the official repo's Issues/Discussions as the primary source. In production, the iron rule is to immediately pin the working combination into Docker and never re-resolve it.
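Whether a given torch build can run on your GPU can be estimated by comparing the device's compute capability with the architectures the build was compiled for. The sketch below is a rough heuristic over the values returned by `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()`. The arch list in the example is an assumption for illustration, not read from a real install.

```python
def arch_supported(capability: tuple, arch_list: list) -> bool:
    """Rough check: can a torch build with `arch_list` run on this GPU?

    capability: (major, minor), e.g. from torch.cuda.get_device_capability().
    arch_list:  e.g. from torch.cuda.get_arch_list(); entries look like
                'sm_86' (compiled kernels) or 'compute_86' (PTX, JIT-compiled).
    """
    major, minor = capability
    for arch in arch_list:
        kind, _, digits = arch.partition("_")
        if len(digits) < 2 or not digits.isdigit():
            continue  # skip entries this sketch does not understand
        a_major, a_minor = int(digits[:-1]), int(digits[-1])
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True  # binary kernels run on the same major, same-or-newer minor
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True  # PTX is forward-compatible with newer GPUs
    return False


# Assumed arch list for illustration; print torch.cuda.get_arch_list() for the real one.
ASSUMED_CU117_ARCHS = ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75", "sm_80", "sm_86"]

print(arch_supported((8, 6), ASSUMED_CU117_ARCHS))   # an 8.6 device
print(arch_supported((12, 0), ASSUMED_CU117_ARCHS))  # sm_120, as named above
```

If the check comes back False for your device, the "no kernel image is available" error is the expected outcome and a newer torch build is the direction to look.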
Permanent fix: never do this "again" with Docker
Once you reach a working combination, bake it into Docker to ensure reproducibility. This is the only way to escape dependency hell forever.
# Pin the working combination (see the production-deployment article for details)
FROM nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
python3.10 python3-pip git ffmpeg libgl1 libglib2.0-0 \
&& rm -rf /var/lib/apt/lists/*
RUN pip3 install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 \
--index-url https://download.pytorch.org/whl/cu117
COPY requirements.txt .
RUN pip3 install -r requirements.txt \
&& pip3 install -U openmim \
&& mim install "mmengine" "mmcv==2.0.1" "mmdet==3.1.0" "mmpose==1.1.0"
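Alongside the image, it can be useful to record the exact versions that ended up installed. `pip freeze` is the usual tool. The sketch below does a similar job with the standard library's `importlib.metadata`, and the function name is made up for this example.

```python
from importlib import metadata


def freeze_lines(dists=None) -> list:
    """Return sorted 'name==version' lines for the given (name, version) pairs.

    With no argument, the pairs are read from the current environment.
    """
    if dists is None:
        dists = [(d.metadata["Name"], d.version) for d in metadata.distributions()]
    # Skip entries with no name, normalise case, and drop duplicates.
    pairs = {(name.lower(), version) for name, version in dists if name}
    return [f"{name}=={version}" for name, version in sorted(pairs)]


if __name__ == "__main__":
    for line in freeze_lines():
        print(line)
```

Note that a line such as torch==2.0.1+cu117 can only be reinstalled with the matching --index-url, so keep the index URL next to the snapshot.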
The full picture of production deployment, including Docker, GPU serving, autoscaling, and cost optimization, is summarized in MuseTalk production-deployment practice.
Frequently asked questions (FAQ)
Q. Isn't pip install mmcv fine?
A. Often not. It grabs the latest version or a source build and fails to build mmcv._ext or becomes inconsistent with mmdet/mmpose. Always install with mim at a pinned version (2.0.1).
Q. I want to install with Python 3.11 / 3.12.
A. The official recommendation is 3.10. Newer Python tends to get stuck without finding mmlab wheels. Except when you must bump torch for a new GPU, sticking to 3.10 is safe.
Q. Does it work on Windows too?
A. It works. Use download_weights.bat, and pass the ffmpeg distributed binary via --ffmpeg_path. But because mmlab's build situation is smoother on Linux, WSL2 or Docker is recommended.
Q. Can I run it on CPU only?
A. It's not impossible, but it is extremely slow. Even on a GPU, the official README notes about 5 minutes for an 8-second video on an RTX 3050 Ti in fp16. A GPU is a prerequisite for practical use. Always confirm torch.cuda.is_available() is True.
Q. What about macOS (Apple Silicon)?
A. Because it assumes CUDA, it's not straightforward. It's realistic to verify on Linux + NVIDIA GPU (including a cloud GPU instance) or use a third-party API (fal.ai, etc.).
Q. So what's the shortest way to try it?
A. If you want to skip environment setup, first check the quality with a third-party API, then Dockerize once you reach the stage of operating seriously. That's the shortest route.
Conclusion: order and pinned versions, and Docker
MuseTalk installation is a straight road once you know the tricks.
- Follow the order of system deps (libgl1/ffmpeg) → Python 3.10 → PyTorch 2.0.1 (cu117) → requirements → mim install → fetch weights.
- Pin mmcv==2.0.1 / mmdet==3.1.0 / mmpose==1.1.0 with mim. The latest from pip install mmcv is a mine.
- Errors have fixed causes. Resolve them with the quick reference (mmcv._ext = build mismatch, CUDA is not available = CPU torch).
- New GPUs are outside the official pinned versions. Use Issues as the primary source, and pin as soon as it works.
- Bake the working combination into Docker. Reproducibility is the premise of production operation.
Get this far and you'll no longer burn out on environment setup. Next, actually use it: head to the mechanism and tuning in the complete MuseTalk guide.
I operate the environment setup / Dockerization of this article in an actual production GPU pipeline. If you "always get stuck on environment setup" or "want to build a reproducible production environment," see the case study and reach out. With one person × generative AI, I build end-to-end from PoC to production — fast, cheap, and safe.
Sources / related resources
- MuseTalk official: GitHub (README, download_weights, requirements) / Issues (primary source for new-GPU support, etc.)
- OpenMMLab: mmcv / mmdet / mmpose (installation via mim is officially recommended)
- Usage / tuning: complete MuseTalk guide
- Productionization: MuseTalk production-deployment practice (Docker/GPU/autoscaling)
- Model selection: AI lip-sync / talking-head model selection guide 2026
- Versions, supported GPUs, and dependencies change over time. New-GPU support in particular is fluid, so always confirm the official repo's Issues/README as the primary source. This article's pinned versions (Python 3.10 / PyTorch 2.0.1 / CUDA 11.7 / mmcv 2.0.1 / mmdet 3.1.0 / mmpose 1.1.0) follow the official instructions as of writing.