友田 陽大
Lip-sync & digital humans
MuseTalk
Troubleshooting
mmcv
CUDA
Python
GPU
Environment setup
Lip-sync

Complete MuseTalk installation walkthrough — solving the mmcv/mmdet/mmpose dependency hell, CUDA mismatches, new-GPU support, and every common error

Solve the mmcv/mmdet/mmpose dependency hell that trips up nearly everyone during MuseTalk setup, using a "working combination" that follows the official instructions. This article covers the correct install order for Python 3.10 / PyTorch 2.0.1 / CUDA 11.7 / mmcv 2.0.1; the causes and remedies for No module named mmcv._ext, CUDA is not available, the missing libGL.so.1, and onnxruntime's CPU fallback; support for new GPUs (Blackwell); and reproducibility with Docker.

Published
Reading time
10 min read
Author
友田 陽大

The goal of this article

You try out MuseTalk and burn days on environment setup before ever touching the model itself. This is so common that it is almost a rite of passage. The cause is not MuseTalk itself but the dependency hell of the mmlab ecosystem (mmcv / mmdet / mmpose), which it uses for face and pose detection.

This article is a practical guide to escaping that hell in one pass with a "working combination" that follows the official instructions. It gives the correct install order, resolves the common errors cause by cause, and ends by freezing the environment in Docker so you never have to set it up again. The aim is that someone worn down by No module named 'mmcv._ext' can run inference today.

About the author (reliability disclosure): I self-host and run several lip-sync models in production, including MuseTalk. The error remedies in this article are not copied from the docs; they record the mines I stepped on while rebuilding this environment many times.


30-second summary (conclusion first)

| Point | Conclusion |
| --- | --- |
| Why it's hard | Because not MuseTalk itself but the mmlab family (mmcv/mmdet/mmpose) dependencies are tightly coupled |
| The only stable answer | Install the official pinned versions via mim. Installing the latest with pip install mmcv breaks it |
| The correct order | system deps → Python 3.10 → PyTorch 2.0.1 (cu117) → requirements → mim install → fetch weights |
| Pinned versions | mmcv==2.0.1 / mmdet==3.1.0 / mmpose==1.1.0 (these three are a set) |
| Common errors | missing mmcv._ext = build mismatch, CUDA is not available = CPU torch / driver, libGL.so.1 = missing libgl1 |
| New GPU | Blackwell (RTX 50xx), etc., sometimes won't run on the official pinned versions. A newer torch + patches are needed (confirm) |
| Permanent fix | Bake it into Docker. Ensure reproducibility and never burn out again |

Why is MuseTalk installation hard

Internally, MuseTalk uses dwpose (face/body pose) and face detection/parsing, and those depend on the mmlab (OpenMMLab) library family: mmcv, mmdet, and mmpose. The versions of these three are tightly coupled to one another, and on top of that:

  • mmcv builds C++/CUDA extensions (mmcv._ext) matched to the PyTorch and CUDA versions.
  • mmdet / mmpose accept only a specific mmcv version range.

In other words, it only runs once all five (torch ↔ cuda ↔ mmcv ↔ mmdet ↔ mmpose) mesh. Bump even one to the latest and the rest collapse like dominoes. This is what is really happening when you run pip install mmcv and mmdet dies on import.

Conclusion: don't try to resolve versions yourself. Install the fixed combination that the official team verified, in the correct order. That is all there is to it.


The golden path: install these versions in this order

This procedure follows the official GitHub README. Following the order is 90% of success.

Step 0: system dependencies (Ubuntu family)

# libGL (required by OpenCV) and ffmpeg (video I/O)
sudo apt-get update
sudo apt-get install -y libgl1 libglib2.0-0 ffmpeg

If you forget libgl1, you will trip later on ImportError: libGL.so.1: cannot open shared object file. Install it first.
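
As a quick preflight for Step 0, the sketch below uses only the Python standard library to check whether ffmpeg is on PATH and whether the dynamic loader can locate libGL. The helper name check_system_deps is an illustrative assumption and is not part of MuseTalk. find_library depends on the platform's loader tooling, so treat a negative result as a hint rather than proof.

```python
import ctypes.util
import shutil


def check_system_deps() -> dict[str, bool]:
    """Report whether the Step 0 system packages look installed.

    Hypothetical helper (not part of MuseTalk). A False value is a hint to
    rerun the apt-get line, not a definitive diagnosis.
    """
    return {
        # ffmpeg must be on PATH for video I/O.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # libGL is the library OpenCV fails to load when libgl1 is missing.
        "libGL": ctypes.util.find_library("GL") is not None,
    }


if __name__ == "__main__":
    for name, found in check_system_deps().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Running it before creating the Python environment makes a missing system package visible up front instead of at the first OpenCV import.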

Step 1: an isolated Python 3.10 environment

conda create -n MuseTalk python==3.10
conda activate MuseTalk

Stick to 3.10. On 3.11/3.12, pip may fail to find mmlab-family wheels and you get stuck (the new-GPU exception is covered below).

Step 2: PyTorch 2.0.1 ("explicitly" the CUDA 11.7 build)

pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 \
  --index-url https://download.pytorch.org/whl/cu117

If you omit --index-url here, a CPU build or a different CUDA build of torch may be installed. torch.cuda.is_available() then returns False later, and you end up with "it runs but is painfully slow" or "it doesn't use the GPU." Make the CUDA 11.7 build explicit.

Step 3: app dependencies

pip install -r requirements.txt

Step 4: install mmlab pinned versions with mim (most important)

pip install -U openmim
mim install "mmengine"
mim install "mmcv==2.0.1"
mim install "mmdet==3.1.0"
mim install "mmpose==1.1.0"

Why mim: mim (from OpenMIM, short for "MIM Installs OpenMMLab Packages") resolves and installs a prebuilt mmcv wheel matched to your current torch/CUDA. pip install mmcv tends to grab the latest version or fall back to a source build, and then fails building mmcv._ext. Always install with mim, with versions pinned, and follow the mmcv → mmdet → mmpose order.
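
To confirm that the pins actually took effect after Step 4, a minimal sketch along these lines compares the installed distributions against the versions listed in this article. The names PINNED, installed_versions, and pin_mismatches are illustrative assumptions. mmengine is left out because the article does not pin it.

```python
from importlib import metadata

# Pinned versions taken from this article; mmengine is not pinned there.
PINNED = {"torch": "2.0.1", "mmcv": "2.0.1", "mmdet": "3.1.0", "mmpose": "1.1.0"}


def installed_versions(names):
    """Return {name: version or None} for the given distribution names."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def pin_mismatches(installed, pinned=PINNED):
    """List human-readable problems; an empty list means every pin matches."""
    problems = []
    for name, want in pinned.items():
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not installed (want {want})")
        # Ignore a local build tag such as "+cu117" when comparing.
        elif have.split("+")[0] != want:
            problems.append(f"{name}: found {have}, want {want}")
    return problems


if __name__ == "__main__":
    issues = pin_mismatches(installed_versions(PINNED))
    print("\n".join(issues) if issues else "all pins match")
```

The comparison is split from the lookup so the same check can be run against a saved version listing from another machine or container.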

Step 5: fetch the model weights

# Official script (Linux). On Windows, use download_weights.bat
sh download_weights.sh

The final tree (main parts):

./models/
├── musetalkV15/   (unet.pth, musetalk.json)   # latest v1.5 main
├── musetalk/      (pytorch_model.bin, ...)     # v1.0
├── sd-vae/        (diffusion_pytorch_model.bin, config.json)
├── whisper/       (pytorch_model.bin, ...)      # audio features
├── dwpose/        (dw-ll_ucoco_384.pth)
├── face-parse-bisent/ (79999_iter.pth, resnet18-5c106cde.pth)
└── syncnet/       (latentsync_syncnet.pt)       # sync evaluation
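
Since an incomplete download only surfaces later as a FileNotFoundError, it can help to verify the tree right after fetching. The following is a minimal sketch that checks only the files named in the tree above; entries elided there with "..." are deliberately not guessed. EXPECTED and missing_weights are illustrative names, and treating a zero-byte file as missing is an assumption of this sketch.

```python
from pathlib import Path

# Only the files named in the tree above; elided entries are not guessed.
EXPECTED = [
    "musetalkV15/unet.pth",
    "musetalkV15/musetalk.json",
    "musetalk/pytorch_model.bin",
    "sd-vae/diffusion_pytorch_model.bin",
    "sd-vae/config.json",
    "whisper/pytorch_model.bin",
    "dwpose/dw-ll_ucoco_384.pth",
    "face-parse-bisent/79999_iter.pth",
    "face-parse-bisent/resnet18-5c106cde.pth",
    "syncnet/latentsync_syncnet.pt",
]


def missing_weights(models_dir="models"):
    """Return the expected files that are absent or empty under models_dir."""
    root = Path(models_dir)
    missing = []
    for rel in EXPECTED:
        path = root / rel
        # Assumption: a zero-byte file may indicate an interrupted download.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    gaps = missing_weights()
    print("all listed weights present" if not gaps else "missing:\n  " + "\n  ".join(gaps))
```

Run it from the repository root so that the default models directory resolves to ./models/.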

Step 6: verify operation (always do this)

# ① Is the GPU visible? (If False, redo Step 2)
python -c "import torch; print('cuda:', torch.cuda.is_available())"

# ② Can mmcv's CUDA extension be loaded? (Once this passes, the hard part is over)
python -c "from mmcv.ops import RoIAlign; print('mmcv._ext OK')"

# ③ Demo inference (v1.5, normal mode)
sh inference.sh v1.5 normal

If cuda: True and mmcv._ext OK appear, you've broken through dependency hell.


Common-error quick reference (cause → remedy)

The errors you actually get can almost all be explained by this table.

| Error / symptom | Cause | Remedy |
| --- | --- | --- |
| No module named 'mmcv._ext' | mmcv is a build mismatched with the current torch/CUDA | after pip uninstall mmcv mmcv-full -y, reinstall with mim install "mmcv==2.0.1" |
| mmdet / mmpose dies on import | mmcv version mismatch | align the three at pinned versions (2.0.1 / 3.1.0 / 1.1.0). Order too: mmcv→mmdet→mmpose |
| torch.cuda.is_available() is False | a CPU torch got installed / driver mismatch | reinstall Step 2 with --index-url .../cu117. Check the driver with nvidia-smi |
| ImportError: libGL.so.1 | the system has no libgl1 | sudo apt-get install -y libgl1 libglib2.0-0 |
| ffmpeg: command not found / video write fails | ffmpeg not installed / path unknown | apt install ffmpeg; on Windows pass --ffmpeg_path |
| Inference is abnormally slow (CPU-like despite GPU) | onnxruntime-gpu CPU fallback or torch is CPU | confirm the consistency of onnxruntime-gpu with CUDA/cuDNN. Recheck ① torch CUDA too |
| CUDA out of memory | long clip / big batch / fp32 | add --use_float16, lower --batch_size, segment long clips |
| Weights not found (FileNotFoundError) | download incomplete / wrong path | rerun download_weights.sh and check the models/ tree |
| huggingface DL stops midway | network/auth | rerun (resume), if needed huggingface-cli login / use a mirror |
| Gradio starts but no face is detected | profile/occlusion/multiple faces/low resolution | make the material frontal, single-face. Guard with face detection in preprocessing (pitfalls chapter) |
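
For the "abnormally slow" row, one way to see whether onnxruntime could use the GPU at all is to look at its list of available execution providers. The sketch below assumes the onnxruntime Python package and its get_available_providers() call; diagnose_providers and current_providers are illustrative names. A listed CUDA provider can still fail to load when a session is created if CUDA/cuDNN do not match, so this is a first check only.

```python
import importlib.util


def diagnose_providers(providers):
    """Classify an onnxruntime provider list (pure function, easy to test)."""
    if "CUDAExecutionProvider" in providers:
        return "CUDA provider available"
    if providers:
        return "CPU fallback likely: no CUDAExecutionProvider in " + ", ".join(providers)
    return "no providers reported"


def current_providers():
    """Return onnxruntime's available providers, or None if it is not installed."""
    if importlib.util.find_spec("onnxruntime") is None:
        return None
    import onnxruntime as ort

    return ort.get_available_providers()


if __name__ == "__main__":
    found = current_providers()
    print("onnxruntime not installed" if found is None else diagnose_providers(found))
```

If only the CPU provider is reported, check which onnxruntime package is installed and whether it matches your CUDA/cuDNN before looking at torch.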

Deep dive: No module named 'mmcv._ext' (most common)

mmcv._ext is mmcv's C++/CUDA extension. If it is missing, no build matching your current torch/CUDA is installed. The fix is always the same.

# Remove any half-installed mmcv completely, then use mim to reinstall the build that matches the current environment
pip uninstall -y mmcv mmcv-full mmcv-lite
mim install "mmcv==2.0.1"
python -c "from mmcv.ops import RoIAlign; print('OK')"

If the install starts building from source, treat it as a danger sign (you are entering the swamp of compilers and the CUDA toolkit). Installing torch correctly first is what lets mim find a prebuilt wheel.
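
Before reinstalling, it can be useful to tell "mmcv is not installed at all" apart from "mmcv is installed without its compiled extension." The sketch below does that with importlib only. extension_status is a hypothetical helper. It locates the extension file but does not load it, so the RoIAlign import above remains the real test for a build mismatch.

```python
import importlib.util


def extension_status(package="mmcv", ext="_ext"):
    """Say whether `package` imports and whether its `ext` submodule exists.

    Note: locating a submodule imports the parent package as a side effect.
    """
    try:
        spec = importlib.util.find_spec(f"{package}.{ext}")
    except ModuleNotFoundError:
        return f"{package} is not installed"
    except ImportError as exc:
        return f"{package} failed to import: {exc}"
    if spec is None:
        return f"{package} is installed but {package}.{ext} is missing"
    return f"{package}.{ext} found at {spec.origin}"


if __name__ == "__main__":
    print(extension_status())
```

An "installed but missing" result points to the uninstall and mim reinstall sequence above.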

Deep dive: CUDA is not available

Isolate in this order.

nvidia-smi   # Are the driver and GPU visible? If not, the problem is on the host (driver not installed)
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# Expected: 2.0.1+cu117 / 11.7 / True

If torch.version.cuda is None, a CPU build is installed, so redo Step 2 with --index-url. If nvidia-smi is not found or shows no GPU, fix the host's NVIDIA driver first (on the host side if you are in a container).
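
The isolation order can be written down as a small decision function. This is a sketch with a hypothetical helper, triage_cuda. Its three inputs are the observations from the commands above, and the returned strings paraphrase the remedies in this article.

```python
def triage_cuda(nvidia_smi_ok, torch_cuda_version, cuda_available):
    """Map the three observations to a next step.

    nvidia_smi_ok: did nvidia-smi run and list a GPU?
    torch_cuda_version: value of torch.version.cuda (None on CPU builds).
    cuda_available: value of torch.cuda.is_available().
    """
    # Check the host first: nothing in Python can fix a missing driver.
    if not nvidia_smi_ok:
        return "host: fix the NVIDIA driver first (on the host if in a container)"
    # A CPU-only torch build reports no CUDA version at all.
    if torch_cuda_version is None:
        return "torch: CPU build installed; redo Step 2 with --index-url"
    # CUDA build present, GPU visible to the driver, but torch cannot use it.
    if not cuda_available:
        return "mismatch: compare the driver version with the CUDA build of torch"
    return "ok"


if __name__ == "__main__":
    # Example values for a machine with a working driver but a CPU-only torch.
    print(triage_cuda(True, None, False))
```

With torch installed, the last two arguments are simply torch.version.cuda and torch.cuda.is_available().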


When it won't run on a new GPU (Blackwell / RTX 50xx)

The official pinned versions assume CUDA 11.7 / PyTorch 2.0.1. However, new GPU architectures (for example the Blackwell generation, RTX 50xx, compute capability sm_120) are not supported by the cu117 build of torch, and you can get an error such as CUDA error: no kernel image is available for execution on the device.

In this case you have to depart from the official pinned versions. According to community reports:

  • Move to a newer PyTorch built against a newer CUDA (a cu12x build).
  • Along with that, further adjustments are needed, such as moving Python to 3.12 and patching dependencies like mediapipe.

⚠️ A note for accuracy: this is outside the official procedure, and you have to re-resolve compatible versions of mmcv/mmdet/mmpose yourself (if you bump torch, match mmcv to it). The correct versions are a moving target, so treat the official repo's Issues/Discussions as the primary source. In production, the iron rule is to pin the working combination into Docker immediately and never re-resolve it.
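
To see why a "no kernel image" error appears, you can compare the GPU's compute capability with the architectures a torch build was compiled for. In a real environment these come from torch.cuda.get_device_capability() and torch.cuda.get_arch_list(). The sketch below takes them as plain arguments so it runs without a GPU. The compatibility rule it applies is a simplified rule of thumb, and parse_arch and build_supports are illustrative names.

```python
import re


def parse_arch(tag):
    """Turn 'sm_86' or 'compute_120' into (kind, (major, minor))."""
    kind, digits = tag.split("_", 1)
    digits = re.match(r"\d+", digits).group()  # drop suffixes such as the 'a' in 'sm_90a'
    return kind, (int(digits[:-1]), int(digits[-1]))


def build_supports(capability, arch_list):
    """Rough check: can a build compiled for arch_list run on this GPU?

    Simplified assumption of this sketch: an sm_XY binary runs on a GPU with
    the same major version and an equal or higher minor version, and a
    compute_XY (PTX) entry can be JIT-compiled for an equal or newer GPU.
    """
    major, minor = capability
    for tag in arch_list:
        kind, (a_major, a_minor) = parse_arch(tag)
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


if __name__ == "__main__":
    # Illustrative arch list, not the verified output of any specific build.
    example_build = ["sm_60", "sm_70", "sm_75", "sm_80", "sm_86"]
    print("sm_120 supported:", build_supports((12, 0), example_build))
```

If the check fails for your GPU, no reinstall of the same torch build will help; only a build compiled for that architecture will.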


Permanent fix: never do this "again" with Docker

Once you reach a working combination, bake it into Docker to ensure reproducibility. This is the only way to escape dependency hell forever.

# Pin the working combination (details in the production-deployment article)
FROM nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
      python3.10 python3-pip git ffmpeg libgl1 libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*
RUN pip3 install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 \
      --index-url https://download.pytorch.org/whl/cu117
COPY requirements.txt .
RUN pip3 install -r requirements.txt \
    && pip3 install -U openmim \
    && mim install "mmengine" "mmcv==2.0.1" "mmdet==3.1.0" "mmpose==1.1.0"

The full picture of production deployment, including Docker, GPU serving, autoscaling, and cost optimization, is summarized in MuseTalk production-deployment practice.


Frequently asked questions (FAQ)

Q. Isn't pip install mmcv fine? A. Often not. It grabs the latest version or a source build, and then either fails to build mmcv._ext or ends up inconsistent with mmdet/mmpose. Always install the pinned version (2.0.1) with mim.

Q. I want to install with Python 3.11 / 3.12. A. The official recommendation is 3.10. On newer Python, pip tends to find no mmlab wheels and you get stuck. Unless you must bump torch for a new GPU, staying on 3.10 is the safe choice.

Q. Does it work on Windows too? A. Yes. Use download_weights.bat, and pass the path of a prebuilt ffmpeg binary via --ffmpeg_path. Because building the mmlab packages goes more smoothly on Linux, WSL2 or Docker is recommended.

Q. Can I run it on CPU only? A. It is not impossible, but it is extremely slow. Even on a GPU, the official README notes about 5 minutes for an 8-second video on an RTX 3050 Ti (fp16). A GPU is a prerequisite for practical use. Always confirm that torch.cuda.is_available() is True.

Q. What about macOS (Apple Silicon)? A. Because it assumes CUDA, it's not straightforward. The realistic options are to verify on Linux + NVIDIA GPU (including a cloud GPU instance) or to use a third-party API (fal.ai, etc.).

Q. So what's the shortest way to try it? A. If you want to skip environment setup, first check the quality with a third-party API, then Dockerize once you reach the stage of running it seriously. That is the shortest route.


Conclusion: order and pinned versions, and Docker

MuseTalk installation is a straight road once you know the tricks.

  1. Follow the order: system deps (libgl1/ffmpeg) → Python 3.10 → PyTorch 2.0.1 (cu117) → requirements → mim install → fetch weights.
  2. Pin mmcv==2.0.1 / mmdet==3.1.0 / mmpose==1.1.0 with mim. Installing the latest with pip install mmcv is a mine.
  3. Errors have fixed causes. Resolve them with the quick reference (mmcv._ext = build mismatch, CUDA is not available = CPU torch).
  4. New GPUs are outside the official pinned versions. Use Issues as the primary source, and pin as soon as it works.
  5. Bake the working combination into Docker. Reproducibility is the premise of production operation.

Get this far and you will no longer burn out on environment setup. Next, actually use it: the complete MuseTalk guide covers the mechanism and tuning.

I run the environment setup and Dockerization described in this article in an actual production GPU pipeline. If you "always get stuck on environment setup" or "want to build a reproducible production environment," see the case study and reach out. Working solo with generative AI, I build end to end from PoC to production: fast, cheap, and safe.


Note: Versions, supported GPUs, and dependencies change over time. New-GPU support in particular is fluid, so always check the official repo's Issues/README as the primary source. This article's pinned versions (Python 3.10 / PyTorch 2.0.1 / CUDA 11.7 / mmcv 2.0.1 / mmdet 3.1.0 / mmpose 1.1.0) follow the official instructions as of writing.
友田 陽大

Developer of a METI Minister's Award–winning product. With TypeScript + Python + AWS, I deliver SaaS, industry DX, and production-grade generative AI (RAG) end to end — from requirements to infrastructure and operations — single-handedly.
