MixerMDM: Learnable Composition of Human Motion Diffusion Models

CVPR 2025

1 Universidad de Alicante, Spain; 2 Universitat de Barcelona and Computer Vision Center, Spain; 3 King's College London, UK
MixerMDM

📌 Abstract 📌

Generating human motion guided by conditions such as textual descriptions is challenging due to the need for datasets with pairs of high-quality motion and their corresponding conditions. The difficulty increases when aiming for finer control in the generation. To that end, prior works have proposed to combine several motion diffusion models pre-trained on datasets with different types of conditions, thus allowing control with multiple conditions. However, the proposed merging strategies overlook that the optimal way to combine the generation processes might depend on the particularities of each pre-trained generative model and also the specific textual descriptions. In this context, we introduce MixerMDM, the first learnable model composition technique for combining pre-trained text-conditioned human motion diffusion models. Unlike previous approaches, MixerMDM provides a dynamic mixing strategy that is trained in an adversarial fashion to learn to combine the denoising process of each model depending on the set of conditions driving the generation. By using MixerMDM to combine single- and multi-person motion diffusion models, we achieve fine-grained control on the dynamics of every person individually, and also on the overall interaction. Furthermore, we propose a new evaluation technique that, for the first time in this task, measures the interaction and individual quality by computing the alignment between the mixed generated motions and their conditions as well as the capabilities of MixerMDM to adapt the mixing throughout the denoising process depending on the motions to mix.

💃 Method 💃

MixerMDM creates new motion sequences by seamlessly blending, at each step of the denoising process, motions produced by two pre-trained models. This fusion is driven by the Mixing procedure, guided by a mixing weight dynamically predicted by the Mixer. Trained through a novel Adversarial Training approach, MixerMDM delivers consistent and highly controllable mixed motions that preserve the essential characteristics of the original pre-trained models, outperforming all prior methods.

Mixer
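
The per-step mixing described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the mixing is a convex combination controlled by a single scalar weight, and it replaces both pre-trained models and the learned Mixer with hypothetical toy functions (`model_a`, `model_b`, `toy_mixer`). It also feeds the mixed prediction directly into the next step, whereas a real diffusion sampler would apply its own noise schedule between steps.

```python
import math
import random


def mix(pred_a, pred_b, weight):
    """Convex combination of two predictions, element by element (assumed form)."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(pred_a, pred_b)]


def toy_mixer(pred_a, pred_b, step, num_steps):
    """Hypothetical stand-in for the learned Mixer: returns a weight in (0, 1).

    It only looks at the denoising step; a learned Mixer could also use the
    two predictions and the conditions driving the generation.
    """
    progress = step / num_steps  # 1.0 at the noisiest step, small near the end
    return 1.0 / (1.0 + math.exp(-4.0 * (progress - 0.5)))


def model_a(x, step):
    """Toy 'pre-trained model' A: pulls every value towards +1."""
    return [0.5 * v + 0.5 for v in x]


def model_b(x, step):
    """Toy 'pre-trained model' B: pulls every value towards -1."""
    return [0.5 * v - 0.5 for v in x]


def sample(first_model, second_model, mixer, x_init, num_steps):
    """Run a simplified denoising loop that mixes both predictions at each step."""
    x = list(x_init)
    weights = []
    for step in range(num_steps, 0, -1):
        pred_a = first_model(x, step)
        pred_b = second_model(x, step)
        w = mixer(pred_a, pred_b, step, num_steps)  # weight predicted per step
        x = mix(pred_a, pred_b, w)                  # simplification: no re-noising
        weights.append(w)
    return x, weights


random.seed(0)
x_init = [random.gauss(0.0, 1.0) for _ in range(4)]  # stand-in for a noisy motion
motion, weights = sample(model_a, model_b, toy_mixer, x_init, num_steps=10)
print([round(w, 3) for w in weights])
```

In this toy setup the weight favours model A early in the process and model B towards the end, which is only meant to show that the balance between the two models can change from step to step rather than being fixed in advance.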

🥇 Results 🥇

Quantitatively, we’ve developed a robust methodology to assess model composition techniques, under which MixerMDM outperforms all previous methods in both Alignment and Adaptability. Qualitatively, MixerMDM stands out for its consistency in producing mixed motions that align with their conditioning. MixerMDM also excels at controllability, generating fine-grained individual variations of interaction motions. We’ve validated all these claims through an extensive user study.

🕹️ Controllability 🕹️

Text Interaction: Two persons are in a boxing match when suddenly one person throws a kick
Text Individual 1: An individual throws a kick with his right leg
Text Individual 2: An individual is boxing
Methods shown: MixerMDM, Diff.Blending, DualMDM, in2IN, Finetuned

Text Interaction: Two people salute to each other
Text Individual 1: An individual bows forward
Text Individual 2: An individual raises their right arm and waves it
Methods shown: MixerMDM, Diff.Blending, DualMDM, in2IN, Finetuned

🔁 Consistency 🔁

Text Interaction: Two persons are in a boxing match when suddenly one person throws a kick
Text Individual 1: An individual throws a kick with his right leg
Text Individual 2: An individual is boxing
Samples shown: 3 × MixerMDM, 3 × DualMDM

Text Interaction: Two people salute to each other
Text Individual 1: An individual bows forward
Text Individual 2: An individual raises their right arm and waves it
Samples shown: 3 × MixerMDM, 3 × DualMDM

⛓️‍💥 BibTeX ⛓️‍💥

@misc{ruizponce2025mixermdmlearnablecompositionhuman,
      title={MixerMDM: Learnable Composition of Human Motion Diffusion Models}, 
      author={Pablo Ruiz-Ponce and German Barquero and Cristina Palmero and Sergio Escalera and José García-Rodríguez},
      year={2025},
      eprint={2504.01019},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.01019}, 
}