MAMMA: Markerless Accurate Multi-person Motion Acquisition

Hanz Cuevas Velasquez1*, Anastasios Yiannakidis1*, Soyong Shin2, Giorgio Becherini1, Markus Höschle1, Joachim Tesch1, Taylor Obersat1, Tsvetelina Alexiadis1, Eni Halilaj2, Michael J. Black1

1Max Planck Institute for Intelligent Systems, Tübingen 2Carnegie Mellon University

*Equal contribution

[CVPR 2026 Oral] | Paper | arXiv | Project Page | Datasets


News

[2026-06] 🎉 MAMMA is being presented at CVPR 2026
[2026-06] Code released (inference + training)

Install

```bash
git clone https://github.com/cuevhv/mamma.git
cd mamma
```

Full env + CUDA + weights setup: docs/INSTALL.md.

```bash
micromamba activate mamma          # or: conda activate mamma
python -m inference doctor         # verify env vars + weight paths
```

The pipeline is zero-config when weights live under data/.

Quick demo

Bundled 4-cam example, ~56 MB:

```bash
bash data/download_example.sh   # fetches videos to data/mamma_example/
```

```bash
python -m inference run \
  --cfg      configs/examples/presets/quick.yaml \
  --footage  data/mamma_example \
  --seq_name pushing_and_lifting_from_ground \
  --calib    configs/examples/calib/iphones_outdoors.yaml \
  --out-tag  demo -v
```

Outputs land under output/ma_*/demo/mamma_example/….
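
To collect what a run produced, the `output/ma_*/<tag>/` pattern can be globbed from a script. The following is a minimal sketch, assuming each `ma_*` directory corresponds to one pipeline step and that the tag directory sits directly beneath it; `find_run_outputs` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def find_run_outputs(output_root, tag):
    """Return {step_dir_name: path} for every <output_root>/ma_*/<tag> directory.

    Assumption: step outputs live at <output_root>/ma_*/<tag>/, as the
    output pattern in this README suggests. Hypothetical helper.
    """
    root = Path(output_root)
    found = {}
    for step_dir in sorted(root.glob("ma_*")):
        tag_dir = step_dir / tag
        if tag_dir.is_dir():  # skip steps that did not run under this tag
            found[step_dir.name] = tag_dir
    return found


if __name__ == "__main__":
    # After the quick demo, list the per-step folders written under the "demo" tag.
    for step, path in find_run_outputs("output", "demo").items():
        print(f"{step}: {path}")
```

Steps that were disabled in the preset simply do not appear in the returned mapping.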

Prefer a browser UI? Run `bash gui/scripts/dev.sh`, open http://localhost:3000, and click Run demo. It runs the same pipeline with a friendlier UX.

Pipeline

ma_cap → ma_masks → ma_2d → ma_3d → ma_vis

| Step | What it does |
| --- | --- |
| `ma_cap` | Loads multi-view capture |
| `ma_masks` | Per-person segmentation (SAM + YOLO) |
| `ma_2d` | 2D landmark detection (MammaNet) |
| `ma_3d` | Multi-view SMPL-X optimization |
| `ma_vis` | Per-camera overlays + interactive scene |
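
When scripting around the pipeline, the step order can be kept as a small ordered table. This is an illustrative sketch only: the source directories are taken from the Layout section of this README, the idea that each step needs every step before it is an assumption read off the arrow diagram, and `upstream_steps` is a hypothetical helper rather than repository code.

```python
# Ordered pipeline steps: (step name, source directory, role).
PIPELINE = [
    ("ma_cap", "capture", "loads multi-view capture"),
    ("ma_masks", "segmentation", "per-person segmentation"),
    ("ma_2d", "landmarks", "2D landmark detection"),
    ("ma_3d", "optimization", "multi-view SMPL-X optimization"),
    ("ma_vis", "visualization", "per-camera overlays + interactive scene"),
]

STEP_NAMES = [name for name, _, _ in PIPELINE]


def upstream_steps(step):
    """Return the steps that come before `step` in the chain (assumed linear)."""
    if step not in STEP_NAMES:
        raise ValueError(f"unknown step: {step!r}")
    return STEP_NAMES[: STEP_NAMES.index(step)]


if __name__ == "__main__":
    print(upstream_steps("ma_3d"))  # ['ma_cap', 'ma_masks', 'ma_2d']
```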

Entry point: python -m inference run (source: inference/cli/run.py).

| Argument | What it is |
| --- | --- |
| `--cfg` / `--preset` | Pipeline-configuration YAML: declares which steps run and their hyperparameters. Capture-independent. (what a preset is + how to modify one) |
| `--footage` | Dataset root containing sequence subdirs (use with `--seq_name` + `--calib`). (layout reference) |
| `--seq_name` | One sequence subdirectory name under `--footage` to process (one run = one sequence). |
| `--calib` | Calibration file (`.yaml` / `.xcp` / OpenCV `.json`); applies to every sequence under `--footage`. (format reference) |
| `--capture` | Advanced: capture JSON pointing at footage, calibration, sequences, and camera names; used to iterate over many sequences in one invocation. (schema reference) |
| `--out-tag` | Output sub-directory tag under `output/ma_*/<tag>/` (default: `local`). |
| `-v` | Verbose runner logs. |
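
If the runner is launched from another Python script, the documented flags can be assembled into an argument list first. The sketch below is a hypothetical wrapper (`build_run_command` does not exist in the repository); it uses only the flags listed above and assumes it is executed from the repository root inside the `mamma` environment.

```python
import shlex
import sys


def build_run_command(cfg, footage, seq_name, calib, out_tag="local", verbose=False):
    """Assemble argv for `python -m inference run` from the documented flags.

    Hypothetical helper; `out_tag` defaults to "local", matching the
    documented default of --out-tag.
    """
    cmd = [
        sys.executable, "-m", "inference", "run",
        "--cfg", str(cfg),
        "--footage", str(footage),
        "--seq_name", str(seq_name),
        "--calib", str(calib),
        "--out-tag", str(out_tag),
    ]
    if verbose:
        cmd.append("-v")
    return cmd


if __name__ == "__main__":
    cmd = build_run_command(
        "configs/examples/presets/quick.yaml",
        "data/mamma_example",
        "pushing_and_lifting_from_ground",
        "configs/examples/calib/iphones_outdoors.yaml",
        out_tag="demo",
        verbose=True,
    )
    print(shlex.join(cmd))  # shell-quoted form, handy for logging
    # To launch it: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list rather than a shell string avoids quoting problems when footage paths contain spaces.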

Run the pipeline

Three things are needed:

1. A calibration file (how to make one)
2. A folder with your sequence (how to set it up)
3. A preset. Use a shipped one: `configs/examples/presets/quick.yaml` (~5 min smoke test) or `configs/examples/presets/full.yaml` (full-frame). See docs/CONFIGS.md to modify a preset or author your own.

Then:

```bash
python -m inference run \
  --cfg      <path/to/preset>.yaml \
  --footage  <path/to/footage> \
  --seq_name <seq_name> \
  --calib    <path/to/calib>.yaml \
  --out-tag  run01 -v
```

Alternative — iterate over many sequences in one invocation. A capture JSON enumerates sequences, cameras, and the calibration in one file; the runner walks them automatically:

```bash
python -m inference run \
  --cfg     <path/to/preset>.yaml \
  --capture <path/to/capture>.json \
  --out-tag run01 -v
```
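
For a handful of sequences that share one calibration, the single-sequence form can also be looped from a script instead of authoring a capture JSON. This is a minimal sketch, assuming every immediate subdirectory of the footage root is one sequence; `list_sequences` and `commands_for_footage` are hypothetical helpers, and only the flags documented above are used.

```python
import shlex
import sys
from pathlib import Path


def list_sequences(footage):
    """Sorted names of the immediate subdirectories of the footage root."""
    return sorted(p.name for p in Path(footage).iterdir() if p.is_dir())


def commands_for_footage(cfg, footage, calib, out_tag="local"):
    """Build one `python -m inference run` argv per sequence (one run = one sequence)."""
    return [
        [sys.executable, "-m", "inference", "run",
         "--cfg", str(cfg), "--footage", str(footage),
         "--seq_name", seq, "--calib", str(calib),
         "--out-tag", out_tag, "-v"]
        for seq in list_sequences(footage)
    ]


if __name__ == "__main__":
    footage_root = Path("data/mamma_example")
    if footage_root.is_dir():
        for cmd in commands_for_footage(
            "configs/examples/presets/quick.yaml",
            footage_root,
            "configs/examples/calib/iphones_outdoors.yaml",
            out_tag="run01",
        ):
            print(shlex.join(cmd))
            # To execute instead of printing:
            # import subprocess; subprocess.run(cmd, check=True)
```

The capture JSON remains the better fit when sequences differ in cameras or calibration, since this loop applies one calibration file to everything under the footage root.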

GUI

Browser UI for submitting and inspecting runs. It uses the same `mamma` Python environment as the command-line pipeline.

```bash
gui/scripts/dev.sh        # dev: Flask :8000 + Vite :3000 (auto-reload)
gui/scripts/prod.sh       # prod: single Flask process on :8000
```

Setup and deployment: gui/README.md.

MAMMA datasets

The paper's released captures, evaluation data, and synthetic training data live on the MAMMA project page and require a free account.

1. Register at https://mamma.is.tue.mpg.de/ and confirm your email.
2. Either use the GUI's Pipeline assets panel (sign in once, click to download), or run the per-dataset shell scripts under `data/`:
```bash
bash data/download_mamma_dance.sh --bachata --meta --pred --videos_crf24
```

Five dataset families ship: dance, multi-person, iPhone, eval, and synthetic. Per-dataset sizes, video encodings, and the full script flag surface live in docs/DATASETS.md.

Just running on your own footage? You don't need any of this — see Run the pipeline above.

Layout

.
├── inference/       runner, step builders, doctor CLI
├── capture/         ma_cap step
├── segmentation/    ma_masks step
├── landmarks/       ma_2d step
├── optimization/    ma_3d step
├── visualization/   ma_vis step
├── configs/         presets + capture manifests
├── data/            body models + weights + datasets (gitignored)
├── output/          run outputs (gitignored)
├── gui/             browser UI (Flask + React)
└── scripts/         smoke tests + utilities

TODO

Release the evaluation scripts (2D landmark + benchmark evaluation) and the processed evaluation datasets.

Citation

@inproceedings{cuevas2026mamma,
  title     = {{MAMMA}: {Markerless Accurate Multi-person Motion Acquisition}},
  author    = {Cuevas Velasquez, Hanz and Yiannakidis, Anastasios and Shin, Soyong and Becherini, Giorgio and H{\"o}schle, Markus and Tesch, Joachim and Obersat, Taylor and Alexiadis, Tsvetelina and Halilaj, Eni and Black, Michael J.},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}