The Great March 100
We don't aim to create just another benchmark; instead, GM-100 serves as a foundational task list for evaluating embodied AI systems in real-world settings.

Comprehensive Task Design Pipeline
GM-100 comprises 100+ carefully curated tasks derived from a rigorous analysis of human-object interaction primitives and object affordances. This design philosophy ensures coverage of diverse, long-tail, and rare behaviors that critically test the generalization limits of robotic agents.
Diverse Skills
Challenging Cases
Feasible to Execute
Multi-camera
For each task, we provide over 130 expert teleoperation demonstrations.
We also present benchmark results from at least three baseline models, accompanied by evaluation videos that help convey task complexity and scoring standards.
This website is currently in a preview phase. Some data may be missing, but we are working to complete it in the coming days. Thank you for your understanding.
Recent Updates
Added language prompts for each task in the dataset to facilitate easier understanding and usage.
Task Showcase
Explore the 100 diverse manipulation tasks included in the benchmark. Click on any card to view detailed performance metrics and download data.
Interactive Leaderboard
Compare model performance across benchmark tasks. Hover for details, click to explore.