The Great March 100

By RHOS AI Team

We don't aim to create just another benchmark; instead, GM-100 serves as a foundational task list for evaluating embodied AI systems in real-world settings.

Pipeline Diagram

Comprehensive Task Design Pipeline

"GM-100 comprises 100+ carefully curated tasks derived from a rigorous analysis of human-object interaction primitives and object affordances. This design philosophy ensures coverage of diverse, long-tail, and rare behaviors that critically test the generalization limits of robotic agents."

100+ Tasks: Diverse Skills

High Difficulty: Challenging Cases

3+ Baselines: Feasible to Execute

3+ Views: Multi-camera

For each task, we provide over 130 expert teleoperation demonstrations.

We also showcase benchmark results from at least 3 baseline models, accompanied by evaluation videos to help assess task complexity and scoring standards.

This website is currently in a preview phase. Some data may be missing, but we are working to complete it in the coming days. Thank you for your understanding.

Recent Updates

2026.02.03: Task Object Purchase Links Added

A comprehensive list of purchase links for the physical objects used in each task has been added. You can access it via the 'Task Object Purchase Links' button. We will continue to update and maintain this list to ensure accessibility.

2026.01.30: Agilex Cobot Magic Dataset

The dataset (preview version, LeRobot 2.1 format, with the training and test sets combined in one package) has been fully uploaded and is available for download. A few task prompts may contain minor issues, which we will address in the upcoming full release.
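Because the training and test sets ship together, users will need some way to tell the episodes apart. The sketch below is one possible approach, not the release's documented layout: it assumes the dataset metadata carries a `splits` mapping of `"start:end"` episode ranges, as LeRobot v2.x `meta/info.json` files commonly do. The split names and index values are hypothetical.

```python
import json


def parse_splits(info: dict) -> dict[str, range]:
    """Convert a {"name": "start:end"} mapping into half-open episode ranges."""
    ranges = {}
    for name, spec in info.get("splits", {}).items():
        start, end = (int(part) for part in spec.split(":"))
        ranges[name] = range(start, end)
    return ranges


# Hypothetical metadata for illustration only; not taken from the GM-100 release.
example_info = json.loads('{"splits": {"train": "0:8", "test": "8:10"}}')
splits = parse_splits(example_info)

for name, episodes in splits.items():
    print(name, list(episodes))
```

If the actual release marks the split differently (for example, by directory or by a per-episode field), only `parse_splits` would need to change.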

Task Showcase

Explore the 100 diverse manipulation tasks included in the benchmark. Click on any card to view detailed performance metrics and download data.

Task Object Purchase Links (Google Sheets)
#001 Hitting Ball into Goal (00:15, Train/Test)
#002 Slicing the Object (00:15, Train/Test)
#003 Stamping with Seal (00:15, Train/Test)
#004 Box Building (00:15, Train/Test)
#005 Closing Desktop Drawer (00:15, Train/Test)
#006 Threading Hawthorn Skewers (00:15, Train/Test)
#007 Trash Disposal (00:15, Train/Test)
#008 Transferring Test Tubes (00:15, Train/Test)
#009 Turning on Desk Lamp (00:15, Train/Test)
#010 Sorting Cubes by Size (00:15, Train/Test)
#011 Wiping Whiteboard (00:15, Train/Test)
#012 Hammering Button (00:15, Train/Test)

Interactive Leaderboard

Compare model performance across benchmark tasks. Hover for details, click to explore.

Filters: Robot Arm, Evaluation Metric, Select Models to Compare
Performance Scale: 0% to 100%
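As a rough illustration of how per-task results could be placed on a 0% to 100% scale, the sketch below averages per-task success rates for each model. This is an assumption about one plausible metric, not the leaderboard's actual scoring standard; the model names, task IDs, and trial outcomes are made up for the example.

```python
from statistics import mean


def success_rate(trials: list[bool]) -> float:
    """Percentage of successful trials for one task."""
    if not trials:
        raise ValueError("at least one trial is required")
    return 100.0 * sum(trials) / len(trials)


def leaderboard(results: dict[str, dict[str, list[bool]]]) -> dict[str, float]:
    """Mean per-task success rate (in percent) for each model."""
    return {
        model: mean(success_rate(trials) for trials in tasks.values())
        for model, tasks in results.items()
    }


# Hypothetical outcomes for illustration only; not real benchmark results.
toy_results = {
    "model_a": {"#001": [True, False, True, True], "#002": [False, False, True, False]},
    "model_b": {"#001": [True, True], "#002": [True, False]},
}
scores = leaderboard(toy_results)
print(scores)
```

Averaging per task rather than pooling all trials keeps tasks with many trials from dominating the score; a pooled rate would be an equally reasonable choice.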