A faster, better way to train general-purpose robots
by Adam Zewe | MIT News
Boston MA (SPX) Oct 29, 2024

In the classic cartoon "The Jetsons," Rosie the robotic maid seamlessly switches from vacuuming the house to cooking dinner to taking out the trash. But in real life, training a general-purpose robot remains a major challenge.

Typically, engineers collect data that are specific to a certain robot and task, which they use to train the robot in a controlled environment. However, gathering these data is costly and time-consuming, and the robot will likely struggle to adapt to environments or tasks it hasn't seen before.

To train better general-purpose robots, MIT researchers developed a versatile technique that combines a huge amount of heterogeneous data from many sources into one system that can teach any robot a wide range of tasks.

Their method involves aligning data from varied domains, like simulations and real robots, and multiple modalities, including vision sensors and robotic arm position encoders, into a shared "language" that a generative AI model can process.

By combining such an enormous amount of data, this approach can be used to train a robot to perform a variety of tasks without the need to start training it from scratch each time.

This method could be faster and less expensive than traditional techniques because it requires far fewer task-specific data. In addition, it outperformed training from scratch by more than 20 percent in simulation and real-world experiments.

"In robotics, people often claim that we don't have enough training data. But in my view, another big problem is that the data come from so many different domains, modalities, and robot hardware. Our work shows how you'd be able to train a robot with all of them put together," says Lirui Wang, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this technique.

Wang's co-authors include fellow EECS graduate student Jialiang Zhao; Xinlei Chen, a research scientist at Meta; and senior author Kaiming He, an associate professor in EECS and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the Conference on Neural Information Processing Systems.

Inspired by LLMs
A robotic "policy" takes in sensor observations, like camera images or proprioceptive measurements that track the speed and position of a robotic arm, and then tells a robot how and where to move.
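In code, such a policy is simply a function from observations to motor commands. The sketch below is a toy stand-in with made-up shapes and a dummy computation, not the paper's model; a real policy would be a learned neural network.

```python
import numpy as np

def policy(camera_image: np.ndarray, joint_state: np.ndarray) -> np.ndarray:
    """Map sensor observations to a 7-DoF motor command (illustrative only)."""
    # A real policy is a trained network; this stub just combines a crude
    # image summary with the arm state to produce a bounded dummy action.
    visual_feature = camera_image.mean()
    action = np.tanh(joint_state + visual_feature)
    return action

# Example inputs: a 64x64 RGB image and a 7-joint arm state
image = np.zeros((64, 64, 3))
joints = np.zeros(7)
command = policy(image, joints)
print(command.shape)  # (7,)
```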

Policies are typically trained using imitation learning, meaning a human demonstrates actions or teleoperates a robot to generate data, which are fed into an AI model that learns the policy. Because this method uses a small amount of task-specific data, robots often fail when their environment or task changes.
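Imitation learning in its simplest form (behavioral cloning) treats the demonstrations as a supervised dataset of observation-action pairs and fits a model to regress actions from observations. This toy example uses linear least squares on synthetic "demos"; the dimensions and data are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
obs = rng.normal(size=(100, 4))       # 100 demo observations, 4 features each
true_W = rng.normal(size=(4, 2))
actions = obs @ true_W                # demonstrated 2-DoF actions

# "Training" = least-squares fit; the learned policy is obs -> obs @ W_hat
W_hat, *_ = np.linalg.lstsq(obs, actions, rcond=None)

# Apply the learned policy to a new observation
new_obs = rng.normal(size=(1, 4))
predicted_action = new_obs @ W_hat
print(np.allclose(W_hat, true_W))  # True: exact recovery on noiseless demos
```

Because the fit is tied to this one dataset, the policy degrades as soon as observations drift from what was demonstrated, which is the brittleness the article describes.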

To develop a better approach, Wang and his collaborators drew inspiration from large language models like GPT-4.

These models are pretrained using an enormous amount of diverse language data and then fine-tuned by feeding them a small amount of task-specific data. Pretraining on so much data helps the models adapt to perform well on a variety of tasks.

"In the language domain, the data are all just sentences. In robotics, given all the heterogeneity in the data, if you want to pretrain in a similar manner, we need a different architecture," he says.

Robotic data take many forms, from camera images to language instructions to depth maps. At the same time, each robot is mechanically unique, with a different number and orientation of arms, grippers, and sensors. Plus, the environments where data are collected vary widely.

The MIT researchers developed a new architecture called Heterogeneous Pretrained Transformers (HPT) that unifies data from these varied modalities and domains.

They put a machine-learning model known as a transformer into the middle of their architecture, which processes vision and proprioception inputs. A transformer is the same type of model that forms the backbone of large language models.

The researchers align data from vision and proprioception into the same type of input, called a token, which the transformer can process. Each input is represented with the same fixed number of tokens.
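The alignment idea can be sketched as follows: each modality gets its own small "stem" that converts raw input into a fixed number of tokens of a shared width, so one transformer can consume them together. The token counts, dimensions, and projections below are made up for illustration and are not the paper's values.

```python
import numpy as np

TOKENS_PER_MODALITY = 16   # every input yields exactly this many tokens
TOKEN_DIM = 32             # shared token width across modalities

def tokenize_vision(image: np.ndarray) -> np.ndarray:
    """Split an image into 16 flat patches and project each to a 32-d token."""
    patches = image.reshape(TOKENS_PER_MODALITY, -1)          # (16, 256)
    proj = np.ones((patches.shape[1], TOKEN_DIM)) / patches.shape[1]
    return patches @ proj                                     # (16, 32)

def tokenize_proprio(joint_state: np.ndarray) -> np.ndarray:
    """Lift a short joint-state vector onto the same 16x32 token grid."""
    proj = np.ones((joint_state.shape[0], TOKENS_PER_MODALITY * TOKEN_DIM))
    return (joint_state @ proj).reshape(TOKENS_PER_MODALITY, TOKEN_DIM)

image = np.random.default_rng(1).normal(size=(64, 64))
joints = np.zeros(7)
tokens = np.concatenate([tokenize_vision(image), tokenize_proprio(joints)])
print(tokens.shape)  # (32, 32): both modalities, same token count per input
```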

Then the transformer maps all inputs into one shared space, growing into a huge, pretrained model as it processes and learns from more data. The larger the transformer becomes, the better it will perform.

A user only needs to feed HPT a small amount of data on their robot's design, setup, and the task they want it to perform. Then HPT transfers the knowledge the transformer gained during pretraining to learn the new task.
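One common way to realize this kind of transfer, sketched below with toy stand-ins rather than the HPT implementation, is to keep the pretrained trunk frozen and fit only a small task-specific output head on the user's handful of demonstrations.

```python
import numpy as np

rng = np.random.default_rng(2)
trunk_W = rng.normal(size=(8, 16))        # pretend these weights were pretrained

def trunk(obs: np.ndarray) -> np.ndarray:
    """Frozen shared representation learned during pretraining (toy ReLU)."""
    return np.maximum(obs @ trunk_W, 0)

# A handful of new-task demonstrations: observations and target actions
obs = rng.normal(size=(20, 8))
target_actions = rng.normal(size=(20, 3))

# "Fine-tuning" = least-squares fit of a new head on frozen trunk features
feats = trunk(obs)
head_W, *_ = np.linalg.lstsq(feats, target_actions, rcond=None)

residual = feats @ head_W - target_actions
print(head_W.shape)  # (16, 3): only the small head is learned for the task
```

Only the head's parameters depend on the new task, which is why far fewer task-specific demonstrations are needed than when training from scratch.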

Enabling dexterous motions
One of the biggest challenges of developing HPT was building the massive dataset to pretrain the transformer, which included 52 datasets with more than 200,000 robot trajectories in four categories, including human demo videos and simulation.

The researchers also needed to develop an efficient way to turn raw proprioception signals from an array of sensors into data the transformer could handle.

"Proprioception is key to enable a lot of dexterous motions. Because the number of tokens in our architecture is always the same, we place the same importance on proprioception and vision," Wang explains.

When they tested HPT, it improved robot performance by more than 20 percent on simulation and real-world tasks, compared with training from scratch each time. Even when the task was very different from the pretraining data, HPT still improved performance.

"This paper provides a novel approach to training a single policy across multiple robot embodiments. This enables training across diverse datasets, enabling robot learning methods to significantly scale up the size of datasets that they can train on. It also allows the model to quickly adapt to new robot embodiments, which is important as new robot designs are continuously being produced," says David Held, associate professor at the Carnegie Mellon University Robotics Institute, who was not involved with this work.

In the future, the researchers want to study how data diversity could boost the performance of HPT. They also want to enhance HPT so it can process unlabeled data like GPT-4 and other large language models.

"Our dream is to have a universal robot brain that you could download and use for your robot without any training at all. While we are just in the early stages, we are going to keep pushing hard and hope scaling leads to a breakthrough in robotic policies, like it did with large language models," he says.

Research Report: Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers

Related Links
Computer Science and Artificial Intelligence Laboratory