24/7 Space News
New method uses crowdsourced feedback to help train robots
Human Guided Exploration (HuGE) enables AI agents to learn quickly with some help from humans, even if the humans make mistakes.
by Adam Zewe for MIT News
Boston MA (SPX) Nov 28, 2023

To teach an AI agent a new task, like how to open a kitchen cabinet, researchers often use reinforcement learning - a trial-and-error process where the agent is rewarded for taking actions that get it closer to the goal.

In many instances, a human expert must carefully design a reward function, which is an incentive mechanism that gives the agent motivation to explore. The human expert must iteratively update that reward function as the agent explores and tries different actions. This can be time-consuming, inefficient, and difficult to scale up, especially when the task is complex and involves many steps.
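To make the expert's burden concrete, here is a minimal sketch of what a hand-designed reward function for a cabinet-opening task might look like. The function, its arguments, and the shaping coefficients are all hypothetical illustrations, not the actual reward used in any cited work; the point is that each constant must be chosen and re-tuned by hand as the agent's behavior evolves.

```python
import numpy as np

def cabinet_reward(gripper_pos, handle_pos, hinge_angle):
    """Hypothetical dense reward: reach the handle, then swing the door open.

    Every coefficient below (10.0, 1.2, 100.0) is a design decision an
    expert must tune by hand, and often revisit as training progresses.
    """
    # Penalize distance between the gripper and the cabinet handle.
    reach_bonus = -np.linalg.norm(np.asarray(gripper_pos) - np.asarray(handle_pos))
    # Reward the door's opening angle (radians).
    open_bonus = 10.0 * hinge_angle
    # Large terminal bonus once the door is open "enough".
    success = 100.0 if hinge_angle > 1.2 else 0.0
    return reach_bonus + open_bonus + success
```

A fully open door with the gripper at the handle scores well; a far-away gripper with a closed door scores poorly. Getting such shaping terms wrong is exactly what makes manual reward design slow and unscalable.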

Researchers from MIT, Harvard University, and the University of Washington have developed a new reinforcement learning approach that doesn't rely on an expertly designed reward function. Instead, it leverages crowdsourced feedback, gathered from many nonexpert users, to guide the agent as it learns to reach its goal.

While some other methods also attempt to harness nonexpert feedback, this new approach enables the AI agent to learn more quickly, even though data crowdsourced from users are often full of errors. Such noisy data can cause other methods to fail.

In addition, this new approach allows feedback to be gathered asynchronously, so nonexpert users around the world can contribute to teaching the agent.

"One of the most time-consuming and challenging parts in designing a robotic agent today is engineering the reward function. Today reward functions are designed by expert researchers - a paradigm that is not scalable if we want to teach our robots many different tasks. Our work proposes a way to scale robot learning by crowdsourcing the design of reward function and by making it possible for nonexperts to provide useful feedback," says Pulkit Agrawal, an assistant professor in the MIT Department of Electrical Engineering and Computer Science (EECS) who leads the Improbable AI Lab in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).

In the future, this method could help a robot learn to perform specific tasks in a user's home quickly, without the owner needing to show the robot physical examples of each task. The robot could explore on its own, with crowdsourced nonexpert feedback guiding its exploration.

"In our method, the reward function guides the agent to what it should explore, instead of telling it exactly what it should do to complete the task. So, even if the human supervision is somewhat inaccurate and noisy, the agent is still able to explore, which helps it learn much better," explains lead author Marcel Torne '23, a research assistant in the Improbable AI Lab.

Torne is joined on the paper by his MIT advisor, Agrawal; senior author Abhishek Gupta, assistant professor at the University of Washington; as well as others at the University of Washington and MIT. The research will be presented at the Conference on Neural Information Processing Systems next month.

Noisy feedback
One way to gather user feedback for reinforcement learning is to show a user two photos of states achieved by the agent, and then ask that user which state is closer to a goal. For instance, perhaps a robot's goal is to open a kitchen cabinet. One image might show that the robot opened the cabinet, while the second might show that it opened the microwave. A user would pick the photo of the "better" state.

Some previous approaches try to use this crowdsourced, binary feedback to optimize a reward function that the agent would use to learn the task. However, because nonexperts are likely to make mistakes, the reward function can become very noisy, so the agent might get stuck and never reach its goal.
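The prior approach described above can be sketched with a small preference-learning step in the style of a Bradley-Terry model: a reward function is fit so that the state users picked scores higher than the one they rejected. This is an illustrative toy (a linear reward over made-up 4-dimensional state features), not the actual implementation of any cited method; it shows where noisy labels bite, since every wrong answer pushes the reward in the wrong direction.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=4) * 0.01  # linear reward r(s) = w @ s, for simplicity

def train_step(s_winner, s_loser, lr=0.1):
    """One gradient step on -log sigmoid(r(winner) - r(loser))."""
    global w
    diff = w @ s_winner - w @ s_loser
    p = 1.0 / (1.0 + np.exp(-diff))          # P(winner preferred) under the model
    grad = (p - 1.0) * (s_winner - s_loser)  # gradient of the negative log-likelihood
    w = w - lr * grad

# Toy states: "cabinet open" vs. "microwave open", as feature vectors.
good = np.array([1.0, 0.0, 0.0, 0.0])
bad = np.array([0.0, 1.0, 0.0, 0.0])
for _ in range(200):
    train_step(good, bad)    # consistent labels drive r(good) above r(bad)
```

With clean labels the learned reward separates the two states. But a policy that optimizes this reward directly inherits every labeling mistake, which is the failure mode HuGE avoids by using feedback only to steer exploration.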

"Basically, the agent would take the reward function too seriously. It would try to match the reward function perfectly. So, instead of directly optimizing over the reward function, we just use it to tell the robot which areas it should be exploring," Torne says.

He and his collaborators decoupled the process into two separate parts, each directed by its own algorithm. They call their new reinforcement learning method HuGE (Human Guided Exploration).

On one side, a goal selector algorithm is continuously updated with crowdsourced human feedback. The feedback is not used as a reward function, but rather to guide the agent's exploration. In a sense, the nonexpert users drop breadcrumbs that incrementally lead the agent toward its goal.

On the other side, the agent explores on its own, in a self-supervised manner guided by the goal selector. It collects images or videos of actions that it tries, which are then sent to humans and used to update the goal selector.

This narrows down the area for the agent to explore, leading it to more promising areas that are closer to its goal. But if there is no feedback, or if feedback takes a while to arrive, the agent will keep learning on its own, albeit more slowly. This enables feedback to be gathered infrequently and asynchronously.
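The two decoupled loops can be illustrated with a deliberately tiny toy: an agent on a 1-D chain of states, where noisy crowdsourced comparisons only re-rank which visited state to explore from, and the agent keeps taking exploratory steps regardless. This is a hedged sketch of the idea, not the real HuGE system, which trains a learned goal selector and a goal-conditioned policy from images; the goal position, noise rate, and step counts below are all invented for illustration.

```python
import random

random.seed(0)
GOAL = 20      # hypothetical target state on a 1-D chain
visited = {0}  # states the agent has already reached
best = 0       # current "breadcrumb": the state favored by human feedback

def human_feedback(a, b):
    """A nonexpert picks which state looks closer to the goal.

    The label is wrong 20% of the time, mimicking noisy crowdsourcing."""
    better = a if abs(GOAL - a) <= abs(GOAL - b) else b
    worse = b if better == a else a
    return better if random.random() > 0.2 else worse

for _ in range(400):
    # Goal-selector loop: one crowdsourced comparison re-ranks the breadcrumb.
    challenger = random.choice(sorted(visited))
    best = human_feedback(best, challenger)
    # Self-supervised exploration loop: the agent keeps exploring regardless,
    # taking a random step from the currently favored state.
    visited.add(max(0, best + random.choice([-1, 1])))
```

Because occasional wrong answers only briefly misdirect where exploration starts, rather than corrupting a reward the agent optimizes, progress toward the goal degrades gracefully under noise, and the comparison loop can run as slowly or sporadically as human availability allows.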

"The exploration loop can keep going autonomously, because it is just going to explore and learn new things. And then when you get some better signal, it is going to explore in more concrete ways. You can just keep them turning at their own pace," adds Torne.

And because the feedback is just gently guiding the agent's behavior, it will eventually learn to complete the task even if users provide incorrect answers.

Faster learning
The researchers tested this method on a number of simulated and real-world tasks. In simulation, they used HuGE to effectively learn tasks with long sequences of actions, such as stacking blocks in a particular order or navigating a large maze.

In real-world tests, they utilized HuGE to train robotic arms to draw the letter "U" and pick and place objects. For these tests, they crowdsourced data from 109 nonexpert users in 13 different countries spanning three continents.

In real-world and simulated experiments, HuGE helped agents learn to achieve the goal faster than other methods.

The researchers also found that data crowdsourced from nonexperts yielded better performance than synthetic data, which were produced and labeled by the researchers. For nonexpert users, labeling 30 images or videos took fewer than two minutes.

"This makes it very promising in terms of being able to scale up this method," Torne adds.

In a related paper, which the researchers presented at the recent Conference on Robot Learning, they enhanced HuGE so an AI agent can learn to perform the task, and then autonomously reset the environment to continue learning. For instance, if the agent learns to open a cabinet, the method also guides the agent to close the cabinet.

"Now we can have it learn completely autonomously without needing human resets," he says.

The researchers also emphasize that, in this and other learning approaches, it is critical to ensure that AI agents are aligned with human values.

In the future, they want to continue refining HuGE so the agent can learn from other forms of communication, such as natural language and physical interactions with the robot. They are also interested in applying this method to teach multiple agents at once.

Research Report:"Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback"
