Using language to give robots a better grasp of an open-ended world
Feature Fields for Robotic Manipulation (F3RM) enables robots to interpret open-ended text prompts using natural language, helping the machines manipulate unfamiliar objects. The system's 3D feature fields could be helpful in environments that contain thousands of objects, such as warehouses.
by Alex Shipps | MIT CSAIL
Boston MA (SPX) Nov 03, 2023

Imagine you're visiting a friend abroad, and you look inside their fridge to see what would make for a great breakfast. Many of the items initially appear foreign to you, with each one encased in unfamiliar packaging and containers. Despite these visual distinctions, you begin to understand what each one is used for and pick them up as needed.

Inspired by humans' ability to handle unfamiliar objects, a group from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) designed Feature Fields for Robotic Manipulation (F3RM), a system that blends 2D images with foundation model features into 3D scenes to help robots identify and grasp nearby items. F3RM can interpret open-ended language prompts from humans, making the method helpful in real-world environments that contain thousands of objects, like warehouses and households.

F3RM offers robots the ability to interpret open-ended text prompts using natural language, helping the machines manipulate objects. As a result, the machines can understand less-specific requests from humans and still complete the desired task. For example, if a user asks the robot to "pick up a tall mug," the robot can locate and grab the item that best fits that description.

"Making robots that can actually generalize in the real world is incredibly hard," says Ge Yang, postdoc at the National Science Foundation AI Institute for Artificial Intelligence and Fundamental Interactions and MIT CSAIL. "We really want to figure out how to do that, so with this project, we try to push for an aggressive level of generalization, from just three or four objects to anything we find in MIT's Stata Center. We wanted to learn how to make robots as flexible as ourselves, since we can grasp and place objects even though we've never seen them before."

Learning "what's where by looking"
The method could assist robots with picking items in large fulfillment centers with inevitable clutter and unpredictability. In these warehouses, robots are often given a description of the inventory that they're required to identify. The robots must match the text provided to an object, regardless of variations in packaging, so that customers' orders are shipped correctly.

For example, the fulfillment centers of major online retailers can contain millions of items, many of which a robot will have never encountered before. To operate at such a scale, robots need to understand both the geometry and the semantics of many different items, some of which sit in tight spaces. With F3RM's advanced spatial and semantic perception abilities, a robot could become more effective at locating an object, placing it in a bin, and then sending it along for packaging. Ultimately, this would help factory workers ship customers' orders more efficiently.

"One thing that often surprises people with F3RM is that the same system also works on a room and building scale, and can be used to build simulation environments for robot learning and large maps," says Yang. "But before we scale up this work further, we want to first make this system work really fast. This way, we can use this type of representation for more dynamic robotic control tasks, hopefully in real-time, so that robots that handle more dynamic tasks can use it for perception."

The MIT team notes that F3RM's ability to understand different scenes could make it useful in urban and household environments. For example, the approach could help personalized robots identify and pick up specific items. The system aids robots in grasping their surroundings, both physically and perceptively.

"Visual perception was defined by David Marr as the problem of knowing 'what is where by looking,'" says senior author Phillip Isola, MIT associate professor of electrical engineering and computer science and CSAIL principal investigator. "Recent foundation models have gotten really good at knowing what they are looking at; they can recognize thousands of object categories and provide detailed text descriptions of images. At the same time, radiance fields have gotten really good at representing where stuff is in a scene. The combination of these two approaches can create a representation of what is where in 3D, and what our work shows is that this combination is especially useful for robotic tasks, which require manipulating objects in 3D."

Creating a "digital twin"
F3RM begins to understand its surroundings by taking pictures with a camera mounted on a selfie stick. The camera snaps 50 images at different poses, which the system uses to build a neural radiance field (NeRF), a deep learning method that reconstructs a 3D scene from 2D images. This collage of RGB photos creates a "digital twin" of the robot's surroundings in the form of a 360-degree representation of what's nearby.
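To make the reconstruction step concrete, here is a minimal PyTorch sketch of the NeRF idea: a small network maps 3D points to color and density, and each pixel is rendered by compositing samples along its camera ray until the renders match the captured photos. The names (TinyNeRF, render_rays) and the random stand-in data are illustrative assumptions, not F3RM's actual implementation.

```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Toy radiance field: maps a 3D point to (RGB color, volume density)."""
    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),            # 3 color channels + 1 density
        )

    def forward(self, xyz):
        out = self.mlp(xyz)
        rgb = torch.sigmoid(out[..., :3])    # color in [0, 1]
        sigma = torch.relu(out[..., 3])      # non-negative density
        return rgb, sigma

def render_rays(model, origins, dirs, n_samples=64, near=0.1, far=4.0):
    """Classic volume rendering: composite color along each camera ray."""
    t = torch.linspace(near, far, n_samples)                        # sample depths
    pts = origins[:, None, :] + dirs[:, None, :] * t[None, :, None]
    rgb, sigma = model(pts)                                         # (R,S,3), (R,S)
    alpha = 1.0 - torch.exp(-sigma * (far - near) / n_samples)      # per-sample opacity
    ones = torch.ones_like(alpha[:, :1])
    trans = torch.cumprod(torch.cat([ones, 1.0 - alpha + 1e-10], dim=-1), dim=-1)[:, :-1]
    weights = alpha * trans                                         # ray-hit probability
    return (weights[..., None] * rgb).sum(dim=1)                    # composited RGB

# One optimization step: rendered pixels should match the captured photos.
model = TinyNeRF()
opt = torch.optim.Adam(model.parameters(), lr=5e-4)
origins = torch.randn(1024, 3)        # stand-ins for rays from the ~50 posed images
dirs = torch.nn.functional.normalize(torch.randn(1024, 3), dim=-1)
target_rgb = torch.rand(1024, 3)      # stand-in ground-truth pixel colors
loss = ((render_rays(model, origins, dirs) - target_rgb) ** 2).mean()
opt.zero_grad(); loss.backward(); opt.step()
```

In practice a full NeRF also encodes view direction and positional frequencies; the toy above keeps only the part needed to see how 2D photos supervise a 3D field.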

In addition to a highly detailed neural radiance field, F3RM also builds a feature field to augment geometry with semantic information. The system uses CLIP, a vision foundation model trained on hundreds of millions of images to efficiently learn visual concepts. By reconstructing the 2D CLIP features for the images taken by the selfie stick, F3RM effectively lifts the 2D features into a 3D representation.
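The lifting step can be sketched the same way, under the same caveats: a second network maps 3D points to a CLIP-dimensional feature, features are composited along each ray with the same volume-rendering weights as the color so they attach to the scene's geometry, and the result is trained to match the 2D CLIP feature at the corresponding pixel. The names and stand-in tensors below are assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn as nn

class FeatureField(nn.Module):
    """Toy feature field: maps a 3D point to a CLIP-dimensional feature."""
    def __init__(self, feat_dim=512, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, xyz):
        return self.mlp(xyz)

def render_features(feat_field, pts, weights):
    """Composite per-point features with the same volume-rendering weights
    used for color, so the features land on the scene's geometry."""
    feats = feat_field(pts)                         # (rays, samples, feat_dim)
    return (weights[..., None] * feats).sum(dim=1)  # (rays, feat_dim)

# Distillation step: a rendered 3D feature should match the 2D CLIP feature
# extracted at the corresponding pixel of the input photo.
feat_field = FeatureField()
opt = torch.optim.Adam(feat_field.parameters(), lr=5e-4)
pts = torch.randn(1024, 64, 3)                          # ray samples from the NeRF
weights = torch.softmax(torch.randn(1024, 64), dim=-1)  # stand-in render weights
clip_target = torch.randn(1024, 512)                    # stand-in 2D CLIP features
loss = ((render_features(feat_field, pts, weights) - clip_target) ** 2).mean()
opt.zero_grad(); loss.backward(); opt.step()
```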

Keeping things open-ended
After receiving a few demonstrations, the robot applies what it knows about geometry and semantics to grasp objects it has never encountered before. Once a user submits a text query, the robot searches through the space of possible grasps to identify those most likely to succeed in picking up the object requested by the user. Each candidate is scored on its relevance to the prompt, its similarity to the demonstrations the robot has been trained on, and whether it causes any collisions. The highest-scoring grasp is then chosen and executed.
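A hedged sketch of that ranking step might look like the following; the helper name score_grasps, the scoring weights, and the stand-in inputs are assumptions rather than F3RM's published code.

```python
import torch
import torch.nn.functional as F

def score_grasps(grasp_feats, query_emb, demo_embs, collision_free,
                 w_lang=1.0, w_demo=1.0):
    """Rank candidate grasps by language relevance plus demonstration
    similarity, masking out grasps the collision checker rejected."""
    lang_sim = F.cosine_similarity(grasp_feats, query_emb[None, :], dim=-1)
    # Similarity to the closest of the few human demonstrations.
    demo_sim = F.cosine_similarity(grasp_feats[:, None, :],
                                   demo_embs[None, :, :], dim=-1).max(dim=1).values
    score = w_lang * lang_sim + w_demo * demo_sim
    return score.masked_fill(~collision_free, float("-inf"))

grasp_feats = torch.randn(200, 512)     # field features sampled at candidate grasps
query_emb = torch.randn(512)            # text embedding of "pick up a tall mug"
demo_embs = torch.randn(4, 512)         # features from a few human demonstrations
collision_free = torch.rand(200) > 0.2  # stand-in collision-checker verdicts
best = score_grasps(grasp_feats, query_emb, demo_embs, collision_free).argmax()
print(f"execute grasp candidate {best.item()}")
```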

To demonstrate the system's ability to interpret open-ended requests from humans, the researchers prompted the robot to pick up Baymax, a character from Disney's "Big Hero 6." While F3RM had never been directly trained to pick up a toy of the cartoon superhero, the robot used its spatial awareness and vision-language features from the foundation models to decide which object to grasp and how to pick it up.

F3RM also enables users to specify which object they want the robot to handle at different levels of linguistic detail. For example, if there is a metal mug and a glass mug, the user can ask the robot for the "glass mug." If the bot sees two glass mugs and one of them is filled with coffee and the other with juice, the user can ask for the "glass mug with coffee." The foundation model features embedded within the feature field enable this level of open-ended understanding.
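Conceptually, each extra word sharpens the text embedding that the feature field is queried with. The sketch below, assuming OpenAI's open-source clip package and random stand-in object features, shows how the more specific query "glass mug with coffee" can shift the similarity scores and separate two otherwise similar mugs.

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP

model, _ = clip.load("ViT-B/32", device="cpu")
queries = ["glass mug", "glass mug with coffee"]
with torch.no_grad():
    text_emb = model.encode_text(clip.tokenize(queries)).float()
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

# Stand-ins for features rendered from the 3D feature field at each candidate mug.
object_feats = torch.randn(2, 512)
object_feats = object_feats / object_feats.norm(dim=-1, keepdim=True)

# Rows: objects, columns: queries. The extra words ("with coffee") move the
# query embedding, changing which object scores highest.
print(object_feats @ text_emb.T)
```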

"If I showed a person how to pick up a mug by the lip, they could easily transfer that knowledge to pick up objects with similar geometries such as bowls, measuring beakers, or even rolls of tape. For robots, achieving this level of adaptability has been quite challenging," says MIT PhD student, CSAIL affiliate, and co-lead author William Shen. "F3RM combines geometric understanding with semantics from foundation models trained on internet-scale data to enable this level of aggressive generalization from just a small number of demonstrations."

Shen and Yang wrote the paper under the supervision of Isola, with MIT professor and CSAIL principal investigator Leslie Pack Kaelbling and undergraduate students Alan Yu and Jansen Wong as co-authors. The team was supported, in part, by Amazon.com Services, the National Science Foundation, the Air Force Office of Scientific Research, the Office of Naval Research's Multidisciplinary University Research Initiative, the Army Research Office, the MIT-IBM Watson AI Lab, and the MIT Quest for Intelligence. Their work will be presented at the 2023 Conference on Robot Learning.

Research Report: "Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation"

ai.spacedaily.com analysis

Relevance Scores:

1. Robotics Industry Analyst: 9/10
2. Stock and Finance Market Analyst: 6/10
3. Government Policy Analyst: 7/10

Analyst Summary:

From the perspective of a Robotics Industry Analyst, the development of the Feature Fields for Robotic Manipulation (F3RM) system by MIT's CSAIL is a breakthrough with significant implications. It advances robots' ability to generalize from human language prompts to physical actions, showing a degree of adaptability and learning that is essential in household and industrial applications. This technology is poised to enhance the flexibility and efficiency of robotic systems, contributing to the evolution of a robotics sector that has increasingly aimed to integrate natural language processing and object manipulation over the past 25 years. The shift from specialized to more general-purpose robots marks a major trend in robotics, reflecting the evolution from simple programmable machines to AI-driven systems with advanced perception and decision-making abilities.

For Stock and Finance Market Analysts, this development indicates potential for increased investment in robotics and AI technologies. Companies that can integrate F3RM-like systems could gain a competitive edge in logistics, manufacturing, and consumer robotics, a sector that has seen exponential growth and interest from venture capitalists and public markets alike. However, the direct impact on stock valuations is more complex, as it depends on market readiness, the scalability of the technology, and its adoption curve.

Government Policy Analysts would view F3RM as an innovation that may require new frameworks for safety, labor, and technology regulation. With the increase in AI capabilities, there could be implications for workforce development and education, intellectual property law, and even national security considerations due to the dual-use potential of such technologies. Over the past decades, governments have struggled to keep pace with rapid advancements in robotics, and F3RM represents another layer of complexity in this ongoing challenge.

Comparison with Sector Trends:

In the last quarter-century, robotics has advanced from rudimentary automation in manufacturing to complex AI-integrated systems capable of performing a wide range of tasks. F3RM represents a continuation of this trend toward greater adaptability and generalization in robotics, moving away from task-specific programming to systems capable of learning and adapting in real time. A notable similarity is the integration of machine learning, which has become a staple of robotics development in recent years.

Investigative Questions:

1. What are the specific technical limitations of F3RM in its current form, and what is the roadmap for overcoming these challenges?

2. How does F3RM's capability compare to current industry-standard robotic manipulation systems in terms of speed, accuracy, and reliability?

3. What are the implications of F3RM for workforce displacement and upskilling, particularly in sectors like logistics and manufacturing?

4. How could F3RM's language interpretation capabilities be extended to more complex tasks beyond object manipulation?

5. What cybersecurity measures are necessary to ensure the safe integration of systems like F3RM into critical supply chain infrastructure?

Related Links
Computer Science and Artificial Intelligence Laboratory (CSAIL)

