24/7 Space News
ROBO SPACE
New datasets aim to teach AI models cross-disciplinary scientific thinking
by Clarence Oxford
Los Angeles CA (SPX) Dec 03, 2024

What can exploding stars reveal about blood flow in arteries, or how might swimming bacteria inform our understanding of ocean dynamics? Researchers from several institutions have taken a major step toward training artificial intelligence (AI) models that draw insights across disciplines and accelerate scientific discovery.

The initiative, known as Polymathic AI, uses technology similar to that behind large language models such as ChatGPT, but instead of learning from text, its models learn from scientific datasets in fields such as astrophysics, biology, chemistry, and fluid dynamics. The goal is to give the models cross-disciplinary scientific capabilities.

"These groundbreaking datasets are by far the most diverse large-scale collections of high-quality data for machine learning training ever assembled for these fields," said Michael McCabe, a research engineer at the Flatiron Institute in New York City and a member of Polymathic AI. "Curating these datasets is a critical step in creating multidisciplinary AI models that will enable new discoveries about our universe."

The Polymathic AI team has released two open-source datasets, collectively comprising 115 terabytes of data sourced from dozens of contributors. This massive resource is available to the public and is expected to accelerate the development of AI models capable of solving complex scientific problems. For comparison, GPT-3 required only 45 terabytes of unfiltered data during its training phase.

"The freely available datasets are an unprecedented resource for developing sophisticated machine learning models that can then tackle a wide range of scientific problems," added Ruben Ohana, a research fellow at the Flatiron Institute's Center for Computational Mathematics. "Open-sourcing this data benefits both the machine learning and scientific communities, creating a win-win situation."

The datasets are hosted on Hugging Face, a popular platform for AI models and data, and are detailed in papers accepted for presentation at the NeurIPS conference in Vancouver, Canada.

"We've seen again and again that the most effective way to advance machine learning is to take difficult challenges and make them accessible to the wider research community," said McCabe. "When a new benchmark is released, it initially seems insurmountable. But opening access accelerates progress far beyond what any individual group could achieve."

Polymathic AI is a collaborative effort involving researchers from institutions such as the Simons Foundation, Flatiron Institute, New York University, and the Lawrence Berkeley National Laboratory.

The first dataset, named the Multimodal Universe, focuses on astrophysics and includes hundreds of millions of observations, such as images from NASA's James Webb Space Telescope and stellar data from ESA's Gaia spacecraft. "Machine learning has been happening for around 10 years in astrophysics, but it's still very hard to use across instruments, missions, and disciplines," said Polymathic AI researcher François Lanusse. "Datasets like the Multimodal Universe allow us to create models that natively understand this data and act as a Swiss Army knife for astrophysics."

The second dataset, dubbed the Well, comprises 15 terabytes of data across 16 diverse datasets. It features simulations of biological systems, fluid dynamics, supernovae, and more, all governed by mathematical equations known as partial differential equations. These equations appear in a wide array of scientific problems but are notoriously difficult to solve. "This dataset encompasses a diverse range of physics simulations designed to address key limitations of current machine learning models," said Polymathic AI member Rudy Morel.
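
For readers unfamiliar with partial differential equations, the short Python sketch below may help show what "simulating" one involves. It is an illustration only and is not taken from the Well or from Polymathic AI's code: it assumes a deliberately simple textbook problem (heat spreading along a one-dimensional rod held at zero temperature at both ends) and a basic explicit finite-difference scheme. The function names `solve_heat_1d` and `exact_heat_1d` and all parameter values are hypothetical choices made for this example.

```python
import math


def solve_heat_1d(n_points=51, alpha=1.0, t_end=0.1, safety=0.4):
    """Approximate u_t = alpha * u_xx on [0, 1] with u = 0 at both ends.

    Illustrative assumption: the initial temperature profile is sin(pi * x).
    """
    dx = 1.0 / (n_points - 1)
    # The explicit scheme is only stable when dt <= 0.5 * dx^2 / alpha.
    dt = safety * dx * dx / alpha
    n_steps = math.ceil(t_end / dt)
    dt = t_end / n_steps  # shrink dt slightly so the last step lands on t_end
    r = alpha * dt / (dx * dx)

    x = [i * dx for i in range(n_points)]
    u = [math.sin(math.pi * xi) for xi in x]
    u[0] = u[-1] = 0.0  # enforce the fixed boundary values exactly

    for _ in range(n_steps):
        new_u = u[:]
        # Update interior points from their neighbours; boundaries stay at 0.
        for i in range(1, n_points - 1):
            new_u[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
        u = new_u
    return x, u


def exact_heat_1d(x, alpha, t):
    """Closed-form solution for the sin(pi * x) initial profile."""
    decay = math.exp(-alpha * math.pi ** 2 * t)
    return [decay * math.sin(math.pi * xi) for xi in x]


if __name__ == "__main__":
    x, u = solve_heat_1d()
    exact = exact_heat_1d(x, alpha=1.0, t=0.1)
    worst = max(abs(a - b) for a, b in zip(u, exact))
    print(f"largest difference from the exact solution: {worst:.2e}")
```

This toy problem happens to have a known exact answer, which is why the script can check itself. The simulations described in the article generally do not, and they involve far more complex equations in two or three dimensions, which is part of why they are so costly to compute and why researchers hope machine learning models can help.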

Building these datasets required extensive collaboration. "The creators of numerical simulations are sometimes skeptical of machine learning because of the hype, but they're curious about how it can benefit their research," Ohana explained.

The team is now using the datasets to train AI models, with early results showing promise. "Understanding how machine learning models generalize and interpolate across datasets from different physical systems is an exciting research challenge," said Polymathic AI member Bruno Régaldo-Saint Blancard.

Shirley Ho, project lead and group leader at the Flatiron Institute, noted, "Just like the Protein Data Bank spawned AlphaFold, I'm excited to see what the Well and the Multimodal Universe will help create." Ho will present Polymathic AI's findings at NeurIPS.
