AI boosts accuracy in stellar classification efforts

by Simon Mansfield
Sydney, Australia (SPX) Mar 25, 2025

AI tools are revolutionizing the way astronomers study celestial bodies, offering new levels of precision and automation in classifying stars. A global research collaboration recently demonstrated how deep learning algorithms and large language models can efficiently and accurately categorize stars based on their light curves. The findings, published on February 26 in *Intelligent Computing*, are detailed in a study titled "Deep Learning and Methods Based on Large Language Models Applied to Stellar Light Curve Classification."

Central to the research is the StarWhisper LightCurve series, a set of three AI models developed to classify variable stars from their light curve data. These models use automated deep learning techniques that tune key training parameters such as learning rate, batch size, and model complexity on their own, reducing the need for manual adjustment.
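The article does not say which automated deep learning method the team used. As a hedged illustration of the general idea, the sketch below runs a plain random search, one of the simplest automated tuning strategies, over a hypothetical search space covering learning rate, batch size and number of layers (standing in for model complexity). The `toy_evaluate` objective is a made-up stand-in for training a model and measuring its validation accuracy.

```python
import random

# Hypothetical search space; the study's actual ranges are not given in the article.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "batch_size": [32, 64, 128],
    "num_layers": [1, 2, 3, 4],  # stands in for "model complexity"
}


def sample_config(rng):
    """Draw one configuration uniformly at random from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def random_search(evaluate, n_trials=20, seed=0):
    """Return the best (score, config) found over n_trials random configurations.

    `evaluate` is any callable mapping a config dict to a validation score
    (higher is better), e.g. a function that trains a model and returns accuracy.
    """
    rng = random.Random(seed)
    best_score, best_config = float("-inf"), None
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_score, best_config


def toy_evaluate(config):
    """Hypothetical objective that peaks at lr=1e-3, batch size 64, 2 layers."""
    return (
        -abs(config["learning_rate"] - 1e-3) * 100
        - abs(config["batch_size"] - 64) / 64
        - abs(config["num_layers"] - 2)
    )


if __name__ == "__main__":
    score, config = random_search(toy_evaluate, n_trials=50)
    print(score, config)
```

Random search is used here only because it is short and dependency-free; automated tuning tools often rely on more sample-efficient strategies such as Bayesian optimization.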

Researchers trained the models using light curve data obtained from NASA's Kepler and K2 missions. The dataset primarily included five major types of variable stars, along with a smaller subset of rare star types to enhance the models' versatility.

In performance evaluations, the AI models classified the different types of variable stars with high accuracy. The Conv1D + BiLSTM model, which combines one-dimensional convolutional neural network layers with bidirectional long short-term memory layers, achieved 94% accuracy, while the Swin Transformer, a vision transformer whose architecture derives from the transformers originally developed for natural language processing, reached 99%.
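In a Conv1D + BiLSTM network, the convolutional layers generally slide small learned filters along the light curve to pick out local shapes, and the bidirectional LSTM layers then read the resulting feature sequence in both time directions. As a minimal pure-Python sketch of the first stage only, the code below applies one hand-picked, hypothetical filter the way a Conv1D layer applies each of its learned filters. The study's actual layer sizes and weights are not described in the article, and the BiLSTM stage is omitted.

```python
def conv1d(signal, kernel, stride=1):
    """'Valid' 1D cross-correlation, the operation a Conv1D layer applies per filter.

    output[i] = sum_j kernel[j] * signal[i * stride + j]; no padding, no bias.
    """
    k = len(kernel)
    if k == 0 or k > len(signal):
        raise ValueError("kernel must be non-empty and no longer than the signal")
    return [
        sum(w * signal[i + j] for j, w in enumerate(kernel))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def relu(values):
    """Rectified linear unit, a common nonlinearity applied after a convolution."""
    return [max(0.0, v) for v in values]


# Hypothetical kernel that responds to a rise in flux (e.g. recovery after a dip).
flux = [1.0, 1.0, 1.0, 0.5, 1.0, 1.0]
rise_detector = [-1.0, 1.0]
features = relu(conv1d(flux, rise_detector))
print(features)  # [0.0, 0.0, 0.0, 0.5, 0.0]
```

In a trained network the kernel values are learned from data rather than chosen by hand, and many filters run in parallel to produce a multi-channel feature sequence for the recurrent layers.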

One of the study's highlights was the Swin Transformer's ability to identify Type II Cepheids, a rare class of pulsating star making up only 0.02% of the dataset, with 83% accuracy.
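For a class this rare, a per-class figure like the 83% for Type II Cepheids says more than overall accuracy does: a classifier that never predicts the rare class can still score highly overall. The sketch below uses made-up toy labels, not the study's data, to compute both metrics and show the gap.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_accuracy(y_true, y_pred):
    """Accuracy computed separately for each true class (per-class recall)."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / n for cls, n in totals.items()}


# Hypothetical toy labels: 994 stars of a common type and 6 of a rare type.
y_true = ["common"] * 994 + ["rare"] * 6
y_majority = ["common"] * 1000                          # always guesses the common type
y_model = ["common"] * 994 + ["rare"] * 5 + ["common"]  # misses one rare star

print(overall_accuracy(y_true, y_majority), per_class_accuracy(y_true, y_majority))
# 0.994 {'common': 1.0, 'rare': 0.0}
print(overall_accuracy(y_true, y_model), per_class_accuracy(y_true, y_model))
# 0.999 {'common': 1.0, 'rare': 0.8333333333333334}
```

The majority-class guesser reaches 99.4% overall while finding none of the rare stars, which is why rare-class results are usually reported separately.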

Despite its superior accuracy, the Swin Transformer requires extensive preprocessing, including converting light curves into images. In contrast, the StarWhisper LightCurve models achieved close to 90% accuracy with minimal human intervention, streamlining data processing and enabling scalable, parallel analysis. This efficiency supports the development of multimodal AI tools for astronomical research.
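The article does not describe how the light curves were rendered as images. One simple possibility, shown here as a hedged sketch rather than the study's pipeline, is to rasterize the (time, flux) points into a fixed-size pixel grid, much like a scatter plot. The 64 x 64 default size and the binary pixel values are arbitrary assumptions.

```python
import math


def light_curve_to_image(times, fluxes, width=64, height=64):
    """Rasterize (time, flux) points into a height x width binary image.

    Time maps to columns (left to right) and flux to rows, with higher
    flux drawn nearer the top (row 0). Each point lights up one pixel.
    """
    if len(times) != len(fluxes) or not times:
        raise ValueError("times and fluxes must be non-empty and the same length")
    t_min, t_max = min(times), max(times)
    f_min, f_max = min(fluxes), max(fluxes)
    t_span = (t_max - t_min) or 1.0  # avoid division by zero for a single epoch
    f_span = (f_max - f_min) or 1.0  # avoid division by zero for a flat curve
    image = [[0] * width for _ in range(height)]
    for t, f in zip(times, fluxes):
        col = round((t - t_min) / t_span * (width - 1))
        row = round((f_max - f) / f_span * (height - 1))
        image[row][col] = 1
    return image


# Usage: a hypothetical sinusoidal pulsation sampled at 200 epochs.
times = [i * 0.05 for i in range(200)]
fluxes = [1.0 + 0.1 * math.sin(2 * math.pi * t / 2.5) for t in times]
img = light_curve_to_image(times, fluxes)
print(len(img), len(img[0]), sum(map(sum, img)))
```

Real pipelines may instead plot the curve with a charting library and resize it to the input size the vision model expects; the rasterization above just makes the time-series-to-image step explicit.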

The StarWhisper LightCurve suite comprises three large language models tailored to different formats of astronomical data:

- A text-based model built on Gemini 7B, optimized for time-series data classification.

- A multimodal model, based on DeepSeek-VL-7B-Chat, designed for analyzing image-rendered light curves.

- An audio-based model, developed using Qwen-Audio, which classifies light curves after they have been converted into sound waves.
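The article does not specify how light curves were encoded for the text and audio models. The sketch below makes assumed choices for both: the text model receives flux values serialized into a prompt string (the wording and number format are hypothetical), and the audio path sonifies the curve by turning each flux value into a short tone whose pitch rises with brightness (the 200 to 1000 Hz range, sample rate and tone length are arbitrary). Only the Python standard library is used.

```python
import io
import math
import struct
import wave


def light_curve_to_text(fluxes, decimals=4):
    """Serialize a flux series into a plain-text prompt (hypothetical format)."""
    values = ", ".join(f"{f:.{decimals}f}" for f in fluxes)
    return f"Classify this variable star from its normalized light curve: {values}"


def light_curve_to_wav(fluxes, sample_rate=8000, samples_per_point=80):
    """Sonify a flux series as mono 16-bit PCM WAV bytes.

    Each flux value becomes a short tone; its pitch is mapped linearly onto
    200-1000 Hz (an arbitrary choice). Phase is carried across tones so the
    waveform stays continuous.
    """
    f_min, f_max = min(fluxes), max(fluxes)
    span = (f_max - f_min) or 1.0  # avoid division by zero for a flat curve
    frames = bytearray()
    phase = 0.0
    for f in fluxes:
        freq = 200.0 + 800.0 * (f - f_min) / span
        step = 2 * math.pi * freq / sample_rate
        for _ in range(samples_per_point):
            sample = int(0.5 * 32767 * math.sin(phase))  # half of full scale
            frames += struct.pack("<h", sample)           # little-endian int16
            phase = (phase + step) % (2 * math.pi)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return buffer.getvalue()


# Usage with a hypothetical, normalized dip-shaped light curve.
fluxes = [1.0, 0.95, 0.9, 0.95, 1.0]
prompt = light_curve_to_text(fluxes)
audio = light_curve_to_wav(fluxes)
print(prompt)
print(len(audio), "bytes of WAV audio")
```

Mapping flux to pitch is only one of several possible sonification schemes; mapping flux directly to amplitude would be an equally simple alternative.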

These models form part of the broader StarWhisper initiative, an AI project focused on building large language models with robust reasoning and instruction-following capabilities for astronomy. Additional information is available at: https://github.com/Yu-Yang-Li/StarWhisper.

Research Report: Deep Learning and Methods Based on Large Language Models Applied to Stellar Light Curve Classification