IIT Bombay develops AI model to decode satellite images using natural language

Researchers at the Indian Institute of Technology, Bombay (IIT Bombay), have developed an artificial intelligence (AI) model that enables machines to interpret satellite and drone images using everyday language prompts, potentially transforming applications in disaster response, surveillance, urban planning, and agriculture.

The model, called Adaptive Modality-guided Visual Grounding (AMVG), has been designed by a team led by Professor Biplab Banerjee from IIT Bombay’s Centre of Studies in Resources Engineering.

Spotting a cat in a living room might be easy for artificial intelligence, but decoding complex, high-resolution satellite imagery based on natural language instructions has long been a challenge, said Shabnam Choudhury, lead author and Ph.D. researcher at IIT Bombay. AMVG aims to bridge that gap by allowing users to feed prompts like “find all damaged buildings near the flooded river” and receive targeted results within minutes, even from hundreds of cluttered images.

The research, published in the ISPRS Journal of Photogrammetry and Remote Sensing, the journal of the International Society for Photogrammetry and Remote Sensing, suggests that AMVG could make image analysis faster, more intuitive, and more accessible to agencies and researchers.

“Remote sensing images are rich in detail but extremely challenging to interpret automatically. Existing models struggle with ambiguity and contextual commands,” explained Ms. Choudhury.

AMVG introduces a combination of innovations – including a Multi-stage Tokenised Encoder and Attention Alignment Loss (AAL) – that help the model identify objects more accurately based on contextual understanding. AAL, in particular, acts like a “virtual coach,” teaching the system to focus on relevant image regions when interpreting commands. “When a human reads ‘the white truck beside the fuel tank,’ our eyes know where to look. AAL teaches the machine to do the same,” Ms. Choudhury said.
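To illustrate the general idea, the sketch below shows one way an attention-alignment objective of this kind could be written in PyTorch. It is a hypothetical simplification for readers, not the AMVG team’s actual implementation: the function name, tensor shapes, and the choice of a KL-divergence penalty are assumptions, but they capture the intuition of nudging the model’s attention toward the image region a text query refers to.

    import torch
    import torch.nn.functional as F

    def attention_alignment_loss(attn_map, target_mask, eps=1e-8):
        """Illustrative attention-alignment loss (not the published AMVG code).

        attn_map:    (B, H, W) attention weights the model places over the image
        target_mask: (B, H, W) binary mask marking the region the text refers to
        Returns a scalar loss that is small when attention concentrates on the mask.
        """
        B = attn_map.shape[0]
        # Flatten and normalise both maps into spatial probability distributions.
        attn = attn_map.reshape(B, -1)
        attn = attn / (attn.sum(dim=1, keepdim=True) + eps)
        target = target_mask.float().reshape(B, -1)
        target = target / (target.sum(dim=1, keepdim=True) + eps)
        # KL divergence pushes attention mass toward the referred region,
        # acting as the "virtual coach" described above.
        return F.kl_div((attn + eps).log(), target, reduction="batchmean")

During training, a term like this would be added to the main grounding loss so that, for a query such as “the white truck beside the fuel tank,” attention concentrated elsewhere in the scene is penalised.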

The team envisions a wide range of applications. In disaster response, agencies could quickly locate damaged infrastructure after floods or earthquakes. Security organisations could identify camouflaged vehicles near sensitive areas, while farmers could monitor crop health by simply asking the model to highlight yellowing patches.

However, Professor Banerjee clarified that AMVG has not yet been tested in real-world disaster scenarios. Speaking to The Hindu, he said, “We have done some preliminary studies, but due to the absence of real-world grounding datasets for disaster management, we couldn’t conduct a full-scale evaluation. Crafting such a dataset is one of our future plans.”

According to the team, AMVG outperforms existing approaches when detecting damaged buildings, hidden vehicles, or crop patterns in complex terrains, though a more comprehensive benchmark study is still pending.

Asked whether AMVG could help governments and NGOs during floods, earthquakes, or wildfires by providing real-time insights, Professor Banerjee was optimistic: “Surely. That’s one of the strongest use cases we envision.”

The researchers are also exploring collaborations to bring AMVG into operational use. “We have already worked with ISRO on some similar problems,” Professor Banerjee revealed. “A new round of collaborations with ISRO is likely to start shortly, and such vision-language models will be rigorously considered there.”

AMVG has shown encouraging results across imagery from satellites, drones, and aircraft-based sensors. The next phase of research involves deploying the model in different geographical and environmental scenarios to evaluate its adaptability.

In a notable step for the field, the IIT Bombay team has also open-sourced the AMVG implementation on GitHub. “Open-sourcing is still uncommon in remote sensing. We wanted to encourage transparency and accelerate progress,” Ms. Choudhury said.

While the model shows promise, the team acknowledges limitations. AMVG currently depends on high-quality annotated datasets and requires optimisation for real-time deployment. Work is underway on sensor-aware versions and compositional grounding techniques to improve adaptability across diverse landscapes.

“Our goal is to build a unified remote sensing understanding system – one that can ground, describe, retrieve, and reason about any image using natural language,” Ms. Choudhury said.

Published – September 04, 2025 04:55 pm IST


