DPIIT committee on AI and copyright suggests compromise position on data mining

A government working paper released on Monday (December 8, 2025) suggested that AI large language models (LLMs) like ChatGPT should, by default, have access to content freely available online, and that publishers should not have an opt-out mechanism for such content. Instead, a non-profit body modelled on a copyright society should be set up to collect royalties on behalf of both its members and non-members.

The working paper, authored by a committee formed by the Department for Promotion of Industry and Internal Trade (DPIIT), is not final and is open for public comments for thirty days. The document is one of the clearest indicators of how the Indian government is thinking about balancing two competing interests: copyright holders, who fear that AI systems will regurgitate content they invested in without paying them for it, and LLM developers, who have routinely consumed massive amounts of online data to train their models.

Nasscom, which was represented on the DPIIT committee, dissented. It argued that forced royalties would amount to a “tax on innovation,” that “mining” or scraping the web for data must be allowed for freely available content not behind paywalls, and that providers of both crawlable and access-restricted content should have the option to “reserve” their content from being mined for LLM development.

No opt-out

The committee rejected Nasscom’s dissent, arguing that small content creators may not have the means to enforce such opt-outs.

The Digital News Publishers Association, which represents traditional news media outlets with a digital presence, including The Hindu, has sued ChatGPT maker OpenAI in the Delhi High Court for copyright infringement. OpenAI denies the allegations. The working paper argues that it may not be prudent to await the outcome of this and other similar litigation. 

The recommendations, if enacted into law, would essentially eliminate any allegations of improper access to data by blessing all access, provided a fee is paid. This model is similar to the “compulsory licensing” framework in place for radio stations in India, which may play music without negotiating rights with the rightsholders, as long as they pay a statutorily prescribed fee.

This balance may face pushback from both AI developers and content creators. AI developers may argue against anything that increases development costs, since few AI firms are even profitable at the moment and there is little appetite to share revenues. Content creators, for their part, may resist a flat fee if they feel their inputs are far more valuable in training a model than those of other royalty recipients.

Payouts received by the copyright society set up to distribute AI revenues to content creators would be apportioned by giving weightage to factors such as web traffic and social indicators, like how reputable a publisher is. Any such decision would be appealable in the courts, the working paper says.
