The BBC said last week that it is blocking OpenAI's web crawler from scraping its content. It follows other organisations such as Reuters, Getty Images and other content providers. Rhodri Talfan Davies, the BBC's director of nations, said:
We don’t believe the current ‘scraping’ of BBC data without our permission . . . to train ‘gen AI’ models is in the public interest and we want to agree a more structured and sustainable approach with technology companies.
So how do we prevent OpenAI and other AI software from scraping our content? OpenAI said that if you want to discourage its GPTBot (the name of the bot that crawls websites) then you have to add this to your robots.txt file:
User-agent: GPTBot
Disallow: /
What about other AI website-scraping bots?
Although OpenAI's ChatGPT is the most well-known AI product right now, other companies like Google and Facebook are using bots to scrape content from the web. If you want to try to keep all of these bots out then you have to add these to your robots.txt file:
User-agent: CCBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Omgilibot
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: Amazonbot
Disallow: /
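If you want to double-check that rules like the ones above do what you expect before publishing them, here is a minimal sketch using Python's standard `urllib.robotparser` module. The `blocked_bots` helper and the `https://example.com/` URL are made up for illustration, and the parser only tells you how a well-behaved crawler would read the file, not what any particular bot actually does.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from this post, kept inline so the check runs offline.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Omgilibot
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: Amazonbot
Disallow: /
"""

AI_BOTS = [
    "CCBot",
    "ChatGPT-User",
    "GPTBot",
    "Google-Extended",
    "Omgilibot",
    "FacebookBot",
    "Amazonbot",
]


def blocked_bots(robots_txt, bots, url="https://example.com/"):
    """Hypothetical helper: map each bot name to True if the rules disallow it.

    The URL is a placeholder; only its path is matched against the rules.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: not parser.can_fetch(bot, url) for bot in bots}


if __name__ == "__main__":
    for bot, blocked in blocked_bots(ROBOTS_TXT, AI_BOTS).items():
        print(f"{bot}: {'blocked' if blocked else 'allowed'}")
```

To test your live file instead, you could read its contents into a string and pass that to the same helper. User agents that are not listed, such as a normal browser, are still allowed because the file has no catch-all `User-agent: *` rule.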
But why should we block them?
I’m not trying to convince you that you should go and block all of these bots from scraping your content today. Instead, it raises a question: should all of these bots be able to scrape your content without your permission and without referencing you as the source? Chris Coyier recently blogged:
If a huge company sent a robot to your door to ask for a lock of your hair, would you give it to them? If they asked for one square inch of your land, would you sign it over? If they asked you to run on a treadmill for one minute a day for them, would you hop to it? What if they didn’t ask?
Additionally we should keep in mind that disallowing these bots doesn’t imply they may cease scrapping your content material. It can discourage them however there are many bots on the market and scrapping content material is a well-liked theme in the mean time.

Written by
Michael Gearon is a Senior Interaction Designer at Government Digital Service (GDS) in Cardiff. Michael Gearon is one of the authors of the Tiny CSS Projects book, published by Manning Publications. Previously Mike was a product designer at the GoCo Group, including GoCompare, MyVoucherCodes and WeFlip. He has also worked for brands in South Wales like BrandContent and HEOR.