With all of the various LLMs and LLM-based services springing up all over the web with no regard for copyright, morality, or consent, I wanted a way to block them from using any of my content. Normally, we would use robots.txt to block search engines, and according to ChatGPT's docs, the same approach works for ChatGPT. I found this was also the case for Bard. A quick search of my nginx logs appears to confirm it. The following should get it done:
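# Serve robots.txt straight from the nginx config; no file on disk is needed.
location = /robots.txt {
    # default_type sets the Content-Type of the response; add_header would
    # append a second Content-Type header instead of replacing it.
    default_type text/plain;
    return 200 "User-agent: GPTBot\nDisallow: /\nUser-agent: ChatGPT-User\nDisallow: /\nUser-agent: Google-Extended\nDisallow: /\nUser-agent: CCBot\nDisallow: /\nUser-agent: anthropic-ai\nDisallow: /\nUser-agent: Omgilibot\nDisallow: /\nUser-agent: Omgili\nDisallow: /\nUser-agent: FacebookBot\nDisallow: /\nUser-agent: Yext\nDisallow: /\n";
}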
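If you want to sanity-check those rules before deploying them, here is a minimal sketch using Python's standard-library urllib.robotparser. The names AI_AGENTS, ROBOTS_TXT, and is_blocked are my own, purely for illustration. It only shows how a well-behaved parser reads the file; it says nothing about whether a given crawler actually obeys it.

```python
from urllib.robotparser import RobotFileParser

# User agents taken from the nginx snippet above.
AI_AGENTS = [
    "GPTBot", "ChatGPT-User", "Google-Extended", "CCBot", "anthropic-ai",
    "Omgilibot", "Omgili", "FacebookBot", "Yext",
]

# Rebuild the same body that the nginx `return 200` directive sends.
ROBOTS_TXT = "".join(f"User-agent: {agent}\nDisallow: /\n" for agent in AI_AGENTS)


def is_blocked(user_agent: str, path: str = "/") -> bool:
    """Return True if the rules above forbid `user_agent` from fetching `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent in AI_AGENTS + ["Googlebot"]:
        print(f"{agent}: {'blocked' if is_blocked(agent) else 'allowed'}")
```

Agents that are not listed, such as ordinary search engine crawlers, stay allowed because the file has no catch-all `User-agent: *` group.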
Apparently, a few crawlers also honor a robots meta tag, so this goes in each page's head:
<meta name="robots" content="noai, noimageai">
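To confirm a rendered page actually carries that tag, something like the following could work. It is a rough sketch built on Python's standard-library html.parser; RobotsMetaParser and robots_directives are hypothetical names of my own, and the HTML is passed in as a string rather than fetched from a real site.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated; normalize case and whitespace.
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def robots_directives(html: str) -> set[str]:
    """Return the set of robots meta directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noai, noimageai"></head></html>'
    print(sorted(robots_directives(page)))
```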
For the curious, the Absurd License has been bumped to a new version that explicitly disallows use with AI. It now reads:
Permission to use, copy, modify, and/or distribute this software for any purpose except LLM training or other AI training/enhancement with or without fee is hereby granted.