Fellow sysadmins, if you want a quick fix to stop the hungry OpenAI (GPTBot) and Common Crawl (CCBot) crawlers from harvesting content across your many sites, here's a one-liner I just put together:
find -L . -name robots.txt -type f -print0 | xargs -0 sed -i -e '$a User-agent:\ GPTBot\nDisallow:\ /\nUser-agent:\ CCBot\nDisallow:\ /'
Just remove the `-i` switch to do a dry run: sed will print the result to stdout instead of editing in place. `find -L` follows symlinks, and the `$a` command appends the two blocks to the end of each existing robots.txt.
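Here's the dry-run-then-apply flow on a single sample file (a sketch assuming GNU sed: `-i` and the one-line `$a text` form with `\n` escapes are GNU extensions, so BSD/macOS sed users will need `-i ''` and literal newline continuations instead; `/tmp/robots-demo.txt` is just a scratch path for the demo):

```shell
# Make a sample robots.txt with an existing rule.
printf 'User-agent: *\nDisallow: /admin/\n' > /tmp/robots-demo.txt

# Dry run: without -i, sed prints the would-be result to stdout
# and leaves the file untouched.
sed -e '$a User-agent:\ GPTBot\nDisallow:\ /\nUser-agent:\ CCBot\nDisallow:\ /' /tmp/robots-demo.txt

# Looks right? Apply in place.
sed -i -e '$a User-agent:\ GPTBot\nDisallow:\ /\nUser-agent:\ CCBot\nDisallow:\ /' /tmp/robots-demo.txt
```

After the in-place run the file keeps its original rules and gains the two blocking blocks at the end.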
#linux #unix #privacy #sysadmin