I saw on the Fediverse that, following the news of OpenAI providing an
opt-out tool for robots.txt (not amused), some people have listed a
bunch of IP ranges associated with their crawlers.
https://mathstodon.xyz/@filipw/110852364162684071
I was wondering whether it would be reasonable for sourcehut to block
these IP ranges (and potentially those of Google, MS, etc, if they can
be found).
Best,
Tanguy
I went ahead and added this to robots.txt, but I'm not making any
special priority to roll out the change so it'll show up over the next
few weeks.
Not going to block any IP ranges, odds are it would result in collateral
damage.
> I went ahead and added this to robots.txt, but I'm not making any
> special priority to roll out the change so it'll show up over the next
> few weeks.
>
> Not going to block any IP ranges, odds are it would result in collateral
> damage.
I understand, thanks!
------- Original Message -------
On Friday, August 11th, 2023 at 11:50, Drew DeVault <sir@cmpwn.com> wrote:
> > I went ahead and added this to robots.txt, but I'm not making any
> > special priority to roll out the change so it'll show up over the next
> > few weeks.
Thanks for that!
Does this mean that every website hosted on sourcehut is already covered?
Or should I add the robots.txt to my build process?
On 23/08/11 12:03, Drew DeVault wrote:
> robots.txt for pages.sr.ht is your own concern, you should add one if
> you need it.
Thanks for the clarification. It makes sense.
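For anyone following along: a pages.sr.ht site serves whatever files you publish, so opting out there just means shipping your own robots.txt at the site root. A minimal sketch (GPTBot is the user-agent OpenAI documents for its crawler; extend with other bots as needed):

```
# robots.txt at the root of the published site
User-agent: GPTBot
Disallow: /
```

Crawlers that respect robots.txt will skip the whole site; it does nothing against crawlers that ignore the convention, which is part of why IP blocking came up in the first place.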