~sircmpwn/sr.ht-discuss


Ethan Marcotte's "Blockin’ bots" technique

Details
Message ID
<6e9e6861-d51a-4d4c-9b0a-8d216096b4bd@app.fastmail.com>
DKIM signature
pass
I don't like to have my website indexed or harvested without my permission. The corporations that engage in these activities make my skin crawl.

I've updated my robots.txt file. I would like to supplement it on the server side. Ethan Marcotte uses htaccess:
https://ethanmarcotte.com/wrote/blockin-bots/
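
For context, the idea behind that kind of server-side rule, as I understand it, is to match the request's User-Agent header against a list of crawler names and answer with 403 Forbidden instead of the page. Below is a minimal sketch of that logic as a Python WSGI middleware. It is not Marcotte's htaccess configuration, and it would not run on a static host; the agent list and the names block_bots, is_blocked, site, and status_for are assumptions made up for illustration.

```python
import re
from wsgiref.util import setup_testing_defaults

# Illustrative crawler tokens only; not copied from Marcotte's post.
BLOCKED_AGENTS = ("GPTBot", "CCBot", "ClaudeBot")
BLOCKED_RE = re.compile(
    "|".join(re.escape(agent) for agent in BLOCKED_AGENTS),
    re.IGNORECASE,  # match the token regardless of letter case
)


def is_blocked(user_agent):
    """Return True if the User-Agent string contains a blocked token."""
    return bool(BLOCKED_RE.search(user_agent or ""))


def block_bots(app):
    """Wrap a WSGI app so that blocked user agents receive a 403."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)
    return middleware


def site(environ, start_response):
    """Hypothetical stand-in for the real website."""
    body = b"Hello\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def status_for(user_agent):
    """Send one fake request through the wrapped app; return the status."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    seen = []

    def start_response(status, headers):
        seen.append(status)

    block_bots(site)(environ, start_response)
    return seen[0]


if __name__ == "__main__":
    print(status_for("Mozilla/5.0 (compatible; GPTBot/1.0)"))
    print(status_for("Mozilla/5.0 Firefox/127.0"))
```

Unlike robots.txt, the refusal here happens on the server, so it does not depend on the crawler's cooperation. It does still depend on the crawler sending an honest User-Agent.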

Is there a way to mimic Marcotte's technique on sourcehut pages? Perhaps using SiteConfig?
https://srht.site/advanced-settings

Thanks,

Brett
Details
Message ID
<87y16wnhp5.fsf@city17.xyz>
In-Reply-To
<6e9e6861-d51a-4d4c-9b0a-8d216096b4bd@app.fastmail.com>
DKIM signature
pass
> I've updated my robots.txt file. I would like to supplement it on the server side. Ethan Marcotte 
> uses htaccess:
> https://ethanmarcotte.com/wrote/blockin-bots/

Isn't a robots.txt on your pages.sr.ht site all that's
needed to prevent certain UAs from indexing a website?

Or am I missing something?
Details
Message ID
<6b6fa680-2e5c-4ac8-b722-1fe1de37202d@app.fastmail.com>
In-Reply-To
<87y16wnhp5.fsf@city17.xyz>
DKIM signature
pass
On Sun, 23 Jun 2024, at 10:15, jman wrote:
>> I've updated my robots.txt file. I would like to supplement it on the server side. Ethan Marcotte 
>> uses htaccess:
>> https://ethanmarcotte.com/wrote/blockin-bots/
>
> Isn't a robots.txt on your pages.sr.ht site all that's
> needed to prevent certain UAs from indexing a website?
>
> Or am I missing something?

In case you missed it, the link you quoted answers your question.
Details
Message ID
<715a4fd7-58fd-462a-89d7-65bcfeb8fe65@app.fastmail.com>
In-Reply-To
<6b6fa680-2e5c-4ac8-b722-1fe1de37202d@app.fastmail.com>
DKIM signature
pass
>>> I've updated my robots.txt file. I would like to supplement it on the server side. Ethan Marcotte 
>>> uses htaccess:
>>> https://ethanmarcotte.com/wrote/blockin-bots/
>>
>> Isn't a robots.txt on your pages.sr.ht site all that's
>> needed to prevent certain UAs from indexing a website?
>>
>> Or am I missing something?
>
> In case you missed it, the link you quoted answers your question.

Cloudflare elaborated on the point I was trying to make and the question I tried to ask:
https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click

Unscrupulous companies do not honor robots.txt files.
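
To make that concrete: robots.txt is only a request, and honoring it is a check the crawler itself has to choose to run. Here is a minimal sketch using Python's standard urllib.robotparser. The rule, the placeholder host example.org, and the helper name may_fetch are assumptions for illustration, not taken from my actual site.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt that asks one named crawler to stay away.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def may_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Ask the parser whether robots.txt permits this agent to fetch url.

    A polite crawler calls something like this before every request.
    Nothing on the server forces a crawler to call it at all.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The named crawler is asked not to fetch; other agents are unaffected.
    print(may_fetch("GPTBot", "https://example.org/"))
    print(may_fetch("Mozilla/5.0", "https://example.org/"))
```

A crawler that skips this check, or that announces a different User-Agent, can still request every page, which is why a server-side refusal is the useful supplement.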

I would like to supplement my robots.txt file on the server side. Is there a way I can do that if I host my website on sourcehut?
Details
Message ID
<9206312a-d1cd-496c-a88c-4802346b90e4@app.fastmail.com>
In-Reply-To
<715a4fd7-58fd-462a-89d7-65bcfeb8fe65@app.fastmail.com>
DKIM signature
pass
On Sat, Jul 6, 2024, at 8:45 PM, Brett Bonfield wrote:
>>>> I've updated my robots.txt file. I would like to supplement it on the server side. Ethan Marcotte 
>>>> uses htaccess:
>>>> https://ethanmarcotte.com/wrote/blockin-bots/
>>>
>>> Isn't a robots.txt on your pages.sr.ht site all that's
>>> needed to prevent certain UAs from indexing a website?
>>>
>>> Or am I missing something?
>>
>> In case you missed it, the link you quoted answers your question.
>
> Cloudflare elaborated on the point I was trying to make and the 
> question I tried to ask:
> https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click
>
> Unscrupulous companies do not honor robots.txt files.
>
> I would like to supplement my robots.txt file on the server side. Is 
> there a way I can do that if I host my website on sourcehut?

Another good post on this topic, this one by Adam Newbold, who runs omg.lol:
https://notes.neatnik.net/2024/06/gotta-block-em-all