<6e9e6861-d51a-4d4c-9b0a-8d216096b4bd@app.fastmail.com>
I don't like to have my website indexed or harvested without my permission. The corporations that engage in these activities make my skin crawl.

I've updated my robots.txt file. I would like to supplement it on the server side. Ethan Marcotte uses htaccess:
https://ethanmarcotte.com/wrote/blockin-bots/

Is there a way to mimic Marcotte's technique on sourcehut pages? Perhaps using SiteConfig?
https://srht.site/advanced-settings

Thanks,
Brett
<87y16wnhp5.fsf@city17.xyz>
<6e9e6861-d51a-4d4c-9b0a-8d216096b4bd@app.fastmail.com>
> I've updated my robots.txt file. I would like to supplement it on the server side. Ethan Marcotte
> uses htaccess:
> https://ethanmarcotte.com/wrote/blockin-bots/

Isn't a robots.txt file on your pages.sr.ht site all that's needed to prevent certain user agents from indexing a website?

Or am I missing something?
<6b6fa680-2e5c-4ac8-b722-1fe1de37202d@app.fastmail.com>
<87y16wnhp5.fsf@city17.xyz>
On Sun, 23 Jun 2024, at 10:15, jman wrote:
>> I've updated my robots.txt file. I would like to supplement it on the server side. Ethan Marcotte
>> uses htaccess:
>> https://ethanmarcotte.com/wrote/blockin-bots/
>
> Isn't a robots.txt to your pages.sr.ht site all that it's
> needed to prevent certain UA to index a website?
>
> Or I am missing something?

In case you missed it, the link you quoted answers your question.
<715a4fd7-58fd-462a-89d7-65bcfeb8fe65@app.fastmail.com>
<6b6fa680-2e5c-4ac8-b722-1fe1de37202d@app.fastmail.com>
>>> I've updated my robots.txt file. I would like to supplement it on the server side. Ethan Marcotte
>>> uses htaccess:
>>> https://ethanmarcotte.com/wrote/blockin-bots/
>>
>> Isn't a robots.txt to your pages.sr.ht site all that it's
>> needed to prevent certain UA to index a website?
>>
>> Or I am missing something?
>
> In case you missed it, the link you quoted answers your question.

Cloudflare elaborated on the point I was trying to make and the question I tried to ask:
https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click

Unscrupulous companies do not honor robots.txt files.

I would like to supplement my robots.txt file on the server side. Is there a way I can do that if I host my website on sourcehut?
<9206312a-d1cd-496c-a88c-4802346b90e4@app.fastmail.com>
<715a4fd7-58fd-462a-89d7-65bcfeb8fe65@app.fastmail.com>
On Sat, Jul 6, 2024, at 8:45 PM, Brett Bonfield wrote:
>>>> I've updated my robots.txt file. I would like to supplement it on the server side. Ethan Marcotte
>>>> uses htaccess:
>>>> https://ethanmarcotte.com/wrote/blockin-bots/
>>>
>>> Isn't a robots.txt to your pages.sr.ht site all that it's
>>> needed to prevent certain UA to index a website?
>>>
>>> Or I am missing something?
>>
>> In case you missed it, the link you quoted answers your question.
>
> Cloudflare elaborated on the point I was trying to make and the
> question I tried to ask:
> https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click
>
> Unscrupulous companies do not honor robots.txt fies.
>
> I would like to supplement my robots.txt file on the server side. Is
> there a way I can do that if I host my website on sourcehut?

Another good post on this topic, this one by Adam Newbold, who runs omg.lol:
https://notes.neatnik.net/2024/06/gotta-block-em-all
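
For readers unsure what "supplementing robots.txt on the server side" means in this thread, here is a minimal Python sketch of the general idea: the server inspects the User-Agent header and refuses matching requests instead of trusting the crawler to obey robots.txt. This is not sourcehut's API and says nothing about whether srht.site supports such a rule. The names BLOCKED_AGENT_TOKENS, is_blocked, and status_for are hypothetical, and the token list is only an illustrative assumption, not a vetted blocklist.

```python
from typing import Optional

# Assumption: illustrative crawler tokens only; a real list would be longer
# and would need regular maintenance.
BLOCKED_AGENT_TOKENS = ("GPTBot", "CCBot", "ClaudeBot", "Bytespider")


def is_blocked(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent contains a blocked token (case-insensitive)."""
    if not user_agent:
        # Requests without a User-Agent are let through in this sketch.
        return False
    lowered = user_agent.lower()
    return any(token.lower() in lowered for token in BLOCKED_AGENT_TOKENS)


def status_for(user_agent: Optional[str]) -> int:
    """Hypothetical helper: the HTTP status a server would answer with."""
    return 403 if is_blocked(user_agent) else 200
```

For example, status_for("Mozilla/5.0 (compatible; GPTBot/1.0)") yields 403, while an ordinary browser string yields 200. A check like this only stops crawlers that identify themselves honestly; one that sends a spoofed User-Agent passes straight through.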