~nytpu/public-inbox

Wikimedia User-Agent Policy

Wyatt Avery <mail@wdavery.com>
Message-ID: <F83C29E3-6645-4BB7-8CA9-6348AD76165A@wdavery.com>

Wikimedia requires a user agent to be set when downloading in bulk. The requirement isn't enforced right away, so the first few hundred downloads seem to work, but requests start returning 403s if no user agent is set.

Their policy can be seen here: https://meta.wikimedia.org/wiki/User-Agent_policy

When I ran into this issue, I ran commons-downloader to generate _URLS.txt, stopped it, and then ran wget manually with a user agent set:
wget -i ./_URLS.txt -U "commons-downloader/1.0" -nc

Downloads were working again and no 403s were received.
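
For anyone wanting to script the same workaround, here is a rough sketch rather than anything commons-downloader itself does. The helper names build_user_agent and download_urls, the contact address, and the one-second wait are all assumptions for illustration. As far as I understand the policy linked above, it encourages a user agent that identifies the tool and includes contact information, so the sketch builds a string of that shape.

```shell
#!/bin/sh
# Hypothetical helper: compose a User-Agent of the form
# "<tool>/<version> (<contact>)". All three values are placeholders
# supplied by the caller.
build_user_agent() {
    printf '%s/%s (%s)' "$1" "$2" "$3"
}

# Hypothetical wrapper around the wget invocation shown above.
# -i reads URLs from a file, -U sets the User-Agent, -nc skips files
# that already exist locally, and --wait pauses between requests.
# The 1-second wait is an assumption, not a value taken from the policy.
download_urls() {
    url_file=$1
    user_agent=$2
    wget -i "$url_file" -U "$user_agent" -nc --wait=1
}

# Example call (not run automatically):
#   download_urls ./_URLS.txt "$(build_user_agent commons-downloader 1.0 you@example.com)"
```

Because -nc skips files that are already present, rerunning the wrapper after an interrupted run should only fetch what is still missing.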


As a side note, would an option to only generate the list of URLs, without downloading any files right away, be valuable?

- Wdavery

Message-ID: <20221007200730.x2wpmefty54dr63n@GLaDOS.local>
In-Reply-To: <F83C29E3-6645-4BB7-8CA9-6348AD76165A@wdavery.com>

Fixed in e688c31 (https://git.sr.ht/~nytpu/commons-downloader/commit/e688c306c53fc617a9648b606f6dc9ea2bacf9ac)

Thanks!

~nytpu

-- 
Alex // nytpu
alex@nytpu.com
gpg --locate-external-key alex@nytpu.com