~nytpu/public-inbox

Downloading Metadata

Wyatt Avery <mail@wdavery.com>
Message ID
<4429905.LvFx2qVVIh@wdesktop>
This project is a huge time saver for me. Simple and effective. So thanks to any and all who have worked on it thus far.

In my use, there is a lot of metadata that's been added to Wikimedia Commons that is not embedded within the file, such as date, credit, notes, source, and rights (permission).

When scraping a collection of images it would be invaluable to have recorded that information as it came in.

Some may say you do not want to alter the original file, which is a valid stance. To avoid that you could store metadata in an XML sidecar file.
In terms of creating this file once the data is scraped, `exiftool` would do the job easily.
Scraping is outside my knowledge, but I'd imagine pulling the fields would be fairly easy, since the site has standardized fields separated nicely in the HTML.
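
To illustrate the sidecar idea, here is a minimal Python sketch that writes a handful of scraped fields to a plain XML string using only the standard library. The element names, the `build_sidecar` helper, and the sample values are all hypothetical, and the output is a generic XML layout rather than the XMP format that `exiftool` and most photo software expect.

```python
import xml.etree.ElementTree as ET


def build_sidecar(fields):
    """Return an XML string with one child element per metadata field.

    Hypothetical layout for illustration only; this is not XMP.
    """
    root = ET.Element("metadata")
    for name, value in fields.items():
        # Each key becomes an element name, so keys must be valid XML names.
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


# Made-up sample values standing in for fields scraped from a file page.
example_fields = {
    "date": "2022-09-29",
    "credit": "Example Photographer",
    "source": "Own work",
    "rights": "CC BY-SA 4.0",
}

sidecar_xml = build_sidecar(example_fields)
print(sidecar_xml)
```

The string could then be saved next to the downloaded image, for example as `image.jpg.xml`, so the original file is never modified.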

Obviously this greatly extends the scope of this tool, but I see it as a natural extension that would be a large value add.
As far as I know no tool can do this currently.

Interested to hear any thoughts on this.
Thanks,
Wyatt
Message ID
<20221002200842.hvp5alndijy3hbj4@GLaDOS.local>
In-Reply-To
<4429905.LvFx2qVVIh@wdesktop>
Hi!

On 2022-09-29 01:39PM, Wyatt Avery wrote:
> This project is a huge time saver for me. Simple and effective. So
> thanks to any and all who have worked on it thus far.
Thanks!

> In my use, there is a lot of metadata that's been added to Wikimedia
> Commons that is not embedded within the file, such as date, credit,
> notes, source, and rights (permission).
> 
> When scraping a collection of images it would be invaluable to have
> recorded that information as it came in.
> 
> Some may say you do not want to alter the original file, which is a
> valid stance. To avoid that you could store metadata in an XML sidecar
> file.
It looks like Wikimedia Commons has a nice API query specifically to
get metadata:
https://commons.wikimedia.org/w/api.php?action=help&modules=query%2Bimageinfo

It can even be downloaded in XML format directly, although I imagine it
probably won't be in the format most software expects for XML sidecar
files.  I'll look into getting those queries working and give you a
heads-up when it's implemented.
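
As a rough sketch of what such a query might look like (not the tool's actual implementation), the Python below builds an `imageinfo` request URL asking for `extmetadata` and picks a few fields out of a response. The helper names, the file title, and the sample response are hypothetical; the response shape and the field names (`DateTimeOriginal`, `Credit`, `LicenseShortName`) follow the API's JSON output as I understand it, and real values often contain HTML.

```python
from urllib.parse import urlencode

API_URL = "https://commons.wikimedia.org/w/api.php"


def build_metadata_url(file_title, fmt="json"):
    """Build an imageinfo query URL requesting extended metadata."""
    params = {
        "action": "query",
        "titles": file_title,
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "format": fmt,  # "xml" is also accepted by the API
    }
    return API_URL + "?" + urlencode(params)


def extract_fields(response, wanted=("DateTimeOriginal", "Credit", "LicenseShortName")):
    """Collect the 'value' of each wanted extmetadata field, if present."""
    found = {}
    for page in response.get("query", {}).get("pages", {}).values():
        for info in page.get("imageinfo", []):
            meta = info.get("extmetadata", {})
            for key in wanted:
                if key in meta:
                    found[key] = meta[key].get("value")
    return found


# Hand-written sample in the shape of a JSON response; not real data.
sample_response = {
    "query": {
        "pages": {
            "12345": {
                "title": "File:Example.jpg",
                "imageinfo": [
                    {
                        "extmetadata": {
                            "Credit": {"value": "Own work"},
                            "LicenseShortName": {"value": "CC BY-SA 4.0"},
                        }
                    }
                ],
            }
        }
    }
}

url = build_metadata_url("File:Example.jpg")
fields = extract_fields(sample_response)
print(url)
print(fields)
```

Nothing here touches the network; fetching `url` and decoding the JSON is left out so the sketch stays self-contained.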

> Obviously this greatly extends the scope of this tool, but I see it as
> a natural extension that would be a large value add.
> As far as I know no tool can do this currently.
Nah, I consider it very in-scope.  I mean, the whole point is to scrape
Wikimedia Commons so it makes sense to get the metadata too.

Thanks for the suggestion!
~nytpu

-- 
Alex // nytpu
alex@nytpu.com
gpg --locate-external-key alex@nytpu.com