Commons:Bots/Requests/GeographBot 2

From Wikimedia Commons, the free media repository

Operator: Multichill

Bot's tasks for which permission is being sought: Upload more files from https://www.geograph.org.uk

Automatic or manually assisted: Automatic

Edit type (e.g. Continuous, daily, one time run): Nearly 5 million files to upload, so in practice continuous

Maximum edit rate (e.g. edits per minute): Just the regular upload speed so probably max about 10 files per minute (depending on file size).

Bot flag requested: (Y/N): Y (previous approval)

Programming language(s): Python

It's about ten years since the first big Geograph upload, so it's time to catch up. The initial upload of 1.8 million files was the first time we did a really large-scale batch upload. Now, ten years later, we have new technology: structured data on Commons, and I'm going to use that for these new uploads. The preparation for this was actually already done last year.

One of the big struggles of the initial upload was the reverse geocoding. I'm using http://edwardbetts.com/geocode/ by User:Edward now. This is based on OpenStreetMap and Wikidata. We added a lot of missing data and links in both OpenStreetMap and Wikidata to get to the point that reverse geocoding now works on a civil parish (or equivalent) level for the whole of the United Kingdom. So for (nearly) every coordinate in the UK, we'll get the right Wikidata item and Commons category. More information at Commons:Geograph Britain and Ireland/Reverse geocoding.

Geograph uses tags to describe images. We (thanks MGA73!) mapped these to Wikidata items, see User:GeographBot/Tags. It already covers the majority of the images.

When a file gets uploaded, all data gets added as structured data. Only two things get added in the wikitext:

  1. {{Geograph from structured data}} which uses the structured data to fill the regular templates.
  2. One or two categories for the location. Reverse geocoding is done for both the camera location and the depicted place. Two categories are added only if these two lookups return different categories; otherwise a single category is added.
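The wikitext rule above can be sketched in a few lines of Python. This is only an illustration and not the bot's actual code: the function name `build_wikitext` and its parameters are hypothetical, and the category names are assumed to have been returned by the reverse geocoder already.

```python
TEMPLATE = "{{Geograph from structured data}}"


def build_wikitext(camera_category, depicted_category=None):
    """Return the wikitext for one upload (hypothetical helper).

    Both arguments are Commons category names without the "Category:"
    prefix, assumed to come from reverse geocoding the camera location
    and the depicted place.
    """
    categories = []
    for category in (camera_category, depicted_category):
        # Skip missing results and avoid adding the same category twice.
        if category and category not in categories:
            categories.append(category)

    lines = [TEMPLATE]
    lines.extend(f"[[Category:{name}]]" for name in categories)
    return "\n".join(lines)
```

If both lookups return the same category, the result contains the template and a single category line; if they differ, it contains two category lines.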

Besides the location category, I won't be adding any other categories. All the tags will be converted into depicts (P180) statements. So you'll see depicts (P180) -> pub (Q212198) instead of Category:Pubs in Flintshire. I expect some people who love the good old category system to go through a bit of a mourning process here. But no, I'm not going to add these categories. All the intersection logic died with the Toolserver anyway.
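To make the tag conversion concrete, here is a minimal sketch of turning Geograph tags into depicts (P180) statements in the standard Wikibase statement JSON layout. It is not the bot's actual code: `TAG_TO_ITEM`, `depicts_claim` and `claims_for_tags` are hypothetical names, and the mapping holds only the pub example from this page, whereas the real mapping lives at User:GeographBot/Tags.

```python
# Hypothetical excerpt of the tag mapping; only the "pub" entry is taken
# from this page, the full mapping is at User:GeographBot/Tags.
TAG_TO_ITEM = {"pub": "Q212198"}


def depicts_claim(qid):
    """Build one depicts (P180) statement in Wikibase JSON form."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": "P180",
            "datavalue": {
                "type": "wikibase-entityid",
                "value": {
                    "entity-type": "item",
                    "numeric-id": int(qid[1:]),
                    "id": qid,
                },
            },
        },
        "type": "statement",
        "rank": "normal",
    }


def claims_for_tags(tags):
    """Convert tags to depicts statements, skipping unmapped tags and duplicates."""
    claims = []
    seen = set()
    for tag in tags:
        qid = TAG_TO_ITEM.get(tag.strip().lower())
        if qid and qid not in seen:
            seen.add(qid)
            claims.append(depicts_claim(qid))
    return claims
```

Tags without a mapped Wikidata item are simply dropped in this sketch; how the bot handles unmapped tags is not stated here.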

These structured data statements are being added:

I did some sample uploads:

Multichill (talk) 17:59, 14 February 2021 (UTC)

Discussion

Any other questions? Multichill (talk) 21:39, 22 February 2021 (UTC)

Please make some test edits with the GeographBot account. --Krd 16:38, 28 February 2021 (UTC)
@Krd: did a couple, see Special:ListFiles/GeographBot. Multichill (talk) 19:07, 4 March 2021 (UTC)

Approved. --Krd 10:42, 5 March 2021 (UTC)