Commons:Bots/Requests/GeographBot 2
GeographBot (talk · contribs) 2
Operator: Multichill (talk · contributions · Statistics · Recent activity · block log · User rights log · uploads · Global account information)
Bot's tasks for which permission is being sought: Upload more files from https://www.geograph.org.uk
Automatic or manually assisted: Automatic
Edit type (e.g. Continuous, daily, one time run): Nearly 5 million files to upload, so in practice continuous
Maximum edit rate (e.g. edits per minute): Just the regular upload speed so probably max about 10 files per minute (depending on file size).
Bot flag requested: (Y/N): Y (previous approval)
Programming language(s): Python
It's about ten years since the first big Geograph upload, so it's time to catch up. The initial upload of 1.8 million files was the first time we did a really large-scale batch upload. Now, ten years later, we have new technology: structured data on Commons, and I'm going to use that for these new uploads. The preparation for this was actually already done last year.
One of the big struggles of the initial upload was the reverse geocoding. I'm using http://edwardbetts.com/geocode/ by User:Edward now. This is based on OpenStreetMap and Wikidata. We added a lot of missing data and links in both OpenStreetMap and Wikidata to get to the point that reverse geocoding now works at the civil parish (or equivalent) level for the whole of the United Kingdom. So for (nearly) every coordinate in the UK, we'll get the right Wikidata item and Commons category. More information at Commons:Geograph Britain and Ireland/Reverse geocoding.
Geograph uses tags to describe images. We (thanks MGA73!) mapped these to Wikidata items, see User:GeographBot/Tags. It already covers the majority of the images.
When a file gets uploaded, all data gets added as structured data. Only two things get added in the wikitext:
- {{Geograph from structured data}} which uses the structured data to fill the regular templates.
- One or two categories for the location. Reverse geocoding is done for both the camera and the depicted place. Two categories are added only if these two lookups return different categories.
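The category rule above can be sketched as follows. This is a minimal illustration, not the bot's actual code: the function name `location_categories` and the use of `None` for a lookup that found nothing are assumptions made for the example.

```python
from typing import List, Optional


def location_categories(camera_category: Optional[str],
                        object_category: Optional[str]) -> List[str]:
    """Return the location categories to put in the wikitext.

    Both arguments are Commons category names found by reverse geocoding.
    None stands for a lookup that found nothing (an assumption of this
    sketch, not something the bot is known to do).
    """
    categories: List[str] = []
    for category in (camera_category, object_category):
        # Skip failed lookups and never add the same category twice.
        if category and category not in categories:
            categories.append(category)
    return categories
```

With identical results for camera and object the function returns one category; with different results it returns both, camera first.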
Besides the location category, I won't be adding any other categories. All the tags will be converted into depicts (P180) statements. So you'll see depicts (P180) -> pub (Q212198) instead of Category:Pubs in Flintshire. I expect some people who love the good old category system to go through a bit of a mourning process here. But no, I'm not going to add these categories. All the intersection logic died with the Toolserver anyway.
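As a rough sketch of the tag conversion described above: a lookup table turns Geograph tags into item ids for depicts (P180). Only the pub (Q212198) entry comes from this page; the table name `TAG_TO_ITEM`, the function name, and the lower-casing of tags are assumptions for illustration. The real mapping is maintained at User:GeographBot/Tags.

```python
from typing import Dict, List

# Hypothetical excerpt of the tag mapping. Only "pub" is taken from the
# text above; the full mapping lives at User:GeographBot/Tags.
TAG_TO_ITEM: Dict[str, str] = {
    "pub": "Q212198",
}


def tags_to_depicts(tags: List[str]) -> List[str]:
    """Return the Wikidata item ids to use as depicts (P180) values."""
    items: List[str] = []
    for tag in tags:
        # Normalising case and whitespace is an assumption of this sketch.
        item = TAG_TO_ITEM.get(tag.strip().lower())
        # Unmapped tags are skipped; each item is added only once.
        if item and item not in items:
            items.append(item)
    return items
```

Tags without a mapping simply produce no statement, which matches the remark that the mapping covers the majority of images rather than all of them.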
These structured data statements are being added:
- coordinates of the point of view (P1259) - The geo coordinates of the camera
- coordinates of depicted place (P9149) - The geo coordinates of the object (grid square) being photographed
- depicts (P180) - What do we see in the photo? This is based on the tags and on the reverse geocoding of the object coordinates
- location of creation (P1071) - Where was the photo taken? This is based on reverse geocoding the camera location with a fallback to the object location
- instance of (P31) - It's a photograph (Q125191)
- Caption - The title of the photo
- creator (P170) - Information about the photographer
- copyright status (P6216) - All photos are copyrighted (Q50423863)
- copyright license (P275) - All photos are released under the Creative Commons Attribution-ShareAlike 2.0 Generic (Q19068220). Includes some qualifiers to comply with licensing terms
- source of file (P7482) - Information about the source of the file
- inception (P571) - When was the photo made?
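To make the list above more concrete, here is a hedged sketch of how some of these statements could be assembled in the Wikibase JSON format. It is not the bot's actual code. The helper names, the coordinate precision, and treating the coordinates and geocoded items as optional are all assumptions. Creator, source, inception, the caption, tag-based depicts values and the license qualifiers are left out for brevity.

```python
from typing import List, Optional, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude)


def _claim(prop: str, datavalue: dict) -> dict:
    """Wrap a datavalue in a normal-rank Wikibase statement."""
    return {
        "mainsnak": {"snaktype": "value", "property": prop,
                     "datavalue": datavalue},
        "type": "statement",
        "rank": "normal",
    }


def item_claim(prop: str, qid: str) -> dict:
    """Statement whose value is an item such as Q125191."""
    return _claim(prop, {
        "type": "wikibase-entityid",
        "value": {"entity-type": "item",
                  "numeric-id": int(qid[1:]), "id": qid},
    })


def coordinate_claim(prop: str, coordinate: Coordinate,
                     precision: float = 0.0001) -> dict:
    """Statement whose value is a coordinate on Earth.

    The default precision is an arbitrary choice for this sketch.
    """
    latitude, longitude = coordinate
    return _claim(prop, {
        "type": "globecoordinate",
        "value": {"latitude": latitude, "longitude": longitude,
                  "altitude": None, "precision": precision,
                  "globe": "http://www.wikidata.org/entity/Q2"},
    })


def build_claims(camera: Optional[Coordinate],
                 depicted: Optional[Coordinate],
                 camera_item: Optional[str],
                 object_item: Optional[str]) -> List[dict]:
    """Assemble a subset of the statements listed above."""
    claims = [
        item_claim("P31", "Q125191"),     # instance of: photograph
        item_claim("P6216", "Q50423863"), # copyright status: copyrighted
        item_claim("P275", "Q19068220"),  # license: CC BY-SA 2.0 Generic
    ]
    if camera:
        claims.append(coordinate_claim("P1259", camera))
    if depicted:
        claims.append(coordinate_claim("P9149", depicted))
    # Location of creation: camera lookup first, object lookup as fallback.
    location = camera_item or object_item
    if location:
        claims.append(item_claim("P1071", location))
    # The geocoded object location also becomes a depicts value.
    if object_item:
        claims.append(item_claim("P180", object_item))
    return claims
```

The fallback for location of creation (P1071) is the single `camera_item or object_item` expression: when the camera lookup yields nothing, the object location is used instead.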
I did some sample uploads:
- File:Railway Embankment ^ Parish Boundary Stone - geograph.org.uk - 4000502.jpg
- File:Elm Farm - geograph.org.uk - 4000503.jpg
- File:Gorsy Lane west of Bailey Crossing - geograph.org.uk - 4000504.jpg
- File:Sallochy Wood - geograph.org.uk - 5000500.jpg
- File:Lane near East Down - geograph.org.uk - 5000501.jpg
- File:The southern side of Grosvenor Church formerly a railway building - geograph.org.uk - 5000502.jpg
- File:The western end of Grosvenor Church formerly a railway building - geograph.org.uk - 5000503.jpg
- File:Steeles - Tech Doc - Ticketmaster, Omagh - geograph.org.uk - 5000504.jpg
- File:Lane past Rowan Park - geograph.org.uk - 5000505.jpg
Multichill (talk) 17:59, 14 February 2021 (UTC)
Discussion
- @Multichill: I think it looks good. Now that you add both object and camera location, I hope it will prevent attempts to block this upload. As far as I can see, you add a lot of information, much more than is added for many manual uploads. I noticed that some files have some extra information on Geograph. For example, File:Elm Farm - geograph.org.uk - 4000503.jpg / https://www.geograph.org.uk/photo/4000503 has the line "Showing storage facilities for cereals grown on the farm.". There is no guarantee that such extra information is relevant, but perhaps it would be worth including? --MGA73 (talk) 18:48, 14 February 2021 (UTC)
- This one is quite brief, but I vaguely recall quite a few of them have whole stories in them not relevant to Commons and I'm not sure if things like https://www.geograph.org.uk/photo/5000503 are even included in the API output. Multichill (talk) 18:54, 14 February 2021 (UTC)
- @Multichill: yeah there is a risk of adding a lot of blah blah if it is included. I do not mind that we don't try to import it. --MGA73 (talk) 19:08, 15 February 2021 (UTC)
- Great. Looks good. Thanks for doing that. Do you want an external-id property for Geograph photographers as well? --- Jura1 (talk) 09:05, 15 February 2021 (UTC)
- I don't see added value in a new property. For the images it's nice to have the id so it's easy to check what we already have. By the way, most existing files already have structured data (I added that to the Geograph files a year ago). Multichill (talk) 17:05, 15 February 2021 (UTC)
- I think depicts should be set in semi-automatic mode so that the prominent flag can be set properly. It may make sense to create items for streets first to avoid generic items like lane. --EugeneZelenko (talk) 15:37, 15 February 2021 (UTC)
- The semi part is at User:GeographBot/Tags. The bot itself is fully automated. Of course people can always improve on what is initially added, just like we always did with categories. Multichill (talk) 17:05, 15 February 2021 (UTC)
Any other questions? Multichill (talk) 21:39, 22 February 2021 (UTC)
- Please make some test edits with the GeographBot account. --Krd 16:38, 28 February 2021 (UTC)
- @Krd: did a couple, see Special:ListFiles/GeographBot. Multichill (talk) 19:07, 4 March 2021 (UTC)
Approved. --Krd 10:42, 5 March 2021 (UTC)