Category talk:Media with geo-coordinates needing categories

From Wikimedia Commons, the free media repository

Subcategory bins

We now have half a million pictures with coordinates, but no way to find the ones that were taken in places we know. Presumably thousands are near my home town or other familiar places, but which ones? There should be a bot sorting them into subcategories by latitude-longitude bins. Jim.henderson (talk) 22:37, 19 March 2017 (UTC)
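A minimal sketch of such binning, assuming 1° × 1° cells keyed by their south-west corner. The subcategory naming scheme here is purely hypothetical, not an existing Commons convention:

```python
import math


def bin_key(lat: float, lon: float, size_deg: float = 1.0) -> tuple[int, int]:
    """Return the (row, col) index of the size_deg x size_deg cell containing the point."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinate out of range")
    # floor() keeps negative coordinates in the correct cell (e.g. -73.99 -> -74).
    return math.floor(lat / size_deg), math.floor(lon / size_deg)


def bin_category(lat: float, lon: float) -> str:
    """Hypothetical subcategory name for a 1-degree cell, labelled by its south-west corner."""
    row, col = bin_key(lat, lon)
    ns = "N" if row >= 0 else "S"
    ew = "E" if col >= 0 else "W"
    return f"Media needing categories in cell {ns}{abs(row):02d} {ew}{abs(col):03d}"
```

For example, a photo at 40.75, -73.99 (Midtown Manhattan) would land in the cell labelled N40 W074.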

Maps with locations

It would be easier to categorize if people could choose photos from regions they know (as already mentioned above). The map tools should help, but none of them seems to work: the Google tool returns just a blank page, and the OSM tool opens a map with the note "sorry, no data to show". JiriMatejicek (talk) 10:09, 19 February 2018 (UTC)

Exactly. The tools in the Geogroup box don't work because they are overloaded. They can't handle hundreds of thousands in a category. It seems reasonable to suppose they could work with hundreds or maybe thousands. There are various standard systems to sort coordinates into hundreds of zones, or thousands or even millions, including:
Millions, of course, would be too many. Anyway, I am confident that if all of these are found unsuitable for making automated zonal categories, others can be used. Jim.henderson (talk) 02:42, 21 February 2018 (UTC)
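For illustration only, one widely used zoning scheme of this kind is geohash. Each extra character splits a cell into 32 smaller ones, so 2 characters give 1,024 zones and 3 give 32,768. Here is a minimal encoder of the standard algorithm. Whether geohash suits Commons categories is an open assumption, not something proposed in this thread:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash(lat: float, lon: float, precision: int = 3) -> str:
    """Encode a coordinate as a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate longitude, latitude, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)
```

Nearby points share a prefix, so a 2- or 3-character geohash could act as a zone key.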
This tool https://tools.wmflabs.org/wiwosm/osm-on-ol/commons-on-osm.php displays only a limited number of images for the chosen map area (if you zoom in, new markers may appear). Perhaps the same logic could be used for the osm4wiki tool? JiriMatejicek (talk) 06:58, 4 April 2018 (UTC)
https://tools.wmflabs.org/wikimap/?cat=Media_with_geo-coordinates_needing_categories&noimage=true --DB111 (talk) 12:05, 29 July 2019 (UTC)
Very good. One or both of these links should be on the category page. Jim.henderson (talk) 18:19, 31 July 2019 (UTC)

Clearing this category

I posted some suggestions about this category at the Village Pump: https://commons.wikimedia.org/w/index.php?title=Commons:Village_pump&oldid=288510185#Category:Media_with_geo-coordinates_needing_categories but got no response. Maybe somebody will notice them here... JiriMatejicek (talk) 06:59, 5 April 2018 (UTC)

I see no sign of anyone but the two of us. Another idea would be a semi-automated process that feeds us a latitude band, working eastward or westward from some longitude, or vice versa (a longitude band, working north or south). I have no idea whether any of the existing automation tools could be made to do that. Jim.henderson (talk) 00:03, 7 April 2018 (UTC)
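Here is a sketch of how such a tool might hand out work. It assumes fixed-size tiles along one latitude band, stepping eastward from a starting longitude and wrapping once around the globe. The function name and tile size are hypothetical. Each yielded box could then be passed to a map or search tool:

```python
from typing import Iterator


def band_tiles(lat_south: float, lat_north: float, start_lon: int,
               step_deg: int = 1) -> Iterator[tuple[float, int, float, int]]:
    """Yield (south, west, north, east) boxes along one latitude band, moving eastward."""
    # Integer steps that divide 360 keep every tile inside [-180, 180] without
    # straddling the antimeridian.
    if 360 % step_deg or start_lon % step_deg:
        raise ValueError("step must divide 360 and start_lon must lie on the step grid")
    for i in range(360 // step_deg):
        west = (start_lon + i * step_deg + 180) % 360 - 180  # wrap at the antimeridian
        yield (lat_south, west, lat_north, west + step_deg)
```

For a westward pass, the same loop could subtract `i * step_deg` instead of adding it.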
Are there no tools that automatically convert the coordinates to a specific place on the planet (country, city)? --Fractaler (talk) 18:10, 7 April 2018 (UTC)
Obviously, such algorithms have been written for other contexts. When I snap a picture with my Android phone and upload it with the Commons app, it suggests the categories of several nearby points of interest, presumably based on the EXIF coordinates. I have no idea how difficult it would be to incorporate this feature into a Commons bot or other tool.
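The Commons app's internals aren't described here. As background, EXIF stores camera coordinates as degrees, minutes and seconds plus a hemisphere letter (GPSLatitude/GPSLatitudeRef, GPSLongitude/GPSLongitudeRef). Any such tool first needs them as signed decimal degrees. A minimal conversion sketch:

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert EXIF-style degrees/minutes/seconds plus N/S/E/W to signed decimal degrees."""
    if ref not in ("N", "S", "E", "W"):
        raise ValueError(f"unexpected hemisphere reference: {ref!r}")
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative in decimal notation.
    return -value if ref in ("S", "W") else value
```

For example, 40°30′0″ N becomes 40.5, and 74°0′36″ W becomes about -74.01.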
However, breaking news: in the past few minutes I discovered the incategory keyword of our wiki search box. A search for
incategory:"Media with geo-coordinates needing categories" 40°30 74°
returned a load of pictures, about 10% of them in my New York and New Jersey area of interest. This is of course far from pure, but it's also richer pickings than I've obtained by any other method thus far. Perhaps someone who has properly studied wiki searching can suggest improvements. Jim.henderson (talk) 13:24, 9 April 2018 (UTC)
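The same query can probably also be sent through the MediaWiki API's `list=search` module, which is handy for scripted batches. This sketch only builds the request URL and fetches nothing. The default limit of 50 is an arbitrary choice:

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"
CATEGORY = "Media with geo-coordinates needing categories"


def search_url(terms: str, category: str = CATEGORY, limit: int = 50) -> str:
    """Build (but do not fetch) a MediaWiki API search URL restricted to one category."""
    params = {
        "action": "query",
        "list": "search",
        # Same syntax as typed into the search box: incategory:"..." plus free-text terms.
        "srsearch": f'incategory:"{category}" {terms}',
        "srnamespace": 6,  # namespace 6 = File:
        "srlimit": limit,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"
```

For example, `search_url("40°30 74°")` reproduces the search above.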
Thanks for the tip, I've found a few this way. JiriMatejicek (talk) 09:58, 11 April 2018 (UTC)
Minor refinement:
incategory:"Media with geo-coordinates needing categories" 40°30 73° 59
returns mostly pictures from the western part of New York City. I intend to use it repeatedly, decrementing the minute number gradually, and later apply it farther west. If hundreds of us did this for the places we know, we could clear the category in several months, unless for some reason there are many pictures for which this method will not work. Obviously hundreds of us will not do it anyway, but it's a pleasant thought. Jim.henderson (talk) 21:38, 12 April 2018 (UTC)
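The repeated searches with a decreasing minute number can be generated mechanically. This is a minimal sketch with a hypothetical helper. It stops at minute 0 instead of rolling over into the next degree:

```python
def minute_queries(lat_part: str, lon_deg: int, start_minute: int, count: int) -> list[str]:
    """Build up to `count` search strings, decrementing the longitude minute each time."""
    queries = []
    for minute in range(start_minute, start_minute - count, -1):
        if minute < 0:
            break  # do not roll over into the next degree
        queries.append(
            'incategory:"Media with geo-coordinates needing categories" '
            f"{lat_part} {lon_deg}° {minute}"
        )
    return queries
```

`minute_queries("40°30", 73, 59, 3)` yields the query above followed by the ones ending in 58 and 57.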
Two more options (you may have used these already, but just in case):
1. As people often shoot in a specific area, you can search for uncategorized images from a given user; e.g. this user has lots of photos from Algeria (https://commons.wikimedia.org/wiki/Category:Photos_from_Panoramio_ID_5095076_needing_categories). Just replace his ID with that of an author whose images you have already located in your area.
2. Searching for keywords (city, town, county) in different languages also worked for me. JiriMatejicek (talk) 10:40, 16 April 2018 (UTC)
P.S. I've just found that option 1 as described above works only for some IDs. It worked for the two I tried, so I assumed it was universal, but it's not, sorry. Nevertheless, you can also search for images by a specific user simply by their username. JiriMatejicek (talk) 19:09, 16 April 2018 (UTC)
Splendid; now we've got a few different ways to use the "incategory" search feature. I figure Commons ought to have a more explicit "search in this category" button, but it doesn't, so we can use this option plus partial coords or author ID or keywords in different languages, or any of those in combinations. The author and keyword methods presumably are not limited to this geotagged category, but of course they ought to be used here. Maybe we should try to assemble this into a brief, clear instruction to go in the header of the cat. Jim.henderson (talk) 14:12, 18 April 2018 (UTC)
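Combining these options is just string assembly. This sketch builds one incategory query per keyword, such as a place name in several languages. A partial coordinate and an author name can optionally be appended as plain search terms. The helper is hypothetical, and the quoting of keywords is an assumption about what works best:

```python
BASE = 'incategory:"Media with geo-coordinates needing categories"'


def combined_queries(keywords: list[str], coords: str = "", author: str = "") -> list[str]:
    """One search string per keyword, each restricted to the category plus optional extras."""
    extra = " ".join(part for part in (coords, author) if part)
    return [" ".join(p for p in (BASE, f'"{kw}"', extra) if p) for kw in keywords]
```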

@JiriMatejicek and Jim.henderson: Hello from Germany. At the moment I am trying to empty this category. Yesterday I categorized over 7000 images, and I hope this category will be empty in maybe two weeks. I wrote a Perl script that scans a batch of images (for example 20000) for their camera coordinates. Then I query the OSM Overpass API with each coordinate like this:

// Which administrative areas (admin_level) contain this coordinate?
is_in(52.45606,13.30474);
foreach(
  area._["admin_level"]["wikidata"];
  out;
);

// The same query as a direct API request:
// http://overpass-api.de/api/interpreter?data=[out:xml][timeout:20];is_in(36.860198,73.80928);foreach(area._["admin_level"]["wikidata"];out tags;);

You can also try this query with your own coordinates in overpass turbo.
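The Perl script itself isn't shown. For readers who want to script the request, here is a minimal Python sketch that builds the same interpreter URL for any coordinate. It assumes `[out:json]` instead of `[out:xml]` because JSON is easier to parse. `fetch_areas` makes a live network call and is not exercised here:

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

OVERPASS = "http://overpass-api.de/api/interpreter"


def overpass_query(lat: float, lon: float, timeout: int = 20) -> str:
    # Same statements as the request above, but asking for JSON instead of XML.
    return (f"[out:json][timeout:{timeout}];"
            f"is_in({lat},{lon});"
            'foreach(area._["admin_level"]["wikidata"];out tags;);')


def overpass_url(lat: float, lon: float) -> str:
    return f"{OVERPASS}?data={quote(overpass_query(lat, lon))}"


def fetch_areas(lat: float, lon: float) -> list:
    """Network call: returns the list of area elements (each with a 'tags' dict)."""
    with urlopen(overpass_url(lat, lon), timeout=30) as response:
        return json.load(response)["elements"]
```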

This gives me the administrative areas (with their admin_level) that contain the coordinate, each with a Wikidata item. If the Wikidata item has a Commons category (as in this request), I put the image in that category. If not, the script goes one admin level up (from Village --> County --> State --> Country). I hope that Wikipedians all over the world can use this to subcategorize these images later within their own areas and add more categories to each image. I can only help with the first step. You can also see the results in the history of User:Stefan_Kühn/Test; I use this page for the output of my script. From there I categorize all images with the tool "cat-a-lot". Best regards. -- sk (talk) 11:56, 27 July 2018 (UTC)
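Here is a sketch of the fallback rule described above. It assumes the Overpass response was requested as JSON, so each element carries a `tags` dict. It also assumes a mapping from Wikidata item to Commons category was fetched separately, for example from Wikidata property P373 ("Commons category"). This is an illustration, not sk's actual script. The Q-IDs in the test are placeholders:

```python
from typing import Optional


def pick_category(elements: list[dict], commons_category: dict[str, str]) -> Optional[str]:
    """Return the Commons category of the most local area that has one, else None."""
    areas = []
    for el in elements:
        tags = el.get("tags", {})
        level = tags.get("admin_level", "")
        qid = tags.get("wikidata")
        if level.isdigit() and qid:
            areas.append((int(level), qid))
    # In OSM, higher admin_level numbers are more local (2 = country), so try them first
    # and fall back one level at a time: village --> county --> state --> country.
    for _, qid in sorted(areas, reverse=True):
        if qid in commons_category:
            return commons_category[qid]
    return None  # leave the image for manual categorization
```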

That is extremely problematic. You are dumping numerous images, which now lack any category indicating they need further categorization, into broad geographic categories. You're not actually improving anything, just moving the work around. It would be much more helpful to apply full categorization to a smaller number of images than to use your script to apply useless categorization to many. Pi.1415926535 (talk) 17:22, 27 July 2018 (UTC)
@Pi.1415926535: No, I think you are wrong. This is a really good improvement. Two days ago we had around 170,000 images with geocoordinates here. Nobody wants to search this big heap of images for an image from their area. The description is often unhelpful, so a text search is rarely useful. If we can do a rough sorting into administrative areas by coordinate, that is a good way to bring the images to the Wikipedians who know the area. For example, this beautiful image from San Francisco was uncategorized for around a year. Now my script has put it in Category:San Francisco, and you found it. The same applies in other less well-covered areas such as Africa or South America. -- sk (talk) 19:19, 27 July 2018 (UTC)
I would have to see it in action to make a thorough assessment, but in theory this is very helpful and not problematic at all. --MB-one (talk) 09:30, 28 July 2018 (UTC)
@MB-one: Under User:Stefan_Kühn/Test you can see the next batch. None of these images has a category yet, and I will process this page with the tool cat-a-lot. -- sk (talk) 09:49, 28 July 2018 (UTC)
I did quite a bit of reverse geocoding a long time ago. Boundaries are always tricky (New York or New Jersey?). The services I used were GeoNames (Extended Find nearby toponym / reverse geocoding) and OpenStreetMap Nominatim reverse geocoding. I used two to reduce the error margin. That was before Wikidata, so now you should be able to cross-reference the results with Wikidata to get the right category. Good luck. Multichill (talk) 20:16, 28 July 2018 (UTC)
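The two-service idea can be expressed as a simple agreement check. This sketch assumes both geocoders' answers have already been mapped to Commons categories, for example via Wikidata. Files where the services disagree (the classic New York / New Jersey case) go to a manual-review list. All helper names and file names are hypothetical:

```python
from typing import Optional


def agreed_category(primary: Optional[str], secondary: Optional[str]) -> Optional[str]:
    """Accept a category only when two independent reverse geocoders agree."""
    if primary and primary == secondary:
        return primary
    return None


def triage(results: dict[str, tuple[Optional[str], Optional[str]]]) -> tuple[dict[str, str], list[str]]:
    """Split files into automatically categorizable ones and ones needing manual review."""
    auto: dict[str, str] = {}
    review: list[str] = []
    for filename, (a, b) in results.items():
        category = agreed_category(a, b)
        if category is None:
            review.append(filename)
        else:
            auto[filename] = category
    return auto, review
```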

Mostly complete

Sorry I didn't participate in the past couple of weeks, but Wikimania in Cape Town and other matters kept me busy. I am very pleased with the good work done in my absence. Some 90% of the half million files that were in this category a year ago are gone now, presumably to nearly appropriate categories. Of course, such an automated tool can only provide rough categories; thousands of pictures must have landed on the wrong side of a river or mountain, or in some other way slightly outside their correct category. And obviously most coordinates are those of the camera, whilst most categories are of objects. These cases are for editors to handle manually with our local knowledge. The net result of the automated process is that hundreds of thousands of pictures which were previously effectively lost are now in categories that make them findable with modest effort using Geogroup and other methods. I see that the Manhattan category, for example, is uncomfortably overcrowded, as it often is, but not so badly that I and other locals will have great difficulty shrinking it. Jim.henderson (talk) 12:43, 1 August 2018 (UTC)

@Jim.henderson: Thanks. When I started on 2018-07-16 there were 157,690 images in this useful search. Today (2018-08-01) only 70,398 are left. Also today I rewrote my script to produce more compact results, so I have less work categorizing the images. I tried to find a way to reach deeper categories, but it was too difficult to do for all regions worldwide. I get the best results by using the admin_level in OSM. In some areas I get city districts (for example Category:Toshima, Tokyo (ward) or Category:Berlin-Mitte). In other parts of the world the results are coarser categories (like governorates, e.g. Category:Baghdad Governorate). Mostly there is either no deeper area in OSM or no deeper Wikidata item with a Commons category. An automated process down to the object categories of each image is beyond what I can do. :-) I think this algorithm would be a good job for a bot in the future. -- sk (talk) 18:02, 1 August 2018 (UTC)

What I fear is that the destination categories may become so large that Template Geogroup becomes ineffective. By that measure, I don't see a problem with the failure to subcategorize the Sanjak of Baghdad, because that great metropolitan area has fewer photos than Category:Brooklyn. Nor is there much worry for Tokyo Prefecture, where the 23 Wards and various Ku and Cho divisions give a finer grain than New York's coarse-grained five Boroughs. However, Geogroup still fails to work here, in this "Needing categories" category, so I hope something further can be done. As usual, my lack of focus keeps me busy with other things, so I have only looked at a few in Category:Manhattan, where Geogroup works easily. Some are there because of the usual problem of uploaders being unfamiliar with categories, but the majority were put there by you (thank you, @Stefan Kühn:). Our local knowledge will move them, eventually. Jim.henderson (talk) 15:40, 4 August 2018 (UTC)

Half a year later

The category has tens of thousands of files again. Time to clear it down to subcategories again? Perhaps it can be done monthly, automatically. Or daily. -- Jim.henderson

@Rudolphous and Jim.henderson: Hello Rudolphous, can you start your bot again? There are now over 45,000 images here. Is it possible to run your bot every week or month? Or does it need too much manual interaction? Without an automatic bot we can never reduce this category to zero by hand. -- sk (talk) 14:09, 21 May 2019 (UTC)

Live-map cluster solution

Thank you all. The live map now presented on the category page is completely different from what I was proposing, and very much better than any of my ideas. It has the minor disadvantage that the map only appears after a good part of a minute on my slow DSL connection. Also, it always starts with a worldwide view centered on Null Island, rather than remembering what part of the world I was viewing the last time I used it. These are very minor annoyances for a highly useful tool; thanks. Jim.henderson (talk) 13:44, 3 August 2019 (UTC)

@Jim.henderson and DB111: When I saw the tool WikiMap (German help page), I was happy with how fast it works. With a small image category like the Valley Zschonergrund it works very fast. If you don't use the parameter noimage=true, you also get preview images, and the map takes more time to build. I never expected that this tool could handle bigger categories. But I tried 30,000 postcards and it worked. So I tried these 40,000 in "Media with geo-coordinates needing categories" and it worked. I like the fact that every time you reload the map, it rescans all images in the category. WikiMap can also show, for example, your uploads and a few more things (we need an English help page). If you show a category or your uploads, there is no "Null Island": the map shows the best-fitting rectangle containing all images. Once the number of images in this category is reduced, it will be very fast. For now you can load the map once, work with it, and reload it only after a good deal of work to see the result. I like working quickly with this map: I choose a region and clean it up, for example East Germany. In Berlin we have over 1000 images. I will use a nearcoord search with a 5 km radius in Berlin, and HotCat, to move all these images into Category:Berlin. This is very fast. -- sk (talk) 08:03, 4 August 2019 (UTC)
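The "nearcoord 5 km in Berlin" step might look like this. The sketch assumes CirrusSearch's `nearcoord:radius,lat,lon` form and uses an approximate Berlin centre of 52.52, 13.405. It also includes a haversine check so that a list of file coordinates can be filtered locally with the same radius. The file names and coordinates in the test are hypothetical:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
CATEGORY = "Media with geo-coordinates needing categories"


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearcoord_query(lat: float, lon: float, radius_km: float, category: str = CATEGORY) -> str:
    """Search-box query for files in the category within radius_km of a point."""
    return f'incategory:"{category}" nearcoord:{radius_km}km,{lat},{lon}'


def within(points: dict[str, tuple[float, float]], lat: float, lon: float, radius_km: float) -> list[str]:
    """Names of the points lying within radius_km of (lat, lon)."""
    return [name for name, (plat, plon) in points.items()
            if haversine_km(plat, plon, lat, lon) <= radius_km]
```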
I added a "region" parameter: https://tools.wmflabs.org/wikimap/?cat=Media_with_geo-coordinates_needing_categories&noimage=true&region=50%7C-130%7C25%7C-60, e.g. to see only the US. It may also speed things up a little. Cheers --DB111 (talk) 14:23, 4 August 2019 (UTC)
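A small URL builder makes it easier to open the map for other regions. The parameter order north|west|south|east is inferred from the US example (50|-130|25|-60) and is an assumption, as are the helper names. The `in_region` check mirrors that box for local filtering:

```python
from urllib.parse import urlencode

WIKIMAP = "https://tools.wmflabs.org/wikimap/"


def wikimap_url(category: str, north: float, west: float, south: float, east: float,
                noimage: bool = True) -> str:
    """Build a WikiMap URL for a category, limited to a bounding box."""
    params = {"cat": category.replace(" ", "_")}
    if noimage:
        params["noimage"] = "true"
    # Assumed order, inferred from the US example: north|west|south|east.
    # The "g" format drops trailing ".0" and keeps up to 6 significant digits.
    params["region"] = "|".join(f"{v:g}" for v in (north, west, south, east))
    return WIKIMAP + "?" + urlencode(params)  # "|" is encoded as %7C


def in_region(lat: float, lon: float, north: float, west: float, south: float, east: float) -> bool:
    """True if the point lies inside the box (boxes crossing the antimeridian not handled)."""
    return south <= lat <= north and west <= lon <= east
```

Calling it with the US box reproduces the link above.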
@DB111: Very cool! Thanks! -- sk (talk) 16:44, 4 August 2019 (UTC)
A coder who is looking for something to do could make a box that opens a hierarchical list of regions to search when clicked. This would help in countries that have many thousands of uncategorized pictures and also have subdivisions. One can always think of a refinement that costs someone hours of work and saves time for many other editors. Jim.henderson (talk) 22:15, 7 August 2019 (UTC)