Commons:Bots/Requests/Usage Bot

From Wikimedia Commons, the free media repository

Usage Bot

Operator: Bjh21

Bot's tasks for which permission is being sought: To create and maintain galleries on Commons that record the use of files by other projects that are not supported by Special:GlobalUsage.

Automatic or manually assisted: Automatic, unsupervised

Edit type (e.g. Continuous, daily, one time run): Batch runs, daily or less frequently

Maximum edit rate (e.g. edits per minute): 12 edits per minute

Bot flag requested: (Y/N): Y

Programming language(s): Python

This bot is prompted by a suggestion at Commons:Village pump/Proposals#File usage on openstreetmap.org to add an indication on Commons that a file is in use on OpenStreetMap. It will do this by querying OpenStreetMap for a list of links to files on Commons and then maintaining user galleries that link to all of these files. These user galleries will then appear under "File usage" on the relevant file pages, which will, in particular, make these external uses visible during deletion requests. Initially the bot will only target uses of files on OSM, but it could easily be extended to other external targets. Likely early ones are wikis using InstantCommons (such as the wikitech: wiki) and MusicBrainz. --bjh21 (talk) 10:27, 19 June 2022 (UTC)
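As an illustration of the OSM side of this, here is a minimal Python sketch. It assumes (this is not stated in the request) that the list of links comes from the taginfo API's `/api/4/key/values` endpoint for the `wikimedia_commons` key, and that each gallery page is a plain `<gallery>` block. The function names are hypothetical, not the bot's actual code.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: taginfo API v4, values of a single OSM key.
TAGINFO_URL = "https://taginfo.openstreetmap.org/api/4/key/values"


def fetch_taginfo_values(key="wikimedia_commons", page=1, per_page=1000):
    """Fetch one page of values for an OSM key from taginfo (needs network)."""
    query = urllib.parse.urlencode({"key": key, "page": page, "rp": per_page})
    with urllib.request.urlopen(f"{TAGINFO_URL}?{query}") as response:
        return json.load(response)


def commons_files(taginfo_json):
    """Return sorted, de-duplicated File: titles from a taginfo response.

    Values such as "Category:..." are skipped, because only files can be
    listed in a gallery.
    """
    files = set()
    for row in taginfo_json.get("data", []):
        value = row.get("value", "").strip()
        if value[:5].lower() == "file:":
            # Normalise the namespace prefix and underscores to spaces.
            files.add("File:" + value[5:].strip().replace("_", " "))
    return sorted(files)


def gallery_wikitext(files):
    """Build page text whose <gallery> block links every file once."""
    return "\n".join(["<gallery>", *files, "</gallery>"]) + "\n"
```

Only the last two functions are pure; a real bot would also page through all of taginfo's results and send a descriptive User-Agent with its requests.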

Discussion

I think such a bot just for a user gallery would not need any approval. But I would suggest not doing this in a user gallery. This project should be in the Commons namespace. --GPSLeo (talk) 10:22, 20 June 2022 (UTC)

I'm quite happy to target the Commons namespace instead. @GPSLeo: Do you have any opinion on the page naming? My initial instinct is to have Commons:Files used on OSM/1, Commons:Files used on wikitech/1, etc. And then an overall Commons:Files used on other projects to tie them all together and provide a central talk page for the system. --bjh21 (talk) 11:21, 20 June 2022 (UTC)
I've replaced "user galleries" with "galleries" in the proposed task to allow for the possibility of maintaining galleries in project space instead of user space. --bjh21 (talk) 09:20, 21 June 2022 (UTC)

I think it makes much more sense to have a tool similar to Commons:Glamorous. --EugeneZelenko (talk) 13:57, 20 June 2022 (UTC)

@EugeneZelenko: Can you explain why? Molgreen's idea was that the usage on OSM should be visible in the same place as usage on Wikimedia projects, namely on the file page. The point of this was to make it easy to see the external usage of a file when it's proposed for deletion. Requiring use of an external tool would be less convenient since it would require extra clicks to get to the list of uses. --bjh21 (talk) 20:14, 20 June 2022 (UTC)
From my point of view, bjh21's proposed solution is optimal. The use of the images is easy to see without generating unnecessary emails about watched pages. A big thank you to Bjh21. --Molgreen (talk) 04:24, 21 June 2022 (UTC)
I don't think that maintaining a potentially huge page is reasonable. An external tool is a much better idea. --EugeneZelenko (talk) 13:43, 21 June 2022 (UTC)
@EugeneZelenko: The individual galleries need not be huge. With my current configuration, OSM would have 40 galleries of about 200 kB each. If you're worried about the churn caused by editing, I plan to have the bot avoid moving files between galleries, and to try to batch additions with other edits, so for projects with stable image usage the number of edits should be small. --bjh21 (talk) 18:48, 23 June 2022 (UTC)
@EugeneZelenko: I've now finally completed the changes mentioned above that will reduce database churn by minimising the number of galleries edited by the bot. The number of galleries edited should now be at most the number of files that have gone out of use plus 1/1000 of the increase in the number of used files. --bjh21 (talk) 17:29, 23 July 2022 (UTC)
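One way to get the low-churn behaviour described above is sketched below in Python. It is a hypothetical illustration, not the bot's code. Assumptions: each gallery is a list of titles capped at 1000 files (the size used for User:Usage Bot/Used on OSM/1); files that have gone out of use are removed from whichever gallery holds them; new files first fill gaps in galleries that are being edited anyway; and the rest are appended to the last gallery or to new ones. Existing files never move between galleries.

```python
GALLERY_SIZE = 1000  # assumed files per gallery, as in User:Usage Bot/Used on OSM/1


def update_galleries(galleries, in_use, size=GALLERY_SIZE):
    """Return (new_galleries, edited_indices) without mutating the input.

    galleries: list of lists of file titles, one list per gallery page.
    in_use: file titles currently used on the external site.
    """
    in_use = set(in_use)
    new = [list(g) for g in galleries]
    edited = set()
    placed = set()

    # 1. Drop files that have gone out of use; each such gallery is edited.
    for i, gallery in enumerate(new):
        kept = [f for f in gallery if f in in_use]
        if len(kept) != len(gallery):
            new[i] = kept
            edited.add(i)
        placed.update(kept)

    additions = sorted(in_use - placed)

    # 2. Batch additions into galleries that are being edited anyway.
    for i in sorted(edited):
        room = size - len(new[i])
        if room > 0 and additions:
            new[i].extend(additions[:room])
            additions = additions[room:]

    # 3. Append what is left to the last gallery, opening new ones as needed.
    while additions:
        if not new or len(new[-1]) >= size:
            new.append([])
        i = len(new) - 1
        room = size - len(new[i])
        new[i].extend(additions[:room])
        additions = additions[room:]
        edited.add(i)

    return new, sorted(edited)
```

With this scheme, the galleries edited per run are at most those that lost files, plus about one for every 1000 new files (and possibly the partly full last gallery). At 1000 files per gallery, the 1292 Commons files used on Wikitech fit in two galleries, and the roughly 40,000 files referenced from OSM fit in 40.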

Is there any example file page where we can see what this would look like? --Krd 06:18, 21 June 2022 (UTC)

I've updated User:Usage Bot/Used on OSM/1 to contain what the bot would currently put there, showing the first thousand images used on OSM. In operation, there would be 39 more galleries containing other files used on OSM. I wrote an InstantCommons backend yesterday as well, and User:Usage Bot/Used on wikitech/1 shows the first thousand Commons files used on wikitech. --bjh21 (talk) 09:17, 21 June 2022 (UTC)
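A rough Python sketch of how an InstantCommons backend might tell Commons files apart from local ones using the MediaWiki Action API. Assumptions (not taken from the bot itself): the candidate file titles have already been collected, for example with `list=allfileusages`, and a file whose `imagerepository` in a `prop=imageinfo` response is neither `local` nor empty comes from a foreign repository such as Commons. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BATCH = 50  # assumed per-request title limit for ordinary (non-bot) API clients


def batches(titles, size=BATCH):
    """Yield successive slices of at most `size` titles."""
    for start in range(0, len(titles), size):
        yield titles[start:start + size]


def imageinfo_url(api, titles):
    """Build a prop=imageinfo query URL for a batch of file titles."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "titles": "|".join(titles),
        "format": "json",
        "formatversion": "2",  # version 2 returns "pages" as a list
    }
    return api + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Fetch and decode a JSON response (needs network access)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "usage-bot-sketch/0.1 (example)"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def foreign_files(response):
    """Return sorted titles served from a non-local file repository."""
    result = []
    for page in response.get("query", {}).get("pages", []):
        repo = page.get("imagerepository", "")
        if repo and repo != "local":
            result.append(page["title"])
    return sorted(result)
```

For Wikitech, the API endpoint would be https://wikitech.wikimedia.org/w/api.php. Checking the repository also covers the shadowing case: a title reported as `local` has been overridden by a local upload, so it is not a use of the Commons file.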
I think the best approach needs broader discussion. It may or may not be a good idea to put this on the file pages themselves, or on the file talk pages, instead of in galleries. Structured data might even be an alternative. --Krd 19:21, 23 June 2022 (UTC)
@Krd: I think all of those would be bad because they would require edits that would show up on the watchlists of everyone watching the corresponding files. I've observed that people are quite averse to bot edits that they see as unimportant (such as SDC edits) ending up on their watchlists. We've had some discussion here and on COM:VPP about how to handle this. I suppose I could put a note on COM:VP pointing at one or other discussion. --bjh21 (talk) 21:25, 23 June 2022 (UTC)
That's how I feel too: two certainly very useful bots trigger an email every time they make a change. Over time, I have received a good 5,000 emails for just under 2,800 media files, and I have set up a special mailbox rule to separate them. Don't misunderstand me: the bots are justified, but I could do without the emails. Even so, I don't want to stop watching the files because of them. That's why I like this solution so much. --Molgreen (talk) 12:26, 24 June 2022 (UTC)
@Krd: I've posted what I hope is a suitably neutral notice on COM:VP. --bjh21 (talk) 19:57, 28 June 2022 (UTC)
Thanks for this; it is a great idea, and using galleries (instead of editing file pages) is a very nice innovation! I support this bot and propose expanding it to cover use of Wikimedia Commons files on OSM Wiki and in https://wiki.openstreetmap.org/wiki/Key:wiki:symbol (note that for OSM Wiki the bot would also need to check whether a file is shadowed by a local file). Mateusz Konieczny (talk) 12:27, 26 June 2022 (UTC)
I have code for a bot doing something similar: see https://github.com/matkoniecz/mediawiki_file_copyright_handler_bot/blob/master/edit_note_files_used_in_osm_database.py and https://pypi.org/project/taginfo/ (the code is trivial, but taginfo has not been mentioned here so far). Note that https://wiki.openstreetmap.org/wiki/Key:wiki:symbol can also use files from Wikimedia Commons. Finally, OSM Wiki also uses files from Wikimedia Commons; exposing that would be great too. Mateusz Konieczny (talk) 12:22, 26 June 2022 (UTC)

Files that are used on OSM may be replaced by other files. Will they then be removed from the gallery? There is also the {{Published}} template (used on file talk pages). How will this project interact with that? WordPress has a plugin that makes standardized attributions (using curid) for images imported from Commons. Will this be covered? --C.Suthorn (talk) 04:02, 29 June 2022 (UTC)

@C.Suthorn: Yes, when a file stops being used on OSM (or whatever other sites the bot monitors), it would be removed from its gallery at the next bot run. The aim is for the galleries to track what's actually in use reasonably closely. There wouldn't be any direct interaction with {{Published}}. {{Published}} covers all off-site uses and doesn't indicate whether a use of a work depends on the file's remaining on Commons. Usage Bot, on the other hand, is specifically intended to track uses that would be affected by deletion from Commons. It might be possible to extend {{Published}} to record whether a use is dependent on Commons, but I think the number of edits required would make it inappropriate for a bot to do this. There are currently about 21,000 files in Category:Commons as a media source, but there are 40,000 referenced by the wikimedia_commons tag on OSM. Which external sites are covered is a matter for community consensus if the galleries are in the Commons namespace. The main technical requirement is for an API that the bot can query to find out reasonably quickly which Commons files are in use on a project. MediaWiki and taginfo provide that; I don't know if the WordPress plugin you mention does. --bjh21 (talk) 10:57, 29 June 2022 (UTC)
Hello bjh21, is there any update here? What could be the next step? Many greetings --Molgreen (talk) 18:16, 18 July 2022 (UTC)

As Molgreen mentions, things have gone quiet here. I think I've answered all the questions that have been asked, and my attempt at attracting more comments from elsewhere was partly successful. I presume the bureaucrats are hoping for something else, but I'm not sure what it is. Would it help if I reduced the scope? A good minimal scope might be to cover files used on Wikitech. Wikitech is a Wikimedia project, but for technical reasons it uses InstantCommons. This means that while COM:INUSE theoretically applies to it, in practice its uses of files are invisible on Commons because it doesn't appear in Special:GlobalUsage. There are currently 1292 files from Commons in use on Wikitech, so the bot would edit at most two pages per run. Given how slowly Wikitech's file usage changes, I think weekly bot runs would be reasonable. Would that be acceptable? --bjh21 (talk) 18:01, 23 July 2022 (UTC)

I agree that most things have been said, and if no additional arguments arise, I think it can and should be approved. If a better idea arises later, I think the procedure can be changed if required. --Krd 07:23, 24 July 2022 (UTC)
Perhaps the bot could start making edits, given the clear approval among the people who commented? Or would that be against Commons rules? Mateusz Konieczny (talk) 05:05, 27 July 2022 (UTC)

Approved. --Krd 14:06, 27 July 2022 (UTC)[reply]