Commons:Bots/Requests/iNaturalistReviewBot

INaturalistReviewBot (talk · contribs)

Operator: AntiCompositeNumber (talk · contributions)

Bot's tasks for which permission is being sought: Automatically license review files from iNaturalist. A detailed description is available at User:iNaturalistReviewBot/Docs, but in summary, the bot checks files in Category:iNaturalist review needed against the corresponding iNaturalist images using a SHA-1 hash, then license reviews them as appropriate using {{INaturalistreview}}.
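
The hash check described above can be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: the function names and the `inat_photos` mapping (a hypothetical photo ID mapped to raw image bytes) are assumptions, and downloading the bytes from Commons and iNaturalist is left out.

```python
import hashlib
from typing import Dict, Optional


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of raw image bytes as a hex string."""
    return hashlib.sha1(data).hexdigest()


def find_matching_photo(commons_bytes: bytes,
                        inat_photos: Dict[str, bytes]) -> Optional[str]:
    """Return the ID of the first iNaturalist photo whose bytes hash
    identically to the Commons file, or None if nothing matches.

    inat_photos is a hypothetical mapping of photo ID -> raw bytes.
    """
    target = sha1_hex(commons_bytes)
    for photo_id, photo_bytes in inat_photos.items():
        if sha1_hex(photo_bytes) == target:
            return photo_id
    return None
```

Because SHA-1 matches only byte-identical files, a crop or re-encode of the original would not match under this approach and would be left for a human reviewer.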

Automatic or manually assisted: Automatic

Edit type (e.g. Continuous, daily, one time run): Continuous

Maximum edit rate (e.g. edits per minute): 1 epm

Bot flag requested: (Y/N): Preferred (for API reasons), but edits will not be marked as bot

Programming language(s): Python

{{INaturalistreview}} is protected by Special:AbuseFilter/70. Because iNaturalistReviewBot does not yet have license reviewer (LR) rights, I am unable to make a test run. Logs from a simulated run (without wikitext changes) are available at https://tools-static.wmflabs.org/inaturalistreviewer/2020-06-25.txt. I will also make some example edits from my main account, but they won't be fully representative of an actual run. I am unsure of what the correct process is to get LR rights for the bot. AntiCompositeNumber (talk) 16:56, 25 June 2020 (UTC)

AntiCompositeNumber (talk) 17:27, 25 June 2020 (UTC)

Discussion

  • @AntiCompositeNumber: I think it is a super good idea to have a bot checking the files. So thank you for your request! I have 2 comments:
    1. When a file does not match, would it be possible to mention the license in the edit summary? I think the flickrreview bot does the same (size not found but the license is cc-by-sa-4.0), so we have that stored in case the uploader changes the license before a human checks and finds out it is just a crop.
    2. If the license can be changed on iNaturalist, then there is a chance that the license was valid when the file was uploaded. If the file was uploaded months or years ago, I do not think we should add a copyvio warning on the user's talk page. Perhaps we should let a human do a review in case it is possible to verify the license via a web archive.
Both are only relevant if it is possible to change the license. --MGA73 (talk) 18:40, 25 June 2020 (UTC)
@MGA73: For the license in the edit summary, generally no. Most links to iNaturalist are through /observations/<id> links, which can represent multiple images under multiple licenses. I could special-case it for /photos/<id> links and single-photo observations if you think it would be useful, but I don't see the benefit. As for old files, what is your suggestion to do instead? Still tag it copyvio, just don't warn? Tag it something else? (Yes, iNaturalist does allow license changes.) --AntiCompositeNumber (talk) 01:44, 26 June 2020 (UTC)
@AntiCompositeNumber: Okay, then let's just forget the first part (adding the license in the edit summary if the size is not found).
For old files, I think perhaps a "no permission" tag, because that will give the uploader a chance to check the uploads. What I worry about is that a good user uploads free files, but because license reviewers are busy it takes 8 months to review the files, and by then the license has changed. The good user suddenly gets 50 copyvio warnings on their talk page and risks a block. --MGA73 (talk) 07:01, 26 June 2020 (UTC)
@MGA73: What do you think the cutoff age should be? --AntiCompositeNumber (talk) 18:06, 26 June 2020 (UTC)
@AntiCompositeNumber: A week, a month. You choose :-) --MGA73 (talk) 18:08, 26 June 2020 (UTC)
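
The distinction drawn in this thread between /observations/<id> and /photos/<id> links could be sketched as below. This is only an illustration under stated assumptions: the regular expression and the function name `classify_inat_link` are hypothetical, and real source links on file pages may take other forms.

```python
import re
from typing import Optional, Tuple

# Assumed URL shapes, based only on the /observations/<id> and
# /photos/<id> forms mentioned in the discussion.
_INAT_LINK = re.compile(r"inaturalist\.org/(observations|photos)/(\d+)")


def classify_inat_link(url: str) -> Optional[Tuple[str, int]]:
    """Return (kind, id) for a recognised iNaturalist link, else None.

    kind is "observations" (may cover several photos, each with its
    own license) or "photos" (a single image).
    """
    match = _INAT_LINK.search(url)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

Only a "photos" result identifies a single image with a single license, which is why reporting one license in the edit summary is not straightforward for observation links.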
  • @AntiCompositeNumber: I've added the bot to the image-reviewer group for one week (the closing 'crat should make the rights permanent when they add the bot flag; you can ping any reviewer or admin if they forget). Please review the above suggestions before making any edits. Best, --Mdaniels5757 (talk) 19:43, 25 June 2020 (UTC)
  • I went with 180 days (6 months) as the age for old files. A 30-edit test run is complete, some highlights:
A few notes about future expansion:
  • @Josve05a asked me off-wiki if the bot could also add {{INaturalist}} when it reviews files. The answer is yes, and I may add that to the bot later without a new request.
  • The bot currently only uses SHA-1 sums to compare images. I tried to use SSIM scores as well for a fuzzier comparison, but the pyssim library caused issues. If I can find a better implementation for that or another reliable fuzzy image comparison method, I may implement it without a new request.
  • With fuzzy comparison, I may be able to identify images that are only downscaled from the original and upload the original size. I would open a new request for that.
  • I could also copy taxon information to Structured Data depicts statements. I would also open a new request for that.
AntiCompositeNumber (talk) 21:47, 27 June 2020 (UTC)
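
As an illustration of the 180-day cutoff for old files, the decision for a file whose license check fails might look like the sketch below. It assumes the bot follows the "no permission" suggestion made earlier in the discussion; the action labels, the function name, and the strict "older than" comparison are hypothetical and not taken from the bot's code.

```python
from datetime import datetime, timedelta, timezone

# Cutoff stated in the discussion: 180 days (6 months).
OLD_FILE_AGE = timedelta(days=180)


def failure_action(uploaded: datetime, now: datetime) -> str:
    """Pick a hypothetical action label for a file that failed review.

    Old uploads may have been validly licensed before a license change,
    so they get a softer tag instead of a copyvio tag and warning.
    """
    if now - uploaded > OLD_FILE_AGE:
        return "no-permission"
    return "copyvio"


now = datetime(2020, 6, 27, tzinfo=timezone.utc)
old_upload = datetime(2019, 6, 1, tzinfo=timezone.utc)
recent_upload = datetime(2020, 6, 1, tzinfo=timezone.utc)
print(failure_action(old_upload, now))     # no-permission
print(failure_action(recent_upload, now))  # copyvio
```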

Approved. --Krd 06:49, 21 July 2020 (UTC)