Commons:Bots/Requests/LicenseReviewerBot 2

From Wikimedia Commons, the free media repository

LicenseReviewerBot (talk · contribs)

Operator: Bd9a119b5d05019d7c923207398ef3c3 (talk · contributions · Statistics · Recent activity · block log · User rights log · uploads · Global account information)

Bot's tasks for which permission is being sought: Automatically review audio and image files.

  • Also patrol Recent Changes for new file uploads (video, audio and image) by new users and automatically review them.

Automatic or manually assisted: Automatic and unsupervised

Edit type (e.g. Continuous, daily, one time run): Continuous

Maximum edit rate (e.g. edits per minute): fewer than 5 per minute at peak

Bot flag requested: (Y/N): Already Flagged

Programming language(s): Python

Bd9a119b5d05019d7c923207398ef3c3 (talk) 06:11, 23 October 2021 (UTC)


There are not enough license reviewers on Commons, and there is no incentive to spend time reviewing random files. The number of unreviewed images doubled from 17,474 in October 2020[1] to 34,551 in October 2021[2]. Oh wait, was it because of the pandemic? No, the number of unreviewed images quadrupled between October 2019 and October 2020 and has been increasing for more than 5 years. The same goes for audio files. But last year I started LicenseReviewerBot (and before that, in 2019, the now-deprecated YouTubeReviewBot) for videos, and the number of unreviewed videos decreased from 9,651 (in April 2019) to 1,049 (in October 2021). YouTubeReviewBot did not check the video itself, only the license, and had some false positives; the more advanced LicenseReviewerBot checks both the full video and the license, and no false positives have been reported so far.

Technical stuff
  • The bot will use acoustic fingerprinting, video hashing and image hashing for reviewing the files.
  • Image hashing is known to be vulnerable to GAN attacks, but such attacks can be easily prevented by varying the parameters used for the dimension reduction procedure, e.g. by choosing a variable number of bits in the bit string. This changes the number of characteristic pixels used to generate the hash value, so a GAN-generated image cannot pass the review unless it is a copy of the original image.
  • All URLs on the file page will be archived on the Wayback Machine unless they already return 404 or do not allow archiving by the Internet Archive bot.
  • The bot will skip files that are currently being reviewed by other bots. This is a deliberate feature, not a limitation.
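
The variable-bit-length idea in the list above can be sketched as follows. This is only an illustration, not the bot's actual code: it assumes a simple average hash over a grayscale image given as a list of pixel rows, and the names average_hash, hamming_fraction, pick_hash_sizes and images_match, as well as the 8 to 16 size range and the 0.1 threshold, are hypothetical choices made for this example.

```python
import random


def average_hash(pixels, hash_size):
    """Return an average hash as a list of hash_size**2 bits.

    pixels is a grayscale image: a list of equal-length rows of ints.
    (Assumed input format for this sketch.)
    """
    height, width = len(pixels), len(pixels[0])
    if hash_size < 1 or hash_size > min(height, width):
        raise ValueError("hash_size must be between 1 and the image size")
    cells = []
    for gy in range(hash_size):
        # Source rows covered by this grid row.
        y0 = gy * height // hash_size
        y1 = (gy + 1) * height // hash_size
        for gx in range(hash_size):
            # Source columns covered by this grid column.
            x0 = gx * width // hash_size
            x1 = (gx + 1) * width // hash_size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per grid cell: is the cell brighter than the overall mean?
    return [1 if cell > mean else 0 for cell in cells]


def hamming_fraction(bits_a, bits_b):
    """Fraction of positions at which two equal-length bit lists differ."""
    return sum(a != b for a, b in zip(bits_a, bits_b)) / len(bits_a)


def pick_hash_sizes(rng, count=3, low=8, high=16):
    """Pick distinct grid sizes at review time, so the bit length is not fixed."""
    return rng.sample(range(low, high + 1), count)


def images_match(candidate, original, hash_sizes, max_fraction=0.1):
    """Accept only if the hashes agree at every chosen grid size."""
    return all(
        hamming_fraction(average_hash(candidate, n), average_hash(original, n))
        <= max_fraction
        for n in hash_sizes
    )
```

Because the grid sizes are drawn at review time, for example with pick_hash_sizes(random.SystemRandom()), an image crafted to collide at one fixed hash length would still have to match at lengths it was not tuned for.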
Discussion

Please discuss below this comment. -- Bd9a119b5d05019d7c923207398ef3c3 (talk) 06:11, 23 October 2021 (UTC)

References
  1. https://web.archive.org/web/20201003193649/https://commons.wikimedia.org/wiki/Category:License_review_needed
  2. https://commons.wikimedia.org/wiki/Category:License_review_needed
I will start an extended test run on or before 20 Nov (next weekend). Please note that the bot is already active for the previously approved video task; those edits are not part of the test. -- Bd9a119b5d05019d7c923207398ef3c3 (talk) 07:06, 14 November 2021 (UTC)
Please don't close the request before January. I got caught up in some work and will make a run before next year. Sorry -- Bd9a119b5d05019d7c923207398ef3c3 (talk) 17:49, 27 November 2021 (UTC)
  • I am starting the trial today but can't archive new URLs because of degraded Wayback Machine services (the Save API is broken and the availability API is inaccurate). I will only review files whose links are already archived until the issues with the Wayback Machine APIs are resolved. -- Bd9a119b5d05019d7c923207398ef3c3 (talk) 06:56, 2 January 2022 (UTC)
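
For readers unfamiliar with the "already archived" check mentioned above, here is a rough sketch of how it could look. It is not the bot's code. It assumes the public Wayback Machine availability endpoint at https://archive.org/wayback/available and its documented JSON shape (an "archived_snapshots" object with an optional "closest" entry); the function names are hypothetical. Since the availability API is reported above as inaccurate, a real check would probably need retries or a second source.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def fetch_availability(url, timeout=10):
    """Query the availability API for one URL and return the decoded JSON."""
    query = urllib.parse.urlencode({"url": url})
    request_url = f"{AVAILABILITY_ENDPOINT}?{query}"
    with urllib.request.urlopen(request_url, timeout=timeout) as response:
        return json.load(response)


def closest_snapshot(availability):
    """Return the snapshot URL from an availability response, or None."""
    closest = availability.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    # Only count snapshots that captured a successful page load.
    if closest.get("status") != "200":
        return None
    return closest.get("url")


def all_links_archived(availabilities):
    """True if every response in the iterable has a usable snapshot."""
    return all(closest_snapshot(a) is not None for a in availabilities)
```

A file would then be eligible for review only when all_links_archived(fetch_availability(u) for u in urls) is true for the URLs on its file page.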

As there are no objections, I call this approved. --Krd 12:00, 15 January 2022 (UTC)