User:iNaturalistReviewBot/Docs

iNaturalistReviewBot checks Commons files that are sourced to iNaturalist.org to determine whether they are properly licensed. It is designed to fail early and loudly in situations where automatic processing is difficult. It is written in Python 3.7 using Pywikibot and mwparserfromhell, and runs from Toolforge.

Configuration

The bot can be stopped by changing the runpage. The bot will not operate until the runpage is set back to True and the bot is manually restarted.

Most configurable values are set at User:INaturalistReviewBot/config.json, and the text of the license review template can be changed and translated as normal at {{INaturalistreview}}.
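The runpage behaviour described above can be sketched roughly as follows. This is only an illustration, not the bot's actual code: the function names are hypothetical, fetching the runpage text (for example with Pywikibot) is left out, and tolerating surrounding whitespace is an assumption.

```python
def runpage_allows_editing(runpage_text: str) -> bool:
    """Return True only when the runpage text is "True".

    Assumption: leading and trailing whitespace is ignored.
    """
    return runpage_text.strip() == "True"


def require_runpage(runpage_text: str) -> None:
    """Stop loudly when the runpage is disabled.

    Raising instead of waiting mirrors the documented behaviour that the
    bot stays stopped until someone restarts it manually.
    """
    if not runpage_allows_editing(runpage_text):
        raise RuntimeError("Runpage is not set to True; stopping the bot.")
```

Calling a check like `require_runpage` before each edit would let an uncaught exception end the process, so resetting the runpage alone does not resume editing.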

Problems

Bugs and feature requests are tracked using GitHub. You can also report issues on User talk:iNaturalistReviewBot or User talk:AntiCompositeNumber.

  • If the way the bot is editing a particular page is problematic, manually review the file or add {{Nobots}} to the page and report the problem.
  • If the bot is editing multiple pages problematically, stop it using the runpage (preferred) or block the bot with autoblock disabled and report the problem.
  • If there are files in Category:iNaturalist review needed but the bot is not editing, please report the problem.

Review flow

This flow is also available as a flowchart and as text on GitHub.

  1. A file in Category:iNaturalist review needed that transcludes {{INaturalistreview}} with no parameters is retrieved.
  2. The list of external links from the file page is checked for links to inaturalist.org. The first link to https://www.inaturalist.org/observations/<id> or https://www.inaturalist.org/photos/<id> that is found is assumed to be the source.
    • Links to other pages on iNaturalist are ignored.
    • The |source= parameter of the information template is not specifically checked. This allows more flexibility to deal with different templates and methods of specifying a source.
    • Due to an API limitation, /photos/<id> pages are loaded and parsed for an observation URL.
    • Links to other iNaturalist domains are also included.
  3. An API request is sent to iNaturalist for data about the observation. The API response includes metadata about the observation as well as basic information about the photos contained in that observation.
    • The API is queried based on the observation ID number parsed out of the source URL. If the API returns results for multiple observations, the bot will request human review for the file.
  4. Each photo in the observation is downloaded and checked against the SHA-1 hash of the Commons file to determine which photo matches the Commons file.
    • If the SHA-1 hash check fails, the bot will use a perceptual hashing algorithm to fuzzily compare the photos.
    • If the file on Commons does not match any of the photos on iNaturalist, the bot will request human review for the file. This will happen when the file on Commons has been edited, is not the original size, or has the wrong source URL.
    • If the observation includes duplicate photos, the first photo that matches the hash check will be used. iNaturalist should prevent duplicate photos from being added to an observation.
    • If a /photos/<id> link was found on the Commons page, only that photo will be checked.
  5. The license of the matched photo is pulled from the API response. The observation API response does not include the version number of Creative Commons licenses. Version 4.0 is assumed because that is the only version currently available on iNaturalist.
  6. The observation author is pulled from the API response. Because the observation API response does not include author information for specific photos inside an observation, the bot assumes the author of the photo is the same as the author of the observation.
    • This is a safe assumption because other iNaturalist users generally cannot add photos to someone else's observation.
  7. The license of the Commons file is determined by looking for templates on the file page that are members of Category:Primary license tags (flat list).
    • This method will pick up license templates anywhere on the page, including inside {{Self}}.
  8. The Commons and iNaturalist licenses are compared.
    1. If the iNaturalist license is non-free, the file will fail license review.
    2. If the licenses are the same, the file will pass license review.
    3. If the Commons license does not match the free iNaturalist license, the file will pass license review with a changed license.
    4. If there is no license on the Commons page, the iNaturalist license will be added and the file will pass license review with a changed license.
  9. The Commons file page is updated with the results of the review.
    • {{INaturalistreview}} will be filled in wherever it was originally placed.
    • If the license must be changed, the new license will be placed immediately above {{INaturalistreview}}. The bot tries to remove the old license from the page, but may leave it in place if it is inside another template.
    • If license review failed, {{Copyvio}} will be added to the top of the page and the uploader will be notified.
    • The parameters are used as described in the template documentation.
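
Steps 2, 4, and 8 of the flow above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the bot's implementation: all function names, the status strings, and the set of free license codes are hypothetical. Only inaturalist.org links are matched, so the other iNaturalist domains mentioned in step 2 are left out. The perceptual-hash fallback and license normalization are also omitted, and both licenses are assumed to arrive as comparable strings.

```python
import hashlib
import re
from typing import Dict, Iterable, Optional, Tuple

# Assumption: only inaturalist.org is matched; other iNaturalist domains are omitted.
SOURCE_URL = re.compile(
    r"^https?://(?:www\.)?inaturalist\.org/(observations|photos)/(\d+)"
)

# Assumption: illustrative license codes, not taken from the bot's configuration.
FREE_LICENSES = frozenset({"cc0", "cc-by", "cc-by-sa"})


def find_source(links: Iterable[str]) -> Optional[Tuple[str, int]]:
    """Step 2: return (kind, id) for the first observation or photo link."""
    for link in links:
        match = SOURCE_URL.match(link)
        if match:
            return match.group(1), int(match.group(2))
    return None  # no observation or photo link was found


def match_photo(commons_sha1: str, photos: Dict[int, bytes]) -> Optional[int]:
    """Step 4: return the ID of the first photo whose SHA-1 equals the Commons hash."""
    for photo_id, data in photos.items():
        if hashlib.sha1(data).hexdigest() == commons_sha1:
            return photo_id
    return None  # the perceptual-hash fallback is not sketched here


def compare_licenses(
    ina_license: Optional[str], commons_license: Optional[str]
) -> str:
    """Step 8: return a hypothetical status string for the review outcome."""
    if ina_license not in FREE_LICENSES:
        return "fail"  # 8.1 (a missing license is treated as non-free here)
    if commons_license == ina_license:
        return "pass"  # 8.2
    return "pass-change"  # 8.3 and 8.4
```

For example, `find_source(["https://www.inaturalist.org/people/1", "https://www.inaturalist.org/observations/12345"])` skips the first link and returns `("observations", 12345)`. Checking the non-free case first in `compare_licenses` follows the order of step 8, so a file whose Commons license happens to equal a non-free iNaturalist license still fails.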

Copyright © 2019 AntiCompositeNumber

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
