User:Multichill/Categories

From Wikimedia Commons, the free media repository

Every picture should be in a category. Unfortunately, this is not the case for a lot of images. This is a process to increase the number of categorized pictures.

Steps

Tag pictures which don't have categories

First we need to know which images actually need categories. This is done by a bot. The bot tags images which contain neither categories nor templates that add relevant categories (as is the case with books). The bot adds a date so that Category:Media needing categories gets divided into workable amounts of pictures. Once a day the bot checks yesterday's uploads and tags all files that don't have categories. Two bots are also checking all images on Commons.
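
The tagging rule described above can be sketched as follows. This is a minimal illustration, not the code of the actual bot: the function names, the sample ignore list, and the exact parameters of the dated {{Uncategorized}} tag are assumptions made for the example.

```python
from datetime import date

# Hypothetical ignore list: templates assumed NOT to add a relevant category.
# Any template outside this list is assumed to categorize the file.
IGNORED_TEMPLATES = {"Information", "GFDL", "Cc-by-sa-3.0"}

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def needs_tag(categories, templates, ignored=IGNORED_TEMPLATES):
    """Return True if a file has no categories and no categorizing template."""
    if categories:
        return False
    # Every template on the page must be a known non-categorizing one.
    return all(template in ignored for template in templates)


def uncategorized_tag(day):
    """Build a dated tag so files are grouped per day (tag format assumed)."""
    return "{{Uncategorized|year=%d|month=%s|day=%d}}" % (
        day.year, MONTHS[day.month - 1], day.day)


if __name__ == "__main__":
    if needs_tag([], ["Information", "GFDL"]):
        print(uncategorized_tag(date(2008, 6, 5)))
```

Treating unknown templates as categorizing keeps the bot conservative: a file is only tagged when nothing on the page could plausibly have placed it in a category.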

Add category suggestions

A bot goes over the uncategorized images and, for some of them, replaces the {{Uncategorized}} template with categories. When a file is used in a gallery and/or on a Wikipedia, one or more categories can be found. For each subcategory of Category:Media needing categories that still has to be checked, the bot performs these steps:

  1. For each image: If an image is in a gallery and this gallery is in a category with the same name, replace {{Uncategorized}} with this category.
  2. For each image: If an image is in a gallery and this gallery has a category with the exact same name and both are in the same parent category, replace {{Uncategorized}} with the category with the identical name.
  3. For each image: If an image is in a gallery, add the categories of the gallery to the image. The gallery name and the category name shouldn't be the same (that case should be caught by steps 1 and 2). If more than 3 categories were found, add {{Check categories}}.
  4. Let CommonSense run over the category to figure out categories for the remaining images. This will add {{Check categories}} to every image which is categorized.
  5. Tag the category as checked. This will place the category in Category:Media needing categories requiring human attention.
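
Steps 1 to 3 above can be sketched for a single image and a single gallery. This is only an illustration under stated assumptions: the function name and the plain set and dict inputs are hypothetical stand-ins for whatever the bot actually queries from the wiki.

```python
def suggest_from_gallery(gallery, gallery_categories, existing_categories,
                         parents_of):
    """Return (categories, needs_check) for an image used in `gallery`.

    gallery_categories:  categories the gallery page is in
    existing_categories: names of categories that exist (hypothetical lookup)
    parents_of:          dict mapping a category to its parent categories
    """
    gallery_categories = set(gallery_categories)

    # Step 1: the gallery is in a category with the same name.
    if gallery in gallery_categories:
        return [gallery], False

    # Step 2: a category with the same name exists and shares a parent
    # category with the gallery.
    if gallery in existing_categories:
        if gallery_categories & set(parents_of.get(gallery, ())):
            return [gallery], False

    # Step 3: fall back to the gallery's own categories; more than three
    # results means a human should review them ({{Check categories}}).
    found = sorted(gallery_categories)
    return found, len(found) > 3


if __name__ == "__main__":
    print(suggest_from_gallery(
        "Rembrandt", {"Dutch painters"}, {"Rembrandt"},
        {"Rembrandt": {"Dutch painters", "1606 births"}}))
```

An empty result from step 3 means the image stays uncategorized and is left for CommonSense in step 4.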

CommonSense filters

  • Blacklist filter: Filter out all categories which are in the blacklist.
  • Disambiguation filter: Filter out all disambiguation categories.
  • Category redirect filter: Replace redirect categories with their target.
  • Country filter: See User_talk:Multichill#Don't put files into Category:People by country.
  • Overcategorization filter: Filter out the parent categories. Implemented and in use (link)
  • .... (insert your suggestions here)
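
The blacklist, disambiguation, redirect, and overcategorization filters listed above could be chained roughly as below. This is a sketch, not the CommonSense implementation: the function names, the dict-based category data, the order in which the filters run, and the default depth of 8 for the parent walk are assumptions. The country filter is left out because its rules are only described on the linked talk page.

```python
def ancestors(category, parents_of, max_depth=8):
    """Collect the parent categories up to max_depth levels above category."""
    seen = set()
    frontier = {category}
    for _ in range(max_depth):
        frontier = {p for c in frontier for p in parents_of.get(c, ())} - seen
        if not frontier:
            break
        seen |= frontier
    seen.discard(category)  # guard against cycles in the category tree
    return seen


def filter_suggestions(candidates, blacklist, disambiguation, redirects,
                       parents_of):
    """Apply the filters to a list of suggested category names."""
    kept = []
    for cat in candidates:
        if cat in blacklist:          # blacklist filter
            continue
        if cat in disambiguation:     # disambiguation filter
            continue
        cat = redirects.get(cat, cat)  # category redirect filter
        if cat not in kept:           # drop duplicates created by redirects
            kept.append(cat)

    # Overcategorization filter: drop every category that is a parent
    # (direct or indirect) of another suggested category.
    redundant = set()
    for cat in kept:
        redundant |= ancestors(cat, parents_of)
    return [cat for cat in kept if cat not in redundant]


if __name__ == "__main__":
    print(filter_suggestions(
        ["Hidden categories", "Eiffel tower", "Paris", "Towers in Paris"],
        blacklist={"Hidden categories"},
        disambiguation=set(),
        redirects={"Eiffel tower": "Eiffel Tower"},
        parents_of={"Eiffel Tower": ["Towers in Paris"],
                    "Towers in Paris": ["Paris"]}))
```

Resolving redirects before the overcategorization pass matters, because the parent lookup only works on the real category names.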

Check and clean up

After the bot is done, some files need to be checked by a human.

  1. Category:Media needing categories requiring human attention - these files couldn't be categorized by a bot. A human has to find categories for these files and remove {{Uncategorized}}.
  2. Category:Media needing category review - these files were automatically categorized by a bot, but a human should check the result and remove {{Check categories}}.

To do

Tag pictures which don't have categories

  • The bot has been running stably for more than a month. More bots running imageuncat.py would be good.
  • Expand the list of templates to ignore (when a template is not in the list it's assumed it adds a relevant category)

Add category suggestions

Templates / categories

  • Still have to write some documentation.

Categorization

Commonshelper

  • Add blacklist (for categories like Category:Hidden categories), disabled for the moment
  • Filter out duplicates - ✓ Done
  • Filter out supercats - Up to 8 steps of supercats are filtered out by this tool
  • In many places, you managed to find the country, but it is accompanied by several "xxx by country" categories. This should not be too hard to improve (easy to say). The same goes for "xxx by city", but that is less important.
  • When inserting a xxx by country cat, it might be useful to add a comment with the country name behind it
  • When you manage to find the birth year, then I guess you found a basic article, so it would be useful to insert that as an interwiki link (that takes 30% of my categorisation time). A wrong or duplicate interwiki link is better than nothing.
  • As above, when you manage to find the birth year, it might be feasible to reuse the defaultsort
  • I guess that you don't include redirected pages. It might be useful to flag root categories (categories for diffusion) and disambiguation pages (and redirect pages if unavoidable) with a comment indicating so
    • I still have to implement a link correction feature in commonscat.py, this will probably solve most of this.
  • I fail to see the need for adding Category:Categories for discussion
    • Blacklist
  • I understand that you inherit some information from CommonSense, which limits your degree of freedom, so shouldn't the tools be discussed as a couple/tandem?

Check and clean up

  • Have some people monitoring these categories
  • Create some JavaScript add-ons to make the process quicker