Commons:Bots/Requests/YiFeiBot (25)

From Wikimedia Commons, the free media repository

Operator: Zhuyifei1999

Bot's tasks for which permission is being sought: (See COM:BWR request) Similar to Commons:Bots/Requests/YiFeiBot (13), it uses quarry:query/2556 to find all files that do not transclude {{Infobox template tag}}, {{Information}}, {{Biohist}}, or {{BANQ media}} and are not in any of Category:Media_missing_infobox_template, Category:Artworks_missing_infobox_template, Category:Items_with_OTRS_permission_missing_infobox_template, or Category:Pages_using_Information_template_with_parsing_errors. For each of these files, it checks whether the regex (\{\{information|\|\s*source\s*=) matches the wikitext, case-insensitively. If it matches, the bot prepends Category:Pages using Information template with parsing errors; otherwise it prepends Category:Media missing infobox template. (The category is prepended to bypass possible syntax errors when the existing wikitext is parsed.)
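
A minimal sketch of the classification step described in the request, in Python. It works on plain wikitext strings only; fetching and saving pages through pywikibot is left out, and the names `classify` and `prepend_category` are hypothetical, not taken from the bot's source. The pattern is the capturing group quoted in the request, compiled case-insensitively.

```python
import re

# Pattern from the request: either an "{{Information" opener or a
# "| source =" parameter, matched case-insensitively.
INFOBOX_REMNANT = re.compile(r"(\{\{information|\|\s*source\s*=)", re.IGNORECASE)

PARSING_ERRORS = "Pages using Information template with parsing errors"
MISSING_INFOBOX = "Media missing infobox template"


def classify(wikitext):
    """Return the maintenance category name for a file page's wikitext."""
    if INFOBOX_REMNANT.search(wikitext):
        # Traces of an infobox exist in the text, yet the page does not
        # transclude one, so the template is probably broken.
        return PARSING_ERRORS
    return MISSING_INFOBOX


def prepend_category(wikitext, category):
    """Put the category link on the first line, ahead of the existing text."""
    return f"[[Category:{category}]]\n{wikitext}"
```

For a page whose text is only `Just a photo.`, `classify` returns the "missing infobox" category, and `prepend_category` places the category link on a new first line.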

Automatic or manually assisted: Automatic unsupervised

Edit type (e.g. Continuous, daily, one time run): Daily

Maximum edit rate (e.g. edits per minute): 6 edits per min
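
For illustration, 6 edits per minute works out to at least 10 seconds between saves. The sketch below shows one way to enforce such a spacing; `EditThrottle` and `min_seconds_between_edits` are hypothetical names, and the clock and sleep functions are injectable only so the behaviour can be checked without waiting.

```python
import time

MAX_EDITS_PER_MINUTE = 6  # value stated in the request


def min_seconds_between_edits(edits_per_minute):
    """Convert an edits-per-minute cap into a minimum gap in seconds."""
    return 60.0 / edits_per_minute


class EditThrottle:
    """Block until at least `interval` seconds have passed since the last edit."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self._interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous edit, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self._interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
```

Calling `wait()` before each save keeps the rate at or below the cap. In practice pywikibot's own throttle setting (`put_throttle` in the user configuration, as far as I know) would normally be used instead of a hand-written class.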

Bot flag requested: (Y/N): N

Programming language(s): Python: pywikibot

Zhuyifei1999 (talk) 10:58, 15 December 2015 (UTC)

Discussion

  • Due to a lack of time for testing, a visible test run will (hopefully) be done tomorrow. --Zhuyifei1999 (talk) 11:01, 15 December 2015 (UTC)
  •  Support Zhuyifei1999, thanks for taking this on. A small comment about the query: checking for transclusion of {{Infobox template tag}} should be sufficient, since {{Infobox template tag}} was embedded in {{Information}} a few months ago. However, it might be a while until all the DB tables are updated. {{Biohist}} and {{BANQ media}} will need to be rewritten to be based on one of the standard infoboxes, so the need to exclude those two is also temporary. Also, it might be better to rewrite the query to automatically exclude files in all subcategories of Category:Media_missing_infobox_template without listing them by hand. Hopefully that will not slow down the query too much. --Jarekt (talk) 13:34, 15 December 2015 (UTC)
    In that case, I changed the query to
    SELECT page_title
    FROM page
    WHERE page_namespace = 6      -- files only
    AND page_is_redirect = 0      -- no redirects
    AND page_title NOT LIKE '%/%' -- skip rare image subpages 
    AND NOT EXISTS (
      SELECT *
      FROM templatelinks
      WHERE page_id = tl_from
      AND tl_title IN ('Infobox_template_tag', 'Biohist', 'BANQ_media')
      AND tl_namespace = 10
    )
    AND page_id NOT IN (
      SELECT cl_from
      FROM categorylinks
      WHERE cl_to IN (
        SELECT 'Media_missing_infobox_template'
        UNION
        SELECT innerpage.page_title
        FROM categorylinks innercl
        INNER JOIN page innerpage
        ON innerpage.page_id = innercl.cl_from
        WHERE innerpage.page_namespace = 14
        /*AND innercl.cl_type = 'subcat'*/
        AND innercl.cl_to = 'Media_missing_infobox_template'
      )
    )
    ;
    
    Right now I'm running a null-editing bot over them to ensure that all pages transcluding {{Information}} are updated. --Zhuyifei1999 (talk) 09:50, 16 December 2015 (UTC)
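
To see what the revised query selects, here is a small sketch that runs the same logic against a toy in-memory SQLite database from Python. The three tables are cut down to the columns the query touches, the sample rows are invented for illustration, and `build_toy_db` and `find_untagged` are hypothetical names; the real replica database differs in engine and schema details.

```python
import sqlite3

QUERY = """
SELECT page_title
FROM page
WHERE page_namespace = 6
  AND page_is_redirect = 0
  AND page_title NOT LIKE '%/%'
  AND NOT EXISTS (
    SELECT 1 FROM templatelinks
    WHERE tl_from = page_id
      AND tl_namespace = 10
      AND tl_title IN ('Infobox_template_tag', 'Biohist', 'BANQ_media')
  )
  AND page_id NOT IN (
    SELECT cl_from FROM categorylinks
    WHERE cl_to IN (
      SELECT 'Media_missing_infobox_template'
      UNION
      SELECT innerpage.page_title
      FROM categorylinks innercl
      INNER JOIN page innerpage ON innerpage.page_id = innercl.cl_from
      WHERE innerpage.page_namespace = 14
        AND innercl.cl_to = 'Media_missing_infobox_template'
    )
  )
ORDER BY page_title
"""

PARENT = "Media_missing_infobox_template"
SUBCAT = "Pages_using_Information_template_with_parsing_errors"


def build_toy_db():
    """Create an in-memory database with a handful of invented pages."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER,
                           page_title TEXT, page_is_redirect INTEGER);
        CREATE TABLE templatelinks (tl_from INTEGER, tl_namespace INTEGER, tl_title TEXT);
        CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
    """)
    conn.executemany("INSERT INTO page VALUES (?, ?, ?, ?)", [
        (1, 6, "Good.jpg", 0),            # has an infobox
        (2, 6, "Untagged.jpg", 0),        # no infobox, no maintenance category
        (3, 6, "Already_tagged.jpg", 0),  # already in the parent category
        (4, 14, SUBCAT, 0),               # the subcategory page itself
        (5, 6, "Broken.jpg", 0),          # already in the subcategory
        (6, 6, "Redirect.jpg", 1),        # redirect
        (7, 6, "Sub/page.jpg", 0),        # subpage
    ])
    conn.execute("INSERT INTO templatelinks VALUES (1, 10, 'Infobox_template_tag')")
    conn.executemany("INSERT INTO categorylinks VALUES (?, ?)",
                     [(3, PARENT), (4, PARENT), (5, SUBCAT)])
    return conn


def find_untagged(conn):
    """Return titles of file pages that still need a maintenance category."""
    return [row[0] for row in conn.execute(QUERY)]
```

With this sample data, only `Untagged.jpg` is returned: the other files either have an infobox, are already in the parent category or its direct subcategory, or are excluded as a redirect or subpage.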
Now on files starting with "Bo" after more than 7,500 null edits. I just lowered the throttle to speed this up. --Zhuyifei1999 (talk) 11:47, 17 December 2015 (UTC)
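
A null edit saves a page without changing it, which makes MediaWiki reparse the page and refresh its link tables. A rough sketch of such a pass follows; it assumes each page object exposes a `touch()` method (pywikibot page objects have one, as far as I know), and `null_edit_all` is a hypothetical helper name, not the operator's actual script.

```python
def null_edit_all(pages, log=print):
    """Call touch() on every page and return how many null edits succeeded."""
    done = 0
    for page in pages:
        try:
            page.touch()  # assumed to perform a null edit on the page
        except Exception as exc:
            # One failing page (for example a protected one) should not stop the run.
            log(f"skipped {page!r}: {exc}")
            continue
        done += 1
    return done
```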
To be honest, I don't understand any of this request, but it seems you know what you are doing, and I suggest approving this if no objections arise. --Krd 14:38, 17 December 2015 (UTC)
 Comment The task is to keep Category:Media_missing_infobox_template up to date by adding new files as people upload them. It also detects special cases with parsing errors and adds them to the subcategory Category:Pages using Information template with parsing errors, so we can quickly detect and fix bad edits that break infoboxes. --Jarekt (talk) 15:54, 17 December 2015 (UTC)
quarry:query/6359 works well for me. --Jarekt (talk) 04:23, 18 December 2015 (UTC)
Now on pages starting with "USS San Diego", after 50000+ null edits. @Krd: The real test run for this task hasn't started; right now the bot is mass null-editing the false positives ahead of the actual run. --Zhuyifei1999 (talk) 10:11, 18 December 2015 (UTC)
Zhuyifei1999, I would not worry about "null editing false positives". I would use quarry:6359, and in half a year you can change the SQL to remove the check for {{Information}} from the query. By the way, maybe you can also use [http://tools.wmflabs.org/catscan3/catscan2.php?language=commons&project=wikimedia&depth=1&categories=Media_missing_infobox_template&ns%5B6%5D=1&templates_any=Information%0D%0AInfobox+template+tag&ext_image_data=0&file_usage_data=0 this query] to find files that can be removed from the category once someone adds an infobox. --Jarekt (talk) 12:57, 18 December 2015 (UTC)
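
The CatScan link in the comment above can also be assembled from its parts, which makes the individual parameters easier to read. This is only a sketch: the parameter names and values are copied from that link, `catscan_url` is a hypothetical helper, and nothing here checks that the tool still accepts them.

```python
from urllib.parse import urlencode

BASE_URL = "http://tools.wmflabs.org/catscan3/catscan2.php"


def catscan_url(category, templates, depth=1):
    """Build a CatScan URL listing files in `category` that use any of `templates`."""
    params = [
        ("language", "commons"),
        ("project", "wikimedia"),
        ("depth", depth),                             # how many subcategory levels to descend
        ("categories", category),
        ("ns[6]", 1),                                 # namespace 6 = files
        ("templates_any", "\r\n".join(templates)),    # one template name per line
        ("ext_image_data", 0),
        ("file_usage_data", 0),
    ]
    return BASE_URL + "?" + urlencode(params)
```

Calling `catscan_url("Media_missing_infobox_template", ["Information", "Infobox template tag"])` reproduces the link from the comment.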
OK, the mass null edit is done, and I switched to your query (it should make little difference anyway). Ran a test run at [1]. Oh, I'm pretty sure that in only half a year few pages would be reparsed without a mass null edit :P --Zhuyifei1999 (talk) 14:44, 18 December 2015 (UTC)
I think it would be a good idea to clarify the type of error in the category or edit summary. For example, it is not obvious in File:Strýčkovice, kaplička II.jpg. --EugeneZelenko (talk) 15:23, 18 December 2015 (UTC)
File:Strýčkovice, kaplička II.jpg was a false positive, as it had a perfectly fine {{Information}} template. Maybe the problem was due to lag in database updates for new uploads? The bot should only be catching files without a visible {{Information}} template (or other infobox). --Jarekt (talk) 16:45, 18 December 2015 (UTC)

Approved. --Krd 17:34, 9 February 2016 (UTC)