Commons:Bots/Requests/KasparBot

From Wikimedia Commons, the free media repository

KasparBot (talk · contribs)

Operator: T.seppelt (talk · contributions · Statistics · Recent activity · block log · User rights log · uploads · Global account information)

Bot's tasks for which permission is being sought: Moving {{Authority control}} information to Wikidata (en:Wikipedia:Bots/Requests for approval/KasparBot).

See related discussions: Template talk:Authority control, d:Wikidata:Bot requests#Import Identifiers from Commons Authority Control templates

Automatic or manually assisted: automatic

Edit type (e.g. Continuous, daily, one time run): big one time run + smaller daily runs for handling newly added data

Maximum edit rate (e.g. edits per minute): 15

Bot flag requested: (Y/N): Y

Programming language(s): Java, basic source code

T.seppelt (talk) 08:38, 16 May 2016 (UTC)

Discussion

@Jarekt: -- T.seppelt (talk) 08:38, 16 May 2016 (UTC)

Please do a few test edits. --Krd 08:41, 16 May 2016 (UTC)
✓ Done. Please have a look at the contributions; I didn't notice any malfunction. -- T.seppelt (talk) 15:27, 16 May 2016 (UTC)
@~riley and Krd: You should have given this bot autopatrol first, before asking for a test run... Now I have to mass patrol this bot's edits just to clear them from the patrol backlog... Poké95 00:17, 17 May 2016 (UTC)
And mass patrol done... Good that this bot made only about 50-100 edits... Poké95 00:22, 17 May 2016 (UTC)
@Pokéfan95: This bot performed a short test run in accordance with the instructions at Commons:Bots/Requests; this specific test run was 47 edits in total (not 50-100 as you state above). If you wish for the BRFA process to be changed, open a discussion at Commons talk:Bots/Requests. Considering you have manually mass patrolled 173,007 edits already, I fail to see why you are complaining about another 47 edits. The whole point of not autopatrolling these changes is that the contributions are intended to be reviewed individually. If you do not wish to patrol edits like this in the future, you are more than welcome to request here that I or someone else do it instead. Or wait until the task is approved, at which point I always patrol the trial edits. :-) ~riley (talk) 00:28, 17 May 2016 (UTC)
Sometimes the bot leaves the BNF ID in place (Creator:Auguste Delacroix). It looks like this happens because of a difference in format (cb prefix). --EugeneZelenko (talk) 13:57, 17 May 2016 (UTC)
@EugeneZelenko: Yes, some BnF IDs couldn't be removed due to format violations. The bot has an error reporting system, which is publicly available at Tool Labs. After the first clean-up, the remaining errors will be published there so that they are accessible for manual review. -- T.seppelt (talk) 18:57, 17 May 2016 (UTC)
I think it would be a good idea to run the bot without changing templates, only creating a list of violations. Then typical violations (prefix/suffix) could be fixed by this bot. --EugeneZelenko (talk) 13:59, 18 May 2016 (UTC)
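The prefix case raised above could be handled by a small normalization step before an identifier is compared or moved. The following is a minimal Java sketch, not KasparBot's actual code: the class and method names are hypothetical, and the accepted pattern (eight digits plus one check character, optionally preceded by `cb`) is an assumption about the BnF ID format expected on Wikidata.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not taken from KasparBot's source. */
final class BnfIdNormalizer {

    // Assumed shape: optional "cb" prefix, eight digits, one check character.
    private static final Pattern BNF_ID =
            Pattern.compile("^(?:cb)?(\\d{8}[0-9bcdfghjkmnpqrstvwxz])$");

    private BnfIdNormalizer() {}

    /**
     * Returns the identifier without the "cb" prefix, or an empty Optional
     * when the value does not fit the assumed format and should be reported
     * for manual review instead of being moved.
     */
    static Optional<String> normalize(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        Matcher m = BNF_ID.matcher(raw.trim());
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Values that come back empty would be the candidates for the list of violations suggested above, rather than being edited automatically.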
I fixed the code for the BnF parameter. I'd still prefer to run the script with page editing enabled. I have performed 100,000+ edits with this script, and it works pretty well. @Jarekt: Another question: What is your plan now for handling Authority control in {{Creator}}, e.g. [1]? -- T.seppelt (talk) 09:27, 22 May 2016 (UTC)
Looks OK to me, but it would be a good idea to fix the added space. --EugeneZelenko (talk) 14:06, 22 May 2016 (UTC)
My bot doesn't add the space anymore. I can do some more test edits if you want. – T.seppelt (talk) 14:58, 22 May 2016 (UTC)
The space has indeed been fixed already. Could you do one final short test now that you've fixed the code for the BnF parameter? ~riley (talk) 18:54, 22 May 2016 (UTC)
✓ Done. Please have a look at the edits. -- T.seppelt (talk) 07:27, 23 May 2016 (UTC)
T.seppelt, sorry about missing your ping from May 16; somehow it is not showing up on my alert list. About the handling of {{Authority control}} by the {{Creator}} and {{Institution}} templates: I altered both {{Creator}} and {{Institution}} so that if the "authority" field is not provided but the "Wikidata" field is, then {{Authority control}} is added automatically. So the preferred way would be to just remove the "authority" field from those 2 templates once all the fields are synchronized with Wikidata. T.seppelt, to me this breaks down into 2 semi-distinct tasks:
  1. matching pages with Wikidata: we have 44k categories that have {{Authority control}} but no indication of which Wikidata Q-code they go with. They will have to be linked with a Q-code by following old-style interwiki links, or by using WikidataQuery to match the category name with Wikidata's Commons category (P373) or the VIAF with Wikidata's VIAF ID (P214). We also have ~460 Creator and Institution templates using {{Authority control}} but no Q-codes. They will also have to be matched with Wikidata or moved to Wikidata.
  2. dealing with mismatches and data missing on Wikidata: see the subcategories of Category:Authority control maintenance. Maybe this should be the priority for your bot, since you probably have more experience with this than I do and your bot is probably approved on Wikidata.
Your current test edits were good, but I could not find any that involved cases where edits to Wikidata were needed. Is there an easy way to find those? --Jarekt (talk) 13:13, 23 May 2016 (UTC)
Am I missing something? I see test edits with corresponding edits on Wikidata, for example: Commons diff, Wikidata diff. ~riley (talk) 20:24, 23 May 2016 (UTC)
Yes, as proposed, I'd like to deal with the second task. Maybe we should focus on this in order to close the request. Later on I can also help with the other task. One question is how good automatic matching would be. Should we involve manual work in this process? –T.seppelt (talk) 10:39, 24 May 2016 (UTC)
T.seppelt, so your bot assumes that something will add the "Wikidata" parameter to the Authority control template. That is a great start, and I agree that this is what the focus should be in order to close the request. Once the Authority control template has a "Wikidata" parameter, Module:Authority control places the page either in Category:Pages using authority control with parameters matching Wikidata or in one or more of the "Pages with mismatching..." or "Wikidata with missing ..." subcategories of Category:Pages using authority control with parameters. My User:JarektBot routinely monitors the "matching" category and removes redundant identifiers, but I do not touch pages in the "mismatching" and "missing" categories.
  • Your bot demonstrated the ability to deal with "missing" identifiers, where Commons has an identifier that is missing on Wikidata. User:~riley, thanks for digging out the examples. I think you are ready to go on this task after this request is closed. Alternatively, you can concentrate on the Wikidata side of the task; once identifiers are moved to Wikidata, the Commons pages will automatically move to Category:Pages using authority control with parameters matching Wikidata, where I can pick them up.
  • I am not sure how your bot deals with mismatches, but I have been working on better Lua processing of those. I already verify the identifier formats with regular expressions, but I hope to do a better job of matching against multiple records on Wikidata and of handling multiple VIAF identifiers specified on Commons. That might reduce the number of pages to deal with. Another type of mismatch that I cannot handle in Lua is the case where an identifier is provided on Commons, but following the link lands on either "no such page" or a redirect to another page. In the first case the bot can remove the bad identifier, and in the second case it can update the value, which might then match.
--Jarekt (talk) 12:46, 24 May 2016 (UTC)
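The category logic described in this comment can be sketched as a comparison between the identifier given on Commons and the values stored on Wikidata for the same property. This is only an illustration in Java, not the code of Module:Authority control (which is Lua) or of either bot: the class, enum, and method names are hypothetical, and the rule that a match against any one of several Wikidata values counts as matching is an assumption.

```java
import java.util.Set;

/** Hypothetical illustration of the matching / mismatching / missing split. */
final class IdentifierComparison {

    enum Status { MATCHING, MISMATCHING, MISSING_ON_WIKIDATA }

    private IdentifierComparison() {}

    /**
     * Compares one identifier from a Commons page with all values that the
     * linked Wikidata item holds for the same property.
     */
    static Status classify(String commonsValue, Set<String> wikidataValues) {
        if (wikidataValues.isEmpty()) {
            // Commons has a value, Wikidata has none for this property.
            return Status.MISSING_ON_WIKIDATA;
        }
        // Assumption: agreeing with any one Wikidata value is enough.
        return wikidataValues.contains(commonsValue.trim())
                ? Status.MATCHING
                : Status.MISMATCHING;
    }
}
```

Under this sketch, only `MISSING_ON_WIKIDATA` would be a candidate for an automatic addition to Wikidata, while `MISMATCHING` would stay with manual review or the dead-link and redirect checks mentioned above.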

Looks good, concerns addressed, no issues in the most recent trial. If there are no objections, I think the task should be approved. ~riley (talk) 07:31, 23 May 2016 (UTC)

 Agree --Jarekt (talk) 18:15, 24 May 2016 (UTC)
Approved. --Krd 18:15, 24 May 2016 (UTC)