Commons:Bots/Requests/NikkiBot


NikkiBot (talk · contribs)

Operator: Nikki (talk · contributions · Statistics · Recent activity · block log · User rights log · uploads · Global account information)

Bot's tasks for which permission is being sought: Add structured data (such as audio transcription (P9533) and language of work or name (P407)) to audio files (such as those in Category:Lingua Libre pronunciation and Category:Audio files made by jeuwre).

Automatic or manually assisted: Automatic

Edit type (e.g. Continuous, daily, one time run): One-time run for existing files, followed by daily runs

Maximum edit rate (e.g. edits per minute): Maximum 6 edits per minute

Bot flag requested: (Y/N): Y

Programming language(s): Node.js

- Nikki (talk) 15:15, 9 January 2022 (UTC)

Discussion
  • I got ahead of myself and made a handful of example edits before reading that you're only supposed to do test runs if asked to, sorry. :( Anyway, Special:Diff/620440416 is an example for English, Special:Diff/620440342 is a non-Latin script example. As you can see, it can handle non-ASCII characters, the statements being added are grouped into a single edit, the properties used are listed in the summary (so that people can find edits for a particular statement more easily in the edit history), and the properties are linked such that the name of the property is shown using the user's preferred language (to make it more understandable for people who don't speak English). - Nikki (talk) 16:29, 9 January 2022 (UTC)
  • Looks OK to me, but I think it would be a good idea to add sanity checks for the alphabet versus the declared language. For example, some Cyrillic and Latin letters look the same and could be mixed, which is definitely wrong. --EugeneZelenko (talk) 15:20, 10 January 2022 (UTC)
    I do plan to add some script detection so that I can select the right language code for languages which use multiple scripts, but trying to detect issues like the one you mentioned seems a bit out of scope for this bot. That can happen with any textual data, not just audio transcription (P9533), and it's easier to find if it's part of the structured data, because then it can be queried using the Commons query service. There aren't any audio transcription (P9533) statements with issues like that, but this query, for example, finds captions with a Cyrillic letter followed by an ASCII Latin letter. - Nikki (talk) 13:46, 14 January 2022 (UTC)
  •  Support this request, as someone who has found it difficult previously to identify pronunciation files to link to forms on Wikidata lexemes and would be greatly aided in this task with the fruits of this bot's labor. Mahir256 (talk) 02:29, 28 January 2022 (UTC)
 Comment: The two examples look good; perhaps now is the time to do a slightly larger test run. I don't think it needs to be particularly large, but it might be good to see a variety of languages. Ainali (talk) 07:21, 28 January 2022 (UTC)
@Nikki: Can you please do another test run as suggested? --Krd 04:24, 5 February 2022 (UTC)
@Ainali and Krd: Here's another 50 edits (sorry it took a while, I've been quite busy) - Nikki (talk) 01:41, 11 February 2022 (UTC)
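
For readers wondering what "statements grouped into a single edit, with the properties listed in the summary" could look like in practice, here is a minimal sketch. It is not taken from NikkiBot's source. The helper names (`monolingualTextClaim`, `itemClaim`, `buildGroupedEdit`) and the summary wording and link format are assumptions made for illustration; only the statement JSON layout follows the standard Wikibase data model.

```javascript
// Illustrative sketch only; none of these helpers come from NikkiBot.

// Monolingual-text statement, e.g. audio transcription (P9533).
function monolingualTextClaim(property, text, language) {
  return {
    type: "statement",
    rank: "normal",
    mainsnak: {
      snaktype: "value",
      property,
      datavalue: { type: "monolingualtext", value: { text, language } },
    },
  };
}

// Item-valued statement, e.g. language of work or name (P407).
function itemClaim(property, itemId) {
  return {
    type: "statement",
    rank: "normal",
    mainsnak: {
      snaktype: "value",
      property,
      datavalue: {
        type: "wikibase-entityid",
        value: {
          "entity-type": "item",
          "numeric-id": Number(itemId.slice(1)),
          id: itemId,
        },
      },
    },
  };
}

// Bundle all claims into one payload and list each property once in the summary.
// The summary text and the [[d:Property:...]] link style are hypothetical.
function buildGroupedEdit(claims) {
  const properties = [...new Set(claims.map((c) => c.mainsnak.property))];
  const links = properties.map((p) => `[[d:Property:${p}]]`).join(", ");
  return {
    data: JSON.stringify({ claims }),
    summary: `Adding statements: ${links}`,
  };
}

// Usage: one edit carrying both statements for a hypothetical English recording.
const exampleEdit = buildGroupedEdit([
  monolingualTextClaim("P9533", "example", "en"),
  itemClaim("P407", "Q1860"), // Q1860 is the Wikidata item for English
]);
console.log(exampleEdit.summary);
```

The resulting `data` and `summary` strings are shaped so that they could be passed as the parameters of the same names to the Wikibase `wbeditentity` API action, so that all statements land in one revision.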

Approved. --Krd 09:21, 13 February 2022 (UTC)
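
As a footnote to the mixed-script point raised in the discussion, the check described there (a Cyrillic letter followed by an ASCII Latin letter) can be sketched in a few lines. This is an illustration under assumptions, not the query linked above and not part of the bot: the function names are made up, and it relies on the Unicode property escapes supported by current Node.js regular expressions.

```javascript
// Illustrative sketch only; not the Commons query and not NikkiBot code.

// A Cyrillic letter immediately followed by an ASCII Latin letter.
const CYRILLIC_THEN_ASCII_LATIN = /\p{Script=Cyrillic}[A-Za-z]/u;

function hasCyrillicThenLatin(text) {
  return CYRILLIC_THEN_ASCII_LATIN.test(text);
}

// Which of the two scripts appear anywhere in the text, in order of first use.
function scriptsUsed(text) {
  const found = new Set();
  for (const ch of text) {
    if (/\p{Script=Cyrillic}/u.test(ch)) found.add("Cyrillic");
    else if (/\p{Script=Latin}/u.test(ch)) found.add("Latin");
  }
  return [...found];
}

// Usage: the second string has a Latin "v" in the middle of a Cyrillic word.
console.log(hasCyrillicThenLatin("привет")); // false
console.log(hasCyrillicThenLatin("приvет")); // true
console.log(scriptsUsed("приvет"));
```

Like the query described in the discussion, the pattern only catches a Cyrillic letter followed by a Latin one; the reverse order would need a second pattern, and `scriptsUsed` is the coarser check that flags any text mixing both scripts.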