Commons:Bots/Requests/SchlurcherBot4

From Wikimedia Commons, the free media repository

SchlurcherBot (talk · contribs) 4

Operator: Schlurcher (talk · contributions · Statistics · Recent activity · block log · User rights log · uploads · Global account information)

Bot's tasks for which permission is being sought: Assuming a file originated from WhatsApp based on its filename, add the template {{WhatsApp|rename}} to files matching img-20[0-9][0-9][0-1][0-9][0-3][0-9]-wa anywhere in the filename. This places them in Category:Files from WhatsApp with bad file names. The request also covers similar tasks.
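A minimal sketch of the filename check. It assumes the pattern is applied case-insensitively, because the lowercase pattern has to match names such as IMG-20180102-WA0057.jpg. It also assumes the pattern is searched anywhere in the title rather than anchored at the start. The function name `looks_like_whatsapp` is hypothetical, not the bot's actual code.

```python
import re

# Pattern from the request: "img-" + a date of the form 20YYMMDD + "-wa".
# IGNORECASE is an assumption so that "IMG-...-WA" and "Img-...-wa" both match.
WHATSAPP_NAME = re.compile(r"img-20[0-9][0-9][0-1][0-9][0-3][0-9]-wa", re.IGNORECASE)


def looks_like_whatsapp(title: str) -> bool:
    """Return True if the WhatsApp-style name pattern occurs anywhere in the title."""
    # re.search (not re.match) so the pattern may appear anywhere in the name.
    return WHATSAPP_NAME.search(title) is not None


for title in [
    "File:IMG-20180102-WA0057.jpg",
    "File:Img-20130808-wa0006.jpg",
    "File:Logo Alianza de Centro Democrático.jpg",
]:
    print(title, looks_like_whatsapp(title))
```

The character classes only loosely validate the date. For example, [0-1][0-9] also accepts month 19 and [0-3][0-9] accepts day 39. That is harmless for a tracking heuristic.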

Please also see Commons:Bots/Work_requests#Adding_newly_uploaded_files_to_Category:Files_with_IMG-date-WA_in_filename and the previous discussion below.

Automatic or manually assisted: Automatic

Edit type (e.g. Continuous, daily, one time run): Daily

Maximum edit rate (e.g. edits per minute): Estimated to be fewer than 10 files per day

Bot flag requested: (Y/N): N (Bot has flag already)

Programming language(s): Python (Pywikibot)

Schlurcher (talk) 21:19, 22 November 2017 (UTC)

Discussion

For a few test edits, please see: [1] [2] [3]. Generally, the category will not be added if the file is already in it. If there are no other categories, the category will be added after one blank line; if other categories are present, it will be added on the next line. I would also like feedback on whether a more self-explanatory name should be used instead of Category:Files with IMG-date-WA in filename. Thanks, --Schlurcher (talk) 06:32, 2 March 2018 (UTC)
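The three placement rules in the comment above can be sketched as a pure function on the page wikitext. This illustration rests on several assumptions and is not the bot's actual code:

- The name `add_category` is hypothetical.
- "Next line" is read as "directly after the last existing category link".
- Category names are compared after treating underscores as spaces. MediaWiki's first-letter case folding is ignored.

```python
import re

# Matches [[Category:Name]] or [[Category:Name|sortkey]]; group 1 is the name.
CATEGORY_LINK = re.compile(
    r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]", re.IGNORECASE
)


def add_category(text: str, category: str) -> str:
    """Add [[Category:<category>]] to wikitext following the three rules described."""
    wanted = category.replace("_", " ").strip()
    links = list(CATEGORY_LINK.finditer(text))
    # Rule 1: leave the page alone if it is already in the category.
    if any(m.group(1).replace("_", " ") == wanted for m in links):
        return text
    tag = f"[[Category:{wanted}]]"
    if not links:
        # Rule 2: no categories yet, so separate the new one with one blank line.
        return text.rstrip("\n") + "\n\n" + tag
    # Rule 3: otherwise put it on the line after the last existing category.
    end = links[-1].end()
    return text[:end] + "\n" + tag + text[end:]
```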

  • A few questions, if you don't mind: how are you identifying the pages to edit? A database search? While the daily runs will be small, how many pages do you estimate this will affect in the first run? I understand those are test edits you referenced, but I hope your edit summary will be more descriptive. Something to consider: perhaps we should also tag these files with a template that recommends renaming? ~riley (talk) 06:57, 2 March 2018 (UTC)
@~riley: The oldest file with this pattern (by filename) is File:Img-20130808-wa0006.jpg, so we currently have fewer than 700 files from 2013 to today. The "first run" already happened about a week ago (a little magic and a little cat-a-lot). Assuming SchlurcherBot4 will only handle newly uploaded files, I will later add whatever was missed during this week (I need to wait for a new index to be generated). - Alexis Jazz 16:45, 2 March 2018 (UTC)
@~riley: The internationalization part of my bot uses a daily updated list of newly uploaded files, which comes directly from a built-in function that queries the Wikimedia database. The same file is used for this task. A title regex (titleregex) is applied, so only the matching files get analyzed. @Alexis Jazz: I already told my bot not to discard the old files, so we can catch up when needed. --Schlurcher (talk) 00:04, 3 March 2018 (UTC)
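Pywikibot's -titleregex page-generator option does this kind of filtering. The comment above also mentions keeping old lists for a later catch-up run. Below is a minimal sketch of both ideas, with these assumptions:

- The daily list is a plain text file with one title per line. The real format is not described here.
- `matching_titles` and the `done` set are hypothetical.

```python
import io
import re
from typing import Iterator, Set, TextIO

# Pattern from the request; case-insensitive matching is an assumption.
PATTERN = r"img-20[0-9][0-9][0-1][0-9][0-3][0-9]-wa"


def matching_titles(listing: TextIO, pattern: str, done: Set[str]) -> Iterator[str]:
    """Yield titles from a one-title-per-line listing that match `pattern`.

    `done` records titles already handled, so re-reading an older listing
    (for a catch-up run) does not process the same file twice.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    for line in listing:
        title = line.strip()
        if not title or title in done:
            continue  # blank line, or already handled in an earlier run
        if regex.search(title):
            done.add(title)
            yield title


# Usage with an in-memory stand-in for one day's list.
done: Set[str] = set()
day = io.StringIO("File:IMG-20180305-WA0001.jpg\nFile:Sunset.jpg\n")
print(list(matching_titles(day, PATTERN, done)))
```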
I first have to check how I can tackle this. I'm also not sure whether custom galleries scattered all around the user namespace are the way to go here. At first thought, categories with a date identifier might be easier to find and handle. --Schlurcher (talk) 00:04, 3 March 2018 (UTC)
I don't believe Eugene is implying that you would need to tackle this; anyone can set up a custom category using OgreBot's existing framework. I have added a custom gallery for SchlurcherBot to run weekly, which we can remove, adjust or keep if we like it. It is worth trying for a week or two, and it is also useful for tracking deletions from the category! ~riley (talk) 00:14, 3 March 2018 (UTC)
  • Thanks, Schlurcher! As for the name, I think you're right. As I feared after it happened the first time, users are removing this category from files after they have been renamed. But even after renaming, it's still useful to know that the file originated from WhatsApp. Perhaps it's better to rename the category to "Files that likely originated from WhatsApp" or something similar? By the way, all the test edits you linked were either deleted or nominated for deletion, which shows how effective this is. - Alexis Jazz 16:36, 2 March 2018 (UTC)
  • I would recommend adding a template (which includes the category) that states both that the file is likely from WhatsApp and that it should be renamed. +1 support for renaming the category; how about "Files from WhatsApp with bad file names", to match Category:Files from Flickr with bad file names? ~riley (talk) 00:14, 3 March 2018 (UTC)
@~riley: So to summarize: I'll create two new categories, Category:Files from WhatsApp with bad file names and Category:Files that likely originated from WhatsApp. A template needs to be created that adds the category, makes it visible that the file came from WhatsApp, and optionally (via a parameter) indicates that it was renamed. {{WhatsApp}}, with {{WhatsApp|renamed}} for renamed files, seems sane. Or it could be {{WhatsApp|needsrename}}, and just {{WhatsApp}} once the file has been renamed. Perhaps that's better, as users will be unlikely to add that parameter or template themselves, but they probably will want to remove the "needsrename" argument. Which sounds better to you, or do you have any other suggestions? - Alexis Jazz 00:41, 3 March 2018 (UTC)
@~riley: @Schlurcher: It looks okay to me now: File:Logo Alianza de Centro Democrático.jpg was renamed but came from WhatsApp; File:IMG-20180102-WA0057.jpg was not renamed and came from WhatsApp. I suggest appending {{WhatsApp|badname}} to the source (or whatever parameter you like; the template doesn't care). Any comments? - Alexis Jazz 03:58, 3 March 2018 (UTC)
I have fixed the template so that it is appropriate to use: {{WhatsApp}} is the normal tracking template and category, and {{WhatsApp|rename}} is the rename template and renaming category. Going forward, the bot will apply {{WhatsApp|rename}} if the operator agrees, and the rename parameter can be removed once the file has been renamed. ~riley (talk) 05:40, 3 March 2018 (UTC)
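As described above, once a file has been renamed, the rename parameter is dropped so that only the tracking form remains. The sketch below performs that edit as a text substitution, with these assumptions:

- `mark_renamed` is a hypothetical helper.
- Only the exact parameter `rename` is handled, not the `badname` or `needsrename` variants floated earlier.

```python
import re

# {{WhatsApp|rename}} with optional whitespace. MediaWiki treats the first letter
# of a template name case-insensitively by default, hence [Ww].
RENAME_CALL = re.compile(r"\{\{\s*([Ww]hatsApp)\s*\|\s*rename\s*\}\}")


def mark_renamed(text: str) -> str:
    """Turn {{WhatsApp|rename}} into the plain tracking form {{WhatsApp}}."""
    # \1 keeps the template name exactly as it was written on the page.
    return RENAME_CALL.sub(r"{{\1}}", text)


print(mark_renamed("|source={{WhatsApp|rename}}"))
```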
  • Would it also be an idea to automatically add the user (at least when they have uploaded fewer than 50 files or so) to a category, or is that going too far? I'm noticing that the IMG-date-WA files get nominated, but once they are removed there's no easy way to track down the users who uploaded them. And other files uploaded by those users tend to have a good percentage of copyvios as well. - Alexis Jazz 04:16, 3 March 2018 (UTC)
    That would indeed be going too far; it's outside the framework of this bot. The majority of copyvios get nominated with no way to track down the users who uploaded them once they are deleted. ~riley (talk) 05:43, 3 March 2018 (UTC)

  • @Schlurcher: What are your thoughts so far? I realize we have taken over your bot task, but this shouldn't be surprising, considering that Alexis was the one to request this task and that it has developed into a much better solution. I think everything we have described is easy to do within the framework of a Pywikibot script, with the exception of the last note. ~riley (talk) 05:43, 3 March 2018 (UTC)
No worries, I see ideas evolving; that's good. I'm not sure I understand the current proposal. As I understand it, the new idea is to add the new template {{WhatsApp}} to files matching the above regex. Does this still need some bot intervention? --Schlurcher (talk) 06:03, 3 March 2018 (UTC)
The current proposal still involves bot intervention: adding {{WhatsApp|rename}} to files matching the above regex on the same premises (daily run, automatic, etc.). ~riley (talk) 06:27, 3 March 2018 (UTC)
I adjusted both the request text above and my bot script. To be able to add the template, I could not use the categorization script, so I had to rewrite it from the beginning. Some test edits are: [4] [5] [6].
Files that already have the {{WhatsApp}} template will be ignored, as will redirects. I did not ignore files in the old Category:Files with IMG-date-WA in filename; this category needs to be cleaned up once we agree on what to do.
My proposed edit summary is: Assuming file originated from WhatsApp based on filename --Schlurcher (talk) 07:06, 4 March 2018 (UTC)
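The skip rules in this comment can be sketched as a predicate over page wikitext: ignore files that already carry {{WhatsApp}} in any form, and ignore redirects. Assumptions:

- `should_tag` is hypothetical.
- Only the English #REDIRECT magic word is recognized.
- In an actual Pywikibot script, the redirect check would more naturally use `page.isRedirectPage()` than a text match.

```python
import re

# Redirect pages start with the #REDIRECT magic word (matched case-insensitively).
REDIRECT = re.compile(r"#REDIRECT", re.IGNORECASE)
# Any existing {{WhatsApp}} call, with or without parameters, but not e.g. {{WhatsAppFoo}}.
WHATSAPP_TEMPLATE = re.compile(r"\{\{\s*[Ww]hatsApp\s*(?:\||\}\})")


def should_tag(text: str) -> bool:
    """Return True if {{WhatsApp|rename}} should be added to this page's wikitext."""
    if REDIRECT.match(text):
        return False  # redirects are skipped
    if WHATSAPP_TEMPLATE.search(text):
        return False  # page already carries {{WhatsApp}} in some form
    return True
```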
This is all looking very good. I do not expect to be online much this week due to reasons but you and ~riley please proceed. - Alexis Jazz 07:49, 4 March 2018 (UTC)
Throw "Bot:" or "(Bot)" somewhere in there and I am happy. You're one of the only bots that runs without the phrase in the edit summary, and I would prefer if we stuck with the norm. ~riley (talk) 08:07, 4 March 2018 (UTC)
Updated the summary to: Bot assuming file originated from WhatsApp based on filename. Further, I performed some housekeeping: I cleaned up Category:Files with IMG-date-WA in filename (and marked it for deletion) and updated {{WhatsApp}}. --04:49, 6 March 2018 (UTC)
Swell! Category deleted, nice work. ~riley (talk) 06:27, 6 March 2018 (UTC)

Administrator note: If there is no objection from the community, it is my recommendation that this task be approved (no flag required). ~riley (talk) 03:25, 6 March 2018 (UTC)

Approved. --Krd 11:46, 13 March 2018 (UTC)