Commons:Bots/Requests/YiFeiBot (19)

From Wikimedia Commons, the free media repository

YiFeiBot (talk · contribs) (19)

Operator: Zhuyifei1999 (talk · contributions · Statistics · Recent activity · block log · User rights log · uploads · Global account information)

Bot's tasks for which permission is being sought: Remove space(s) before file extensions. (affecting 48970 files as of today) Requested at Commons:Bots/Work_requests#Spaces_before_file_extensions

Automatic or manually assisted: Automatic unsupervised

Edit type (e.g. Continuous, daily, one time run): one time run (the rest I hope will be manually fixed. User:Steinsplitter made a list in https://tools.wmflabs.org/steinsplitter/leerz.php)

Maximum edit rate (e.g. edits per minute): 3 moves per minute (reduced from 6)

Bot flag requested: (Y/N): N

Programming language(s): Python (pywikipedia)

Zhuyifei1999 (talk) 10:48, 25 April 2014 (UTC)

Discussion

 Question What happens if the target file name already exists?--McZusatz (talk) 17:12, 13 May 2014 (UTC)
That's a skip. (That's the case in one of the two files that failed moves. I forgot the other, however.) --Zhuyifei1999 (talk) 12:07, 14 May 2014 (UTC)
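The skip rule described above can be illustrated with a minimal sketch. It is not the bot's actual code: the function names are hypothetical, the set `existing_names` stands in for a real existence check against the wiki, and the pattern assumes that "space(s)" means literal spaces or the underscores MediaWiki stores in their place.

```python
import re

# Assumption: an extension is a dot followed by letters, and the "spaces"
# in front of it may be literal spaces or MediaWiki's underscores.
_SPACE_BEFORE_EXT = re.compile(r"[ _]+(\.[a-zA-Z]+)$")


def target_name(name):
    """Return the file name with any space(s) before the extension removed."""
    return _SPACE_BEFORE_EXT.sub(r"\1", name)


def plan_move(name, existing_names):
    """Return the new name for a move, or None when the file is skipped.

    `existing_names` is a hypothetical stand-in for an existence check
    against the wiki.
    """
    new_name = target_name(name)
    if new_name == name:
        return None  # nothing to fix
    if new_name in existing_names:
        return None  # target already exists: skip instead of overwriting
    return new_name
```

For example, `plan_move("Foo bar .jpg", set())` gives `"Foo bar.jpg"`, while the same call with `{"Foo bar.jpg"}` as the existing names gives `None`.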
 Question. As we had some files disappearing after moves for no apparent reason and were forced to delete them entirely, how does your bot ensure consistency? Is there any mechanism ensuring that the raw file is not a 404 and the description is not empty after the move? --McZusatz (talk) 17:12, 13 May 2014 (UTC)
There's no way to be sure that a 404 doesn't appear, but according to Steinsplitter's post above, WMF Operations says three moves per minute are okay (I'd rather do 1 per minute, however).
The problem is that you cannot ensure that someone will process the requests every 2.5 hours. Even if someone actually did, there is still a chance that more than 500 files will hit CommonsDelinker at once. (Other file movers will contribute as well.) Furthermore, I would feel more comfortable if CommonsDelinker did the replacements in real time at a constant speed. --McZusatz (talk) 11:38, 15 May 2014 (UTC)
I don't see a problem... (@MCZ: I don't understand the problem... Then we'll move 250 and 250. You only comment once a bureaucrat is about to approve the request... Why only now?) --Steinsplitter (talk) 11:50, 15 May 2014 (UTC)
If you are willing to keep track of it manually, I won't have a problem with it. --McZusatz (talk) 12:30, 15 May 2014 (UTC)
(@Steinsplitter: Since the bot's task wasn't really settled (at least for me), I thought a follow-up question was entirely appropriate. All the more urgent if the bot is supposed to start soon. --McZusatz (talk) 19:54, 15 May 2014 (UTC))
Is the CommonsDelinker queue size accessible via the API or as a wiki page? If it is accessible, YiFeiBot should just wait if the queue is full. --EugeneZelenko (talk) 14:19, 15 May 2014 (UTC)
Do I get this right: If User:CommonsDelinker/commands/filemovers is too large (50000 bytes?), pause the bot? --Zhuyifei1999 (talk) 14:37, 15 May 2014 (UTC)
I think the number of entries on the page is a better criterion than the size in bytes. --EugeneZelenko (talk) 14:07, 16 May 2014 (UTC)
Eugene, it is easier for the bot operator to work with bytes. I don't see the need to count templates. --Steinsplitter (talk) 14:17, 16 May 2014 (UTC)
Sure, page size is easier to check. But the page size limit should be based on the minimal rename record size and include a margin. --EugeneZelenko (talk) 14:28, 17 May 2014 (UTC)

Let's use the average. According to the query select avg(length(img_name)), min(length(img_name)), max(length(img_name)) from image where img_name regexp "_\.[a-zA-Z]+$";, the average length of the filenames to be renamed by this bot is ~ 61.2997 (max 246, min 7). With the Python expression 61.2997 * 2 - 1 + len("{{universal replace|||reason=Robot: Removing space(s) before file extension}}"), the average command length is ~ 198.5994. With 400 requests, the page size is ~ 79439.76 (400 × 198.5994, without the header). --Zhuyifei1999 (talk) 15:10, 17 May 2014 (UTC)
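The estimate can be re-derived with a short sketch. The variable and function names are hypothetical. It assumes each command contains the old name, a new name one character shorter (a single space removed), and the fixed template text, and it ignores the newline between entries. The factor of 400 has to apply to the whole per-command length, not only to the template text.

```python
# Average filename length reported by the query above.
AVG_NAME_LEN = 61.2997
# Fixed part of one command, as quoted in the discussion.
TEMPLATE = "{{universal replace|||reason=Robot: Removing space(s) before file extension}}"

# Old name + new name (assumed one character shorter) + template text.
avg_command_len = AVG_NAME_LEN * 2 - 1 + len(TEMPLATE)


def estimated_page_size(requests):
    """Estimated page size for `requests` average-length commands (no header)."""
    return requests * avg_command_len


print(len(TEMPLATE))                       # 77
print(round(avg_command_len, 4))           # 198.5994
print(round(estimated_page_size(400), 2))  # 79439.76
```

On these figures, 400 average-length commands come to roughly 79,440 bytes, so a 50000-byte limit would be reached after about 250 of them.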

Why not just count all "{{" on the raw page if size>10k? --McZusatz (talk) 11:38, 18 May 2014 (UTC)
I'll do this on Friday --Zhuyifei1999 (talk) 08:29, 19 May 2014 (UTC)
No time, tomorrow. --Zhuyifei1999 (talk) 08:56, 24 May 2014 (UTC)
✓ Done Does User:YiFeiBot/~/pywikibot/com_end_space.py look okay? --Zhuyifei1999 (talk) 09:31, 25 May 2014 (UTC)
I don't understand Python perfectly, but it seems you implemented everything that was discussed above. --McZusatz (talk) 14:05, 29 May 2014 (UTC)
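The check discussed above (count "{{" only once the raw page exceeds 10k) might look like the following sketch. It is not taken from com_end_space.py: `queue_is_full` and both constants are hypothetical, the 400-entry limit is borrowed from the estimate earlier in the thread, and fetching the raw text of User:CommonsDelinker/commands/filemovers is left out.

```python
SIZE_THRESHOLD = 10000  # bytes; the "10k" suggested above
MAX_ENTRIES = 400       # hypothetical limit, borrowed from the earlier estimate


def queue_is_full(raw_text, max_entries=MAX_ENTRIES, size_threshold=SIZE_THRESHOLD):
    """Decide whether the bot should pause, given the raw wikitext of the queue page.

    Small pages are accepted without counting; larger pages are judged by
    the number of "{{" openers, used as a proxy for the number of commands.
    """
    size = len(raw_text.encode("utf-8"))
    if size <= size_threshold:
        return False
    return raw_text.count("{{") >= max_entries
```

A bot loop would call this before each batch of moves and sleep while it returns `True`. Counting "{{" overestimates the queue if the page header itself contains templates, which errs on the side of pausing.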
 Question Is there anything else that needs doing (implementing) or discussing, or can we close this request as successful? odder (talk) 08:52, 30 May 2014 (UTC)
No, everything should be clear now and there was enough time to raise concerns. --McZusatz (talk) 09:17, 30 May 2014 (UTC)
+1 :) --Steinsplitter (talk) 09:19, 30 May 2014 (UTC)
 Bureaucrat note: Closing as ✓ approved, then. odder (talk) 10:29, 31 May 2014 (UTC)