User talk:Zhuyifei1999/Archive 18

From Wikimedia Commons, the free media repository

video2commons and Youtube

Most YouTube videos are already available in free formats such as VP8/Vorbis and VP9/Opus. Your tool should look for these free-format files and skip the transcoding step if they are available.

For example, this file File:The Witcher 3 Wild Hunt - Gameplay Trailer (English).webm is copied from Youtube. With the Complete Youtube Saver extension for Firefox, I can pull the VP9/Opus streams transcoded by Youtube, which gives VP9/ Opus. Due to the spam blacklist you have to add the letter g so that the domain is googlevideo, not ooglevideo. These two files combined are 93MB, the VP8 version on Commons is 204MB. - hahnchen 12:41, 1 May 2016 (UTC)

@Hahnchen: Stream filtering is a complicated task. For now, I'm using whatever youtube-dl's --prefer-free-formats recommends. Do you have a better algorithm? --Zhuyifei1999 (talk) 13:03, 1 May 2016 (UTC)
Try specifying format 248 in youtube-dl. Given that the file uploaded to Commons was larger than the largest possible Youtube file, I expect you are recoding the video regardless. If webm is available, do not recode. - hahnchen 18:30, 1 May 2016 (UTC)
@Hahnchen: 248 = 1080p VP9 DASH video. This will only work if the YouTube video actually has a 1080p version. Besides, hard-coding 248 will fail to download any 4K or better videos, as it will download 1080p instead. Regarding whether my tool is transcoding or not, it actually only transcodes when the downloaded video or audio codec is non-free or incompatible, and in that case it will transcode both codecs to whatever the user specified (if the codec of one stream matches the target codec, that stream is copied without transcoding). But the question is, how does the tool know whether there is a webm version, which webm version has the highest resolution, and whether the highest-resolution webm version has the highest resolution of all formats? Also, note that my tool targets most video sharing websites, so I have to support ogv as well. --Zhuyifei1999 (talk) 04:05, 2 May 2016 (UTC)
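One possible way to answer the three questions above is sketched below. It is only an illustration, not what video2commons does: it assumes a list of stream descriptions shaped like the `formats` entries in youtube-dl's info dict (`format_id`, `vcodec`, `height`), and the helper names and sample data are hypothetical. The idea is to find the highest resolution first, then prefer a free codec only among streams at that resolution.

```python
# Codec name prefixes treated as free (an assumption for this sketch).
FREE_VIDEO = ("vp8", "vp9", "vp09", "theora")


def is_free(codec, free_prefixes):
    """True if the codec string starts with one of the free prefixes."""
    return codec is not None and codec.lower().startswith(free_prefixes)


def pick_video_stream(formats):
    """Return (chosen_format, needs_transcode).

    The highest resolution always wins; a free codec is preferred only
    among the streams that share that highest resolution.
    """
    videos = [f for f in formats if f.get("vcodec") not in (None, "none")]
    if not videos:
        return None, False
    best_height = max(f.get("height") or 0 for f in videos)
    best = [f for f in videos if (f.get("height") or 0) == best_height]
    free = [f for f in best if is_free(f["vcodec"], FREE_VIDEO)]
    if free:
        return free[0], False   # copy the stream, no transcoding needed
    return best[0], True        # best quality is non-free: transcode


# Hypothetical sample data for one video.
formats = [
    {"format_id": "137", "vcodec": "avc1.640028", "height": 1080},
    {"format_id": "248", "vcodec": "vp9", "height": 1080},
    {"format_id": "313", "vcodec": "vp9", "height": 2160},
    {"format_id": "251", "vcodec": "none", "acodec": "opus"},
]
chosen, needs_transcode = pick_video_stream(formats)
print(chosen["format_id"], needs_transcode)
```

With this sample, the 2160p VP9 stream is chosen over the hard-coded 248, and no transcoding is flagged. If the highest resolution were only available in a non-free codec, that stream would be chosen and flagged for transcoding.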

More images for commons

--79.40.4.238 15:01, 3 May 2016 (UTC)

I'm pretty sure I'm the wrong person for this. Maybe User:Fae? --Zhuyifei1999 (talk) 15:48, 3 May 2016 (UTC)

Bot down?

The last time your bot added Category:New uploads without a license or other such categories was on May 1. Did something break? --Jarekt (talk) 02:49, 9 May 2016 (UTC)

The two jobs on the grid engine seem to have been deadlocked since May 1. (The May 2-9 job submissions failed because the grid engine thought the jobs were still active.) I just killed them, and hopefully they will get resubmitted tomorrow. --Zhuyifei1999 (talk) 10:07, 9 May 2016 (UTC)
Thanks --Jarekt (talk) 11:37, 9 May 2016 (UTC)
It is working again. --Jarekt (talk) 16:10, 10 May 2016 (UTC)

Please update FlickreviewR_2 configuration to use HTTPS URLs

Hi - while reviewing logs for insecure API requests, I've noticed one of the prominent ones is your FlickreviewR_2 bot. Could you please update its configuration to use https:// URLs to access Wikimedia sites, instead of http:// ? Thank you! --BBlack (WMF) (talk) 23:06, 9 May 2016 (UTC)

Thanks for the notification. The mwclient configuration file says host = 'commons.wikimedia.org', without the protocol. I need some time to find where the protocol can be supplied, and whether the current mwclient library supports it. --Zhuyifei1999 (talk) 05:34, 10 May 2016 (UTC)
@BBlack (WMF): I see that https was made the default for mwclient in January, and I've just pulled the latest git master. Could you verify that the insecure API requests are now gone? --Zhuyifei1999 (talk) 06:07, 10 May 2016 (UTC)
Looks good to me on the dashboards. Thank you! --BBlack (WMF) (talk) 22:33, 10 May 2016 (UTC)
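For configurations that store only a bare host name, as described above, the protocol can be made explicit before the value reaches the client library. The helper below is a hypothetical sketch and not part of mwclient. Older mwclient releases are understood to accept a `(scheme, host)` tuple as the first argument of `mwclient.Site`; treat that as an assumption and check it against the installed version.

```python
def with_scheme(host, default_scheme="https"):
    """Normalize a configured host into a (scheme, host) tuple.

    Hypothetical helper: accepts a bare host name, a full URL prefix
    such as 'http://example.org', or an existing (scheme, host) tuple.
    """
    if isinstance(host, tuple):
        return host
    if "://" in host:
        scheme, _, name = host.partition("://")
        return (scheme, name)
    return (default_scheme, host)


print(with_scheme("commons.wikimedia.org"))
```

The bare host from the configuration file is paired with `https`, while a value that already names a protocol keeps it.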

Images from Gallica

Hi, This didn't work. Any idea? Regards, Yann (talk) 08:30, 11 May 2016 (UTC)

@Yann: I've long forgotten how the tool works, but have Gallica's URLs changed? The link on Commons:Gallica isn't working. --Zhuyifei1999 (talk) 10:12, 11 May 2016 (UTC)
Yes, it seems that Gallica changed its interface. :( Regards, Yann (talk) 11:59, 11 May 2016 (UTC)

Hi. I tried this tool last week and it was working perfectly. I tried it today and it doesn't work anymore. Do you think there is a possibility to repair this tool? It was an awesome one and I am greatly disappointed now. Thanks a lot. Mel22 (talk) 18:57, 13 May 2016 (UTC)

Let me check. Working with a French website is hard when I'm fr-0 --Zhuyifei1999 (talk) 01:39, 14 May 2016 (UTC)
@Yann and Mel22: I found this http://gallica.bnf.fr/iiif/ark:/12148/btv1b90784730/f1/0,1024,1024,1024/1024,/0/native.jpg url style to kind of work, but I can't figure out how to find the image size, or the image gets stretched down indefinitely. --Zhuyifei1999 (talk) 02:49, 14 May 2016 (UTC)
Interesting. 0,1024,1024,1024 seems to be the coordinates, /1024,/ seems to be the image size, and /0/ is a rotation parameter. I will investigate more later. Regards, Yann (talk) 11:33, 14 May 2016 (UTC)
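The URL layout discussed above matches the IIIF Image API pattern `{region}/{size}/{rotation}/{quality}.{format}`, where the region is `x,y,w,h` and a size of `w,` scales the region to width `w`. The sketch below rebuilds the URL from the thread and enumerates tile regions for a full image. The image dimensions are hypothetical, since finding the real size is exactly the open problem, and the function names are made up for illustration. Each tile asks for a size equal to its own clipped width, on the assumption that requesting a fixed 1024 for a narrower edge region is what stretches the result.

```python
def iiif_url(base, identifier, region, size,
             rotation=0, quality="native", fmt="jpg"):
    """Build an IIIF-style image URL; region is an (x, y, w, h) tuple."""
    x, y, w, h = region
    return (f"{base}/{identifier}/{x},{y},{w},{h}/"
            f"{size}/{rotation}/{quality}.{fmt}")


def tile_regions(width, height, tile=1024):
    """Yield (x, y, w, h) regions covering the image, clipped at edges."""
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield (x, y, min(tile, width - x), min(tile, height - y))


BASE = "http://gallica.bnf.fr/iiif"
IDENT = "ark:/12148/btv1b90784730/f1"

# Reproduces the URL quoted in the thread.
print(iiif_url(BASE, IDENT, (0, 1024, 1024, 1024), "1024,"))

# Hypothetical 2500x1500 image: request each tile at its native width.
tiles = list(tile_regions(2500, 1500))
urls = [iiif_url(BASE, IDENT, r, f"{r[2]},") for r in tiles]
```

The tiles can then be pasted into a canvas at their `(x, y)` offsets. Whether the server also exposes the image dimensions in a machine-readable form would need to be checked separately.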

Large fraction of false positives in Category:New uploads without a license

Zhuyifei1999, there is a much larger fraction of false positives in Category:New uploads without a license this month. I do not know if something changed. See files below: --Jarekt (talk) 12:15, 13 May 2016 (UTC)

I have no idea why they happen, and I don't know any way of finding out, sorry. Could you check the message I just left you? --Zhuyifei1999 (talk) 12:25, 13 May 2016 (UTC)
FYI this is probably due to phabricator:T132541. Magog the Ogre (talk) (contribs) 02:50, 15 May 2016 (UTC)

Hello

Hello, can you review the following two images, please?

Thank you. Ssven2 (talk) 07:40, 15 May 2016 (UTC)

YiFeiBot adding Category:Pages using Information template with parsing errors

Hey there User:Zhuyifei1999,

I was looking around and found that User:YiFeiBot seems to be adding Category:Pages using Information template with parsing errors to files even if the InfoBox was being correctly formatted. Examples: File:Hgr-106.jpg, File:WikiDSC_9311CHoeltschl.jpg and File:Turnierhelme_im_Burgmuseum_Meersburg.jpg.

I wanted to check whether this is a bug in the bot or whether there is some reason for it that I'm unaware of. I've removed the category for now, as the category description says "File description pages with parsing errors in Template:Information resulting in non-rendered template", which didn't seem to be correct for these files. Let me know, and I'll revert the changes appropriately. AbdealiJK (talk) 01:47, 4 May 2016 (UTC)

Thanks. They are false positives. While my bot does two null edits and 3 (IIRC) checks before actually adding the category, the false-positive rate is difficult to reduce, as MediaWiki page information updates are done in the job queue and not in real time. --Zhuyifei1999 (talk) 02:28, 4 May 2016 (UTC)
Maybe we could also check Category:Pages using Information template with parsing errors for files transcluding {{Information}} or {{Infobox template tag}} and automatically remove them from the library on a daily basis.
@Jarekt: Hmm. How much time do you suggest to have between the time of the category addition run and the category removal run? Or maybe it should run several times a day? --Zhuyifei1999 (talk) 10:44, 4 May 2016 (UTC)
I was thinking about processing the previous day just before the new run, so I guess 24 hours, but if we can do it more often that would be better. We could also add similar category removal runs for "no license" categories. We seem to have occasionally people helping with those categories by tagging files with "no license" templates, but often without removing the files from the categories. --Jarekt (talk) 12:02, 4 May 2016 (UTC)
@Jarekt: While 24 hours would give enough time for any page information updates in the job queue to finish, I'm afraid some false positives would get tagged with "no source" or "no license" within the 24 hours, defeating the purpose of reducing false positives. (Oh, I can't run it more than about 6 times a day, as the query is quite expensive and running it too often might "annoy" jynus.) --Zhuyifei1999 (talk) 12:29, 4 May 2016 (UTC)
Two thoughts. I do hope that false positives would not get tagged with "no source" or "no license". Right now I am the one mostly tagging files with "no license" and fixing files in Category:Pages using Information template with parsing errors (I empty those categories on most days), and I make sure to check them all individually. The purpose of removal runs would be to keep number of false positives and fixed-but-not-removed files small, on the days nobody empties the categories. Removal run queries should not be expensive as they only look at files in their category. --Jarekt (talk) 15:55, 4 May 2016 (UTC)
Good point. I'll code it next week, and run 6 times a day --Zhuyifei1999 (talk) 08:46, 5 May 2016 (UTC)
great --Jarekt (talk) 20:55, 5 May 2016 (UTC)
@Jarekt: Does this query look good for undoing the licensing task? I'll finish the two scripts this weekend --Zhuyifei1999 (talk) 12:11, 13 May 2016 (UTC)
I will have to test the query with some files in the categories. The query should closely match your addition query, or otherwise we will end up with a lot of categorize/uncategorize cycles. You should use {{tl|Deletion template tag}}, which should simplify the query a lot. Also, in the untested quarry:query/9706 I was trying to narrow down the subquery. In your version it would return the ids of all the files on Commons, and that would be expensive. In my version it would only return files in the 2 categories that need to be removed. --Jarekt (talk) 15:35, 13 May 2016 (UTC)
Oops, quarry:query/9706 did not save my changes. I will try to recreate it from memory. --Jarekt (talk) 01:54, 15 May 2016 (UTC)
Now quarry:query/9706 returns files that can be removed from Category:New_uploads_without_a_license --Jarekt (talk) 04:14, 15 May 2016 (UTC)
✓ Done added to crontab and now running 6 times a day. One hit on first run. --Zhuyifei1999 (talk) 11:27, 16 May 2016 (UTC)
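The two constraints raised in this thread, that the removal rule should mirror the addition rule and that the removal pass should only look at files already in the category, can be illustrated with a small sketch. It is not the bot's actual code or query: the function name, the `lacks_license` predicate, and the sample titles are all hypothetical.

```python
def plan_run(new_uploads, category_members, lacks_license):
    """Return (to_add, to_remove) for one maintenance run.

    Both decisions use the same predicate, so a file that was just added
    is never a removal candidate in the same state (no add/remove cycles).
    The removal pass only inspects current category members, which keeps
    it cheap compared with scanning every file.
    """
    to_add = {t for t in new_uploads if lacks_license(t)} - category_members
    to_remove = {t for t in category_members if not lacks_license(t)}
    return to_add, to_remove


# Hypothetical state: which files currently carry a license template.
licensed = {"File:A.jpg", "File:C.jpg"}


def lacks_license(title):
    return title not in licensed


new_uploads = {"File:A.jpg", "File:B.jpg", "File:D.jpg"}
category_members = {"File:C.jpg", "File:D.jpg"}

to_add, to_remove = plan_run(new_uploads, category_members, lacks_license)
print(sorted(to_add), sorted(to_remove))
```

Here `File:C.jpg` was fixed after being categorized, so it is the only removal candidate, while `File:B.jpg` is the only new addition.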

Hallo Zhuyifei1999, there are a lot of files (such as File:Aerial photographs of Florida MM00034964x (9409340769).jpg or File:De-Abmachung.ogg) that are wrongly contained in the mentioned category. Is there a way to clean that up? --Arnd (talk) 08:05, 18 May 2016 (UTC)

Last weekend I got my bot to revert such edits for the licensing categories. I'll try to add the same function for this category and the missing information category this weekend. --Zhuyifei1999 (talk) 09:00, 18 May 2016 (UTC)

Issue with bot recognizing licensing

I'm not sure if this is where I should bring this up; I apologize if this is not correct.

https://commons.wikimedia.org/wiki/File:Mike_Hart_in_Stampede_Wrestling.png

It says on my picture page that the bot that checks Flickr files can't figure out the licensing of the picture that I uploaded. What should I do? I've been planning on uploading a lot of images today. *Treker (talk) 17:28, 27 May 2016 (UTC)

In those cases, just go for a manual review. Something might have gone wrong --Zhuyifei1999 (talk) 04:06, 28 May 2016 (UTC)
Someone actually fixed that now thankfully. A user has checked most of my uploads now.*Treker (talk) 04:09, 28 May 2016 (UTC)

GAP - Job did not seem to succeed

Hi. I'm trying to download [25], but I'm getting "Job did not seem to succeed. Please check your input or refresh this page to try again. If this still fail, please contact commons:User:Zhuyifei1999" (running [26], Job "gap-2QG3KeIceW4FFA").

Same for [27] from [28]
and [29] from [30] --Sporti (talk) 09:37, 26 May 2016 (UTC)
I'll look into it this weekend. I'm already forgetting how the script works. --Zhuyifei1999 (talk) 13:22, 26 May 2016 (UTC)
@Sporti: I increased the memory from 4G to 5G. Can you see if it's working now? --Zhuyifei1999 (talk) 14:56, 28 May 2016 (UTC)
It worked for #1 and #3, #2 is getting same error. --Sporti (talk) 15:21, 28 May 2016 (UTC)
The script is wasting too much memory. I'll do some memory optimizations. --Zhuyifei1999 (talk) 15:48, 28 May 2016 (UTC)
The script is already segfaulting on the newer systems of Tool Labs. It needs a rewrite when I have more time. BTW: I searched again for how to download those images; the answers are either too outdated, "go search Wikimedia Commons", or a simple "you can't" :( --Zhuyifei1999 (talk) 16:55, 28 May 2016 (UTC)
It would be nice if you could, even though they keep inventing new ways to make downloading difficult... I did find this site [31], however. It doesn't allow downloading the highest resolution directly and only works with Chrome (I used it for the one that didn't go through, and it worked: File:Rihard_Jakopič_-_Križanke_(svež_sneg).jpg). --Sporti (talk) 16:11, 29 May 2016 (UTC)
Well, for me it is easy to do a downsizing and make the download work every single time. But trying to get large images is so memory-consuming (for some unknown reason) that it keeps using up all the memory. --Zhuyifei1999 (talk) 16:26, 29 May 2016 (UTC)
