Commons:Bots/Requests/Slick-o-bot

From Wikimedia Commons, the free media repository

Slick-o-bot (talk · contribs)

Operator: Slick (talk)

Bot's tasks for which permission is being sought: uploading

Automatic or manually assisted: manually started

Edit type (e.g. Continuous, daily, one time run): only run when started by operator

Maximum edit rate (e.g. edits per minute): pywikipediabot framework defaults

Bot flag requested: (Y/N): Y

Programming language(s): Python (pywikipediabot framework upload.py)

Slick (talk) 17:01, 7 August 2012 (UTC)

Discussion

$ exiftool Defense.gov_News_Photo_120725-D-BW835-414_-_Secretary_of_Defense_Leon_Panetta,_right,_meets_with_Polish_Minister_of_Defense_Tomasz_Siemoniak,_second_from_left,_in_the_Pentagon_on_July_25,_2012._Panetta,_Siemoniak_a.jpg|grep "\(^City\|^Province-State\|^Keywords\)"
Province-State                  : Arlington, Va.
Keywords                        : Secretary of Defense Leon E. Panetta
City                            : The Pentagon
$ exiftool Defense.gov_News_Photo_120725-D-NI589-003_-_Participants_of_the_DoD_s_Joint_Civilian_Orientation_Conference_receive_the_close_attention_of_a_drill_instructor_at_the_Marine_Corps_Recruit_Depot_in_San_Diego_on_July_2.jpg|grep "\(^City\|^Province-State\|^Keywords\)"
Province-State                  : CA
Keywords                        : Members of the JCOC staff and civilian visitors visit Marine Cor, July 25, 2012. (DoD Photo By Glenn Fawcett)
City                            : Marine Corps Recruit Depot
You could add only existing categories. It would be a good idea to add {{Check categories}} to each image. --EugeneZelenko (talk) 14:49, 11 August 2012 (UTC)
Done. The bot now adds only existing categories and adds the template. See the last 10 trial uploads. --Slick (talk) 00:12, 12 August 2012 (UTC)
As requested here, I now do only a little categorization. --Slick (talk) 22:43, 18 August 2012 (UTC)
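A minimal sketch of the "only existing categories" rule discussed above, not the bot's actual code. It assumes the candidate names come from the City, Province-State and Keywords fields shown in the exiftool output, and that a set of existing category names has already been fetched somehow; the function name and the sample set are hypothetical.

```python
def category_wikitext(candidates, existing_categories):
    """Build the category part of a file description page.

    candidates          -- names taken from the image metadata
    existing_categories -- assumed, pre-fetched set of category names
                           that already exist on the wiki
    """
    # The template asks humans to review the automatic categorization.
    lines = ["{{Check categories}}"]
    seen = set()
    for name in candidates:
        name = name.strip()
        # Skip empty values, repeats, and categories that do not exist.
        if name and name in existing_categories and name not in seen:
            seen.add(name)
            lines.append(f"[[Category:{name}]]")
    return "\n".join(lines)
```

With the metadata of the first image above, and a stand-in set that contains only "The Pentagon", the result is the template line followed by a single category link; "Arlington, Va." and the keyword are dropped rather than created as new categories.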
Looks fine to me. I think it would be a good idea to involve some humans to help with image categorization. If these uploads are part of some Wikipedia project(s), or could be, it would be a good idea to ask there for help. --EugeneZelenko (talk) 14:41, 12 August 2012 (UTC)
Does this mean I can start the entire upload process now? What about the bot flag? Do I need one? --Slick (talk) 15:27, 12 August 2012 (UTC)
I think it would be a good idea to wait for other suggestions. I don't think that bot status is needed. In any case, it would be a good idea to show the uploads in recent changes so that humans can help with categorization. --EugeneZelenko (talk) 14:49, 13 August 2012 (UTC)
Now I am really confused. Why should I come here, request a bot flag and have a long discussion if I do not need a bot flag? In that case I see no reason to wait for feedback on a bot flag that is not needed. If this is a discussion about how to upload, this is the wrong place and it should go to Commons:Batch_uploading. --Slick (talk) 15:01, 13 August 2012 (UTC)
I will run further test uploads now, throttled to 1 upload/minute. --Slick (talk) 10:36, 14 August 2012 (UTC)
As requested here, the bot now ensures that it does not upload a file again when it already exists on Commons. --Slick (talk) 18:23, 14 August 2012 (UTC)
I would suggest shortening the file names. Example: the name of File:Defense.gov News Photo 100501-A-4830W-026 - U.S. Army 1st Lt. Nicholas Eidemiller of 1st Platoon Able Company 2nd Battalion Airborne 503rd Infantry Regiment 173 Airborne Brigade Combat.jpg is very long, as are all recent files from Slick-o-bot. Shortening the file names should not cause any loss of information. "File:Defense.gov News Photo 100501-A-4830W-026" is sufficient. In many cases the automatically created file name consists of the detailed military ranks of some personnel. I suggest using "File:Defense.gov News Photo [image ID]" for the remaining uploads. --High Contrast (talk) 16:57, 16 August 2012 (UTC)
This is contrary to the wish of EugeneZelenko, who wants meaningful file names. I am pausing the uploads until there is a consensus. But please, ASAP. --Slick (talk) 18:10, 16 August 2012 (UTC)
My request is no different from Eugene's: I wish to have meaningful file names, but "Defense.gov News Photo 100501-A-4830W-026 - U.S. Army 1st Lt. Nicholas Eidemiller of 1st Platoon Able Company 2nd Battalion Airborne 503rd Infantry Regiment 173 Airborne Brigade Combat" is not really meaningful. Besides, I doubt that bots can create meaningful file titles, even by using parts of the often very detailed image descriptions. As such, I suggest that shortened image names should be preferred. --High Contrast (talk) 21:22, 16 August 2012 (UTC)
OK, now I understand. I agree with you. So it is better to use only short names. I will just wait some hours for different opinions before resuming with short names. --Slick (talk) 21:30, 16 August 2012 (UTC)
The bot will now resume with short names as suggested (throttled to 1 upload/minute). --Slick (talk) 18:22, 17 August 2012 (UTC)
Thank you for your patience and for your contributions. Let the bot run. --High Contrast (talk) 21:35, 18 August 2012 (UTC)
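For illustration, the agreed naming scheme "Defense.gov News Photo [image ID]" could be derived from the long titles roughly as follows. This is only a sketch: the ID pattern is an assumption inferred from the four IDs quoted on this page, and the function name is hypothetical, not part of the bot or of pywikipediabot.

```python
import re

# Assumed shape of the image ID, e.g. 100501-A-4830W-026
# (inferred from the examples in this discussion only).
IMAGE_ID = re.compile(r"(?<!\d)\d{6}-[A-Z]-[0-9A-Z]{5}-\d{3}(?!\d)")


def short_title(long_title, prefix="Defense.gov News Photo", ext=".jpg"):
    """Return the shortened file name, or None if no image ID is found."""
    match = IMAGE_ID.search(long_title)
    if match is None:
        return None  # leave such files for manual naming
    return f"{prefix} {match.group(0)}{ext}"
```

Returning None instead of guessing keeps files without a recognizable ID out of the automatic run, so a human can name them.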

The duplicate detection is not working optimally right now. A recent issue: [1] and the "older", featured duplicate [2]. --High Contrast (talk) 23:25, 18 August 2012 (UTC)

It shows the same thing, so it looks the same, but it isn't the same (for a non-human) ;)

$ md5sum "Defense.gov News Photo 090321-N-8273J-409.jpg" 
0309e8ccfb3ec17660cc6c1eed9fada3  Defense.gov News Photo 090321-N-8273J-409.jpg
$ md5sum "USS_Annapolis_ICEX.jpg"
ea0da2ffe54c4de0c9f3e82b6ac9822c  USS_Annapolis_ICEX.jpg

(In this case the difference comes from differing metadata.) To find such dups it is necessary to recognize the content of an image. To recognize it, you need to know it (i.e. as a "fingerprint"). To find a match, you need to know all of them. So it is necessary to have "fingerprints" of all images on Commons to find these dups. It's not impossible, but there are various problems that would have to be resolved first. So currently we can only detect an identical file. --Slick (talk) 09:15, 19 August 2012 (UTC)
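The point about checksums can be shown with a small sketch. It assumes nothing about how the bot or the Commons software is implemented; MD5 is used only because the md5sum output above uses it, and the byte strings are made-up stand-ins for two files with identical picture data but different metadata.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Checksum over the whole file, metadata included."""
    return hashlib.md5(data).hexdigest()


def is_exact_duplicate(candidate: bytes, known_digests: set) -> bool:
    """True only if a byte-identical file is already known."""
    return file_digest(candidate) in known_digests


# Made-up stand-ins: same "pixels", different metadata header.
pixels = b"\x00\x01\x02\x03" * 4
copy_a = b"META:Keywords=A;" + pixels
copy_b = b"META:Keywords=B;" + pixels

known = {file_digest(copy_a)}
print(is_exact_duplicate(copy_a, known))  # True: byte-identical
print(is_exact_duplicate(copy_b, known))  # False: metadata differs
```

Catching the second case would need a fingerprint computed from the picture content alone, for every file already on the wiki, which is the problem described above.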

I see. I thought the same image resolution would also be a sufficient criterion to detect dupes with your tool. Nevertheless, the bot works smoothly, and now we need help from others in order to categorize the newly uploaded files properly. Regards, High Contrast (talk) 11:17, 19 August 2012 (UTC)
The resolution is only a sufficient criterion if the software has already detected the dup. You cannot identify a dup of an image by resolution when you don't know there is one. Think about it. (Unless it has the same checksum, which is detected by the Commons software.) --Slick (talk) 13:22, 19 August 2012 (UTC)
I am unsure whether I can run the bot without a throttle now, since the bot has no bot flag. I dislike the current throttle. --Slick (talk) 13:25, 19 August 2012 (UTC)
I have changed the throttle to 3 uploads/minute. --Slick (talk) 10:35, 22 August 2012 (UTC)
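A throttle of N uploads per minute amounts to a minimum gap of 60/N seconds between uploads. The class below is a generic sketch of that idea with a hypothetical name; it is not the framework's own throttle, whose settings are not shown in this discussion. The clock and sleep functions are injectable so the logic can be checked without real waiting.

```python
import time


class UploadThrottle:
    """Enforce a minimum gap between uploads (hypothetical helper)."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute  # e.g. 3/minute -> 20 s gap
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no upload has happened yet

    def wait(self):
        """Block until the next upload is allowed, then reserve a slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._clock()
        self._next_allowed = now + self.interval
```

Calling wait() before each upload never delays the first one and spaces the following ones by at least the interval, so UploadThrottle(3) corresponds to the 3 uploads/minute setting and UploadThrottle(1) to the earlier test runs.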


  • I am starting the upload of this job: ~23600 files. I extracted all the VRINs I found on Commons and removed the (>3000) duplicates from this collection before upload. Throttled to max. 4 uploads/minute. The bot flag is still requested. --Slick (talk) 19:21, 14 September 2012 (UTC)
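The VRIN-based filtering described above could look roughly like this. It is a sketch under stated assumptions: the VRIN pattern is inferred from the IDs quoted in this discussion, the list of existing titles is assumed to have been collected beforehand, and the function names are hypothetical.

```python
import re

# Assumed VRIN shape, e.g. 090321-N-8273J-409 (inferred from this page).
VRIN = re.compile(r"(?<!\d)\d{6}-[A-Z]-[0-9A-Z]{5}-\d{3}(?!\d)")


def extract_vrins(titles):
    """Collect every VRIN that appears in the given file titles."""
    found = set()
    for title in titles:
        found.update(VRIN.findall(title))
    return found


def select_new(candidates, existing_titles):
    """Drop candidates whose VRIN is already on the wiki or repeated."""
    known = extract_vrins(existing_titles)
    fresh = []
    for name in candidates:
        match = VRIN.search(name)
        if match is None:
            fresh.append(name)  # no VRIN to compare, keep the file
            continue
        vrin = match.group(0)
        if vrin in known:
            continue  # already uploaded, or seen earlier in this batch
        known.add(vrin)
        fresh.append(name)
    return fresh
```

This only helps when the existing file carries the VRIN in its title; a copy uploaded under a name such as USS_Annapolis_ICEX.jpg would still slip through, as in the case raised earlier.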

Unless there are further objections, I suggest we approve this request. --99of9 (talk) 06:05, 12 October 2012 (UTC)

 Support --Sanandros (talk) 06:41, 12 October 2012 (UTC)
Approved --99of9 (talk) 23:16, 22 October 2012 (UTC)