Commons:Bots/Requests/Slick-o-bot
Slick-o-bot (talk · contribs)
Bot's tasks for which permission is being sought: uploading
Automatic or manually assisted: manually started
Edit type (e.g. Continuous, daily, one time run): only run when started by operator
Maximum edit rate (eg edits per minute): pywikipediabot framework defaults
Bot flag requested: (Y/N): Y
Programming language(s): Python (pywikipediabot framework upload.py)
Slick (talk) 17:01, 7 August 2012 (UTC)
Discussion
- Please make a test run. --EugeneZelenko (talk) 14:30, 8 August 2012 (UTC)
- Test uploads done to Category:Defense.gov_News_Photos_to_check. --Slick (talk) 14:49, 9 August 2012 (UTC)
- I saw and fixed the bug that created double file extensions, like here and here. --Slick (talk) 14:49, 9 August 2012 (UTC)
- Please use a language template for the Author field. Is it possible to create more meaningful file names? It would be a good idea to think about categorizing the photos by date. --EugeneZelenko (talk) 14:54, 9 August 2012 (UTC)
- Please show me an example of the Author field you want. --Slick (talk) 14:58, 9 August 2012 (UTC)
- File names are generated from the "headline" given in the EXIF/IPTC data, but most files don't have one. I can generate names from the description instead, but then I would have to cut them at ~200 bytes length (and cannot guarantee that the result is useful). Should I? --Slick (talk) 15:01, 9 August 2012 (UTC)
- Now using a language template for the author, creating the file name from the description, adding two date categories, adding more categories from EXIF/IPTC data where possible, and adding the ID-USMil template; see Category:Defense.gov_News_Photos_to_check. Waiting for comments ... --Slick (talk) 20:49, 9 August 2012 (UTC)
- Excessive categories seem to occur, like Arlington, Va. in File:Defense.gov News Photo 120725-D-BW835-414 - Secretary of Defense Leon Panetta, right, meets with Polish Minister of Defense Tomasz Siemoniak, second from left, in the Pentagon on July 25, 2012. Panetta, Siemoniak a.jpg (the state should be spelled out in category names, and the Pentagon category is already there) and July 25 in File:Defense.gov News Photo 120725-D-NI589-003 - Participants of the DoD s Joint Civilian Orientation Conference receive the close attention of a drill instructor at the Marine Corps Recruit Depot in San Diego on July 2.jpg (25 July is already there). --EugeneZelenko (talk) 14:50, 10 August 2012 (UTC)
- Categories are taken from the EXIF/IPTC data or other scriptable information. Take a look at these files; you will see the images are tagged with this. I am not from the USA, so I can't know that this is wrong, and a script cannot detect this wrong tagging either. It is not my job to check the keywords and locations given by the photographers. Categories are not checked for existence. For example, Arlington, Va. and the "double" date July 25 are location names and/or keywords in the EXIF data; I just map them to categories. It is not possible to check ~15000 files with up to 20 keywords each for the best existing categories. After the import it is easy to select all files in a wrong category with Cat-a-lot and move them to the right one. In my opinion this is not a problem, just bits and pieces. I am a little bit sad. I'd like to start. --Slick (talk) 16:09, 10 August 2012 (UTC)
$ exiftool Defense.gov_News_Photo_120725-D-BW835-414_-_Secretary_of_Defense_Leon_Panetta,_right,_meets_with_Polish_Minister_of_Defense_Tomasz_Siemoniak,_second_from_left,_in_the_Pentagon_on_July_25,_2012._Panetta,_Siemoniak_a.jpg|grep "\(^City\|^Province-State\|^Keywords\)"
Province-State : Arlington, Va.
Keywords : Secretary of Defense Leon E. Panetta
City : The Pentagon
$ exiftool Defense.gov_News_Photo_120725-D-NI589-003_-_Participants_of_the_DoD_s_Joint_Civilian_Orientation_Conference_receive_the_close_attention_of_a_drill_instructor_at_the_Marine_Corps_Recruit_Depot_in_San_Diego_on_July_2.jpg|grep "\(^City\|^Province-State\|^Keywords\)"
Province-State : CA
Keywords : Members of the JCOC staff and civilian visitors visit Marine Cor, July 25, 2012. (DoD Photo By Glenn Fawcett)
City : Marine Corps Recruit Depot
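The mapping described above, from the City, Province-State and Keywords fields to categories, can be sketched in Python (the bot's language). This is a hypothetical illustration, not Slick-o-bot's actual code: the function names are invented, the IPTC values are assumed to have been read already (e.g. with exiftool), and the set of existing categories is assumed to be supplied by the caller rather than looked up on Commons here. It also includes the existence filter and the {{Check categories}} tag that were agreed on later in the discussion.

```python
from typing import Dict, Iterable, List, Union

# Field names as printed by exiftool in the output above.
IPTC_FIELDS = ("City", "Province-State", "Keywords")

IptcValue = Union[str, List[str]]


def candidate_categories(iptc: Dict[str, IptcValue]) -> List[str]:
    """Collect category candidates from the IPTC fields, keeping first-seen order."""
    candidates: List[str] = []
    for field in IPTC_FIELDS:
        value = iptc.get(field)
        if value is None:
            continue
        # Keywords may hold a single string or a list of strings.
        values = value if isinstance(value, list) else [value]
        for raw in values:
            name = raw.strip()
            if name and name not in candidates:
                candidates.append(name)
    return candidates


def category_wikitext(iptc: Dict[str, IptcValue], existing: Iterable[str]) -> str:
    """Keep only candidates that already exist as categories (hypothetical
    `existing` set from the caller) and tag the file for human review."""
    existing_set = set(existing)
    kept = [name for name in candidate_categories(iptc) if name in existing_set]
    lines = ["{{Check categories}}"] + [f"[[Category:{name}]]" for name in kept]
    return "\n".join(lines)
```

With the values from the first exiftool run and only "The Pentagon" in the existing set, category_wikitext returns {{Check categories}} followed by [[Category:The Pentagon]]; "Arlington, Va." and the keyword string are dropped because no such categories are in the set.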
- You could add only existing categories. It would be a good idea to add {{Check categories}} to each image. --EugeneZelenko (talk) 14:49, 11 August 2012 (UTC)
- Done. The bot now adds only existing categories, plus the template. See the last 10 trial uploads. --Slick (talk) 00:12, 12 August 2012 (UTC)
- As requested here, I now do only minimal categorization. --Slick (talk) 22:43, 18 August 2012 (UTC)
- Looks fine to me. I think it would be a good idea to involve some humans to help with image categorization. If these uploads are (or may become) part of some Wikipedia project(s), it would be a good idea to ask there for help. --EugeneZelenko (talk) 14:41, 12 August 2012 (UTC)
- Does this mean I can start the entire upload process now? What about the bot flag? Do I need one? --Slick (talk) 15:27, 12 August 2012 (UTC)
- I think it would be a good idea to wait for other suggestions. I don't think bot status is needed. In any case, it would be good to show the uploads in recent changes so that humans can help with categorization. --EugeneZelenko (talk) 14:49, 13 August 2012 (UTC)
- Now I am really confused. Why should I come here, request a bot flag and have a long discussion if I don't need a bot flag? In that case I see no reason to wait for feedback on an unneeded bot flag. Maybe this is a discussion about how to upload; if so, this is the wrong place and it should go to Commons:Batch_uploading. --Slick (talk) 15:01, 13 August 2012 (UTC)
- Will run further test uploads now, throttled to 1 upload/minute. --Slick (talk) 10:36, 14 August 2012 (UTC)
- As requested here, the bot now ensures that it does not upload a file that already exists on Commons. --Slick (talk) 18:23, 14 August 2012 (UTC)
- I would suggest shortening the file names. Example: the name of File:Defense.gov News Photo 100501-A-4830W-026 - U.S. Army 1st Lt. Nicholas Eidemiller of 1st Platoon Able Company 2nd Battalion Airborne 503rd Infantry Regiment 173 Airborne Brigade Combat.jpg is very long, as are all recent files by Slick-o-bot. Shortening the file names should not cause any loss of information; "File:Defense.gov News Photo 100501-A-4830W-026" is sufficient. In many cases the automatically created file name consists of the detailed military ranks of some personnel. I suggest using "File:Defense.gov News Photo [image ID]" for the remaining uploads. --High Contrast (talk) 16:57, 16 August 2012 (UTC)
- This is contrary to the wish of EugeneZelenko, who wants meaningful file names. I am pausing the uploads until there is a consensus. But please decide asap. --Slick (talk) 18:10, 16 August 2012 (UTC)
- My request is no different from Eugene's: I wish to have meaningful file names, but "Defense.gov News Photo 100501-A-4830W-026 - U.S. Army 1st Lt. Nicholas Eidemiller of 1st Platoon Able Company 2nd Battalion Airborne 503rd Infantry Regiment 173 Airborne Brigade Combat" is not really meaningful. Besides, I doubt that bots can create meaningful file titles from parts of the often very detailed image descriptions. As such, I suggest that shortened image names should be preferred. --High Contrast (talk) 21:22, 16 August 2012 (UTC)
- OK, now I understand. I agree with you, so it is better to use only short names. I'll just wait some hours for different opinions before resuming with short names. --Slick (talk) 21:30, 16 August 2012 (UTC)
- The bot will now resume with short names as suggested (throttled to 1 upload/minute). --Slick (talk) 18:22, 17 August 2012 (UTC)
- Thank you for your patience and for your contributions. Let the bot run. --High Contrast (talk) 21:35, 18 August 2012 (UTC)
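The two naming schemes discussed above can be compared in a short sketch. It is hypothetical: the function names and the .jpg fallback are assumptions, and the 200-byte limit is the approximate figure Slick mentioned. short_title produces the agreed "Defense.gov News Photo [image ID]" form; long_title shows the earlier description-based form, cutting at a byte limit without splitting a multi-byte UTF-8 character, replacing characters MediaWiki does not allow in titles, and stripping an extension already present in the text as one way to guard against the double extensions reported during the first test run.

```python
import os
import re

PREFIX = "Defense.gov News Photo"
MAX_TITLE_BYTES = 200  # the "~200 bytes" cut-off mentioned in the discussion
# Characters MediaWiki does not allow in page titles.
ILLEGAL_TITLE_CHARS = re.compile(r"[#<>\[\]|{}]")


def extension_of(source_name: str) -> str:
    """Lower-case extension of the source file; assume .jpg if there is none."""
    ext = os.path.splitext(source_name)[1].lower()
    return ext or ".jpg"


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    cut = text.encode("utf-8")[:max_bytes]
    # errors="ignore" drops a trailing partial multi-byte sequence.
    return cut.decode("utf-8", errors="ignore").rstrip()


def short_title(vrin: str, source_name: str) -> str:
    """Short scheme: 'Defense.gov News Photo [image ID]' plus the extension."""
    return f"{PREFIX} {vrin}{extension_of(source_name)}"


def long_title(vrin: str, description: str, source_name: str) -> str:
    """Earlier description-based scheme, kept for comparison."""
    ext = extension_of(source_name)
    desc = " ".join(ILLEGAL_TITLE_CHARS.sub(" ", description).split())
    # Drop an extension already present in the text; otherwise the result
    # would end in a double extension such as ".jpg.jpg".
    if desc.lower().endswith(ext):
        desc = desc[: -len(ext)].rstrip()
    base = f"{PREFIX} {vrin} - {desc}"
    # Reserve room for the extension inside the byte budget.
    return truncate_utf8(base, MAX_TITLE_BYTES - len(ext.encode("utf-8"))) + ext
```

For example, short_title("100501-A-4830W-026", "100501-A-4830W-026.JPG") gives "Defense.gov News Photo 100501-A-4830W-026.jpg", which carries the same ID as the long name at a fraction of its length.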
The duplicate detection is not working optimally right now. A recent issue: [1] and the "older" and featured duplicate [2]. --High Contrast (talk) 23:25, 18 August 2012 (UTC)
It shows the same thing, so it looks the same, but it isn't the same (for a non-human) ;)
$ md5sum "Defense.gov News Photo 090321-N-8273J-409.jpg"
0309e8ccfb3ec17660cc6c1eed9fada3  Defense.gov News Photo 090321-N-8273J-409.jpg
$ md5sum "USS_Annapolis_ICEX.jpg"
ea0da2ffe54c4de0c9f3e82b6ac9822c  USS_Annapolis_ICEX.jpg
(In this case the difference comes from different metadata.) To find these dupes it is necessary to recognize the content of an image. To recognize it, it is necessary to know it (i.e. as a "fingerprint"). To find the same image, it is necessary to know all images. So it's necessary to have "fingerprints" of all images on Commons to find these dupes. It's not impossible, but there are several problems that have to be resolved first. So currently we can only detect identical files (same checksum). --Slick (talk) 09:15, 19 August 2012 (UTC)
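The difference between a byte checksum and a content "fingerprint" can be illustrated with a minimal sketch. It uses average hashing (aHash), one common perceptual-hash technique, swapped in here for illustration; nothing in this discussion says Commons or the bot uses it. The image is assumed to be already decoded into a grid of grayscale values (decoding a JPEG would need an imaging library, not shown). Because the hash is computed from pixels only, metadata changes that alter the MD5 leave it unchanged.

```python
import hashlib
from typing import List

Grid = List[List[int]]  # rows of grayscale values 0-255


def file_checksum(data: bytes) -> str:
    """Byte-level checksum like md5sum above: any metadata change alters it."""
    return hashlib.md5(data).hexdigest()


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Average hash: shrink to size x size cells by block averaging, then emit
    one bit per cell (1 if the cell is brighter than the mean of all cells)."""
    height, width = len(pixels), len(pixels[0])
    if height < size or width < size:
        raise ValueError("image is smaller than the hash grid")
    cells: List[float] = []
    for by in range(size):
        y0, y1 = by * height // size, (by + 1) * height // size
        for bx in range(size):
            x0, x1 = bx * width // size, (bx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; small distances suggest the same picture."""
    return bin(a ^ b).count("1")
```

Identical pixel content gives a distance of 0; a small threshold (a tuning choice) would likely also catch recompressed copies. Finding such dupes across Commons would still require hashes for all existing files, which is the obstacle described above.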
- I see. I thought the same image resolution would also be a sufficient criterion to detect dupes with your tool. Nevertheless, the bot works smoothly and now we need help from others in order to categorize the newly uploaded files properly. Regards, High Contrast (talk) 11:17, 19 August 2012 (UTC)
- The resolution is only a sufficient criterion if the software has already detected the dupe. You cannot identify a dupe of an image by resolution when you don't know there is one. Think about it. (Unless it has the same checksum, which is detected by the Commons software.) --Slick (talk) 13:22, 19 August 2012 (UTC)
- I am unsure whether I can run the bot without a throttle now, while the bot has no bot flag. I dislike the current throttle. --Slick (talk) 13:25, 19 August 2012 (UTC)
- I changed the throttle to 3 uploads/minute. --Slick (talk) 10:35, 22 August 2012 (UTC)
- I have finished the upload job the bot was requested for: 14572 uploaded files. I am sad that I finished before the bot was officially approved. The bot and the bot flag are still requested for further jobs. --Slick (talk) 13:49, 25 August 2012 (UTC)
- I have started this upload job: ~23600 files. I extracted all VRINs I found on Commons and removed the (>3000) duplicates from this collection before uploading. Throttled to max. 4 uploads/minute. The bot flag is still requested. --Slick (talk) 19:21, 14 September 2012 (UTC)
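The pre-upload filtering and throttle described in the last comment can be sketched as follows. It is a hypothetical outline, not the bot's code: the VRIN regular expression is inferred only from the IDs quoted in this discussion, existing_titles is assumed to be a list of Commons file titles fetched elsewhere, and upload is a placeholder for whatever actually performs the upload (for example a wrapper around pywikipediabot's upload.py). The clock and sleep parameters are injectable so the throttle can be tested without waiting.

```python
import re
import time
from typing import Callable, Iterable, List, Optional, Set

# Pattern inferred only from the IDs quoted in this thread
# (120725-D-BW835-414, 100501-A-4830W-026, 090321-N-8273J-409):
# YYMMDD, a one-letter code, a five-character code and a three-digit number.
VRIN_RE = re.compile(r"(?<!\d)\d{6}-[A-Z]-[A-Z0-9]{5}-\d{3}(?!\d)")


def extract_vrins(titles: Iterable[str]) -> Set[str]:
    """Collect every VRIN-like ID that appears in the given file titles."""
    found: Set[str] = set()
    for title in titles:
        found.update(VRIN_RE.findall(title))
    return found


def pending_uploads(candidates: Iterable[str], existing_titles: Iterable[str]) -> List[str]:
    """Drop candidate VRINs already found on Commons (and repeats), keeping order."""
    skip = extract_vrins(existing_titles)
    pending: List[str] = []
    for vrin in candidates:
        if vrin not in skip:
            skip.add(vrin)
            pending.append(vrin)
    return pending


def run_throttled(items: Iterable[str], upload: Callable[[str], None],
                  per_minute: float = 4,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> None:
    """Call upload(item) for each item, starting at most per_minute uploads a minute."""
    interval = 60.0 / per_minute
    last_start: Optional[float] = None
    for item in items:
        if last_start is not None:
            wait = interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        upload(item)
```

At 4 uploads per minute, run_throttled starts an upload at most every 15 seconds; an upload that itself takes longer than that simply delays the next one without any extra sleep.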
Unless there are further objections, I suggest we approve this request. --99of9 (talk) 06:05, 12 October 2012 (UTC)
- Support--Sanandros (talk) 06:41, 12 October 2012 (UTC)
- Approved --99of9 (talk) 23:16, 22 October 2012 (UTC)