Commons:Bots/Requests/Red panda bot

Red panda bot (talk · contribs)

Operator: Shizhao (talk · contributions · Statistics · Recent activity · block log · User rights log · uploads · Global account information)

Bot's tasks for which permission is being sought:

Upload freely licensed photos from Flickr Explore to Wikimedia Commons. Flickr Explore is Flickr's way of showcasing the most interesting photos. The Flickr Interestingness algorithm chooses about 500 images to showcase for each 24-hour period. [1][2] Many of these photos are valuable and of good quality.

Tag files with Category:Photos in Flickr Explore.

Test upload: Special:Contributions/Red_panda_bot

Automatic or manually assisted:

Automatic semiautomatic and supervised.

Edit type (e.g. Continuous, daily, one time run):

  • Flickr Explore archive (2004-2020-03): Continuous, until the end.
  • 2020-03 ~ future: daily

Maximum edit rate (e.g. edits per minute):

  • Flickr Explore archive (2004-2020-03): 1-5/min
  • 2020-03 ~ future: about 0~50/day

Bot flag requested: (Y/N): No bot flag needed

Programming language(s): Python (pywikibot), using a fork of flickrripper.py. For the source code, see User:Red panda bot/source. shizhao (talk) 02:18, 2 March 2020 (UTC)

Discussion

The categories come from flinfo, which simply matches Flickr tags to existing categories. This may not be an intelligent algorithm, so I add a {{Check categories}} template. The file description page already has a Flickr tags field. Flickr does not provide any language tags for the text, and even with a language detection algorithm it is difficult to ensure that the detected language is always correct.--shizhao (talk) 16:53, 2 March 2020 (UTC)
Is this still a test, or is the bot already running? --Krd 18:35, 4 March 2020 (UTC)
This is still a test --shizhao (talk) 14:58, 5 March 2020 (UTC)
I suggest excluding images that are tagged 'manhammer' because they are trash from private parties or low quality regarding what is depicted. --Achim (talk) 16:52, 5 March 2020 (UTC)
In addition, one could consider a minimum size, because images like this one (max size 300x400 px) are quite poor. --Achim (talk) 17:05, 5 March 2020 (UTC)
These are mostly images from when Flickr had just launched in 2004. Compare 2004-02-07 vs. 2005-02-07 vs. 2006-02-07. There are only a small number of them--shizhao (talk) 17:52, 5 March 2020 (UTC)
@Achim55: Fixed in [3]; the bot now only uploads photos > 200k--shizhao (talk) 12:46, 6 March 2020 (UTC)
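A minimal sketch of the two filters discussed above (excluding a tag and enforcing a minimum size) might look like the following. The `Photo` type and the names `EXCLUDED_TAGS`, `MIN_PIXELS`, and `is_acceptable` are hypothetical, and reading "200k" as a pixel count is an assumption; the bot's actual check lives in its source code and may differ.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical values for illustration only. The thread mentions the
# 'manhammer' tag and a "> 200k" threshold without stating the unit;
# a pixel count is assumed here.
EXCLUDED_TAGS = {"manhammer"}
MIN_PIXELS = 200_000


@dataclass
class Photo:
    """Minimal stand-in for the metadata of one Flickr photo."""
    photo_id: str
    width: int
    height: int
    tags: Set[str] = field(default_factory=set)


def is_acceptable(photo: Photo) -> bool:
    """Return True if the photo passes both the size and the tag filter."""
    # Reject images that are not larger than the assumed threshold.
    if photo.width * photo.height <= MIN_PIXELS:
        return False
    # Reject images carrying any excluded tag (case-insensitive).
    lowered = {tag.lower() for tag in photo.tags}
    return not (EXCLUDED_TAGS & lowered)
```

Under these assumptions, a 300x400 px image such as the one mentioned above (120,000 pixels) would be skipped.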
@Krd, Achim55, and EugeneZelenko: How is it now?--shizhao (talk) 08:52, 11 March 2020 (UTC)
I see that some images got deleted. Are there any licensing problems? --Krd 09:00, 11 March 2020 (UTC)
@Krd: The files I deleted were caused by bugs found during previous tests (only the file description page was created, no image was uploaded). Fixed. The files deleted by others were tagged as {{Flickr-public domain mark}} (the license needs rechecking, see User:Red panda bot/license10; for the allowed licenses, see Manual:Pywikibot/flickrripper.py)--shizhao (talk) 14:09, 11 March 2020 (UTC)
Can I run it?--shizhao (talk) 07:59, 17 March 2020 (UTC)
I'd appreciate hearing more opinions. IMO there are quite a lot of nonsense files among the uploads. --Krd 07:38, 20 March 2020 (UTC)
Just to be clear, this is not the fault of the requestor nor in any way against him, but just questioning the idea of uploading the whole stream without any manual review. --Krd 16:27, 20 March 2020 (UTC)
@Krd: All images will be tagged with {{subst:unc}} or {{subst:chc}} for manual review. All images that may have licensing issues will be listed on User:Red panda bot/check license for manual review. I will also always review these images. Nonsense files are mainly concentrated in the 1-2 months after Flickr launched (2004-02 to 2004-03). Most nonsense files were already uploaded during the test (uploads up to 2004-03-12). Such files will no longer be uploaded in the future. During the test, I did not expect that Flickr Explore would go back as far as the launch of Flickr's internal beta.--shizhao (talk) 07:55, 23 March 2020 (UTC)
Does anyone have any other questions or comments?--shizhao (talk) 03:25, 1 April 2020 (UTC)
I would suggest not using Flickr tags at all. See File:Today's Cat@2019-05-02 - Flickr - masatsu.jpg as an example. Is it possible to filter out meaningless descriptions like the one in File:MG 5328-5669-5670-5671-5672-5673 - Flickr - rickmassey1.jpg? --EugeneZelenko (talk) 14:37, 2 April 2020 (UTC)
@EugeneZelenko: The descriptions come from flinfo. Flickr tags can provide a lot of information about the file, especially if the filename and description are meaningless, and they can also help with categorizing files. Are Flickr tags really not needed at all? --shizhao (talk) 07:52, 3 April 2020 (UTC)
I think all such uploads require human review. Unless a project is organized around the bot's activity, I don't see the point of mass uploads. --EugeneZelenko (talk) 14:31, 3 April 2020 (UTC)
I have to agree. We shouldn't start to upload hundreds of files which all require manual review before there is a plan for how and by whom this will be done. Just creating a new backlog of unusable files is not within project scope. --Krd 17:31, 3 April 2020 (UTC)
Flickr free accounts are limited to 1,000 photos and videos[4]. Many valuable freely licensed photos will never be found again, and this includes photos from Flickr Explore. So far, 4 of the photos are used on Wikimedia sites.[5]--shizhao (talk) 12:01, 7 April 2020 (UTC)
From the last upload batch of 11 files I have nominated 2 as possible copyright violations (FOP issues). --Krd 14:57, 7 April 2020 (UTC)
If I only upload new Flickr Explore photos and do not upload archived photos, is that okay? There are about 0-50 freely licensed Flickr Explore photos per day, and most of the time only about 10 a day. I can switch to semi-automatic--shizhao (talk) 02:07, 15 April 2020 (UTC)
Please disable automatic categorization. See File:Valverde de la Vera - Flickr - santiagolopezpastor (1).jpg as another example. --EugeneZelenko (talk) 13:39, 15 April 2020 (UTC)
I added {{Check categories}} and a blacklist in the script[6]--shizhao (talk) 07:47, 20 April 2020 (UTC)
Who will actually check those categories? --EugeneZelenko (talk) 14:16, 20 April 2020 (UTC)
Everyone, including me. This is the meaning of a wiki--shizhao (talk) 03:27, 27 April 2020 (UTC)
Please take a look at Category:Media needing category review by date to see the prospects of this approach. --EugeneZelenko (talk) 14:08, 27 April 2020 (UTC)

I would like to summarize that we have a reasonable amount of copyright issues with the files intended to be uploaded, as well as scope issues and the requirement of manual review of files and category cleanup, which is all unresolved. Is that correct? --Krd 03:51, 10 May 2020 (UTC)

--shizhao (talk) 02:11, 18 May 2020 (UTC)

Additional opinions? @Schlurcher: maybe? --Krd 10:04, 27 May 2020 (UTC)
Let me try to give additional thoughts on the issues raised by Krd. I have looked at all the uploads by the bot to date and have made about 20 individual deletion requests for out-of-scope pictures and 350 more as part of a mass deletion request. The core problem is that the Flickr archive is not a good source for pictures in scope at Commons. Please look at a random date in the archive [7]: the majority of pictures are out of scope for Commons. So please strike Flickr Explore archive (2004-2020-03): 1-5/min from the request. Over time the quality of the pictures got better (as Flickr has more pictures in general), and thus there are more suitable pictures for Commons in the newer and future dates. However, the general scope issue raised by Krd remains. In my opinion the core of this issue is the different scope of Flickr Explore. As I see it, most pictures in this feed are suitable for a desktop screen background but cannot necessarily ever be used in an educational context. Some examples from today's stream: [8] [9]. Some users have advocated accepting a certain degree of scope issues as long as the overall content is helpful. On Commons, pictures are only ever found again if they are included in appropriate categories. Unfortunately, and as highlighted before, this is also not the case. Copyright issues are not specific to this request but apply generally to Flickr transfers. We have built some general safeguards with regard to this. In summary, I share Krd's concerns regarding scope issues, and the pictures that are in scope will require manual review to ever be suitable. Thus the benefit of transferring them by bot is still questionable. Looking forward to your thoughts. --Schlurcher (talk) 13:08, 30 May 2020 (UTC)

 Question: Would you be willing to make this a fully supervised task (with tool support) where you take full responsibility for categorization and for assessing whether the images are in scope? --Schlurcher (talk) 08:00, 31 May 2020 (UTC)

I very much agree with you. I can be responsible for categorization and for assessing whether the images are in scope. But what is a fully supervised task? (No bot flag?)--shizhao (talk) 02:17, 5 June 2020 (UTC)
Fully supervised would mean that you assess the pictures for being in scope or not prior to uploading them here. This could be done by manually generating a list of files for the day to be uploaded and then letting the bot upload them, or by confirming each upload manually before the bot performs the action. We rarely give bot flags to upload bots, as we want new uploads to be checked and noticed by users. --Schlurcher (talk) 10:38, 6 June 2020 (UTC)
ok. I will rewrite the code--shizhao (talk) 01:19, 11 June 2020 (UTC)
@Schlurcher and Krd: I have rewritten the scripts, see source code. The new process is as follows:
  1. Automatically pick out the license-compliant photos from Flickr Explore (for a specified date or date range)
  2. Manually select the photos that meet the Commons scope on a webpage
  3. Automatically upload the photos selected in step 2 to Commons
  4. Manually check the photos uploaded to Commons in step 3
See test results--shizhao (talk) 06:25, 12 June 2020 (UTC)
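The four-step process above can be sketched roughly as follows. This is an illustration and not the bot's actual code: all function names are hypothetical, the photos are plain dictionaries, and the set of accepted license IDs is an assumption based on Flickr's numeric license IDs (4 = CC BY, 5 = CC BY-SA, 9 = CC0). The list actually used is the one referenced in Manual:Pywikibot/flickrripper.py.

```python
from typing import Callable, Dict, Iterable, List

Photo = Dict[str, object]

# Assumed set of acceptable Flickr license IDs (illustrative only).
ALLOWED_LICENSE_IDS = {4, 5, 9}


def select_license_compliant(photos: Iterable[Photo]) -> List[Photo]:
    """Step 1 (automatic): keep only photos with an accepted license."""
    return [p for p in photos if p["license"] in ALLOWED_LICENSE_IDS]


def select_in_scope(candidates: Iterable[Photo],
                    approve: Callable[[Photo], bool]) -> List[Photo]:
    """Step 2 (manual): 'approve' stands in for the operator's decision."""
    return [p for p in candidates if approve(p)]


def upload_selected(selected: Iterable[Photo],
                    upload: Callable[[Photo], None]) -> List[str]:
    """Step 3 (automatic): upload each approved photo and record its ID."""
    uploaded_ids = []
    for photo in selected:
        upload(photo)
        uploaded_ids.append(str(photo["id"]))
    return uploaded_ids


def run_pipeline(photos: Iterable[Photo],
                 approve: Callable[[Photo], bool],
                 upload: Callable[[Photo], None]) -> List[str]:
    """Run steps 1-3; the returned IDs are the input to manual step 4."""
    candidates = select_license_compliant(photos)
    selected = select_in_scope(candidates, approve)
    return upload_selected(selected, upload)
```

Keeping the manual decision behind a callback means nothing is uploaded unless the operator has approved it, which matches the "fully supervised" requirement discussed above.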
I think uploads look much better now. I think this can be approved, but should run without bot flag as there is manual interference. Thoughts? --Krd 10:29, 14 June 2020 (UTC)
One more condition: bot should not add categories based on Flickr tags. --EugeneZelenko (talk) 13:47, 14 June 2020 (UTC)
{{Check categories}} used--shizhao (talk) 00:02, 15 June 2020 (UTC)
@Krd: yes, no bot flag--shizhao (talk) 06:33, 15 June 2020 (UTC)
Please see my comment above on the usefulness of {{Check categories}}. Sorry, I object to approval. --EugeneZelenko (talk) 14:02, 15 June 2020 (UTC)
@EugeneZelenko: Removed all categories based on Flickr tags, see test results--shizhao (talk) 06:27, 16 June 2020 (UTC)
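One possible way to drop tag-derived categories from a generated description page is sketched below. It is only an illustration: `KEEP_CATEGORIES`, `CATEGORY_RE`, and `drop_tag_categories` are hypothetical names, and keeping the tracking category Category:Photos in Flickr Explore is an assumption drawn from the task description, not a confirmed detail of the bot.

```python
import re

# Categories that are kept even after stripping (assumed; lower-cased).
KEEP_CATEGORIES = {"photos in flickr explore"}

# Matches [[Category:Name]] or [[Category:Name|sortkey]] plus a trailing newline.
CATEGORY_RE = re.compile(
    r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]\n?",
    re.IGNORECASE,
)


def drop_tag_categories(wikitext: str) -> str:
    """Remove every category link except those listed in KEEP_CATEGORIES."""
    def replace(match: "re.Match[str]") -> str:
        name = match.group(1).replace("_", " ").lower()
        # Keep whitelisted categories unchanged, delete all others.
        return match.group(0) if name in KEEP_CATEGORIES else ""

    return CATEGORY_RE.sub(replace, wikitext)
```

The remaining categorization is then left to manual work by the operator, as requested in the discussion.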
Looks OK to me. Please don't forget to take care of categorizing the uploaded files. --EugeneZelenko (talk) 13:51, 16 June 2020 (UTC)
My comments have also all been addressed. Can be approved from my point of view. --Schlurcher (talk) 20:05, 16 June 2020 (UTC)

Approved. --Krd 17:19, 23 June 2020 (UTC)