Commons:Requests and votes/FlickreviewR


FlickreviewR

Comments

Currently a lot of Flickr images are uploaded to the Commons. As Flickr allows the uploaders to change their licensing at any time, images must be confirmed to have a copyright status acceptable on Commons. FlickreviewR is a bot that checks the image status at Flickr, and confirms whether the license on Flickr is one acceptable on the Commons.
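As a rough illustration of the license check described above, here is a minimal sketch. The license short codes and the function name `review_license` are assumptions made for this example and are not values returned by the Flickr API. The result strings reuse tag names that appear later in this discussion, but their exact meaning in the bot is assumed here.

```python
# Hypothetical short codes for illustration only; not Flickr API values.
ACCEPTABLE = frozenset({"cc-by", "cc-by-sa"})
UNACCEPTABLE = frozenset({
    "all-rights-reserved",
    "cc-by-nc",
    "cc-by-nd",
    "cc-by-nc-sa",
    "cc-by-nc-nd",
})


def review_license(flickr_license: str) -> str:
    """Classify a license short code as PASSED, FAILED or UNKNOWN_LICENSE."""
    code = flickr_license.strip().lower()
    if code in ACCEPTABLE:
        return "PASSED"
    if code in UNACCEPTABLE:
        return "FAILED"
    # Anything the sketch does not recognise is left for a human to decide.
    return "UNKNOWN_LICENSE"
```

Unrecognised codes are deliberately not treated as failures, so that a human can look at them.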

Some examples: [1], [2], [3], [4], [5]

The second real test run has been completed: User:FlickreviewR/log. You can see some errors in log 3, which have been fixed in log 4. -- Bryan (talk to me) 21:41, 13 November 2006 (UTC)

Comments

A small log can also be found here. Note that not all edits listed there were actually submitted to the server. -- Bryan (talk to me) 17:16, 11 November 2006 (UTC)

This seems to be working for the pass/fail cases, which is helpful. However, how does it deal with the following situations (all of which have been observed)?
  1. The Flickr image is different from the Commons image
  2. The Flickr image is now set to private and is not publicly visible
  3. The Flickr image has been deleted and is no longer available
The first of these is the most serious problem: slight changes are possible (rotation, reflection, cropping), but it is also possible that the Flickr "source" is not the same as the Commons image (like this [6]).--Nilfanion 18:48, 11 November 2006 (UTC)
The 2nd and 3rd are both easy: the bot posts a FLICKR_NOT_FOUND message to the log. That case does not yet have a tag. The 1st is more difficult; I have not seen that case before. What I can do is compare the dates and, if available, the EXIF data. If the upload date on Commons is earlier than the last-update date on Flickr, the image obviously needs human attention. -- Bryan (talk to me) 19:10, 11 November 2006 (UTC)
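The date comparison proposed in this reply could be sketched roughly as follows. The function and parameter names are hypothetical, and the sketch assumes both timestamps are already available as timezone-aware `datetime` values; how the bot actually obtains them is not stated here.

```python
from datetime import datetime, timezone


def needs_human_attention(commons_upload: datetime,
                          flickr_last_update: datetime) -> bool:
    """Return True when the Commons upload predates the Flickr last update.

    In that case the Flickr photo changed after the Commons copy was made,
    so the two files cannot be assumed to be the same.
    """
    return commons_upload < flickr_last_update


# Example: uploaded to Commons on 1 November, updated on Flickr on 5 November.
uploaded = datetime(2006, 11, 1, 12, 0, tzinfo=timezone.utc)
updated = datetime(2006, 11, 5, 9, 30, tzinfo=timezone.utc)
flagged = needs_human_attention(uploaded, updated)
```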

Just to let you know, each picture is assigned one of the following tags:

('PASSED', 'FAILED', 'PASSED_CHANGED', 'NOT_CC_TAGGED', 'UNKNOWN_LICENSE', 'NO_FLICKR_LINK', 'FLICKR_NOT_FOUND', 'FLICKR_NOT_MATCHING', 'NO_EXIF')

-- Bryan (talk to me) 21:17, 12 November 2006 (UTC)
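One way to make sure a picture always carries exactly one of the nine tags listed above is an enumeration. This is only a sketch: the class name `ReviewTag` and the helper `parse_tag` are hypothetical, and only the tag names themselves come from the list.

```python
from enum import Enum


class ReviewTag(Enum):
    """The nine review tags listed in the discussion."""
    PASSED = "PASSED"
    FAILED = "FAILED"
    PASSED_CHANGED = "PASSED_CHANGED"
    NOT_CC_TAGGED = "NOT_CC_TAGGED"
    UNKNOWN_LICENSE = "UNKNOWN_LICENSE"
    NO_FLICKR_LINK = "NO_FLICKR_LINK"
    FLICKR_NOT_FOUND = "FLICKR_NOT_FOUND"
    FLICKR_NOT_MATCHING = "FLICKR_NOT_MATCHING"
    NO_EXIF = "NO_EXIF"


def parse_tag(name: str) -> ReviewTag:
    """Look up a tag by name, rejecting anything outside the known set."""
    try:
        return ReviewTag[name]
    except KeyError:
        raise ValueError(f"unknown review tag: {name!r}") from None
```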


  • I wonder how image comparison works. Will cases with different resolutions be handled? The resolution could differ if a pro account became a free one, or vice versa (I saw one such case in the past). --EugeneZelenko 21:07, 11 November 2006 (UTC)
Currently the comparison is based solely on post dates: if the image was posted on Flickr later than here on the Commons, it is obviously not the same file. I realize, however, that this is not enough. Luckily, both Flickr and MediaWiki provide EXIF information, which allows a much richer comparison. If the image was cropped with a decent photo editor, for example, the EXIF tags are retained. If the EXIF information does not match or is absent, there is obviously a need for human attention, and the bot will skip those images. -- Bryan (talk to me) 12:40, 12 November 2006 (UTC)
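The EXIF comparison described here might look something like the sketch below. The choice of compared fields (`COMPARED_FIELDS`) and the function name `exif_matches` are assumptions made for illustration; the discussion does not say which EXIF fields the bot compares. EXIF data is modelled as a plain mapping from field name to value.

```python
from typing import Mapping, Optional

# Hypothetical selection of fields; the real bot may compare others.
COMPARED_FIELDS = ("Make", "Model", "DateTimeOriginal")


def exif_matches(commons_exif: Optional[Mapping[str, str]],
                 flickr_exif: Optional[Mapping[str, str]]) -> bool:
    """Return True only if both sides have EXIF and all compared fields agree.

    A False result means the EXIF is absent or different, which in this
    sketch signals that a human should look at the image.
    """
    if not commons_exif or not flickr_exif:
        return False
    for field in COMPARED_FIELDS:
        if field not in commons_exif or field not in flickr_exif:
            return False
        if commons_exif[field] != flickr_exif[field]:
            return False
    return True
```

Requiring every compared field to be present on both sides is a deliberately strict choice, so that doubtful cases fall through to human review.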
Thank you for the explanation. As my last doubt is gone, I Support bot status for such routine work. --EugeneZelenko 16:13, 12 November 2006 (UTC)
It actually uses edit summaries. I will start a new test run, probably tonight or tomorrow morning, and post the results here. When it is running for real, it will post a detailed log of its actions anyway. -- Bryan (talk to me) 12:43, 13 November 2006 (UTC)
Hrm, must have been something else that was missing edit summaries. Ok, I think I can Support this. Alphax (talk) 12:53, 13 November 2006 (UTC)
  • Support - this looks workable now. However, a few tweaks would integrate the bot into the human review process. First, it should check whether the image has already been reviewed and skip it if so (the test runs seem to have selected reviewed images). Second, it should categorise the images correctly (tweak its template?). There seem to be four outcomes: valid, invalid, unsourced and more info needed. The bot's tags for the first two cases should categorise the image appropriately, just as if a human had reviewed it. The 4th case should either leave the image in the review-needed category or perhaps place it in something like Category:Flickr images without EXIF. For the 3rd case it would probably be best to point to the possibly unfree category, pending some decision on what to do with these images.--Nilfanion 19:54, 13 November 2006 (UTC)
    • Currently working on a log, so that no duplicate scanning will be performed. For NO_EXIF and similar cases I suggest Category:Flickr images needing human review, which is somewhat broader than just missing EXIF. I suggest we also point unsourced images to that category.
I also have a question regarding templates. Should the templates be in userspace or in template space?

-- Bryan (talk to me) 20:10, 13 November 2006 (UTC)
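The categorisation suggested above could be expressed as a simple lookup. This is a sketch under assumptions: the category name is taken from the suggestion, but which tags count as needing human review (`NEEDS_HUMAN_REVIEW`) and the function name `category_for` are guesses made for illustration only.

```python
from typing import Optional

HUMAN_REVIEW_CATEGORY = "Category:Flickr images needing human review"

# Assumed grouping: NO_EXIF "and those like that", plus unsourced images.
NEEDS_HUMAN_REVIEW = frozenset({
    "NO_EXIF",
    "FLICKR_NOT_MATCHING",
    "NO_FLICKR_LINK",
})


def category_for(tag: str) -> Optional[str]:
    """Return the human-review category for a tag, or None if not applicable."""
    if tag in NEEDS_HUMAN_REVIEW:
        return HUMAN_REVIEW_CATEGORY
    return None
```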

I mean the templates the bot uses to tag the images. Shall they be kept in userspace or template space? About errors: yesterday I saw only one false error, caused by a non-standard Flickr link, but I will fix that. By the way, I found that most images that have no EXIF on Commons do actually have EXIF on Flickr. They are also mostly downscaled. *smells a new project when this is done*. -- Bryan (talk to me) 19:48, 16 November 2006 (UTC)
  • What about a Flickr image that first had a free license which the author later withdrew? The image would still be suitable for Commons, as license changes are not legally retroactive, but we would have no means of checking that since, AFAIK, Flickr doesn't keep a log of license changes. (This affects this bot but is a more general issue, and I don't know if it's been discussed before.) --Chewie 09:04, 16 November 2006 (UTC)
This issue has been discussed again and again... What this bot (and the existing human review system) provides is a way of confirming the license info for currently valid images. All currently non-free images are placed in a possibly unfree image category, where we can decide what to do about them at a future date.--Nilfanion 17:15, 16 November 2006 (UTC)
This is precisely why a bot is a good thing. The sooner we can review licenses and track/record that they were in a good state, the fewer licenses will have changed on us before we recorded the good license. Changed licenses cause loss of images. So the bot, if it can speed up the process for easy cases, will help out loads! ++Lar: t/c 17:30, 16 November 2006 (UTC)
That's what the EXIF check does. If the EXIF data is the same, the image is OK; if it differs, the image needs a closer look.--Nilfanion 13:55, 18 November 2006 (UTC)
Wouldn't comparing MD5 checksums make more sense? Lots of images don't have EXIF data. --Chowbok 01:24, 12 December 2006 (UTC)
I have copied this comment to the bot's talk page, User_talk:FlickreviewR, as this debate is closed. -- Bryan (talk to me) 11:45, 12 December 2006 (UTC)
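For reference, the MD5 comparison raised in the last question could be sketched as below, using Python's standard `hashlib` module. The helper names `md5_hex` and `same_file` are hypothetical, and nothing in this discussion indicates that the bot works this way. Note that a checksum only matches byte-identical files, so a downscaled, cropped or re-saved copy would not match its Flickr original.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


def same_file(commons_bytes: bytes, flickr_bytes: bytes) -> bool:
    """Return True only if the two files are byte-for-byte identical."""
    return md5_hex(commons_bytes) == md5_hex(flickr_bytes)
```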