Commons:Bots/Requests/Dexbot 5

From Wikimedia Commons, the free media repository

Dexbot (talk · contribs)

Operator: Ladsgroup (talk · contributions · Statistics · Recent activity · block log · User rights log · uploads · Global account information)

Bot's tasks for which permission is being sought: Adding {{Information}}

Automatic or manually assisted: Automatic

Edit type (e.g. Continuous, daily, one time run): one time run

Maximum edit rate (e.g. edits per minute): 60/min

Bot flag requested: (Y/N): N

Programming language(s): python

Amir (talk) 02:07, 25 December 2014 (UTC)[reply]

Discussion

It's another step in metadata cleanup (further discussions). The bot only edits a file when the description is a single line of more than 5 words. It detects the language of the description (that's why the description has to be longer than 5 words) and adds the proper {{Information}} template. I did ~50 test edits [1] that you can check.
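The eligibility rule described above (a single-line description of more than 5 words, then wrapped in a language template) could be sketched roughly as follows. This is an illustration only, not Dexbot's actual code: the function names `is_eligible` and `wrap_description` are hypothetical, and the `{{xx|1=...}}` wrapper form is assumed to be the usual Commons language-template syntax.

```python
def is_eligible(description: str, min_words: int = 5) -> bool:
    """Return True for a single-line description with more than min_words words."""
    # Ignore blank lines so trailing newlines do not disqualify a description.
    lines = [line for line in description.splitlines() if line.strip()]
    if len(lines) != 1:
        return False
    return len(lines[0].split()) > min_words


def wrap_description(description: str, lang_code: str) -> str:
    """Wrap the description in a language template such as {{en|1=...}} (assumed form)."""
    return "{{" + lang_code + "|1=" + description.strip() + "}}"
```

For example, `is_eligible("Old church")` is rejected because it has only two words, which is too little text for reliable language detection.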

I'd rather we don't make up explicit {{Own}} claims if the uploader made no such assertion. File:Denmark-Norway and possessions.png, for example, is clearly not the uploader's own work, but based on a pre-existing blank world map. LX (talk, contribs) 10:27, 25 December 2014 (UTC)[reply]
I've looked at the edits. Can you exempt certain templates? In this edit the bot added only the template to the description, while this template would be better placed outside the information template. Another issue was found with this edit, where the uploader claims another date; however, the EXIF date agrees with you, so it is a weird edge case you can't possibly watch out for (other one same uploader). Same story for this one (1 year and 1 day off). This edit has the same source issue as mentioned by LX, but I don't see how that's fixable. And finally the most important one: this edit. This file was imported from en-wiki and given a "self" template, however the uploader is not the author of the work! This is going to happen with quite some files and is a big error. Some fixes could be to exempt descriptions with a user name (linked either on this wiki or cross-wiki) which is not the same as the uploader, and to exempt descriptions which say transferred/en wiki etc. Mvg, Basvb (talk) 12:22, 25 December 2014 (UTC)[reply]
The source issue is fixable by not making stuff up and instead leaving the source field blank. That will put the files into the maintenance category Files with no machine-readable source, which is where such files belong until they're manually fixed (or deleted, where appropriate). LX (talk, contribs) 13:21, 25 December 2014 (UTC)[reply]

@LX and Basvb: For the first issue: I will skip descriptions that consist only of template(s). About the EXIF differences: I have no idea how to fix or skip them, and IMO corrupted EXIF is not our problem. About the {{Self}} template: I can skip if someone else is mentioned in the text, but for other cases I think it's not a problem that should be considered at this level. Honestly, I think it's better to mark them with {{Own}} because it makes this data machine-processable and it will be easier to find errors, i.e. using {{Self}} is an error, but when we mark them with {{Own}}, errors become easier to find, especially in the future when the new metadata system is being used. Amir (talk) 13:00, 26 December 2014 (UTC)[reply]
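The two skip rules mentioned above (descriptions consisting only of templates, and descriptions that mention someone other than the uploader) might look something like this. It is a minimal sketch under stated assumptions, not the bot's real implementation: the helper names and both regular expressions are hypothetical, and only `[[User:...]]` style links (optionally with interwiki prefixes such as `:en:`) are recognised.

```python
import re

# Matches an innermost template, i.e. {{...}} with no nested braces inside.
_INNER_TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")
# Matches links such as [[User:Name]] or [[:en:User:Name|label]] (assumed patterns).
_USER_LINK = re.compile(r"\[\[\s*:?(?:[a-z\-]+:)*user:([^|\]#/]+)", re.IGNORECASE)


def is_template_only(description: str) -> bool:
    """True if nothing but templates and whitespace remains (or the text is empty)."""
    text = description
    while True:
        # Remove innermost templates repeatedly so nested templates are handled.
        stripped = _INNER_TEMPLATE.sub("", text)
        if stripped == text:
            break
        text = stripped
    return text.strip() == ""


def _normalise(name: str) -> str:
    """Normalise a user name: underscores to spaces, first letter upper-cased."""
    name = name.replace("_", " ").strip()
    return name[:1].upper() + name[1:]


def mentions_other_user(description: str, uploader: str) -> bool:
    """True if the description links to a user page other than the uploader's."""
    uploader_name = _normalise(uploader)
    return any(_normalise(m) != uploader_name
               for m in _USER_LINK.findall(description))
```

Note that this simple check also treats a description such as `{{en|1=Some text}}` as template-only, so a real bot would need to decide whether language wrappers should be unwrapped first.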

I agree that a self license should hold some form of own work from a Wikimedian (could also be an imported file from another project). I forgot to say in my last reaction that I really like your language determination, that seems to work well. Mvg, Basvb (talk) 13:15, 26 December 2014 (UTC)[reply]

Amir, what are you using for language detection? --Dschwen (talk) 22:20, 5 January 2015 (UTC)[reply]

@Dschwen: Hey, I use langdetect and accept the result if the probability is more than 90% (usually it's either less than 70% or more than 99.999%). Amir (talk) 17:27, 6 January 2015 (UTC)[reply]
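The thresholding described above could be sketched as follows. This assumes the Python `langdetect` package and its `detect_langs` function, which returns candidates carrying `lang` and `prob` attributes; the helper names `pick_language` and `detect_description_language` are hypothetical, and the exact way Dexbot applies the 90% cut-off is not stated in this discussion.

```python
from typing import Iterable, Optional, Tuple


def pick_language(candidates: Iterable[Tuple[str, float]],
                  threshold: float = 0.9) -> Optional[str]:
    """Return the most probable language code if its probability exceeds threshold."""
    best = max(candidates, key=lambda c: c[1], default=None)
    if best is None or best[1] <= threshold:
        return None  # not confident enough: leave the file alone
    return best[0]


def detect_description_language(text: str, threshold: float = 0.9) -> Optional[str]:
    """Detect the language of text with langdetect (third-party, imported lazily)."""
    from langdetect import detect_langs
    from langdetect.lang_detect_exception import LangDetectException
    try:
        results = detect_langs(text)
    except LangDetectException:
        return None  # e.g. text without any usable features
    return pick_language(((r.lang, r.prob) for r in results), threshold)
```

Keeping the threshold logic in a pure function makes it easy to test without the detector itself. Because `langdetect` is non-deterministic by default, its documentation suggests setting `DetectorFactory.seed` when reproducible results are needed.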

If there are no further issues, I suggest that we close this request as successful. --99of9 (talk) 04:54, 5 February 2015 (UTC)[reply]

I stand by my objection against making up source and authorship claims when no such claims were made by the uploader, which the bot is still doing. (In the linked example, from looking at the user's other uploads, it's probably a correct assumption, but the bot seems to just be guessing blindly.) I don't see how finding errors is made any easier by having to dig through the file description's page history to see that the uploader never actually claimed what we now claim about these files. If you want to make errors obvious, make it clear that these are assumptions made by a nonthinking entity. ("Source: No machine readable source provided. Own work assumed based on copyright claims." / "Author: No machine readable author provided. User:Example assumed based on copyright claims.") LX (talk, contribs) 22:04, 5 February 2015 (UTC)[reply]
@LX: Help me understand the issue a bit more. The previous version of that file had {{Self}}, which states "I, the copyright holder of this work, hereby publish it..." Isn't that a strong statement of both authorship and source? --99of9 (talk) 23:08, 5 February 2015 (UTC)[reply]
No, it is not. There are several reasons why someone who is not the author may be the copyright holder: inheritance, works made for hire and copyright transfer come to mind. A person who takes a work for which the copyright has expired and makes copyrightable modifications to it would also be the sole copyright holder but not the sole author of the resulting work. Also, while it's demonstrably far from a guarantee that people won't wilfully misspeak, requiring users to assert that they are the author, that they are uploading their own work, and that they are the copyright holder has certain practical benefits. It's harder for copyright violators to feign ignorance and claim that they simply didn't understand that at least one of the three claims was false, and it gives the truly ignorant three different paths to enlightenment. As a consequence, it provides a little more security for reusers. I assume it's for these reasons that Commons:Licensing#License information states that the author/creator of each media file must be provided on every file description and that "A generic license template which implies that the uploader is the copyright holder (e.g. {{PD-self}}) is no substitution for this requirement." (Original emphasis as used in the policy.) LX (talk, contribs) 01:04, 6 February 2015 (UTC)[reply]
Ok thanks, I now understand which angle you are coming from. I agree this will cause trouble when owner!=author. I'm not sure I agree that requiring a further assertion of authorship is realistic now for long-uploaded files with long-gone uploaders. In those cases, either a bot cleans up based on (usually reasonable) assumptions, humans clean up one-by-one based on (better?) assumptions, we leave things messy forever, or we delete. I take it you favour one of the middle two options? --99of9 (talk) 01:28, 6 February 2015 (UTC)[reply]
I'm fine with the first option as long as the assumptions are clearly identified as such, ideally using templates for automatic categorisation and translation. That not only benefits human readers, but it also helps future, more specialised bot tasks, human cleanup efforts and (when appropriate) deletion processes. The statement that license templates do not constitute source or authorship information has been explicit in the licensing policy since June 2007, so while it may not be realistic to expect retired users to come back and improve their file descriptions, files (directly) uploaded to Commons after that date without explicit source and authorship information should be considered more problematic than older uploads. Not being able to automatically make such distinctions is, in my opinion, far messier than the current state of affairs. LX (talk, contribs) 07:06, 6 February 2015 (UTC)[reply]

@Ladsgroup: Could you please comment on the objections raised? Thank you. --Krd 17:15, 11 June 2015 (UTC)[reply]

It would be good if we had a note added by the bot stating "Own Work (based on license)" (like the one based on EXIF data, which can clearly be wrong lots of times). I can easily add it to the code if people agree on this. Amir (talk) 17:09, 15 June 2015 (UTC)[reply]
I don't think that wording would be at all clear to most reusers. I stand by the wording I originally proposed above, but as long as it's done via a template, the wording can be changed and translated, so that's the main thing. LX (talk, contribs) 19:19, 15 June 2015 (UTC)[reply]
@LX: Can you create the template? Thanks Amir (talk) 14:46, 19 June 2015 (UTC)[reply]
I've created {{Own assumed}} and {{Author assumed}}. They could use a few more translations. LX (talk, contribs) 23:01, 19 June 2015 (UTC)[reply]
I switched the first one to use Translate, in order to get more translations. Jean-Fred (talk) 12:26, 21 June 2015 (UTC)[reply]
Doesn't seem to have worked. Now it just renders as a redlink: Template:Own assumed/i18n/en. LX (talk, contribs) 17:26, 21 June 2015 (UTC)[reply]
...so I reverted it. Feel free to redo whatever you tried to do, but please make sure the result is something that's in a usable state. Cheers, LX (talk, contribs) 19:40, 23 June 2015 (UTC)[reply]
Sorry for the overlook (I had made sure it worked in my interface but did not check English) − I fixed it now (phab:T56579 keeps biting me). Jean-Fred (talk) 20:40, 23 June 2015 (UTC)[reply]

@LX: I made about 50 edits based on your template. Does it look good? [2] Amir (talk) 20:44, 30 June 2015 (UTC)[reply]

It's better, but the authorship information is still made-up, and that should be pointed out as well. Please use {{Author assumed}} as well. Thanks, LX (talk, contribs) 20:51, 30 June 2015 (UTC)[reply]
@LX: Made another 50 edits. [3] Is it good now? Amir (talk) 18:47, 1 July 2015 (UTC)[reply]
Hi Amir! Yes, based on the sample I looked at, I have no objections at this point. Thanks, LX (talk, contribs) 18:55, 1 July 2015 (UTC)[reply]
Looks fine to me too. --Steinsplitter (talk) 18:58, 1 July 2015 (UTC)[reply]
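For illustration, the outcome agreed on in this thread (an {{Information}} block whose source and author fields are marked as assumptions) might be generated along these lines. This is a sketch, not the bot's code: `build_information` is a hypothetical helper, and passing the uploader link as the first parameter of {{Author assumed}} is an assumption about how that template is called.

```python
def build_information(description: str, lang_code: str,
                      uploader: str, date: str = "") -> str:
    """Build an {{Information}} block that flags source and author as assumed."""
    author_link = "[[User:" + uploader + "|" + uploader + "]]"
    lines = [
        "{{Information",
        "|description={{" + lang_code + "|1=" + description.strip() + "}}",
        "|date=" + date,
        # The uploader never stated a source, so mark it as an assumption.
        "|source={{Own assumed}}",
        # Likewise for authorship (parameter form is assumed, see above).
        "|author={{Author assumed|" + author_link + "}}",
        "|permission=",
        "|other versions=",
        "}}",
    ]
    return "\n".join(lines)
```

Because the assumptions are expressed through templates rather than plain text, their wording can later be changed or translated in one place, which was the main point raised in the discussion.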

Again then, if there are no further objections, I suggest we close this request as successful. --99of9 (talk) 00:58, 2 July 2015 (UTC)[reply]

Approved. --Krd 16:31, 7 July 2015 (UTC)[reply]