Commons:Bots/Requests/Wyangbot


Wyangbot (talk · contribs)

Operator: Wyang (talk · contributions · Statistics · Recent activity · block log · User rights log · uploads · Global account information)

Bot's tasks for which permission is being sought: Batch uploading Ancient Chinese character images for use on Wiktionaries (PD-ancient).

Automatic or manually assisted: Mostly automatic (supervised).

Edit type (e.g. Continuous, daily, one time run): One-time run.

Maximum edit rate (e.g. edits per minute): 30/min.

Bot flag requested: (Y/N): Y.

Programming language(s): Python (Pywikibot)

Wyang (talk) 01:44, 8 July 2016 (UTC)
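For illustration, the stated ceiling of 30 edits per minute works out to at least 2 seconds between uploads. The following is a minimal sketch of such pacing in plain Python. It is not the bot's actual code: `min_interval`, `run_throttled`, and the `upload` callback are hypothetical names, and a real Pywikibot run would normally rely on the framework's own throttling instead.

```python
import time
from typing import Callable, Iterable


def min_interval(max_per_minute: int) -> float:
    """Smallest delay in seconds between edits that respects the rate cap."""
    if max_per_minute <= 0:
        raise ValueError("max_per_minute must be positive")
    return 60.0 / max_per_minute


def run_throttled(items: Iterable[str], upload: Callable[[str], None],
                  max_per_minute: int = 30,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Call upload(item) for each item, pausing between calls; return the count."""
    delay = min_interval(max_per_minute)
    count = 0
    for item in items:
        if count:      # no pause is needed before the first upload
            sleep(delay)
        upload(item)   # hypothetical callback standing in for the real upload
        count += 1
    return count
```

Because the pause is fixed and does not subtract the time each upload takes, the effective rate stays at or below the cap.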

Discussion

  • Please don't make large batch uploads before this discussion ends.
  • Please create a bot user page (see {{Bot}}).
  • Please wrap all English text in {{ACClicense}} in {{En}} (though this is not directly related to the bot).
EugeneZelenko (talk) 13:57, 8 July 2016 (UTC)
Please help make this as prompt as possible - pages on the English Wiktionary are waiting for these images to be uploaded to display their contents properly (wikt:Module:zh-glyph). Newly uploaded entries are made to conform to existing formats. This is a one-time upload so even a temporary bot flag would be sufficient. Wyang (talk) 22:25, 10 July 2016 (UTC)
Pages on the English Wiktionary are patient (supposing they have an inner life), and so are English Wiktionary users. If the new format is a problem, just revert to the previous one. Michelet-密是力 (talk) 06:47, 16 July 2016 (UTC)

Any updates? It has been more than one week now. Entries on the English Wiktionary are being compromised. Wyang (talk) 02:10, 16 July 2016 (UTC)

Discussion 2 - objections

Some problems and objections: these uploads are not acceptable, at least in this format:

  • Massive uploads from publicly accessible databases are a legal problem. Any single character is in the public domain by itself, given its age, but a database is protected as such, even when publicly accessible, and cannot be copied as a whole. As stated in Richard Sears's Agreement, the database owner has allowed extraction of characters one at a time, not massive uploads.
    Before doing something like that, it should be made clear either that Richard Sears agrees to it, or that there is no problem with such uploads (including diplomatic problems), whatever his objections may be.
  • (It seems I'm the only active ACC project contributor.) There is a problem with the ACClicense template, with respect to the ACC project conventions.
    1. Richard Sears has not given his agreement to such massive uploads. The mention «Courtesy & permission & Copyright from Richard Sears website © 2003 (see Richard Sears Agreement)» should not be used; here it would claim a permission that was not given.
    2. To be usable with other wikis' templates, within the ACC project, file names should be something like file:公-seal.svg, with the Chinese character in first position, not file:ACC-b01050.svg.
    3. If the files are to be used under the ACC project, they should instead be 300x300 px, and the character should be 5 px from both edges along its longer dimension (vertical or horizontal).
    4. Using the ACClicense template automatically assigns some categories, which indicate that more work is to be done on these characters, but they serve no purpose in this case. Category:ACC needing decomposition is applied to characters whose decomposition has not been given (with the component1=... parameters), but no decomposition will be available for bot uploads. Likewise, Category:-stroke ancient Chinese characters means that the "strokes=" parameter has not been filled in (and counting strokes is obviously impossible for a bot). Using ACClicense would clutter these categories with unwanted files.
  • With the upload format, there is no simple way to tell whether a picture for a given Chinese character is available.
  • Richard Sears has indeed done a terrific job with his site, and thanks should be given to him for that, but the information he gives should be proofread before being used:
    1. His classification is not accurate and can be misleading. Characters may have composition variants, and the picture given on a page often relates to a variant, that is to say another Unicode character (see for instance File:弈-bigseal.svg, found on the 奕 page instead of the 弈 one).
    2. There is never any need to upload multiple instances of a simple character (公 is an obvious example). Multiplicity is fine for Richard Sears' database, which aims to be exhaustive, but how does it fit within the scope of any Wikimedia project?
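
The sizing convention in point 3 of the ACC list above is simple arithmetic: on a 300x300 px canvas with a 5 px margin, the longer side of the glyph is scaled to 290 px. A minimal sketch, assuming the glyph's bounding box is already known and that the glyph should be centred along its shorter dimension (the centring is an assumption, not stated above); `fit_glyph` and `Placement` are hypothetical names, not part of any ACC tooling.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    scale: float     # uniform scale factor applied to the glyph
    offset_x: float  # left edge of the scaled glyph on the canvas, in px
    offset_y: float  # top edge of the scaled glyph on the canvas, in px


def fit_glyph(width: float, height: float,
              canvas: float = 300.0, margin: float = 5.0) -> Placement:
    """Scale a bounding box so its longer side sits `margin` px from both canvas edges."""
    if width <= 0 or height <= 0:
        raise ValueError("bounding box must have positive size")
    scale = (canvas - 2 * margin) / max(width, height)
    # Centre the glyph; along the longer side this offset equals the margin.
    offset_x = (canvas - width * scale) / 2
    offset_y = (canvas - height * scale) / 2
    return Placement(scale, offset_x, offset_y)
```

For a glyph twice as wide as it is tall, the horizontal offset equals the margin and the vertical offset absorbs the remaining space.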

Before allowing such uploads:

  • Do not use the ACC license.
  • Clarify the legal status of the massive database transfer.
  • Clarify the use of multiple instances with respect to Wikimedia project scope.
  • Create a license template that reflects the legal status of the upload and the purpose of the picture series.

Past uploads should at least be modified with respect to the license being used. Michelet-密是力 (talk) 06:47, 16 July 2016 (UTC)

That said, a bot upload of small seal characters could well be OK, since there is almost never more than one form to choose from; and in that case the ACC file format can easily be satisfied. Michelet-密是力 (talk) 07:05, 16 July 2016 (UTC)

Thanks for the various comments. I set up Template:ACC-PD-ancient as the license template for the uploaded entries, which differs from the ACC license template in the following ways:

  1. Removed the courtesy attribution to Sears' website. The ancient script images are public domain images, as their creators are long deceased and what we and Sears did was merely digitising the script forms, similar to the case of Template:PD-chem for chemical structures.
  2. Removed the requirements for component decomposition and stroke number. This information can be added automatically later by bots if so desired.

You are correct in saying that one image can be mapped to multiple characters and association of characters with images may not be correct. This is the reason the images are uploaded as their identifiers, rather than the tentative modern characters they represent. A separate system has been uploaded on the English Wiktionary to identify each of the images with characters (and vice versa). This is stored at wikt:Module:zh/data/glyph-data, and such method of storage allows easy rectification. Multiple correspondences of an image are faithfully preserved.
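
To illustrate the idea of naming files by identifier and keeping the character association in separate data, here is a minimal sketch. The sample data is invented for illustration and does not reflect the actual layout of wikt:Module:zh/data/glyph-data; the identifiers and the `invert` helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Invented sample data: image identifier -> characters it may represent.
image_to_chars: Dict[str, List[str]] = {
    "ACC-x00001": ["公"],
    "ACC-x00002": ["公"],
    "ACC-x00003": ["弈", "奕"],  # one image associated with two characters
}


def invert(mapping: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Build the reverse lookup: character -> identifiers of its images."""
    reverse: Dict[str, List[str]] = defaultdict(list)
    for image_id, chars in mapping.items():
        for char in chars:
            reverse[char].append(image_id)
    return dict(reverse)


char_to_images = invert(image_to_chars)
```

With this layout, correcting a wrong association means editing one entry in the mapping rather than renaming a file.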

The reason multiple images for a particular script for a character are uploaded is because all the script forms are different, some significantly so, and some subtly so. Displaying all possible variants of a character paves way for credible character shape origin/evolution explanations on Wiktionaries and allows for completeness. The script forms are hidden by default. Please see wikt:公 for an example.

Please let me know if there are any queries. Wyang (talk) 13:57, 16 July 2016 (UTC)

Discussion 3 - database protection & project scope

A/ With your creation of Template:ACC-PD-ancient, two of my four objections are answered. You could use that template on some of your uploads so that the result can be better assessed.

B/ Still two objections to go:

Clarify the legal status of the massive database transfer.

Remember that the characters being public domain is not the point: the database itself is protected as such. Indeed, Sears has “only” digitised the pictures, AND his work is protected as such. You may legally copy part of a database (i.e., some characters from time to time) but not all of it, as long as the database is protected. See w:Database Directive or here or anywhere there for information on legal database protection.

The comparison with Template:PD-chem is irrelevant: the files uploaded under it on Commons are not a massive upload from an existing database. Most of these files actually appear to be "own works".

Furthermore, there has been a consensus within Commons to recognize Sears's work, both because it is awesome work, and because he authorized Commons to upload pictures from his database. This is why his pictures are given priority, even when equivalent pictures may be available elsewhere (the Chinese Text Project pictures come from an "ancient Chinese character" font, which has most probably itself been taken from Sears's database).

A massive upload would contradict this recognition and may appear disrespectful. This is why I have a strong objection to doing such a thing. But if the Commons community changes its mind and thinks otherwise, I could of course be outvoted.

Clarify the use of multiple instances with respect of a Wikimedia project scope.

See Commons:Project scope: «The aim of Wikimedia Commons is to provide a media file repository [...] that acts as a common repository for the various projects of the Wikimedia Foundation». You can't upload things to Commons unless they fall within the scope of a Wikimedia project - see Commons:Deletion policy#Out of scope.

The general purpose of the Wiktionary is given in wikt:Wiktionary:Welcome, newcomers: “We aim to include not only the definition of a word, but also enough information to really understand it”. The «etymology» section for a Chinese character is described in wikt:en:Wiktionary:About Han script#Etymology; its purpose is «explaining the development of the character form».

Your explanation that «all the script forms are different, some significantly so, and some subtly so. Displaying all possible variants of a character paves way for credible character shape origin/evolution explanations on Wiktionaries and allows for completeness» fails to justify such inclusions: Sears's database collects all instances of a given character found in historical documents, and the Wiktionary project has no need for such completeness. If differences are subtle, they are not needed for an etymology explanation; and if differences are important, they will be a source of confusion for the reader (and most of the time correspond to a different character and should relate to a different etymology).

What is needed for an etymological section is not to be exhaustive, but to provide the best selection of characters that will explain the original form and its development over time. This is precisely what the ACC project is doing. Given the choice, the pictures for each kind of script are carefully selected according to their ability to explain the development of the character form, and the character decomposition is (normally) given to allow for comparisons within compound characters.

This is the system used on most Wiktionaries. This is the system that has been used so far on the English Wiktionary. That is, before you modified the "Han etyl" template on the 8th of July, without ever discussing whether it was appropriate to do so. «A separate system has been uploaded on the English Wiktionary» - indeed, but it is your own work, made single-handedly, without any discussion or consensus, and most probably outside the project scope.

Before arguing on your own for forcing your modifications onto the English Wiktionary, please start a discussion and reach a consensus about your changes on the relevant English Wiktionary discussion pages - your arguments and my objections may be transferred there as well.

Michelet-密是力 (talk) 09:35, 20 July 2016 (UTC)

Thanks for the input. With regard to the license, Commons:Ancient Chinese characters/Richard Sears Agreement states that his images are released under the GNU General Public License (GPL), which is a free license guaranteeing others the freedoms to run, study, share (copy), and modify the original material. The Commons page notes:
"Please note: the images themselves are in the public domain and therefore not subject to the terms of the GPL. However, the GPL does apply to the database schema and Sears's methods of organizing the images. Additionally, as Sears has gone to a great deal of trouble to convert these ancient images into GIFs and is releasing his work under a free license, we should comply with his requests as faithfully as we can."
In other words, the GPL license cannot be applied to the images themselves - it is only the organisation of the images which can be subject to GPL. On the English Wiktionary, pages have duly noted the source of the images providing links to Sears' page. The template Template:ACC-PD-ancient can be modified to a GPL license too if that is needed.
Using one image from each script stage for a discussion of the evolution of a character is simply grossly insufficient. There is so much variation even within a single stage of a character, which often provokes debate in the literature and makes it difficult to reach a consensus as to what the character actually represents. Please have a read of the English Wiktionary's passage for our previous example of wikt:公#Glyph origin to see what is meant by this. Other examples include wikt:兮#Glyph origin, wikt:典#Glyph origin, wikt:共#Glyph origin, wikt:具#Glyph origin... plus many more, although I haven't got around to expanding on the explanations of the graphical evolution since it is halted by this. You would easily understand what I mean if you flip through any decent Chinese books on character origins; it's amazing how the ancient people only left us with these characters without ever clearly explaining why they drew them as such. The design may have been "obvious" to the original inventors, but modern people simply have no idea, and the theories people put forth can be hugely divergent from one another.
In regard to the template change, I don't believe a discussion would be very fruitful or necessary in this case. The glyph origin sections have been improved and there is no opposition from the small Chinese-editing community on Wiktionary. There was no consensus for the initial format of the template "Han etym" on Wiktionary either. Wyang (talk) 13:10, 22 July 2016 (UTC)
You seem unwilling to understand the problems, and you do not answer the objections:
  • Databases ARE protected as such, and a MASSIVE upload would BE ILLEGAL. See the references above and answer that point, without discussing the fact that glyphs by themselves are public domain, which is NOT discussed here.
  • Your examples (, , , , ) clearly demonstrate that in fact there is little variation in those examples, and that the pictures are presented systematically, without any selection. You do not need all instances to discuss variation. Contrary to what you say, your template is not meant to select or discuss variations.
If indeed your interest were to discuss graphical etymological variations, this is what you would be doing on the en:Wiktionary, and you would quickly realize that the whole series from Sears's database is not needed for such discussions. See for comparison the French entries (, , , ) where the graphic meanings and possible variations are indeed discussed (for instance in ) without ever needing more pictures. The point is: when there is some variation, it can be discussed with a couple of pictures, and the whole picture series is not needed for that. Therefore you are not justified in uploading the whole series using this (stretched) "justification".
Michelet-密是力 (talk) 08:13, 23 July 2016 (UTC)

Veto

You are here to discuss whether Commons should authorize Wyangbot to proceed with massive uploads from Sears's database, which you started on Commons without discussion or consensus, because you want to use them on the English Wiktionary through the "Han etyl" template, which you modified without discussion or consensus on the English Wiktionary. On all projects it is bad practice to disrupt the project to prove your point. Your argument is obviously artificial and does not reflect the real use you intend on the English Wiktionary, and this goes way beyond the limits of "assume good faith". Given that you refuse to discuss the objections and put forward fallacious arguments to maintain your initial position, I change my Disagree to a Veto.

  • If you really want to prove that you are interested in discussing the "hugely divergent theories" on the Wiktionary page, just do it, discuss the (, , , , )-pages according to what you claim to be doing, and we'll discuss the result when you're done.
  • Before arguing on your own for forcing your modifications onto the English Wiktionary, and your uploads onto Commons, please start a discussion and reach a consensus about your changes on the relevant English Wiktionary discussion pages - your arguments and my objections may be transferred there as well.
  • If no consensus on your approach is seen on the Wiktionary, in a month or two a mass deletion will be requested on your previous Commons uploads, for lack of relevance.

Michelet-密是力 (talk) 08:13, 23 July 2016 (UTC)

Let's be positive, though: the bot and the bitmap->svg transformation are a nice hack (wow), and it's indeed a pity not to use them. As said previously, uploads limited to the small seal script characters would involve no choice between versions, and would be OK with respect to project scope, and OK legally and ethically, since these characters are found in various places such as Chinese Text Project or xiaoxue or internationalscientific or in the 北師大說文小篆 font. Michelet-密是力 (talk) 08:33, 23 July 2016 (UTC)
@Wyang: Please comment on the above suggestion. Thank you. --Krd 09:13, 23 July 2016 (UTC)

Wow. What can I say. Complete neglect of the GPL (license) status that the images are released under and the note that the images themselves are not even protectable under GPL on the agreement page. Obliviousness to the apparent variation that exists within a single script stage of Chinese characters (e.g. wikt:公#Glyph origin). Hostility towards new ideas and bullying of new users. Very, very disappointed. Wyang (talk) 21:41, 23 July 2016 (UTC)

I have to admit that even having read the whole request a few times I'm not sure how we should proceed here. @Wyang and Micheletb: Is there any progress in other discussions related to this topic, and/or could you please sum up the current status? Thank you. --Krd 18:06, 26 August 2016 (UTC)

Hello, Krd, basically the question amounted to "is it OK to upload the whole of Sears's database to Commons". My objections were threefold:

  1. (legal problem) Though copying a limited number of free images from a publicly available database is indeed legal, there are laws against a substantial upload. Therefore, @Wyang must demonstrate that the proposed upload is legal with respect to these restrictions.
  2. (ethical problem) Sears has explicitly agreed to our copying some of his pictures to Commons, but certainly not all of them in a single upload. The Commons community has agreed that he must be thanked for that, and that the uploads may be credited to him under that agreement, and this is spelled out in Template:ACC-PD-ancient. A massive upload goes beyond this authorization, and would need both a new template and a different community agreement. This problem could be formally solved by using a different template (though the ethical part would remain a problem, subject to community agreement).
  3. (main formal internal problem) The Commons:Project scope states that «The aim of Wikimedia Commons is to provide a media file repository [...] that acts as a common repository for the various projects of the Wikimedia Foundation». @Wyang has not demonstrated that this upload would be within Commons project scope. Without such internal justification, images are not allowed on Commons and can be speedily deleted (see Commons:Deletion policy#Out of scope).

As of now there has been no progress or answer; but there is no need to rush into "a mass deletion ... on [@Wyang: 's] previous Commons uploads, for lack of relevance".

Short version: no need to process this request before the end of September. Michelet-密是力 (talk) 19:02, 28 August 2016 (UTC)

As it's October now, is there any update? --Krd 18:04, 4 October 2016 (UTC)
I'm closing this as declined, as there is no consensus to run this task. Please feel free to reopen this later. --Krd 09:20, 12 October 2016 (UTC)