Commons:Bots/Requests/IngeniousBot

From Wikimedia Commons, the free media repository

IngeniousBot (talk · contribs)

Operator: Premeditated (talk · contributions · Statistics · Recent activity · block log · User rights log · uploads · Global account information)

Bot's tasks for which permission is being sought:

For importing Wikidata media legend (P2096) qualifier values on image (P18) statements into file captions on Commons, starting with images of humans (human (Q5)), then later logo image (P154) and every other property with an image (P18)/media legend (P2096) connection. The edits will follow the Commons:File captions example. There are no dedicated tools for Structured Data on Commons yet, so I would use the API.
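For illustration, a minimal sketch of how such a caption edit could go through the API: Commons file captions are labels on the file's MediaInfo entity (id "M" plus the page id), set with the Wikibase wbsetlabel action. The helper name and example values below are my own, not taken from this request.

```python
def caption_params(media_id, language, caption, token):
    """Build wbsetlabel request parameters for setting a file caption.

    File captions on Commons are labels on the file's MediaInfo entity
    (id "M" + page id), written with the Wikibase wbsetlabel action.
    """
    return {
        "action": "wbsetlabel",
        "id": media_id,        # e.g. "M12345" for the file page
        "language": language,  # e.g. "en", "hu"
        "value": caption,
        "token": token,        # CSRF token from the API
        "format": "json",
        "bot": 1,
    }

# Hypothetical example values for illustration only.
params = caption_params("M12345", "en", "Portrait of Fabio Carpi", "token-placeholder")
```

These parameters would then be POSTed to the Commons API endpoint with an authenticated session.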

Automatic or manually assisted: Automatic (supervised)

Edit type (e.g. Continuous, daily, one time run): Continuous until every possible media legend (P2096) is added.

Maximum edit rate (e.g. edits per minute): 50 edits, then a 1-minute break/sleep
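The stated rate limit could be implemented with a simple batch throttle; a sketch (the helper name and defaults are my assumptions, not part of the request):

```python
import time

def throttled(edits, batch_size=50, pause=60):
    """Yield edits one by one, sleeping for `pause` seconds
    after every `batch_size` edits have been yielded."""
    for i, edit in enumerate(edits, start=1):
        yield edit
        if i % batch_size == 0:
            time.sleep(pause)
```

Consuming the generator processes 50 edits, then sleeps for a minute, matching the requested rate.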

Bot flag requested: (Y/N): Y

Programming language(s): Python (pywikibot)

Premeditated (talk) 18:28, 10 August 2019 (UTC)

Discussion

By the way, the bot is not autoconfirmed yet; it needs that to make the test edits. - Premeditated (talk) 18:58, 10 August 2019 (UTC)

✓ Done --EugeneZelenko (talk) 13:24, 11 August 2019 (UTC)
Off-topic for this request, but maybe you would be interested in implementing Commons:Bots/Work requests#List of Wikidata items without image? --EugeneZelenko (talk) 13:29, 11 August 2019 (UTC)
Thanks EugeneZelenko, I have now done some test edits. Interesting request; I'll maybe look into that another time. - Premeditated (talk) 22:19, 11 August 2019 (UTC)
Looks OK to me, as long as the validity of the caption can be assumed. But it's hard to tell how easily cases like File:Fabio Carpi 02.jpg could be handled. --EugeneZelenko (talk) 14:06, 12 August 2019 (UTC)
Yeah, that was an unfortunate edit. The script relies on other contributors' edits, and it's difficult to quality-control all of them. I could restrict the text to a minimum length, but I'm also importing non-Latin languages, which makes that difficult. A light dictionary check could be included, but I'm not sure it would make a difference. - Premeditated (talk) 15:28, 12 August 2019 (UTC)

Is there a way to tell MediaWiki directly to use the information on Wikidata, without a bot back-importing it? Apart from this general aspect, could you please explain this edit [1]? The diff shows no difference. Also, could you please explain how the bot will handle situations where there is already a caption on Commons that differs from the one on Wikidata? Thanks --Schlurcher (talk) 09:45, 12 August 2019 (UTC)

Hi Schlurcher, thank you for the questions. Structured Data on Wikimedia Commons is Wikibase support in Wikimedia Commons. Wikibase is the same technology that powers Wikidata, but the two are separate databases, so to my understanding the data has to be imported. License-wise, both are CC0. The edit that you reference shows the hard part of this bot task: removing HTML and wiki markup while still keeping a short, multilingual description of the file. HTML markup is usually unwanted in captions, so I remove all of it. In this edit [2], HTML markup is used as an alternative to en:Template:Credit and should not be used for captions (in infobox usage). When there is no caption to add, the API should not try to add "nothing", so I will fix that. The script makes a request to determine which captions already exist and skips adding those. - Premeditated (talk) 10:18, 12 August 2019 (UTC)
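A rough sketch of the cleaning-and-skipping logic described here: strip HTML tags from the legend, and only add captions for languages that are non-empty after cleaning and not already present. The function names and the regex heuristic are mine, not the bot's actual code.

```python
import html
import re

def strip_html(text):
    """Drop HTML tags, unescape entities, and collapse whitespace."""
    text = re.sub(r"<[^>]+>", "", text)
    text = html.unescape(text)
    return re.sub(r"\s+", " ", text).strip()

def captions_to_add(legends, existing):
    """Keep only non-empty cleaned legends for languages
    that do not already have a caption on the file."""
    out = {}
    for lang, text in legends.items():
        cleaned = strip_html(text)
        if cleaned and lang not in existing:
            out[lang] = cleaned
    return out
```

This skips both empty results ("adding nothing") and languages with an existing caption, as described above.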
Thanks for confirming. Based on what I understand, this bot will perform a significant number of edits. Do we know how many media legend (P2096) statements exist?
In any case, we should make sure we capture some more of the odd cases. Could you please perform an extended test run of another 300 or so edits? Optimally 100 each from human (Q5), logo image (P154), and image (P18), as per your intended task. Please add (test edit) to the edit summaries, preferably with a link to this discussion, both to motivate people to contribute to it and to help them understand why the edits are not hidden from their watchlists. --Schlurcher (talk) 11:48, 13 August 2019 (UTC)
Thanks for the request. I have now run 50 edits each for images of humans (human (Q5)), logos (logo image (P154)), and non-human images (image (P18)). I ran the old script for the images of humans, so there could be some edits with nothing in them, but that is fixed in the new script, which I used for the others.
Result:
Summaries starting with "TEST EDITS (Q5);" are for images of humans. Gives 23392 pages. Finds some Hungarian media legend (P2096) values that consist only of "%YEAR%-ben", for example "2010-ben". Not very informative.
Summaries starting with "TEST EDITS (P154);" are logos. Gives 300 pages. Looks OK to me.
Summaries starting with "TEST EDITS (P18);" are for images (image (P18)) of non-humans. Gives 42983 pages. Looks OK to me. - Premeditated (talk) 23:34, 13 August 2019 (UTC)
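The uninformative Hungarian "%YEAR%-ben" legends mentioned above could be caught with a small heuristic filter; a sketch (the pattern and the length threshold are my guesses at a workable rule, not something the bot implements):

```python
import re

# Hypothetical heuristic: reject legends that are just a year,
# optionally followed by a short suffix such as Hungarian "-ben"/"-ban".
YEAR_ONLY = re.compile(r"^\d{3,4}(-\w{1,4})?\.?$")

def is_informative(legend, min_length=4):
    """Return False for legends that are too short or year-only."""
    legend = legend.strip()
    if len(legend) < min_length:
        return False
    return not YEAR_ONLY.match(legend)
```

A rule like this would skip "2010-ben" while keeping legends that merely contain a year.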
I would expect the Commons name + scientific name, and (if specified) sex, age, and action in descriptions of images like File:Catostomus occidentalis.jpg. It would probably be a good idea to postpone caption creation for media where a Commons data scheme could be easily formalized (flora/fauna, architecture). --EugeneZelenko (talk) 14:03, 14 August 2019 (UTC)
Sorry for the late response. I disagree. The caption is not the description; captions should be simple and short, as described at Commons:File captions, and most media legend (P2096) values are.
My point was that a caption should include the most relevant information, such as the species, not the circumstances in which the photo was taken. --EugeneZelenko (talk) 14:12, 17 August 2019 (UTC)

Hi, thanks for caring. I stumbled across this edit of your bot.

  1. The simple question is about wiki markup like italic (''italic'') and bold ('''bold'''). Due to the very nature of these markers (no tags), they survive your stripping of wiki markup and are kept in the added caption. Depending on where the caption is used, the italic and bold text will be shown. How to proceed? If the rule is no wiki markup, then I feel these markers have to be removed too. If a different rule applies, well, it is not described in your bot's description. Maybe this has to be clarified for captions in general.
  2. In general, how will the bot deal with updates of media legend (P2096) on the Wikidata side? Is the creation of captions a one-time shot, creating two copies of identical information that drift apart over time, or will your bot keep the information in sync as long as there is no human intervention on the other side? It is quite tedious to chase after bot edits as a human; in fact it is impossible, beyond our overall human resources. What will happen when a media legend (P2096) in an additional language is added, content gets deleted, or content is changed? Is there an invariant condition your bot will try to maintain?
  3. Is there any idea on how to avoid garbage in, garbage out (as with the Hungarian "2010-ben")?

regards --Herzi Pinki (talk) 09:13, 14 August 2019 (UTC)

Hello Herzi Pinki, sorry for the late response.
  1. This syntax didn't trigger the tag-based wikisyntax stripping, so I have now added it manually. I will fix the captions that were already added.
  2. This sounds like a feature the developers should consider building.
  3. At this point, no. I guess we just have to trust the users. Something is better than nothing. - Premeditated (talk) 18:47, 16 August 2019 (UTC)
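The fix for point 1 could look like this: bold and italic wiki markup are runs of apostrophes, not tags, so a tag-stripping regex misses them and they need a separate pass. A sketch (the function name is mine):

```python
import re

def strip_quote_markup(text):
    """Remove wiki bold ('''...''') and italic (''...'') markers.

    These are apostrophe runs rather than tags, so they survive
    HTML-tag stripping. Runs of two or more apostrophes are dropped;
    single apostrophes (e.g. in names like O'Brien) are kept.
    """
    return re.sub(r"'{2,}", "", text)
```

This also handles the combined bold-italic marker (five apostrophes) in one pass.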

Approved. --Krd 15:16, 13 September 2019 (UTC)[reply]