Commons:Bots/Requests/WLKBot


WLKBot (talk · contribs)

Operator: WLKBot (talk · contributions · Statistics · Recent activity · block log · User rights log · uploads · Global account information). User:Kim Bach is the operator and can be contacted through Commons mail; User:MSanderhoff can also be contacted.

Bot's tasks for which permission is being sought: Upload of public domain images of artworks, with related metadata formatted using Template:Artwork. The artworks are provided through partnerships with Danish GLAMs.

The first partnership is with SMK, the National Gallery of Denmark. A project page has been created for the purpose: Commons:SMK - Statens Museum for Kunst

For the SMK contribution, the images are downloaded and the metadata is mapped to the Artwork template, using the Statens Museum for Kunst (SMK) API (example: entry KMS1) and the wikilabskultur Artwork template preprocessor.
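A minimal sketch of the mapping step, for illustration only. The record keys (`artist`, `title`, `medium`, `object_number`) are hypothetical placeholders, not the actual SMK API field names, and this is not the code of the wikilabskultur preprocessor:

```python
def map_record_to_artwork_fields(record: dict) -> dict:
    """Map a simplified museum record to Artwork template parameters.

    The input keys are hypothetical; a real API response is likely
    structured differently and would need its own mapping.
    """
    medium = record.get("medium", "")
    fields = {
        "artist": record.get("artist", ""),
        "title": record.get("title", ""),
        # Danish free text is wrapped in a language template.
        "medium": "{{da|" + medium + "}}" if medium else "",
        "accession number": record.get("object_number", ""),
    }
    # Leave out parameters for which the record has no value.
    return {name: value for name, value in fields.items() if value}
```

Returning a plain dictionary keeps the mapping separate from the wikitext layout, so either can change without touching the other.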

Automatic or manually assisted: The bot is manually assisted and will most likely be running from a stand-alone computer

Edit type (e.g. Continuous, daily, one time run): One time run (several batches)

When: The bot operates in batches at specified intervals. Activity depends on the availability of new batches from GLAMs; it can be started on demand by the local operator on a stand-alone PC

Maximum edit rate (e.g. edits per minute): Most likely 10-50 edits per minute, but only in short bursts; it can be set to a much lower rate. The first proposed upload will consist of approx. 500 files of 10-30 MB each, and time to finish is not of the essence. The total SMK contribution is currently approx. 70,000 images, and the collection is expected to be quite static.
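As a rough illustration of capping the edit rate, here is a library-independent sketch. The `upload` callable and the parameter names are assumptions made for this example; pywikibot has its own throttle configuration, which this does not replace:

```python
import time


def upload_batch(items, upload, max_edits_per_minute=10, sleep=time.sleep):
    """Call upload(item) for each item, pausing between calls.

    `upload` is a hypothetical callable supplied by the caller; `sleep`
    is injectable so the pacing can be checked without waiting.
    """
    min_interval = 60.0 / max_edits_per_minute
    for index, item in enumerate(items):
        if index > 0:
            # Wait before every upload except the first one.
            sleep(min_interval)
        upload(item)
```

With `max_edits_per_minute=10`, the sketch waits 6 seconds between uploads; lowering the value slows the batch down accordingly.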

Bot flag requested: (Y/N): Y

Programming language(s): The bot is written in Python using the pywikibot library

The source code is available on GitHub


WLKBot (talk) 17:13, 16 December 2021 (UTC)[reply]

Discussion
First test run, a total of 25 images uploaded. --WLKBot (talk) 09:44, 20 December 2021 (UTC)/User:Kim Bach[reply]
Could the medium be internationalized with a template or Wikidata item? If not, please use a language tag. Same for object types such as Blyant. Could the bot add a "Paintings by" or more detailed category? Please also make batch categories hidden. --EugeneZelenko (talk) 16:04, 20 December 2021 (UTC)[reply]
This file should have a default sortkey (Defaultsort) "Købke, Christen, 1838" as we generally sort items by the creator's last name.
It should have a category of "Paintings by Christen Købke in Statens Museum for Kunst" with a local sortkey of "|1838" (after the category name). I did that manually.
It should have a category of "Paintings by Christen Købke" with a local sortkey of "|1838" (after the category name). Is this not redundant? No, because those paintings will eventually be divided up as landscapes, marine paintings, portraits and so on.
It might have a category of "19th-century paintings in the Statens Museum for Kunst". Note: Here it is the Statens Museum for Kunst. Don't ask.
It might have a category of "Landscape paintings in the Statens Museum for Kunst".
It might have a category of "1838 paintings from Denmark".
Happy programming. Cheers Rsteen (talk) 12:54, 21 December 2021 (UTC)[reply]
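The sort key and the first two categories from the example above could be generated along these lines. This is only a sketch: the function and parameter names are made up for illustration, and the category names follow the wording used in the comment above:

```python
def build_category_wikitext(given_names, family_name, year, object_plural="Paintings"):
    """Return a DEFAULTSORT line plus two year-sorted category links."""
    artist = given_names + " " + family_name
    lines = [
        # Default sort key: family name first, then given names and year.
        "{{DEFAULTSORT:" + family_name + ", " + given_names + ", " + str(year) + "}}",
        # Local sort key after the pipe sorts the file by year within the category.
        "[[Category:" + object_plural + " by " + artist
        + " in Statens Museum for Kunst|" + str(year) + "]]",
        "[[Category:" + object_plural + " by " + artist + "|" + str(year) + "]]",
    ]
    return "\n".join(lines)
```

For example, `build_category_wikitext("Christen", "Købke", 1838)` yields the sort key "Købke, Christen, 1838" and both "Paintings by Christen Købke" categories with the local sort key "|1838".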
Thank you for your comment and the detailed example; it's very helpful. I can definitely improve the categorisation along the lines you suggested. WLKBot (talk) 21:00, 22 December 2021 (UTC)[reply]
@Kim Bach: please don't use your bot account to make manual edits like the previous comment.
I enjoyed visiting the SMK and am looking forward to having all this content on Commons! Multichill (talk) 18:29, 1 January 2022 (UTC)[reply]
@Kim Bach and MSanderhoff:  ? --Krd 04:26, 5 February 2022 (UTC)[reply]
@Krd I assume that the "?" means that you'd like an update :-). We're busy implementing the changes suggested by the community, and we will not run another test until we're confident in the changes we're making, this should only be weeks away. Kim Bach (talk) 13:59, 6 February 2022 (UTC)[reply]

Please report current status and intentions. --Krd 15:12, 28 November 2022 (UTC)[reply]

@Krd
Current status is that we've addressed most of the suggestions by EugeneZelenko and Multichill and some of the suggestions by Rsteen.
This is a list of the changes we've made:
  • medium has been wrapped in language template and will be using Wikidata items in test and production runs
  • object type has been wrapped in language template and will be using Wikidata items in test and production runs
  • painting/... by categories added
  • batch categories made hidden
  • No longer adds Category:Images released under the CC0 1.0 Universal license by Statens Museum for Kunst
  • Wraps the medium field in {{Da}}, will move to Wikidata items in test and production runs
  • We'll be adding structured data using QuickStatements after we've uploaded the images, we will add the Wikidata-item to the Artwork template when we have created the Wikidata-items or identified already existing Wikidata-items
  • We've created two sections for the copyright and permissions
  • We're no longer using the Creator template; we've switched to using Wikidata items, and will not upload images that have no Wikidata item for the creator. The idea is to add the ones that are missing to Wikidata as we go along, using QuickStatements
  • We've added ...by... categories, but we think that most of the "might have" suggestions by Rsteen are better handled through structured data, which we'll be adding using QuickStatements
Below is an example of a file that was uploaded manually; its wikitext was generated by our updated code and then used to improve the code. We'll perform a few more of these before doing a test run, but we feel we're ready.
Giovanni Battista Piranesi, Det indre af Pantheon, 1768, KKSgb9860-86, Statens Museum for Kunst
The intention is to resume testing, first by doing some more manual uploads under the Kim Bach account, in preparation for new test runs.
Pending approval, we'll move slowly to production runs; the plan is to do that one creator at a time.

--Kim Bach (talk) 05:32, 30 November 2022 (UTC)[reply]

Please feel free to do a small test run. Krd 05:59, 30 November 2022 (UTC)[reply]
@Krd We've made a test run of 20 new media files. We're of course referencing a number of new categories, which we should probably create as well, maybe as we go along, maybe at a later point in time.
We're referencing a number of object types that haven't been created yet. It looks like this is done in Lua code. I have the Q-item numbers of all the object types we're using on hand, but we're also planning on adding structured data using QuickStatements, which we're currently testing.
Regards --Kim Bach (talk) 10:05, 3 December 2022 (UTC)[reply]
Since you (EugeneZelenko, Multichill and RSteen) commented on our first test run last year, I'd like you to know that we've completed our second test run, and that we've tried to address most of your suggestions. --Kim Bach (talk) 09:17, 6 December 2022 (UTC)[reply]
object type should be internationalized. Also excessive indentation for license tags and newlines between license tags and categories. --EugeneZelenko (talk) 16:10, 6 December 2022 (UTC)[reply]
Had a quick look, nice images! My points
  • I see a lot of redundant white space. Please trim a bit more.
  • I'm not a huge fan of extra information fields. What's the point of adding "SMK record created", "SMK record modified" and the (broken) iiif link?
  • You're getting a warning because you are using {{PD-old}}. I guess most works are covered by {{PD-old-100-expired}}?
  • In the upload edit summary I would put something like "uploaded artwork from https://open.smk.dk/en/artwork/image/KKSgb22228" instead of "created artwork"
Multichill (talk) 18:09, 6 December 2022 (UTC)[reply]
Thank you Multichill, I've incorporated your suggestions and I'll be omitting the "other fields" in the future, they were meant for internal bookkeeping (created and updated timestamps of metadata from SMK)
I've updated the wikitext for this image, trying to take your and Eugene's suggestions into account
Tobias Stimmer, Romerne indtager Satricum, 1574, KKSgb22345, Statens Museum for Kunst
Kim Bach (talk) 22:22, 6 December 2022 (UTC)[reply]
Thank you EugeneZelenko, I can address your comments in this way:
Internationalisation of object type
I suppose you mean that I should use this:
  • object type={{en|Woodcut print}}
Excessive white space
I can beautify the wikitext along these guidelines:
  1. No leading whitespace
  2. No spaces around equal signs
  3. No double line feeds
For instance:
{{Artwork
|artist=<Artist Name>
|title=<The Title>
|description=<The Description>
...
}}
Kim Bach (talk) 19:03, 6 December 2022 (UTC)[reply]
In my opinion, spaces after pipes make text more readable. --EugeneZelenko (talk) 15:39, 7 December 2022 (UTC)[reply]
Agree, a bit too much trimming; see Template:Artwork#Usage for a good example of how it should look. Multichill (talk) 18:18, 7 December 2022 (UTC)[reply]
Thanks, I'll go for that. It is at odds with the suggestion by Eugene (space after pipe), but I guess that's not too important.
Another possible issue is the new lines. They flush left, and it doesn't look good. Is there a way to add indentation that doesn't show up in the rendered wiki page? I suppose the List templates could be used for that purpose. Kim Bach (talk) 14:40, 8 December 2022 (UTC)[reply]
I've made changes that try to accommodate your standard usage suggestion (this is a "bit" confusing because the Usage sample and the Multilingual sample differ :-/) and the suggestion by EugeneZelenko to put spaces after pipes. I've updated the wikitext for the Tobias Stimmer image to reflect the suggested changes :-) Kim Bach (talk) 11:09, 9 December 2022 (UTC)[reply]
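One possible reading of the formatting feedback, sketched as a small rendering function: one parameter per line, a space after each pipe, aligned equals signs and no blank lines. The exact layout is an assumption; Template:Artwork#Usage remains the reference for how the wikitext should look:

```python
def render_artwork(fields: dict) -> str:
    """Render template parameters as Artwork wikitext, one per line."""
    # Pad parameter names so the equals signs line up.
    width = max((len(name) for name in fields), default=0)
    lines = ["{{Artwork"]
    for name, value in fields.items():
        lines.append("| " + name.ljust(width) + " = " + str(value))
    lines.append("}}")
    return "\n".join(lines)
```

The padding only affects the wikitext source; it does not change how the page renders.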
The title consists of two tags. Is it possible to merge the title into two sentences in one tag? Also, there is d:Q18219090, so it would be a good idea to use it instead of text. --EugeneZelenko (talk) 15:54, 9 December 2022 (UTC)[reply]
Multiple titles
The reason we have two titles for this particular item is that the museum DB can have several titles for an item.
The best thing to do might be to limit it to one, since one of the titles is considered official by the museum. We could use that and then, at a later point in time, add more titles to Wikidata. We could also change it to one line, as you suggested.
Using wikidata
Wikidata works fine for the medium tag wrapped in the Technique template. We can do that, since we've already mapped the Q-numbers.
Is it possible to use Q-numbers for the object type too?
We've tried it, with no luck in nailing the syntax. Kim Bach (talk) 22:24, 9 December 2022 (UTC)[reply]
Looks like Module:Artwork should be enhanced. --EugeneZelenko (talk) 15:17, 10 December 2022 (UTC)[reply]
Hi. Take a look at this upload File:Elisabeth Jerichau Baumann, En såret dansk kriger, 1865, KMS852, Statens Museum for Kunst.jpg. The dimensions are not ok. They are in mm instead of cm. Do not know if this is a general error, and have not seen any comments on it before. Cheers Rsteen (talk) 10:35, 20 December 2022 (UTC)[reply]
Thank you, yes, that was a general error in the code from last year, everything was a factor 10 off, as you noticed, we've fixed that since. I forgot to fix this manually in the batch from last year, so I'll do that. Kim Bach (talk) 15:57, 21 December 2022 (UTC)[reply]
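The factor-of-10 fix amounts to converting millimetres to centimetres before the dimensions are written out. A minimal sketch, assuming the source values are in millimetres and that the dimensions are emitted through the Size template (the function name is made up for illustration):

```python
def size_template_cm(height_mm: float, width_mm: float) -> str:
    """Convert millimetre dimensions to a Size template call in centimetres."""
    height_cm = height_mm / 10
    width_cm = width_mm / 10
    # The "g" format drops a trailing ".0", so 420 mm becomes "42", not "42.0".
    return (
        "{{Size|unit=cm|height=" + format(height_cm, "g")
        + "|width=" + format(width_cm, "g") + "}}"
    )
```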
An updated status. We're currently refactoring the category-generating code; this will most likely not be ready until January 2023. We still welcome comments on the last test run. Kim Bach (talk) 16:10, 21 December 2022 (UTC)[reply]
@Krd
We've refactored the code and are now ready to perform a new test. The main change is that we now try to create new category pages if they don't exist. Kim Bach (talk) 21:31, 11 January 2023 (UTC)[reply]
@Multichill and EugeneZelenko we're ready to perform a new test run :-) --Kim Bach (talk) 14:31, 13 January 2023 (UTC)[reply]
Just to clarify: was Module:Artwork improved meanwhile? --EugeneZelenko (talk) 15:59, 13 January 2023 (UTC)[reply]
No, it looks like the procedure involves requesting an edit, I wasn't aware of that, I've added a comment on the talk page. Kim Bach (talk) 20:31, 13 January 2023 (UTC)[reply]

Can anybody please summarize what is the current status and what exactly is missing? --Krd 06:41, 29 January 2023 (UTC)[reply]

Indeed, as you might have noticed from the discussion above, what is missing is a (correct) mapping of the "object type" field.
We've been in discussion with the maintainers of the Artwork template. Unfortunately, the field does not support Wikidata Q-items directly, so we'll have to add these to the mapping table; all are P31 items.
Once we've done that (a few hundred) we're ready for a new test run.
I'll update you on the progress later this week. Kim Bach (talk) 14:11, 31 January 2023 (UTC)[reply]
I've drafted a version of the object type data table, where I've added the object types I was missing, you can find it here User:Kim_Bach/sandbox/object_type_data. I've asked the maintainers of the Artwork template to review it.
Kim Bach (talk) 17:15, 4 February 2023 (UTC)[reply]
Multichill can you help me by reviewing my suggested changes to the mapping table as drafted here User:Kim_Bach/sandbox/object_type_data. —Kim Bach (talk) 22:21, 9 February 2023 (UTC)[reply]
@Krd:
I've opened discussions in two places, both titled "I need help with review and approval of suggested changes to Module:I18n/objects/data".
Kim Bach (talk) 08:02, 8 March 2023 (UTC)[reply]
Sadly I have no idea how to put this forward. Any additional ideas welcome. Krd 11:25, 6 April 2023 (UTC)[reply]
Thank you, I had some good suggestions from the community and I will work along those lines. I’ll post an update when the changes are made. Kim Bach (talk) 10:00, 7 April 2023 (UTC)[reply]
@Kim Bach: sorry, I completely lost track of this. I thought we had some minor points left and the bot was ready to go. I updated the module. Is that the last thing? Maybe do some test edits now as a final check? Multichill (talk) 16:39, 8 April 2023 (UTC)[reply]
Thank you, and that is understandable, it has been a long process, and I have been very cautious. I just had help from MGA73 and I'm compiling a new list, that isn't such a mess when trying to apply a diff. Kim Bach (talk) 12:44, 10 April 2023 (UTC)[reply]
@Kim Bach: . The module has been updated so hopefully it should include the things you suggested earlier. --MGA73 (talk) 13:04, 10 April 2023 (UTC)[reply]
Once again thank you for the great effort, I'll check and add items that might be missing, but we should be good to go! Kim Bach (talk) 13:18, 10 April 2023 (UTC)[reply]
I've checked the updated list, and all the additions I wanted are indeed present! Kim Bach (talk) 14:09, 10 April 2023 (UTC)[reply]
@Kim Bach: Great. Then you could perhaps upload some files for final test? --MGA73 (talk) 17:54, 11 April 2023 (UTC)[reply]
Thanks, I'll do that next week, I'm currently busy doing volunteer work for DepressionsForeningen :-) Kim Bach (talk) 11:35, 23 April 2023 (UTC)[reply]
I've just completed a test run with 20 images Kim Bach (talk) 20:53, 27 April 2023 (UTC)[reply]

Test of 2023-04-27 There was an issue with the "object type" property, I've corrected the error, and updated the new images, so they should reflect the changes in the code--Kim Bach (talk) 07:21, 28 April 2023 (UTC)[reply]

@Kim Bach: I see that you're using {{PD-old-auto-expired}}, but not setting deathyear. Either fill it or just use {{PD-old-100-expired}} directly to correct this. Multichill (talk) 15:12, 28 April 2023 (UTC)[reply]
I've updated the code, and I'm ready for a new test run Kim Bach (talk) 18:50, 29 April 2023 (UTC)[reply]
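The choice between the two tags mentioned above could be made as follows. This is a sketch only: it assumes the batch has already been vetted as public domain, and whether the fallback tag is appropriate for a given work is a copyright question the code does not decide:

```python
from typing import Optional


def license_tag(death_year: Optional[int]) -> str:
    """Pick a PD tag depending on whether the creator's death year is known."""
    if death_year is not None:
        # Fill the deathyear parameter so the template can work out the term.
        return "{{PD-old-auto-expired|deathyear=" + str(death_year) + "}}"
    # No death year available: use the fixed tag directly.
    return "{{PD-old-100-expired}}"
```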
If you assign categories automatically, you need to follow the general naming convention for categories more closely. Take File:Tobias Stimmer, Gaius Duillius sejrer i et søslag over kartaginienserne, 1574, KKSgb22331, Statens Museum for Kunst.jpg. It is categorized as Woodcut prints by Tobias Stimmer in the Statens Museum for Kunst. The normal category would be Woodcuts by Tobias Stimmer in the Statens Museum for Kunst. It is also categorized as Graphics by Tobias Stimmer in the Statens Museum for Kunst. That category is not necessary and represents overcategorization. The category Woodcuts by Tobias Stimmer in the Statens Museum for Kunst should lead to Woodcuts in the Statens Museum for Kunst, which again should lead to Prints in the Statens Museum for Kunst. Try and take a look at how it is done at the Rijksmuseum, which is more or less the gold standard. Take Category:Prints in the Rijksmuseum Amsterdam. Here all the different types of prints are gathered and you can go to subcategories by artist, by century and so on. We should attempt something similar. Cheers Rsteen (talk) 11:51, 1 May 2023 (UTC)[reply]
Thank you. @Rsteen: What we need to do is look much closer at the category mapping, as you suggested; it might not be that difficult. Do you understand JSON format? Kim Bach (talk) 04:01, 2 May 2023 (UTC)[reply]
Absolutely. And thank you for your work. There is a goldmine of artworks at Statens Museum for Kunst. Cheers Rsteen (talk) 05:05, 2 May 2023 (UTC)[reply]
So... the strategy would be to create categories similar to the ones created for the Rijksmuseum Amsterdam, and then create a mapping table between those categories and the data returned by the SMK API, taking into consideration that some existing categories do not follow the naming convention. I'll propose something.
I've created a navigation box based on Category:Rijksmuseum Amsterdam; the draft is located at User:Kim Bach/sandbox/Statens Museum for Kunst. It should be a template instead.
Statens Museum for Kunst uses a total of 128 unique categories, Rijksmuseum Amsterdam has 38 main categories in their navigation box. I will work on some kind of consolidation, like the 38 categories in the navigation box. I'm sure the art historians at SMK can be helpful in grouping the categories. Kim Bach (talk) 11:50, 3 May 2023 (UTC)[reply]
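Such a mapping table could look roughly like this. The single entry reflects the woodcut example discussed above; the dictionary layout and the names are assumptions for illustration, not the bot's actual data:

```python
MUSEUM = "the Statens Museum for Kunst"

# Hypothetical mapping: API object type -> (category noun, parent category noun).
OBJECT_TYPE_CATEGORIES = {
    "Woodcut print": ("Woodcuts", "Prints"),
}


def category_chain(object_type: str, artist: str) -> list:
    """Return category names from most specific to most general."""
    noun, parent = OBJECT_TYPE_CATEGORIES[object_type]
    return [
        noun + " by " + artist + " in " + MUSEUM,
        noun + " in " + MUSEUM,
        parent + " in " + MUSEUM,
    ]
```

Only the first name would be placed on the file itself; the remaining names are parent categories, which avoids the overcategorization pointed out earlier.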
I have attempted to create the category hierarchy needed for the woodcut prints, and File:Tobias Stimmer, Gaius Duillius sejrer i et søslag over kartaginienserne, 1574, KKSgb22331, Statens Museum for Kunst.jpg has been updated to use the new categories.
I'd like you to help with a review before I write the code to achieve this.
These are the categories I've created (phew - and this is just a start - dates, decades, centuries etc. still are missing):
Kim Bach (talk) 14:29, 5 May 2023 (UTC)[reply]
Thank you for your kind assistance, I can see that I'm creating a lot of work for you. I'll be more careful before updating in production in the future. Kim Bach (talk) 21:36, 5 May 2023 (UTC)[reply]

I've been working with User:Rsteen on the categorisation, and we've come to an agreement on how to proceed.

So I hereby request permission to perform a new test upload. Note that the new version of the bot will create artist and object related categories on the fly, if they're missing. --Kim Bach (talk) 14:21, 2 June 2023 (UTC)[reply]

Please do. Krd 14:43, 2 June 2023 (UTC)[reply]
A new test batch of 20 images has been run Kim Bach (talk) 04:47, 4 June 2023 (UTC)[reply]
What is the conclusion? Was the test successful? Krd 06:36, 9 June 2023 (UTC)[reply]
Kim Bach I noticed that on, for example, File:Tobias Stimmer, Tarquinius Superbus og Sextus' udsending i haven, 1574, KKSgb22317, Statens Museum for Kunst.jpg there are two descriptions marked as Danish. The language is, however, German. But as far as I can tell, the problem is that the API mislabels the German description as Danish. So I would say it was a success. Do you agree or do you see other issues? --MGA73 (talk) 08:03, 10 June 2023 (UTC)[reply]
I'll look into that issue.
The only thing I'm working on is adding DEFAULTSORT templates. Kim Bach (talk) 07:06, 18 June 2023 (UTC)[reply]
@MGA73: The German title being marked as Danish is caused by the data returned by the API.
@Rsteen: I've added the DEFAULTSORT template
@Krd: I'm ready for a new test run. Kim Bach (talk) 20:09, 9 July 2023 (UTC)[reply]
Please make a new test run. Krd 15:04, 16 July 2023 (UTC)[reply]
@Krd, @Rsteen, @Multichill, @EugeneZelenko:
I've finally made a test run with 20 items, the first being File:Ugo da Carpi, Pan og Apollon og Marsyas , , KKSgb22416, Statens Museum for Kunst.jpg. The file name shows two commas in a row because the date of the work is unknown; it might be better to write "no known date" or something like that. The script also created new categories on the fly, for instance Category:Prints by Ugo da Carpi in the Statens Museum for Kunst. Kim Bach (talk) 17:11, 14 August 2023 (UTC)[reply]
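A sketch of how the file name could be assembled so that a missing date does not leave empty commas. The naming pattern is inferred from the file names in this discussion; the function and its `date_placeholder` parameter are hypothetical:

```python
def build_file_title(artist, title, date, object_number, date_placeholder=""):
    """Join the file name parts, skipping any part that is empty."""
    parts = [
        artist.strip(),
        title.strip(),
        # Fall back to the placeholder (possibly empty) when the date is unknown.
        (date or date_placeholder).strip(),
        object_number.strip(),
        "Statens Museum for Kunst",
    ]
    return ", ".join(part for part in parts if part) + ".jpg"
```

Passing `date_placeholder="no known date"` gives the wording suggested above, while the default simply omits the date from the name.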
@Rsteen: I've noticed that you're changing "Collections of the Statens Museum for Kunst" to "in the Statens Museum for Kunst by artist". I guess I should update the code to reflect that. I must admit that I'm not good enough at Lua to understand this shorthand. Kim Bach (talk) 07:17, 18 August 2023 (UTC)[reply]
Hi Kim. Yeah, but that is not a big problem. I think you should make the full run. After that, categories can be streamlined if needed. Cheers Rsteen (talk) 16:43, 18 August 2023 (UTC)[reply]
Thanks, pending further feedback, I'll start running batches of say 100 items, since there are a few calls to wikidata and commons APIs, stepping it up if no major issues are detected Kim Bach (talk) 08:33, 19 August 2023 (UTC)[reply]
I've noticed that WLKBot has received "Bot status", is that to be considered an approval, or is it still pending an official approval process? Thank you! Kim Bach (talk) 11:26, 25 August 2023 (UTC)[reply]
Bot status has been applied to keep the extended testing out of recent changes and watchlists. I think approval is still pending, and can be applied if all relevant questions have been addressed. I have to admit that I don't understand if that is the case. Please feel free to summarise. Krd 12:58, 1 September 2023 (UTC)[reply]
I'm of the opinion that all questions have been addressed as well as can be expected.
I've worked with @Rsteen to address the issues, that mostly revolved around categorisation. Kim Bach (talk) 18:48, 10 September 2023 (UTC)[reply]

As there are no recent objections, I'm calling this approved now. --Krd 03:46, 14 September 2023 (UTC)[reply]