Commons:Batch uploading/Nordiska Museet


Wikimedia Sweden and Nordiska Museet (en) have started a cooperation in which they have released around 1,000 old images (with more to come) for use on Commons. I've put up a small page about our cooperation here and uploaded a few images manually. I have all the images in a zip archive (~206 MB) and the metadata in .docx, .xlsx and .rtf files. If anyone wants to take a look at this, it would be highly appreciated. --Haxpett (talk) 00:30, 23 November 2010 (UTC)

Opinions


Great! I would like to help. Can I have a look at the metadata? Multichill (talk) 08:23, 23 November 2010 (UTC)[reply]

Have a look at http://toolserver.org/~prolineserver/NordiskaMuseet/ --Prolineserver (talk) 09:51, 23 November 2010 (UTC)[reply]
Great news! I drafted {{Nordiska museet cooperation project}} and Institution:Nordiska_museet, should be handy (of course, edit at will).
Question: Should « Museet » be capitalized or not ? We have both at the moment (Nordiska museet and Images from Nordiska Museet), that should be harmonised before we have hundreds of uploads. Jean-Fred (talk) 14:38, 23 November 2010 (UTC)[reply]
In Swedish it should be Nordiska museet. Before I saw this page I created Institution:Nordic Museum, in what I thought was following the guidelines for the template. Should the Institution template have the English or the native name? --Ainali (talk) 19:27, 23 November 2010 (UTC)
Hello, this is just to confirm that (1) the native name ("Nordiska museet") should be used for the Institution template and (2) "museet" is written with a minuscule m. Best/Jonas Hedberg/Nordiska museet
Hello Jonas, I see you're using Commonist to upload these photos. This is a very, very bad idea. Please stop. It is a lot of work for you and the result isn't good. We have a much better solution, just give us a bit more time. Multichill (talk) 15:54, 24 November 2010 (UTC)
These concerns were also raised and addressed at Commons talk:Nordiska Museet. Jean-Fred (talk) 16:02, 24 November 2010 (UTC)[reply]
Tonight I'm heading to http://uk.wikimedia.org/wiki/GLAM-WIKI and after that to France. I probably have time to do the upload after that. Multichill (talk) 12:04, 25 November 2010 (UTC)[reply]
✓ Done All renamed uncapitalised. Jean-Fred (talk) 17:14, 24 November 2010 (UTC)[reply]
I see there's a talk about this project at glamwiki. I'll be there :-) Multichill (talk) 09:52, 27 November 2010 (UTC)[reply]
Update: We're emailing to get things sorted out. Multichill (talk) 21:57, 30 November 2010 (UTC)[reply]

I don't know how these things are normally done but it might be worth adding the images to a temporary category (e.g. Category:Images from Nordiska museet needing review) to make the post-processing work more organised. Also I saw at User_talk:Haxpett#Images_from_Nordiska_Museet that there was a question about filenames. Is there a good metadata entry (or combination thereof) which could be used for descriptive filenames? As for the already uploaded images: is the easiest solution to re-upload them (as part of this batch) and then delete the old ones once uses have been replaced and any (potential) post-upload info has been copied across? /Lokal_Profil 15:00, 21 January 2011 (UTC)

If you have a good idea for creating a good filename: I'm waiting for it;-) Have a look at this file: http://toolserver.org/~prolineserver/NordiskaMuseet/Wikimedia_export_NY2.csv --Prolineserver (talk) 20:46, 21 January 2011 (UTC)[reply]
If we could filter out some of the info from the "Motiv" field (the longest of which is >600 characters) then {Filtered:Motiv}-NMA.{Identifikationsnummer}.jpg could be a possible name. Obviously the full "Motiv" field would still be used in the image description. Things worth filtering out would be stuff like "Autochrome / Autokromfotografi", "Nordiska museets föremål inv nr xxx", "Ur Lotten von Dübens ...von Düben 1868" and other info which is either too detailed (such as a list of everyone in a group photograph) or too general (such as the mention that it's a coloured photograph). If this is something others believe works then the easiest would probably be to copy the Motiv field into a new Description field, run a few search-and-replace passes on it and then (sadly) look through it manually. If we put the list on a subpage of Commons:Nordiska museet then we should be able to work through it together in a fairly short time. /Lokal_Profil 15:45, 22 January 2011 (UTC)
OK, I'll prepare it. I suggest a change to NMA.{Identifikationsnummer}-{Filtered:Motiv}.jpg, so that the images appear a bit more ordered. Right now about 100 datasets are missing; I'll wait until I get them. --Prolineserver (talk) 08:51, 23 January 2011 (UTC)
For the Trutat upload, I was asked to stick to « <Title> (<Year>) - <Id> - Fonds Trutat », so since names are often clipped the useful information about the subject of the file (here, Motiv) would appear a bit. As for ordering a bit I had the exact same concern as you and my solution was to use the ID number as the sortkey in the dedicated source category. Jean-Fred (talk) 12:50, 23 January 2011 (UTC)[reply]
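The filename scheme discussed above could be sketched roughly as follows. This is only an illustration, not the script that was actually used: the regular expressions, the 80-character cut-off and the function names are assumptions made for the example, and the real filter list would come out of the manual review of the Motiv field. The sort key follows Jean-Fred's suggestion of using the ID in the source category.

```python
import re

# Hypothetical boilerplate patterns, based on the phrases named in the discussion.
BOILERPLATE_PATTERNS = [
    r"Autochrome\s*/\s*Autokromfotografi\.?",
    r"Nordiska museets föremål inv\.?\s*nr\.?\s*[\d.]+",
    r"Ur Lotten von Dübens .*?von Düben 1868\.?",
]

MAX_DESCRIPTION_LENGTH = 80  # assumed cut-off, not a Commons rule


def filter_motiv(motiv):
    """Strip boilerplate phrases and shorten the text at a word boundary."""
    text = motiv
    for pattern in BOILERPLATE_PATTERNS:
        text = re.sub(pattern, "", text)
    text = re.sub(r"\s+", " ", text).strip(" .,;")
    if len(text) > MAX_DESCRIPTION_LENGTH:
        text = text[:MAX_DESCRIPTION_LENGTH].rsplit(" ", 1)[0].rstrip(" .,;")
    return text


def build_filename(identifier, motiv):
    """ID first, then the filtered description (Prolineserver's ordering)."""
    return f"{identifier}-{filter_motiv(motiv)}.jpg"


def source_category(identifier):
    """Source category entry using the ID as sort key."""
    return f"[[Category:Images from Nordiska museet|{identifier}]]"
```

Swapping the two parts in `build_filename` gives the description-first variant; the sort key keeps the category ordered by ID either way.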

Sorry guys, I've been busy doing other things and didn't really get to fixing this properly. I got a nice xml metadata set with the right information to generate good uploads. Example:

Museumsnummer: NMA.0033100 
Motivbeskrivelse: Porträtt av Pava Lars Nilsson Tuorda, 24 år, Tuorpons sameby. Ur Lotten von Dübens fotoalbum med motiv från den etnologiska expedition till Lappland som leddes av hennes make Gustaf von Düben 1868. 
Avbildad - namn :  Tuorda, Pava Lars Nilsson 
Emneord :  Man  
Emneord :  Minoriteter : Samer  
Emneord :  Motivkategori : Porträtt  
Datering :  1868 - 1868 
Fotograf :  Düben, Lotten von 
Registrert 16.11.2000, PRIMUS 
Eksemplar NMA.0033100-1 
  
 pixlar 
Eksemplar NMA.0033100-2 
 Positiv kopia, albumin 
 52x82 mm 
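
A record in the flattened form shown above can be read into a dictionary with a few lines of Python. This is a minimal sketch that assumes the "Key : value" text layout of the example; the real XML schema is not shown here, so the function and variable names are illustrative only. Repeated keys such as Emneord are collected into lists, and lines without a colon are skipped.

```python
from collections import defaultdict

SAMPLE = """\
Museumsnummer: NMA.0033100
Avbildad - namn :  Tuorda, Pava Lars Nilsson
Emneord :  Man
Emneord :  Minoriteter : Samer
Emneord :  Motivkategori : Porträtt
Datering :  1868 - 1868
Fotograf :  Düben, Lotten von
Registrert 16.11.2000, PRIMUS
"""


def parse_record(text):
    """Split each line at its first colon; keep repeated keys as lists."""
    fields = defaultdict(list)
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if not separator:
            continue  # e.g. "Registrert 16.11.2000, PRIMUS" has no key/value pair
        fields[key.strip()].append(value.strip())
    return dict(fields)


record = parse_record(SAMPLE)
print(record["Emneord"])
```

Splitting only at the first colon keeps hierarchical keywords such as "Minoriteter : Samer" intact.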

Prolineserver: Are you willing to take over? How's your python? You can just modify one of the bots in https://fisheye.toolserver.org/browse/multichill/bot/ Multichill (talk) 21:24, 23 January 2011 (UTC)[reply]

Well, I'm not really willing, only if you don't manage to upload the files in the next, say, two weeks. I haven't done much Python so far and I'm usually googling what I need. So if it's fast for you to modify one of your bots, go ahead, and if it happens that we get more pictures I can take over. Otherwise I would just create a shell script using PHP. --Prolineserver (talk) 12:26, 25 January 2011 (UTC)
Multichill : if needed I can help out. With the Trutat upload my wiki-python is not so bad these days and I could probably do the upload without much hassle. As for fixing the previous uploads I need to learn how to do but it’s in the realm of possible. Jean-Fred (talk) 13:31, 25 January 2011 (UTC)[reply]
What did people think about filtering the "motiv" field for use as description? It would require some manual work but there might be other Swedish speakers whom we could rope in. However we decide to do it, we need to sort out the descriptions/filenames before any uploads can take place. /Lokal_Profil 14:24, 25 January 2011 (UTC)
I like the idea of putting the year in the filename, however since the metadata often seems to specify approximate years or ranges of years I'm not sure how well this would work here. Jean-Fred's motivation for having the description first in the filename and the suggestion of additionally using the id as a sort key are also very good. /Lokal_Profil 15:34, 25 January 2011 (UTC)[reply]
Are the Python bots uploading the files directly? When I'm doing the uploads for the security conference I first create a very rough shell script which I then heavily edit using some regular expressions in some Linux editor. Does the Python script allow this? --Prolineserver (talk) 15:40, 25 January 2011 (UTC)
Yes, the python bots can upload directly. In my repo you can find several examples. Multichill (talk) 16:23, 31 January 2011 (UTC)[reply]

Preview


I played a bit more with the XML data and generated the first file descriptions; the first ones can be found here: Commons:Batch uploading/Nordiska Museet/preview. I think we should use the keywords to add pictures to proper categories. I made a csv file containing all available keywords; what is missing are suitable categories: Commons:Batch uploading/Nordiska Museet/keywords. The XML file contains a lot of information (we are not supposed to use all of it), and I'm not sure how to use it. For example, all dates have different attributes about their sources, see Commons:Batch uploading/Nordiska Museet/dates. Any suggestions for better templates? --Prolineserver (talk) 11:55, 29 January 2011 (UTC)

Thanks for taking this over Prolineserver. Looks good at first sight.
  • Filename : I would do "<description> - Nordiska Museet - <id>.jpg"
  • Source : Nice template, but you should also include a deeplink to their site based on the id. Use a source template for that.
  • Date : You have to play around a bit more. Probably want to remove leading ", " and use templates like {{Other date}} when a date range is given.
  • Author : Probably only worth using creator templates for some authors.
  • {{NordiskaMuseet}} should be a simple template without parameters in the licensing section. You can find requirements and good examples at Commons:Partnership templates#Requirements.
Multichill (talk) 13:18, 29 January 2011 (UTC)[reply]
Hi Prolineserver, here are my comments:
    • Please use {{Artwork}}, which fits better for GLAM files.
I would amend the descriptions as following : Commons talk:Batch uploading/Nordiska Museet/preview.
Jean-Fred (talk) 14:18, 29 January 2011 (UTC)[reply]
Thanks for making a first stab at a layout, makes it a lot easier to see what we have. Have a few thoughts though.
Filename: I agree with Multichill about changing the order of the information so that the description is first and the id last. But in addition I think the filenames need to be pruned. A name such as "NMA.0023952-Teckning av Fritz von Dardel daterad 1839 med texten "Hur långt ha vi fram?". Två män i en hästdragen kärra med lera upp till hjulnaven. Nordiska museet inv.nr. 67.481.jpg" or "NMA.0028324-Brudparet Olinus och Berta Nilsson med familj och bröllopsgäster. Det berättas att det var ett "storbröllop" och att bruden som ursprungligen kom från Samilsgården i Husom, bar kronan i dagarna tre. Bilderna är tagna dag två.jpg" is just too long to be easily handled and inserted into wiki articles. Additionally some of that info isn't suitable for a title (e.g. "Nordiska museet inv.nr. 67.481").
As for the categories I can start matching the more frequent of the keywords to categories tomorrow (well later today).
I took a look at the dates and I'd say a lot of that info isn't really suitable for our use (e.g. "Påskrift på baksida"). I think a lot of the others could probably be replaced by "ca" which is a valid parameter in {{Other date}}.
/Lokal_Profil 08:01, 30 January 2011 (UTC)[reply]


OK, I spent half a day finding the difference between the byte sequences C2A0 (a UTF-8 non-breaking space) and 20 (an ordinary space) :(. Well, a new preview is available at Commons:Batch uploading/Nordiska Museet/preview. Should we put the original metadata somewhere? What I still haven't touched is the location information. Otherwise, there are now several files controlling templates and categories:

  • dates, defines the additional parameter for the {{Other date}} OK
  • keywords, categories based on keywords OK
  • creators, categories+template for the creator OK
  • depicted, categories based on depicted person
  • profession, profession of the depicted person or photographer OK
  • filename, short filename on Commons OK
  • nationality, nationalities OK
  • places, depicted places OK
  • techniques, used technique, keys are searched in the description using stristr().

These files have to be filled in; I'll create another file for the location. When it comes to filenames I actually like short and unique filenames, so just something like "Nordiska Museet - <id>.jpg". Putting the description at the beginning makes a mess when one starts to download the files. But if somebody wants to fix them manually, go ahead :) --Prolineserver (talk) 15:08, 30 January 2011 (UTC)
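
The C2A0 versus 20 problem mentioned above is easy to reproduce: a key that ends in a non-breaking space looks identical on screen but does not match the same key with an ordinary space, so lookups in the control files silently fail. The following is a small sketch of a normalisation step; the lookup table and its entry are made up for the example.

```python
NBSP = "\u00a0"  # encoded in UTF-8 as the two bytes C2 A0


def normalize_spaces(value):
    """Replace non-breaking spaces with ordinary ones and trim the ends."""
    return value.replace(NBSP, " ").strip()


# Hypothetical keyword -> category table, standing in for one of the csv files.
keyword_categories = {"Porträtt": "Portraits"}

raw_key = "Porträtt" + NBSP  # as it might arrive from the XML export

print(NBSP.encode("utf-8"), " ".encode("utf-8"))
print(keyword_categories.get(raw_key))                    # lookup misses
print(keyword_categories.get(normalize_spaces(raw_key)))  # lookup succeeds
```

Running the normalisation on both the csv keys and the XML values before comparing them avoids this class of mismatch.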

Quick note : as for depicted persons, there is {{Depicted person}} Jean-Fred (talk) 17:32, 30 January 2011 (UTC)[reply]
Quick change, added. --Prolineserver (talk) 18:33, 30 January 2011 (UTC)[reply]
Do we know anything about the scope of the images, i.e. are they only of Sweden, Swedish persons etc.? If not then any categories derived from keywords would be very broad ones (e.g. Category:Men, Category:19th century etc.). Prolineserver, could you please output a csv of "id;motiv" to a subpage and I'll see how they can be trimmed down for filenames (hopefully I'll also rope in some other Swedish speakers for this task).
Quick comment on the preview. "Adbildad" should be "Avbildad" (but I've since added Swedish to {{Depicted person}} so this might no longer be needed). Also "slagord" means something different so it's probably best to use "ämnesord" (which is what the museum calls it), but I'll see if I can find an even better translation. /Lokal_Profil 14:22, 31 January 2011 (UTC)
I put the export at Commons:Batch_uploading/Nordiska_Museet/filename. The scope of the pictures is as far as I can see mainly Sweden. --Prolineserver (talk) 15:24, 31 January 2011 (UTC)[reply]
Thanks. I spotted a few photos that said Finland but overall it's probably better to move them out of the "... of Sweden" categories afterwards than shifting all of the others from the general categories into that one. Guess that is one of the things Category:Images from Nordiska museet/check will be used for. Since I'll be looking through the filenames just thought I'd also check if there are any particular characters which should be avoided/replaced (e.g. "/", "'", ":")? /Lokal_Profil 16:19, 31 January 2011 (UTC)[reply]
Is it worth matching {{Technique}} as well? I spotted (and added translations to {{Technique/sv}}) quite many when going through the filenames. Also the professions list seems to contain several false positives since it has also picked up on any labels on the back of the images containing info about the photographer/creator. An example of this is e.g. "Daguerreotypeur" and most likely the majority of the various variations on photographer. /Lokal_Profil 03:41, 5 February 2011 (UTC)[reply]

There are quite a few pictures from places outside Sweden, so I suggest keeping the categories a bit more general. Regarding the Technique template, it is a bit hard to extract from the metadata (see this, or maybe you can find a better tag here). But I added a list of keywords to be searched for in the description, and so far added 3 keywords. Otherwise, the script seems to be running pretty well now; you can check it on the toolserver. --Prolineserver (talk) 10:14, 5 February 2011 (UTC)

I generated some c&p-templates for the creators-templates: Commons:Batch uploading/Nordiska Museet/creatortemplates --Prolineserver (talk) 13:18, 5 February 2011 (UTC)[reply]
Quick check. Is the English entry in profession supposed to be used with {{Occupation}}? Filenames should be largely done apart from a few final edits which are more easily done after all manual editing. But I'm going to leave a message at Commons:Bybrunnen and ask other users to take a look at that (and some of the other subpages). /Lokal_Profil 15:04, 7 February 2011 (UTC)
I didn't add the occupation as a translation into the description; this template is, as far as I see, mainly for the [[Commons:Batch uploading/Nordiska Museet/creatortemplates|Creator template]]. I can recreate this list if you want to change something (the red professions do not exist and should be either changed or added to the occupation template). I don't know for how many creators we should generate a template; maybe for those with more than, say, 5 pictures in the current upload, or just for all of those where we have some useful information that would be a pity to throw away? And it is probable that we could get more pictures from the museum in the future, so those may be filled up. Similarly with the locations. Should we categorize 28 pictures of Tuorpon into Jokkmokk? Where do we draw the line, where do more categories start to hurt? --Prolineserver (talk) 17:49, 7 February 2011 (UTC)
Don't worry about the occupations/professions. I just wanted to know what the list was intended for since it had a third "english text" field. I've now updated the page to remove any red links by changing them to categories one step up in the category tree. As for creator templates, I'm not sure for how many of them it would be useful. I'm even hesitant whether there is a need for one for Mats Landin and Hans Koegel since they are only repro photographers rather than original creators. Looking at Commons:Batch uploading/Nordiska Museet/creators I think most people who need a creator template already have one. Possibly one or two more could be created, e.g. sv:Johan Wilhelm Bergström, but otherwise a category (if even that) should do; we could add the info to the category if we were worried about it being wasted. My main reason for not creating too many creator templates (and sometimes even categories) is that there is a mixture of repro photographers, amateurs, photographic studios, artists and even an automatic photographing machine listed, and many of these won't be notable enough to deserve a category/template. I'd say we create templates for people in /creators with more than X entries (I'll go through that page and search for existing info anyway). Possibly also for some with fewer than that if they have entries on sv.wiki or many photos in NM's list at digitaltmuseum.se. For the rest, just no categories or creator templates. If there is useful identifying info we could consider adding it to the author field.
Place categories suffer from the same problem. I think it's probably best to put them in existing categories and wait for more specific ones to be created "naturally". The problem is often that info in the picture links to administrative divisions of the time and so might not reflect the best naming convention for a category here. For the places my main problem is "Lima, Dalarna" (22 images), which can't be properly organised since there exist two places called Lima within the (mainly historical) subdivision Dalarna, but they lie in different categories here since they belong to different municipalities. /Lokal_Profil 16:38, 10 February 2011 (UTC)
For the place categories I added two more and I think it's fine. As far as I know, a photographer making a reproduction gains a copyright of their own for 25 years, therefore it is worth keeping them. All persons have some kind of ID, but I'm not sure if we are allowed to use it. We probably don't need a template for the machine, but you can also add a wikilink instead of the Creator template; this string is placed directly into the field, and if it is empty the first entry (the key) is taken. I don't know if the creator template should also be used for companies; I don't think so. When it comes to natural persons I suggest creating a template if there are more than 5 pictures, or if we have more metadata on the photographer, or if he has a Wikipedia article.--Prolineserver (talk)
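The rule of thumb suggested here (more than 5 pictures, or extra metadata, or an existing Wikipedia article) can be written down as a single predicate. This is just a sketch of the proposal as worded above; the function and parameter names are invented for the example and the threshold was still under discussion.

```python
def needs_creator_template(image_count, has_extra_metadata=False,
                           has_wikipedia_article=False, threshold=5):
    """Return True if a natural person should get a creator template."""
    return (image_count > threshold
            or has_extra_metadata
            or has_wikipedia_article)


# Hypothetical counts, for illustration only.
print(needs_creator_template(12))                             # many images
print(needs_creator_template(2, has_wikipedia_article=True))  # has an article
print(needs_creator_template(2))                              # neither
```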
I don't argue against the need for keeping track of the repro photographers, just the need for a creator template for them. Also in this particular case I'm assuming all rights have been given to NM, which is how they are able to in turn release them under a free license. But I don't really mind, as long as the creator template has a comment that they work as repro photographers so that people don't get confused by multiple creator templates.
I didn't get any feedback from Commons:Bybrunnen (Swedish village pump) so I guess /filename is ready to go now. I identified which Lima was intended in /places (i.e. I found the photographers, who were all based there) so that page should also be done now. I've been doing research on the creators over the weekend and have been populating /creators with creator templates and categories. Have done so for all with more than 5 images. Will take a look at the remaining ones to see if there are any who already have wiki articles or categories or for whom full names (and birth/death years) can be found. This is probably the one area where it might be worth going back to NM (after uploading) and seeing if they have more info. What type of ID was it that you mentioned? Noticed that there are still a few question marks in /keywords that need sorting but once that is done I think we are good to go. Tell me if there is anything I've forgotten. BTW, is it a problem if there are <!-- comments --> in the csv file? If so I can remove them once they are finalised. /Lokal_Profil 11:20, 14 February 2011 (UTC)
Minor correction to the above. There are still 3 creator templates to create for people with 5+ images; easy to do though since they have wiki articles. /Lokal_Profil 11:37, 14 February 2011 (UTC)
Great job! Comments shouldn't be a problem; your changes are immediately visible in the conversion script. The ID is called Jpnr, but you can also have a look to see if you find something else interesting in the XML data. Otherwise I need to get the bot flag for NordiskaMuseetBot, but nobody seems to be interested in the discussion; maybe you can leave a note there if you think we're done. --Prolineserver (talk) 22:22, 14 February 2011 (UTC)
Quick thing before I forget. "== Summary ==" should be replaced by "== {{int:filedesc}} ==". Will get back about the other things. /Lokal_Profil 22:47, 14 February 2011 (UTC)[reply]
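For reference, a description page along the lines discussed so far might be assembled as below. This is a rough sketch, not the output of convert.php: the choice of {{Artwork}} fields, the dictionary keys and the licensing template name passed in are assumptions for the example. The headings use the localised {{int:filedesc}} form instead of a hard-coded "== Summary ==".

```python
def render_description(record, license_template="{{NordiskaMuseet}}"):
    """Build the wikitext of a file description page from one record."""
    lines = [
        "== {{int:filedesc}} ==",
        "{{Artwork",
        " |artist = " + record["photographer"],
        " |description = {{sv|" + record["motiv"] + "}}",
        " |date = " + record["date"],
        " |institution = {{Institution:Nordiska museet}}",
        " |accession number = " + record["id"],
        "}}",
        "",
        "== {{int:license-header}} ==",
        license_template,
    ]
    return "\n".join(lines)


# Hypothetical record, using values from the metadata example in this thread.
example = {
    "id": "NMA.0033100",
    "motiv": "Porträtt av Pava Lars Nilsson Tuorda, 24 år, Tuorpons sameby.",
    "date": "1868",
    "photographer": "Düben, Lotten von",
}
print(render_description(example))
```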
And so randomly I stumble upon the National Database of Swedish Photographers. Maintained by NM themselves even... but the info there doesn't seem to be synced with the info extracted from their metadata... figures. Anyways this will hopefully give some more info from a central source and provides a unique id to each photographer. /Lokal_Profil 14:34, 15 February 2011 (UTC)[reply]
Great! I made a new version, see [view-source:http://wolfsbane.toolserver.org/~prolineserver/NordiskaMuseet/convert110216.php here]. --Prolineserver (talk) 07:10, 16 February 2011 (UTC)[reply]
Creators done! A last look at keywords and... /depicted... Ainali added lots of categories to it but they mainly seem to be red. Guess they should either be created or removed. Took a look at the convert.php, are the files really getting uploaded with "_2" in their filenames (just after the id)? Also would be good with a line break before Ämnesord and possibly also between different creators and dates. I still think "<description> - Nordiska Museet - <id>.jpg" works better than the current "Nordiska Museet - <id> - <description>.jpg" but I guess another opinion would be best. It's finally looking really good though, can see the light in the end of the tunnel now =) Could you update /preview once we sorted the above issues? Would make it easy to show someone and see if they notice anything which we have missed. /Lokal_Profil 18:14, 16 February 2011 (UTC)[reply]
Ugh, the depicted list is still long. Actually it's the same as with the creator templates: we (you :) ) should go through all of them and see if it is useful to have them, i.e. if there are articles or already pictures on Commons. The _2 after the ID means that there are two different crops of the same picture. Since they have just one ID they would get the same description and the same file name if I removed this index. Regarding the order I still like my version, but if you really insist on it I can change it. I can update the preview, but you could actually do it yourself :)
Yes, I did add a lot of categories. My stance is that every person should have their own category, and if it does not exist it should be created. Whether or not it should be done before the actual upload I don't know. --Ainali (talk) 21:29, 16 February 2011 (UTC)[reply]
[edit clash] keywords done. Had another look at the convert.php and I'm not sure what's going on with the dates. Think you might have put the "-" from the empty entries in /dates directly into {{Other date}}, which results in {{other date|-|1937}}, which gives "from 1937 until ". I think the correct thing to do is to have a conditional statement which doesn't use {{Other date}} if "Grunnlag" in the xml equals one of the empty values in /dates. /Lokal_Profil 21:46, 16 February 2011 (UTC)
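The conditional suggested here, together with the earlier points about stripping a leading ", " and handling ranges, could look roughly like this. It is a sketch under assumptions: the qualifier is taken to be the value looked up in /dates, "-" and the empty string are treated as "no qualifier", and the "between" keyword of {{Other date}} is assumed for real ranges.

```python
EMPTY_QUALIFIERS = {"", "-"}  # assumed markers for "nothing to add" in /dates


def format_date(datering, qualifier=""):
    """Turn a Datering value into a plain year or an {{other date}} call."""
    cleaned = datering.strip().lstrip(",").strip()
    parts = [part.strip() for part in cleaned.split(" - ")]
    start, end = parts[0], parts[-1]
    if start != end:
        # A real range, e.g. "1900 - 1910".
        return "{{other date|between|%s|%s}}" % (start, end)
    qualifier = qualifier.strip()
    if qualifier in EMPTY_QUALIFIERS:
        # No qualifier: emit the bare date instead of {{other date|-|1937}}.
        return start
    return "{{other date|%s|%s}}" % (qualifier, start)


print(format_date("1868 - 1868"))
print(format_date(", 1937", "-"))
print(format_date("1937", "ca"))
print(format_date("1900 - 1910"))
```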
@Ainali: I didn't mean it as a criticism. On the contrary, I'm happy someone helped out. The only problem I have with the categories (something which became painfully obvious when I dealt with the creators) is that with only a name it's hard as hell to figure out who we're talking about. Some more unusual names will uniquely identify a person but the more generic names often match several people of the era and one would have to look at each individual image to check a) who it was and b) that all entries with that name are of the same person. But in general I agree with "one (wo)man one category".
@Prolineserver: I prefer the other order since it gives the most relevant information (description) first; sorting is done via the id sort key already. But we can check with the other contributors to this project and I'll bend to whatever suits the most people. Yes, I could update the preview but since you know when it has been updated I figured you should do it =p. /Lokal_Profil 22:05, 16 February 2011 (UTC)
I did not really take it as criticism, I was just not sure everybody felt the way I do about person categories (is there documentation of this anywhere?). Lokal Profil, I really thank you for all the hard work you have done! It is a pleasure logging on and seeing the RSS feed full of constructive edits. --Ainali (talk) 22:59, 16 February 2011 (UTC)
Thanks =). I don't know where the appropriate discussions about categorisation are on Commons, apart from COM:CAT which doesn't deal with when to create a new category. One solution would be to leave the categories red and add the task of creating those categories (correctly categorised themselves) to Commons:Nordiska museet/Todo. That's probably the page where the known post-upload needs should be listed, and it could possibly work as a checklist for the pre-AGM editing session [Swedish Wikimedia reference] depending on how you were intending on running that. BTW, I'd completely forgotten about your comment request for BotFlag. The way it works is that you first have to make a test run (~30), then post there, so that's probably the reason no one has said anything. So if we opt to leave the depicted categories red and sort the small problem with {{Other date}} is there then anything left to do? /Lokal_Profil 23:59, 16 February 2011 (UTC)[reply]
Found m:Help:Category#Labels in the list of images regarding filenames. /Lokal_Profil 19:32, 17 February 2011 (UTC)[reply]
OK, that's an argument at least against having Nordiska Museet at the beginning, but NMA.xxxx would work :) I fixed the otherDate and the line breaks, and updated the preview page; the new script at the toolserver is [view-source:http://toolserver.org/~prolineserver/NordiskaMuseet/convert110217.php here]. I know the rule about the test run, but the uploading itself is a standard task for the pywiki framework as I use it with Prolinebot and MSCBot. The difficult part here is the conversion, and this can be checked on the preview page. --Prolineserver (talk) 22:09, 17 February 2011 (UTC)
Now I also generated the shell script controlling the bot: bot.sh. Now just the category of the depicted persons and the botflag is left until I can push the button. --Prolineserver (talk) 10:35, 19 February 2011 (UTC)[reply]
Left a small note on EugeneZelenko's talk page about the bot request which seems to have almost been missed by the people normally active on those pages. Once that is sorted we should be ready to go. As I mentioned above I believe the depicted categories can be dealt with in post processing (due to the need for extra info). If any are left uncreated afterwards these can easily be identified through /depicted and dealt with then. /Lokal_Profil 10:11, 20 February 2011 (UTC)[reply]
OK, then I'll go for a test run of, let's say, 20 photos? --Prolineserver (talk) 13:18, 20 February 2011 (UTC)
20 should be fine =) /94.193.242.248 18:34, 20 February 2011 (UTC)[reply]
Post trial run
[Image thumbnail, caption: "Women?, Portrait?"]
Done, became 20 XML data sets instead of 20 files. --Prolineserver (talk) 21:46, 20 February 2011 (UTC)[reply]
Looks good. Didn't realise we were putting up all (i.e. back and front) of the photos though. Do we really need all of them, the backs etc.? These can always be found through the accession link. Also for an image such as that to the right the categories etc. wouldn't be true. Nor would the image ever be likely to be used in a Wikipedia article (possibly one of them would be of use but not all). For me this seems the natural dividing line between what belongs in the museum archive and what belongs on Commons. Just as I expect people might clean up/rotate/crop the images on Commons whereas in the archive you obviously primarily want to show the untouched original (here it's still accessible through the upload history). /Lokal_Profil 11:12, 21 February 2011 (UTC)
On a side note. Just spotted (noticing that Docu was adding cats) that {{Technique}} doesn't add a category. We could always add a category entry to the technique csv and have that added as well, if it's not too much coding work. /Lokal_Profil 11:18, 21 February 2011 (UTC)
Added a first batch of category matchings to the /techniques talk page. /Lokal_Profil 14:04, 21 February 2011 (UTC)[reply]
Matchings now complete (on talk page). Also realised I'd forgotten one technique, which has now been added. /Lokal_Profil 15:04, 21 February 2011 (UTC)
About the verso images: maybe we could just add them all into one category rather than categorize them along with the front/recto. File name and description should mention it's the verso (ideally in Swedish I suppose). Personally, I'm in favor of uploading them. --  Docu  at 12:53, 21 February 2011 (UTC)[reply]
That would be a solution. Especially if "other version" can be made to work. The xml field contains the entry "[Antall]" which should relate to the number of images using the same metadata. If "_1" is always the real image then a conditional should be able to construct the "other_versions" field and separate the versos. A separate problem is the case where one of the images contains two photos, in which case it might very well be the "_2" to both those individual ones. See e.g. [1] and [2]. /Lokal_Profil 15:04, 21 February 2011 (UTC)[reply]
Well, I don't think that it is possible to handle this automatically. I can actually add the other versions; this is a possible (though not very quick) fix. I'll also fix the categories for the technique tomorrow. --Prolineserver (talk) 22:48, 21 February 2011 (UTC)
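Building the "other versions" field from the number of images per record could be sketched like this. It assumes the filename pattern seen in the test run (e.g. File:NMA.0033090_2.jpg) and that the count comes from the "[Antall]" entry; the function name is invented, and the case of two photos sharing one "_2" image is not handled.

```python
def other_versions(identifier, count, current):
    """Gallery of all images sharing this record, except the current one."""
    if count < 2:
        return ""  # a single image has no other versions
    lines = ["<gallery>"]
    for index in range(1, count + 1):
        if index != current:
            lines.append(f"File:{identifier}_{index}.jpg")
    lines.append("</gallery>")
    return "\n".join(lines)


print(other_versions("NMA.0033090", count=2, current=1))
```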
re: categorization: One could also select the images of reverse sides with cat-a-lot after upload, place them all in one category and then have a bot remove all other categories from these images. --  Docu  at 07:53, 22 February 2011 (UTC)[reply]
@Docu. Do you mean manually adding them to e.g. Category:Images from Nordiska museet/reverse sides and then running cat-a-lot once populated? If so it could be added as one of the tasks in the todo list.
As for my two images sharing the same "_2" I actually spotted this at File:NMA.0033090_2.jpg and File:NMA.0033091_2.jpg, but it looks like this is primarily limited to one source so it could be dealt with manually afterwards. Can't figure out why mediawiki doesn't identify them as duplicates though, looks as though this feature has broken =(. /Lokal_Profil 16:40, 22 February 2011 (UTC)[reply]
Since I add only one technique to the template, the last matching one will be taken, while I add all categories that match the keys in the description. Do we _really_ need the category photographs? I made a new output which also generates a gallery for the other versions. Should I add an additional maintenance category if other versions exist? --Prolineserver (talk) 22:19, 22 February 2011 (UTC)
@Lokal Profil: I thought of adding Category:Reverse sides with cat-a-lot, but maybe it's easier to add it automatically to everything that doesn't end with "_1" and then remove those that aren't verso views. For reverse sides, IMHO, the descriptions and filenames should be shorter as well. Otherwise it will be virtually impossible to search for anything without coming across these every time.
Category:Photographs is pointless (IMHO), but Category:Daguerreotypes or Category:B&W photos aren't. These could also be added afterwards. There are a few others we might want to skip, but I haven't taken the time to look into the automatically generated categories yet (especially /keywords). The category for the depicted person is definitely a useful one. --  Docu  at 06:38, 23 February 2011 (UTC)
@Prolineserver: The other versions thing looks good =). We can probably remove categorisation on photographs but the rest should be relevant. Is there a specific reason why only one technique is added to the template? This could cause a problem if it's e.g. a hand coloured daguerreotype and hand coloured gets chosen over daguerreotype.
@Docu: Assuming it's easily implemented by Prolineserver, I guess automatic categorisation into e.g. Category:Images from Nordiska museet/check/B-sides (possibly instead of "/check") of any file ending with _2, _3, _4 or _5 would be desirable. Then one could manually check any false positives before removing categories from them. Also, by not putting them in /check, that category should get less cluttered by images once people start working there. Although filename changes would also be good, e.g. "reverse side - Nordiska museet -<id>.jpg", this would probably make coding harder and any false positives would have to be manually renamed as well. I don't think them showing up in the search results is too much of a problem since anyone going to the file page could find the real image through the other versions gallery.
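The suffix rule suggested here is simple enough to sketch. This is only an illustration under the assumption that the numeric suffix sits directly before the file extension; the category name is the one floated above, not a settled choice, and the helper name is hypothetical.

```python
import re

# Category name as floated in the discussion; not a final decision.
CHECK_CATEGORY = "Category:Images from Nordiska museet/check/B-sides"

# Matches a trailing _2 .. _5 directly before the file extension.
SUFFIX_RE = re.compile(r"_([2-5])\.[A-Za-z]+$")


def backside_category(filename):
    """Return the check category for probable reverse sides, else None."""
    return CHECK_CATEGORY if SUFFIX_RE.search(filename) else None
```

Anything the rule tags is only a candidate: a series with several front views would show up as false positives and still need the manual check.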
I always saw the keyword categories (and the photograph category as well) as a starting point for manual categorisation afterwards. E.g. photograph->BW photo->BW photos of Sweden etc. It's a way of relating the keyword to the category structure in Commons so that you know in which branch you should start looking. To a certain extent I believe an image is more likely to diffuse to the right category if put into roughly the right spot than if it relies on someone putting it straight in there, especially if the categoriser isn't used to Commons. That said there are probably categories such as "objects" which are way too general. /Lokal_Profil 12:30, 23 February 2011 (UTC)
Can you give an example of how such a line with different techniques should look? I don't think it's possible to automatically name the files according to a potential backside; some picture series have more than just one picture of the front side. When it comes to the number of categories I think it's better to add more than fewer. It's quicker to remove or change a category than to add one, and a wrongly categorized image attracts more attention than an uncategorized one. --Prolineserver (talk) 16:12, 23 February 2011 (UTC)
I added an additional maintenance category for all pictures which have several versions. --Prolineserver (talk) 20:44, 24 February 2011 (UTC)
I guess the only time there should be two techniques is when hand colored is combined with something else, in which case the output should look like {{technique|panotype|color=hand colored}}. My main worry is cases where the info might say something like "photo of a watercolour painting of a ..." or "ink drawing with a signature in pencil in the corner", in which case either both are relevant techniques or a human edit would be needed to remove the erroneous/irrelevant one. /Lokal_Profil 15:35, 25 February 2011 (UTC)
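A rough sketch of that rule, assuming the script already has a list of matched technique names per image. The function and the "needs review" flag are hypothetical; the template form is the one quoted above, and which base technique wins when several match is left to a human.

```python
HAND_COLORED = "hand colored"


def technique_template(matches):
    """Return (wikitext, needs_review) for a list of matched technique names."""
    base = [m for m in matches if m != HAND_COLORED]
    colored = HAND_COLORED in matches
    if not base:
        # Only the modifier (or nothing) matched: nothing sensible to emit.
        return "", colored
    text = "{{technique|" + base[0]
    if colored:
        text += "|color=" + HAND_COLORED
    text += "}}"
    # More than one base technique needs a human to pick the relevant one.
    return text, len(base) > 1
```

With this, `["hand colored", "daguerreotype"]` no longer lets "hand colored" replace the daguerreotype, and a "photo of a watercolour painting" gets flagged rather than silently resolved.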
OK, I think I fixed everything and did a new trial run. See e.g. File:Dubbelporträtt,_kvinna_och_flicka_-_Nordiska_Museet_-_NMA.0051950_1.jpg. --Prolineserver (talk) 09:13, 26 February 2011 (UTC)
Realised I forgot to reply. New upload looks great and Category:Images from Nordiska museet/checkbackside should make it easy to pick out backsides from non backsides. As far as I'm concerned we should be ready to go. /Lokal_Profil 23:24, 28 February 2011 (UTC)
One small thing: somewhere I saw that the repro photographer was moved into the source field. Should we change this? --Prolineserver (talk) 06:59, 1 March 2011 (UTC)
Ohh... that's a very interesting suggestion. In my view moving the repro photographer (and repro date) into the source field would make a lot of sense and should make the overall information more digestible (also I now spotted afterwards that the field does say source/photographer). Is this something which is easily implemented though? Is it easy to distinguish repro creator from original creator? Do all images have two creators (or one and one "unknown")?
When trying to figure out what should go on the "todo" list I had another thought. Would it be better to move the original info out of the {{Sv}}-template (and possibly put it in a box of some sort, similar to the Bundesarchiv images) to make the distinction between info that shouldn't be edited and user created/improved info? I'm working on the assumption that the original info should be "preserved", once again similar to the Bundesarchiv images. /Lokal_Profil 11:54, 1 March 2011 (UTC)
Random Bundesarchiv image as an example. It doesn't have to be anything as complicated as that though, and (obviously) it's only needed if we intend to keep the original info untouched. /Lokal_Profil 12:09, 1 March 2011 (UTC)
I think the artist field in Template:Artwork means the creator of the artwork. The creator of the reproduction seems more like source information to me. For some images I have seen "Tillverkare av avbildat objekt:" (example). I don't know how common that is, but maybe it can help a bot to see who is the original artist and who is the repro photographer. /Ö 18:45, 1 March 2011 (UTC)
Well, what should I say? This is finally what I started in the very beginning in my first preview, i.e. putting the original description in its own template [3] :) There are not always two photographers, and I cannot automatically distinguish them. But we can mark the reprophotographers in the csv-file manually, and define that every year after let's say 2000 is the reproduction date. There we should hopefully have no false positives. However, if it is a picture of a sculpture we may have the artist, the photographer and the reprophotographer. Strictly speaking the photographer should end up in the same field as the reprophotographer, but I don't think that I can do this automatically. Just give me an example of what you would like to have and I can see if I can implement it. I'll say "no" if it is too complicated. Otherwise, look in the XML data to see what we have: XML. --Prolineserver (talk) 21:59, 1 March 2011 (UTC)
Think the problem there was that there was a whole separate template for it. My thought was just something simple like User talk:Lokal Profil/Test2 (obviously with a non-centred, more discreet box). For those reading this page who are not in the e-mail loop: Nordiska museet seems to prefer preserving the original info, i.e. any changes/corrections/additions should be clearly separate.
As for repro photographers I think we can probably find most of them and tag them as such in the csv (I've marked most (all?) of them already on their {{Creator}} template or in the /creators subpage so it's just extracting the info). The 2000 cutoff should work well for identifying repro year as well. Will it break everything if I add an extra column to the creators list with "repro" or blank as an indicator? As a failsafe we could add "tag any files with multiple authors/multiple years" to the todo list. /Lokal_Profil 13:31, 2 March 2011 (UTC)[reply]
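To make the proposal concrete, here is a small sketch of the extra "repro" column plus the year cutoff. The two-column CSV layout (name, flag), the function names and the sample rows are assumptions for illustration only; the real creators list may be laid out differently.

```python
import csv
import io

REPRO_CUTOFF = 2000  # years after this are treated as reproduction dates


def load_repro_names(csv_text):
    """Read 'name,flag' rows and return the names flagged 'repro'."""
    rows = csv.reader(io.StringIO(csv_text))
    return {
        row[0].strip()
        for row in rows
        if len(row) > 1 and row[1].strip() == "repro"
    }


def split_creators(creators, repro_names):
    """Separate original creators from repro photographers, keeping order."""
    artists = [c for c in creators if c not in repro_names]
    repro = [c for c in creators if c in repro_names]
    return artists, repro


def split_years(years):
    """Separate original dates from reproduction dates using the cutoff."""
    original = [y for y in years if y <= REPRO_CUTOFF]
    repro = [y for y in years if y > REPRO_CUTOFF]
    return original, repro
```

Files where `split_creators` returns more than one original creator, or where `split_years` disagrees with the flag, would be the ones to tag for the todo list.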
No, it does not break anything, please go ahead. I would prefer something like User talk:Lokal Profil/Test2#Prolineserver, i.e. a static original file description, and a Swedish description which can be edited and translated. This does not need to contain the keywords. I can categorize images with several authors, but I think we have to check all of them manually anyway since some files may only contain the reprophotographer. --Prolineserver (talk) 15:24, 2 March 2011 (UTC)
I.e. essentially seeding the Swedish description with the original description? Think that will work assuming there is something similar to the yellow box in Bundesarchiv which visually separates the two sets of information so that it doesn't look odd. As an additional bonus such a field could contain the "please don't edit the text in the box" and "if you find errors in the box then tell us at ..." text. Keywords shouldn't be included in the editable info and possibly "depicted place/person" should live outside the {{Sv}} since they are already translated. BTW. do you know if there is a reason for why "depicted person" isn't bold font when "depicted place" is?
By "categorize images with several authors" do you mean several entries in the artist field or several authors in the xml? /Lokal_Profil 16:36, 2 March 2011 (UTC)
Exactly, they are identical in the beginning, but this shouldn't matter. There's a lot of information that can finally be removed from the description, e.g. the technique and format. I think there should be a general template without the name of the museum like {{BArch-description}} (or with the name as a parameter). This "tell us errors you found" hint was very useful for the archives in Germany; they got a lot of corrections from this. The information in depicted is not translated, look at the example I put on your page. I don't know why it is not bold, and right now I don't see a reason why it shouldn't be. Regarding the categorization: I can do both, but the metadata regards all photographers as identical, the real artist of a depicted object is missing, sometimes just one of the photographers is given etc. I think we have to check every image manually anyway and any additional marking is useless. --Prolineserver (talk) 07:24, 3 March 2011 (UTC)
For what I meant with "depicted" being autotranslated compare en to sv and fr. If there is no way of telling with regards to the artist then I guess the best thing to do is to add manual checking to the "todo" list rather than adding a category. Yes, a general template like {{BArch-description}} would probably be good. We could always start drafting one at e.g. User:Lokal_Profil/Test3. /Lokal_Profil 17:23, 3 March 2011 (UTC)
Before inventing everything anew we may "just" copy the BArch template and change it; there are already a lot of useful translations in it. And maybe discuss this issue at the village pump, it should be of general interest. What todo lists do you want to have, i.e. with which criteria? Regarding the depicted persons (and the location): the name is autotranslated, but not the parameters. I wonder what the proper way of using these templates is, especially if names have different transliterations, writing systems etc. --Prolineserver (talk) 19:58, 3 March 2011 (UTC)
I updated Commons:Batch_uploading/Nordiska_Museet/preview. I'm now placing all creators+year marked with 'repro' in the source field. I didn't apply the year criterion since many photos do not contain the reproduction date; there were just 3 dates contradicting the repro flag and I documented them in Commons:Nordiska museet/Todo. Furthermore I split up the description into the two templates; you should move your template into the main namespace now. Anything else? --Prolineserver (talk) 12:57, 5 March 2011 (UTC)
btw: as always, the new descriptions are here. The XML data distinguishes between 'Fotografering' and 'Produktion', but it seems that this distinction is arbitrary, and I don't know where to put this information. --Prolineserver (talk) 13:01, 5 March 2011 (UTC)
Post trial run part II
My impression of the BArch template is that it's a bit of an overkill for what we are trying to do. Apart from the desired box they also have it changing colours based on "biased" flags and also seem to use it as a mini info box. I agree that VP is probably the best place to start properly designing (or reusing parts of BArch) but since those discussions have a tendency to drag on I figured it would be good if we have a template which does the trick until then. Once there is one working it should be easy to replace all uses with the new one or making our one call that one with filled in parameters.
For the todo list I was thinking of two things. One would be something similar to what you used it for, i.e. notifications of specific images needing attention due to XML complications or due to being test uploads. The other would essentially be a checklist to follow for taking an image out of the "check" subcategory. Ideally it would work as a handy guide for wikipedians (not used to commons) to do some of the manual work at the same time as they are looking to see which images to incorporate into articles. This second list I've started to sketch out at Commons talk:Nordiska museet/Todo.
As for the "depicted" templates I'm not sure exactly how they are intended to be used. I asked about them at Template talk:Artwork#Depicted_person.2Fplace earlier but still not sure =(
The latest preview looks good. I moved the template to {{Nordiska museet description}} (and put entity and error page in). Btw. we should probably not use the "depicted" templates inside the "original info" template. The description link seems to be broken. Fotografering/Produktion looks as though it's trying to distinguish between repro production and, say, original painting... but I've seen plenty of seemingly arbitrary use so my opinion would be that it's best to just leave that information out. /Lokal_Profil 04:20, 6 March 2011 (UTC)[reply]
The missing data sets (mentioned in todo), are these cases where we have the images but not the metadata? /Lokal_Profil 04:22, 6 March 2011 (UTC)[reply]
Yes, I think so.
Your checklist looks quite complete, right now I don't have any missing issue to add :) I changed the template to {{Nordiska museet description}} in my script. Which description link do you think is not working? Should I hardcode the Swedish Avbildad person/Avbildad personer and Avbildad plats? --Prolineserver (talk) 08:12, 6 March 2011 (UTC)
The non working link was http://toolserver.org/~prolineserver/NordiskaMuseet/bot.st (above) which I now spotted should say http://toolserver.org/~prolineserver/NordiskaMuseet/bot.sh. Originally I thought that hardcoding would be better in the "original" section... but I no longer think it matters apart from if we would like all headings to be in bold font. On the other hand I don't think using the templates (inside original) gains us anything either. Is there any other information (which we have parsed into the template) which should also go into the "original" section?
After looking further at the "depicted" templates I still believe they should fall outside the {{Sv}} template. For the "place" I guess there could be the possibility (definitely not suggesting we do this) of using {{RelativeLocation}} or {{City}} in the future which would make it completely autotranslated. Similarly for "person" {{Name}} might be used (although it doesn't seem to do that much at this time) or one might link it to a page/category on Commons. At the same time I'd expect any link within {{Sv}} to go to the relevant sv.wiki page.
For the images missing metadata I think we could manually enter the relevant information from digitaltmuseum.se afterwards. Btw. if you find any visible parts of the various templates/pages/categories which we use heavily that lack a Swedish translation then give me a shout (or drop a note at "todo"). I'll put together a translation of {{Nordiska museet description}} and Commons:Nordiska museet/Error reports and also add two examples to the latter. Which reminds me, if I delete an image from the previously uploaded ones (it was wrongly labelled by NM) will that cause havoc and would it be better to do so after the whole batch is done? /Lokal_Profil 12:43, 8 March 2011 (UTC)
I made a new output file which can also be used as a c&p template for the already uploaded pictures. I moved the depicted person and depicted place in front of the sv template, and hardcoded the keywords, avbildad person and avbildad plats in the description template. I would not delete the files completely but maybe replace them by redirects. If you don't see any problem I will do another testrun tomorrow evening. --Prolineserver (talk) 20:37, 8 March 2011 (UTC)
Quick reply. The one file I thought of deleting was the one mentioned in Commons_talk:Nordiska museet/Error reports where the original was removed from NM. For the output file, it seems to be messing up accented or Swedish characters. Also from the output file it looks as though an additional linebreak has sneaked in after each "Avbildad plats:" entry. Apart from that it's all looking fine =) /Lokal_Profil 13:03, 9 March 2011 (UTC)
You need to change the character encoding of your browser from latin-1 to utf-8, which mediawiki is using, then it's working fine. --Prolineserver (talk) 21:28, 9 March 2011 (UTC)
Strange. The character encoding setting is right but it still looks wrong... anyways it's browser related, saving the file solves the issue. Translated {{Institution}} which looked like the only remaining part of the file description which didn't display in "?uselang=sv". /Lokal_Profil 12:42, 10 March 2011 (UTC)[reply]
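For anyone hitting the same thing, the symptom is what UTF-8 text looks like when it is read as latin-1. A small sketch (the helper names are made up, and this only illustrates the mismatch, not what the browser was actually doing):

```python
def misread_as_latin1(text):
    """Show what UTF-8 text looks like when decoded as latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled):
    """Undo the misreading, provided no bytes were lost on the way."""
    return garbled.encode("latin-1").decode("utf-8")


print(misread_as_latin1("Dubbelporträtt"))  # the "ä" turns into two characters
```

Since the file itself is fine, saving it and opening it as UTF-8 shows the Swedish characters correctly.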

next testrun...

I did the testrun, I couldn't see anything strange. --Prolineserver (talk) 21:56, 9 March 2011 (UTC)
All looks fine to me too. /Lokal_Profil 12:50, 10 March 2011 (UTC)
OK, then I'll do the complete upload in the next few days. I gave up hope of getting the bot flag at all, and will instead follow the suggestion of doing it without. --Prolineserver (talk) 15:24, 10 March 2011 (UTC)
As an interesting comparison: NMA.0052142 (on Commons) and NMA.0052142 (on europeana). Think our info parsing is a lot better =) One thing though. Is it worth keeping track of the "project" ("[Historikk] => Projekt: ..." in the xml)? If so, then to make it easiest for us at this point we could simply create a gallery page for each, rather than entering the info into the file page (this could always be done later if needed since they would all be used in the gallery page). Also this would allow us to make the decision on this separately from the upload process. The reason I came to think about it is that this is a grouping used on Flickr to give context to why a particular set of images was digitised. /Lokal_Profil 12:27, 11 March 2011 (UTC)

final upload


The upload is finished, and what I so far saw looks quite nice. I updated the Todo-List and added the Historikk-field as a Category and copied it into the note-field. There were just two different entries, one with 22 and one with more than 400 pictures. --Prolineserver (talk) 13:13, 12 March 2011 (UTC)[reply]

Looks nice =) Well done, saw that the bot flag came through in time as well =). Should discussions progress at Commons talk:Nordiska museet alternatively Commons talk:Nordiska museet/Todo instead of here? /Lokal_Profil 13:24, 12 March 2011 (UTC)[reply]
I don't know, I don't like to split discussions, but I guess if we move it to a discussion page the discussion may get a bit more ordered. --Prolineserver (talk) 13:57, 12 March 2011 (UTC)[reply]
Put a box in the top of this page saying "upload complete" further discussion at page X. And a box on page X saying earlier discussions took place at Commons:Batch uploading/Nordiska Museet? I guess Commons talk:Nordiska museet makes the most sense although Commons talk:Nordiska museet/Todo is probably the one where most discussions should take place in the short run. /Lokal_Profil 14:13, 12 March 2011 (UTC)[reply]
Could also merge those two (talk) pages to simplify things and keep it all in one place =) /Lokal_Profil 14:13, 12 March 2011 (UTC)[reply]
Spotted that the sort key of "checkbacksides" is broken (all the same). Shouldn't be a problem though since that category is temporary. Worth taking note of though so that we don't use that key for the latter backsides category. /Lokal_Profil 14:38, 12 March 2011 (UTC)[reply]
Oops, this last category is added outside the main loop, after I parsed all images. But it shouldn't be a big problem. Merging the discussion pages is a great idea. --Prolineserver (talk) 15:06, 12 March 2011 (UTC)

Filenames

Moved to Commons talk:Nordiska museet

Metadata

  • Identifikationsnummer: - id, should be used in source template for deeplink
  • Fotograf: - Photographer, author field, creator templates would be nice
  • Motiv: - Motif, as part of the description
  • Datering - Fotografering: - Date the photo was taken
  • Avbildad, ort: - Depicted, location, geolocation, coord
  • Motiv-ämnesord: - Motif, subject, could be categories
  • Avbildad - namn: - Depicted, name, could be a category
  • Tillverkningsort: - City of origin
  • Tillverkare av avbildat objekt: - Manufacturer of depicted object
  • Datering - Produktion: - Date of production
  • Historisk händelse, person - Historical event, person
  • Utsikt över: - View over
  • Tillverkare av kartong, - Manufacturer of box
  • Utgivare: - Publisher
  • Datering - Annat: - Date other
  • Modellnamn/titel: - Name of model, title
  • Reprofotograf, skanning: - Photographer of reproduction, scan

These are the fields. I made a small start of the matching. Will continue later. Feel free to help! Multichill (talk) 18:57, 2 December 2010 (UTC)[reply]
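As a starting point for the matching, the first few fields could be expressed as a lookup table. This is only a sketch: it assumes records arrive as "Label: value" lines, and the target names on the right are placeholders rather than actual template parameters.

```python
# Swedish field label -> placeholder target field (names are assumptions).
FIELD_MAP = {
    "Identifikationsnummer": "source",
    "Fotograf": "author",
    "Motiv": "description",
    "Datering - Fotografering": "date",
    "Avbildad, ort": "depicted place",
    "Avbildad - namn": "depicted people",
    "Motiv-ämnesord": "keywords",
}


def parse_record(lines):
    """Split 'Label: value' lines into mapped fields and unmatched leftovers."""
    mapped, unmatched = {}, {}
    for line in lines:
        label, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Label: value" line
        label, value = label.strip(), value.strip()
        if label in FIELD_MAP:
            mapped[FIELD_MAP[label]] = value
        else:
            unmatched[label] = value
    return mapped, unmatched
```

Keeping the unmatched labels separate makes it easy to see which of the remaining fields (Utgivare, Tillverkningsort and so on) still need a decision.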

I continued with the rest, but since I'm not sure about all of them I'll ask for help on sv:village pump. /Haxpett (talk) 20:09, 6 December 2010 (UTC)[reply]
Looks pretty good to me. --Ainali (talk) 22:48, 6 December 2010 (UTC)[reply]
I would like to have the word "kartong" in a context, to decide how to translate it! It can be "box", but it's IMO also a sort of paper. -- Lavallen (talk) 14:52, 7 December 2010 (UTC)[reply]
For the context of kartong look at [4]; you can also set the filter to 'foto' and 'Nordiska museet' for a better match (which generates a url containing [ and ], so linking to it didn't work). Anyhow the full sentence seems to be "Tillverkare av kartong, ritning, stamp m.m.: " and seems to be some sort of "other creator" category. E.g. in [5] it refers to the fashion designer of the clothes worn in the image, in [6] it refers to the architect, in [7] I think it refers to the designer of the furniture in the image. /Lokal_Profil 17:20, 8 January 2011 (UTC)
ämnesord would be keyword or category. It should be possible to match these to existing categories. /Lokal_Profil 18:08, 8 January 2011 (UTC)[reply]
Is it worth translating other keywords (from e.g. the "Datering - Fotografering:" field)? /Lokal_Profil 15:45, 22 January 2011 (UTC)[reply]
Added a first batch of category matchings to the /techniques talk page. /Lokal_Profil 13:58, 21 February 2011 (UTC)[reply]

Creators


These are the top creators in the collection I got. Should probably all have a creator template and a category to put the photographs in. Multichill (talk) 13:27, 19 December 2010 (UTC)[reply]

Looks fine to me. However, Mats Landin seems to be just the photographer of the reproductions. --Prolineserver (talk) 21:44, 22 December 2010 (UTC)
Same seems to be true for Hans Koegel (judging from [8]), and yes Okänd means unknown. /Lokal_Profil 17:04, 8 January 2011 (UTC)[reply]
| Field contents | Number | Template | Category to add | Remark |
|---|---|---|---|---|
| Düben, Lotten von | 54 | Creator:Lotten von Düben? | Category:Lotten von Düben? | |
| Koegel, Hans | 76 | Creator:Hans Koegel? | | |
| Landin, Mats | 215 | Creator:Mats Landin? | | repro photographer |
| Nilson, Severin | 18 | Creator:Severin Nilsson? | Category:Severin Nilsson? | |
| Okänd | 35 | {{Unknown\|author}} | none | That's unknown right? |
| Strindberg, August | 10 | Creator:August Strindberg? | Category:August Strindberg | |
| Wilhelmson, Carl | 11 | Creator:Carl Wilhelmson? | Category:Carl Wilhelmson | |

| Assigned to | Progress | Bot name | Category |
|---|---|---|---|
| Prolineserver | | NordiskaMuseetBot | Category:Images from Nordiska museet/check |