User talk:PeterKz

-- Wikimedia Commons Welcome (talk) 08:47, 8 February 2016 (UTC)

GWT XML tweaks

Hi, just taking a sample record:

<record>
  <source>https://data.kb.se/datasets/2014/10/suecia/8529114%2C1.tif/</source>
  <title>Conditorium Illustriss: et Exellentiss: Baronis ac D:ni Gustavi Soop S. R. M. Regnique Sveciæ Senatoris, in templo Askersundensi</title>
  <filename>8529114_1.tif</filename>
  <description>Conditorium Illustriss: et Exellentiss: Baronis ac D:ni Gustavi Soop S. R. M. Regnique Sveciæ Senatoris, in templo Askersundensi by Padtbrugge, Herman, 1656-1687.

{{Kungliga biblioteket image|libris-id=8529114}}
</description>
  <date></date>
</record>

There are a couple of small improvements to consider.

  1. Where dates are blank, it would be great to include a default range, in this case perhaps {{between|1600|1700}} or {{before|1687}} if it can be automated based on the life of the creator.
  2. Filenames on Commons have a few disallowed characters (see the manual at GWToolset), and they should be unique, not too long (i.e. < 200 characters) and easy to find. A naming scheme like "<book title> <page> <unique GLAM identity>" or "<title> (<GLAM ref> <unique GLAM id>)" normally works well. The GWT needs to see this in a single field in the XML in order to map it to the filename, as the tool does not concatenate fields well. I suggest adding a "Commons_filename" field with the naming you prefer, perhaps something like Conditorium Illustriss: et Exellentiss: Baronis ac D:ni Gustavi Soop S. R. M. Regnique Sveciæ Senatoris, in templo Askersundensi (NLS 8529114).
  3. GWT will identify the file type, so there is no need to add the extension. In fact, adding an extension may lead to the uploaded file having two extensions if the MIME type string is not identical.
  4. Please split the {{Kungliga biblioteket image}} credit template to its own "permissions" field, which is the parameter it would be mapped to on the Commons image page. It would be good to add {{CC0}} to this at the same time as an extra line.
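
The four points above could be handled in the harvesting script roughly as follows. This is only a sketch: the function names, the set of characters treated as unsafe, and the decision to drop those characters rather than substitute them are all assumptions, so check the GWToolset manual for the authoritative list. It also assumes the creator's life dates always close the descriptive sentence, as they do in the sample record.

```python
import re

# Assumed set of characters to keep out of file titles (not the official list).
UNSAFE = re.compile(r'[#<>\[\]|{}:/\\]')
MAX_LEN = 200  # the "< 200 characters" guideline from point 2
# Life dates such as "1656-1687." at the very end of the descriptive text.
LIFE_DATES = re.compile(r"(\d{4})-(\d{4})\.?\s*$")
CREDIT = re.compile(r"\{\{Kungliga biblioteket image\|[^}]*\}\}")


def commons_filename(title, glam_ref, glam_id):
    """Build "<title> (<GLAM ref> <unique GLAM id>)" with no file extension."""
    suffix = f" ({glam_ref} {glam_id})"
    clean = re.sub(r"\s+", " ", UNSAFE.sub("", title)).strip()
    # Shorten the title, never the identifier, so names stay unique.
    return clean[: MAX_LEN - 1 - len(suffix)].rstrip() + suffix


def default_date(date, description):
    """Keep a real date; otherwise fall back to a range template."""
    if date.strip():
        return date
    match = LIFE_DATES.search(description)
    if match:
        return "{{before|%s}}" % match.group(2)  # creator's year of death
    return "{{between|1600|1700}}"


def split_permission(description):
    """Move the credit template out of the description and add {{CC0}}."""
    match = CREDIT.search(description)
    if not match:
        return description.strip(), ""
    text = (description[: match.start()] + description[match.end():]).strip()
    return text, match.group(0) + "\n{{CC0}}"
```

Each returned value would then be written to its own XML field (for example "Commons_filename" and "permissions") so that the GWT can map it directly.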

I could adjust the XML by hand, but I presume you would like to get your scraping tool working in-house for later projects. If you want to be able to run the GWT yourself, take a look at the manual linked above. It's easy to start using it on the Commons beta site, and you can get the rights added to your account once you can show a project bureaucrat that you are unlikely to make any errors you can't fix yourself. If you are planning a future project, be aware that this may take a couple of weeks. :-) -- (talk) 12:15, 6 June 2016 (UTC)

Suecia antiqua

Two files failed to upload, with duplicate errors back from the API:

"message": "Duplicate media file: An identical media file already exists under the title \"File:Suecia antiqua (SELIBR 18036538)-1.tif\".\noriginal URL: https://data.kb.se/datasets/2014/10/suecia/18036728%2C1.tif/\nevaluated URL: https://data.kb.se/datasets/2014/10/suecia/18036728%2C1.tif/"
"timestamp": "2016-06-06T17:38:16Z",


"message": "Duplicate media file: An identical media file already exists under the title \"File:Suecia antiqua (SELIBR 18036537)-1.tif\".\noriginal URL: https://data.kb.se/datasets/2014/10/suecia/18036727%2C1.tif/\nevaluated URL: https://data.kb.se/datasets/2014/10/suecia/18036727%2C1.tif/"
"timestamp": "2016-06-06T17:38:04Z",

I suspect these are duplicated links at the source, so they are worth a double check. -- (talk) 18:56, 6 June 2016 (UTC)
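
One way to check this before upload is to group the source files by a hash of their content, since the errors above report files that are byte-for-byte identical under different URLs. The sketch below assumes the files have already been downloaded into memory; `find_identical` is a hypothetical helper name, and SHA-1 is used simply as a convenient content hash.

```python
import hashlib
from collections import defaultdict


def find_identical(files):
    """Group source URLs whose content is byte-for-byte identical.

    `files` maps each source URL to the file content as bytes.
    Returns a list of URL groups, one per set of duplicates.
    """
    by_hash = defaultdict(list)
    for url, data in files.items():
        by_hash[hashlib.sha1(data).hexdigest()].append(url)
    # Only hashes shared by more than one URL are duplicates.
    return [sorted(urls) for urls in by_hash.values() if len(urls) > 1]
```

Any group it returns points at records that carry the same image, which can then be compared against the catalogue entries by hand.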

I will have a look at those. Also, the image file pages say "author missing". I'll check if that data can be collected from the metadata harvesting as well. How difficult is it to add author info after upload? --PeterKz (talk) 07:30, 7 June 2016 (UTC)
Mass changes are relatively easy after upload, so long as there is a pattern to trigger changes; obviously it's better to have more information added pre-upload where possible. Two useful on-wiki tools for this are COM:VFC and Help:Cat-a-lot (which can apply to search results). As an example, I used Cat-a-lot to add Category:Erik Dahlbergh to all files in the upload category that matched "Dahlbergh, Erik" in the descriptive text (677 files, see search). VFC uses regular expression matching both for filtering a directory's contents and for matching and replacing text, so it's fairly powerful. A key drawback of VFC is that it's designed to edit a couple of hundred files, so selecting 1,500 to work on can be a drag as you have to scroll through them first. However, VFC can work on search results, so once you know how to search for exactly the files you want using the incategory and insource options, it's just the right sort of thing and saves writing a custom bot. Refer to mw:Help:CirrusSearch for a sophisticated guide to search options. -- (talk) 09:28, 7 June 2016 (UTC)
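
For anyone who wants to try a pattern offline before running it through VFC or Cat-a-lot, the "match a pattern, then change the page" idea can be sketched in a few lines. This is not how either tool is implemented; `add_category` is a hypothetical helper that works on page text held in a string.

```python
import re


def add_category(wikitext, pattern, category):
    """Append a category to page text that matches a regular expression."""
    if not re.search(pattern, wikitext):
        return wikitext  # no trigger pattern, leave the page alone
    tag = f"[[Category:{category}]]"
    if tag in wikitext:
        return wikitext  # already categorised, avoid a duplicate
    return wikitext.rstrip("\n") + "\n" + tag + "\n"
```

A matching search for the same set of files might look like `incategory:"<upload category>" insource:"Dahlbergh, Erik"`, where the category name is a placeholder to fill in.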

A note about colour profiles

Hi, there is a known issue with TIFFs: files may display with a purple/pink cast on Firefox, but not on Chrome. An example is File:Suecia antiqua (SELIBR 8464277)-1.tif. This is a known bug in Firefox, with no fix from Mozilla so far. If the colour profile is removed from the file, it will display correctly in any browser, but this is not an ideal solution as we would all prefer to keep the archive version digitally unchanged. If anyone raises this issue, please suggest that reliable viewing should be possible by swapping to any non-Firefox browser, or by downloading the file and viewing it in an editor such as Photoshop or GIMP. Refer to phab:T123210. -- (talk) 22:15, 6 June 2016 (UTC)