User:Fæ/Project list/natlibnz
Upload of photographs by Albert Percy Godber
By request of NearEMPTiness, 22 April 2018.
Example links:
- Previously uploaded image on Commons: File:Steam railway locomotive the 'Sandfly' on the Karekare beach tramway.jpg
- https://tiaki.natlib.govt.nz/#details=ecatalogue.43759 is the parent entry for the photographer
- https://tiaki.natlib.govt.nz/#details=ecatalogue.131065 is the source link to the single gallery page
- http://ndhadeliver.natlib.govt.nz/delivery/DeliveryManagerServlet?dps_pid=IE74026&dps_custom_att_1=emu is a redirect to the IIP viewer from the Access Digital Content button, giving the internal PID
- http://ndhadeliver.natlib.govt.nz/iipsrv?FIF=2013/05/02/ac_13/V1-FL16828137.jp2&CNT=1&SDS=0,90&JTL=5,349 is an individual tile, addressed using the internal file name
In the example, the iipsrv data shows a source image of 6,407 x 4,704 pixels (about 30 megapixels), while the current Commons image is effectively a thumbnail at 664 x 488 pixels.
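The tile example above can be related to the full image size with a little arithmetic. This is a minimal sketch, assuming (not confirmed from the server) that resolution levels are numbered from 0 for the smallest, that each lower level halves the width and height rounding up as JPEG 2000 does, and that the n in JTL=level,n counts tiles row by row from the top left. The function names are hypothetical; the CNT and SDS parameters are copied unchanged from the example tile URL.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounding up."""
    return -(-a // b)


def level_size(max_w: int, max_h: int, levels: int, level: int) -> tuple[int, int]:
    """Image size at a resolution level; level (levels - 1) is full size.

    Assumes each step down halves both dimensions, rounding up.
    """
    scale = 2 ** (levels - 1 - level)
    return ceil_div(max_w, scale), ceil_div(max_h, scale)


def tile_grid(width: int, height: int, tile_w: int, tile_h: int) -> tuple[int, int]:
    """Columns and rows of tiles covering the image; edge tiles may be partial."""
    return ceil_div(width, tile_w), ceil_div(height, tile_h)


def tile_position(index: int, cols: int) -> tuple[int, int]:
    """(column, row) of a tile index, assuming row-major numbering."""
    return index % cols, index // cols


def tile_url(fif: str, level: int, index: int) -> str:
    """Tile request modelled on the example URL; CNT and SDS are copied as-is."""
    return (
        "http://ndhadeliver.natlib.govt.nz/iipsrv"
        f"?FIF={fif}&CNT=1&SDS=0,90&JTL={level},{index}"
    )
```

For example, if the server reported 256 x 256 tiles, the full 6,407 x 4,704 image would need a 26 x 19 grid of 494 tiles, and tile 349 would sit in column 11, row 13. The real tile size and number of levels should be read from the info response rather than assumed.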
A PNG can be created with the Ophir online dezoomer, but it cannot parse the viewer page unless the call to the info file has first been found manually, in this case http://ndhadeliver.natlib.govt.nz/iipsrv?FIF=2013/05/02/ac_13/V1-FL16828137.jp2&obj=IIP,1.0&obj=Max-size&obj=Tile-size&obj=Resolution-number. Getting this far relies on mapping the external catalogue "IRN" to the internal PID (DAO), and then to the IIP filename of the master JPEG 2000 (jp2) scan. As there are a couple of thousand files, it is reasonable to do this as a batch, and the script may be reused for other parts of this archive.
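The IRN to PID to IIP filename mapping can be sketched as three small steps: build the viewer URL from the PID found in the EAD, pull the FIF path out of the viewer page, then build and parse the info request. This is a minimal sketch, not the script used for the upload. The regular expression assumes the viewer HTML contains the path as FIF=&lt;path&gt;.jp2, which has not been checked against the live page, and the sample reply in the tests assumes the usual IIP layout of one Name:value pair per line. Function names are hypothetical.

```python
import re
from typing import Optional

DELIVERY = "http://ndhadeliver.natlib.govt.nz/delivery/DeliveryManagerServlet"
IIPSRV = "http://ndhadeliver.natlib.govt.nz/iipsrv"

# Assumption: the viewer page mentions the master file as "FIF=<path>.jp2".
FIF_RE = re.compile(r"FIF=([\w./-]+\.jp2)")


def viewer_url(pid: str) -> str:
    """Access Digital Content URL for an internal PID (the EAD dao value)."""
    return f"{DELIVERY}?dps_pid={pid}&dps_custom_att_1=emu"


def find_fif(viewer_html: str) -> Optional[str]:
    """First JPEG 2000 path passed as FIF in the viewer page, or None."""
    match = FIF_RE.search(viewer_html)
    return match.group(1) if match else None


def info_url(fif: str) -> str:
    """Info request for max size, tile size and number of resolution levels."""
    return (
        f"{IIPSRV}?FIF={fif}&obj=IIP,1.0&obj=Max-size"
        "&obj=Tile-size&obj=Resolution-number"
    )


def parse_info(text: str) -> dict:
    """Parse 'Name:value' lines from the info reply into numbers."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    width, height = (int(v) for v in fields["Max-size"].split())
    tile_w, tile_h = (int(v) for v in fields["Tile-size"].split())
    return {
        "max_size": (width, height),
        "tile_size": (tile_w, tile_h),
        "levels": int(fields["Resolution-number"]),
    }
```

Fetching the two URLs (for example with urllib.request.urlopen) is left out here so that the parsing can be checked offline.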
Known errors
- The zoomable image may return a timeout message, and other requests may give gateway timeouts
- The server may be offline for maintenance and return a maintenance message
- EAD format links do not download in Chrome unless the browser headers behave like those of an application
- Error 403: some digital images are restricted to on-site viewing only, example
- Error 502 "Proxy Error": appears intermittently when downloading image tiles, though the read may succeed after one or more retries
- 'Josephine' repeatedly triggers the API error stashfailed on the WMF server, which reports it as a corrupt file after upload; the cause is unclear
- Error 500: this was not seen in the first 300 image uploads, but after that it became a persistent error at source (example, which uploaded here after multiple read attempts). The error is seen several times for each image, half the time resulting in a restart of the process, and appears to be a general throttling or degradation of the image service rather than something targeted at the requesting address. This may be an automatic response of the image server software. After approximately a week the error virtually disappeared, so it may have been an operational server issue.
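Because the 500 and 502 errors usually clear after one or more retries, each tile read can be wrapped in a retry loop. The sketch below uses only the standard library with exponential backoff; the set of status codes, the number of attempts and the delays are assumptions to tune, not values taken from the server. A 403 is raised immediately, since restricted images will not become available by retrying. The opener and sleep parameters exist only so the logic can be tested without a network.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# Assumed transient: 500, 502 "Proxy Error", 504 gateway timeout, and 503 on
# the guess that maintenance pages use it. 403 is deliberately absent.
TRANSIENT_CODES = {500, 502, 503, 504}


def fetch_with_retries(url: str, attempts: int = 5, delay: float = 2.0,
                       opener: Callable = urllib.request.urlopen,
                       sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Read url, retrying transient failures with exponential backoff.

    The last failure is re-raised once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            with opener(url, timeout=60) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code not in TRANSIENT_CODES or attempt == attempts - 1:
                raise
        except OSError:
            # Connection resets and socket timeouts (URLError is an OSError).
            if attempt == attempts - 1:
                raise
        sleep(delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```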
Technical
The image catalogue page offers an Encoded Archival Description (EAD) format, which is an XML version of the metadata. However, even when requested at the image level, it appears to return all items in the database for the parent collection, over 4 MB of text in this example case. Unfortunately, it does not link to the digital media file itself, though it does say whether a digital copy exists.
Example extracted EAD item record for IRN=131065:
<c01 level="item">
  <did>
    <unitid label="Reference Number">APG-0637-1/2-G</unitid>
    <unitid label="IRN">131065</unitid>
    <unittitle>Steam railway locomotive the "Sandfly" on the Karekare beach tramway.</unittitle>
    <unitdate certainty="circa" datechar="creation" type="inclusive">[ca 1915-1916]</unitdate>
    <physdesc>Dry plate glass negative 6.5 x 4.75 inches
      <genreform source="tgm">Negatives</genreform>
      <physfacet type="Inscription">Album page, Locomotive on the beach tram.</physfacet>
      <physfacet type="Orientation">Horizontal image</physfacet>
      <extent>1.00 b&amp;w original negative(s)</extent>
    </physdesc>
    <note>
      <p>Original print in Godber Album Vol 101, p 30 (PA1-o-195)</p>
    </note>
  </did>
  <scopecontent>
    <p>Steam railway locomotive "Sandfly" travelling along a trestle bridge on the Karekare beach tramway. Three men are on a wagon behind the engine and are named as Knutzen, Millington, Austin in the Godber Album (Vol 101, p 30). Millington and Knutzen are identified in other images in the album. Photographed by Albert Percy Godber between 1915 and 1916.</p>
  </scopecontent>
  <controlaccess>
    <subject source="lcsh">Locomotives</subject>
    <name role="Subject">Sandfly (Locomotive)</name>
    <name role="Subject">Knutzen, Hans Peter, 1863-1949</name>
    <name role="Subject">Millington, S, active 1915-1916</name>
    <geogname role="subject">Karekare</geogname>
  </controlaccess>
  <accessrestrict>
    <p>Partly restricted - Please use surrogate in place of original.</p>
  </accessrestrict>
  <altformavail>
    <head>Alternative Form of Material</head>
    <p>Digital Copy available</p>
    <p>File Print available (
      <num type="File Print">PFP-000824</num>). (
      <num type="File Print">PFP-000825</num>).
    </p>
  </altformavail>
  <dao>
    <daodesc>
      <p>IE74026</p>
    </daodesc>
  </dao>
</c01>
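A record like the one above can be pulled out of the collection-level EAD with the standard library. This sketch streams the file with xml.etree.ElementTree.iterparse, since the download covers the whole parent collection, and keeps only items whose dao element holds a PID, on the assumption that this marks a digital scan. It also assumes the EAD uses no XML namespace, as in the extracted record, and it does not fall back to the parent's unitdate when an item has none. The keys of the returned dictionaries are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Union


def _text(elem: ET.Element, path: str) -> str:
    """Stripped text at a relative path, or "" when missing."""
    return (elem.findtext(path) or "").strip()


def iter_items(source: Union[str, bytes, BinaryIO]) -> Iterator[dict]:
    """Yield item-level records that carry a digital object PID.

    source may be a file path, an open binary file, or raw bytes.
    """
    if isinstance(source, bytes):
        source = io.BytesIO(source)
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.get("level") != "item":
            continue
        record = None
        pid = _text(elem, "dao/daodesc/p")
        if pid:
            ids = {u.get("label"): (u.text or "").strip()
                   for u in elem.iterfind("did/unitid")}
            record = {
                "irn": ids.get("IRN"),
                "reference": ids.get("Reference Number"),
                "title": _text(elem, "did/unittitle"),
                "date": _text(elem, "did/unitdate"),
                "place": _text(elem, "controlaccess/geogname") or None,
                "pid": pid,
            }
        elem.clear()  # free the finished subtree; the full EAD is several MB
        if record:
            yield record
```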
The final file could be saved losslessly as either TIFF or PNG. As the PNGs are half the size, PNG is the chosen format; this also matches the way the Ophir dezoomer works.
The workflow looks like:
- Manually download the master EAD XML file
- Use the EAD to deduce which photographs have digital scans available
- Per image, deduce the catalogue URL and extract metadata for the image page
- Go to the IIP viewer, work out the server IIP filename and deduce the tile names
- Query the IIP info file, giving the maximum height and width available
- Use the IIP info to pull down the image tiles and stitch them with standard PIL calls, or let the Ophir tool do the assembly; save as PNG
- Upload the new image from the local file, then loop to the next item available from the EAD
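The stitching step in the workflow can be done with a few standard Pillow calls. In this sketch the tiles are assumed to arrive in row-major order (left to right, top to bottom), with edge tiles already cropped to the image boundary; the function names are hypothetical. The result is saved as PNG, the format chosen above.

```python
import io
from typing import Iterable

from PIL import Image


def decode_tile(data: bytes) -> Image.Image:
    """Decode one downloaded tile (e.g. a JPEG) into an RGB image."""
    return Image.open(io.BytesIO(data)).convert("RGB")


def stitch(tiles: Iterable[Image.Image], width: int, height: int,
           tile_w: int, tile_h: int) -> Image.Image:
    """Paste row-major tiles onto a width x height canvas."""
    cols = -(-width // tile_w)  # tiles per row, rounding up
    canvas = Image.new("RGB", (width, height))
    for index, tile in enumerate(tiles):
        col, row = index % cols, index // cols
        canvas.paste(tile, (col * tile_w, row * tile_h))
    return canvas


def save_png(image: Image.Image, path: str) -> None:
    """Write the stitched image losslessly as PNG."""
    image.save(path, format="PNG")
```

A full image would then be written with something like save_png(stitch(map(decode_tile, tile_bytes), width, height, tile_w, tile_h), path), where tile_bytes is the list of downloaded tiles in order.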
Metadata
Mapping to {{Information}} is intuitive, with a few interpretations:
- Date may not always exist at the item level, but can be found as unitdate in the Descriptive Identification (did) of either the item or its parent. The format is inconsistent, e.g. [1920], 1920?, [ca 1920], circa 1920
- IDs exist as:
  - IRN, the catalogue record number for the item
  - Reference Number, the original long item archive number, which may include forward slashes
  - dao, Digital Archival Object, which appears to be the unique digital image number. This would be critical if there were many images for an item, such as images of the negative as well as the print; however, for this collection there is one scan per photograph item
- geogname, Geographic Name, is an optional field for the place represented in the photograph
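The inconsistent date formats and the slashes in reference numbers can both be handled with small helpers before the file page is built. This sketch covers only the patterns listed above plus the range form seen in the example record ([ca 1915-1916]); anything else returns None for manual review. Square brackets are kept as a flag because, in catalogue practice, they mark a date supplied by the cataloguer. The file naming scheme (title plus reference number) is a hypothetical choice; slashes, colons and backslashes are replaced because MediaWiki does not accept them in file names by default.

```python
import re
from typing import Optional

# Patterns listed on this page: [1920], 1920?, [ca 1920], circa 1920,
# plus ranges such as [ca 1915-1916] from the example record.
DATE_RE = re.compile(
    r"^\s*\[?\s*(?P<circa>ca\.?|circa)?\s*"
    r"(?P<start>\d{4})(?:\s*-\s*(?P<end>\d{4}))?"
    r"\s*(?P<q1>\?)?\s*\]?\s*(?P<q2>\?)?\s*$",
    re.IGNORECASE,
)

# Characters MediaWiki rejects in file names by default.
ILLEGAL_FILE_CHARS = re.compile(r"[:/\\]")


def parse_unitdate(text: str) -> Optional[dict]:
    """Split an EAD unitdate into years and qualifiers, or None if unrecognised."""
    match = DATE_RE.match(text)
    if match is None:
        return None
    return {
        "start": int(match["start"]),
        "end": int(match["end"]) if match["end"] else None,
        "circa": match["circa"] is not None,
        "uncertain": bool(match["q1"] or match["q2"]),
        # Square brackets conventionally mark a date supplied by the cataloguer.
        "bracketed": "[" in text,
    }


def commons_filename(title: str, reference: str) -> str:
    """Hypothetical file name: title plus reference number, as PNG."""
    safe_title = ILLEGAL_FILE_CHARS.sub("-", title).rstrip(". ")
    safe_ref = ILLEGAL_FILE_CHARS.sub("-", reference)
    return f"{safe_title} ({safe_ref}).png"
```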
Copyright
{{PD-New Zealand}} applies to all photographs in this collection, based on the date each photograph was taken. {{PD-1996}} is added because the Wikimedia Foundation servers that host Wikimedia Commons are in the USA.
Upload
Uploaded PNG files appear in Category:Photographs by Albert Percy Godber, which is used as a bucket category. The upload comment includes a link to this project page. A report of images in this upload, excluding versions uploaded before the batch project, is at Petscan; the 28 pre-existing images are listed here, and GLAM dashboard reports are here.
The images are intensive in both local processing time and number of downloads, taking several minutes each. Downloaded images are of the order of 10 to 25 megabytes, up to a maximum of 60 megapixels.
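The {{Information}} mapping, licence templates, bucket category and upload comment described above can be assembled into file page wikitext with plain string formatting. This is a sketch under assumptions: the record dictionary layout, the wording of the upload comment and the choice to put the reference number in the source field are hypothetical, not taken from the actual upload.

```python
PROJECT_PAGE = "User:Fæ/Project list/natlibnz"
CATALOGUE_URL = "https://tiaki.natlib.govt.nz/#details=ecatalogue.{irn}"
CATEGORY = "Category:Photographs by Albert Percy Godber"

# Hypothetical wording; the point is that it links to the project page.
UPLOAD_COMMENT = f"Godber photographs batch upload, see [[{PROJECT_PAGE}]]"


def escape_pipes(text: str) -> str:
    """Stop a literal | in catalogue text from splitting template parameters."""
    return text.replace("|", "{{!}}")


def description_page(record: dict) -> str:
    """File page wikitext from a record with irn, reference, title and date keys."""
    source = CATALOGUE_URL.format(irn=record["irn"])
    return "\n".join([
        "{{Information",
        f"|description={escape_pipes(record['title'])}",
        f"|date={escape_pipes(record['date'])}",
        f"|source={source} (Reference Number {record['reference']})",
        "|author=Albert Percy Godber",
        "|permission=",
        "}}",
        "{{PD-New Zealand}}",
        "{{PD-1996}}",
        f"[[{CATEGORY}]]",
    ])
```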