Commons:Batch uploading/Rijksdienst voor het Cultureel Erfgoed
- Source to upload from: image bank RCE in Europeana
- Description: 550,000 images of monuments (buildings) in the Netherlands (3000 of them in other countries). 50-80% are probably Rijksmonuments. Around 30% are identified. Another part could be identified based on address information.
- license: CC-BY-SA-3.0-NL, see here. 1200x1200px release in OTRS 2012121010014322.
- Templates: There is a license template {{RCE-license}} and a template for linking to the database, {{RCE-source}}.
- More information:
- User:Basvb/Test: some thoughts on how the images could be processed after uploading (we need to find their Rijksmonument identifiers).
- Commons:Rijksdienst Cultureel Erfgoed
Opinions
Question: Did I understand it correctly: only images up to max. 800x800 px are under CC-BY-SA-3.0-NL? So we can only use images up to these dimensions? --Slick (talk) 11:48, 1 December 2012 (UTC)
- We are still figuring that out because it's unclear. It seems that all sizes are free, but downloading images larger than 800x800 pixels appears to incur download costs. The site states both that images up to 800x800 are available under a free license and that all images are free, one statement right after the other. For 800x800 it states that they can be downloaded freely; for other sizes it does not state this. Basvb (talk) 11:01, 2 December 2012 (UTC)
If you would like to download the full size, just look at the HTML code and analyse the requests made by the Flash viewer:
- Step 1) Find the numeric picture ID, e.g. in the URL: http://beeldbank.cultureelerfgoed.nl/20312817 -> ID=20312817
- Step 2) Get the URL http://beeldbank.cultureelerfgoed.nl/index.php?option=com_memorixbeeld&view=record&format=topviewxml&tstart=0&id=<ID>; you will get an XML output [1]. The values of interest are filepath and the layer with scalefactor=1:
... <filepath>39abc504-df68-c0ad-0c2b-b33296769b30.tjp</filepath> ... <layer no="5" starttile="45" cols="9" rows="12" scalefactor="1" width="2075" height="2880"/> ...
- Step 3) Read the layer line. Now you know that the picture is split into 9 cols and 12 rows, and that the starttile is 45. Download all cols*rows tiles, numbered from 45 up to 45+(cols*rows)-1, and join them together by cols and rows. To get a single tile use: http://images.memorix.nl/rce/getpic?<FILEPATH>&<TILENUMBER>, e.g. http://images.memorix.nl/rce/getpic?39abc504-df68-c0ad-0c2b-b33296769b30.tjp&102
All of this should be very easy to do with a script. To check that your joined image is fine, compare its size with the given width and height.
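A minimal Python sketch of steps 2 and 3, for illustration only. It parses the XML, picks the layer with scalefactor=1, and lists the tile URLs with their grid positions. Two things here are assumptions and not confirmed above: that there are exactly cols*rows tiles numbered consecutively from starttile, and that they run row by row (left to right, then top to bottom). The function names and the `<topview>` root element in the sample are made up.

```python
import xml.etree.ElementTree as ET

TILE_BASE = "http://images.memorix.nl/rce/getpic"  # URL pattern quoted in step 3


def parse_topview(xml_text):
    """Return (filepath, layer attributes) for the layer with scalefactor=1."""
    root = ET.fromstring(xml_text)
    filepath = root.findtext(".//filepath")
    for layer in root.iter("layer"):
        if layer.get("scalefactor") == "1":
            info = {key: int(layer.get(key)) for key in
                    ("starttile", "cols", "rows", "width", "height")}
            return filepath, info
    raise ValueError("no layer with scalefactor=1 found")


def tile_plan(filepath, info):
    """List (row, col, url) for every tile. Row-major numbering is an assumption."""
    plan = []
    for offset in range(info["cols"] * info["rows"]):
        row, col = divmod(offset, info["cols"])
        number = info["starttile"] + offset
        plan.append((row, col, f"{TILE_BASE}?{filepath}&{number}"))
    return plan


# Sample built from the XML fragment quoted above. The <topview> root element
# is a placeholder; the real document structure may differ.
SAMPLE = """<topview>
  <filepath>39abc504-df68-c0ad-0c2b-b33296769b30.tjp</filepath>
  <layer no="5" starttile="45" cols="9" rows="12" scalefactor="1"
         width="2075" height="2880"/>
</topview>"""

filepath, info = parse_topview(SAMPLE)
plan = tile_plan(filepath, info)
print(len(plan), plan[0][2], plan[-1][2])
```

Joining the downloaded tiles would still need an imaging library. The tile size is not stated above, so the paste offsets would have to be derived from the tiles themselves and the result checked against width and height.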
- We didn't know the Tile/col procedure but we did know easy ways to download 1600x1600px files (just change the links). Permissions are the problem there: we are trying to clear that up, but as it seems now we will only get permission to download the 800x800 (or maybe 1200x1200px) files. Basvb (talk) 12:54, 6 December 2012 (UTC)
- I got an explicit release up to 1200x1200px, I will forward this to OTRS. Basvb (talk) 20:34, 10 December 2012 (UTC)
Some notes from me:
- It's pretty straightforward to query the API. Priref values run from 20000000 up to some larger number [2]
- We have JSON output: http://cultureelerfgoed.adlibsoft.com/harvest/wwwopac.ashx?database=images&search=priref=20310001&output=json
- Fields: I made a start at Template:RCE data ingestion layout. It is far from complete.
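As a rough illustration of querying by priref, the Python sketch below builds the query URL in the same form as the example above and pulls the first record out of a decoded response. The helper names are made up, and the sample payload is trimmed from the JSON quoted on this page. How the API answers when nothing is found is not shown here, so the empty-result handling is a guess.

```python
import json
from urllib.parse import urlencode

API_BASE = "http://cultureelerfgoed.adlibsoft.com/harvest/wwwopac.ashx"


def record_url(priref):
    """Build the JSON query URL for one priref, mirroring the example URL."""
    query = urlencode(
        {"database": "images", "search": f"priref={priref}", "output": "json"},
        safe="=",  # keep "priref=..." readable, as in the example URL
    )
    return f"{API_BASE}?{query}"


def first_record(payload):
    """Return the first record of a decoded response, or None if there is none."""
    records = payload.get("adlibJSON", {}).get("recordList", {}).get("record", [])
    return records[0] if records else None


# Trimmed from the JSON samples quoted on this page.
SAMPLE = json.loads("""
{"adlibJSON": {"recordList": {"record": [
  {"priref": ["20000001"],
   "Monument": [{"monument.number": ["518303"],
                 "monument.place": ["Amsterdam"]}]}
]}}}
""")

record = first_record(SAMPLE)
print(record_url(20310001))
print(record["priref"][0], record["Monument"][0]["monument.number"][0])
```

Every value comes wrapped in a list, even when there is only one, so the `[0]` indexing is needed throughout.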
Some json used to play around:
{"adlibJSON": {"recordList": {"record": [ {"@attributes": {"priref":"80000109","created":"2011-04-05T19:07:04","modification":"2011-04-05T21:47:18","selected":"False"}, "Description": [ {"description": ["Schildering op de schoorsteenboezem in de Renzumaborg in Uithuizermeeden.\u000d\u000a- begane grond, linker achterkamer: landschap."] } ], "Monument": [ {"monument.complex_number":["515612"], "monument.geographical_keyword":[""], "monument.house_number":["3"], "monument.name":["Rensumaborg"], "monument.number":["21320"], "monument.number.x_coordinates":["6.71402490110"], "monument.number.y_coordinates":["53.41522222070"], "monument.place":["Uithuizermeeden"], "monument.province":["Groningen"], "monument.record_number":["279499"], "monument.street":["Rensumalaan"], "monument.type":[""], "monument.zipcode":["9982 BH"] } ], "object_number":["100109"], "priref":["80000109"], "Reproduction": [ {"reproduction.reference": ["d6071e44-eb0a-4bb0-f345-d0a311489ae6"] } ] } ] }, "diagnostic":{"hits":"1","xmltype":"Grouped","first_item":"1","search":"priref Equals 80000109","sort":"","limit":"1","hits_on_display":"1","response_time":"0","xml_creation_time":"15,6229","link_resolve_time":"15,6229","dbname":"collect","dsname":"","cgistring":"images"}}} {"adlibJSON": {"recordList": {"record": [{"@attributes":{"priref":"20000001","created":"2009-04-19T11:05:45","modification":"2012-10-12T15:25:58","selected":"False"}, "collection":["Fotocollectie"], "Content_subject":[{"content.subject":["Grachtenpand"]}], "creative_commons":[{"value":["RCE","CC-BY-SA","CC-BY-SA"]}], "Description":[{"description":["Exterieur, overzicht voorgevel pand Vrouwenverband"]}], "Monument": [ {"monument.complex_number":["518301"], "monument.geographical_keyword":[""], "monument.house_number":["15"], "monument.name":["Vrouwenverband"], "monument.number":["518303"], "monument.number.x_coordinates":["4.89397111487"], "monument.number.y_coordinates":["52.36897955310"], "monument.place":["Amsterdam"], 
"monument.province":["Noord-Holland"], "monument.record_number":["417272"], "monument.street":["Turfdraagsterpad"], "monument.type":[""], "monument.zipcode":["1012 XT"] } ], "object_number":["321.954"], "priref":["20000001"], "Production": [ {"creator": [ {"value":["Dukker, G.J."]} ], "creator.role":["Fotograaf"] } ], "Production_date":[{"production.date.start":["1998-07"]}], "Reproduction":[{"reproduction.reference":["d99c8594-4a8c-acf9-b498-6f3a0a4e5f4b"]}], "Rights":[{"rights.notes":["http:\/\/creativecommons.org\/licenses\/by-sa\/3.0\/"]}], "Technique":[{"technique":["zwart wit negatief"]}]}]},
Multichill (talk) 23:10, 16 December 2012 (UTC)
- User:Basvb/Current RCE images - List of images from the database not uploaded by the bot (these have to be checked for duplicates afterwards).
First test running
Created a lot of templates:
- {{RCE data ingestion layout}} does all the hard work
- {{Netherlands location Dutch}} to convert Dutch location names to the category names here: in use for provinces and ca 30 unique cities
- {{Possible Rijksmonument}} this might be a Rijksmonument
- {{Object location RD}} - We got a lot of coordinates, but in a different system.
- {{RCE-author}} - to get pretty creator templates
- {{RCE-subject}} - to convert Dutch topics into categories here
Looping over the images from priref 20000000. Only uploading images which have Rights_rights.notes==http://creativecommons.org/licenses/by-sa/3.0/. The data gets flattened from JSON into key/value pairs. Some fields occur more than once (see {{RCE data ingestion layout}}). Some of the fields available to normal users are not available in the API (for example municipality). What to do:
- Add more cases to {{Netherlands location Dutch}} based on discovered errors, or refine it to make use of the province information to determine the right location
- Make a good system to convert {{Possible Rijksmonument}} into {{Rijksmonument}} (maybe like {{Check categories}}?)
- Convert {{Object location RD}} to {{Object location}}
- Expand {{RCE-author}} and create/update creator templates
- Expand {{RCE-subject}} based on [3]
- Replace {{RCE-author}} and {{RCE-subject}} when ready (~/bin/rce-to-subst.sh)
- Manage the whole categorization effort
- Work on possibilities to identify images (semi-)automatically based on address or postal code
Multichill (talk) 22:17, 23 December 2012 (UTC)
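For readers wondering what the flattening and the license check could look like, here is a minimal Python sketch. It is not the bot's actual code. The only key name taken from the description above is Rights_rights.notes; joining nested names with "_" everywhere, collecting repeated fields into lists, and the function names are all assumptions.

```python
CC_BY_SA = "http://creativecommons.org/licenses/by-sa/3.0/"


def flatten(node, prefix="", out=None):
    """Flatten nested dicts/lists into {key: [values]}, joining names with "_"."""
    if out is None:
        out = {}
    if isinstance(node, dict):
        for key, value in node.items():
            flatten(value, f"{prefix}_{key}" if prefix else key, out)
    elif isinstance(node, list):
        for item in node:
            flatten(item, prefix, out)
    else:
        # Repeated fields accumulate in a list instead of overwriting each other.
        out.setdefault(prefix, []).append(node)
    return out


def is_uploadable(flat):
    """True if the flattened record carries the CC-BY-SA 3.0 rights note."""
    return CC_BY_SA in flat.get("Rights_rights.notes", [])


# Record trimmed from the second JSON sample on this page.
record = {
    "priref": ["20000001"],
    "creative_commons": [{"value": ["RCE", "CC-BY-SA", "CC-BY-SA"]}],
    "Monument": [{"monument.number": ["518303"],
                  "monument.place": ["Amsterdam"]}],
    "Rights": [{"rights.notes": [CC_BY_SA]}],
}

flat = flatten(record)
print(flat["Rights_rights.notes"], is_uploadable(flat))
```

Keeping every value in a list makes the fields that occur more than once easy to pass on to a template with numbered parameters. A record without a Rights group is simply skipped.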
Assigned to | Progress | Bot name | Category |
---|---|---|---|
Multichill, Basvb | Mostly done | BotMultichill |
Threat and deletion
In June 2014 some other users and I were threatened by the RCE to delete 5000 images "or else". The strong-arming worked and the images were deleted in July 2014. I don't respond kindly to threats, so this marked the end of this project. In the end about 465,000 images were uploaded. Multichill (talk) 12:30, 6 July 2014 (UTC)