Commons talk:Internet Archive/Book Images collection

From Wikimedia Commons, the free media repository

Copyright

Default copyright status for the images will presumably be {{PD-US-1923}}. Given that some of these books were presumably first published in Europe, that doesn't quite match Commons's charter; but given that the scans have been published in the USA, and Commons is very much US-based, it would seem a shame not to include them all. Jheald (talk) 13:27, 30 August 2014 (UTC)

Note that the dates recorded for items by the IA are not always 100% reliable (dates for periodicals in particular can be unreliable), so should be taken with a grain of salt and sanity-checked. Jheald (talk) 15:08, 30 August 2014 (UTC)

Google images

A sizeable fraction (~30%?) of books with scans at the IA are taken from Google Books, uploaded by Aaron Swartz. Google typically ramp the contrast right up, often making the image quality of any illustrations extremely poor. It's not clear how many of these are in the new IA collection (or indeed what its breakdown by original source institution is), but these are images we may or may not want to upload. Jheald (talk) 13:27, 30 August 2014 (UTC)

Alternate upload approaches

For file names I have been typically using something like Bk(date) pn.nnn - description.jpg, where Bk is some short identifier for the book, date its year of publication, pn.nnn its volume and page numbers, to give results like this category.
  • User:Metilsteiner has been using a similar approach, but doing more preparation work on his own computer, then using Upload Wizard to load images into a reception category in his own workspace.
Both approaches are highly time consuming.
This can upload very large numbers of images in an automated way. But it is useful to create metadata for the images first, so that it can be slotted straight into an {{Artwork}} template, including appropriate file names. So with this approach there is a lot of front-end work to do before one can even see the images, particularly for collections like this one where the metadata is initially quite weak (e.g. not even a machine-readable title or caption). Commons doesn't really take too kindly to a mass of undifferentiated images without even meaningful filenames being suddenly dumped into a dispersion category. We also need to make sure that books that have already been uploaded by hand aren't re-uploaded. And we may need to take care as to whether we really want to upload repetitive decorative fillers and other duplicate images. But in principle, with some development, this more automated approach may be a way to upload more of the collection more quickly with less manual input. Jheald (talk) 14:30, 30 August 2014 (UTC)
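As a rough illustration of the front-end work described above, the Python sketch below builds file names in the Bk(date) pn.nnn - description.jpg style and skips books that have already been uploaded by hand. It is not the upload tool discussed here; the PageImage record, the helper names, and the reading of "pn.nnn" as a literal "p", the volume number, a dot and a zero-padded three-digit page number are all assumptions for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class PageImage:
    # Hypothetical per-image record; fields are assumptions for illustration.
    book_id: str       # short identifier for the book, e.g. "Bk"
    year: int          # year of publication
    volume: int
    page: int
    description: str


def commons_filename(img: PageImage) -> str:
    """Build 'Bk(date) pn.nnn - description.jpg' (assumed reading of pn.nnn)."""
    # Drop characters MediaWiki does not allow in page titles: # < > [ ] | { }
    desc = re.sub(r"[#<>\[\]|{}]", "", img.description)
    desc = re.sub(r"\s+", " ", desc).strip()  # collapse leftover whitespace
    return f"{img.book_id}({img.year}) p{img.volume}.{img.page:03d} - {desc}.jpg"


def plan_uploads(images, already_uploaded):
    """Return file names to upload, skipping hand-uploaded books and repeats."""
    seen = set()
    names = []
    for img in images:
        if img.book_id in already_uploaded:
            continue  # book was already uploaded by hand
        name = commons_filename(img)
        if name in seen:
            continue  # same name generated twice: treat as a duplicate entry
        seen.add(name)
        names.append(name)
    return names
```

Checking against a set of already-uploaded book identifiers is the cheapest guard against re-uploads; catching repetitive decorative fillers with different descriptions would need something beyond name comparison, such as comparing image hashes.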

Notifications

Notifications cross-posted to

Jheald (talk) 18:25, 31 August 2014 (UTC)[reply]

Category

We have Category:Internet Archive Book Images and Category:Files from Internet Archive Book Images Flickr stream. I see no purpose in keeping both of them. If nobody objects, I will redirect the first one to the latter and move its files there. Razvan Socol (talk) 07:14, 4 September 2014 (UTC)

Hi @Rsocol: I have no strong feelings about the naming either way, but you're right: having two cats is definitely one cat too many. I'm happy enough with the latter name, and if anyone wants something else, it can always be moved later. Good catch.
BTW, I see you were doing some categorisation of these. I guess somebody should probably take a look at Special:LinkSearch to see if there are any more that should go in the cat. Jheald (talk) 13:04, 4 September 2014 (UTC)
e.g.:
Jheald (talk) 13:11, 4 September 2014 (UTC)
Done (using AWB). Razvan Socol (talk) 14:12, 4 September 2014 (UTC)

Dates for works are not accurate and many are post-1923

As I mentioned on Commons-l, many of the images in the Flickr stream are from periodical collections, but the date listed is the date of the beginning of the collection, not the date the image was actually published. For example, all the images from the Highland Echo collection are dated 1915 even though the collection spans 1915 to 1925. Please do not mass upload the entire Flickr stream. My suggestion would be to identify collections within the stream that are especially useful or interesting, and that are definitely pre-1923 and originally published in the US, then write a bot to import only those specific collections. Kaldari (talk) 08:33, 21 September 2014 (UTC)
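A minimal sketch of the kind of filter such a bot might apply, assuming each collection's metadata has first been enriched by hand with the last year it covers and its country of original publication. The Collection fields, the function names and the strict "before 1923" cutoff are assumptions for illustration; the Flickr stream itself only supplies the start date, as noted above.

```python
from dataclasses import dataclass
from typing import List, Optional

PD_CUTOFF = 1923  # keep only works published before this year


@dataclass
class Collection:
    # Hypothetical, hand-enriched metadata for one collection in the stream.
    name: str
    start_year: int           # the date the stream reports (start of the run)
    end_year: Optional[int]   # last year covered, if known
    country: str              # country of original publication


def safe_to_import(c: Collection) -> bool:
    # Unknown end year: we cannot confirm every item is pre-1923, so skip it.
    if c.end_year is None:
        return False
    # Judge by the END of the run, not the reported start date.
    return c.end_year < PD_CUTOFF and c.country == "US"


def select_collections(collections: List[Collection]) -> List[str]:
    """Names of collections that look definitely pre-1923 and US-published."""
    return [c.name for c in collections if safe_to_import(c)]
```

Keying the test on the end year rather than the start year is what excludes a run like the Highland Echo, which is dated 1915 but continues to 1925; collections with no known end year are skipped rather than guessed. Since IA dates can themselves be unreliable, the selected list would still need a manual sanity check.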