User talk:DragonflySixtyseven/Old newspapers

From Wikimedia Commons, the free media repository

There are so many newspapers in the Chronicling America archive that, even limiting myself to the ones published pre-1924, I will probably never finish pillaging them by myself unless I give up on all other projects (like, for instance, Wikipedia). So here's my methodology, in case you'd like to help out.

  • First: Install GIMP. It's free. Then, open GIMP and your browser. I previously had a less efficient method which also required that I have the similarly-free ImageMagick installed; I suppose it can't hurt to keep that.
  • Go to the scanned page on Chronicling America. In the upper right corner is the option to get the page as text, pdf, or jp2. Copy the URL for the JP2.
  • In GIMP, click the File menu. One option is "open location". Select that, and give it the URL for the JP2 you just copied.
    • Sometimes Chronicling America links to the wrong .jp2, but you can get around that by manually editing the URL.
    • For OS X users, try sips, a built-in utility that can convert .jp2 to .jpeg. In the terminal, run:
sips -s format jpeg -s formatOptions 100 "IN_FILE.jp2" --out "OUT_FILE.jpg"
  • Once the page is open in GIMP, use the rectangle tool to select the desired image, then crop everything else out. Captions often carry important metadata about the image, so include them. Then export the image as a .jpg; this is done by simply giving the file a name that ends in .jpg . GIMP responds by offering an array of export options: set the 'Quality' slider to 100 (the default is 90), then open the 'Advanced Options' sub-box and deselect "Progressive". Then click 'Export'. If it doesn't save, make sure you've told GIMP what folder you want to save it to.
  • Upload the image to Commons. I usually do all the images from a given newspaper page in a single joint upload, assuming that they all have the same creator (i.e., 'unknown'); any for which the creator is known have to be done separately. The filenames, descriptions, and other metadata can take the longest; sometimes the original pictures had captions, which I transcribe even if I've included them in the .jpgs. (Note: transcribing can get boring; if you don't feel like doing it yourself, add {{transcribe here}} in the description.) When the upload wizard asks for "Source", I give it the URL of the original page on Chronicling America, and then add the newspaper's name in parentheses afterward (linking to the newspaper's Wikipedia article if one exists).
  • Annotate the images, where applicable, using the 'Add a note' function.
  • Once I've pillaged all the pages in a given issue of the newspaper, I add that issue to the list whose talk page this is.
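
The manual URL edit mentioned in the steps above can be sketched as a small shell function. This is only a sketch under an assumption that the steps do not state: that a page URL ends in ".../seq-N/" and the matching scan lives at ".../seq-N.jp2". The function name jp2_url is hypothetical, and the URL in the usage note is purely illustrative. If the site links to a different .jp2, check the result by eye before downloading.

```shell
#!/bin/sh
# Hypothetical helper: derive the JP2 URL from a Chronicling America page URL.
# ASSUMPTION: a page URL ends in ".../seq-N/" and the scan for that page
# is at ".../seq-N.jp2". Verify the result in your browser if in doubt.
jp2_url() {
    url=$1
    url=${url%%\#*}     # drop any "#..." fragment (e.g. search-term highlighting)
    url=${url%%\?*}     # drop any "?..." query string
    url=${url%/}        # drop one trailing slash, if present
    printf '%s.jp2\n' "$url"
}
```

For example, `jp2_url 'https://chroniclingamerica.loc.gov/lccn/sn83030214/1904-03-23/ed-1/seq-1/'` prints the same address with the trailing slash replaced by `.jp2`; that output can be pasted into GIMP's "open location" box or handed to a downloader such as curl.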

Important point: some images (in particular, images from advertisements) are used on multiple days. I usually only take the image from the first day it appears, but sometimes a much better version of the image appears later (case in point: File:Bicycle image 1904.jpg - compare the March 23, 1904 version to the March 27, 1904 version!). When that happens, I just upload the better version over the worse version.