Commons:Extracting images from PDF

This page explains how to extract images from PDF files. Some PDF files store whole pages as images, while others embed individual images separately.

Extract PDF pages as images

Pages in a PDF file are often stored as images, for example in scanned books.

  • Use XPdf command line tools pdfimages, pdftopng, pdftoppm, pdftops or XPdf Reader (File->Save Image).
  • Use the freely available programs PDF-XChange Viewer (File -> Export -> Export to Image) or STDU Viewer (File -> Export -> to image).
  • To SVG: use pdf2svg (available on Linux) to convert the PDF to SVG when the entire file should be used as an image, e.g., if it is a diagram generated by some program.
  • PDFCreator can export PDF in several bitmap formats.
  • ImageMagick's convert can split a PDF into single images of pages; it's free.
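
As a rough sketch of the command-line route, the Python snippet below builds and runs a pdftoppm call that renders each page to an image. It assumes pdftoppm is installed and on the PATH; the function names, the 300 ppi default and the file names are illustrative choices, not part of any tool. Only the -r (resolution), -f (first page) and -l (last page) options are used, so the output is in PPM format and may still need converting.

```python
import shutil
import subprocess


def build_pdftoppm_command(pdf_path, output_prefix, dpi=300,
                           first_page=None, last_page=None):
    """Return the argument list for rendering PDF pages with pdftoppm."""
    cmd = ["pdftoppm", "-r", str(dpi)]  # -r sets the rendering resolution
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to render
    if last_page is not None:
        cmd += ["-l", str(last_page)]   # last page to render
    cmd += [str(pdf_path), str(output_prefix)]
    return cmd


def render_pages(pdf_path, output_prefix, **options):
    """Run pdftoppm; raises if the tool is missing or exits with an error."""
    if shutil.which("pdftoppm") is None:
        raise RuntimeError("pdftoppm was not found on PATH")
    cmd = build_pdftoppm_command(pdf_path, output_prefix, **options)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_pdftoppm_command("scan.pdf", "page", dpi=300, first_page=1, last_page=3))
```

Rendering re-rasterizes the whole page at the chosen resolution, so this suits scanned pages; it is not a lossless extraction of embedded images.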

Extracting images from PDF

PDF files can contain images that are actually at a higher resolution than the “100%” size of the document. Possible ways to extract images from PDFs include:

  • CropTool (https://croptool.toolforge.org/) can extract and upload images from PDFs already uploaded to Commons
  • Semadox PDF Image Extractor – free online image extractor produces png images that should be converted to jpg for sharp display on our projects per phab:T192744
  • pdfimages command-line tool in the poppler-utils (and prior xpdf) package.
    • Use the -j option to losslessly extract JPEG-compressed images, or -all to losslessly extract all images in their original file type. For example: pdfimages -all '/path/to/your.pdf' ./output-filename-prefix
  • Nitro PDF has a function to pull all images out of a PDF file at full resolution, and you can choose the output format (jpg, png, etc). However, it won't work if the PDF is password-protected. Users can get a free 14-day trial of Nitro PDF Pro with no credit card required. Names, email address, and country are required.
  • Evince, the most common Linux PDF reader, simply lets you right-click on an image and save it.
  • PDF Candy can be used to extract images online. 50MB file size limit. The free web version presents jpg images (better for sharp display on our projects per phab:T192744). PDF Candy Desktop 2.94 for Windows is 138 MB and digitally signed; it extracts all images in that format to a subfolder by default, or you can specify a page range or another folder. The PRO web version is not necessary for this, but is US$6/mo or $48/yr billed monthly. The PRO desktop version is not necessary for this, but is US$99.
  • Get pieces via PrintScreen and stitch them together in Microsoft Paint, GIMP, or a similar third-party program.
  • GIMP can also open pages from a PDF as an image at the resolution you specify. This is not quite the same as extracting the images. GIMP provides no guidance on the ideal resolution for a given image, and it essentially renders the whole page before converting everything to an image. In short, it is equivalent to the screenshot approach, but less work.
  • Inkscape: when opening the PDF, deselect “Embed (all) images” in the opening dialog. All images are then automatically extracted (as PNG images) in the folder. You can also right-click a single image and choose “Extract Image…”.
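
The pdfimages call mentioned in the list above can also be scripted. The following is a minimal sketch, assuming the poppler-utils version of pdfimages is on the PATH; the function names and file names are hypothetical. It uses only the -all and -j options described above.

```python
import shutil
import subprocess


def build_pdfimages_command(pdf_path, output_prefix, jpeg_only=False):
    """Return the argument list for a lossless pdfimages extraction.

    jpeg_only=True uses -j (JPEG-compressed images kept as JPEG);
    otherwise -all keeps every image in its original file type.
    """
    mode = "-j" if jpeg_only else "-all"
    return ["pdfimages", mode, str(pdf_path), str(output_prefix)]


def extract_images(pdf_path, output_prefix, jpeg_only=False):
    """Run pdfimages; raises if the tool is missing or exits with an error."""
    if shutil.which("pdfimages") is None:
        raise RuntimeError("pdfimages was not found on PATH")
    cmd = build_pdfimages_command(pdf_path, output_prefix, jpeg_only)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(build_pdfimages_command("/path/to/your.pdf", "./output-filename-prefix"))
```

Passing the arguments as a list avoids shell quoting problems when the PDF path contains spaces.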


Resolution issues

Some PDF readers can tell you the resolution; for documents created using typical “print quality” settings, 300 ppi is probably the best guess. (Caveat: where the originals are between 300 and 450 ppi they’re often not downsampled to the 300 target, and moreover black-and-white “linework” images, one bit deep, are often kept at 1200 ppi or more.)
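
If a reader does not report the resolution, it can be estimated from an image's pixel width and the width it occupies on the page. The sketch below assumes the placed width is known in PDF points (1 point = 1/72 inch); the function name and the sample numbers are illustrative only.

```python
POINTS_PER_INCH = 72  # PDF user space defaults to 72 points per inch


def effective_ppi(pixel_width, placed_width_points):
    """Estimate pixels per inch of an image as placed on the page."""
    if placed_width_points <= 0:
        raise ValueError("placed width must be positive")
    placed_width_inches = placed_width_points / POINTS_PER_INCH
    return pixel_width / placed_width_inches


if __name__ == "__main__":
    # A 2550-pixel-wide scan filling a US Letter page width (612 points).
    print(round(effective_ppi(2550, 612)))
```

A value well above the resolution you would pick for rendering the page suggests the embedded image is worth extracting directly rather than rendering.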

Getting around password protection

If the PDF is password-protected to prevent modification or extraction of content, you may be able to get around that by extracting the page with Inkscape and saving it as an unprotected file. You can then open that file in Adobe Acrobat and pass the image to Photoshop, or open it in Nitro PDF and pass it to GIMP.