Help:Splitting and joining PDF, DjVu and images

From Wikimedia Commons, the free media repository

This page explains how to split and join (merge) PDF, DjVu and TIFF files, which may be required as part of a conversion process.

Enhancing the command line

Splitting and joining are usually done with command-line tools. The standard Windows command shell is sufficient for the task, but some programs make it more convenient to use. One that has been well received is ConEmu.

Splitting/joining PDF

Using command line tools

One of the command line tools that can split and join PDF files (among other features) is Coherent PDF (cpdf).

Splitting:

  • Specifying the page range: cpdf input.pdf 1-10,20,30,100-end -o output.pdf produces a file containing the specified pages: 1 to 10, then 20 and 30, then every page from 100 to the end.
  • Breaking the file into individual pages: cpdf -split input.pdf -o output%%%.pdf gives the files output001.pdf, output002.pdf etc.
  • Splitting on bookmarks:
    • cpdf -split-bookmarks 0 a.pdf -o out%%%.pdf breaks the file on the top-level bookmarks.
    • cpdf -split-bookmarks 1 a.pdf -o out%%%.pdf breaks the file on the top-level bookmarks and also on the first-level child bookmarks.
    • cpdf -split-bookmarks 0 a.pdf -o @B.pdf uses the bookmark text for the output file names.
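
When the list of pages comes from a script rather than being typed by hand, the page-range syntax from the first bullet above can be assembled programmatically. The following is a minimal Python sketch, not part of cpdf itself: the helper names page_range_spec and cpdf_extract_args are hypothetical, and the code only builds the argument list without running anything.

```python
from typing import Iterable, List, Tuple, Union

# A page item is either a single page number or a (start, end) pair.
# The string "end" stands for the last page, as in the example above.
PageItem = Union[int, Tuple[int, Union[int, str]]]


def page_range_spec(items: Iterable[PageItem]) -> str:
    """Render page items as a range string such as '1-10,20,30,100-end'."""
    parts = []
    for item in items:
        if isinstance(item, tuple):
            start, end = item
            parts.append(f"{start}-{end}")
        else:
            parts.append(str(item))
    return ",".join(parts)


def cpdf_extract_args(src: str, items: Iterable[PageItem], dst: str) -> List[str]:
    """Build the argument list for: cpdf <src> <range> -o <dst> (hypothetical helper)."""
    return ["cpdf", src, page_range_spec(items), "-o", dst]


args = cpdf_extract_args("input.pdf", [(1, 10), 20, 30, (100, "end")], "output.pdf")
print(" ".join(args))  # prints: cpdf input.pdf 1-10,20,30,100-end -o output.pdf
```

Assuming cpdf is installed and on the PATH, the list could then be passed to subprocess.run(args, check=True). Passing a list rather than a single string avoids quoting problems with file names that contain spaces.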

Joining:

  • cpdf input1.pdf input2.pdf [...] -o output.pdf
  • All in the current directory: cpdf *.pdf -o output.pdf

Doing a combined operation:

  • cpdf input1.pdf 1-10 input2.pdf 1,5,10 -o output.pdf will create a file with pages 1-10 from input1.pdf and pages 1, 5 and 10 from input2.pdf.
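
The combined form above follows a regular pattern: each input file is optionally followed by its page range, and the output comes last. A small Python sketch of that pattern is shown here; the function name cpdf_merge_args is hypothetical and the code only constructs the command, it does not execute it.

```python
from typing import List, Optional, Sequence, Tuple


def cpdf_merge_args(inputs: Sequence[Tuple[str, Optional[str]]], dst: str) -> List[str]:
    """Build: cpdf <file1> [<range1>] <file2> [<range2>] ... -o <dst> (hypothetical helper)."""
    args = ["cpdf"]
    for path, page_range in inputs:
        args.append(path)
        if page_range:  # None or "" means: take the whole file
            args.append(page_range)
    return args + ["-o", dst]


merge_args = cpdf_merge_args(
    [("input1.pdf", "1-10"), ("input2.pdf", "1,5,10")],
    "output.pdf",
)
print(" ".join(merge_args))  # prints: cpdf input1.pdf 1-10 input2.pdf 1,5,10 -o output.pdf
```

Using None as the range for an entry reproduces the plain joining form, where whole files are concatenated.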

More details are in the manual.

Splitting using virtual printers

A PDF virtual printer is a piece of software that installs itself as a printer, which appears on the list of printers in the Print dialog box. When 'printing' with that printer, the result is saved as a PDF file on your computer.

PDF-XChange Lite Printer is one example of a free virtual PDF printer; there are many others.

To split a PDF with a virtual printer, open the document in any PDF reader, choose the virtual printer in the Print dialog, and specify the page range(s) or page number(s) that you want in the output.

At the binary level, the resulting PDF file is not an exact copy of the original pages, because the virtual printer re-encodes them in its own way. Depending on the algorithm it uses, the output file may differ from the original in quality and size.

Splitting/joining DjVu

Splitting and joining DjVu files is done with the djvm and djvmcvt tools from the DjVuLibre package. They are less flexible than cpdf. Joining (merging) is done with djvm; its help text is largely self-explanatory:

DjVu multipage document manipulation utility

Usage:
To compose a multipage document:
      djvm -c[reate] <doc.djvu> <page_1.djvu> ... <page_n.djvu>
      where <doc.djvu> is the name of the BUNDLED document to be
      created, <page_n.djvu> are the names of the page files to
      be packed together.

To insert a new page into an existing document:
      djvm -i[nsert] <doc.djvu> <page.djvu> [<page_num>]
      where <doc.djvu> is the name of the BUNDLED DjVu document to be
      modified, <page.djvu> is the name of the single-page DjVu document
      file to be inserted as page <page_num> (page numbers start from 1).
      Negative or omitted <page_num> means to append the page.
      <page.djvu> can be another multipage DjVu document, in which case
      all pages of that document will be inserted into <doc.djvu>
      starting at page <page_num>

To delete a page from an existing document:
      djvm -d[elete] <doc.djvu> <page_num>
      where <doc.djvu> is the name of the document to be modified
      and <page_num> is the number of the page to be deleted

To list document contents:
      djvm -l[ist] <doc.djvu>

For example, to join all the DjVu files in the current directory, type djvm -c book.djvu *.djvu
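
With a wildcard, the page order depends on how the file names sort. Assuming the pages are numbered without zero padding (page1.djvu, page2.djvu, ... page10.djvu), a plain alphabetical listing puts page10 before page2. The Python sketch below shows one way to build the djvm -c command with the pages in numeric order and with the output file excluded from its own inputs. The helper names natural_key and djvm_create_args and the sample file names are hypothetical; nothing is executed.

```python
import re
from typing import Iterable, List


def natural_key(name: str) -> list:
    """Split a name into text and number chunks so 'page10' sorts after 'page2'."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]


def djvm_create_args(output: str, page_files: Iterable[str]) -> List[str]:
    """Build: djvm -c <output> <page_1> ... <page_n>, pages in natural order."""
    # Skip the output name in case an earlier run left it in the same directory.
    pages = sorted((p for p in page_files if p != output), key=natural_key)
    return ["djvm", "-c", output] + pages


names = ["page10.djvu", "page2.djvu", "book.djvu", "page1.djvu"]
join_args = djvm_create_args("book.djvu", names)
print(" ".join(join_args))  # prints: djvm -c book.djvu page1.djvu page2.djvu page10.djvu
```

In real use the names would typically come from glob.glob("*.djvu"), and the resulting list could be handed to subprocess.run, assuming djvm is on the PATH.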

djvmcvt can be used to split a DjVu file into individual pages. From its help:

DjVu multipage document conversion utility

Usage:

To convert any DjVu multipage document into the new INDIRECT format:
          djvmcvt -i[ndirect] <doc_in.djvu> <dir_out> <idx_fname.djvu>
          where <dir_out> is the name of the output directory, and
          <idx_fname.djvu> is the name of the top-level document index file.

The <doc_in.djvu> specifies the document to be converted.

For example, djvmcvt -i input.djvu folder index.djvu will create a series of one-page files in the 'folder' directory. You can then select some of them and join them using djvm as described above.
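
The split-then-join workflow just described can be written down as two commands. The Python sketch below builds both argument lists without running them. The function name djvm_join_selected_args, the output name excerpt.djvu and the page file names are hypothetical; the actual names of the page files that djvmcvt writes depend on the input document, so check the contents of the output directory first.

```python
import os
from typing import List, Sequence


def djvm_join_selected_args(folder: str, selected: Sequence[str], output: str) -> List[str]:
    """Build: djvm -c <output> <folder>/<page> ... for the chosen page files."""
    pages = [os.path.join(folder, name) for name in selected]
    return ["djvm", "-c", output] + pages


# Step 1: split the document into one-page files (same command as in the text).
split_args = ["djvmcvt", "-i", "input.djvu", "folder", "index.djvu"]

# Step 2: join a hand-picked subset of the page files.
# These page names are placeholders, not what djvmcvt necessarily produces.
chosen = ["p0003.djvu", "p0004.djvu", "p0007.djvu"]
excerpt_args = djvm_join_selected_args("folder", chosen, "excerpt.djvu")

print(" ".join(split_args))
print(" ".join(excerpt_args))
```

The pages are joined in the order given in the list, so reordering the list reorders the pages in the resulting document.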

Splitting using virtual printers

A virtual printer can be used to print selected pages or page ranges of a DjVu book to PDF. This splits the document and converts it to PDF in a single step. Printing to DjVu format is also possible this way if you can find a DjVu virtual printer.

Splitting/joining TIFF

Multipage TIFF images can be split and joined too. Anyone familiar with the process is welcome to edit this page and contribute their knowledge.