My wife needed some pdfs sliced and diced to put on a work-related website, so I ended up learning a bit about the PDF Toolkit (pdftk) and ghostscript.
My HP printer has a rather nice webscan app, which scans over the network directly into a pdf on the computer. However, the scans average around 1 MB a page, which is far too big for a website. So we need to squish them down a bit first, using a ghostscript invocation I found online:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen \
   -dNOPAUSE -dQUIET -dBATCH -sOutputFile=1a.pdf 1.pdf
This will usually reduce the size to less than 100 KB.
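If there is a whole folder of scans to squish, the same invocation can be wrapped in a couple of shell functions. This is only a sketch: the names shrink_pdf and shrink_all and the -small suffix are my own inventions for illustration, and the gs flags are exactly the ones shown above.

```shell
# shrink_pdf IN OUT: rewrite IN at screen quality into OUT.
# (shrink_pdf is a made-up helper name, not part of ghostscript.)
shrink_pdf() {
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen \
       -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$2" "$1"
}

# shrink_all DIR: shrink every .pdf in DIR, writing NAME-small.pdf beside it.
shrink_all() {
    for f in "$1"/*.pdf; do
        [ -e "$f" ] || continue                      # directory has no pdfs
        case "$f" in *-small.pdf) continue ;; esac   # skip earlier output
        shrink_pdf "$f" "${f%.pdf}-small.pdf"
    done
}
```

Running shrink_all ~/scans would then leave the originals alone and put a shrunken copy next to each one. If the /screen setting makes the text too rough to read, /ebook is the next quality level up, at the cost of a bigger file.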
We can combine individual pdfs using pdftk:
pdftk first.pdf second.pdf cat output combined.pdf
and conversely we can split an existing pdf into individual pages by using:
pdftk mybig.pdf burst
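Both operations have slightly more selective forms, which can be sketched as shell functions. The function names pick_pages and burst_into and the page_%03d.pdf pattern are hypothetical choices of mine; the handle syntax (A=file, then A2-5) and the output pattern for burst are standard pdftk features.

```shell
# pick_pages IN RANGE OUT: copy only the pages in RANGE (e.g. 2-5) of IN to OUT.
# pick_pages and burst_into are made-up names for this sketch.
pick_pages() {
    # "A" is a pdftk handle for the input file; A2-5 means pages 2 to 5 of A.
    pdftk A="$1" cat "A$2" output "$3"
}

# burst_into IN DIR: split IN into DIR/page_001.pdf, DIR/page_002.pdf, ...
burst_into() {
    mkdir -p "$2" && pdftk "$1" burst output "$2/page_%03d.pdf"
}
```

For example, pick_pages mybig.pdf 2-5 middle.pdf pulls out just pages 2 to 5, and burst_into mybig.pdf pages puts one file per page into a pages directory. Without an output pattern, burst names the pages pg_0001.pdf, pg_0002.pdf and so on in the current directory, and it also leaves behind a doc_data.txt file describing the original.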
You can open a pdf in the GIMP for touching up (e.g. removing the darkest bits from a bad scan if you don't have the original to rescan), and then save it out as a PostScript file. After that, you can convert the ps file to a pdf by running:
ps2pdf mypostscript.ps mynew.pdf
Theoretically, pdfs should orientate themselves correctly without any help. If you have some existing scanned pages that are in portrait format when they should be landscape, and you can't get them to show up in the correct orientation, you can open them in the GIMP, correct the orientation there, save as PostScript, and then use ps2pdf to create the pdf. Occasionally, however, when you open the reoriented landscape file, it appears in portrait mode, with the right-hand half of the page disappearing off the edge of the sheet. You can fix this by using another ghostscript invocation directly on the PostScript file:
gs -dBATCH -dNOPAUSE -sOutputFile=mylandscape.pdf -sDEVICE=pdfwrite \
   -c '<< /PageSize [792 612] >> setpagedevice' -f mywonky.ps
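The two numbers in PageSize are the page width and height in points, at 72 points to the inch, so 792 by 612 is US Letter (11 by 8.5 inches) turned on its side. For other paper sizes the invocation can be parameterised, as in this sketch. The function name force_page_size is hypothetical, and I am assuming the width and height are given in points.

```shell
# force_page_size PS OUT WIDTH HEIGHT: convert PS to the pdf OUT, forcing
# the page to WIDTH x HEIGHT points (72 points = 1 inch).
# (force_page_size is a made-up name for this sketch.)
force_page_size() {
    gs -dBATCH -dNOPAUSE -sOutputFile="$2" -sDEVICE=pdfwrite \
       -c "<< /PageSize [$3 $4] >> setpagedevice" -f "$1"
}
```

A4 is 595 by 842 points in portrait, so for a landscape A4 scan the call would be force_page_size mywonky.ps mylandscape.pdf 842 595.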