September 6, 2011

PDF hacks

I have updated this post numerous times since it was originally written, to document newer techniques for my increasingly regular needs. It now includes a table of contents to jump to the topic of interest without having to read the entire post.

Combining folder-wise pdf files (Jan 28, 14)

For revision control, each drawing is produced and archived as its own file. In the end, the files are archived in a folder named after their respective offshore platform. I received such a cache today.

The pain of referring to such drawings is obvious once one starts double-clicking to open numerous one-page pdf files, all just to look up some detail. To avoid cluttering my desktop with endless instances of Acrobat Reader windows, I started combining these drawings platform-wise (i.e., folder-wise), so I could just open one file per offshore platform containing all associated drawings.

Processing a handful of platforms by hand proved tiring, as I kept moving in and out of directories using cd, naming each combined file after its platform, and then running Ghostscript at the command line. With over fifty platforms, I faced the prospect of spending an hour or more doing this manually. Instead, I wrote the following Python script to automate the grunt work (for use in my Cygwin environment).

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
'''
2014 ckunte
'''
import os
import platform

# Immediate subfolders of the current directory, one per offshore platform
folders = next(os.walk('.'))[1]
# Check the operating system once, before the loop. The loop variable is
# named folder so that it does not shadow the platform module.
on_mac = platform.system() == 'Darwin'
for folder in folders:
    msg = 'echo "Processing ' + folder + ' ..";'
    # Only run the combine command if changing directory succeeded
    cdr = 'cd ' + folder + ' && '
    if on_mac:
        cmd = '/System/Library/Automator/Combine\\ PDF\\ Pages.action/Contents/Resources/join.py -o' + folder + '-layout.pdf *.pdf;'
    else:
        cmd = 'gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=' + folder + '-layout.pdf *.pdf;'
    os.system(msg + cdr + cmd)

When run from the parent directory containing all the subfolders, the script gives each subfolder a new combined file containing all drawings of that platform, named after the platform (that is, the subfolder) with a -layout.pdf suffix.
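
The same loop can also be written without building shell strings, which sidesteps quoting problems with awkward folder names. What follows is a minimal sketch, not the script I actually ran: it assumes Python 3 and Ghostscript available on the PATH as gs, and the helper names gs_merge_command and combine_all are made up for illustration.

```python
import os
import subprocess


def gs_merge_command(folder, pdf_names):
    """Build the Ghostscript argument list that merges pdf_names into
    <folder>-layout.pdf, or return None if there is nothing to merge."""
    output = folder + '-layout.pdf'
    # Skip a previously combined file so that a rerun does not merge it into itself;
    # sort the rest so that the page order is predictable
    inputs = sorted(n for n in pdf_names if n != output)
    if not inputs:
        return None
    return ['gs', '-dBATCH', '-dNOPAUSE', '-q', '-sDEVICE=pdfwrite',
            '-sOutputFile=' + output] + inputs


def combine_all(parent='.'):
    """Run the merge once in every immediate subfolder of parent."""
    for folder in sorted(next(os.walk(parent))[1]):
        path = os.path.join(parent, folder)
        pdfs = [n for n in os.listdir(path) if n.lower().endswith('.pdf')]
        cmd = gs_merge_command(folder, pdfs)
        if cmd is None:
            continue
        print('Processing ' + folder + ' ..')
        # cwd= replaces the "cd" step; no shell is involved, so spaces are harmless
        subprocess.run(cmd, cwd=path, check=True)
```

Calling combine_all() from the parent directory should behave like the script above, with the difference that folder names are passed to Ghostscript as plain arguments rather than through a shell.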

Clean up file and folder names

Be aware that the script passes folder names to the shell unquoted, so folders with long names containing spaces or special characters will not be processed correctly. Such folders (and preferably files) need to be renamed into short, plain names first, which can be done using zmv as follows:

#!/usr/bin/env zsh
autoload -U zmv
# Safety first: always use the -n switch to preview changes; drop it to actually rename.

# Clean up file names in subdirectories, replacing special characters with underscores
zmv -n '(**/)(*)' '$1${2//[^A-Za-z0-9.-]/_}'

# Replace spaces in filenames and folders with a hyphen
zmv -n '* *' '$f:gs/ /-'
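
For anyone without zsh, the two substitutions can be previewed in Python. This is only a sketch of the character replacement itself, assuming the same character set as the first zmv pattern; the function names sanitize and hyphenate are hypothetical, and nothing here touches the file system.

```python
import re


def sanitize(name):
    """Replace every character other than letters, digits, dot and hyphen
    with an underscore, mirroring the first zmv pattern."""
    return re.sub(r'[^A-Za-z0-9.-]', '_', name)


def hyphenate(name):
    """Replace each space with a hyphen, mirroring the second zmv pattern."""
    return name.replace(' ', '-')


print(sanitize('Platform A (rev 2).pdf'))  # Platform_A__rev_2_.pdf
print(hyphenate('North Deck 01.pdf'))      # North-Deck-01.pdf
```

Printing the old and new names side by side before renaming plays the same role as the -n switch.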

My aforementioned case is unique in its folder structure and so on. If all one needs is a way to combine pdf files on a Mac, then here’s an Automator workflow. Hit Run.

Workflow in Automator


Watermark workflow (May 25, 14)

This is one of those things I do not (want to) remember, hence the workflow dump here. I use the following Automator workflow to mark all pages of my draft report as such. It uses an image of a draft stamp, which I position so that it is approximately centered (x = 230, y = 340 seems to work fine for A4 paper).

When the workflow is run, Automator asks for a pdf file to be watermarked.
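
The offsets above were found by trial, but they can be estimated with a little arithmetic. The sketch below assumes that the x and y values position the lower-left corner of the stamp and are measured in points, which I have not verified for Automator; the stamp size used in the example is hypothetical and chosen only for illustration.

```python
# A4 is 210 x 297 mm, which is about 595 x 842 points at 72 points per inch
A4_POINTS = (595, 842)


def centered_offset(page, stamp):
    """Return the (x, y) offset that centers a stamp of size (width, height)
    on a page of size (width, height), assuming the offset positions the
    stamp's lower-left corner."""
    return ((page[0] - stamp[0]) / 2, (page[1] - stamp[1]) / 2)


# Hypothetical stamp of 135 x 162 points
print(centered_offset(A4_POINTS, (135, 162)))  # (230.0, 340.0)
```

For a different paper size or stamp image, only the two size tuples need to change.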


Combining chapters with GS

My daughter brought home a Mathematics book yesterday, distributed by the school on day one of year seven. The book is essentially packaged as a CD-ROM. It’s generally well laid out and nice, except that it carries elaborate instructions on how it will not work in Preview (Mac OS X), followed by a note that it only works with Acrobat Reader. Things like these really bother me, and I am unable to explain in simple terms what the real fuss is about to my family, who just want to get it working by any means necessary.

I looked at the book’s structure and realized that it uses a simple web-like folder layout, with links from the table of contents pointing to chapter-wise pdf files. No DRM. Preview, obviously, does not open linked files, and I like Preview that way; we don’t need complex readers.

I booted up Ubuntu and looked for a Ghostscript recipe in my notes. We then sat down and renamed all the individual pdf files in the order of the table of contents, so that the book would be compiled exactly like the original. We then executed the following in Terminal:

gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOUTPUTFILE=math7.pdf *.pdf

A minute later, the book was reconstructed as a single file that opens in any reader, including Preview. My kid can now carry her book to school on her USB stick.
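
The renaming step matters because the shell expands *.pdf in alphabetical order, so the file names decide the page order. We renamed by hand, but the idea can be sketched as follows, assuming a list of chapter files already arranged in table-of-contents order; the function name numbered_names and the chapter names are hypothetical.

```python
def numbered_names(names_in_toc_order):
    """Prefix each name with a zero-padded sequence number so that
    alphabetical order matches the table of contents."""
    # Pad to at least two digits so that, for example, 10 sorts after 09
    width = max(2, len(str(len(names_in_toc_order))))
    return [str(i).zfill(width) + '-' + name
            for i, name in enumerate(names_in_toc_order, start=1)]


chapters = ['intro.pdf', 'algebra.pdf', 'geometry.pdf']
print(numbered_names(chapters))
# ['01-intro.pdf', '02-algebra.pdf', '03-geometry.pdf']
```

Without the zero padding, a file starting with 10 would sort before one starting with 2, and the chapters would come out in the wrong order.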


Compressing PDF files with GS (Dec 2017)

While looking for a way to compress my large pdf report, I stumbled upon this tip suggesting the use of Ghostscript to compress. It works great. I could shave 10MB (20%) off a large report! Here’s how:

#!/usr/bin/env sh
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile=file-compressed.pdf file.pdf
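
Ghostscript's -dPDFSETTINGS switch accepts a handful of presets (/screen, /ebook, /printer, /prepress and /default) that trade file size against image quality. A minimal sketch for trying them out and measuring the saving follows; it only builds the argument list for the same command as above, and the names compress_command and percent_saved are made up for illustration.

```python
PRESETS = ('screen', 'ebook', 'printer', 'prepress', 'default')


def compress_command(src, dst, preset='ebook'):
    """Build the Ghostscript argument list for compressing src into dst."""
    if preset not in PRESETS:
        raise ValueError('unknown preset: ' + preset)
    return ['gs', '-sDEVICE=pdfwrite', '-dCompatibilityLevel=1.4',
            '-dPDFSETTINGS=/' + preset, '-dNOPAUSE', '-dQUIET', '-dBATCH',
            '-sOutputFile=' + dst, src]


def percent_saved(size_before, size_after):
    """Return the size reduction as a percentage of the original size."""
    return 100.0 * (size_before - size_after) / size_before


# Illustrative sizes in MB: a 10 MB saving on a 50 MB file
print(percent_saved(50, 40))  # 20.0
```

The list returned by compress_command can be passed to subprocess.run, and os.path.getsize on the two files gives the sizes for percent_saved.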