AppleScript versus Alameda County
I know this because I was interested in reading the ruling in the case of the University of California’s stadium project, which is being challenged by various Berkeley groups. (I’m a Cal football fan, so the subject of Memorial Stadium is near and dear to my heart. And I’ve been known to blog on the subject in my spare time, so I wanted to be up to speed on the ruling.)
Apparently the Alameda County courts are using technology so ancient that, rather than generate a PDF out of whatever computerized document system they used to generate the verdict, they printed out a copy, scanned it in, and posted the images straight from the scanner. (And then relied on a Java applet I couldn’t get working on my Mac to display it.)
Ridiculous. So I made a PDF of the whole thing, allowing me to read it easily online or off by using Apple’s Preview. I even made the text on the scanned-in pages searchable, so I could find out exactly which page of the 127-page ruling covered obscure topics like oak trees or the Alquist-Priolo Act. Here’s what I did.
Because I couldn’t find a link to the raw TIFF files (I found them later), I dug the URL format for the TIFFs out of the Java applet’s error messages. Then I conjured up a simple AppleScript that uses Interarchy (my FTP client of choice, though just about anything that downloads files from the Internet would do) to fetch every page:
set theURL to "http://apps.alameda.courts.ca.gov/fortecgi/fortecgi.exe/frte_DomainWebService101E1C7C1F2D1C2429242324177A24?servicename=DomainWebService&Pagename=Image&Action=21544273&parent=14539963&id="
tell application "Interarchy"
    repeat with thePage from 1 to 130
        set resultNumber to webfetch url (theURL & (thePage as string))
    end repeat
end tell
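If you don't have Interarchy, the same loop can be sketched in Python using only the standard library. This is a rough equivalent, not what I actually ran: the names page_url and fetch_pages and the page001-style file names are my own inventions for illustration, and the URL is simply the one from the script above (it may well no longer resolve).

```python
import urllib.request
from pathlib import Path

# Base URL copied from the AppleScript above; the page number is appended to it.
BASE_URL = (
    "http://apps.alameda.courts.ca.gov/fortecgi/fortecgi.exe/"
    "frte_DomainWebService101E1C7C1F2D1C2429242324177A24"
    "?servicename=DomainWebService&Pagename=Image"
    "&Action=21544273&parent=14539963&id="
)


def page_url(page, base=BASE_URL):
    """Build the URL for one page by appending its number to the base URL."""
    return f"{base}{page}"


def fetch_pages(first, last, dest_dir, fetch=None):
    """Download pages first..last (inclusive) into dest_dir, one file per page.

    `fetch` is any callable that takes a URL and returns bytes. It defaults
    to a plain urllib download and exists so the loop can be tried offline.
    """
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url) as response:
                return response.read()

    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for page in range(first, last + 1):
        # Hypothetical naming scheme: page001, page002, ... with no extension,
        # which mirrors the extension-less files the download left me with.
        target = dest / f"page{page:03d}"
        target.write_bytes(fetch(page_url(page)))
        saved.append(target)
    return saved
```

Calling fetch_pages(1, 130, "ruling") would mirror the 1-to-130 loop in the AppleScript, overshooting the 127-page ruling by a few pages just to be safe.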
Once that was done, I had a folder full of TIFFs. But because the file names didn’t end in .tif, my Mac didn’t know what they were. Easy. I called up Automator.
A simple Automator action (which I didn’t even save; I just ran it and then quit) added “.tif” to the end of the name of every file I downloaded. Presto!
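For anyone without Automator, the renaming step is only a few lines of Python. This is a sketch under my own assumptions rather than a copy of the Automator action: the function name add_tif_extension is made up, and unlike the Automator action, which renamed everything in the folder, it skips hidden files and anything that already has an extension.

```python
from pathlib import Path


def add_tif_extension(folder):
    """Append ".tif" to every extension-less, non-hidden file in folder.

    Returns the new paths in sorted order.
    """
    renamed = []
    # sorted() reads the whole directory listing before any rename happens.
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        if path.name.startswith(".") or path.suffix:
            # Leave hidden files (such as .DS_Store) and files that
            # already carry an extension alone.
            continue
        new_path = path.with_name(path.name + ".tif")
        path.rename(new_path)
        renamed.append(new_path)
    return renamed
```

Point it at the download folder, for example add_tif_extension("ruling"), and the Mac should then recognize the files as images.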
Then I launched Adobe Acrobat Professional (yes, I could have used Preview, but since I have Acrobat Pro at my beck and call, I thought I’d put it to use). I chose File: Create PDF: From Multiple Files, and dragged in all of my images. Once they were assembled into one big PDF, I chose Document: OCR Text Recognition: Recognize Text Using OCR. Acrobat straightened all the pages so they were perfectly aligned (someone in the Alameda County office was a sloppy scanner!) and embedded computer-readable text in each page, which makes the scanned-in document searchable.
When I was all done, I posted my PDF on the Internet and passed the link on to a few people. I believe you’ll find my PDF at the San Francisco Chronicle web site, and I know for a fact that my PDF is the one posted on the UC Berkeley news page.
Not bad for a half hour of work. If only the folks in Alameda County had done it themselves.