Last year, in an article entitled The Real Paperless Office, I explained my system for keeping my home office almost entirely paper-free, using nothing more than a scanner, its included OCR software, and some AppleScripts.
Since that article was published, though, some things have changed, requiring an update to the scripts I provided. In particular, Adobe released Acrobat 9, and I wanted to revise the scripts to be intelligent enough to use that new version, or a previous version, without forcing users to choose among different scripts or perform manual editing. Based on reader feedback, I also wanted to build in some additional error checking.
About the Scripts
If you’re using Acrobat Standard, Acrobat Pro, or Readiris Pro for OCR, the easiest way to automate the process of OCR’ing new scans is with one of these AppleScripts.
Because Acrobat’s support for AppleScript is limited (and Readiris’s support for AppleScript is virtually nonexistent), these scripts use UI scripting for some tasks. That means instead of sending commands directly to the applications in the background, they must make the application believe that menu commands have been chosen, buttons clicked, and so on. As a result, you can’t use your Mac for other tasks while these scripts run, because doing so may prevent the right controls from being visible to the script at the right time.
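For readers who haven’t seen UI scripting before, here is a minimal sketch of the idea. It is not taken from the downloadable scripts; the application name, process name, and menu titles are assumptions that differ between Acrobat versions, so treat it as an illustration rather than something to paste in unchanged.

```applescript
-- Sketch only. All three names below are assumptions; check them
-- against the version of Acrobat you actually have installed.
set appName to "Adobe Acrobat Pro"
set processName to "Acrobat"
set ocrMenuItem to "Recognize Text Using OCR..."

-- UI scripting acts on what is on screen, so bring the application forward first.
tell application appName to activate
delay 1 -- give the application a moment to become frontmost

tell application "System Events"
	tell process processName
		-- Simulate choosing a command from a submenu of the Document menu,
		-- exactly as if a user had picked it with the mouse.
		click menu item ocrMenuItem of menu 1 of menu item "OCR Text Recognition" of menu 1 of menu bar item "Document" of menu bar 1
	end tell
end tell
```

Because the click is aimed at a visible menu, anything that changes what is frontmost while the script runs (switching applications, for instance) can make it miss its target.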
In order for UI scripting to work, you must enable access for assistive devices. To do so, go to the Universal Access pane of System Preferences and make sure “Enable access for assistive devices” is checked at the bottom of the window. The updated scripts check for this setting and alert you if it’s incorrect.
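A check of this kind can be sketched in a few lines. The version below is an assumption about how such a test might look, not a copy of what the updated scripts do; it relies on the `UI elements enabled` property of System Events, and the alert wording is invented for illustration.

```applescript
-- Ask System Events whether access for assistive devices is turned on.
tell application "System Events"
	set uiScriptingEnabled to UI elements enabled
end tell

if not uiScriptingEnabled then
	-- Hypothetical alert text; the real scripts may word this differently.
	display dialog "UI scripting is turned off. Please check “Enable access for assistive devices” in the Universal Access pane of System Preferences." buttons {"OK"} default button "OK"
	
	-- Open the Universal Access pane so the setting is easy to find.
	tell application "System Preferences"
		activate
		set current pane to pane id "com.apple.preference.universalaccess"
	end tell
end if
```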
Installing the Scripts
To start, download and unzip this archive. Copy the three scripts it contains into /Library/Scripts/Folder Action Scripts.
OCR This (Acrobat).scpt causes Acrobat to recognize the text in PDF documents, save the file (with the existing name, in the existing location), and close it, with no need for interaction at all.
OCR This (Acrobat) with Save As.scpt causes Acrobat to recognize the text and then prompt you to enter a name and select a location; after saving the file, the script then instructs Acrobat to close the window. (There may be a very brief delay before the window closes.) Note that with this script, the original file remains in the folder to which you’ve attached the folder action (see below); you can later delete it manually if you wish.
Both scripts have been updated to work with Acrobat Standard version 7 and Acrobat Pro versions 7, 8, and 9, without requiring any editing. (Adobe Reader doesn’t have OCR capabilities, alas, so you must use either the Standard or Pro version.) If you have more than one version of Acrobat Standard or Acrobat Pro installed, the script automatically uses the newest version.
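How the scripts find the newest installed version isn’t spelled out here, but the selection step itself can be illustrated with a short sketch. The handler name `newestOf`, the record labels `appName` and `appVersion`, and the sample entries are all hypothetical; a real script would first have to build the list by looking for installed copies of Acrobat.

```applescript
-- Return the record with the highest version string from a list of
-- records shaped like {appName:"...", appVersion:"..."} (hypothetical labels).
on newestOf(candidates)
	set best to missing value
	-- "considering numeric strings" compares runs of digits as numbers,
	-- so "10.0" is treated as greater than "9.0".
	considering numeric strings
		repeat with c in candidates
			if best is missing value or (appVersion of c) > (appVersion of best) then
				set best to contents of c
			end if
		end repeat
	end considering
	return best
end newestOf

-- Made-up sample data standing in for whatever is actually installed.
set installed to {{appName:"Acrobat Standard 7", appVersion:"7.0.9"}, {appName:"Acrobat Pro 9", appVersion:"9.0.0"}, {appName:"Acrobat Pro 8", appVersion:"8.1.2"}}

set chosen to newestOf(installed)
display dialog "Would use: " & (appName of chosen)
```

If the list is empty, the handler returns `missing value`, which a caller would need to treat as “no supported version of Acrobat found.”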
Before you can use either of these scripts, you must configure Acrobat’s OCR settings as described in The Real Paperless Office. Note that in Acrobat Pro 9, a new option is available in the Recognize Text – Settings dialog box. In lieu of my earlier recommendation to choose Searchable Image (Exact) from the PDF Output Style pop-up menu, you can opt for ClearScan, which may reduce file size by embedding one or more synthesized fonts in your document that approximate the look of its existing fonts, while storing a lower-resolution version of the scanned image itself. If you’re unsure which you might prefer, try duplicating a scanned PDF and performing the text recognition with both settings, and then open the resulting files to see how they look.
As the name implies, OCR This (Readiris Pro).scpt works with Readiris Pro. It’s been tested with version 11.6.3; I can’t guarantee how well it will work with earlier or newer versions.
To use this script, you need to set up Readiris. In the Settings: Document Type menu, make sure Text is checked; if not, select it. Then, choose Settings: Text Format and, from the Format pop-up menu at the top of the window, choose PDF. From the pop-up menu next to it, choose Image-Text. Uncheck Embed Fonts and Create Bookmarks, and check Ask File Name and Location. Leaving the other settings as they are, click on OK. Finally, choose Settings: Save as Default. (That way, these settings should stick when you use Readiris Pro again.)
To implement these scripts, right-click (Control-click) on the folder you’ve designated to hold new scans and, from the contextual menu, choose More: Enable Folder Actions. Right-click again and choose More: Attach A Folder Action. In the window that appears, navigate to the AppleScript file you want to use, select it, and click on Choose.
Thereafter, whenever you scan a new document and it appears in this folder, the AppleScript will activate automatically, opening the scanned file in Acrobat or Readiris and activating the program’s OCR function. If you’re using Readiris, you’ll be prompted to enter a name and select a location. After you save the file, Readiris creates a new document (which clears all the existing scanned pages from its list).
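The mechanism behind this is the standard folder action handler, which the system calls whenever items are added to a folder that has a script attached. The skeleton below is a sketch of that structure only; the `processScan` handler is a hypothetical placeholder, and simply opening the file stands in for the OCR steps the real scripts perform.

```applescript
-- Called by the system each time items are added to the attached folder.
on adding folder items to thisFolder after receiving addedItems
	repeat with anItem in addedItems
		set scanFile to contents of anItem
		-- Only act on PDFs; AppleScript string comparison ignores case by default.
		if (POSIX path of scanFile) ends with ".pdf" then
			processScan(scanFile)
		end if
	end repeat
end adding folder items to

-- Hypothetical placeholder: opens the file in its default application.
-- The real scripts hand it to Acrobat or Readiris and start OCR instead.
on processScan(scanFile)
	tell application "Finder" to open scanFile
end processScan
```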
NOTE: If you happen to have any pages open in Readiris before running a script, the script will close them (so as to avoid adding extra pages to your PDFs). Therefore, before you do any scanning, make sure you’ve saved anything you were previously working on.