OCR Software

Although the paperless office still eludes most of us, the dawn of cheap scanners and capacious disk drives should let you recycle at least a few piles of clutter. Scanning documents to disk, even in color, is easy, particularly if you have a sheet-fed scanner geared to high-volume input. However, if you want to work with these documents later (searching their content, excerpting text, or repurposing the original), you'll need to convert the text and graphics into an editable form, such as an MS Word or Excel file.

Two OCR packages promise to do this for you: I.R.I.S.'s Readiris Pro 6.05 and ScanSoft's OmniPage Pro 8.01. Each has its strong points, but their prices differ wildly: $99 and $499, respectively. Yet both earned the same 3.5-mouse Macworld rating, a sign that, at least with OCR software, one size does not fit all.

Both products have a lot in common: support for virtually any scanner through either standard TWAIN or Adobe Photoshop plug-in drivers; the ability to capture color graphics; automatic formatting of columns and tables; foreign-language recognition; and automatic multipage input. The differences in these features are modest. OmniPage's multipage input is more sophisticated than Readiris Pro's, but Readiris' foreign-language recognition covers 56 languages, including some written in non-Roman alphabets such as Cyrillic, versus OmniPage's eleven European languages. OmniPage Pro was slightly faster and more accurate than Readiris at scanning and recognition.

The packages differ considerably, however, in ease of use and control. Readiris provides a straightforward, intuitive user interface that's almost impossible to use incorrectly. You click three buttons in sequence: scan and format, choose output options, and OCR the page. The program's main window shows the page you're working on, overlaid with colored boundary boxes and arrows that indicate how the automatic formatting divides the page into text and graphics. You can manually adjust these boxes and change the order in which they are processed. Output options are plain text, RTF (Rich Text Format, supported by most word processors), and HTML, and you can choose to retain some or all of the formatting. During the OCR step, Readiris prompts you when it can't identify cryptic bits of text, displaying the problem text in context and learning as it goes. Readiris does a remarkable job of reconstructing even complex documents with embedded graphics and tables, which is a good thing, because you have few options for tweaking the program's behavior.

OmniPage Pro, on the other hand, has a complex interface with lots of bells and whistles. A thumbnail view helps when you work on multiple pages at the same time, and you can save output in a huge range of document formats, including MS Word and Excel. Dozens of options give you extensive control over OmniPage's behavior, and time-saving shortcuts help you move documents quickly through a high-volume production environment. For instance, you can train OmniPage Pro on specific document types, such as bills or forms, and maintain separate dictionaries and memorized patterns for each type, minimizing time-consuming corrections. A Verify Text command lets you select any output text and display a miniclip of the scanned image that produced it. Although OmniPage's default mode doesn't render complex documents as faithfully as Readiris does, OmniPage's controls, used well, can produce superior results.

At a Glance: OmniPage Pro 8.01
  • Macworld Rating: 3.5 mice
  • Pros:
    • Good batch processing
    • Sophisticated user interface
    • Extensive control over recognition process
    • Time-saving shortcuts
  • Cons:
    • High-quality output requires some manual tweaking
    • Expensive
