Accurate optical character recognition (OCR) is difficult to achieve. An OCR program must not only decipher text printed in different fonts, sizes, and alphabets, and convert it to editable text, but also distinguish between text, graphics, and tables. Thanks to a new OCR engine, IRIS’s Readiris Pro 9 largely meets these challenges.
New and Improved
Since our last review of the program (October 2002), Readiris has adopted some features of ScanSoft’s OmniPage Pro X (June 2002; currently incompatible with OS X 10.2 and later). It can now import PDF files, even read-only ones. It also offers a one-button recognition mode that scans a document; maps out text, graphic, and table zones; and saves the result in the output format you choose (text, RTF, HTML, or PDF). Like previous versions, Readiris Pro 9 recognizes 100 languages and their alphabets, including Cyrillic and Greek characters, plus four Asian add-ons. It provides rotate, deskew, contrast-adjustment, and despeckle tools for cleaning up imperfect scans and digital photos.
Readiris Pro 9 accurately recognizes text from a clean scan. In our tests with a scanned Reviews page from Macworld, a scanned product brochure, and two Apple specification sheets in PDF format, the program rarely misidentified common characters, though it did have trouble with symbols such as Ω, ©, and ±, and with fractions. Compared with an earlier version (Readiris Pro 7), which we put through the same tests, the program is now much better at recognizing graphic artifacts and text.
Though recognition is accurate, the spelling checker is overly vigilant: it flagged more than 100 characters in a one-page document, many of them correct, and you must click either the Learn or the Ignore button for each one. Still, you may want to endure its persistence, because doing so makes checking similar documents easier. As you check a document, you teach the program character patterns. It may ask whether the letters ll are really the numerals 11, for example; if so, you can train it to recognize those characters better in the future. You then save the learned characters to a dictionary file that you can apply to similar documents. We compiled a dictionary from one of our two Apple PDF spec sheets and then used it with the other. Without that dictionary, Readiris flagged 98 characters; with it, only 36. One inconvenience: you must manually load a dictionary file every time you open a new document.
Readiris is less accurate at identifying zones, the parts of a document to convert. It misidentified the color table in our Reviews page as a mix of graphic and text zones, and it tagged the image of a flat-panel iMac on a specification sheet as text. You can delete incorrect zones, but only all of them at once or one at a time; you can’t select an area of the document and delete just the zones within it.
The program converts some documents better than others. Readiris created a fine RTF file from our Reviews page, which opened in Word intact except for the table, and it correctly recognized the accented characters in a French novel. But when we saved the Apple specification sheets as HTML and opened the files in Safari and Internet Explorer, the text columns overlapped, and we had to scale down the font sizes to make the pages readable.
Macworld’s Buying Advice
Readiris’s new OCR engine is an admirable improvement. Now that IRIS has dispensed with the tough stuff, we look forward to better autozoning and a spelling checker that has greater faith in the program’s judgment.