
Extracting content from PDF files

I need to say it: PDF files are supposed to be the final format for content. You aren’t supposed to need to edit anything once it’s been turned into a PDF. In the real world, however, many people need, for a variety of reasons, to get material out of a PDF, either for reuse elsewhere or for further editing. Here’s how to get that content out of a PDF using Adobe Acrobat. Note that the instructions below apply to Adobe Acrobat Pro, not the free Adobe Reader.

Images

Assuming the PDF security allows content extraction, getting an image out of a PDF is actually very easy.

To extract one image at a time, choose the Select tool (it looks like an I-beam coupled with a black arrow). Click once on the desired image, which will highlight to show it’s been selected, and then right-click (Control-click with single-button mice) and choose either the Copy Image or Save Image As command. The former puts the image on the clipboard ready to paste into Word, Photoshop, or another application, while the latter saves the image as a separate file on your desktop or wherever you want to store it.

If you need to get a bunch of images out of a PDF, there’s a faster way. Go to File > Export > Image and choose your desired format: JPEG, JPEG 2000, PNG, or TIFF. If you expect to print these images, export them as TIFFs. Whichever format you choose, you’ll be asked where on your computer to save the files. Before you click Save, note the Settings button: it opens a dialog where you can specify the image quality along with other options relevant to your chosen file format, such as color management and resolution.
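
If you'd rather script a batch job than click through menus, the idea can be roughed out in a few lines of Python. This is a minimal sketch and not what Acrobat does internally: it assumes an unencrypted PDF whose images are stored as plain JPEG streams (marked /DCTDecode in the file), and the names extract_jpegs and save_jpegs are made up for illustration. It uses only the Python standard library.

```python
import re
from pathlib import Path

# Matches one indirect object: "<num> <gen> obj ... endobj".
OBJECT_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.DOTALL)
# Splits an object body into its dictionary (everything before "stream")
# and the raw stream bytes (everything up to "endstream").
STREAM_RE = re.compile(rb"(.*?)stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)


def extract_jpegs(pdf_bytes: bytes) -> list[bytes]:
    """Return the raw bytes of every plain JPEG (DCTDecode) stream found."""
    images = []
    for obj in OBJECT_RE.finditer(pdf_bytes):
        match = STREAM_RE.match(obj.group(1))
        if match is None:
            continue  # this object carries no stream
        header, data = match.groups()
        # Keep only streams declared as JPEG that really start with the
        # JPEG start-of-image marker; anything else is skipped.
        if b"/DCTDecode" in header and data.startswith(b"\xff\xd8"):
            images.append(data)
    return images


def save_jpegs(pdf_path: str, out_dir: str) -> list[Path]:
    """Write each extracted JPEG to out_dir as image-001.jpg, image-002.jpg, ..."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    jpegs = extract_jpegs(Path(pdf_path).read_bytes())
    for index, data in enumerate(jpegs, start=1):
        target = out / f"image-{index:03d}.jpg"
        target.write_bytes(data)
        written.append(target)
    return written
```

Calling save_jpegs("brochure.pdf", "extracted") (both names are placeholders) would write one .jpg file per JPEG stream it finds. Images stored any other way, such as Flate-compressed bitmaps, are silently skipped, so for anything beyond a quick experiment Acrobat's export command or a dedicated PDF library is the safer route.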


Text

Exporting text is even easier than exporting images. Simply go to File > Export and select your desired export format: Microsoft Word, XML, HTML, plain text, or Rich Text Format, which is the vanilla format understood by Word, WordPerfect, OpenOffice, and nearly every other word processor of the last ten years.

When you export images from a PDF, they look exactly as they do in the PDF. That isn’t always the case with text. Sometimes formatting gets lost in translation between Acrobat and, say, Microsoft Word. That’s because PDF was designed to be a final format for documents; its content was never meant to be extracted for reuse elsewhere.

On Wednesday, we'll talk about how to add, remove, and swap pages in your PDF file. Stay tuned.

Pariah S. Burke is the author of Mastering InDesign CS3 for Print Design and Production (Sybex, 2007), and other books; a freelance graphic designer; and the publisher of the Web sites GurusUnleashed.com, WorkflowFreelance.com, and CreativesAre.com. Pariah lives in Portland, Ore.

This article was updated to clarify the version of Adobe Acrobat referenced in the story.
