De-archive Web Archives

One of the (welcome) additions to version 2 of Safari, included with Tiger, is the ability to save an entire Web page—text, images, and all—for offline viewing. You perform this task by viewing the desired Web page, choosing Save As from Safari’s File menu, and then choosing Web Archive from the Format pop-up menu in the Save dialog. You can open the resulting .webarchive file in Safari and it will (roughly) look as if you were viewing the page normally via the Internet.

This is a great feature; however, it has two downsides. The first is that these Web archives can be viewed only in Safari; you can’t open them in another browser. (An inconvenience if you, or someone to whom you send an archive, prefers to use a browser other than Safari; a show-stopper if you send an archive to someone using anything other than Tiger—including Windows.) The second is that if you ever need to get at any of the content of an archive—images or text, for example—you must use Safari to first open the archive, then grab the content from there. (See downside #1.)

An easy solution to both these restrictions, with a few caveats noted below, is Greg Weston’s free WebArchive Folderizer 1.2.2. Simply drag a .webarchive file into WebArchive Folderizer’s generic-looking window (or onto its application icon) and in just a few seconds the utility extracts the contents of the Web Archive into a folder (located in the same directory as the original .webarchive file). Inside that folder, the original hierarchy of the archive’s contents is preserved; you can easily grab any of the contents of the archive or, by finding the index.html file of the original Web page, open the archived page in any browser.

WebArchive Folderizer’s main window

A “folderized” version of a Web Archive of the CNN home page
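
For the curious, the same kind of extraction can be sketched in a few lines of Python. This is not how WebArchive Folderizer itself is implemented (its source isn't discussed here); it is a minimal sketch that assumes a .webarchive file is a property list with a WebMainResource entry and an optional WebSubresources list, each resource carrying WebResourceURL and WebResourceData. The function names folderize and local_path are hypothetical, and the sketch ignores query strings, nested frame archives, and name collisions.

```python
import plistlib
from pathlib import Path
from urllib.parse import urlparse


def local_path(url: str) -> Path:
    """Map a resource URL to a relative path that mirrors its host and path."""
    parts = urlparse(url)
    path = parts.path
    # A bare directory URL (such as a site's home page) becomes index.html.
    if not path or path.endswith("/"):
        path += "index.html"
    # Drop empty, "." and ".." segments so nothing escapes the output folder.
    segments = [s for s in path.split("/") if s not in ("", ".", "..")]
    return Path(parts.netloc or "unknown-host", *segments)


def folderize(archive: Path) -> Path:
    """Write every resource in the archive into a folder beside it."""
    with open(archive, "rb") as f:
        plist = plistlib.load(f)  # handles binary and XML property lists

    # "Page.webarchive" is extracted into a sibling folder named "Page".
    out_dir = archive.with_suffix("")

    # Assumed layout: one main resource plus an optional list of subresources.
    resources = [plist["WebMainResource"]] + plist.get("WebSubresources", [])
    for res in resources:
        target = out_dir / local_path(res["WebResourceURL"])
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(res["WebResourceData"])
    return out_dir
```

Calling folderize(Path("Example.webarchive")) on a hypothetical archive would create an Example folder next to it, with one subfolder per host. Note that the links inside the saved HTML are left untouched, which is one reason an extracted page may not render exactly like the original.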

Unfortunately, WebArchive Folderizer can’t perform magic. Because of the way some sites are coded, you can’t always open the “folderized” archive via the index.html file contained inside, and even when you can, the result sometimes doesn’t look exactly like the original. Similarly, on some Web pages, media isn’t available in the folderized archive because Safari didn’t include it in the original archive. Still, WebArchive Folderizer has already come in handy for me several times.

WebArchive Folderizer works with Mac OS X 10.2 and higher. (Although Tiger’s version of Safari is required to create a Web Archive, you can use WebArchive Folderizer on any Mac running 10.2 or later to extract content from that Web Archive.)