Back up Entourage with AppleScript

The way Microsoft Entourage and Apple's Time Machine are designed puts them on a collision course. Entourage's mail database is a single large file. Time Machine backs up files, and can't look inside them or back up the changes within a file. The end result: when even a single new message pops into your Entourage database, Time Machine must back it up in its entirety, rapidly filling up your backup drive.

To be fair, Entourage is not the only application with this issue. Photoshop, Illustrator, Final Cut Studio, Premiere Pro, FileMaker Pro, and many others create huge files that may only change slightly, yet still have to be backed up in their entirety if you're doing a file-level backup. And when you're talking about video, Entourage's database is tiny indeed by comparison.

But the fact that other applications have the same problem doesn't make backing up Entourage any less of one. However, there's a point that many overlook: There's a difference between the Entourage Database itself and the data in that database.

If you want the data, well, you have quite a few options, most of which don't require third-party products. If we assume everyone using Entourage is at least on a current version of Mac OS X 10.4 or Mac OS X 10.5, then there are only three potential problem areas: e-mail, links, and attachments. (Due to changes made in Entourage 2008's AppleScript implementation, we're going to concentrate on that version.)

Let me warn you: this can be pretty tricky stuff. If you're a brand-new Entourage user, or new to the Mac, you will probably want to have a friend who is more comfortable with both Entourage and AppleScript to help you. But even if it seems overwhelming, if you take your time with it, proceeding methodically and carefully, you'll be fine.

Here's what we'll be doing:

  • In Entourage, enabling Sync Services for Contacts, Calendars/Tasks, and Notes.
  • Setting up a rule in Entourage that classifies all incoming mail that isn't junk as "New e-mail."
  • Setting up a schedule that runs an "Export new e-mail" script periodically.

It looks simple, and if you know your way around Entourage, it really is. But rather than just say "do this, do that, and life will be good", I'm going to try to give you some of the why behind the how and the what. Again, you don't have to be an AppleScript or Entourage genius to use the information in this article, but if you are not comfortable with rules, complex schedules, or using (not creating) AppleScripts, then you should have someone who is more comfortable with those things help you out.

Use Sync Services

With Mac OS X 10.4, Apple introduced the Sync Services framework. This took the original idea of iSync, and expanded it from a single application to an OS-level framework that anyone can tie into. With Sync Services, the idea is that rather than trying to get every contact application to directly fiddle with Address Book's database or iCal's files, or Mail's settings, they just share data via a central management system and repository.

Sync Services helps you back up Entourage data using Time Machine, but only indirectly. Here's how: if you set up Sync Services to synchronize your Entourage address book, calendar, and task data with Address Book and iCal, your data will be copied from Entourage's database and imported into Address Book and iCal's native file formats, namely individual files that work better with Time Machine's one-file-at-a-time backup strategy. Sync Services synchronizes data all the time, in the background. (In my experience, it takes about 20 seconds for Sync Services to propagate data and/or changes.)

Entourage even synchronizes its Notes out via Sync Services, although Apple doesn't, as yet, have an application that synchronizes Notes. Luckily other applications do, such as Bare Bones Software's Yojimbo.

If you want offsite backup of Sync Services items, .Mac is an excellent option, and it's one I use regularly. Just enable the correct sync options in .Mac in System Preferences, and sync away. Presto! Offsite backup! Does using Sync Services in this fashion get you the same level of backup as backing up the physical database? No, it doesn't, but it gets you pretty close, and brings Time Machine to the party.

First steps: Contacts, calendar, tasks

In Entourage's Preferences (available from Microsoft Entourage: Preferences, or Command-comma), select Sync Services, and you'll see Entourage's Sync Services settings, as seen below:

Sync Services settings

You can choose to synchronize everything, as I do, or whatever combination of the three options you like. For Address Book and Calendar, you can also opt to synchronize either your local Entourage Contacts and Calendar/Task info, or the Exchange versions (if you have an Exchange account). From what I've seen, tying Exchange information to Sync Services is more problematic than using local information, so if you are going to use Exchange data with Sync Services, be careful. Along those lines, while you can turn on Sync Services for all three at once, I highly recommend you don't, just to make dealing with any potential problems easier. Finally, even though it would seem that you're only synchronizing with .Mac, fear not. You're actually talking to Sync Services. .Mac only gets involved if you have a .Mac account and set up that synchronization separately.

My recommendation is to enable synchronization one item at a time. First Notes, then Calendar, then Address Book. Wait a day or so between each one, and make sure you have each one running smoothly before enabling the next one. Before you enable syncing, back up your data via Entourage's export function, and the other applications' methods, as backups are a great way to get yourself out of a jam.

When you enable Sync Services for the first time, and click "OK" in the preferences, you'll be asked how you want to start syncing. Your options are:

  1. Have Entourage and Sync Services combine data
  2. Have Entourage overwrite Sync Services data
  3. Have Sync Services data overwrite Entourage

Of the three, the second and third options are the simplest and safest. The obvious problem is that if you have data in, say, both Address Book and Entourage, picking the second or third option will lead to data loss. The first option is the most complex, but I've used it, along with the others, and while you have to do more cleanup with this option, if you have a lot of data in both programs, it may be the better choice.

If the data in Sync Services (Address Book and iCal) is exactly the same as the data in Entourage, then pick either the second or third option, since that Sync Services data is a duplicate, for the most part. But if you have a lot of unique data in both, then you should use the first option.

This is not going to create perfect synchronizations. For example, Entourage categories don't carry across well. In Address Book, they tend to show up as groups, although this is imperfect. In iCal, all your Entourage calendar data gets jammed into one calendar named Entourage. This is because of the differences between Entourage's one-calendar-with-categories approach and iCal's multiple-calendars-no-categories approach.

Yes, it's possible that down the road Entourage may create a separate calendar per category, but that might be less of an improvement than you'd think. I have 58 categories in Entourage that I use with great effectiveness, but 58 separate calendars in iCal would be an unreadable mess.

Neither program's approach is fundamentally better or worse. They're just different, and as a result, the act of syncing data between them will run into conflicts.

Another pain point is the recurring task, and from what I can tell, this is a problem with Sync Services, not Entourage or iCal. Contact pictures also won't sync, because every application uses that data differently, and neither will IM data, since Address Book is tied to iChat and Entourage is tied to Microsoft Messenger. So, if you use both, ONLY use Address Book for iChat data and ONLY use Entourage for Microsoft Messenger data. Any other way will lead to madness.

Next step: e-mail

There are a number of strategies for backing up Entourage e-mail. I prefer leaving it on my mail server. I have been a nigh-exclusive IMAP user since 1997, and it's the most relaxing way to deal with e-mail backups. E-mail client crashed? Who cares — it's on the server. Need to check e-mail from someone else's computer? It's on the server.

The problem with this approach is one of space. If you are a mail pack rat, like me, you can have IMAP stores of many gigabytes. E-mail administrators will gripe about this, but any competent IMAP server can handle this, and disk space is cheap.

Still, for utter user convenience, getting your mail administrator to set the highest possible disk quota is going to win. It is, without doubt, the easiest way to back up e-mail from the user perspective, Entourage or not. Just leave it on the server, and buy more disk space.

That's a nice ideal, but it's not practical for a company without a serious storage infrastructure. So in the real world, we have to manage e-mail on our local machines. That includes backing it up. So in Entourage 2008, we take advantage of the new scriptability of the archive export feature. (Note: This new scriptability is why this article is aimed at Entourage 2008.)

The new scriptability in Entourage 2008 adds a new command: export archive. This command creates an Entourage archive via AppleScript. An Entourage archive is a package that contains e-mail, Contacts, Notes, Tasks, or Projects. It can contain all of those types, or just one type. The advantage of the archive is that each type of data is stored in a standards-based format: e-mail as .mbox files, Contacts as vCard files, Events, Notes, and Tasks as vCalendar files, and so on.

One thing to keep in mind is that Entourage archives e-mail by the folder, not the message. So if you archive a folder called, say, "Archives" with a thousand messages, then you'll have one .mbox file with all those messages. That's not as granular as one file per message, but it's still a lot smaller and easier to deal with than the Entourage Database.

So now what? We still have to back up e-mail and Notes. What we want to do is back up the last 30 days of e-mail, or more correctly, any e-mail received in the last 30 days that isn't junk. I picked 30 days arbitrarily for this article — you can pick your own "recent e-mail" period based on your needs.

Categorizing incoming e-mail

I'm lazy. I don't like doing anything manually in Entourage if I can get Entourage to do the work for me. Luckily, Entourage's rules make avoiding monkey work really easy. We're going to want to set up an incoming rule that says: "All e-mail that's not junk, put it in a category called New e-mail." That's it. As we'll see, it's all we're going to need. The rule itself looks like this:

New e-mail rule

It's a simple rule. It says: "any e-mail that isn't junk e-mail, set the category to New e-mail." One thing to note: I have deselected the "Do not apply other rules to messages that meet this criteria" checkbox, because otherwise, no rules after this one will ever work. Don't worry if you have other rules that have a Set Category step. That just adds the other category to the message; it doesn't delete the categories already there. Finally, when you save this rule, you want to make sure it's the first rule in the list. Entourage executes rules as it gets to them, from the top of the list down. If you put this under another rule with the "Do not apply other rules..." checkbox enabled, then this one might not run.

Now we've created a rule that labels all new e-mail that's not already junk as "New e-mail." The next step is to archive it. We could do some things with rules that check how old the message is, but there's a simpler way. We let the archival period decide. If we want to archive every two weeks, then we know there's no "New e-mail" over two weeks old on the system. If we want it to run every month, same deal. Using categories in this way gives us a lot of flexibility. If you don't want a message archived, just remove that category from it. Because the archival process will also remove that category, you don't have to worry about things being archived twice.

Making the Archive Script

The next task is building the archive script. As this centers around the "Export Archive" command, it behooves us to examine that command in some detail. If you open the Microsoft Entourage scripting dictionary in Script Editor, or the AppleScript tool of your choice (I personally use Late Night Software's excellent Script Debugger), and select: Entourage Mail and News Suite: Commands: Export Archive, you should see the following:

Export Archive AppleScript command

The command is simple. You export the archive to an alias, or destination file, with a number of optional parameters. You can have the command delete the archived data when it has completed, you can specify that it only archive specific kinds of items, (such as only e-mail, only task items, or everything), you can choose to only archive items of a specific category or project, and you can tell it to always retain items that have additional categories.

In our case, and for simplicity's sake, we will archive every type of item and delete nothing; the only penalties are a somewhat bigger archive file and a script that may take longer to execute. We'll then go through and remove the "New e-mail" category from every message we just archived, so it can't be archived twice with this script. The script to do this is pretty simple (at least without any major error checking), and is a combination of AppleScript and shell script. Two notes before we start. First, for this to work you have to enable Spotlight in Entourage, as the script uses Spotlight info for part of its functionality, which also means you have to be running at least Mac OS X 10.4. Second, the script spells the category name "New Email"; whatever you call the category, the name in the script has to match the name your rule assigns exactly. So, let's look at the script. (Note: You can download the script here; there's no need to try to copy and paste this in yourself.)

-- Allow each Entourage command up to 20 minutes; big archives are slow.
with timeout of 1200 seconds
    tell application "Microsoft Entourage"
        -- Build a dated archive name at the root of the home folder.
        set theHomeDir to path to home folder
        set theCurrentDate to current date
        set theExportArchiveName to (year of theCurrentDate) & (month of theCurrentDate) & (day of theCurrentDate) & "_EntourageArchive" as text

        set theArchivePath to (theHomeDir as text) & (theExportArchiveName as text)

        -- The category name must exactly match the one your rule assigns.
        export archive to theArchivePath only category "New Email" without delete

        -- Find every message still carrying the category.
        set theCategoryID to (ID of item 1 of (get every category whose name is "New Email"))
        set theMessageList to my getMessagesInCategory(theCategoryID)
        if (length of theMessageList) > 0 then
            repeat with x in theMessageList
                -- Rebuild the message's category list without the archived category.
                set theNewCategories to {}
                set theMessage to incoming message id (contents of x)
                set theMessageCategories to category of theMessage
                repeat with y in theMessageCategories
                    if ID of y is not theCategoryID then
                        set the end of theNewCategories to contents of y
                    end if
                end repeat
                set category of theMessage to theNewCategories
            end repeat
        end if
    end tell
end timeout

-- Returns the IDs of all messages in the given category, using Spotlight.
on getMessagesInCategory(idCategory)

    tell application "Microsoft Entourage"
        -- Spotlight cache folder for the current identity, as a quoted POSIX path.
        set strCachePath to ((path to home folder) as string) & "Library:Caches:Metadata:Microsoft:Entourage:"

        set strYearVersion to "2008"
        set strIdentityCachePath to quoted form of POSIX path of (strCachePath & strYearVersion & ":" & (get name of current identity) & ":")

    end tell

    -- Query: Entourage messages that carry the given category ID.
    set strMDContentTypeQuery to "kMDItemContentTypeTree == *'virtual.message'"
    set strMDCategoryIDQuery to "com_microsoft_entourage_categories  == "
    -- awk: keep only the file name, then print the ID in front of ".vRge" for message files.
    set strAwkCommands to " | awk -F/ '{print $NF}'| awk -F.vRge '$2 ~ \"Message\" { printf($1); printf(\" \"); }'"

    set strMDCategoryIDQuery to strMDCategoryIDQuery & (idCategory as text)

    set strMDQuery to " '" & strMDContentTypeQuery & " && " & strMDCategoryIDQuery & "'"

    -- Build MDFind
    set strMDFind to "mdfind "
    set strMDFind to strMDFind & "-onlyin " & strIdentityCachePath & strMDQuery & strAwkCommands

    -- Run MDFind
    return every word of (do shell script (strMDFind))

end getMessagesInCategory

The script contains two parts, the main script, inside the "with timeout…" and "end timeout" lines, and then the getMessagesInCategory subroutine, or handler. The purpose of the getMessagesInCategory handler is to look in the Spotlight cache for every message whose category ID matches the category ID for our "New e-mail" category. Using the Mac OS X Spotlight commands and some basic shell text parsing commands is far faster than trying to do the same thing within Entourage via AppleScript.

The first line includes a timeout of 1200 seconds. Normally, if a specific command takes longer than AppleScript's default timeout to complete (two minutes, according to Apple's AppleScript Language Guide), AppleScript assumes it died, and generates an error. A longish archive operation can easily take longer than that, so we tell AppleScript not to consider anything timed out until 20 minutes have gone by. That should be enough for most folks; I know it's enough time for me to completely archive everything in my 1.4GB database on a first-generation MacBook Pro with 2GB of RAM.
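If you would rather be told when that limit is hit than have the script simply stop, one option is to wrap the timeout block in a try block. The following is a minimal sketch, not part of the downloadable script. It assumes an expired timeout is reported as error number -1712 (the standard Apple event timeout error), and the dialog wording is purely illustrative. The single Entourage command inside is only a stand-in for the real work.

```applescript
try
    with timeout of 1200 seconds
        tell application "Microsoft Entourage"
            -- Stand-in for the slow part of the real script (export archive and so on).
            get name of current identity
        end tell
    end timeout
on error errMsg number errNum
    if errNum is -1712 then
        -- -1712 is the Apple event timeout error.
        display dialog "Entourage did not answer within 20 minutes. Check the archive before trusting it." buttons {"OK"} default button 1
    else
        -- Anything else is passed along unchanged.
        error errMsg number errNum
    end if
end try
```

Keep in mind that a timeout only means AppleScript stopped waiting. Entourage may well carry on with the export in the background, so look at the archive before running the script again.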

Next, we have an AppleScript tell block which makes Microsoft Entourage the focus of the script. This allows us to use Entourage-specific AppleScript commands. Now we set up where we're going to save our archive, and what we're going to name it. In general, I prefer to use the root of my home directory, as it's an easy-to-find location, and that's what home directories are for: my files. So, we get the path to my home directory. Then we get the date. However, the normal format of the date ("Tuesday, 12 February, 2008 12:45:33") won't really work in a file name, so we're going to, in order, extract the year, the month, and the day of the date, jam them all together, and stick an _EntourageArchive on the end. That's what the set theExportArchiveName line does. When it runs, using the date information from the previous line, we get 2008February12_EntourageArchive for the file name. Now, we need to combine that archive name with the path to the home directory, and make them all text, as the export archive step wants a text path, not an alias. Put them all together and we get Aurora:Users:jwelch:2008February12_EntourageArchive.
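One side effect of using the month name is that the archives won't sort chronologically in the Finder (2008April lands ahead of 2008February). If that matters to you, a variation is to build the name from zero-padded numbers instead. This is a sketch of an alternative, not what the downloadable script does. The variable names follow the script, and it assumes your version of AppleScript can coerce a month to an integer.

```applescript
set theCurrentDate to current date
set theYear to (year of theCurrentDate) as text

-- Pad month and day to two digits: 2 becomes "02", 12 stays "12".
set theMonth to text -2 thru -1 of ("0" & ((month of theCurrentDate) as integer))
set theDay to text -2 thru -1 of ("0" & (day of theCurrentDate))

set theExportArchiveName to theYear & theMonth & theDay & "_EntourageArchive"
-- On 12 February 2008 this yields "20080212_EntourageArchive".

set theArchivePath to ((path to home folder) as text) & theExportArchiveName
```

Because the left-hand side of each & is already text, the result is a single string rather than a list, so no extra "as text" is needed at the end.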

We now know what our archive will be named, and where it will go. Next is the export. Our export archive line tells Entourage to export an archive to the path and filename we created in the previous line. We want to archive everything, (mail, events, etc.), but only those items with a category of "New e-mail", and we don't want to delete the items after the archive completes. Now, what happens if we leave that archive there? Well, other than wasting space, as long as this script doesn't execute multiple times in the same day, you'll just get multiple archive files. If you do execute the script multiple times in the same day, then it keeps the archive there, but replaces the existing contents of the archive with whatever new contents it finds when it runs.

Once this line has executed, we have an archive. Hooray! Now we want to find all the messages in the "New e-mail" category, and remove that category from them, so that they don't get archived again. That's what the next line does. It calls our getMessagesInCategory handler, and passes it the numerical ID of the "New e-mail" category. I'm not going to go through the handler line by line, but basically, it builds a Spotlight search query to look for all Entourage e-mail messages with the correct category ID, executes that query via the mdfind command, then parses the results to get the unique ID for every message in that category. It returns those IDs, if any, as a list of IDs back to the main part of the program.
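To make the parsing step less mysterious, here is a small sketch that feeds two made-up file paths through the same awk commands the handler uses, without touching Spotlight or Entourage at all. The paths, the IDs, and the .vRge08Message and .vRge08Contact file names are assumptions for illustration; look in your own cache folder to see what the real files are called.

```applescript
-- Two hypothetical Spotlight cache paths, one per line: a message and a contact.
set strSamplePaths to "/tmp/Cache/Messages/1234.vRge08Message" & (ASCII character 10) & "/tmp/Cache/Contacts/77.vRge08Contact"

-- The same awk commands as in getMessagesInCategory:
-- the first awk keeps only the file name, the second splits it at ".vRge"
-- and prints the leading ID when the remainder contains "Message".
set strAwkCommands to " | awk -F/ '{print $NF}'| awk -F.vRge '$2 ~ \"Message\" { printf($1); printf(\" \"); }'"

-- printf stands in for mdfind here and simply emits the sample paths.
set theIDs to every word of (do shell script "printf '%s\\n' " & quoted form of strSamplePaths & strAwkCommands)

-- theIDs should hold just the message ID, "1234"; the contact file is skipped.
theIDs
```

In the real handler, mdfind supplies the paths, limited by -onlyin to the current identity's cache folder and by the query to messages carrying the category ID.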

The last part of the script is where we take a list of message IDs, and strip the "New e-mail" category from them. First, we make sure that there are really IDs in that list, so we make sure the length of the list, (or how many items it has) is greater than zero. If it's not, we have nothing to do, and we can stop the script. If the length of the list is greater than zero, we want to go through that list and remove the "New e-mail" category. First, we create a new empty list called theNewCategories, and set it to the empty list value, or {}.

Next, take the message with the first ID in the list of IDs we got from the handler, theMessageList, and set theMessage to represent it. (Using incoming message instead of message is an Entourage-ism.) Within an incoming message, the categories, (since there can be many) are stored as a list of categories, so we assign the list of categories for this message to theMessageCategories. We now have a list of at least one item, so we will now iterate through that category list, and grab all the categories that are not "New e-mail", and set the message's categories to that new list of categories. (Removing a specific item from a list can be tedious, and for our uses, this is the simplest way to do this.)

In Entourage, a Category has three components: The color, which is a set of RGB values, the ID, a unique integer number for that Category, and the Name, which is the human identifier, such as "New e-mail". Each time we iterate through our theMessageCategories list, y grabs all three of these values. Each trip through the list, we're going to compare whatever value y has for the ID of the category it's holding, to the category ID for "New e-mail", which we got near the beginning of the script in this line:

set theCategoryID to (ID of item 1 of (get every category whose name is "New Email"))

If they don't match, or the ID of y is not the same as theCategoryID, then we want to preserve that category, so we stick it on the end of theNewCategories list we defined a few lines ago. We keep doing that until we've gone through every category in theMessage. When the script hits the "New e-mail" category ID, it skips that, and goes to the next one in the list, and that category never makes it into theNewCategories. Once we've gone through every category in the list, we set the categories of theMessage to the categories in theNewCategories and go to the next message in the list. Once we've gone through every message that is in the "New e-mail" category, we're done, and none of those messages are in that category any more. So the next time the script runs, it will only grab e-mail that came in since the last time it was run.
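The build-a-new-list approach is easier to see with the Entourage parts stripped away. This sketch uses plain numbers in place of categories (the IDs 3, 17, and 42 are made up). It also shows why the script says contents of y: inside a repeat with loop, y is a reference to a list item rather than the item itself.

```applescript
set theCategoryID to 17 -- hypothetical ID standing in for "New e-mail"
set theMessageCategoryIDs to {3, 17, 42} -- hypothetical category IDs on one message

set theNewCategories to {}
repeat with y in theMessageCategoryIDs
    -- y is a reference such as "item 2 of {3, 17, 42}", so dereference it before comparing.
    if (contents of y) is not theCategoryID then
        set the end of theNewCategories to contents of y
    end if
end repeat

theNewCategories -- {3, 42}
```

Comparing y itself against 17 would never match, because a reference is not equal to the number it points at. The real script gets away with ID of y because asking for a property resolves the reference first.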

But how do we do that? How do we make sure the script runs when we want it to? Schedules, of course!

With Entourage, schedules aren't just for e-mail retrieval. You can not only send/receive e-mail and news, but also delete e-mail, delete junk mail, run Excel's Auto Web Publish feature, launch an alias, open a file, or… run an AppleScript. So we save our script somewhere (I have a folder in my Documents folder called, unsurprisingly, "Scripts" for just this purpose) and create a new Schedule in Entourage, via Tools: Schedules…: New. For our example, we'll set the schedule to run on the last day of every month, forever. We choose the "recurring" option for When, and "Run AppleScript" for the Action.

The recurrence settings look like this:

Recurrence settings

And the finished schedule looks like this:

Finished schedule

Once that's done, the schedule will run on the last day of each month, and you'll get your new e-mail archived out to a package that will be far smaller than your Entourage Database. You do lose links with this method, although you can access links via AppleScript, so you could deal with that in the script if you like. For now I'll leave that as an extra challenge for the reader.

A note on Notes

But what about Notes? Well, there's good and bad news there. The good news is, you can add categories to Notes, so that with some modifications to the script, you could (relatively) easily archive Notes with the same script and schedule that we set up for archiving e-mails. In fact, you could modify this script to archive almost everything in Entourage. The bad news is that you have to assign categories to Notes manually, as you can't use a rule to do it a la e-mail messages. This is probably more of a minor inconvenience than anything, since most people don't create or modify Notes as frequently as they get new e-mail. Again, in the interests of space, I leave adding Notes to the script as an exercise for the reader.

Conclusion

I'm not going to pretend this article is a substitute for Microsoft's Mac Business Unit making it easier to back up Microsoft Entourage data. As with any workaround, it's not going to be a perfect solution. But, it is a solution, and one that can help you back up your Entourage data without having to deal with backing up what can become a massive single database file.

If nothing else, not only do you now know it's possible to do this, but you have the basics of how to do it, and some insight into other features of Microsoft Entourage that you might not have known about. I would like to thank the Entourage team of the Microsoft Mac BU for improving AppleScript in Entourage 2008 so that automatic archiving is now possible, and I would specifically like to thank Andy Ruff, the Lead Program Manager for Entourage, for the script help. Andy wrote the entire "getMessagesInCategory" handler, which made that script run a lot faster, and far better than my line of thinking would have.

[John C. Welch is a Unix/Open Systems Administrator for Kansas City Life Insurance and a long-time Mac IT pundit.]
