We Mac users have long been proud that we don’t have to type in commands to use our machines to their fullest. That’s still true, but now that Mac OS X has opened up the Unix command line, we have all the tools necessary to take advantage of some powerful programming and scripting capabilities, so our Macs can do more of our odious work. And after all, isn’t that the whole point of a computer?
Programming is a lot like cooking–a category of activities that spans a broad spectrum, from the complexity of Iron Chef to the culinary travesty of microwaving a hot dog. Programming and cooking both can be done at many different skill levels, but even amateur chefs can make tasty food, just as beginning programmers can create useful scripts. And like learning to cook, learning to write scripts may appear daunting at first.
In this column, the first of a series examining OS X’s geekier innards, you’ll learn how to use the popular scripting language Perl, which is built into OS X. We’ll show you how to build a script that converts a Mac text file’s line endings to line endings that Unix can interpret. (This will enable the use of Unix-based text-processing tools on the file’s contents.) Although developing the script may seem like quite a bit of effort, the results will come in very handy if you ever need to convert multiple text files. (To learn more about specifying multiple files on the command line, see “Take Command of Mac OS X,” How-to.) We hope that this example will serve as an appetizing taste of OS X’s rich flavors.
How to Write a Perl Script
First, you need to fire up a text editor such as OS X’s TextEdit, BBEdit, or, if you’re already familiar with the command-line realm, one of the traditional Unix text editors such as pico or vi. Then jump right in by typing the following line:
#!/usr/bin/perl -w
This first line announces to the operating system that it’s dealing with a Perl script. The -w at the end of the line tells Perl that it should be particularly stringent about its interpretation of the script and display warnings if it encounters code that it considers suspect. Get yourself in the habit of adding -w to your scripts: doing so will often help you discover and fix scripting problems before they become a pain in the neck.
# linebreak characters: \x0d - Mac, \x0a - Unix
In this line, # indicates a comment for use by the author of the script or someone else reading it, so Perl will ignore the rest of the line. This comment explains the codes for the Mac and Unix line-break characters. Later, outside of the comment, the \x notation will tell Perl that we’re using hexadecimal numbers to represent line endings.
{
Perl uses braces (sometimes called “curly brackets”) to group pieces of code. This outermost set of braces in this script is an optional visual indicator of where the script’s main part begins and ends.
foreach $inFileName (@ARGV) {
The script uses this foreach loop to work through all the names of files that the script will convert to Unix-readable text. Each individual file name is stored in a separate element of an array–a collection of variables–called @ARGV, which Perl creates.
This line translates to “Take a file name from the @ARGV array, put it in the variable called $inFileName, and run the code enclosed in the following set of braces; continue doing this until you run out of file names in @ARGV.” In Perl, ordinary single-value variables begin with the $ character; arrays are preceded by @ (like @ARGV) and hashes by %.
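To see the loop on its own, here is a minimal sketch. The @fileNames list and its contents are hypothetical stand-ins for @ARGV, so the loop has something to work through even when you run it without arguments; the use strict line and the my declarations are additions that the column's script doesn't use.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical stand-in for @ARGV, so the loop has names to walk through
# even when no arguments are given on the command line.
my @fileNames = ("mac.txt", "notes.txt", "todo.txt");

my @outputNames;
foreach my $inFileName (@fileNames) {
    # Each pass through the loop sees exactly one name from the list.
    push @outputNames, $inFileName . ".converted";
}

# Show the name each input file would be converted into.
print join("\n", @outputNames), "\n";
```

Swapping @fileNames for @ARGV makes the same loop work through whatever names you type after the script's name.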
open (INTEXTFILE, $inFileName);
This line tells Perl to open the file whose name it has plucked from @ARGV and create a reference to it, which we’ve named INTEXTFILE. We’ll use this reference any time we have to read from this file; Perl wouldn’t know which file we were referring to if we didn’t name it explicitly.
open (OUTTEXTFILE, ">" . $inFileName . ".converted");
This line creates the new file that will contain our converted text, and a reference called OUTTEXTFILE. The rest of the line contains the file’s name; the > character is shorthand for “create the file” and doesn’t actually affect the file’s name. The variable $inFileName contains the name of the original file, and the script will add .converted to the end of its name (so the original is not overwritten). The periods between the elements of the file name tell Perl to combine them into a single string of text.
$textFile = <INTEXTFILE>;
This statement tells Perl to read the text file from INTEXTFILE and put it into the variable $textFile. Perl normally reads up to the next Unix line ending; because a Mac text file doesn’t contain any, the entire file arrives in one go. Make sure the file isn’t too big (larger than about 100K); even though OS X has Unix-style virtual memory, you can’t assume that exorbitant amounts of memory are available.
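The following sketch shows why a single read can pick up a whole Mac file. It assumes a reasonably recent Perl (5.8 or later), which can open a string as if it were a file; the sample text and variable names are made up for illustration. It also shows one common way to read an entire file no matter which line endings it uses, by temporarily undefining Perl's record separator, $/.

```perl
use strict;
use warnings;

# Hypothetical sample data: three lines with Mac line endings (\x0d).
my $macText = "one\x0dtwo\x0dthree\x0d";

# Open the string as if it were a file (in-memory filehandle, Perl 5.8+).
open(my $macIn, "<", \$macText) or die "Cannot open sample text: $!";
# There is no \x0a anywhere in the data, so one read returns all of it.
my $firstRead = <$macIn>;
close($macIn);

# The same kind of read on Unix-style text stops after the first line.
my $unixText = "one\x0atwo\x0a";
open(my $unixIn, "<", \$unixText) or die "Cannot open sample text: $!";
my $oneLine = <$unixIn>;
close($unixIn);

# Undefining $/ inside a block makes a single read run to the end of the file.
open(my $slurpIn, "<", \$unixText) or die "Cannot open sample text: $!";
my $whole = do { local $/; <$slurpIn> };
close($slurpIn);

print "Mac text read in one go\n" if $firstRead eq $macText;
print "Unix text needed the \$/ trick\n" if $oneLine ne $unixText and $whole eq $unixText;
```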
Now for the Heavy Lifting
This line does all the real work in the script and is consequently rather dense:
$textFile =~ s/\x0d/\x0a/g;
Perl has a built-in search-and-replace function, represented by s. When invoking this function, you specify what it should search for and what it should replace it with; these two strings are bounded by / characters. We want to replace Mac line-ending characters with the Unix ones, so those are the two strings we’ve used in the search and replace fields. The g at the end tells Perl to replace every match rather than only the first. Using =~ tells Perl to perform the search and replace on the contents of $textFile and then put the result back into $textFile.
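A short experiment makes the search and replace easier to trust. This sketch runs the substitution on a made-up string twice, once without a trailing g and once with it, and then counts the Mac line endings that remain. The sample text and variable names are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical sample data: three lines with Mac line endings (\x0d).
my $macText = "one\x0dtwo\x0dthree\x0d";

# Copy the text, then replace only the first Mac line ending (no g flag).
(my $firstOnly = $macText) =~ s/\x0d/\x0a/;

# Copy the text, then replace every Mac line ending (g flag).
(my $allLines = $macText) =~ s/\x0d/\x0a/g;

# tr with an empty replacement list counts characters without changing them.
my $leftAfterFirst = ($firstOnly =~ tr/\x0d//);
my $leftAfterAll   = ($allLines  =~ tr/\x0d//);

print "Without g: $leftAfterFirst Mac line endings left\n";
print "With g: $leftAfterAll Mac line endings left\n";
```

Copying the text before substituting leaves $macText untouched, which is handy when you want to compare before and after.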
print OUTTEXTFILE $textFile;
Once the conversion’s done, use the print function to write the contents of $textFile to your output file.
close (INTEXTFILE); close (OUTTEXTFILE);
These final statements close the input and output files, to keep things tidy. Add two closing braces to end your bracketed chunks of code, and that’s it.
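Put together, the lines discussed so far might look like the listing below. Treat it as a sketch assembled from this column rather than a definitive version: it assumes the read uses the INTEXTFILE reference in angle brackets, that the line-ending codes are written with the \x notation, and that the substitution ends in g so every line ending is converted. The indentation and the one comment inside the loop are additions.

```perl
#!/usr/bin/perl -w

# linebreak characters: \x0d - Mac, \x0a - Unix

{
    foreach $inFileName (@ARGV) {
        open (INTEXTFILE, $inFileName);
        open (OUTTEXTFILE, ">" . $inFileName . ".converted");

        # Read the file, swap the line endings, and write the result.
        $textFile = <INTEXTFILE>;
        $textFile =~ s/\x0d/\x0a/g;
        print OUTTEXTFILE $textFile;

        close (INTEXTFILE);
        close (OUTTEXTFILE);
    }
}
```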
When you’re done, save this script into a file named “lineconvert.pl”–making sure to give this file Unix line endings. Then use the command line’s chmod command to set the script’s attributes, so the operating system knows it’s an executable script. To do this, type chmod 744 lineconvert.pl into the command line. (To learn more about the chmod command, enter man chmod at the command-line prompt.)
Using Your Perl Script
Say you have a Mac text file named “mac.txt” and you want its contents to have Unix line endings. You invoke your script by typing ./lineconvert.pl mac.txt in the command line, and it performs the conversion. You end up with a file called “mac.txt.converted,” with contents that have Unix line endings. Ta-da! Now you can modify the script to create a Unix-to-Mac version, for example.
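For the Unix-to-Mac direction, the main change is swapping the two codes in the search and replace. Here is a minimal sketch of both directions; the helper names macToUnix and unixToMac and the sample text are hypothetical, and the column's script does this work inline rather than in separate routines.

```perl
use strict;
use warnings;

# Hypothetical helpers: each takes some text and returns a converted copy.
sub macToUnix {
    my ($text) = @_;
    $text =~ s/\x0d/\x0a/g;   # Mac line endings become Unix ones
    return $text;
}

sub unixToMac {
    my ($text) = @_;
    $text =~ s/\x0a/\x0d/g;   # Unix line endings become Mac ones
    return $text;
}

# Made-up sample text with Unix line endings.
my $unixText = "one\x0atwo\x0a";
my $macText  = unixToMac($unixText);

# Converting back should give the original text again.
print "Round trip OK\n" if macToUnix($macText) eq $unixText;
```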
One of the many features you can add to your script is improved error handling. This is particularly important because errors outside of your control do occur, and you don’t want them to destroy your data.
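As one possible starting point, the sketch below reworks the conversion with a few checks. It is an illustration under stated assumptions, not the column's script: the convertFile routine is a hypothetical helper, and refusing to overwrite an existing .converted file is just one policy you might choose. Each open, print, and close is followed by "or die" so that a failure stops work on that file with a message, and eval keeps one bad file from halting the rest.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: converts one file and returns the new file's name.
sub convertFile {
    my ($inFileName) = @_;
    my $outFileName = $inFileName . ".converted";

    # Refuse to overwrite an existing file rather than risk losing data.
    die "$outFileName already exists; not overwriting\n" if -e $outFileName;

    open(my $in, "<", $inFileName)
        or die "Cannot read $inFileName: $!\n";
    my $textFile = do { local $/; <$in> };     # read the whole file
    close($in);
    $textFile = "" unless defined $textFile;   # treat nothing read as empty

    $textFile =~ s/\x0d/\x0a/g;                # Mac to Unix line endings

    open(my $out, ">", $outFileName)
        or die "Cannot create $outFileName: $!\n";
    print $out $textFile
        or die "Cannot write to $outFileName: $!\n";
    close($out)
        or die "Cannot finish writing $outFileName: $!\n";

    return $outFileName;
}

foreach my $inFileName (@ARGV) {
    # eval traps the die, so one bad file does not stop the others.
    my $outFileName = eval { convertFile($inFileName) };
    if (defined $outFileName) {
        print "$inFileName -> $outFileName\n";
    } else {
        warn "Skipping $inFileName: $@";
    }
}
```

Run it the same way as the original script; files that can't be read or written are reported and skipped instead of failing silently.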
Onward
Our example introduces a few of the ingredients in the large and well-stocked kitchen that is Perl programming. And Perl has countless uses beyond changing text files: it can fill the gaps between databases and Web servers to help you create dynamic Web sites. You can even use it to catalog your MP3 archive.
To explore Perl further, browse CPAN, the Comprehensive Perl Archive Network (www.cpan.org). If you find packages that seem useful, you may want to get your hands on Learning Perl, second edition, by Randal L. Schwartz and Tom Christiansen (O’Reilly & Associates, 1997). Using it as your beginner’s cookbook, you’ll soon be concocting Perl scripts that save you time and drudgery.
A longtime MacPerl user, Contributing Editor STEPHAN SOMOGYI is no longer afraid of regular expressions.
Perls of Wisdom: Writing a Perl script in OS X’s command line can be easier than it looks if you follow our instructions.