Yes, I have said that. Repeatedly.

How does that square with your results?

There are two problems that I was referring to.

The first problem is that the Perl interpreter takes up a minimum amount of space for itself plus your program. In a tight memory environment this can be an issue, particularly if you have lots of interpreters running concurrently, as happens with mod_perl.

The second problem is that Perl data structures have a lot of fixed overhead that you cannot remove. For instance a string takes up 28 bytes plus whatever data is in the string. If you've used the string as both a string and a number, it takes up more. An array takes up 28 bytes. Etc. (You can use Devel::Size to figure out these numbers.)
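If you want to check those figures on your own build, here is a minimal sketch using Devel::Size. It assumes the module is installed from CPAN (it is not in core), and the sub name measure_overhead and the sample values are just mine for illustration. The numbers you get will vary with Perl version and platform, so don't expect exactly 28.

```perl
use strict;
use warnings;

# Devel::Size comes from CPAN, not core Perl. Load it at run time so that a
# missing module gives a readable message instead of a compile error.
my $have_devel_size = eval { require Devel::Size; 1 };

# Returns a list of label => size-in-bytes pairs, or an empty list if
# Devel::Size is not installed. (Hypothetical helper, for illustration.)
sub measure_overhead {
    return () unless $have_devel_size;

    my $empty = '';
    my $line  = 'x' x 1200;   # stand-in for one fixed-width record
    my $both  = '12345';
    my $sum   = $both + 0;    # numeric use normally caches a number in $both
    my @array = ();

    return (
        empty_string  => Devel::Size::size($empty),
        record_1200   => Devel::Size::size($line),
        string_number => Devel::Size::size($both),
        empty_array   => Devel::Size::size(\@array),
    );
}

my %size = measure_overhead();
if (%size) {
    printf "%-14s %6d bytes\n", $_, $size{$_} for sort keys %size;
}
else {
    print "Devel::Size is not installed; try: cpan Devel::Size\n";
}
```

Subtracting the length of the data from the reported size of a string gives you the fixed overhead for that string on your Perl.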

But in your case your data far outweighs the overhead of the interpreter. Also you had 1 million lines averaging 1200 bytes each. With the string overhead, each line takes 1228 bytes to store, so the overhead is only about 2% and is negligible. When you break those lines up into smaller pieces, the overhead becomes bigger. If you then try to save space by trimming off the blank padding created by the fixed-width format, you will save space, but you'll find that the overhead becomes a lot bigger proportionately.
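To put rough numbers on that, here is a small sketch of the arithmetic. It takes the 28-byte per-string figure quoted above as its default, which is an assumption about your build rather than a constant. The 100-byte and 10-byte field sizes are made up for illustration, and overhead_fraction is just a name I picked.

```perl
use strict;
use warnings;

# Fraction of a string's storage that is fixed overhead.
# $overhead defaults to the 28-byte figure quoted above; the real value
# depends on your Perl version and build.
sub overhead_fraction {
    my ($payload_bytes, $overhead) = @_;
    $overhead = 28 unless defined $overhead;
    return $overhead / ($overhead + $payload_bytes);
}

# One whole record versus two hypothetical smaller field sizes.
for my $bytes (1200, 100, 10) {
    printf "%5d-byte field: %4.1f%% overhead\n",
        $bytes, 100 * overhead_fraction($bytes);
}
```

The whole 1200-byte line comes out at roughly 2% overhead, while a 10-byte field is roughly three quarters overhead. That is the proportional growth I mean.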

Basically if you want to have lots of copies of Perl running, or a data structure with tons of small data elements, Perl burns memory quickly. If you have few Perl interpreters and only big data elements, Perl's memory overhead won't seem bad at all. And since Perl doesn't do anything that is particularly stupid internally, its performance is generally reasonable. But if you start working with data byte by byte, you'll find that the overhead of creating and destroying those big Perl data structures kills your performance.

Incidentally the picture has improved in recent Perls. The 5.8 series, as you discovered, enabled copy on write in various circumstances, which can save a lot of memory for some folks. And the upcoming 5.10 series has found ways to trim a lot of basic data structures.

But still there are a lot of cases where people will be left asking how Perl managed to waste so much memory.

Cheers,
Ben