All page and section numbers refer to the second edition of Code Complete.
First let me start by saying that this book - in either edition - is probably the single programming book that I most often recommend to people who are trying to get better at Perl. I'm sure that being told things like this no longer surprises you, but hopefully you are pleased by it nonetheless.
Secondly, I'm still reading the second edition, and have skimmed ahead to the parts that interested me. When I noticed that I was slowing my reading to skim ahead for things I wanted to comment on, I decided it was best to send you feedback now so that I could concentrate on reading. I might send you more later if you show interest and I find more things to comment on. (Hopefully this gets me on your list of people that you want reviewing the third edition in a few years... :-)
Here we go.
----
p 63: While discussing the Sapir-Whorf hypothesis, it may be worth mentioning "the Blub paradox" (see [link|http://www.paulgraham.com/avg.html|http://www.paulgraham.com/avg.html]). Yes, I know that if you mentioned everything that you want to the book would be twice as long and half as good. When I say "may be worth", please emphasize "may".
----
p 64: C++ has not been able to honestly claim C compatibility for a very long time. It could at the start, but the two languages long since parted ways. I'm not sure that this is worth correcting though.
----
p 65: Java and JavaScript are not "loosely related" except by name and now-dead marketing dreams. The ONLY relationship between Java and JavaScript is that they are C-derived languages with garbage collection. Virtually every other major design decision that you can name was made differently between them. (Eg static vs dynamic typing, the type of object system, the native datatypes...)
The similarity in names is entirely a marketing decision. A temporary alliance between Sun and Netscape led to "LiveScript" getting an interface to Java and the name "JavaScript". Microsoft refused to acknowledge this and called their implementation JScript. (The world ignored Microsoft.) The official standard for the language sidestepped the fight by calling it ECMAScript. (The world has ignored the official standard as well.)
----
p 65: Perl is NOT an acronym. I'd suggest replacing that myth with a comment about CPAN. For instance, "Perl's Comprehensive Perl Archive Network (CPAN) claims to be the largest repository of freely available components in any language." That Perl is not an acronym can be verified from `perldoc perlfaq1`:
What's the difference between "perl" and "Perl"?
One bit. Oh, you weren't talking ASCII? :-) Larry now uses "Perl" to signify the language proper and "perl" the implementation of it, i.e. the current interpreter. Hence Tom's quip that "Nothing but perl can parse Perl." You may or may not choose to follow this usage. For example, parallelism means "awk and perl" and "Python and Perl" look OK, while "awk and Perl" and "Python and perl" do not. But never write "PERL", because perl is not an acronym, apocryphal folklore and post-facto expansions notwithstanding.
----
p 61-66: On the topic of language choice there is a point that I keep wishing you'd make about dynamic vs static languages. You've noted elsewhere (eg section 3.2) that the ideal amount of process goes up as your project becomes bigger and more life-critical. That extends to language choice. Anecdotally, there seems to be a development sweet spot at team sizes of up to about 6. Beyond that size, communication overhead becomes a major concern. One way to address that barrier is to add support to your language that makes it easier for large groups to cooperate. The other is to give teams at that critical size tools to be as productive as possible. People in mainstream languages (C++, Java, C#...) tend to go the first route. It shows up as extra process in the form of various declarations, static typechecking, and the like. People who like dynamic languages (Lisp, Smalltalk, Perl, Python...) often go the second. It shows up as less process, and more flexibility to settle into a highly customized style.
Note that the two sets of choices have a lot of implicit conflict. Adding process to your language reduces how much a small team can do and reduces what you can accomplish at that "sweet spot". Adding flexibility to your language results in less uniformity, and therefore more trouble getting larger groups to cooperate well. People in these two camps tend strongly to talk past each other.
Depending on the task at hand, either approach may or may not work well. I know of plenty of successful Perl projects of 50,000-150,000 lines of code maintained by experienced small teams. (Should the CPAN modules that get installed alongside them count toward the line count?) Using the estimate that Perl code is 6x as dense as C, that's comparable to 300,000 to 900,000 line C projects, only done in less time by fewer people. (OK, it runs more slowly and takes more memory.) Conversely, if something would take more than a million lines of C, Perl is likely not the right language to choose.
----
p 86: Your definition of classes vs objects ("An object is any specific entity that exists in your program at run time. A class is the static thing you look at in the program listing.") is highly biased towards static languages. It fails badly in general. Most dynamic languages offer some facility for metaprogramming, which allows you to do things like create new classes on the fly (without resorting to eval). Thus classes need not appear in the program listing. Some very pure OO languages like Smalltalk and Ruby confound attempts to distinguish the two by making classes into objects of class Class. Prototype based languages like Self and JavaScript confuse things in a different way - they do not HAVE classes. (Alternately you can think of it as every object potentially being its own class.)
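To make the metaprogramming point concrete, here is a minimal Perl sketch of creating a class at run time, without string eval. The class name "Counter" and the helper make_class are my inventions for illustration; the point is that no class by that name appears anywhere in the program listing until the code runs.

```perl
use strict;
use warnings;

# Install methods into a freshly invented package at run time.
sub make_class {
    my ($name, %methods) = @_;
    no strict 'refs';
    while (my ($method, $code) = each %methods) {
        *{"${name}::${method}"} = $code;   # write into the symbol table
    }
    *{"${name}::new"} = sub { bless {}, $name };
    return $name;
}

my $class = make_class('Counter',
    increment => sub { $_[0]{count}++ },
    count     => sub { $_[0]{count} // 0 },
);

my $obj = $class->new;
$obj->increment for 1 .. 3;
print $obj->count, "\n";   # prints 3
```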
I'd suggest something like, "Objects are things that your program manipulates. Classes are the types of things that your program can create." And then hope that nobody asks you what it means to be a "type of thing".
----
p 106: "Design for Test" is not only a good thought process! This paragraph would be a GREAT place to mention "test-first development". Cross-referencing section 22.2 would do it.
----
p 173: "Shortly before finishing this book..." Edition 1 or 2? Also section 7.4 should cross-reference section 19.6 and vice versa.
----
p 181: "Modern languages such as C++, Java, and Visual Basic support both functions and procedures." And modern languages such as Perl, Python and JavaScript see no point in drawing any distinction between functions and procedures. I'd suggest replacing "Modern" with "Many".
----
p 188: I'd like to see you explicitly point out here the dangers of APIs that make it easy to confuse data and metadata. If you're using them, be paranoid about data that can be interpreted as metadata. If you have the choice between APIs, choose the one that makes it harder (preferably impossible) to confuse data and metadata.
To see my point, look at how many of the ways that you list for data to attack your system are cases where metadata and data get confused. Oh, and looking at that list, I'd suggest adding yet another, "format string vulnerabilities". (See [link|http://www.cs.ucsb.edu/~jzhou/security/formats-teso.html|http://www.cs.ucsb.e...formats-teso.html].)
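Since format strings are the purest example of the confusion, here is a tiny Perl sketch of the difference between treating input as data and letting it act as metadata. (In Perl the dangerous form "merely" produces garbage or dies; in C it is exploitable.)

```perl
use strict;
use warnings;

my $input = '%s%s%s %n';   # hostile data full of printf metadata

# DANGEROUS: using the data as the format string lets it act as metadata.
# printf($input);

# SAFE: the format string is a constant, so $input can only ever be data.
my $out = sprintf '%s', $input;
print "$out\n";   # prints the literal %s%s%s %n
```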
----
p 196: I'd emphasize the security warning on the "Display an error message" option far more. This is a big failing in a lot of web development environments - any errors are sent off with full debugging information included. This is great for debugging your attack! For example, this is how people escalate SQL injection attacks from "Entering ' here causes an error?" to "Thanks for your credit card database!"
The best practice with this approach is to have a conditional on the error handling that makes the informative error messages only appear in development.
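A sketch of that conditional in Perl, assuming an APP_ENV environment variable distinguishes development from production (the variable name is my invention; use whatever your framework provides):

```perl
use strict;
use warnings;

sub error_page {
    my ($detail) = @_;
    if (($ENV{APP_ENV} // '') eq 'development') {
        # Developers get the informative version.
        return "ERROR: $detail";
    }
    # Everyone else - including attackers probing the site - learns nothing.
    return 'Something went wrong. The problem has been logged.';
}

print error_page(q{SQL syntax error near "'" at line 1}), "\n";
```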
----
p 252: An interesting point for your amusement (it is not worth making in the book) - dynamic languages take the comment, "In general, the later you make the binding time, the more flexibility you build into your code." to a logical extreme. This section on coding choices parallels the consequences of language design choices.
----
p 264: Many languages (particularly dynamic ones) have built-in associative arrays of some kind, usually hashes. You might want to give advice on naming them. The advice that I follow (per Larry Wall's suggestion in _Programming Perl_) is that a hash lookup should be read as "of". Thus if your hash stores the age of a person, it should be named %age or (if confusion is possible) something like %ageByPersonName. (I'm using your preferred capitalization here...)
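The "read it as 'of'" convention in two lines of Perl (the names are mine, purely for illustration):

```perl
use strict;
use warnings;

my %age = (Alice => 34, Bob => 27);
print $age{Alice}, "\n";   # reads as "age of Alice" -- prints 34

# When several "of" readings could collide, qualify the name:
my %ageByPersonName = (Alice => 34);
```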
----
p 355: Many languages have another conditional: unless. The important thing that needs saying about unless is that people DO NOT successfully perform De Morgan's laws in their head while debugging, so you should never use unless with any complex conditionals. Unfortunately experience tells me that very few programmers can be convinced of this until they've personally been bitten by it while trying to debug something. Here is the coding horror to avoid:
    unless ( conditionA() or conditionB() ) {
        doSomething();
    }
Turn that into an if instead no matter how much messier it looks. Later you'll be able to figure out when it is executed:
    if ( !conditionA() and !conditionB() ) {
        doSomething();
    }
----
p 369: The paper "Loop Exits and Structured Programming: reopening the debate" definitely deserves a mention in this section, and lists several other studies of interest. You can find the raw text at [link|http://www.cis.temple.edu/~ingargio/cis71/software/roberts/documents/loopexit.txt|http://www.cis.templ...ents/loopexit.txt] and [link|http://portal.acm.org/citation.cfm?id=199691.199815|http://portal.acm.or...?id=199691.199815] gives a place where you can cite it.
It may be worth noting that several languages (eg Perl, Ruby) spell "break" as "last".
----
p 372: When talking about foreach, it is worth noting that in code that uses traditional for loops, off-by-one errors are one of the top sources of bugs. Being able to virtually eliminate this class of errors is a huge win and should be emphasized.
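To illustrate, a small Perl comparison of the two loop forms (the data is invented):

```perl
use strict;
use warnings;

my @items = qw(a b c d);

# The traditional for loop has several places to get the bounds wrong:
# <= vs <, $#items vs scalar(@items), starting at 0 vs 1...
my $count = 0;
for (my $i = 0; $i <= $#items; $i++) {
    $count++;
}

# foreach has none of them; there is no index to be off by one.
my $count2 = 0;
$count2++ for @items;

print "$count $count2\n";   # prints 4 4
```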
----
p 385: You should cross-reference sections 7.4 and 19.6 here.
----
p 399: There is a critical point that I really wish you had here. Knuth mentions in his 1974 article that any algorithm that can be produced with goto can be produced without goto, given named loop control (next and break with loop labels). This is much stronger than Böhm and Jacopini's 1966 result - that one said only that you could accomplish the same tasks with a few basic control structures. This one says that you can accomplish the same tasks _with the same efficiency_ if you add named loop control to those structures. I forget who he cited for this and don't have the paper available at the moment, but it should be easy to track down.
This point is far more important today than it was then, because we now have languages (eg Perl, Java) that implement named loop control. It is now a practical alternative to using goto for algorithmic reasons. Alternately if you're working in a language without named loop control (eg C++), this insight encapsulates the situations in which you could reasonably want a goto for algorithmic reasons - the goto is needed to "program into the language" to provide named loop control. Modern exception handling techniques provide a goto-less alternative for the other major set of cases which Knuth and Dijkstra cited goto being useful for.
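To make the point concrete, here is a small Perl sketch of named loop control: searching a nested structure and leaving both loops as soon as the target is found, with no goto and no flag variable. (The data and label name are invented.)

```perl
use strict;
use warnings;

my @rows = ([1, 2], [3, 4], [5, 6]);
my ($found_row, $found_col);

ROW: for my $r (0 .. $#rows) {
    for my $c (0 .. $#{ $rows[$r] }) {
        if ($rows[$r][$c] == 4) {
            ($found_row, $found_col) = ($r, $c);
            last ROW;   # exits the OUTER loop, not just the inner one
        }
    }
}
print "$found_row,$found_col\n";   # prints 1,1
```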
An anecdotal data point: as a very experienced Perl programmer, I have only seen two uses of a traditional goto in Perl which I consider justified. (Perl has two forms of goto. The other replaces one function call with another - think "jmp".) One was in Damian Conway's Switch.pm where he was creating a control structure that had to cooperate with all other control structures that it might be in contact with. (In particular he wanted to support tricks like Duff's device.) The only way he found to do it was with a goto. The other is from the utility s2p, and is why Perl has goto. s2p translates code from sed to Perl, and it translates "goto" to "goto" rather than trying to figure out how to rewrite things without goto on the fly.
Incidentally at [link|http://www.perlmonks.org/index.pl?node_id=126056|http://www.perlmonks...pl?node_id=126056] Damian Conway demonstrates a C++ macro to provide named loop control in that language. You might want to mention that example.
----
p 409: "20 years ago" was right when the sentence appeared in the first edition. For the second edition it should have been "30 years ago". It'll probably need to be "40 years ago" when the third edition appears. :-P
----
p 413: If you choose to acknowledge the existence of languages with a native hash data type (formally known as "associative arrays"), this section would be a good place to do it. Being able to use arbitrary key/value pairs directly makes direct access tables a lot more natural.
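For instance, a direct access table in Perl with arbitrary keys (the example domain is invented) - no need to map messy real-world keys onto a dense integer range first:

```perl
use strict;
use warnings;

my %shipping_days = (
    ground    => 5,
    air       => 2,
    overnight => 1,
);

my $method = 'air';
print $shipping_days{$method}, "\n";   # prints 2
```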
----
p 458: I've done some thinking about this section, and how it relates to section 7.4's comment about how this fits with OO programming. It seems to me that when you're using polymorphism to full effect, every method call has an implicit decision embedded in it. The result is that the complexity of an OO routine is potentially far higher than McCabe's measure would indicate.
Of course I don't have any studies backing this idea up. But it does explain anecdotal observations that I've made. There is further discussion at [link|http://www.perlmonks.org/?node_id=298755|http://www.perlmonks.org/?node_id=298755]. You might or might not find that of interest.
----
p 559: The argument that you present against debuggers is a straw man version. The actual argument that I've seen (and made) against debuggers is far more subtle - it is that they help you solve bugs but hide systemic problems. As I put it once:
Note that I didn't say that debuggers are useless for debugging. I said I am constantly amazed at how many good programmers don't like them for that purpose but have plenty of other uses. It should be obvious that good programmers who have found other uses for a debugger actually understand what they are and how to use them. Ignorance is not the reason for not choosing to use them.
No, the reason is more subtle. The limit to our ability to develop interesting things lies in our ability to comprehend. Debuggers focus attention at the wrong point. They let us blindly trace through code to find the symptoms of problems. That doesn't help us one bit to notice more subtle symptoms of structural problems in the code. It doesn't help one bit in giving us encouragement to be more careful. It doesn't encourage reflection on the process. It doesn't give us reason to think about the poor schmuck who has to handle our code afterwards.
In short it encourages missing the forest for the trees. But in the end projects are felled by overgrown thickets, not by single trees.
The thread that quote is from starts at [link|http://www.perlmonks.org/index.pl?node_id=48495|http://www.perlmonks....pl?node_id=48495]. The Linux kernel summaries have moved since that was posted. The link now for the discussion there that I refer to in the middle is [link|http://www.kerneltraffic.org/kernel-traffic/kt20001002_87.html#4|http://www.kerneltra...0001002_87.html#4]. As far as I am concerned, the most fascinating datapoint I've seen from discussions of this topic is IBM's experience with debuggers and RAS as described at [link|http://seclists.org/linux-kernel/2000/Sep/2928.html|http://seclists.org/...000/Sep/2928.html]. I strongly suspect that their experience would hold true for things other than operating systems - good information about the system state when things went wrong is more valuable than a debugger.
Two related notes. In Perl it is easy to set up error reporting so that all error messages include a full stack backtrace. It is amazing how often that simple piece of context makes debugging problems trivial.
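For example, one line with the core Carp module turns every die into a die-with-backtrace:

```perl
use strict;
use warnings;
use Carp;

# Every die now reports the full call chain, not just the line that died.
$SIG{__DIE__} = \&Carp::confess;

sub inner { die "something broke" }
sub outer { inner() }

eval { outer() };
print $@;   # the message now includes main::inner and main::outer frames
```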
The other one is that in my experience the utility of debuggers drops rapidly as code becomes more dynamic. When you have straightline code, it tends to be easy to track things down in a debugger. But, for instance, if you dispatch to an anonymous function using a hash lookup, the point where the decision is made has no relation to the point where the decision is executed.
----
That's all that I have the energy to write right now. I don't know whether you'll actually read it, or what you'll agree or disagree with. But even if you don't read it, or disagree with what little you did read, writing it down has clarified my thinking. So writing this was probably worthwhile.
Regards,
Ben