
Welcome to IWETHEY!

New Some corrections, notes, etc on Code Complete 2
The following is the text of an email that I wrote to Steve McConnell. I figure that if I'm going to put all of the energy that I did into writing it down, I might as well get some feedback on my views from people here.

All page and section numbers refer to the second edition of Code Complete.


First let me start by saying that this book - in either edition - is probably the single programming book that I most often recommend to people who are trying to get better at Perl. I'm sure that being told things like this no longer surprises you, but hopefully you are pleased by it nonetheless.

Secondly, I'm still reading the second edition, and have skimmed ahead to stuff that interested me. When I noticed that I was slowing my reading and skimming ahead for things that I wanted to give feedback on, I decided that it was best to send you feedback now so that I could concentrate on reading. I might send you more later if you show interest and I find more things to comment on. (Hopefully this gets me on your list of people that you want to have reviewing the third edition in a few years... :-)

Here we go.

----
p 63: While discussing the Sapir-Whorf hypothesis, it may be worth mentioning "the Blub paradox" (see [link|http://www.paulgraham.com/avg.html|http://www.paulgraham.com/avg.html]). Yes, I know that if you mentioned everything that you wanted to, the book would be twice as long and half as good. When I say "may be worth", please emphasize "may".

----
p 64: C++ has not been able to honestly claim C compatibility for a very long time. At the start it could, but the two have long since parted ways. I'm not sure that this is worth correcting though.

----
p 65: Java and JavaScript are not "loosely related" except by name and now-dead marketing dreams. The ONLY relationship between Java and JavaScript is that they are C-derived languages with garbage collection. Virtually every other major design decision that you can name was made differently between them. (Eg static vs dynamic typing, the type of object system, the native datatypes...)

The similarity in names is entirely a marketing decision. A temporary alliance between Sun and Netscape led to "LiveScript" getting an interface to Java and the name "JavaScript". Microsoft refused to acknowledge this and called their implementation JScript. (The world ignored Microsoft.) The official standard for the language sidestepped the fight by calling it ECMAScript. (The world has ignored the official standard as well.)

----
p 65: Perl is NOT an acronym. I'd suggest replacing that myth with a comment about CPAN. For instance, "Perl's Comprehensive Perl Archive Network (CPAN) claims to be the largest repository of freely available components in any language." That Perl is not an acronym can be verified from `perldoc perlfaq1`:


What's the difference between "perl" and "Perl"?

One bit. Oh, you weren't talking ASCII? :-) Larry now uses "Perl" to signify the language proper and "perl" the implementation of it, i.e. the current interpreter. Hence Tom's quip that "Nothing but perl can parse Perl." You may or may not choose to follow this usage. For example, parallelism means "awk and perl" and "Python and Perl" look OK, while "awk and Perl" and "Python and perl" do not. But never write "PERL", because perl is not an acronym, apocryphal folklore and post-facto expansions notwithstanding.

----
p 61-66: On the topic of language choice there is a point that I keep on wishing that you'd make about dynamic vs static languages. You've noted elsewhere (eg section 3.2) that the ideal amount of process goes up as your project becomes bigger and more life-critical. That extends to language choice. From anecdotal evidence there seems to be a development sweet spot for teams of up to about 6 or so. Beyond there inter-team communication becomes a major concern. One way to address that barrier is to add lots of support to your language to make it easier for large groups to cooperate. The other is to give teams of that critical size tools to be as productive as possible. People in mainstream languages (C++, Java, C#...) tend to go the first route. It shows as extra process in the form of various declarations, static typechecking, and the like. People who like dynamic languages (Lisp, Smalltalk, Perl, Python...) often go the second. It shows as less process, and more flexibility to settle into a highly customized style.

Note that the two sets of choices have a lot of implicit conflict. Adding process to your language reduces how much a small team can do and reduces what you can accomplish at that "sweet spot". Adding flexibility to your language results in less uniformity, and therefore more trouble getting larger groups to cooperate well. People in these two camps tend strongly to talk past each other.

Depending on the task at hand, either approach may or may not work well. I know that there are plenty of successful Perl projects maintained by experienced small teams with 50,000-150,000 lines of code. (Should you count the CPAN modules that also got installed in lines of code?) Using the estimate that Perl code is 6x as dense as C, that's similar to 300,000 to 900,000 line C projects, only done in less time by fewer people. (OK, it runs more slowly and it takes more memory.) Conversely if something would take more than a million lines of C, Perl is likely not the right language to choose.

----
p 86: Your definition of classes vs objects ("An object is any specific entity that exists in your program at run time. A class is the static thing you look at in the program listing.") is highly biased towards static languages. It fails badly in general. Most dynamic languages offer some facility for metaprogramming, which allows you to do things like create new classes on the fly (without resorting to eval). Thus classes need not appear in the program listing. Some very pure OO languages like Smalltalk and Ruby confound attempts to distinguish the two by making classes into objects of class Class. Prototype based languages like Self and JavaScript confuse things in a different way - they do not HAVE classes. (Alternately you can think of it as every object potentially being its own class.)

I'd suggest something like, "Objects are things that your program manipulates. Classes are the types of things that your program can create." And then hope that nobody asks you what it means to be a "type of thing".
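To make the point about classes that never appear in the program listing concrete, here is a minimal sketch in Python (one dynamic language among many; the same idea applies elsewhere). It builds a class at run time with the built-in type(name, bases, namespace). The names make_record_class, Point and describe are made up for this illustration.

```python
# Build a class at run time; no "class Point" statement appears anywhere.
# make_record_class, Point and describe are hypothetical names for this sketch.

def make_record_class(class_name, field_names):
    """Return a brand-new class whose instances carry the given fields."""

    def __init__(self, *values):
        if len(values) != len(field_names):
            raise TypeError(f"{class_name} expects {len(field_names)} values")
        for name, value in zip(field_names, values):
            setattr(self, name, value)

    def describe(self):
        pairs = ", ".join(f"{name}={getattr(self, name)!r}" for name in field_names)
        return f"{class_name}({pairs})"

    # type() with three arguments creates the class object itself.
    return type(class_name, (object,), {"__init__": __init__, "describe": describe})


Point = make_record_class("Point", ["x", "y"])
p = Point(3, 4)

print(p.describe())             # Point(x=3, y=4)
print(isinstance(Point, type))  # True: the class is itself an object
```

Note that the class name and its fields could just as easily have come from a config file or user input, which is exactly why "the static thing you look at in the program listing" does not cover this case.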

----
p 106: "Design for Test" is not only a good thought process! This paragraph would be a GREAT place to mention "test-first development". Cross-referencing section 22.2 would do it.

----
p 173: "Shortly before finishing this book..." Edition 1 or 2? Also section 7.4 should cross-reference section 19.6 and vice versa.

----
p 181: "Modern languages such as C++, Java, and Visual Basic support both functions and procedures." And modern languages such as Perl, Python and JavaScript see no point in drawing any distinction between functions and procedures. I'd suggest replacing "Modern" with "Many".

----
p 188: I'd like to see you explicitly point out here the dangers of APIs that make it easy to confuse data and metadata. If you're using them, be paranoid about data that can be interpreted as metadata. If you have the choice between APIs, choose the one that makes it harder (preferably impossible) to confuse data and metadata.

To see my point, look at how many of the ways that you list for data to attack your system are cases where metadata and data get confused. Oh, and looking at that list, I'd suggest adding yet another, "format string vulnerabilities". (See [link|http://www.cs.ucsb.edu/~jzhou/security/formats-teso.html|http://www.cs.ucsb.e...formats-teso.html].)
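A small sketch of what confusing data and metadata looks like, using SQL as the example and Python's standard sqlite3 module. The table, the column names and the hostile input are all invented for illustration. The first query splices data into the query text, so the data can be read as SQL. The second uses a placeholder, which keeps the two apart.

```python
import sqlite3

# Throwaway in-memory database; the schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

hostile_input = "nobody' OR '1'='1"

# Risky API use: the input becomes part of the query text (metadata).
unsafe_sql = "SELECT name FROM users WHERE name = '" + hostile_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer API use: the placeholder guarantees the input is treated as data only.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile_input,)
).fetchall()

print(sorted(unsafe_rows))  # every user comes back
print(safe_rows)            # no user is literally named "nobody' OR '1'='1"
```

The placeholder form is the "harder to confuse" API in the sense described above: there is no way for the value to change the shape of the query.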

----
p 196: I'd emphasize the security warning on the "Display an error message" option far more. This is a big failing in a lot of web development environments - any errors are sent off with full debugging information included. This is great for debugging your attack! For example this is how people escalate SQL injection attacks from, "Entering ' here causes an error?" to "Thanks for your credit card database!"

The best practice with this approach is to have a conditional on the error handling that makes the informative error messages only appear in development.
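One way that conditional might look, sketched in Python. The function name render_error and the debug flag are assumptions made for this example; a real application would take the flag from its own configuration.

```python
import traceback

GENERIC_MESSAGE = "Something went wrong. Please try again later."


def render_error(exc, debug):
    """Return the text shown to the user for a caught exception.

    debug is a hypothetical flag: True in development, False in production.
    """
    if debug:
        # Development only: full details, including the stack trace.
        return "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
    # Production: nothing an attacker can learn from.
    return GENERIC_MESSAGE


try:
    raise ValueError("syntax error near 'credit_cards'")
except ValueError as caught:
    err = caught

print(render_error(err, debug=False))
print(render_error(err, debug=True))
```

In production the details should still go somewhere (a log file, for instance); the point is only that they do not go to the person at the other end of the connection.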

----
p 252: An interesting point for your amusement (it is not worth making in the book) - dynamic languages take the comment, "In general, the later you make the binding time, the more flexibility you build into your code." to a logical extreme. This section on coding choices parallels the consequences of language design choices.

----
p 264: Many languages (particularly dynamic ones) have built-in associative arrays of some kind, usually hashes. You might want to give advice on naming them. The advice that I follow (per Larry Wall's suggestion in _Programming Perl_) is that a hash lookup should be read as "of". Thus if your hash stores the age of a person, it should be named %age or (if confusion is possible) something like %ageByPersonName. (I'm using your preferred capitalization here...)
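The same "read the lookup as 'of'" convention carries over to other languages with built-in associative arrays. A tiny sketch in Python, with made-up data and a made-up helper called oldest:

```python
# Hypothetical data. Named with a bare noun, the lookup reads "age of Alice".
age = {"Alice": 34, "Bob": 27}
print(age["Alice"])


def oldest(age_by_person_name):
    """Where a bare noun could be mistaken for a single value, spell out the key."""
    return max(age_by_person_name, key=age_by_person_name.get)


print(oldest(age))
```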

----
p 355: Many languages have another conditional: unless. The important thing that needs saying about unless is that people DO NOT successfully apply De Morgan's laws in their head while debugging, so you should never use unless with any complex conditionals. Unfortunately experience tells me that very few programmers can be convinced of this until they've personally been bitten by it while trying to debug something. Here is the coding horror to avoid:

    unless (conditionA() or conditionB()) {
        doSomething();
    }


Turn that into an if instead no matter how much messier it looks. Later you'll be able to figure out when it is executed:

    if ( !conditionA() and !conditionB() ) {
        doSomething();
    }

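For anyone who doubts that the two forms really are the same test, here is a quick check in Python. Python has no unless, so unless_form below writes it as "not (a or b)", which is what unless means. Both function names are invented for this sketch.

```python
from itertools import product


def unless_form(a, b):
    # "unless (a or b)" runs its body when (a or b) is false.
    return not (a or b)


def if_form(a, b):
    # The De Morgan rewrite: "if (!a and !b)".
    return (not a) and (not b)


# Walk the whole truth table and compare the two forms.
truth_table = [
    (a, b, unless_form(a, b), if_form(a, b))
    for a, b in product([False, True], repeat=2)
]

for a, b, via_unless, via_if in truth_table:
    print(a, b, via_unless, via_if)

forms_agree = all(via_unless == via_if for _, _, via_unless, via_if in truth_table)
print(forms_agree)  # True
```

The body runs in exactly one of the four cases (both conditions false), which is obvious from the if form and surprisingly easy to get wrong from the unless form.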

----
p 369: The paper "Loop Exits and Structured Programming: reopening the debate" definitely deserves a mention in this section, and lists several other studies of interest. You can find the raw text at [link|http://www.cis.temple.edu/~ingargio/cis71/software/roberts/documents/loopexit.txt|http://www.cis.templ...ents/loopexit.txt] and [link|http://portal.acm.org/citation.cfm?id=199691.199815|http://portal.acm.or...?id=199691.199815] gives a place where you can cite it.

It may be worth noting that several languages (eg Perl, Ruby) spell "break" as "last".

----
p 372: When talking about foreach, it is worth noting that in code that uses traditional for loops, off-by-one errors are one of the top sources of bugs. Being able to virtually eliminate this class of errors is a huge win and should be emphasized.
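A short illustration of why foreach removes this class of bug, sketched in Python with made-up readings. In the indexed loop the programmer writes the bounds by hand and can get them wrong. In the foreach loop there are no bounds to write.

```python
temperatures = [12.5, 14.0, 9.5, 11.0]  # hypothetical readings

# Traditional indexed loop: the start and end are spelled out by hand.
# Starting at 1 would silently skip a reading; ending at len + 1 would
# raise IndexError.
total_indexed = 0.0
for i in range(0, len(temperatures)):
    total_indexed += temperatures[i]

# foreach style: the loop visits every element exactly once, with no
# index arithmetic to get wrong.
total_foreach = 0.0
for reading in temperatures:
    total_foreach += reading

print(total_indexed, total_foreach)  # 47.0 47.0
```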

----
p 385: You should cross-reference sections 7.4 and 19.6 here.

----
p 399: There is a critical point that I really wish you had here. Knuth mentions in his 1974 article that any algorithm that can be produced with goto can be produced without goto but with named loop control (next and break with loop labels). This is much stronger than Böhm and Jacopini's 1966 result - that one said that you could accomplish the same tasks with just a few operations. This one says that you can accomplish the same tasks _with the same efficiency_ if you add named loop control to those operations. I forget who he cited for this and don't have the paper available at the moment, but it should be easy to track down.

This point is far more important today than it was then, because we now have languages (eg Perl, Java) that implement named loop control. It is now a practical alternative to using goto for algorithmic reasons. Alternately if you're working in a language without named loop control (eg C++), this insight encapsulates the situations in which you could reasonably want a goto for algorithmic reasons - the goto is needed to "program into the language" to provide named loop control. Modern exception handling techniques provide a goto-less alternative for the other major set of cases which Knuth and Dijkstra cited goto as being useful for.
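Python is another language without labelled break, so it makes a handy illustration of "programming into the language" here. The sketch below does not use named loop control. It shows a common substitute: put the nested loops in their own function and use return to leave both at once. The function name find_cell and the grid are invented for the example.

```python
def find_cell(grid, target):
    """Return (row, column) of the first cell equal to target, or None.

    With named loop control this would be "break OUTER" (or "last OUTER"
    in Perl). Here the return statement plays that role.
    """
    for row_index, row in enumerate(grid):
        for col_index, value in enumerate(row):
            if value == target:
                return (row_index, col_index)  # exits both loops at once
    return None


grid = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

print(find_cell(grid, 6))   # (1, 2)
print(find_cell(grid, 10))  # None
```

In a language that has labels, the loops could stay inline and the label would do the job directly, with no extra function.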

An anecdotal data point: as a very experienced Perl programmer, I have only seen two uses of a traditional goto in Perl which I consider justified. (Perl has two forms of goto. The other replaces one function call with another - think "jmp".) One was in Damian Conway's Switch.pm where he was creating a control structure that had to cooperate with all other control structures that it might be in contact with. (In particular he wanted to support tricks like Duff's device.) The only way he found to do it was with a goto. The other is from the utility s2p, and is why Perl has goto. s2p translates code from sed to Perl, and it translates "goto" to "goto" rather than trying to figure out how to rewrite things without goto on the fly.

Incidentally at [link|http://www.perlmonks.org/index.pl?node_id=126056|http://www.perlmonks...pl?node_id=126056] Damian Conway demonstrates a C++ macro to provide named loop control in that language. You might want to mention that example.

----
p 409: "20 years ago" was right when the sentence appeared in the first edition. For the second edition it should have been "30 years ago". It'll probably need to be "40 years ago" when the third edition appears. :-P

----
p 413: If you choose to acknowledge the existence of languages with a native hash data type (formally known as "associative arrays"), this section would be a good place to do it. Being able to use arbitrary key/value pairs directly makes direct access tables a lot more natural.
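As a sketch of how natural a direct access table becomes with a native hash, here it is in Python using a dict keyed directly by month name. The table and the function name days_in_month are written for this illustration only; there is no need to map the key to an array index first.

```python
# Direct access table keyed by the value we actually have in hand.
DAYS_IN_MONTH = {
    "January": 31, "February": 28, "March": 31, "April": 30,
    "May": 31, "June": 30, "July": 31, "August": 31,
    "September": 30, "October": 31, "November": 30, "December": 31,
}


def days_in_month(month, leap_year=False):
    """Look the answer up instead of computing it with a chain of ifs."""
    if month == "February" and leap_year:
        return 29
    return DAYS_IN_MONTH[month]


print(days_in_month("September"))                 # 30
print(days_in_month("February", leap_year=True))  # 29
```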

----
p 458: I've done some thinking about this section, and how it relates to section 7.4's comment about how this fits with OO programming. It seems to me that when you're using polymorphism to full effect, every method call has an implicit decision embedded in it. The result is that the complexity of an OO routine is potentially far higher than McCabe's measure would indicate.

Of course I don't have any studies backing this idea up. But it does explain anecdotal observations that I've made. I discussed this further at [link|http://www.perlmonks.org/?node_id=298755|http://www.perlmonks.org/?node_id=298755]. You might or might not find that of interest.
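To show what "an implicit decision in every method call" means, here is a small Python sketch with invented classes Square and Rectangle. The routine total_area contains one loop and no if statements, yet which area() code runs is decided anew for each element. This only illustrates the idea; it is not evidence for the claim about complexity.

```python
class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


def total_area(shapes):
    total = 0
    for shape in shapes:
        # No visible branch, but the method that runs depends on the
        # run-time class of shape.
        total += shape.area()
    return total


print(total_area([Square(2), Rectangle(2, 3)]))  # 10
```

Written without polymorphism, the same routine would need an explicit branch on the kind of shape, and that branch would be counted by a measure that looks only at visible decision points.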

----
p 559: The argument that you present against debuggers is a straw man version. The actual argument that I've seen (and made) against debuggers is far more subtle - it is that they help you solve bugs but hide systemic problems. As I put it once:

Note that I didn't say that debuggers are useless for debugging. I said I am constantly amazed at how many good programmers don't like them for that purpose but have plenty of other uses. It should be obvious that good programmers who have found other uses for a debugger actually understand what they are and how to use them. Ignorance is not the reason for not choosing to use them.

No, the reason is more subtle. The limit to our ability to develop interesting things lies in our ability to comprehend. Debuggers focus intention at the wrong point. They let us blindly trace through code to find the symptoms of problems. That doesn't help us one bit to notice more subtle symptoms of structural problems in the code. It doesn't help one bit in giving us encouragement to be more careful. It doesn't encourage reflection on the process. It doesn't give us reason to think about the poor schmuck that has to handle our code afterwards.

In short it encourages missing the forest for the trees. But in the end projects are felled by overgrown thickets, not by single trees.

The thread that that's from starts at [link|http://www.perlmonks.org/index.pl?node_id=48495|http://www.perlmonks....pl?node_id=48495]. The Linux kernel summaries have moved since that was posted. The link now for the discussion there that I refer to in the middle is [link|http://www.kerneltraffic.org/kernel-traffic/kt20001002_87.html#4|http://www.kerneltra...0001002_87.html#4]. As far as I am concerned, the most fascinating datapoint I've seen from discussions of this topic is IBM's experience with debuggers and RAS as described at [link|http://seclists.org/linux-kernel/2000/Sep/2928.html|http://seclists.org/...000/Sep/2928.html]. I strongly suspect that their experience would hold true for things other than operating systems - good information about the system state when things went wrong is more valuable than a debugger.

Two related notes. In Perl it is easy to set up error reporting so that all error messages include a full stack backtrace. It is amazing how often that simple piece of context makes debugging problems trivial.

The other one is that in my experience the utility of debuggers drops rapidly as code becomes more dynamic. When you have straightline code, it tends to be easy to track things down in a debugger. But, for instance, if you dispatch to an anonymous function using a hash lookup, the point where the decision is made has no relation to the point where the decision is executed.
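Both notes can be seen in one small sketch. It is written in Python rather than Perl, with an invented dispatch table and invented function names. The handler is chosen by a hash lookup in one place and run in another, and the error report carries a full stack backtrace (via the standard traceback module) so the path from the decision to the failure is visible without a debugger.

```python
import traceback

# Hypothetical dispatch table of anonymous functions.
handlers = {
    "double": lambda x: x * 2,
    "invert": lambda x: 1 / x,
}


def dispatch(command, value):
    handler = handlers[command]  # the decision is made here...
    return handler(value)        # ...but it is carried out inside the lambda


def run_with_backtrace(command, value):
    """Return (result, None) on success or (None, backtrace text) on failure."""
    try:
        return dispatch(command, value), None
    except Exception:
        return None, traceback.format_exc()


print(run_with_backtrace("double", 21))

result, report = run_with_backtrace("invert", 0)
print(report)  # names run_with_backtrace, dispatch and <lambda> in order
```

Stepping through dispatch in a debugger shows only a lookup and a call. The backtrace shows the whole chain at the moment things went wrong.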

----

That's all that I have the energy to write right now. I don't know whether you'll actually read it. Or what you'll agree/disagree with. But even if you don't read it and disagree with what little you did read, in writing it down I've clarified my thinking. So writing this was probably worthwhile.

Regards,
Ben
To deny the indirect purchaser, who in this case is the ultimate purchaser, the right to seek relief from unlawful conduct, would essentially remove the word consumer from the Consumer Protection Act
- [link|http://www.techworld.com/opsys/news/index.cfm?NewsID=1246&Page=1&pagePos=20|Nebraska Supreme Court]
Edited by ben_tilly Sept. 14, 2004, 11:20:51 AM EDT
New EMACScript?
Scott, your mental mind-rays are working...


Peter
[link|http://www.debian.org|Shill For Hire]
[link|http://www.kuro5hin.org|There is no K5 Cabal]
[link|http://guildenstern.dyndns.org|Blog]
New Two additional typos.
Trev caught ECMAScript.

"II've" and "if (!condition!() ..." are two typos I saw.

It's an interesting and well-presented writeup. He should be appreciative. Thanks for posting it here.

[edit: Typo in Subject!]

Cheers,
Scott.
Edited by Another Scott Sept. 14, 2004, 11:23:53 AM EDT
New Thanks to both of you
And yeah, I meant ECMAScript.

You know that article about still being able to read words if the letters are all right, and the first and last letters are in the same place? Well that applies to how I read and (don't) remember ECMAScript...

Cheers,
Ben
New I know a little about that. It's 2 Two too typos in one! :)
New Pathologically Eclectic Rubbish Lister?
Practical Extraction and Report Language?


Of course, this might fall under grep : global regular expression printer, which no one capitalizes either.
New Small Ruby correction ...
... in an excellent posting. Thanks Ben.

It may be worth noting that several languages (eg Perl, Ruby) spell "break" as "last".

Actually Ruby spells "last" as "break".
--
-- Jim Weirich jim@weirichhouse.org [link|http://onestepback.org|http://onestepback.org]
---------------------------------------------------------------------
"Beware of bugs in the above code; I have only proved it correct,
not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas)
New D'oh
I was sure that Perl isn't alone in spelling "break" as "last". Am I wrong about that?

From a quick Google it seems that I was. Or at least no other language using last has achieved anywhere near the popularity of Perl...

Cheers,
Ben
New Perhaps you were thinking of 'next'
Perl and Ruby share 'next' with similar semantics, while C-based languages tend to use 'continue' (and Perl's 'continue' is something different).
New No, I just thought that Ruby was more similar than it is
I use Ruby infrequently enough that I tend to be trapped by niggling issues like this whenever I try to use it.

But I get back into the mindset in an hour or so.

Cheers,
Ben
New Re: Some corrections, notes, etc on Code Complete 2

Depending on the task at hand, either approach may or may not work well. I know that there are plenty of successful Perl projects maintained by experienced small teams with 50,000-150,000 lines of code. (Should you count the CPAN modules that also got installed in lines of code?) Using the estimate that Perl code is 6x as dense as C, that's similar to 300,000 to 900,000 line C projects, only done in less time by fewer people. (OK, it runs more slowly and it takes more memory.) Conversely if something would take more than a million lines of C, Perl is likely not the right language to choose.


This leaves me wondering, does this imply if something takes more than
150,000 lines of code in Perl, then perl is likely not the right choice.
If yes why?

And does this imply if something is taking more than 900,000 lines of C, I
should start looking for a different language, but not Perl.

I personally think, the answer to the lines of code can be a lot more
structured, for example, if you have more than N lines of code
use namespace, or package (Tcl vocabulary)
or consider using an object system

If c doesn't have those, then c fails
If perl does, then perl will succeed.

Of course I can imagine, other language features can affect the choice
when faced with the problem that too much code is being written

But I believe this problem is very structured, and can be answered
very specifically, and languages performance and ability to face this
problem can be determined more precisely.

No need for estimates here, we don't have to guess.
New Human dynamics have no precise answers
Depending on the task at hand, either approach may or may not work well. I know that there are plenty of successful Perl projects maintained by experienced small teams with 50,000-150,000 lines of code. (Should you count the CPAN modules that also got installed in lines of code?) Using the estimate that Perl code is 6x as dense as C, that's similar to 300,000 to 900,000 line C projects, only done in less time by fewer people. (OK, it runs more slowly and it takes more memory.) Conversely if something would take more than a million lines of C, Perl is likely not the right language to choose.


This leaves me wondering, does this imply if something takes more than 150,000 lines of code in Perl, then perl is likely not the right choice. If yes why?

Yes. Here's why. To handle a large amount of code, eventually you need a large number of people. The question then becomes how well that team will coordinate their efforts.

And does this imply if something is taking more than 900,000 lines of C, I should start looking for a different language, but not Perl.

If you're already using Perl at that point, you probably don't want to switch and rewrite. But if you're planning ahead, I might not choose Perl for this situation. (Others may disagree.)

I personally think, the answer to the lines of code can be a lot more structured, for example, if you have more than N lines of code use namespace, or package (Tcl vocabulary) or consider using an object system

I submit that you're thinking that because you're thinking of this as a technical issue.

It is not.

It has to do with how well a group of people functions together. And this depends on a lot of "soft" criteria, such as the structure of your organization and the mix of personalities that you have.

Technical features interact with human dynamics. Sometimes in complex ways. Checking off a list of features does not give you a good guide to figuring out how things will work out in reality.

If c doesn't have those, then c fails. If perl does, then perl will succeed.

If life was so simple then many discussions could be avoided or greatly streamlined. If only.

Of course I can imagine, other language features can affect the choice when faced with the problem that too much code is being written

But "too much code is being written" is not the problem to solve. Nor does adding language features necessarily make you better at handling large problems.

The real problem to solve is getting humans to accomplish the desired task. The challenge is how to coordinate those humans into a successful group. And language features do not have a simple relationship with this problem.

One strategy is to say that we know that small groups are easier to coordinate than large ones. So make small groups as effective as possible. Perl lends itself well to this strategy.

Another strategy is to say that large groups can accomplish more, and there are things we can do to make large groups work together smoothly. Java is far better than Perl at the latter strategy.

Why is Perl better at the first strategy than the last? One reason is an excess of features. Perl has every feature you named and more. This allows people to pick very efficient solutions. However when 20 people each do things how they think best, that group will be impossible to coordinate as a team. Restricting people's choices is good for teamwork.

But I believe this problem is very structured, and can be answered very specifically, and languages performance and ability to face this problem can be determined more precisely.

No need for estimates here, we don't have to guess.

What do you base this belief on? More to the point, does your thinking take into account the fact that this is really a problem of human dynamics, not language capabilities?
About the use of language: it is impossible to sharpen a pencil with a blunt axe. It is equally vain to try to do it with ten blunt axes instead. -- Edsger W. Dijkstra
New What do you base this belief on?
C'mon Ben, where's your memory?
Go read some previous posts from this guy.
All academic, mostly technical, zero real world experience.

Sounds very good for about 30 seconds and then the bullshit meter goes to red.
New My memory is fine
Ask Scott or Mike.

Furthermore I didn't even need to look at past posts to see this. His lack of experience is indicated by the fact that he missed that I was talking about team dynamics, and those do not admit to precise answers.

However people can learn from others what most wind up learning by experience. And I've never minded being a little patient from time to time. So I decided to put some energy out clarifying my thoughts.

Cheers,
Ben
New He needs real world experience
He is pure ivory tower, and that tower is pure tech.
I wonder if he's read "Death March".
New No disagreement
     Some corrections, notes, etc on Code Complete 2 - (ben_tilly) - (15)
         EMACScript? - (pwhysall)
         Two additional typos. - (Another Scott) - (2)
             Thanks to both of you - (ben_tilly) - (1)
                 I know a little about that. It's 2 Two too typos in one! :) -NT - (Another Scott)
         Pathologically Eclectic Rubbish Lister? - (Simon_Jester)
         Small Ruby correction ... - (JimWeirich) - (3)
             D'oh - (ben_tilly) - (2)
                 Perhaps you were thinking of 'next' - (JimWeirich) - (1)
                     No, I just thought that Ruby was more similar than it is - (ben_tilly)
         Re: Some corrections, notes, etc on Code Complete 2 - (systems) - (5)
             Human dynamics have no precise answers - (ben_tilly) - (4)
                 What do you base this belief on? - (broomberg) - (3)
                     My memory is fine - (ben_tilly) - (2)
                         He needs real world experience - (broomberg) - (1)
                             No disagreement -NT - (ben_tilly)
