How long should routines be in different kinds of code?

Post #121,101

10/13/03 6:30:48 PM

I posted something to perlmonks based on a discussion here a while ago, and some [link|http://www.perlmonks.org/index.pl?node_id=298755|interesting points] came up.

Cheers,
Ben

"good ideas and bad code build communities, the other three combinations do not"
- [link|http://archives.real-time.com/pipermail/cocoon-devel/2000-October/003023.html|Stefano Mazzocchi]

Post #121,108

by deSitter

10/13/03 7:48:01 PM

Great question

Some languages are impossible to digest if they are more than a screenful - for example I could easily read your Perl that you recently posted, but if it's more than a function or two, it gets very hard. FORTRAN and COBOL were easy to read in 100 page listings. C is somewhere between - but in general it is much easier to read if broken into smallish pieces (say, a file per major function).

In general, the more a language relies on non-alphanumeric symbols, the smaller the digestible chunks. The exception is FORTH, which should never be more than 60 or so lines per chunk (because of the mental gymnastics needed to interpret the stack).

-drl

Post #121,142

by JimWeirich

10/14/03 1:58:59 AM

Re: Great question

[...] exception FORTH, which should never be more than 60 or so lines per chunk

60 lines per chunk? For FORTH? 60 lines would be a mind-numbingly huge FORTH word (i.e. routine). FORTH words tend to be very small: many are one-liners, up to three lines is very common, 5 lines would be on the large side, and 20 lines would be HUGE!

Perhaps you didn't mean Chunk = Word.

--
-- Jim Weirich jweirich@one.net [link|http://onestepback.org|http://onestepback.org]
---------------------------------------------------------------------
"Beware of bugs in the above code; I have only proved it correct,
not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas)

Post #121,150

by deSitter

10/14/03 6:26:08 AM

Re: Great question

I think I meant 20 - was thinking of a traditional "screen". I transposed rows and columns :)

-drl

Post #121,112

by FuManChu

10/13/03 8:06:33 PM

Are you looking for more discussion? ;)

I was just thinking about this today. My conclusion is that function length is a good question, but not the whole story.

What's more important is complexity, as one of your perlmonks pointed out. But complexity has an odd relationship with function length: programs should vary greatly in complexity; functions should not.

Take subclassing, for example. I tend to have a rule of thumb that subclassing is great when applied once. Once in a while, I might subclass a subclass. I almost never go more than two levels deep. Why? Because this exceeds a level of complexity with which I am comfortable. I tend to think of subclassing in terms like "base class" -> "subtype" -> "instance". Anything more complex, and I start to lose understanding.

Ever try to fold a piece of paper multiple times? Anecdotes say that for any square piece of paper, it's impossible to fold it more than seven times. This is probably due in part to edge formation rules, but it's also a symptom of the fact that, to end up with a square one inch on each side after 8 folds, you'd have to start with a sheet 16 inches on each side, and the folded stack would be 256 layers thick!

In programming, we have multiple constructs to handle this problem. Expressions and statements at the lowest levels, then combined expressions like f(g(x)), blocks and loops, functions, classes, modules, packages, applications, etc. Add your particular favorite constructs in where you feel like it.

Back to the subclassing example. Once we move into the class hierarchy construct, there's a limit to the amount of complexity we can maintain in that layer. It's possible to have 72 levels of subclasses, but you end up with a problem: a base class so abstract, not even light can escape. At some point, most programmers decide to "bump up" the complexity to the next construct.

Given all of that, it's nice to have a fairly common function length. Like one of the perlmonkeys, I like roughly a "page length". The point is that functions are constructs; they are able to support infinite complexity, but when other constructs are available, they probably shouldn't.

Side note: it's common for procedural coders to consider their code, with inline comments, as the whole program. OO'ers I know tend to view a program as code plus design docs. They're more used to thinking in terms of multiple levels of complexity (in my experience).

"There's a set of rules that anything that was in the world when you were born is normal and natural. Anything invented between when you were 15 and 35 is new and revolutionary and exciting, and you'll probably get a career in it. Anything invented after you're 35 is against the natural order of things."

Douglas Adams

Post #121,255

by ben_tilly

10/15/03 1:34:53 AM

The responses that I liked best...

were from the AM who was talking about OO as being a way to manage the coupling in your system.

I will need to digest that idea for a bit I think.

Cheers,
Ben

"good ideas and bad code build communities, the other three combinations do not"
- [link|http://archives.real-time.com/pipermail/cocoon-devel/2000-October/003023.html|Stefano Mazzocchi]

Post #121,312

by JimWeirich

10/15/03 1:09:06 PM

Re: The responses that I liked best...

[...] OO as being a way to manage the coupling in your system

Bob Martin from Object Mentor has been pushing the "OO is about coupling" message for many years. If you are not familiar with his writings, check out the Object Mentor site (a link to the design principles area is: [link|http://www.objectmentor.com/resources/listArticles?key=topic&topic=Design%20Principles|http://www.objectmen...sign%20Principles]). The Open/Closed and Liskov Substitution principles are good places to start, but the real practical stuff begins in the Dependency Inversion and Interface Segregation principles.

Post #121,141

by JimWeirich

10/14/03 1:52:53 AM

Re: How long should routines be in different kinds of code?

(from the referenced article): Well what leaps out at me is that every method call has an implicit if in it!

Actually, each method call is an open-ended case statement. Applying McCabe's complexity measurement based on that gives infinite (well, arbitrarily large) complexity for a single method call. That leads me to suspect that applying McCabe in this manner is misleading.
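A minimal Python sketch of the "open-ended case statement" idea. The Square, Rect and Triangle classes and both area functions are hypothetical, invented only for illustration. The explicit version lists its cases in one place; the method call has as many cases as there are classes defining area(), including ones written later.

```python
class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class Rect:
    def __init__(self, w, h):
        self.w, self.h = w, h

    def area(self):
        return self.w * self.h


def area_by_case(shape):
    # Closed case statement: every branch is visible and countable here.
    if isinstance(shape, Square):
        return shape.side * shape.side
    elif isinstance(shape, Rect):
        return shape.w * shape.h
    raise TypeError("unknown shape")


def area_by_dispatch(shape):
    # One call site, but the number of "branches" is open-ended:
    # any class that defines area() adds a case without touching this code.
    return shape.area()


class Triangle:
    # Added later; area_by_dispatch handles it, area_by_case does not.
    def __init__(self, base, height):
        self.base, self.height = base, height

    def area(self):
        return self.base * self.height / 2
```

Counting branches at the call site in area_by_dispatch gives no fixed number, which is the reason a per-call McCabe count looks suspect.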

I've found my ideal function length has been growing shorter and shorter. It may be that it is because I'm doing OO now, but I think that if I were programming in straight procedural code, I would still favor smaller routines than I used to.

Post #121,157

by neelk

10/14/03 10:09:27 AM

McCabe's complexity metric is utterly fscked

The complexity metric used in the article is:

The complexity of a function is 1 (for the function), plus:

1 for every if, and, and or,
1 for every while or repeat
1 for every for, and
1 for every case of a case statement.
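As a rough sketch, the counting rule above can be applied to Python source with the standard ast module. This is an approximation under stated assumptions: the function name mccabe_like is made up here, the source string is assumed to hold a single function, Python has no repeat loop, and case statements are left out to keep it short.

```python
import ast


def mccabe_like(source: str) -> int:
    """Count 1 for the function plus 1 per if/and/or/while/for."""
    score = 1  # 1 for the function itself
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.While, ast.For)):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" is one BoolOp with three values: two operators.
            score += len(node.values) - 1
    return score


SAMPLE = """
def f(x):
    if x > 0 and x < 10:
        return 1
    while x:
        x -= 1
    return 0
"""

print(mccabe_like(SAMPLE))  # 1 (function) + 1 (if) + 1 (and) + 1 (while) = 4
```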

This does not measure complexity -- it just measures how big the syntax tree is. A proper measure of complexity should reflect something like "how hard is it to prove the program implements some specification". For procedural code, the appropriate place to look is at the Floyd-Hoare proof rules, and to track how hard the proof is. (In this style, you establish preconditions and postconditions on each statement in the program, and try to prove that the postcondition is implied by the preconditions and the semantics of the statement.)

For if, and, or, and case, proving that the preconditions imply the postconditions is just a case analysis over the branches. For a for loop, you have to do a proof by induction over the integers the loop variable ranges over. For a while or repeat loop, you need to first figure out what the loop invariants are, and then do the induction. So establishing the correctness of a while loop -- i.e., figuring out what the code does -- is a lot harder than for an if statement. Penalizing a case more heavily than a while loop is just nuts.
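To make the loop-invariant point concrete, here is a small Python sketch. The function sum_to is hypothetical, and the assertions only check the invariant at run time rather than prove it. The invariant has to be invented by the reader, whereas an if needs only one check per branch.

```python
def sum_to(n: int) -> int:
    """Return 0 + 1 + ... + (n - 1) using a while loop."""
    assert n >= 0  # precondition
    total, i = 0, 0
    while i < n:
        # Loop invariant: total holds the sum of 0 .. i-1.
        assert total == i * (i - 1) // 2
        total += i
        i += 1
    # On exit i == n, so the invariant yields the postcondition.
    assert total == n * (n - 1) // 2
    return total
```

The induction is hidden in the invariant: it holds before the first pass (i = 0, total = 0), and each pass preserves it.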

Post #121,177

by Simon_Jester

10/14/03 12:27:23 PM

While I agree with your sentiment

McCabe measures (among other things - there are a couple of measurements) code pathways.

Given a simple if statement with one condition there are 2 pathways - one if the statement is true, another if it is false.

However, given an if statement with two conditions there are 4 pathways.

con1 = true,  con2 = true
con1 = true,  con2 = false
con1 = false, con2 = true
con1 = false, con2 = false

Same measurement is used for while loops.

Case statements would have X + 1 pathways, where X is the number of possible values the condition can switch on and the extra 1 is the default condition.
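The pathway count for an if with several conditions can be enumerated directly. This is only an illustrative Python sketch with a made-up helper name; it lists truth-value combinations as described above, not any particular tool's McCabe implementation.

```python
from itertools import product


def condition_combinations(n: int):
    """All true/false assignments for n independent conditions."""
    return list(product([True, False], repeat=n))


for con1, con2 in condition_combinations(2):
    print(f"con1 = {con1}, con2 = {con2}")
```

With n conditions the list has 2**n entries, so two conditions give the four rows shown in the table.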

Post #121,254

by ben_tilly

10/15/03 1:29:44 AM

I won't disagree...

But it had the advantage that the book I was looking in used it, and it is simple to understand.

Said book used it because however suboptimal it was, there was a bunch of research that used it.

However, some of that research might not have meant quite what it seems on the surface. For instance, code made of a lot of small routines may get changed more per line in maintenance. But if it implements more functionality per line of code (e.g. because of less duplication of code), then the lines of code measure is less meaningful.

Cheers,
Ben

PS OK, I admit it. It had the benefit of being easy to extend in a way that got me the conclusion I intuitively feel is right (that shortness matters more in good OO than procedural).

"good ideas and bad code build communities, the other three combinations do not"
- [link|http://archives.real-time.com/pipermail/cocoon-devel/2000-October/003023.html|Stefano Mazzocchi]

Post #121,243

by broomberg

10/14/03 10:03:00 PM

2 pages max for Perl for me

Less is better if it happens naturally.

3 Lines of SAS is too much.

Check this out:

DO T=1 TO 99;
  IF T =1 THEN COMMX=COMM1; ELSE COMMX=COMM2;
  EXPS(T)=(1-((LOSS(T)* &LRHO * &LRAGE * &LRVEH * LROCT * PLK_LRF)
            + (&GOE+&PTE+&COC+COMMX) - &IIR));
  EXPY(T)=(1-((LOSS(T)* &LRHO * &LRAGE * &LRVEH * LROCT * PLK_LRF)
            + (&GOE+&PTE+COMMX) - &IIR));
  R= T-1;

Imagine hundreds of lines of this stuff.

Post #121,380

by tuberculosis

10/15/03 9:34:19 PM

My rules of thumb

About 1 - 10 lines for an OO method
15 lines or so for functions (like in C)
with exceptions for stuff that reads and writes file formats (these tend to be longer).

In Java, you can't escape the creepy feeling.

--James Gosling
