IWETHEY v. 0.3.0 | TODO
1,095 registered users | 0 active users | 0 LpH | Statistics
Login | Create New User
IWETHEY Banner

Welcome to IWETHEY!

New Smalltalk question
I've got a problem that I can't solve nicely (elegantly) in Smalltalk. Can the Smalltalkers here help me out?

I have a random number generator. I need to invoke it N times and compute how many results fall into each of m ranges. Ranges are specified by numbers from r[1] to r[m-1], first range being 0.0 to r[1] and last r[m-1] to 1.0 .

I simplified the problem by having three ranges of 1/3rd, and used groupedBy:aBlock on collection of results. This looks sort of OK in the code. Since the whole thing is a unit test, I don't particularly care. However, the unsolved problem sort of nags at me. Any takers?
--

The rich, as usual, are employing the elected.
-- [link|http://unfit2print.blogspot.com/|http://unfit2print.blogspot.com/]
New What's the unsolved part?



"I believe that many of the systems we build today in Java would be better built in Smalltalk and Gemstone."

     -- Martin Fowler, JAOO 2003
New The original problem
Count the number of values generated in each of the ranges, with arbitrary number of arbitrary ranges
--

The rich, as usual, are employing the elected.
-- [link|http://unfit2print.blogspot.com/|http://unfit2print.blogspot.com/]
New Why not solve it like you would in any language?
The simplest way is to just keep the data in an array. Here that is in Perl:\r\n
\r\n# Example values.\r\nmy $n = 1000;\r\nmy $m = 6;\r\n\r\n# Populate the ranges.  The map is not needed in Perl,\r\n# but might be in another language.  You didn't specify\r\n# so I am making them half-open.  (1.00 is not in the\r\n# top range, 0 is in the bottom one.)\r\nmy @ranges = map 0, 1..$m;\r\nfor (1..$n) {\r\n  my $random = rand();\r\n  $ranges[int($random * $m)]++;\r\n}\r\n\r\nuse Data::Dumper;\r\nprint Dumper \\@ranges;\r\n
\r\nIf you want to get fancy about it, you could create a class Range, and give Range objects the ability to figure out which numbers belong to them, and which ones you like. Then create your Ranges, generate your random numbers, pass them to each Range, and then ask the Ranges for a summary.

\r\n\r\nBut I think that that is overkill for this...

\r\n\r\nCheers,
\r\nBen
"good ideas and bad code build communities, the other three combinations do not"\r\n- [link|http://archives.real-time.com/pipermail/cocoon-devel/2000-October/003023.html|Stefano Mazzocchi]
New That is the problem I solved
The one that still needs solving is doing the ranges that look like:

0 - 0.1
0.1 - 0.4
0.4 - 0.8
0.8 - 1.0

All ranges are different, IOW.
--

The rich, as usual, are employing the elected.
-- [link|http://unfit2print.blogspot.com/|http://unfit2print.blogspot.com/]
New 2 ways
1st way
create 3 arrays

Counter array
Lower-bound array
Upper-bound array

Lower-bound array has values 0, 0.1, 0.4, 0.8
Upper-bound array has values 0.1, 0.4, 0.8, 1

Then, for each generated value X, if X > lb_array[i] && X < ub_array[i] --> counter_array[i}++
(Which way do you do if X == lb or X == ub?)

2nd way
Have an array of structures

Struct {
Counter,
lb,
ub.
}

then, same logic,
for each generated value X, if X > array.lb && X < array.ub --> array.counter++
New Re: 2 ways
That's how I'd do it in C (In fact, only one array is needed for ranges: the end of one range is the beginning of next)

Smalltalk is a different story. Iterators on collections normally don't deal with indexes.
--

The rich, as usual, are employing the elected.
-- [link|http://unfit2print.blogspot.com/|http://unfit2print.blogspot.com/]
Expand Edited by Arkadiy Dec. 4, 2003, 09:37:43 AM EST
New The iterator doesn't change the algorithm (with assumption)
(Assumption: Smalltalk iterators work similar to Java && C++ iterators)

Because you're walking through the array in both cases.

1st example (with iterator)
collection collect
collection ub
collection lb

Create 3 iterators,
C = collect.iterator
U = ub.iterator
L = lb.iterator
\nfor each result X  {\n\n     for (C = collect.begin(), U = ub.begin(), L = lb.begin();\n         C != collect.end();\n         C++, U++, L++)  {\n        if ((X < U.value()) && (X > L.value())  (C.value())++;\n     }\n}\n



Hint, it's even easier in method two, since you only need one iterator in this case.
\nit = array.iterator();\nfor each result X {\n    for (it = array.begin(); it != array.end(); it++) {\n       if ((X < it.value().U) && (X > it.value().L))  (it.value().C)++;\n    }\n}\n

New Smalltalk iterators are different
You pass a block of code to a collection's method, and this block is executed for every array element:

myArray do: [ :element| element modifyAsNeeded]

A variation I used in my cheating implementation was:

randomNumbers groupedBy: [:number | (number / increment) truncated]

This produces a dictionary of sets that has integers as keys, and set of all numbers that produced that integer in the block as vaules.
--

The rich, as usual, are employing the elected.
-- [link|http://unfit2print.blogspot.com/|http://unfit2print.blogspot.com/]
New Okay...so your 'cheat' is the method
Rather than walk through our collection, we apply to each element of the collection at one time.

I still would use my 2nd way, making the collection a set of objects, each object knowing whether or not to increment it's counter when presented with a random number X. (Each object would have an upper/lower bound and test for them).

Add a method to pull out the counter out of each object afterwards and you're done.
New see my answer to Ben below...
--

The rich, as usual, are employing the elected.
-- [link|http://unfit2print.blogspot.com/|http://unfit2print.blogspot.com/]
New Some iterators do provide indexes
Here's my cut at it

| sequence ranges buckets |
sequence := Random seed: 5. "yes this is lame - pick a better seed"
ranges := #( 0.0 0.1 0.4 0.8 ).
buckets := Array new: (ranges size).
buckets atAllPut: 0.
500 timesRepeat: [ "or however many times you want"
\t| n index |
\tn := sequence next.
\tindex := ranges findLast: [:value | n >= value].
\tbuckets at: index put: ((buckets at: index) + 1)].

"This just prints out the results and could likely be done better with a stream"
ranges := ranges copyWith: 1.0. "stick a 1.0 on the end for completeness"
1 to: (ranges size-1) do: [:i |
\t| lo hi |
\tlo := ranges at: i.
\thi := ranges at: i+1.
\tTranscript show: ((lo asString), ' - ', (hi asString), ' : ', ((buckets at: i) asString)); cr]

There's also an iterator - pairsDo: which is handy for this sort of thing. The ranges array would have to look like

ranges := #(
0.0 0.1
0.1 0.4
0.4 0.8
0.8 1.0
).

But you'll still need to run your own counter to index into the buckets array.

ie:

index := 1.
ranges pairsDo: [:lo :hi |
(n isBetween: lo and: hi)
ifTrue: [buckets at: index put: (buckets at: index) + 1]
ifFalse: [index := index +1]]

The problem with this is you don't get a loop exit once you've found what you're looking for. So I use findLast which searches until found.




"I believe that many of the systems we build today in Java would be better built in Smalltalk and Gemstone."

     -- Martin Fowler, JAOO 2003
New Yay!
Exactly what I hoped to see.

I knew Smalltalk would have something nice, even without a bucket class. Thanks.
--

"There's nothing more nervous than a million dollars. It does not speak French, it does not speak English, it does not speak German and it moves very fast."

-- Jean Chretien
New Why worry about that factor of 2?
The problem with this is you don't get a loop exit once you've found what you're looking for. So I use findLast which searches until found.

Sure, on average you will cut your search time by a factor of 2. But if you have enough buckets to care, then you really should use a binary search strategy instead.

So either I wouldn't care, or I would care and do a lot better.

So why worry about the factor of 2?

Cheers,
Ben
"good ideas and bad code build communities, the other three combinations do not"
- [link|http://archives.real-time.com/pipermail/cocoon-devel/2000-October/003023.html|Stefano Mazzocchi]
New Habit, I suppose
From working on powerflow simulators that ran simulations over long periods of time. A factor of 2 might be the difference between getting an answer Friday or late next week.

:-P

But I do see your point on the BSearch. I don't know if there's a handy one built in though. Don't have squeak handy on the work machine. I'll look later.



"I believe that many of the systems we build today in Java would be better built in Smalltalk and Gemstone."

     -- Martin Fowler, JAOO 2003
New It isn't just you though
I've seen a lot of people from Smalltalk environments who are perfectly happy to use an array with a detect method where something like a hash would make a lot more sense.

I have been left to wonder if they got in the habit of abstracting so far that they forgot the algorithm implications of their choices?

Cheers,
Ben
"good ideas and bad code build communities, the other three combinations do not"
- [link|http://archives.real-time.com/pipermail/cocoon-devel/2000-October/003023.html|Stefano Mazzocchi]
New I know what you mean
But you can't really just say something like SortedCollection should implement #detect: as binary search. Its a general purpose method and the aspect you are searching on may not have anything at all to do with the sort ordering.

I've found some find: methods that return the index. But again, because of the nature of the evaluation block, they do a linear search. I've spent a few minutes looking for a #binarySearch: or similar and haven't found one. Of course, you can add one without much effort.

Smalltalk culture is pretty much about efficiency of coding - with efficiency of implementation being seldom considered until its crucial.

Actually, there appear to be 2 cults - one dedicated to elegant simplicity of coding - which is nearly everybody - and the other dedicated to really clever (and transparent) optimizations - like the people working on croquet that are transparently offloading graphics operations to hardware as much as possible - or the people who work on the VMs.



"I believe that many of the systems we build today in Java would be better built in Smalltalk and Gemstone."

     -- Martin Fowler, JAOO 2003
New Here is my second strategy in Ruby
\n#! /usr/bin/ruby\nclass RangeBucket\n  \n  def initialize (min, max)\n    @min = min;\n    @max = max;\n    @count = 0;\n  end\n\n  def add_if_in (element)\n    if @min < element && element < @max\n      @count += 1;\n    end\n  end\n\n  def to_s ()\n    sprintf("%1.5f - %1.5f: %d", @min, @max, @count);\n  end\n  \nend\n\nranges = [\n  RangeBucket.new(0, 0.1),\n  RangeBucket.new(0.1, 0.4),\n  RangeBucket.new(0.4, 0.8),\n  RangeBucket.new(0.8, 1),\n];\n\n1000.times{\n  value = rand();\n  ranges.each {|range| range.add_if_in(value)};\n}\n\nputs ranges;\n

That should be translatable into any language that you want, including Smalltalk. (Ruby chosen because I know it and its object model is similar to Smalltalk.)
"good ideas and bad code build communities, the other three combinations do not"
- [link|http://archives.real-time.com/pipermail/cocoon-devel/2000-October/003023.html|Stefano Mazzocchi]
New I realized that introducing a bucket class makes it easy
Does it constitute an elegant Smalltalk solution? I had high hopes for iterators, but I guess in OO language, the first answer to anything is to create a class :)
--

The rich, as usual, are employing the elected.
-- [link|http://unfit2print.blogspot.com/|http://unfit2print.blogspot.com/]
New Elegance is in the eye of the beholder
That solution looks pretty clean and natural. Perhaps (now that I am awake) it would be better to have the description and test of when you are in the bucket factored out of the class and into the call. That puts the bucketing behaviour under the caller's control, allowing you to generalize the concept. (For instance one bucket might be even integers, and the next odd ones.)

If you have a lot of buckets and a lot of numbers, though, it won't scale very well. It is O(n*m). In that case you probably want to go to an array implementation and use a binary search. Which again can be hidden in a class if you wish...

Cheers,
Ben
"good ideas and bad code build communities, the other three combinations do not"
- [link|http://archives.real-time.com/pipermail/cocoon-devel/2000-October/003023.html|Stefano Mazzocchi]
     Smalltalk question - (Arkadiy) - (19)
         What's the unsolved part? -NT - (tuberculosis) - (1)
             The original problem - (Arkadiy)
         Why not solve it like you would in any language? - (ben_tilly) - (16)
             That is the problem I solved - (Arkadiy) - (15)
                 2 ways - (Simon_Jester) - (11)
                     Re: 2 ways - (Arkadiy) - (10)
                         The iterator doesn't change the algorithm (with assumption) - (Simon_Jester) - (3)
                             Smalltalk iterators are different - (Arkadiy) - (2)
                                 Okay...so your 'cheat' is the method - (Simon_Jester) - (1)
                                     see my answer to Ben below... -NT - (Arkadiy)
                         Some iterators do provide indexes - (tuberculosis) - (5)
                             Yay! - (Arkadiy)
                             Why worry about that factor of 2? - (ben_tilly) - (3)
                                 Habit, I suppose - (tuberculosis) - (2)
                                     It isn't just you though - (ben_tilly) - (1)
                                         I know what you mean - (tuberculosis)
                 Here is my second strategy in Ruby - (ben_tilly) - (2)
                     I realized that introducing a bucket class makes it easy - (Arkadiy) - (1)
                         Elegance is in the eye of the beholder - (ben_tilly)

tar -C /usr/local/dev/zope/import zxf lrpdisms.tar.gz
124 ms