Post #128,230
12/1/03 6:06:29 AM
|
Smalltalk question
I've got a problem that I can't solve nicely (elegantly) in Smalltalk. Can the Smalltalkers here help me out?
I have a random number generator. I need to invoke it N times and compute how many results fall into each of m ranges. Ranges are specified by numbers from r[1] to r[m-1], first range being 0.0 to r[1] and last r[m-1] to 1.0 .
I simplified the problem by having three ranges of 1/3rd, and used groupedBy:aBlock on collection of results. This looks sort of OK in the code. Since the whole thing is a unit test, I don't particularly care. However, the unsolved problem sort of nags at me. Any takers?
--
The rich, as usual, are employing the elected. -- [link|http://unfit2print.blogspot.com/|http://unfit2print.blogspot.com/]
|
Post #128,318
12/1/03 2:42:36 PM
|
What's the unsolved part?
"I believe that many of the systems we build today in Java would be better built in Smalltalk and Gemstone."
-- Martin Fowler, JAOO 2003
|
Post #128,328
12/1/03 3:38:19 PM
|
The original problem
Count the number of values generated in each of the ranges, with arbitrary number of arbitrary ranges
--
The rich, as usual, are employing the elected. -- [link|http://unfit2print.blogspot.com/|http://unfit2print.blogspot.com/]
|
Post #128,726
12/3/03 7:18:29 PM
|
Why not solve it like you would in any language?
The simplest way is to just keep the data in an array. Here that is in Perl:\r\n \r\n# Example values.\r\nmy $n = 1000;\r\nmy $m = 6;\r\n\r\n# Populate the ranges. The map is not needed in Perl,\r\n# but might be in another language. You didn't specify\r\n# so I am making them half-open. (1.00 is not in the\r\n# top range, 0 is in the bottom one.)\r\nmy @ranges = map 0, 1..$m;\r\nfor (1..$n) {\r\n my $random = rand();\r\n $ranges[int($random * $m)]++;\r\n}\r\n\r\nuse Data::Dumper;\r\nprint Dumper \\@ranges;\r\n \r\nIf you want to get fancy about it, you could create a class Range, and give Range objects the ability to figure out which numbers belong to them, and which ones you like. Then create your Ranges, generate your random numbers, pass them to each Range, and then ask the Ranges for a summary. \r\n\r\nBut I think that that is overkill for this... \r\n\r\nCheers, \r\nBen
"good ideas and bad code build communities, the other three combinations do not"\r\n- [link|http://archives.real-time.com/pipermail/cocoon-devel/2000-October/003023.html|Stefano Mazzocchi]
|
Post #128,829
12/4/03 9:22:59 AM
|
That is the problem I solved
The one that still needs solving is doing the ranges that look like:
0 - 0.1 0.1 - 0.4 0.4 - 0.8 0.8 - 1.0
All ranges are different, IOW.
--
The rich, as usual, are employing the elected. -- [link|http://unfit2print.blogspot.com/|http://unfit2print.blogspot.com/]
|
Post #128,839
12/4/03 9:35:42 AM
|
2 ways
1st way create 3 arrays
Counter array Lower-bound array Upper-bound array
Lower-bound array has values 0, 0.1, 0.4, 0.8 Upper-bound array has values 0.1, 0.4, 0.8, 1
Then, for each generated value X, if X > lb_array[i] && X < ub_array[i] --> counter_array[i}++ (Which way do you do if X == lb or X == ub?)
2nd way Have an array of structures
Struct { Counter, lb, ub. }
then, same logic, for each generated value X, if X > array.lb && X < array.ub --> array.counter++
|
Post #128,840
12/4/03 9:37:11 AM
12/4/03 9:37:43 AM
|
Re: 2 ways
That's how I'd do it in C (In fact, only one array is needed for ranges: the end of one range is the beginning of next)
Smalltalk is a different story. Iterators on collections normally don't deal with indexes.
--
The rich, as usual, are employing the elected. -- [link|http://unfit2print.blogspot.com/|http://unfit2print.blogspot.com/]
Edited by Arkadiy
Dec. 4, 2003, 09:37:43 AM EST
|
Post #128,844
12/4/03 9:47:03 AM
|
The iterator doesn't change the algorithm (with assumption)
(Assumption: Smalltalk iterators work similar to Java && C++ iterators) Because you're walking through the array in both cases. 1st example (with iterator) collection collect collection ub collection lb Create 3 iterators, C = collect.iterator U = ub.iterator L = lb.iterator \nfor each result X {\n\n for (C = collect.begin(), U = ub.begin(), L = lb.begin();\n C != collect.end();\n C++, U++, L++) {\n if ((X < U.value()) && (X > L.value()) (C.value())++;\n }\n}\n Hint, it's even easier in method two, since you only need one iterator in this case. \nit = array.iterator();\nfor each result X {\n for (it = array.begin(); it != array.end(); it++) {\n if ((X < it.value().U) && (X > it.value().L)) (it.value().C)++;\n }\n}\n
|
Post #128,869
12/4/03 10:50:10 AM
|
Smalltalk iterators are different
You pass a block of code to a collection's method, and this block is executed for every array element:
myArray do: [ :element| element modifyAsNeeded]
A variation I used in my cheating implementation was:
randomNumbers groupedBy: [:number | (number / increment) truncated]
This produces a dictionary of sets that has integers as keys, and set of all numbers that produced that integer in the block as vaules.
--
The rich, as usual, are employing the elected. -- [link|http://unfit2print.blogspot.com/|http://unfit2print.blogspot.com/]
|
Post #128,884
12/4/03 11:44:55 AM
|
Okay...so your 'cheat' is the method
Rather than walk through our collection, we apply to each element of the collection at one time.
I still would use my 2nd way, making the collection a set of objects, each object knowing whether or not to increment it's counter when presented with a random number X. (Each object would have an upper/lower bound and test for them).
Add a method to pull out the counter out of each object afterwards and you're done.
|
Post #128,891
12/4/03 11:59:42 AM
|
see my answer to Ben below...
--
The rich, as usual, are employing the elected. -- [link|http://unfit2print.blogspot.com/|http://unfit2print.blogspot.com/]
|
Post #129,114
12/5/03 12:48:07 PM
|
Some iterators do provide indexes
Here's my cut at it
| sequence ranges buckets | sequence := Random seed: 5. "yes this is lame - pick a better seed" ranges := #( 0.0 0.1 0.4 0.8 ). buckets := Array new: (ranges size). buckets atAllPut: 0. 500 timesRepeat: [ "or however many times you want" \t| n index | \tn := sequence next. \tindex := ranges findLast: [:value | n >= value]. \tbuckets at: index put: ((buckets at: index) + 1)].
"This just prints out the results and could likely be done better with a stream" ranges := ranges copyWith: 1.0. "stick a 1.0 on the end for completeness" 1 to: (ranges size-1) do: [:i | \t| lo hi | \tlo := ranges at: i. \thi := ranges at: i+1. \tTranscript show: ((lo asString), ' - ', (hi asString), ' : ', ((buckets at: i) asString)); cr]
There's also an iterator - pairsDo: which is handy for this sort of thing. The ranges array would have to look like
ranges := #( 0.0 0.1 0.1 0.4 0.4 0.8 0.8 1.0 ).
But you'll still need to run your own counter to index into the buckets array.
ie:
index := 1. ranges pairsDo: [:lo :hi | (n isBetween: lo and: hi) ifTrue: [buckets at: index put: (buckets at: index) + 1] ifFalse: [index := index +1]]
The problem with this is you don't get a loop exit once you've found what you're looking for. So I use findLast which searches until found.
"I believe that many of the systems we build today in Java would be better built in Smalltalk and Gemstone."
-- Martin Fowler, JAOO 2003
|
Post #129,616
12/8/03 10:56:26 AM
|
Yay!
Exactly what I hoped to see.
I knew Smalltalk would have something nice, even without a bucket class. Thanks.
--
"There's nothing more nervous than a million dollars. It does not speak French, it does not speak English, it does not speak German and it moves very fast."
-- Jean Chretien
|
Post #129,758
12/8/03 7:24:30 PM
|
Why worry about that factor of 2?
The problem with this is you don't get a loop exit once you've found what you're looking for. So I use findLast which searches until found.
Sure, on average you will cut your search time by a factor of 2. But if you have enough buckets to care, then you really should use a binary search strategy instead.
So either I wouldn't care, or I would care and do a lot better.
So why worry about the factor of 2?
Cheers, Ben
"good ideas and bad code build communities, the other three combinations do not" - [link|http://archives.real-time.com/pipermail/cocoon-devel/2000-October/003023.html|Stefano Mazzocchi]
|
Post #129,853
12/9/03 1:29:41 PM
|
Habit, I suppose
From working on powerflow simulators that ran simulations over long periods of time. A factor of 2 might be the difference between getting an answer Friday or late next week.
:-P
But I do see your point on the BSearch. I don't know if there's a handy one built in though. Don't have squeak handy on the work machine. I'll look later.
"I believe that many of the systems we build today in Java would be better built in Smalltalk and Gemstone."
-- Martin Fowler, JAOO 2003
|
Post #130,170
12/11/03 1:06:39 AM
|
It isn't just you though
I've seen a lot of people from Smalltalk environments who are perfectly happy to use an array with a detect method where something like a hash would make a lot more sense.
I have been left to wonder if they got in the habit of abstracting so far that they forgot the algorithm implications of their choices?
Cheers, Ben
"good ideas and bad code build communities, the other three combinations do not" - [link|http://archives.real-time.com/pipermail/cocoon-devel/2000-October/003023.html|Stefano Mazzocchi]
|
Post #130,258
12/11/03 10:53:11 AM
|
I know what you mean
But you can't really just say something like SortedCollection should implement #detect: as binary search. Its a general purpose method and the aspect you are searching on may not have anything at all to do with the sort ordering.
I've found some find: methods that return the index. But again, because of the nature of the evaluation block, they do a linear search. I've spent a few minutes looking for a #binarySearch: or similar and haven't found one. Of course, you can add one without much effort.
Smalltalk culture is pretty much about efficiency of coding - with efficiency of implementation being seldom considered until its crucial.
Actually, there appear to be 2 cults - one dedicated to elegant simplicity of coding - which is nearly everybody - and the other dedicated to really clever (and transparent) optimizations - like the people working on croquet that are transparently offloading graphics operations to hardware as much as possible - or the people who work on the VMs.
"I believe that many of the systems we build today in Java would be better built in Smalltalk and Gemstone."
-- Martin Fowler, JAOO 2003
|
Post #128,848
12/4/03 9:51:44 AM
|
Here is my second strategy in Ruby
\n#! /usr/bin/ruby\nclass RangeBucket\n \n def initialize (min, max)\n @min = min;\n @max = max;\n @count = 0;\n end\n\n def add_if_in (element)\n if @min < element && element < @max\n @count += 1;\n end\n end\n\n def to_s ()\n sprintf("%1.5f - %1.5f: %d", @min, @max, @count);\n end\n \nend\n\nranges = [\n RangeBucket.new(0, 0.1),\n RangeBucket.new(0.1, 0.4),\n RangeBucket.new(0.4, 0.8),\n RangeBucket.new(0.8, 1),\n];\n\n1000.times{\n value = rand();\n ranges.each {|range| range.add_if_in(value)};\n}\n\nputs ranges;\n That should be translatable into any language that you want, including Smalltalk. (Ruby chosen because I know it and its object model is similar to Smalltalk.)
"good ideas and bad code build communities, the other three combinations do not" - [link|http://archives.real-time.com/pipermail/cocoon-devel/2000-October/003023.html|Stefano Mazzocchi]
|
Post #128,871
12/4/03 10:52:59 AM
|
I realized that introducing a bucket class makes it easy
Does it constitute an elegant Smalltalk solution? I had high hopes for iterators, but I guess in OO language, the first answer to anything is to create a class :)
--
The rich, as usual, are employing the elected. -- [link|http://unfit2print.blogspot.com/|http://unfit2print.blogspot.com/]
|
Post #128,964
12/4/03 2:40:45 PM
|
Elegance is in the eye of the beholder
That solution looks pretty clean and natural. Perhaps (now that I am awake) it would be better to have the description and test of when you are in the bucket factored out of the class and into the call. That puts the bucketing behaviour under the caller's control, allowing you to generalize the concept. (For instance one bucket might be even integers, and the next odd ones.)
If you have a lot of buckets and a lot of numbers, though, it won't scale very well. It is O(n*m). In that case you probably want to go to an array implementation and use a binary search. Which again can be hidden in a class if you wish...
Cheers, Ben
"good ideas and bad code build communities, the other three combinations do not" - [link|http://archives.real-time.com/pipermail/cocoon-devel/2000-October/003023.html|Stefano Mazzocchi]
|