New PHP vs Perl questions
I have a simple little loop.
It just tests to see if a value is set in a lookup, and
counts the results in a scoreboard.


FIELD_LOOP: for (my $i = 0; $i <= $#field_arr; $i++) {
    my $tmp = $field_arr[$i];
    foreach my $field (@field_names) {
        if (defined($LOOKUPS{$field}{$tmp})) {
            $SCORE_BOARD{$field}{$i}++;
        }
    }
}


Runs about 16,666 records per second, which is
fine by me.

But: This type of logic may need to run
in a PHP script.

I have no recent PHP experience, but I can read it
(usually) when I have to pull business logic out.

So, can anyone point me to a quick example of loading
a hash of hash lists for this style of lookup in PHP?

And then the loop for the test?

The only PHP code I've been reading recently was written
by an interesting guy. He has many years of brain damage
from coding in RPG on IBM mid-range boxes. And then he
spent a couple more with Visual Basic. He brought that
experience with him to the PHP code I see.

Note: He's pretty good. But he can't help the fact that his
code runs at about 100 records per second, so I gotta make
sure it goes vroom or shell out to a Perl script.
New Ooops,
wrong forum.

Oh well, close enough
New Try this
foreach( $field_arr as $i => $tmp ){
    foreach( $field_names as $field ){
        if( in_array( $tmp, $LOOKUPS[$field] ) ){
            $SCORE_BOARD[$field][$i]++;
        }
    }
}


I wouldn't normally use $i and $tmp there, but it matches yours. PHP allows a foreach loop with or without naming the index. Since you don't care about the index in the second one, I didn't include it.*

That's also the brute force (AKA clear and readable) way. You might get something faster using some combination of array_count_values, array_intersect, and array_intersect_assoc, but that would depend on the data structure.

http://www.php.net/m...-count-values.php
http://www.php.net/m...ray-intersect.php
http://www.php.net/m...tersect-assoc.php


* BTW all PHP arrays are automatically hashes. If you don't specify the key, it's an integer, and you can refer to it by that integer.
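
A minimal sketch of that footnote, using made-up field names and lookup values (none of them come from the original post): the same array type holds both an auto-numbered list and a string-keyed table.

```php
<?php
// A list written without keys gets integer keys 0, 1, 2, ...
$field_names = ['city', 'state', 'zip'];   // hypothetical field names

// The same array type also accepts string keys.
$lookup = ['NY' => true, 'CA' => true];    // hypothetical lookup values

var_dump($field_names[1]);            // the element stored under integer key 1
var_dump(isset($lookup['CA']));       // key lookup by string
var_dump(array_keys($field_names));   // shows the automatically assigned keys
```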


[edit] Whoops, forgot the comma in the in_array check.
--

Drew
Edited by drook Oct. 30, 2009, 07:25:51 PM EDT
Edited by drook Oct. 30, 2009, 07:26:19 PM EDT
New Here's a better version.

foreach( $field_arr as $i => $tmp )
    foreach( $field_names as $field )
        if( isset($LOOKUPS[$field][$tmp]) )
            $SCORE_BOARD[$field][$i]++;


I'd normally try to use array_key_exists() on array elements, but isset() will work, too, if the value of the element isn't null. I can't do much more than that without knowing the larger algorithm.

(Some will argue that you should always use braces, but I will omit them in cases like this.)
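
For the original question about loading the lookup, here is one possible sketch. The sample field names, values, and the $raw_values variable are invented for illustration; the thread never shows how the real tables are loaded. The idea is to store each lookup value as a key, so the test in the loop is a key lookup.

```php
<?php
// Hypothetical sample data; the real lookup tables are not shown in the thread.
$field_names = ['city', 'state'];
$raw_values = [
    'city'  => ['Albany', 'Boston'],
    'state' => ['NY', 'MA', 'Boston'],
];

// Build $LOOKUPS[$field][$value] = true so every lookup value becomes a key.
$LOOKUPS = [];
foreach ($raw_values as $field => $values) {
    foreach ($values as $value) {
        $LOOKUPS[$field][$value] = true;
    }
}

$field_arr = ['Boston', 'NY', 'unknown'];   // hypothetical column values of one record
$SCORE_BOARD = [];

foreach ($field_arr as $i => $tmp) {
    foreach ($field_names as $field) {
        if (isset($LOOKUPS[$field][$tmp])) {
            // Start the counter at 0 on first use instead of incrementing an unset element.
            $SCORE_BOARD[$field][$i] = ($SCORE_BOARD[$field][$i] ?? 0) + 1;
        }
    }
}

print_r($SCORE_BOARD);
```

Storing true as the value keeps isset() usable, since isset() only reports false for a missing key or a null value.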

Wade.

Q:Is it proper to eat cheeseburgers with your fingers?
A:No, the fingers should be eaten separately.
New Why better?
I haven't been hands-on with code in a while, so I don't remember all the nitty gritty. Is it better from a speed perspective, or less likely to fail on nulls/empty strings and such?
--

Drew
New Well, it depends on what defined() means in PERL.
Maybe I should have said "I don't think that will work." :-)

If "defined()" in PERL means that the array lookup will return a value, then the equivalent in PHP is "isset()", except that it returns false if the value is null. The function "in_array()" will search the given array's values for the given value and return true if it is found ("array_search()" is the one that returns the index). The function "array_key_exists()" will look in the given array for the given index and return true if it is there (even if the value is null).
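
A small sketch of how these functions differ, on a made-up array. For comparison, Perl's defined() is also false for an element whose value is undef, while Perl's exists() only tests the key.

```php
<?php
$row = ['a' => 1, 'b' => null];   // hypothetical data

var_dump(isset($row['a']));               // key present, value not null
var_dump(isset($row['b']));               // key present, but value is null
var_dump(array_key_exists('b', $row));    // tests the key only
var_dump(in_array(1, $row, true));        // searches the values (strict comparison)
var_dump(in_array('a', $row, true));      // 'a' is a key, not a value
var_dump(array_search(1, $row, true));    // returns the key of the matching value
```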

Wade.

New So two issues
First is correctness, and various ways to deal with nulls, empty strings, etc. Totally dependent on how the data structure is built and how you want various flavors of nothingness handled.

Second issue is efficiency, which I wouldn't even bother worrying about until it's correct and I've benchmarked it.
--

Drew
New That's the problem
Using arrays in the current code, using the same logic as yours, runs at about 100 RPS. This is an interactive data check, and will have a sample of about 1,000 records.

So, you gotta design for constraints as well as correctness, or you end up married to an implementation when time runs out, or you rewrite, poorly, since time ran out.

That is why I specified a hash of hash lists in my original post, since I'm seeking a particular implementation. I'd say Static nailed it. Thanks Wade.

But thanks for responding, you gave me some helpful info.
New *All* arrays in PHP are hashes
The only difference between my code and Wade's was how we checked for the existence: in_array vs. isset. And without benchmarking, I don't know enough about the internal implementation to know which is better. Interesting to see it's an order of magnitude difference.
--

Drew
New Big difference
Here:

http://brian.moonspo...ay-is-quite-slow/

But it only matters when you are trying to do large lookups - which I am.
My lookup tables are about 20,000 entries long, and I have 6 of them. Their key size varies from 2 characters to 50 characters, with the vast majority between 12 and 35 characters.

New Multiple orders of magnitude difference
isset will do a hash calculation and a bucket lookup, and then possibly a chain walk via a linked list if your keys hash too close together. It's a pretty constant access time, no matter how large your list gets.

If I'm coding in Perl, it is a choice to use a hash. It takes a bit longer to construct and load a hash list though, so I don't use one by default if I don't need it, and instead want standard array access (queuing, push, pop, numeric access, etc).

I understand that it is not an option in PHP; hashes are simply used. It simply depends on what method of access you use.

I ASSUME in_array has to access the first item, compare it, access the next item, compare it, read the next item, compare it.

By the time you've gotten to the 3rd compare (or so, no I haven't timed this), a hashed lookup would be complete.

So how many orders of magnitude is it for a 20,000 item lookup table? Assuming that 3 record walk vs hash access time holds, it is about 6,666 times faster.
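
One way to check that estimate is to time both access methods. This is a rough sketch with an invented 20,000-entry table and invented probe values; the actual ratio depends on the PHP version, the key lengths, and where the matches sit in the list, so no result is claimed here.

```php
<?php
// Hypothetical table of 20,000 entries, built only to exercise both access paths.
$n = 20000;
$values = [];    // entries stored as values (what in_array() scans)
$as_keys = [];   // the same entries stored as keys (what isset() looks up)
for ($k = 0; $k < $n; $k++) {
    $key = 'key_' . $k;
    $values[] = $key;
    $as_keys[$key] = true;
}

$probes = ['key_19999', 'key_10000', 'missing'];   // hypothetical probe values
$rounds = 200;

// Time the value scan.
$start = microtime(true);
$hits_scan = 0;
for ($r = 0; $r < $rounds; $r++) {
    foreach ($probes as $p) {
        if (in_array($p, $values, true)) {
            $hits_scan++;
        }
    }
}
$scan_secs = microtime(true) - $start;

// Time the key lookup.
$start = microtime(true);
$hits_hash = 0;
for ($r = 0; $r < $rounds; $r++) {
    foreach ($probes as $p) {
        if (isset($as_keys[$p])) {
            $hits_hash++;
        }
    }
}
$hash_secs = microtime(true) - $start;

printf("in_array: %.4f s, isset: %.4f s\n", $scan_secs, $hash_secs);
```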
New It's an index vs. value thing.
array_key_exists() and isset() both look at the array's key, which *is* the hash-table lookup. It's wicked fast: probably the fastest thing you can do with an array in PHP.

in_array() looks in the array's *values*. These are *not* in a hash-table, so searching them is much much much slower.
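
If the data arrives as a plain list of values, one option is to flip it once so the values become keys. A small sketch with made-up values; note that array_flip() only works when the values are integers or strings, and duplicate values collapse into one key.

```php
<?php
$list = ['Albany', 'Boston', 'Chicago'];   // hypothetical list of lookup values

// Flip once: each value becomes a key, and its old position becomes the value.
$by_value = array_flip($list);

var_dump(in_array('Boston', $list, true));   // walks the values
var_dump(isset($by_value['Boston']));        // key lookup
var_dump(isset($by_value['Denver']));        // key not present
```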

Wade, who is paid to know this stuff.

New Ahh, but look at my corrected* version
You're checking if the array key exists in the second dimension of the array. I'm checking for the value in the first dimension. Both versions are looking at the same hash.


* I had the parameters reversed until I just now updated it.
--

Drew
New Why is this an argument?
in_array == bad for large sets.
Dead end on this path.
New Because I didn't read all the comments on that thread
Which I now have. So I now understand that the keys are indexed, but the values are not.
--

Drew
New Good
I love educational threads.
New Thank you...
Sister Barry Catherine.

Please no rulers today!
New I prefer the gentle approach
Like my daughter says to me all the time:

GET OUT OF MY HEAD!

I never tell her what to do or think. But little anecdotes and hints are enough to send the wheels spinning. She hates that.
     PHP vs Perl questions - (crazy) - (17)
         Ooops, - (crazy)
         Try this - (drook) - (15)
             Here's a better version. - (static) - (14)
                 Why better? - (drook) - (13)
                     Well, it depends on what defined() means in PERL. - (static) - (12)
                         So two issues - (drook) - (11)
                             That's the problem - (crazy) - (10)
                                 *All* arrays in PHP are hashes - (drook) - (9)
                                     Big difference - (crazy)
                                     Multiple orders of magnitude difference - (crazy) - (7)
                                         It's an index vs. value thing. - (static) - (6)
                                             Ahh, but look at my corrected* version - (drook) - (5)
                                                 Why is this an argument? - (crazy) - (4)
                                                     Because I didn't read all the comments on that thread - (drook) - (3)
                                                         Good - (crazy) - (2)
                                                             Thank you... - (folkert) - (1)
                                                                 I prefer the gentle approach - (crazy)