Comparing Big Lists in Perl

Post #199,673 by pwhysall 3/21/05 5:04:42 AM Reply	Comparing Big Lists in Perl
I have two lists of things, @old and @new. Each list item is itself a list. Let's say, for argument's sake, an item looks like this (comma separated for human readability):

M1/0001A, 30/2/001/002, "somestring"
geog      elec          yadda

Between @old and @new, geog doesn't change (and is unique), but I need to compare the last two elements of elec (in this case, 001/002). I don't care about somestring. At the moment I'm trying to store these things in a hash keyed on geog. As usual, I think I'm making this somewhat more complex than I need to. Would a better solution be to just split each line and push it onto a list, then chop up elec on demand? Also, is there any non-hash-based method for doing this that isn't going to suck?
Peter [link|http://www.ubuntulinux.org|Ubuntu Linux] [link|http://www.kuro5hin.org|There is no K5 Cabal] [link|http://guildenstern.dyndns.org|Home] Use P2P for legitimate purposes!
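On the non-hash question, one possible approach (a sketch only, not something proposed in the thread) is to sort both lists by geog and walk them in step, splitting elec only when two records line up. The helper names elec_tail and diff_sorted and the sample data are made up for illustration, and the code assumes every elec value has at least two '/'-separated pieces.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data in the poster's geog / elec / yadda layout.
my @old = (
    [ 'M2/0001A', '30/2/001/222', 'x' ],
    [ 'M1/0001A', '30/2/001/002', 'y' ],
);
my @new = (
    [ 'M1/0001A', '30/2/011/002', 'z' ],
    [ 'M2/0001A', '30/2/001/222', 'w' ],
);

# Chop up elec on demand: return its last two '/'-separated pieces.
sub elec_tail {
    my ($elec) = @_;
    my @parts = split m{/}, $elec;
    return join '/', @parts[ -2, -1 ];
}

# Sort both lists by geog, walk them in step, and collect
# [geog, old_tail, new_tail] for every geog whose tail differs.
sub diff_sorted {
    my ($old_ref, $new_ref) = @_;
    my @old_sorted = sort { $a->[0] cmp $b->[0] } @$old_ref;
    my @new_sorted = sort { $a->[0] cmp $b->[0] } @$new_ref;

    my ($i, $j) = (0, 0);
    my @changed;
    while ($i < @old_sorted && $j < @new_sorted) {
        my $cmp = $old_sorted[$i][0] cmp $new_sorted[$j][0];
        if    ($cmp < 0) { $i++ }    # geog present only in the old list
        elsif ($cmp > 0) { $j++ }    # geog present only in the new list
        else {
            my $old_tail = elec_tail($old_sorted[$i][1]);
            my $new_tail = elec_tail($new_sorted[$j][1]);
            push @changed, [ $old_sorted[$i][0], $old_tail, $new_tail ]
                if $old_tail ne $new_tail;
            $i++;
            $j++;
        }
    }
    return @changed;
}

my @changed = diff_sorted(\@old, \@new);
print "Different: $_->[0] : $_->[1] ne $_->[2]\n" for @changed;
```

The trade-off is two sorts instead of one hash build; for data that already arrives sorted by geog the sorts could be dropped, otherwise the hash is usually the simpler choice.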
Post #199,684 by broomberg 3/21/05 8:28:08 AM Reply	I'd probably do something like this
Note: I would not store the whole record if I were tight on memory.

#!/usr/bin/perl -w

use strict;
use Data::Dumper;

my @list_a = (
    [ 'M1/0001A', '30/2/001/002', 'somestring_a_0' ],
    [ 'M2/0001A', '30/2/001/222', 'somestring_a_1' ],
);

my @list_b = (
    [ 'M1/0001A', '30/2/011/002', 'somestring_b_0' ],
    [ 'M2/0001A', '30/2/001/222', 'somestring_b_1' ],
);

compare_lists(\@list_a, \@list_b);

sub compare_lists {
    my ($list_ref_a, $list_ref_b) = @_;

    my @fields = qw/geog elec yadda/;

    my (%h1);

    foreach my $ref (@{$list_ref_a}) {
        my %rec;
        @rec{@fields} = @{$ref};
        ($rec{tail}) = $rec{elec} =~ m{([^/]+/[^/]+)$}; # match the last 2 pieces
        $h1{$rec{geog}} = \%rec;
    }

    print Dumper (\%h1);

    foreach my $ref (@{$list_ref_b}) {
        my %rec;
        @rec{@fields} = @{$ref};
        ($rec{tail}) = $rec{elec} =~ m{([^/]+/[^/]+)$}; # match the last 2 pieces

        if (defined($h1{$rec{geog}})) {
            if ($h1{$rec{geog}}->{tail} ne $rec{tail}) {
                print " Different: $h1{$rec{geog}}->{tail} ne $rec{tail}\n";
            }
        }
    }
}

This produces:

$VAR1 = {
    'M2/0001A' => {
        'yadda' => 'somestring_a_1',
        'tail' => '001/222',
        'elec' => '30/2/001/222',
        'geog' => 'M2/0001A'
    },
    'M1/0001A' => {
        'yadda' => 'somestring_a_0',
        'tail' => '001/002',
        'elec' => '30/2/001/002',
        'geog' => 'M1/0001A'
    }
};
 Different: 001/002 ne 011/002
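The note about memory could look roughly like this: keep only the tail string per geog rather than the whole record. This is a sketch under that reading of the note, reusing the sample data from the post above; index_tails and changed_tails are illustrative names, not part of broomberg's code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @list_a = (
    [ 'M1/0001A', '30/2/001/002', 'somestring_a_0' ],
    [ 'M2/0001A', '30/2/001/222', 'somestring_a_1' ],
);
my @list_b = (
    [ 'M1/0001A', '30/2/011/002', 'somestring_b_0' ],
    [ 'M2/0001A', '30/2/001/222', 'somestring_b_1' ],
);

# Map geog => last two pieces of elec; nothing else is retained.
sub index_tails {
    my ($list_ref) = @_;
    my %tail_for;
    for my $rec (@$list_ref) {
        my ($geog, $elec) = @$rec;
        ($tail_for{$geog}) = $elec =~ m{([^/]+/[^/]+)$};
    }
    return \%tail_for;
}

# Return [geog, old_tail, new_tail] for every geog whose tail changed.
sub changed_tails {
    my ($tail_for, $list_ref) = @_;
    my @changed;
    for my $rec (@$list_ref) {
        my ($geog, $elec) = @$rec;
        next unless exists $tail_for->{$geog};
        my ($tail) = $elec =~ m{([^/]+/[^/]+)$};
        push @changed, [ $geog, $tail_for->{$geog}, $tail ]
            if $tail_for->{$geog} ne $tail;
    }
    return @changed;
}

my $tail_for = index_tails(\@list_a);
my @changed  = changed_tails($tail_for, \@list_b);
print " Different: $_->[1] ne $_->[2]\n" for @changed;
```

Only the first list is indexed; the second can be streamed record by record, so it never has to sit in memory at all.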
Post #199,687 by pwhysall 3/21/05 8:35:29 AM Reply	Thanks Yes. Better. Much :-) This produces a Z feature request: A syntax colourising weecode.
Post #199,688 by admin 3/21/05 8:47:30 AM Reply	ROFL Uh huh... :-D Regards, -scott anderson "Welcome to Rivendell, Mr. Anderson..."
Post #199,689 by pwhysall 3/21/05 8:48:06 AM Reply	Why rofl? It'd rock.
Post #199,691 by admin 3/21/05 8:50:14 AM Reply	That's not the issue. Of course it would rock. Which syntax? And who writes the parser to figure out what needs highlighting? The ROI isn't high on that one.
Post #199,694 by pwhysall 3/21/05 9:00:35 AM Reply	All of them. You. There. Not so hard, was it?
Post #199,696 by folkert 3/21/05 9:09:27 AM Reply	So, you want EMACS in Z? -- [link|mailto:greg@gregfolkert.net|greg], [link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ *i_wethey* [link|http://it.slashdot.org/comments.pl?sid=134485&cid=11233230|"Microsoft Security" is an even better oxymoron than "Military Intelligence"] No matter how much Microsoft supporters whine about how Linux and other operating systems have just as many bugs as their operating systems do, the bottom line is that the serious, gut-wrenching problems happen on Windows, not on Linux, not on Mac OS. -- [link|http://www.eweek.com/article2/0,1759,1622086,00.asp|source]
Post #199,697 by pwhysall 3/21/05 9:10:38 AM Reply	I can't think of a single valid objection.
Post #199,717 by broomberg 3/21/05 10:26:08 AM Reply	No, gvim But gvim does get confused occasionally. Page-Up / Page-Down usually fixes it.
Post #199,754 by ben_tilly 3/21/05 1:48:04 PM Reply	You're being too lazy
Let me outline how to do it in such a way that you'll find most of your work already done for you. Install gvim. Write a small utility to save the current file to a filename with an extension indicating the current filetype (.html, .pl, .c, etc.). Then run the following shell command:

gvim -f +"syn on" +"run\! syntax/2html.vim" +"wq" +"q" $filename

Now read back $filename.html. Post-process that slightly if you want. (You may not need to escape ! depending on how you execute this - in bash I need it.) This will work somewhat better if gvim has access to X. (According to the documentation it does a better job of picking colors, whatever that means.) Voila!
Cheers, Ben I have come to believe that idealism without discipline is a quick road to disaster, while discipline without idealism is pointless. -- Aaron Ward (my brother)
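Purely as an illustration in Perl (the thread does not say what the forum software is written in), the shell-out could be sketched as below. It assumes gvim is installed and on the PATH; highlight_cmd and highlight_file are hypothetical helper names. Passing the arguments to system() as a list bypasses the shell, so the ! needs no escaping.

```perl
use strict;
use warnings;

# Build the argument list for the gvim invocation in the post above.
# No shell is involved when this is passed to system() as a list,
# so "run!" is written without the backslash.
sub highlight_cmd {
    my ($filename) = @_;
    return ('gvim', '-f', '+syn on', '+run! syntax/2html.vim', '+wq', '+q', $filename);
}

# Run gvim and return the generated HTML, or undef if anything fails.
# Assumes gvim writes its output to "$filename.html", as the post describes.
sub highlight_file {
    my ($filename) = @_;
    system(highlight_cmd($filename)) == 0 or return;
    open my $fh, '<', "$filename.html" or return;
    local $/;    # slurp mode: read the whole file at once
    my $html = <$fh>;
    close $fh;
    return $html;
}

# Show the command that would be run for a hypothetical file name.
my @cmd = highlight_cmd('post.pl');
print join(' ', map { /\s/ ? qq{"$_"} : $_ } @cmd), "\n";
```

Only the command is printed here; calling highlight_file('post.pl') would actually launch gvim, which is why it is left out of the demonstration.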
Post #199,775 by admin 3/21/05 2:44:13 PM Reply	Au contraire I'm more interested in the system being self-contained. With such a constraint, implementing this suggestion entails quite a bit more work than "shell out to gvim"...
Post #199,780 by ben_tilly 3/21/05 2:56:29 PM Reply	You can still leverage the effort though... Many editors have syntax highlighting files. If you write a parser that parses some editor's syntax highlighting files and then displays based on that, then you at least don't have to write your own syntax files - you just need the parsing and the display code.
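To give a feel for the "parsing and display code", here is a deliberately tiny sketch. It assumes Vim-style "syn keyword GROUP word ..." lines and handles nothing else: real syntax files also rely on match and region rules and on keyword options, and this version would happily colour keywords inside strings and comments. The group names, parse_keywords and highlight are all made up for illustration.

```perl
use strict;
use warnings;

# A tiny, made-up excerpt in the style of a Vim syntax file.
my $syntax_text = <<'END';
syn keyword demoConditional if else unless
syn keyword demoRepeat      foreach while
END

# Parse "syn keyword <group> <word>..." lines into a word => group map.
sub parse_keywords {
    my ($text) = @_;
    my %group_for;
    for my $line (split /\n/, $text) {
        next unless $line =~ /^\s*syn(?:tax)?\s+keyword\s+(\S+)\s+(.+)$/;
        my ($group, $words) = ($1, $2);
        $group_for{$_} = $group for split ' ', $words;
    }
    return \%group_for;
}

# Split the code into word and non-word runs, HTML-escape each run,
# and wrap known keywords in <span class="group">.
sub highlight {
    my ($code, $group_for) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
    my $html = '';
    for my $token ($code =~ /(\w+|\W+)/g) {
        (my $safe = $token) =~ s/([&<>])/$entity{$1}/g;
        $html .= exists $group_for->{$token}
            ? qq{<span class="$group_for->{$token}">$safe</span>}
            : $safe;
    }
    return $html;
}

my $group_for = parse_keywords($syntax_text);
my $html = highlight('if ($a < $b) { foreach (@x) {} }', $group_for);
print "$html\n";
```

Even at this size the gap between keyword lists and a usable highlighter is visible, which is roughly the point admin makes about the effort involved.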
Post #199,787 by admin 3/21/05 3:17:53 PM Reply	"just"...
Post #199,789 by ben_tilly 3/21/05 3:22:15 PM Reply	Compared to what you save... the effort that you'd have to expend is the merest trifle. :-P
Post #199,792 by admin 3/21/05 3:25:15 PM Reply	Compared to doing nothing and laughing at Peter... It's a lot of work, and not nearly as satisfying...
Post #199,772 by ChrisR 3/21/05 2:37:32 PM Reply	Don't forget the pretty print feature as well... ...to fix the indentation the way it should be. :-)
Post #199,703 by pwhysall 3/21/05 9:18:46 AM 3/21/05 9:19:51 AM Reply	Here's the whole (working, ugly) program

#!/usr/bin/perl -w
# checksigs.pl
# 1.0.0.0 pw 18-03-05
# Check PCO signals data has been correctly merged into RCC
# Usage: checksig <pco_tree_path> <rcc_data_dir>
use strict;
use File::Find;
use Data::Dumper;

# Check for incorrect invocation and populate parameters
my $argc = scalar @ARGV;
die "Usage: checksig <pco_tree_path> <rcc_data_dir>\n" unless $argc == 2;
my $pco_dir = $ARGV[0];
my $rcc_dir = $ARGV[1];
my @pco_sigs;     # array to contain PCO signals data
my @rcc_sigs;     # array to contain RCC signals data
my $sig_array;    # Reference to point to required destination array
my $debug = 1;    # debug flag: set to 1 to enable debug output
my $sig_count;    # count of processed signals. Info only.

sub debug
{
    if ($debug)
    {
        my $msg = shift;
        print "DEBUG: $msg\n";
    }
}

# utility functions
sub is_lit_device
{

    # this function will determine if a device is one of those
    # transferred to the LIT
    return 1;
}

sub process_file
{
    my $file = $_;
    my $dir  = $File::Find::dir;
    if ( $file =~ /^DEVICE.SIG/i )
    {
        open DEVICE, $file || die "Cannot open $file: $!\n";

        # read the signals into the array that $sig_array points to
        debug("processing $file in $dir");
        while (<DEVICE>)
        {
            next if (/^!/);    # skip comments
            my @line = split(/,/);    # split line at commas

            #    my $geo_addr = $line[0]; # get the geog addr
            #    my @ele_addr =
            #      split( /\//, $line[1] ); # split elec addr at /
            #    my $tpr = $ele_addr[2]; # get the TPR
            #    my $lnk = $ele_addr[3]; # get the elec addr on the link
            # add device to the destination array (is_lit_device is not applied yet)
            push @$sig_array, \@line;
            $sig_count++;
        }
        close DEVICE;
    }
}

sub compare_lists
{
    my ( $list_ref_a, $list_ref_b ) = @_;
    my @fields = qw/geog elec params/;
    my (%h1);
    foreach my $ref ( @{$list_ref_a} )
    {
        my %rec;
        @rec{@fields} = @{$ref};
        ( $rec{tail} ) =
          $rec{elec} =~ m{([^/]+/[^/]+)$};    # match the last 2 pieces
        $h1{ $rec{geog} } = \%rec;
    }

    #    print Dumper (\%h1);
    foreach my $ref ( @{$list_ref_b} )
    {
        my %rec;
        @rec{@fields} = @{$ref};
        ( $rec{tail} ) =
          $rec{elec} =~ m{([^/]+/[^/]+)$};    # match the last 2 pieces
        if ( defined( $h1{ $rec{geog} } ) )
        {
            if ( $h1{ $rec{geog} }->{tail} ne $rec{tail} )
            {
                print
" Different: $rec{geog} : $h1{$rec{geog}}->{tail} ne $rec{tail}\n";
            }
        }
    }
}

# Traverse the $pco_dir tree, process entries with the process_file sub
$sig_array = \@pco_sigs;
find( \&process_file, $pco_dir );
debug "added $sig_count signals to PCO list";
$sig_count = 0;
$sig_array = \@rcc_sigs;
find( \&process_file, $rcc_dir );
debug "added $sig_count signals to RCC list";
my $pco_ref = \@pco_sigs;
my $rcc_ref = \@rcc_sigs;
compare_lists( $pco_ref, $rcc_ref );

Edited by pwhysall March 21, 2005, 09:19:51 AM EST
Post #199,715 by broomberg 3/21/05 10:20:57 AM Reply	Suggestion
open DEVICE, $file || die "Cannot open $file: $!\n";

Either:
open (DEVICE, $file) || die "Cannot open $file: $!\n";
or
open DEVICE, $file or die "Cannot open $file: $!\n";
but NEVER
open DEVICE, $file || die "Cannot open $file: $!\n";

The "or" has lower precedence than "||". Without parentheses, "||" binds to $file rather than to the open, so Perl reads the last form as open(DEVICE, ($file || die ...)). The right hand side of the || is evaluated (if at all) before the open ever runs: the die only fires when $file itself is false, and a failed open goes unnoticed. Very bad habit.
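The difference is easy to see by trying all three forms against a file that does not exist. This is a small demonstration rather than code from the thread: it uses three-argument open with a lexical filehandle instead of the bareword DEVICE, the path is a made-up one assumed to be missing, and outcome is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist on the machine running this.
my $missing = '/no/such/dir/DEVICE.SIG';

# Run a block and report whether it died or carried on silently.
sub outcome {
    my ($code) = @_;
    return eval { $code->(); 1 } ? 'silent' : 'died';
}

# "||" binds to $missing, so this is open($fh, '<', ($missing || die ...)).
my $with_pipes = outcome(sub {
    open my $fh, '<', $missing || die "Cannot open $missing: $!\n";
});

# "or" applies to the result of the whole open call.
my $with_or = outcome(sub {
    open my $fh, '<', $missing or die "Cannot open $missing: $!\n";
});

# Parentheses around the arguments make "||" safe as well.
my $with_parens = outcome(sub {
    open(my $fh, '<', $missing) || die "Cannot open $missing: $!\n";
});

print "open ... || die   : $with_pipes\n";
print "open ... or die   : $with_or\n";
print "open(...) || die  : $with_parens\n";
```

Only the unparenthesised || form carries on as if nothing had gone wrong, which is exactly the failure mode described above.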
Post #199,716 by pwhysall 3/21/05 10:22:34 AM Reply	Thanks. That's just the sort of thing I'm not really very aware of. Well, that and the small issue of "writing good Perl".
Post #199,718 by broomberg 3/21/05 10:26:55 AM Reply	My pleasure I figure I'm working off my mail seed fee.
Post #199,720 by broomberg 3/21/05 10:51:23 AM Reply	sig_array global bad. Just pass \@pco_sigs and \@rcc_sigs to the process_file function.
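One way to do that, sketched under a few assumptions: File::Find calls its callback with no arguments, so the array reference has to reach process_file through a small closure. The temporary directory and sample file exist only to make the sketch runnable, and the rewritten process_file departs from the original in using a lexical filehandle, chomp, and an escaped dot in the filename pattern.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Build a throwaway tree with one DEVICE.SIG file so the sketch is runnable.
my $pco_dir = tempdir(CLEANUP => 1);
open my $out, '>', "$pco_dir/DEVICE.SIG" or die "Cannot create sample file: $!\n";
print {$out} "! comment line\n";
print {$out} qq{M1/0001A,30/2/001/002,"somestring"\n};
close $out;

# Same job as the original process_file, but the destination array
# arrives as an argument instead of through a global.
sub process_file {
    my ($sigs_ref) = @_;
    return unless /^DEVICE\.SIG/i;    # $_ is the file name, set by File::Find
    open my $fh, '<', $_ or die "Cannot open $File::Find::name: $!\n";
    while (my $line = <$fh>) {
        next if $line =~ /^!/;        # skip comments
        chomp $line;
        push @$sigs_ref, [ split /,/, $line ];
    }
    close $fh;
}

my @pco_sigs;

# File::Find passes no arguments to the callback, so wrap the call in a closure.
find(sub { process_file(\@pco_sigs) }, $pco_dir);
print scalar(@pco_sigs), " signal(s) read\n";
```

The second tree would be handled by another find call with \@rcc_sigs, and the signal count falls out of the array length, so $sig_count is no longer needed either.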
