This wheel has been invented already :-)

Post #174,109 by ben_tilly 9/14/04 2:12:40 AM Reply	This wheel has been invented already :-) First some comments on the code that you wrote off of the top of your head. Then discussion of the ways to solve this problem that I'd actually use. The code review will essentially wind up with what Barry gave, so some of you may wish to skip ahead. Here's what you gave. \nopen FILE, "myfile.txt";\nwhile(read FILE, $buf, 1024) { $str = $str . $buf; }\nmy @lines = split /\\n/, $str;\nmy %aHash;\nfor my $line (@lines)\n{\n ($key, $val) = split / /, $line;\n $aHash{$key} = $val;\n}\n Not bad. Syntactically correct. Some stylistic errors. It is harder than it needs to be. It has a couple of bugs, only one of which is serious. But not bad for someone who was only thinking of learning Perl under a week ago. Let's start with open FILE, "myfile.txt"; As stated in [link\|http://www.perldoc.com/perl5.8.4/pod/perlstyle.html\|perlstyle], any attempt to open a file should include an error check. Like this: \nmy $file_name = "myfile.txt";\nopen(FILE, $file_name) or die "Cannot read '$file_name': $!";\n If I'm being paranoid (eg the file name comes from CGI) I'd use the 3 argument version of open so that a bad file name cannot accidentally run a program: \nopen(FILE, "<", $file_name) or die "Cannot read '$file_name': $!";\n Once you've done that, you do the following to get the lines: \nwhile(read FILE, $buf, 1024) { $str = $str . $buf; }\nmy @lines = split /\\n/, $str;\n First of all you are going through too much work to read things in by line. What you're doing works, but the built-in <> operator does that. If I wasn't eliminating this section entirely I'd also fix $str = $str . $buf; to say $str .= $buf; because Perl uses a smart internal buffering algorithm to avoid recopying data when you incrementally build a string. By explicitly recopying a string into itself you defeat that optimization. Anyways that is going away, so we're left with the last section: \nfor my $line (@lines)\n{\n ($key, $val) = split / /, $line;\n $aHash{$key} = $val;\n}\n Obviously the fact that you're not removing newlines is a bug - I'd call that serious. We can fix that with an appropriately placed [link\|http://www.perldoc.com/perl5.8.4/pod/func/chomp.html\|chomp]. And there is a minor bug in that spaces in the value might cause your value to be chopped off, we can fix that with the third argument to [link\|http://www.perldoc.com/perl5.8.4/pod/func/split.html\|split]. Not a bug, but a stylistic nit is that people in Perl don't tend to use the aHash naming scheme. If you need a reason, mine is that it sometimes helps to mentally read $ as "the", so it isn't natural to be mentally saying, "the a hash". That is obviously unimportant, nor does it matter that I like placing braces differently than you do. So here is what we get: \nmy $hash;\nforeach my $line (<FILE>) {\n chomp($line);\n my ($key, $val) = split / /, $line, 2;\n $hash{$key} = $val;\n}\n Also note that I declared every variable that I used. That is important. I'm a big fan of using [link\|http://www.perlmonks.org/index.pl?node=strict.pm\|strict] to catch any variables that have not been so declared. This catches a lot of my typos. And a technical note that you could easily miss. This construct actually slurps the whole file into memory before reading any of it. You can avoid that with while like so: \nmy $hash;\nwhile (my $line = <FILE>) {\n chomp($line);\n my ($key, $val) = split / /, $line, 2;\n $hash{$key} = $val;\n}\n But wait, why does <> sometimes read the whole file in, and sometimes not? This is a result of a basic design idea in Perl known as context. This is explained in detail in [link\|http://www.perldoc.com/perl5.8.4/pod/perlstyle.html\|perlstyle]. As a matter of personal opinion I believe this to be a design mistake in Perl, but many others cite it as one of their favorite parts. In any case the idea is that if a construct looks like it wants a list, it gets a list. If it looks like it wants a single thing, it gets a single thing. It is up to each function how it wants to deliver a single thing or a list and [link\|http://www.perlmonks.org/index.pl?node_id=347416\|no convention exists] on this. You can use [link\|http://www.perldoc.com/perl5.8.4/pod/func/wantarray.html\|wantarray] to make your functions behave differently in different contexts. Incidentally the final solution that I gave looks like this: \nuse strict;\n# time passes\nmy $file_name = "myfile.txt";\nopen(FILE, "<", $file_name) or die "Cannot read '$file_name': $!";\nmy $hash;\nwhile (my $line = <FILE>) {\n chomp($line);\n my ($key, $val) = split / /, $line, 2;\n $hash{$key} = $val;\n}\n Change variable names and use implicit variables and change the open line in minor ways to get Barry's version. People often throw implicit variables into beginning examples, but I like leaving them out. They are another moving part in something that has enough moving parts to keep track of, wait until you are more comfortable with the rest before using them. Besides which, in real code they don't tend to be used as densely as they can be in the small utilities that people use a demos. But this is the mechanics of how to roll your own wheel. A large part of the strength of Perl is how many pre-built wheels you have to choose from. Your wheel has obvious deficiencies. For instance you can't put a space or a newline in the key, nor can you put a newline in the value. Let me go through some of the wheels that you might choose instead. For this specific problem there are two basic classes of solutions. One are tools to turn a data structure to/from a string, and the other are tools to have a data structure that looks like it is in memory actually live on disk. I'll include some of both: [link\|http://search.cpan.org/~ilyam/Data-Dumper-2.121/Dumper.pm\|Data::Dumper]: Core Perl module. It turns a reference to a data structure into pretty-printed code that will produce that data structure. There are things that it cannot figure out properly, but most of the time it does well. You can [link\|http://www.perldoc.com/perl5.8.4/pod/func/eval.html\|eval] the code, or if it is in a file, [link\|http://www.perldoc.com/perl5.8.4/pod/func/do.html\|do] it. I use this mostly for debugging. I've even seen the output be successfully given to non-programmers for debugging complex calculations. [link\|http://search.cpan.org/~ams/Storable-2.13/Storable.pm\|Storable]: Core Perl module (as of Perl 5.8). It turns Perl data structures into a (not human readable) string representing the data. It has convenience functions to write directly to/from the file for you. If I'm not debugging, this is what I use. If you will be writing the file on one machine and reading it on another, I strongly advise using nstore rather than having someone else later find out about the pitfalls of store. [link\|http://search.cpan.org/~ingy/YAML-0.35/YAML.pod\|YAML]: A module for converting data structures to/from a simple pretty-printed format. Converters for this format are easy to write and are available in a number of languages. The format makes for nice configuration files, and is also good if you need to cooperate between different languages. [link\|http://search.cpan.org/~nwclark/perl-5.8.5/lib/AnyDBM_File.pm\|AnyDBM_File]: Core Perl module. This is a prototypical example of the second approach that I mentioned. You can make a hash be "tied" to a dbm file which is kept on disk. This module actually is implemented by a variety of back-ends, the worst of which is in Perl's core (SDBM_File - the only one to limit the size of key/value pairs) and my favorite which is not, but is widely used (DB_File - that interfaces to Berkeley DB). I'd strongly advise using one of the back-end modules directly, else you can run into a situation where you install a module, AnyDBM_File automatically switches to that, and you don't know how to get your old data... [link\|http://search.cpan.org/~jhuckaby/DBM-Deep-0.94/Deep.pm\|DBM::Deep]: I mention this mainly to illustrate the limitations of standard dbm implementations. This is like DB_File etc, except that it allows you to build a complex data structure with references to hashes, arrays, etc. Traditional dbms turn references into strings that lose data when you ask for them back. This is a little..smarter. Cheers, Ben To deny the indirect purchaser, who in this case is the ultimate purchaser, the right to seek relief from unlawful conduct, would essentially remove the word consumer from the Consumer Protection Act - [link\|http://www.techworld.com/opsys/news/index.cfm?NewsID=1246&Page=1&pagePos=20\|Nebraska Supreme Court]
Post #174,111 by pwhysall 9/14/04 2:16:29 AM Reply	<fawn>Wonderful post</fawn> Sorry for the obsequiousness, but it is. Peter [link\|http://www.debian.org\|Shill For Hire] [link\|http://www.kuro5hin.org\|There is no K5 Cabal] [link\|http://guildenstern.dyndns.org\|Blog]
Post #174,116 by ben_tilly 9/14/04 2:28:23 AM Reply	<preen /> To deny the indirect purchaser, who in this case is the ultimate purchaser, the right to seek relief from unlawful conduct, would essentially remove the word consumer from the Consumer Protection Act - [link\|http://www.techworld.com/opsys/news/index.cfm?NewsID=1246&Page=1&pagePos=20\|Nebraska Supreme Court]
Post #174,206 by inthane-chan 9/14/04 12:51:16 PM Reply	I am so glad I gave up on tying my programming skills... ...to my ego. This would have killed it completely. :) Powered by the Hammer of the Gods
Post #174,208 by deSitter 9/14/04 12:58:01 PM Reply	Why? -drl
Post #174,245 by inthane-chan 9/14/04 3:34:03 PM Reply	Because there's no way I could match that answer.
Post #174,276 by ben_tilly 9/14/04 5:23:03 PM Reply	But that isn't programming That is knowing a specific environment and its tools very well, knowing what issues people tend to run into, and knowing how to present that information. The initial knowledge is helpful while programming, but is only one needed skill of many that a programmer has to put together at the same time. The skill at exposition makes me look good, but that's a documentation skill, not directly a programming skill. Furthermore that particular explanation is a variant on one that I've both seen given, and attempted to give, before. It is much easier to come up with a nice way to say something when already you've tried to say that thing a few times. Even if you don't clearly remember how you said it before, the experience guides you. Cheers, Ben To deny the indirect purchaser, who in this case is the ultimate purchaser, the right to seek relief from unlawful conduct, would essentially remove the word consumer from the Consumer Protection Act - [link\|http://www.techworld.com/opsys/news/index.cfm?NewsID=1246&Page=1&pagePos=20\|Nebraska Supreme Court]
Post #174,279 by ChrisR 9/14/04 5:30:39 PM Reply	Nice to see a teacher/craftsman in action. But having the masters around does present the problem that sometimes we don't venture guesses. It's probably a good thing, though, that I don't post attempts at Perl code. :-)
Post #174,290 by ben_tilly 9/14/04 5:41:16 PM Reply	A good thing for my spare time you mean? There is no better way to learn than exposing yourself to constructive criticism. It isn't always the most fun process, but it is effective. Just be careful about where you place your ego when you try it: [link\|http://www.perlmonks.org/index.pl?node_id=270083\|http://www.perlmonks...pl?node_id=270083] Cheers, Ben To deny the indirect purchaser, who in this case is the ultimate purchaser, the right to seek relief from unlawful conduct, would essentially remove the word consumer from the Consumer Protection Act - [link\|http://www.techworld.com/opsys/news/index.cfm?NewsID=1246&Page=1&pagePos=20\|Nebraska Supreme Court]
Post #174,324 by deSitter 9/14/04 9:03:38 PM Reply	Why? -drl
Post #174,354 by ChrisR 9/14/04 10:58:00 PM Reply	Jack of all languages... ...master of none. Remember back in early '92 being told that Perl was the way to go. Trying to get things done in a hurry, I stuck to writing a lot of scripts in sed. My career path might've been much different had I listened back then and took the time. Have played with Perl here and there, but never found the time to really get my head wrapped around it. To this day, Perl seems too much a conglomeration of disparate ideas rather than a cohesive whole. Perl is dangerous in that it can be easy to do things, but rather harder to master.
Post #174,209 by tuberculosis 9/14/04 12:58:31 PM 8/21/07 6:35:59 AM Reply	I figured it had - one more quickie question Thanks for that writeup. This is actually a mason page that tabulates a survey. The hash is userid=>url (the url points to an image). Does the error handling for the open still apply or is there a better technique? Seems to me telling my web page to die (this is like calling exit?) is a bad idea. The other problem I'm having is how to get a directory path to my page's local directory. Still trying to figure this out to build the path to the file in the first place (I have a crummy windows explorer view of part of the directory structure so I don't know what the real unix path is and the default path is some log directory not below document root). That was lovely cheese. --Wallace, The Wrong Trousers Edited by tuberculosis Aug. 21, 2007, 06:35:59 AM EDT Expand All History
Post #174,223 by ben_tilly 9/14/04 2:06:31 PM Reply	Error checking remains a best practice How to handle errors becomes a coding choice. At the least you want to notice them and log the information somewhere useful. In a web environment you can just print to STDERR and it goes into the webserver error logs. The warn operator adds the current line, and Carp's carp function will add a stack backtrace. None of this will affect the rest of the execution of the program. But the way that I like doing it is to make errors fatal with die. And then have in the server configuration a place where fatal errors are trapped with a signal handler for DIE. In development they are displayed on screen and in the log with a stack backtrace. In production the user gets a generic error message and the full error is in the logs. (And you work to keep error logs clean. Do not ignore them.) In production you do not want to show a stack backtrace - this debugging information is very helpful to attackers, allowing them to use standard techniques to convert minor coding errors into full-blown exploits. Setting this configuration up, of course, takes some wizardry. I'd bet that you can find a good starting example from a co-worker though. My attitude is that if errors are big and visible, then I'm more likely to catch them, and if I don't then testing does. The more you hide errors, the longer they last in the development process, and the longer they last, the more expensive they become to fix. (There is a lot of research backing up the point that errors caught later cost more. I don't know of any that specifically addressed whether having big visible crashes was a net gain or loss though - all I have is my anecdotal experience that this works pretty well.) As for the directory path issue, ugh. Getting that kind of environment issue sorted out is always a PITA for me. Luckily Mason makes the actual decision configurable. Unluckily you have to figure out how Mason is set up to configure it. In your position, I'd strongly be inclined towards getting fairly direct access on the Unix machine. Either by setting up VNC there, or setting up X on your machine, or just sshing and using a command-line environment. It isn't just your view of the path issue, it is also being able to run code out of the webserver environment, testing what modules are installed, etc, etc, etc. Cheers, Ben To deny the indirect purchaser, who in this case is the ultimate purchaser, the right to seek relief from unlawful conduct, would essentially remove the word consumer from the Consumer Protection Act - [link\|http://www.techworld.com/opsys/news/index.cfm?NewsID=1246&Page=1&pagePos=20\|Nebraska Supreme Court]
Post #174,297 by tuberculosis 9/14/04 6:19:32 PM 8/21/07 6:37:51 AM Reply	Agree More experimentation shows that Mason does something sane and decent when you die fatally. This is an intranet app - its got maybe a dozen users - all of whom are techies - so its not terribly crucial to keep it ultra robust. Interestingly, I've found that Jon Swartz, creator of mason, is in my group and just down the hall, so I've got a pretty good resource for mason questions. The rest is getting into the perl mindset. I can't believe nobody has implemented plists in perl though. That was lovely cheese. --Wallace, The Wrong Trousers Edited by tuberculosis Aug. 21, 2007, 06:37:51 AM EDT Expand All History
Post #174,298 by FuManChu 9/14/04 6:24:06 PM Reply	You lucky duck. Nobody is down the hall from me. No peers in the whole building. :(
Post #174,300 by ben_tilly 9/14/04 6:30:32 PM Reply	What do you mean by plists? There are a lot of things that are called plists in the world, I have no idea which one you're referring to. (I started with no idea, I hit Google, I still have no idea but now I know that, without context, I really shouldn't be able to make heads or tails of your comment.) Cheers, Ben To deny the indirect purchaser, who in this case is the ultimate purchaser, the right to seek relief from unlawful conduct, would essentially remove the word consumer from the Consumer Protection Act - [link\|http://www.techworld.com/opsys/news/index.cfm?NewsID=1246&Page=1&pagePos=20\|Nebraska Supreme Court]
Post #174,315 by tuberculosis 9/14/04 8:05:29 PM 8/21/07 6:38:24 AM Reply	I meant NextStep style plists hashes or dictionaries are { key = value; } arrays or lists are ( one, two, three ) strings are quoted when necessary (contain spaces or punctuation) otherwise not. All three may be nested arbitrarily. Interestingly, I've found that here at big river books the code modules have plist format compatibility descriptors. This is a dictionary with two entries - one is strings, the second has a list of strings for a value. { "Todd's Key" = "This is cool"; NSRecentDocuments = ( "/Users/todd/Text File.txt", /Users/todd/Desktop/Documents/Readme.rtf, /tmp/file.txt, /tmp/file2.txt ); } That was lovely cheese. --Wallace, The Wrong Trousers Edited by tuberculosis Aug. 21, 2007, 06:38:24 AM EDT Expand All History
Post #174,332 by ben_tilly 9/14/04 9:18:19 PM Reply	Do you mean kind of like this? \n{\n "Todd's Key" => "This is cool",\n NSRecentDocuments => [\n "/Users/todd/Text File.txt",\n "/Users/todd/Desktop/Documents/Readme.rtf",\n "/tmp/file.txt",\n "/tmp/file2.txt",\n ],\n};\n That's valid Perl. Like the plists that you're looking for, you can nest arbitrarily to produce any data structure that you want. It shouldn't be particularly hard to write a parser that loads/dumps between plists and the above internal Perl data structure. The main problem that you might run into is in coming up with a good name to use when trying to release the module into the wild. (Because "plist" is such an ambiguous term.) Incidentally what is the relationship between that kind of plist and the XML-based one that [link\|http://search.cpan.org/~bdfoy/Mac-PropertyList-0.9/lib/PropertyList.pm\|http://search.cpan.o...b/PropertyList.pm] deals with? Cheers, Ben To deny the indirect purchaser, who in this case is the ultimate purchaser, the right to seek relief from unlawful conduct, would essentially remove the word consumer from the Consumer Protection Act - [[link\|http://www.techworld.com/opsys/news/index.cfm?NewsID=1246&Page=1&pagePos=20\|Nebraska\|http://www.techworld...gePos=20\|Nebraska] Supreme Court]
Post #174,434 by tuberculosis 9/15/04 1:02:18 PM 8/21/07 6:40:25 AM Reply	Pretty close The plist format (which I would name NSPList for NextStep PList) I describe is now known by the J-heads at Apple as "Classic" or "old style" plists. They are trying to deprecate them. They are fools. There are still plenty of them around and thus the libraries to read and write them are still (and ever will be) supported. However - the J-headed XML fashion followers decided to "modernize" the plist and thus created a really lame and bloated XML-ization of them. So, while dictionaries used to be written: { key = value; key2 = value2; } Now we have to have something like \n<dict>\n <key><string>key</string></key><data><string>value</string></data>\n <key><string>key2</string></key><data><string>value2</string></data>\n</dict>\n See, lots more readable (NOT). The only advantage seems to be the addition of type tags - string, date, real, int, etc. Frankly I was happy with "everything is a string". Anyhow, I still use NS style plists all the time. I have parsers in Java, C++, and Smalltalk. You can write a recursive descent plist parser in about a page of code. Once you have one you never write IO code to save data again. That was lovely cheese. --Wallace, The Wrong Trousers Edited by tuberculosis Aug. 21, 2007, 06:40:25 AM EDT Expand All History
Post #174,440 by ben_tilly 9/15/04 1:28:44 PM Reply	Ah So for you the old-style plists are your One True Data Format for data interchange. In the Perl world there is no such thing - largely because a lot of Perl exists in the boundaries of converting between different people's home-grown data formats. You aren't going to figure out how to deal with those without doing I/O. However a million and one attempted universal formats have been produced. The ones that seem to have mindshare in the Perl world are XML, YAML, and native Perl code. Each of which has Perl tools that keep you from having to think about I/O. I don't think that another would get mindshare, but there is no harm in throwing it out there. Incidentally for writing recursive descent parsers, the gold standard in Perl seems to be [link\|http://search.cpan.org/~dconway/Parse-RecDescent-1.94/lib/Parse/RecDescent.pod\|Parse::RecDescent]. Unfortunately since it was added, Perl added the /g attribute to REs but the module has never been rewritten to use that. (Damian intended to, but never got around to it in Perl 5. His rewrite will be built into the language in Perl 6.) So its memory use tends to be quadratic in the size of the string, and performance on long strings is correspondingly bad. If you write a recursive descent parser in Perl, my suggestion is therefore either to use that, or to study the pos function and use REs with \\G and /gc liberally in your tokenizing. Oh, and never, ever use $`, $& or $'. (Use once and the RE engine slows down a lot, permanently.) Cheers, Ben About the use of language: it is impossible to sharpen a pencil with a blunt axe. It is equally vain to try to do it with ten blunt axes instead. -- Edsgar W. Dijkstra
Post #174,445 by tuberculosis 9/15/04 1:49:44 PM 8/21/07 6:40:37 AM Reply	One True Data Format Lets just say I've got a lot of them around, plenty of code for dealing with them, and I can read/write/edit them by hand easily. That was lovely cheese. --Wallace, The Wrong Trousers Edited by tuberculosis Aug. 21, 2007, 06:40:37 AM EDT Expand All History
Post #174,446 by deSitter 9/15/04 1:53:32 PM Reply	Big River Books - rofl Sure that's not "Big Mean Lesbian Books"? Watch that right breast! -drl
Post #174,374 by Meerkat 9/15/04 12:10:08 AM Reply	ICLRPD (new thread) Created as new thread #174373 titled [link\|/forums/render/content/show?contentid=174373\|ICLRPD] Two out of three people wonder where the other one is.

Welcome to IWETHEY!