Regular Expression Apocalypse out

Post #41,203 by ben_tilly 6/4/02 9:51:13 PM 6/4/02 9:52:18 PM Reply	Regular Expression Apocalypse out [link\|http://www.perl.com/pub/a/2002/06/04/apo5.html\|Apocalypse 5] in the design of Perl 6 is out. It is Larry Wall's complete rethinking of regular expressions and regular expression syntax. Considering that the last time he had bright ideas on regular expressions, the world + dog wound up advertising "Perl compatible regular expressions", it is worth a peek even if you aren't interested in Perl. Be warned though, it is a little long and a lot radical. But it is interesting, it is how Perl 6 will likely do things, and other language designers are at least likely to peek and think about it... Cheers, Ben "... I couldn't see how anyone could be educated by this self-propagating system in which people pass exams, teach others to pass exams, but nobody knows anything." --Richard Feynman Edited by ben_tilly June 4, 2002, 09:52:18 PM EDT Expand All History
Post #41,346 by ben_tilly 6/5/02 10:20:48 PM Reply	Brief summary Once upon a time someone came up with this handy thing called regular expressions. Over the years, people added more and more features, they developed minor dialects, and it was good. Then Larry Wall came along and added a lot more features. Ever since then other languages have picked up his features, and programmers have been trying to use regular expressions for things that they are just Not Able To Do. Enter Apocalypse 5. Larry has decided that the basic syntactic problem is that regular expressions have virtually no metacharacters available. So he is stealing a bunch. That allows him to make the old extensions available in a much more readable way. That left him with a bunch of metacharacters left over, so he has had to think about where he really wants to go with them. Well people use regular expressions for parsing. Which they can't do. OTOH what Perl doesn't have is any way to write a grammar natively. But grammars are used for parsing, and grammars are built out of regular expressions so... That is right. Perl's regular expression syntax is now going to be rich enough to be able to write arbitrary grammars. Ones that can be extended on the fly if you want. It has the kinds of hooks back to Perl that you would want for your basic parsing tasks. His "eat your own dogfood" approach to making sure that this works is that the Perl 6 grammar will be to write Perl 6's grammar as a Perl 6 regular expression. It will be interesting to watch how this goes for Perl 6, and I have to say that I think Perl is (again) setting a bar that other scripting languages will eventually follow. People need to parse structured text for a lot of reasons, and a built-in facility for handling grammars (sort of a YACC built into the language) is a good way to do it. Cheers, Ben "... I couldn't see how anyone could be educated by this self-propagating system in which people pass exams, teach others to pass exams, but nobody knows anything." --Richard Feynman
Post #41,578 by tuberculosis 6/7/02 3:39:55 PM Reply	I like it in principal I've been using ANTLR for parsing Java source code and some other things lately. Its cool but a bit weird. The cool bit is you can embed bits of java source code on the RHS of parse rules to do nifty things. But the rule syntax itself is freaky. So I get the idea behind this philosophy and I know it has power. It will be interesting to see what people do with it. Sadly, Java has just set their regex package in stone forever. The average hunter gatherer works 20 hours a week. The average farmer works 40 hours a week. The average programmer works 60 hours a week. What the hell are we thinking?
Post #42,764 by neelk 6/18/02 1:05:38 PM Reply	Re: Brief summary Actually, there are a lot of languages with support for sophisticated parsing built in -- Icon, Haskell, Ocaml, and Bigloo Scheme all come immediately to mind. I'm not wholly convinced that this will be a successful experiment, though. A quick look at the Apocalypse suggests that Perl 6 grammaras are going to implement a recursive-descent-with-backtracking parsing engine. This is pretty nice, but has the downside that writing rules with left-recursion can cause a loop and stack overflow. That's a real annoyance because eliminating all left-recursion can make a grammar much less natural to read. Haskell parser combinators and Ocaml's stream parsers are also LL(1), so this is obviously not a critical weakness. However, Bigloo's define-grammar form calculates an LALR(1) parser, and that makes writing grammars much easier. I also didn't see much about error handling and reporting. This is a shame, because the great strength of recursive-descent is that it makes error reporting relatively easy to do. In fact, it is nearly the only parsing technique for which I would rather write an error reporting routine than have rabid werewolves rip out and devour my brains. Perl6 could definitely use a little bit of extra semantics for rules that lets you chain and build error messages, so that you can report something like. Parse error at line 50, column 6 in rule tlf, in rule method, in rule expr, in rule phrase, in rule if. Found token "{", when expecting token "else". (This is an example from the parser of the language I'm writing.) It's not impossible to roll your own, but it would be nice if there were some preexisting machinery to hook into.
Post #42,817 by ben_tilly 6/18/02 6:52:39 PM Reply	I will pass that suggestion to Damian Conway He is in town tomorrow and is giving a public talk. Considering that Perl itself plans to use this for its own grammar, and considering that Perl needs to do error handling and reporting for its own needs, that topic deserves to be addressed. Cheers, Ben "... I couldn't see how anyone could be educated by this self-propagating system in which people pass exams, teach others to pass exams, but nobody knows anything." --Richard Feynman
Post #42,824 by neelk 6/18/02 7:43:58 PM Reply	Re: I will pass that suggestion to Damian Conway Cool. Here's why I think that some compiler support is helpful. In Ocaml, a parser is just a function defined with a slightly funny syntax, so I can use the normal exception-handling system to build errors, like so: exception Parse_error of string list let rec if' = try parser [<'Kwd "if"; test = expr; \t'Kwd "then"; cons = expr; \t'Kwd "else"; alt = expr; \t'Kwd "endif" >] -> If(test, cons, alt) with Parse_error msgs -> raise (Parse_error ("in rule if" :: msgs)) and ... (* code for parsing expressions here, etc. ) Most of the time (like in this example), this works fine. It has one down side, though. Namely, if you call a parser for a subexpression in tail position, you should be able to match the stream without growing the stack (this is exactly* why regular grammars only permit nonterminals in the rightmost position). However, the addition of the 'try...with' code ruins the tail recursion (because you need to push an exception handler onto the stack), and this means that you get linear stack growth. Changing the space complexity of an algorithm to add error reporting is not a win. :( Fixing this means dropping the use of exceptions, converting the parser to a continuation-passing style, and doing all of the error-handling work manually by passing around failure continuations everywhere. OR -- the compiler writer can do a little bit of hackery that exploits the fact that he knows this is a grammar. Everything looks and smells the same to the user, except that it's all magically efficient. :)
Post #42,828 by ben_tilly 6/18/02 8:03:55 PM Reply	Something I should point out I will be passing on the comment, but I have no idea what to answer if he starts asking difficult questions. Giving me your email address might be good. Or you can just drop a mail to [link\|mailto:damian@conway.org\|The Damian] yourself. (And I am my user name at operamail dot com if you want to drop yours privately.) BTW same random poke I just gave Barry. Are you likely to make Philadelphia on July 6? (Yeah, long ways from Boston. Still worth a shot...) Cheers, Ben "... I couldn't see how anyone could be educated by this self-propagating system in which people pass exams, teach others to pass exams, but nobody knows anything." --Richard Feynman
Post #42,863 by neelk 6/19/02 7:19:49 AM Reply	Contact info It looks like my post last night didn't work, so I'll try again. You can email me at <neelk@alum.mit.edu>. I don't know if I'll be able to come in July. I quit my job so I could spend some time hacking and preparing for a grad school application in the fall. As a result, I'm trying to cut back on my expenses to maximize my "hang time", so to speak. I'll check Amtrak, and post whether I can come to Philly on the Open Forum.
Post #42,968 by ben_tilly 6/19/02 11:56:18 PM Reply	There is pre-existing machinery You can use caller() to pull out information about what the currently executing rules are. Cheers, Ben "... I couldn't see how anyone could be educated by this self-propagating system in which people pass exams, teach others to pass exams, but nobody knows anything." --Richard Feynman

Welcome to IWETHEY!