Post #148,361
3/24/04 9:39:44 PM
|

*That* I would like to see. HTMLTidy, anyone? ;)
|
Post #148,366
3/24/04 9:50:04 PM
|

It's right here:
[link|http://z.iwethey.org/forums/SourceCode/src/htmlparse.py|http://z.iwethey.org.../src/htmlparse.py]
Regards,
-scott anderson
"Welcome to Rivendell, Mr. Anderson..."
|
Post #148,373
3/24/04 10:17:49 PM
|

Feature request
If I'm reading the code right, retURLFinderRaw is the place to change this: I'd like the RE modified so that "?" is one of the characters that can terminate a URL. The reason is that it is kind of irritating to type in a URL at the end of a question, because the ? is assumed to be part of the URL. (Besides, most ? marks in URLs are firmly embedded in the middle...)
Thanks, Ben
"good ideas and bad code build communities, the other three combinations do not" - [link|http://archives.real-time.com/pipermail/cocoon-devel/2000-October/003023.html|Stefano Mazzocchi]
|
Post #148,380
3/24/04 10:35:29 PM
|

Re: Feature request
Like this [link|http://z.iwethey.org/forums/render/content/show?contentid=148373|http://z.iwethey.org...?contentid=148373]?
Regards,
-scott anderson
"Welcome to Rivendell, Mr. Anderson..."
|
Post #148,381
3/24/04 10:36:41 PM
|

I wondered why I briefly got the Devil's Tower page...
You're too quick. :-)
Cheers, Scott.
|
Post #148,384
3/24/04 10:39:40 PM
|

Actually I just randomly take Zope down every so often
Just to look at that cool picture... ;-)
Regards,
-scott anderson
"Welcome to Rivendell, Mr. Anderson..."
|
Post #148,387
3/24/04 10:40:48 PM
|

Ha!
|
Post #148,400
3/24/04 11:18:06 PM
3/24/04 11:24:18 PM
|

Gracias
Out of curiosity, had I identified the right place to make the fix?
Thanks, Ben
"good ideas and bad code build communities, the other three combinations do not" - [link|http://archives.real-time.com/pipermail/cocoon-devel/2000-October/003023.html|Stefano Mazzocchi]

Edited by ben_tilly
March 24, 2004, 11:24:18 PM EST
|
Post #148,406
3/24/04 11:28:03 PM
|

Yep.
All it took was a \\? in the mix.
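For readers following along, here is a minimal sketch of the kind of change being discussed. It is not the actual expression in retURLFinderRaw; the pattern, `URL_RE`, and `find_urls` are made up for illustration. The idea is simply that "?" joins the set of punctuation that may trail a URL without being part of it, while a "?" in the middle of a URL is left alone.

```python
import re

# Hypothetical pattern, not the forum's real retURLFinderRaw expression.
# The URL body is matched lazily; the lookahead only lets the match stop
# where a run of trailing punctuation (including "?") is followed by
# whitespace or the end of the text.
URL_RE = re.compile(r'https?://[^\s<>"]+?(?=[.,;:!?)]*(?:\s|$))')


def find_urls(text):
    """Return the URLs in text, without sentence-ending punctuation."""
    return URL_RE.findall(text)


if __name__ == "__main__":
    # The trailing "?" belongs to the question, not the URL.
    print(find_urls("Have you seen http://example.com/page?"))
    # An embedded "?" is kept as part of the URL.
    print(find_urls("Try http://example.com/show?contentid=148373 instead."))
```

The trade-off is that a URL which genuinely ends in "?" loses it, which matches the reasoning in the feature request that most ? marks sit in the middle of a URL.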
Regards,
-scott anderson
"Welcome to Rivendell, Mr. Anderson..."
|
Post #148,388
3/24/04 10:40:53 PM
|

Looks straightforward.
Thanks for sharing! Dispatching single-pass parser, it looks like (unless I missed something on a cursory glance). The bit about constructing a huge descending-length regex was interesting. I had forgotten about the WeeCodes, since I never use 'em. ;) That's a good chunk of the handlers.
Reading parts of it makes me sad for being stuck on 1.5.2, however. I can see some significant speedups if one could use just newer libs*. Which of course a full rewrite would address. Plus the logic is already built; that's 90% of the battle.
Saludos,
FuMan
* _Perhaps_ with HTMLParser, but I wouldn't be stuck on that.
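The descending-length regex mentioned above can be sketched roughly as follows. This is an assumption about the general technique, not a copy of what htmlparse.py does; `build_code_regex` and the sample codes are hypothetical. Python's alternation takes the first branch that matches rather than the longest one, so sorting the literals longest-first keeps a short code from shadowing a longer code that starts with it.

```python
import re


def build_code_regex(codes):
    """Build one alternation of literal codes, longest literals first."""
    ordered = sorted(codes, key=len, reverse=True)
    return re.compile("|".join(re.escape(code) for code in ordered))


if __name__ == "__main__":
    # Hypothetical codes where one is a prefix of another.
    codes = ["*", "**", "[b]", "[/b]"]

    # Unsorted order: "*" wins every time, so "**" is never seen whole.
    naive = re.compile("|".join(re.escape(code) for code in codes))
    print(naive.findall("**x*"))

    # Longest-first order: "**" is matched as a single code.
    print(build_code_regex(codes).findall("**x*"))
```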
|
Post #148,392
3/24/04 10:48:22 PM
|

The 2.X code is much cleaner.
I've got older versions for 2.1 and 2.2 around here somewhere.
The code is a Python rewrite (plus additions) of Ben Tilly's Perl HTML filter. The code is somewhat messy, since it was just done to prove a point to Ben. :-) Come to think of it, I've got a much nicer Objective-C version too.
The nice thing about it is the ability to pass in your own set of handlers.
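To illustrate the pass-in-your-own-handlers idea, here is a minimal single-pass dispatch sketch. The names `render` and `HTML_HANDLERS`, and the handler signature, are assumptions for illustration and not the interface of htmlparse.py.

```python
import re


def render(text, handlers):
    """Replace each known code in a single pass using the caller's handlers.

    handlers maps a literal code to a function that takes the matched
    code and returns its replacement text.
    """
    if not handlers:
        return text
    # Longest codes first so a short code cannot shadow a longer one.
    ordered = sorted(handlers, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(code) for code in ordered))
    # One scan over the text; each match is dispatched to its handler.
    return pattern.sub(lambda m: handlers[m.group(0)](m.group(0)), text)


# Hypothetical handler set; a caller could pass in a different table.
HTML_HANDLERS = {
    "[b]": lambda code: "<b>",
    "[/b]": lambda code: "</b>",
    "[br]": lambda code: "<br>",
}

if __name__ == "__main__":
    print(render("[b]hi[/b][br]", HTML_HANDLERS))
```

Swapping in a different dictionary changes the output format without touching the scanning code, which is the appeal of handing the handlers in from outside.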
Regards,
-scott anderson
"Welcome to Rivendell, Mr. Anderson..."
|
Post #148,399
3/24/04 11:08:14 PM
|

Dig that funky extensibility groove. :)
I went more of the "plugin" route with Junct, that collab tool I mentioned. By that I basically mean you write a class instead of a function for new behavior. Example signature:
    class TableRule(RegexRule):
        def process(self, topic, content, UI):
Point being I tried to make it easy to write a new module when you wanted new parsing functionality. Still just a callable. But then it was expected with heavyweight parsers (like Calendar, Progress tracking, etc) that the "plugin" module would also include the UI code for new pages.
In addition, I had a need to do multiple passes on the content, and allow some handlers to "lock down" the portions of the content which they altered, so they couldn't then be munged by a later handler. I ended up making a LockableString class with finditer() and sub() methods (a la RE). Plus there were some handlers which a simple regex couldn't handle--Wiki-style formatting stuff.
Hmm... that's been running stably for a month now here at work. I should probably put it up somewhere...
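For anyone curious how a "lock down" scheme like that might work, here is a rough sketch. It is a guess at the general shape, not Junct's actual LockableString: only the class name and the finditer()/sub() method names come from the post, and everything else (the segment list, the string-only replacement template) is an assumption.

```python
import re


class LockableString:
    """Text kept as segments, each flagged as locked or unlocked."""

    def __init__(self, text):
        self.segments = [(text, False)]  # list of (text, locked) pairs

    def finditer(self, pattern):
        """Yield matches from unlocked segments only.

        Match positions are relative to the segment they were found in.
        """
        for text, locked in self.segments:
            if not locked:
                yield from re.finditer(pattern, text)

    def sub(self, pattern, template):
        """Replace matches in unlocked text and lock each replacement."""
        result = []
        for text, locked in self.segments:
            if locked:
                result.append((text, True))
                continue
            pos = 0
            for match in re.finditer(pattern, text):
                if match.start() > pos:
                    result.append((text[pos:match.start()], False))
                # The replacement is locked so later passes skip it.
                result.append((match.expand(template), True))
                pos = match.end()
            if pos < len(text):
                result.append((text[pos:], False))
        self.segments = result

    def __str__(self):
        return "".join(text for text, _ in self.segments)


if __name__ == "__main__":
    content = LockableString("see http://x.org/*a* and *b*")
    # Pass 1: turn the URL into a link and lock it.
    content.sub(r"http://\S+", r'<a href="\g<0>">\g<0></a>')
    # Pass 2: bold markup is applied only outside the locked link.
    content.sub(r"\*(\w+)\*", r"<b>\1</b>")
    print(content)
```

In this sketch the second pass leaves the `*a*` inside the link untouched because that segment was locked by the first pass.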
|
Post #148,404
3/24/04 11:27:34 PM
|

Why functions and regexes (new thread)
Created as new thread #148403 titled [link|/forums/render/content/show?contentid=148403|Why functions and regexes]
Regards,
-scott anderson
"Welcome to Rivendell, Mr. Anderson..."
|
Post #148,402
3/24/04 11:26:24 PM
|

And the point was proven...
"good ideas and bad code build communities, the other three combinations do not" - [link|http://archives.real-time.com/pipermail/cocoon-devel/2000-October/003023.html|Stefano Mazzocchi]
|