I hate to think a PIII 800 is 5x faster than an Athlon 500, even if it sits on a 100 MHz FSB board. I grabbed the 0.9.6 source to see what difference an i686 build would make. Unfortunately, the default compile produces a binary that takes 440 s to render the manual and gobbles up 70-80 MB in the process, presumably because it is built with debug options on and no optimization.
I'm now recompiling with all the debug options turned off and optimization set to -O2. We'll see what that gives...