Post #6,787
8/27/01 11:14:52 AM
|
HTML 2 PDF?
(This is for some background research on a project.)
There are a number of HTML to PDF converters out there. Some go directly. Others take HTML to LaTeX, and then you turn that to PDF. And so on.
However, nothing I found will handle HTML 4.
Does anyone know of one that does? (I was hoping that there might be one based on the Gecko engine...) If not, then can anyone recommend tools for producing both output formats?
Thanks, Ben
|
Post #6,809
8/27/01 12:46:18 PM
|
Obvious answer: Ghostscript?
That's her, officer! That's the woman that programmed me for evil!
|
Post #6,820
8/27/01 1:10:23 PM
|
Ghostscript does HTML? GSView v4 doesn't seem to.
|
Post #6,827
8/27/01 2:33:32 PM
|
Re: Ghostscript does HTML? GSView v4 doesn't seem to.
IIRC there's a way to make GhostScript print to a PDF file. If you can do this, anything you can print can be turned into a PDF file.
More details, I don't have right now. Stay tuned.
-- Peter Shill For Hire
|
Post #6,828
8/27/01 2:42:28 PM
|
Update: Seems possible
[link|http://www.cs.wisc.edu/csl/doc/howto/pdf_generation/|http://www.cs.wisc...._generation/]
"man ps2pdf" should be of some interest too.
On my system, ps2pdf is part of the GhostScript package.
-- Peter Shill For Hire
|
Post #6,829
8/27/01 2:50:18 PM
|
Workee!
The command "ps2pdf -dPSLevel1 printertest.ps" produced a PDF file that the Acrobat Reader could read.
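For anyone who wants to script this, here is a minimal Python sketch around the same tool. It assumes ps2pdf is on the PATH; the function names are made up for illustration, and it passes only the input and output file names, without the -dPSLevel1 switch used above.

```python
import subprocess
from pathlib import Path


def ps2pdf_command(ps_path, pdf_path=None):
    """Build the argument list for a ps2pdf run (nothing is executed here)."""
    ps_path = Path(ps_path)
    # Name the output explicitly so the result does not depend on
    # whatever default the installed ps2pdf script would choose.
    if pdf_path is None:
        pdf_path = ps_path.with_suffix(".pdf")
    return ["ps2pdf", str(ps_path), str(pdf_path)]


def convert(ps_path, pdf_path=None):
    """Run ps2pdf and return the output path; raises if ps2pdf fails."""
    cmd = ps2pdf_command(ps_path, pdf_path)
    subprocess.run(cmd, check=True)  # CalledProcessError on non-zero exit
    return cmd[-1]
```

Calling convert("printertest.ps") would then write printertest.pdf next to the input, provided Ghostscript is installed.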
-- Peter Shill For Hire
|
Post #6,830
8/27/01 2:55:47 PM
8/27/01 3:00:04 PM
|
Yeah. Ghostscript is a wonderful EPS or PS to PDF tool.
But it doesn't take HTML as input. At least nothing I've seen says it does. GSView is a nice front-end to ghostscript which takes the learning curve out of it.
File -> Convert... -> (Device: pdfwrite, Resolution: 600 - OK) -> filename.pdf - Save.
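Those menu settings should correspond roughly to calling Ghostscript directly with the pdfwrite device. A hedged Python sketch follows; it assumes the executable is called gs (on Windows it is typically gswin32c instead), and the function name and file names are placeholders, not anything GSView itself uses.

```python
import subprocess


def gs_pdfwrite_command(ps_path, pdf_path, resolution=600, gs_exe="gs"):
    """Build a Ghostscript command line that writes PDF via the pdfwrite device."""
    return [
        gs_exe,
        "-dBATCH",                    # exit when the input has been processed
        "-dNOPAUSE",                  # do not wait for a keypress between pages
        "-sDEVICE=pdfwrite",          # same device as picked in the GSView dialog
        f"-r{resolution}",            # resolution in dpi (600 in the dialog above)
        f"-sOutputFile={pdf_path}",
        ps_path,
    ]


def ps_to_pdf(ps_path, pdf_path, resolution=600):
    """Run Ghostscript; raises CalledProcessError if it exits non-zero."""
    subprocess.run(gs_pdfwrite_command(ps_path, pdf_path, resolution), check=True)
```

For example, ps_to_pdf("filename.ps", "filename.pdf") would do without the GUI what the Convert dialog does.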
Addendum - I see your point now.
You can set up your web browser to print to a PS file and have Ghostscript massage it and convert it to PDF. But then you're relying on your web browser's interpretation of how the page should look. I've had experiences with printing web pages which were far from satisfactory (e.g. Nav/Mac shrinking text to microscopic sizes). Having a tool which wasn't browser based would probably be more reliable.
Cheers, Scott.
|
Post #6,836
8/27/01 3:12:41 PM
|
Re: Yeah. Ghostscript is a wonderful EPS or PS to PDF tool.
Web browsers suck at printing. It's a rule of the universe. Doesn't matter what platform, or OS, or browser.
I will investigate other HTML-capable apps and see if they suck a bit less.
Stay tuned :)
-- Peter Shill For Hire
|
Post #6,840
8/27/01 3:28:39 PM
|
Gah. Everything sucks.
Test document is the main page of ZIWETHEY, saved from Mozilla 0.9.3.1.
AbiWord - doesn't load HTML. KWord - loads HTML, throws away everything but the text. BlueFish - uses an external browser for previewing/printing.
Ideas?
-- Peter Shill For Hire
|
Post #6,842
8/27/01 3:53:31 PM
|
Gee. I'd think a Perl guru like Ben should be able to whip
something together pretty easily...
:-)
I've done a little browsing around with Google.
[link|http://xml.apache.org/fop/index.html|FOP] is a Java XSL to PDF tool. Since HTML can be viewed as a subset of XML (strictly, that's XHTML), perhaps something like this could be useful. Also on the Apache site above are Perl tools for XML, but nothing seems to be related to PDF output (that I could find).
Adobe has tools for PDF to HTML, but not the other way around (that I've found) and they're limited to Mac and Win.
[link|http://www.pdfzone.com/products/software/tool_activepdfwebgrabber.html|activePDF WebGrabber] is a Win tool which does HTML to PDF.
[link|http://www.pdfzone.com/products/software/tool_html2ps.html|html2ps] is a Perl HTML to PS script. It claims to support much of the HTML4 spec and "incidentally, the PostScript and PDF versions of the HTML 4.0 draft, were generated using html2ps" and "When converting the PostScript document to PDF - using some other program such as version 5.0 or later of Aladdin Ghostscript, or Adobe Acrobat Distiller - the original hyperlinks in the HTML documents will be retained in the PDF document."
[link|http://www.pdfzone.com/products/software/tool_HTMLDOC.html|HTMLDOC] has similar claims about some HTML 4 support.
Looks like the last 2 links above are worth investigating.
HTH.
Cheers, Scott.
|
Post #6,821
8/27/01 1:14:23 PM
|
Didn't see HTML support in it
Particularly not HTML 4.
Cheers, Ben
|
Post #6,959
8/28/01 11:19:12 AM
|
As discussed above, print from browser (or any app) to PDF.
That's how we do it successfully every day. I don't buy Acrobat anymore.
That's her, officer! That's the woman that programmed me for evil!
|
Post #6,909
8/27/01 9:56:00 PM
|
html2ps, ps2pdf
I was able to render, for example, this thread, using the combination of these tools under Debian/Woody.
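The combination can be chained so that no intermediate .ps file is needed. Below is a rough Python sketch, assuming both html2ps and ps2pdf are on the PATH, that html2ps writes PostScript to standard output when no output file is given, and that ps2pdf reads standard input when its input is "-". The function names and file names are invented for the example.

```python
import subprocess


def pipeline_commands(html_path, pdf_path):
    """Return the two argument lists for: html2ps FILE | ps2pdf - OUT."""
    html2ps = ["html2ps", html_path]    # PostScript goes to stdout
    ps2pdf = ["ps2pdf", "-", pdf_path]  # "-" = read PostScript from stdin
    return html2ps, ps2pdf


def html_to_pdf(html_path, pdf_path):
    """Pipe html2ps into ps2pdf; raises RuntimeError if either step fails."""
    first, second = pipeline_commands(html_path, pdf_path)
    producer = subprocess.Popen(first, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(second, stdin=producer.stdout)
    # Close our copy of the pipe so html2ps notices if ps2pdf exits early.
    producer.stdout.close()
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    if producer_rc != 0 or consumer_rc != 0:
        raise RuntimeError(
            f"html2ps exited {producer_rc}, ps2pdf exited {consumer_rc}"
        )
    return pdf_path
```

Something like html_to_pdf("thread.html", "thread.pdf") would then run both steps on a saved copy of a page.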
I'm not sure where to find an HTML 4 page. Could you point me to something that you've tried which doesn't work?
Cheers.
-- Karsten M. Self [link|mailto:kmself@ix.netcom.com|kmself@ix.netcom.com]
What part of "gestalt" don't you understand?
|
Post #6,933
8/28/01 7:34:43 AM
|
Re: html2ps, ps2pdf
Err, this one is, excepting a small mis-nested tag in the lerpadism.
[link|http://validator.w3.org/check?uri=http%3A%2F%2Fz.iwethey.org%2Fforums%2Frender%2Fforum%2Fshow%3Fforumid%3D27&doctype=Inline|It almost validates :-)]
-- Peter Shill For Hire
|