Character encodings and font sets are very ugly in HTML, as the early versions were an English-centric de facto standard. XHTML cleans up most of these issues. Web browsers have to use some guesswork and follow some unwritten conventions to handle these issues in HTML; in many cases the browser has to peek at the page content and try to figure out what the encoding is. In practice, in HTML, iso-8859-1 is the default if the browser can't find anything else.
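To take the guesswork out of it, the encoding can be declared explicitly, either in the HTTP response header or in the markup itself; a minimal example of both:

  Content-Type: text/html; charset=iso-8859-1

  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

When both are present, the header takes precedence over the meta tag.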
Actually, XHTML makes it more complicated. If you serve XHTML without sending the charset parameter in your Content-Type header, then the MIME type of the document can determine the character encoding to be used for parsing. If you sent text/html or text/xml, a conforming parser must assume that the document is encoded in us-ascii, no matter what you've specified inside it; you have [link|http://www.ietf.org/rfc/rfc3023.txt|RFC 3023] and the legacy of text/* media types and transcoding proxies to thank for that.
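Concretely, that means a response like this illustrative one, where the charset parameter is missing:

  Content-Type: text/xml

  <?xml version="1.0" encoding="utf-8"?>
  <résumé>...</résumé>

must still be parsed as us-ascii per RFC 3023: the encoding declaration inside the document is ignored, and the non-ASCII bytes in the element name are no longer well-formed input.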
If you sent application/xhtml+xml or application/xml, then the receiving parser is allowed to read the XML prolog inside the file, but if that's not present then the character set must be assumed to be utf-8 or utf-16, depending on whether the document begins with a byte-order mark; you're not allowed to look at the meta tags for this information.
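So with the application/* types (and no charset parameter), the prolog is the place to declare a legacy encoding; a sketch:

  Content-Type: application/xml

  <?xml version="1.0" encoding="iso-8859-1"?>
  <doc>...</doc>

Drop the encoding declaration from the prolog and the parser falls back to utf-8, or utf-16 if the file opens with a byte-order mark.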
This is one of the reasons why Mark Pilgrim claimed [link|http://www.xml.com/pub/a/2004/07/21/dive.html|XML on the web has failed], and it's only the tip of the iceberg as far as character-encoding issues in HTML, XHTML and XML are concerned.