IWETHEY v. 0.3.0

Welcome to IWETHEY!

Whoa, there.

Character encodings and font sets are very ugly in HTML, as the early versions were an English-centric de facto standard. XHTML cleans up most of these issues. Web browsers have to use some guesswork and follow some unwritten conventions for handling these issues in HTML. In many cases the browser has to peek at the web page and try to figure out what the encoding is. In practice, in HTML, iso-8859-1 is the default if the browser can't find anything else.


Actually, XHTML makes it more complicated. If you serve XHTML without sending the charset parameter in your Content-Type header, then the MIME type of the document can determine the character encoding to be used for parsing. If you send text/html or text/xml, a conforming parser must assume that the document is encoded in us-ascii, no matter what you've specified inside it; you have [link|http://www.ietf.org/rfc/rfc3023.txt|RFC 3023] and the legacy of text/* media types and transcoding proxies to thank for that.
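A minimal sketch of that header-level rule, assuming only the XML media types that RFC 3023 registers (text/html is left out here, since RFC 3023 doesn't govern it). `encoding_from_content_type` is a hypothetical helper, not part of any library; it returns `None` when the header leaves the decision to the document itself.

```python
from typing import Optional

# XML media types registered by RFC 3023 (assumption: only these are modeled).
TEXT_XML_TYPES = {"text/xml", "text/xml-external-parsed-entity"}
APP_XML_TYPES = {"application/xml", "application/xml-external-parsed-entity", "application/xml-dtd"}


def encoding_from_content_type(content_type: str) -> Optional[str]:
    """Encoding implied by the Content-Type header alone (hypothetical helper).

    Returns None when the header leaves the decision to the document
    (its XML declaration or byte-order mark).
    """
    media_type, *params = [part.strip() for part in content_type.split(";")]
    media_type = media_type.lower()
    # An explicit charset parameter always wins over anything inside the file.
    for param in params:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip().strip('"').lower()
    # text/xml with no charset parameter: us-ascii, whatever the XML declaration says.
    if media_type in TEXT_XML_TYPES:
        return "us-ascii"
    # application/xml and application/*+xml (e.g. application/xhtml+xml):
    # the parser may look inside the document instead.
    if media_type in APP_XML_TYPES or (
        media_type.startswith("application/") and media_type.endswith("+xml")
    ):
        return None
    raise ValueError(f"not an XML media type: {media_type!r}")
```

So `encoding_from_content_type("text/xml")` gives `"us-ascii"` even if the file starts with `<?xml version="1.0" encoding="utf-8"?>`.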


If you send application/xhtml+xml or application/xml, then the receiving parser is allowed to read the XML declaration in the prolog of the file, but if that's not present then the character set must be assumed to be utf-8 or utf-16, depending on whether the document begins with a byte-order mark; you're not allowed to look at the meta tags for this information.
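Roughly how a parser might apply that when the transport supplies no charset: a simplified sketch loosely following the non-normative autodetection appendix of the XML 1.0 specification. It only handles byte-order marks, an ASCII-compatible XML declaration and the UTF-8 default (BOM-less UTF-16, UTF-32 and EBCDIC are ignored), and `sniff_xml_encoding` is a hypothetical name.

```python
import re

# encoding="..." or encoding='...' inside an XML declaration such as
# <?xml version="1.0" encoding="ISO-8859-1"?>
_DECL_RE = re.compile(
    rb"""^<\?xml\s[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_xml_encoding(data: bytes) -> str:
    """Choose an encoding for an XML document when the transport gave no charset."""
    # A UTF-16 byte-order mark settles it.
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    # A UTF-8 byte-order mark is also permitted and means UTF-8.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    # Otherwise read the encoding pseudo-attribute of an ASCII-compatible XML declaration.
    match = _DECL_RE.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no declaration: UTF-8 is the only permitted assumption.
    # <meta> tags are never consulted.
    return "utf-8"
```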


This is one of the reasons why Mark Pilgrim claimed [link|http://www.xml.com/pub/a/2004/07/21/dive.html|XML on the web has failed], and it just barely represents the tip of the iceberg as far as character-encoding issues in HTML, XHTML and XML are concerned.

--
You cooin' with my bird?
[link|http://www.shtuff.us/|shtuff]
You're right, mostly
I said that XHTML cleans up the issue, not that it makes it simpler. And you're right, XHTML blew its chance by failing to specify a good solution to the problem. However, XHTML at least has a mandatory, specified procedure for this.

With HTML the browser really has to guess in many cases. The current method of reading the file until you find a Content-Type meta tag, and then restarting from the beginning using the encoding it specifies, is horribly ugly, and it depends on no non-ASCII characters appearing at the top of the file before that tag.
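The two-pass approach described above could look something like this: a sketch, not any browser's actual algorithm. `decode_html` and the 1024-byte scan window are assumptions for illustration, and the regex only catches the common `<meta ... charset=...>` forms. Pass 1 works only because it treats the leading bytes as ASCII, which is exactly the fragility in question.

```python
import re

# Matches <meta charset="..."> and
# <meta http-equiv="Content-Type" content="text/html; charset=...">, case-insensitively.
_META_RE = re.compile(rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9._-]+)""", re.IGNORECASE)


def decode_html(data: bytes, scan_limit: int = 1024) -> str:
    """Two-pass decode: sniff a meta charset, then restart with that encoding."""
    # Pass 1: scan the first scan_limit raw bytes as if they were ASCII.
    # This breaks if anything before the meta tag is not ASCII-compatible.
    match = _META_RE.search(data[:scan_limit])
    encoding = match.group(1).decode("ascii") if match else "iso-8859-1"
    # Pass 2: start over from byte 0 using the declared encoding.
    try:
        return data.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown label or bytes that don't fit it: fall back to the ISO-8859-1 default.
        return data.decode("iso-8859-1")
```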

Jay
     What the heck is text? - (systems) - (56)
         It depends on the context. - (Another Scott) - (2)
             Unicode and ASCII - (StevenYap) - (1)
                 Re: Unicode and ASCII - Nitpick II - (jb4)
         you are confusing text with display - (boxley) - (12)
             Uhhh..Not quite, Bill - (jb4) - (11)
                 And that is one thing that sucks about Unicode - (ben_tilly) - (9)
                     At least they're consistent - (jb4) - (8)
                         But it is a problem - (ben_tilly)
                         Except for that full width/half width ascii thing - (tuberculosis) - (5)
                             I dunno... - (jb4)
                             My personal take on it - (jake123) - (3)
                                 Perhaps, but it makes searching tricky - (tuberculosis) - (2)
                                     Well, if it was an easy problem - (jake123)
                                     ICLRPD (new thread) - (jb4)
                         Have you all seen the HUGE unicode poster? - (FuManChu)
                 close enough to debug a table entry :-) - (boxley)
         Text is not as simple as it seems - (ben_tilly)
         This is one thing that Java handles pretty well - (bluke)
         Rule #1 - Everything you think you know is wrong - (tuberculosis) - (29)
             Why xenophobic? - (drewk) - (28)
                 Because they didn't think... - (pwhysall)
                 Because if they had spent any time at all - (tuberculosis) - (25)
                     Now how about addressing my example - (drewk) - (17)
                         The best explanation that I've seen of why 2 digits... - (ben_tilly)
                         No, but they were xenophobic etc - (jake123) - (15)
                             xenophobic's probably the wrong word - (SpiceWare) - (14)
                                 Yeah, you're right - (jake123) - (13)
                                     How about "escessively humble"? - (drewk) - (4)
                                         Look, the point about the two digits for a year is well - (jake123) - (1)
                                             Disagree - (jb4)
                                         Maybe... - (tuberculosis) - (1)
                                             How about simply "provincial". - (a6l6e6x)
                                     The people who coded for teletypes and green terminals - (Arkadiy) - (7)
                                         Yes, a typographer - (jake123) - (3)
                                             Internationalization would not have been so easy - (ben_tilly)
                                             Text layout in 80 by 24 grid of monspaced font? - (Arkadiy) - (1)
                                                 Phone books back then - (jake123)
                                         Please don't use the letter "e" in your code. - (pwhysall) - (2)
                                             I certainly used to do without "e" - (Arkadiy)
                                             I couldn't use "e" either ... - (JimWeirich)
                     Oh, come ON already - (jb4) - (6)
                         The C++ standard i18n library is awful - (tuberculosis) - (5)
                             Dont know ICU - (jb4) - (4)
                                 ICLRPD (new thread) - (drewk)
                                 You can find it here - (tuberculosis) - (2)
                                     Time line? - (jb4) - (1)
                                         Released in 1988 - (tuberculosis)
                 Actually, Algol 68 was designed from the ground up - (Arkadiy)
         Re: What the heck is text? - (JayMehaffey) - (3)
             I must correct you - ASCII is a 7-bit encoding - (tuberculosis)
             Whoa, there. - (ubernostrum) - (1)
                 Your right mostly - (JayMehaffey)
         Using a pencil, it's unambiguous. -NT - (mmoffitt) - (3)
             You haven't seen my handwriting.... -NT - (Another Scott) - (2)
                 Uh-oh. I wouldn't confess that ;0) - (mmoffitt) - (1)
                     My father's handwriting was so bad... - (broomberg)
