IWETHEY v. 0.3.0

It depends on the context.
The meaning of "text" depends on the context. For example, if you're doing SMS stuff ("text messaging" over the Short Message Service) on a mobile phone, you're limited to whatever character encoding the network and the phone support. It may not be Unicode, but it might be if you have a [link|http://people.netscape.com/ftang/paper/SMS_and_Unicode.html|GSM] phone.
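To put numbers on the SMS case, here's a minimal Python sketch (my own illustration, not taken from the linked paper) of how the GSM default alphabet packs 7-bit characters into octets, so that 160 of them fit in the 140-octet SMS payload while a UCS-2 message fits only 70. `pack_septets` is a made-up helper name, and it assumes the input is already mapped to GSM 03.38 septet codes: for A-Z, a-z and 0-9 those match ASCII, for many other characters they don't.

```python
# Minimal sketch of GSM 03.38 septet packing (the SMS default alphabet).
# ASSUMPTION: input values are already GSM 7-bit codes. For A-Z, a-z and
# 0-9 these equal the ASCII codes; for many other characters they do not.

def pack_septets(septets: bytes) -> bytes:
    """Pack 7-bit values into octets, least significant bit first."""
    out = bytearray()
    acc = 0    # pending bits; the lowest bits are emitted first
    nbits = 0  # number of pending bits held in acc
    for s in septets:
        if s > 0x7F:
            raise ValueError(f"not a 7-bit value: {s:#x}")
        acc |= s << nbits
        nbits += 7
        while nbits >= 8:          # emit every complete octet
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:                      # flush the last partial octet
        out.append(acc & 0xFF)
    return bytes(out)

SMS_USER_DATA_OCTETS = 140

print(SMS_USER_DATA_OCTETS * 8 // 7)      # 160 GSM 7-bit characters
print(SMS_USER_DATA_OCTETS // 2)          # 70 UCS-2 characters
print(pack_septets(b"hellohello").hex())  # e8329bfd4697d9ec37
```

Ten 7-bit characters take 70 bits, which is why "hellohello" packs into 9 octets instead of 10.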

Saying everyone needs to understand Unicode and encoding and such is simplistic because it assumes every "Software Developer" is programming Windows or Web stuff that needs multilanguage support.

The meaning depends on the context.

It's my understanding that Unicode includes ASCII as a subset.

A C program doesn't care about character sets; the compiler assumes ASCII. At least it did. Some discussion about extending gcc for 2-byte Unicode support is [link|http://mail.nl.linux.org/linux-utf8/2000-08/msg00101.html|here].
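Here's a small Python sketch of what "doesn't care about character sets" looks like in practice: byte-oriented code in the style of C's `strlen` counts bytes, so its answer depends on which encoding produced them. `c_strlen` is a hypothetical helper, not a real library function.

```python
# Byte-oriented code (think C's strlen) sees bytes, not characters.

word = "caf\u00e9"  # "café"; U+00E9 is LATIN SMALL LETTER E WITH ACUTE

def c_strlen(buf: bytes) -> int:
    """Count bytes up to the first NUL (or the whole buffer), like strlen."""
    n = buf.find(b"\x00")
    return len(buf) if n == -1 else n

print(len(word))                          # 4 characters
print(c_strlen(word.encode("latin-1")))   # 4 bytes
print(c_strlen(word.encode("utf-8")))     # 5 bytes: the e-acute takes 2
```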

A text file in a computer context doesn't exist on its own; it's just a set of bytes on some storage medium. If the file is to present non-ASCII characters to a program or a person, it needs a way of representing them, so the encoding must be indicated somewhere. But then I wouldn't call it "text" myself - without any qualification, I assume "text" means ASCII.
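A quick Python illustration of why the encoding has to be indicated: the same five bytes give two different strings and one error depending on which encoding the decoder is told to use. The byte string is just an example I picked.

```python
# The bytes alone don't say which encoding they are in.
data = b"caf\xc3\xa9"  # "café" as written by a UTF-8 encoder

print(data.decode("utf-8"))    # café
print(data.decode("latin-1"))  # cafÃ© (classic mojibake)
try:
    data.decode("ascii")       # bytes >= 0x80 are not ASCII at all
except UnicodeDecodeError as exc:
    print("not ASCII:", exc.reason)
```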

That's my take, anyway.

HTH.

Cheers,
Scott.
Unicode and ASCII
It's my understanding that Unicode includes ASCII as a subset.
Nitpick - A particular variable-length encoding of Unicode (UTF-8, using 1 to 6 8-bit bytes per character in its original definition, 1 to 4 under RFC 3629) is compatible with the 7-bit ASCII encoding when only characters from the 7-bit ASCII set are used.
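To make that compatibility concrete, here's a hand-rolled UTF-8 encoder in Python. It's a sketch following RFC 3629's 1 to 4 byte limit, not a production codec, and `utf8_encode` is a made-up name. Code points below 0x80 come out as the single byte with the same value, and every byte of a longer sequence has its high bit set, so it can never be mistaken for an ASCII character.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 (RFC 3629: 1 to 4 bytes)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:     # 0xxxxxxx: the 7-bit ASCII range maps to itself
        return bytes([cp])
    if cp < 0x800:    # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

# Every 7-bit ASCII value encodes to the identical single byte...
assert all(utf8_encode(b) == bytes([b]) for b in range(128))
# ...and the sketch agrees with Python's built-in codec elsewhere.
for ch in "A\u00e9\u20ac\U0001d11e":
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
```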
Re: Unicode and ASCII - Nitpick II
ASCII and UNICODE each define a set of code points, a number assigned to each character. As it turns out, the ASCII code points are identical to the UNICODE code points for the characters represented by ASCII.
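A quick check of that claim in Python, using only the built-in `ascii` codec and `chr`/`ord`: decoding each of the 128 ASCII byte values gives the character whose Unicode code point is that same number.

```python
# ASCII assigns the numbers 0-127; Unicode assigns the same numbers
# to the same characters.
for value in range(128):
    ascii_char = bytes([value]).decode("ascii")  # read the byte as ASCII
    assert ascii_char == chr(value)              # chr() gives the Unicode character
    assert ord(ascii_char) == value

print(hex(ord("A")))  # 0x41, i.e. U+0041
```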

UTF-8, UTF-16 (both versions*), UTF-32, UCS-2, etc. are all encoding schemes; that is, mechanisms through which the code points can be represented as bytes. In ASCII, such things are not necessary because ASCII is defined to be fully representable in a single byte. UNICODE is not, and so we have come up with all sorts of ways to represent the 97,000+ characters that UNICODE currently defines (and more coming RSN!). The encoding schemes listed above (along with UCS-4) are specifically for UNICODE.

So talking about representing ASCII as UTF-8 is (pedantically) meaningless. You can represent the ASCII subset of UNICODE using UTF-8, however (it's a "null translation"), but then you're really representing UNICODE.
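To illustrate the code point vs. encoding scheme distinction, this Python sketch prints each code point's bytes under three of the schemes named above. The sample characters are my own picks, and `bytes.hex` with a separator needs Python 3.8 or later.

```python
# The code point (the number) never changes; only its bytes do.
samples = {
    "A": "LATIN CAPITAL LETTER A",
    "\u00e9": "LATIN SMALL LETTER E WITH ACUTE",
    "\u20ac": "EURO SIGN",
    "\U0001d11e": "MUSICAL SYMBOL G CLEF (outside the BMP)",
}

for ch, name in samples.items():
    print(f"{name}: U+{ord(ch):04X}")
    for scheme in ("utf-8", "utf-16-be", "utf-32-be"):
        print(f"  {scheme:10} {ch.encode(scheme).hex(' ')}")
```

The G clef (U+1D11E) lies outside the Basic Multilingual Plane, so UTF-16 needs a surrogate pair (d8 34 dd 1e) for it, and UCS-2, which only covers the BMP, can't represent it at all.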


* Both versions means big-endian and little-endian, but you already knew that...
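For the footnote's benefit, a Python sketch of the two UTF-16 byte orders and of using a byte order mark (U+FEFF) to tell them apart. `decode_utf16` is a hypothetical helper; treating BOM-less data as big-endian follows the Unicode Standard's default for the UTF-16 encoding scheme, but real-world data may not follow that.

```python
import codecs

# The same text in UTF-16, in both byte orders.
print("Hi".encode("utf-16-be").hex(" "))  # 00 48 00 69
print("Hi".encode("utf-16-le").hex(" "))  # 48 00 69 00

def decode_utf16(data: bytes) -> str:
    """Hypothetical helper: choose the byte order from a leading BOM.

    BOM bytes FE FF mean big-endian, FF FE mean little-endian.
    With no BOM, this sketch assumes big-endian.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    return data.decode("utf-16-be")

marked = codecs.BOM_UTF16_LE + "Hi".encode("utf-16-le")
print(decode_utf16(marked))  # Hi
```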
jb4
shrub·bish (Am., from shrub + rubbish, after the derisive name for America's 43rd president; 2003) n. 1. a form of nonsensical political doubletalk wherein the speaker attempts to defend the indefensible by lying, obfuscation, or otherwise misstating the facts; GIBBERISH. 2. any of a collection of utterances from America's putative 43rd president. cf. BULLSHIT

     What the heck is text? - (systems) - (56)
         It depends on the context. - (Another Scott) - (2)
             Unicode and ASCII - (StevenYap) - (1)
                 Re: Unicode and ASCII - Nitpick II - (jb4)
         you are confusing text with display - (boxley) - (12)
             Uhhh..Not quite, Bill - (jb4) - (11)
                 And that is one thing that sucks about Unicode - (ben_tilly) - (9)
                     At least they're consistent - (jb4) - (8)
                         But it is a problem - (ben_tilly)
                         Except for that full width/half width ascii thing - (tuberculosis) - (5)
                             I dunno... - (jb4)
                             My personal take on it - (jake123) - (3)
                                 Perhaps, but it makes searching tricky - (tuberculosis) - (2)
                                     Well, if it was an easy problem - (jake123)
                                     ICLRPD (new thread) - (jb4)
                         Have you all seen the HUGE unicode poster? - (FuManChu)
                 close enough to debug a table entry :-) - (boxley)
         Text is not as simple as it seems - (ben_tilly)
         This is one thing that Java handles pretty well - (bluke)
         Rule #1 - Everything you think you know is wrong - (tuberculosis) - (29)
             Why xenophobic? - (drewk) - (28)
                 Because they didn't think... - (pwhysall)
                 Because if they had spent any time at all - (tuberculosis) - (25)
                     Now how about addressing my example - (drewk) - (17)
                         The best explanation that I've seen of why 2 digits... - (ben_tilly)
                         No, but they were xenophobic etc - (jake123) - (15)
                             xenophobic's probably the wrong word - (SpiceWare) - (14)
                                 Yeah, you're right - (jake123) - (13)
                                     How about "excessively humble"? - (drewk) - (4)
                                         Look, the point about the two digits for a year is well - (jake123) - (1)
                                             Disagree - (jb4)
                                         Maybe... - (tuberculosis) - (1)
                                             How about simply "provincial". - (a6l6e6x)
                                     The people who coded for teletypes and green terminals - (Arkadiy) - (7)
                                         Yes, a typographer - (jake123) - (3)
                                             Internationalization would not have been so easy - (ben_tilly)
                                             Text layout in 80 by 24 grid of monospaced font? - (Arkadiy) - (1)
                                                 Phone books back then - (jake123)
                                         Please don't use the letter "e" in your code. - (pwhysall) - (2)
                                             I certainly used to do without "e" - (Arkadiy)
                                             I couldn't use "e" either ... - (JimWeirich)
                     Oh, come ON already - (jb4) - (6)
                         The C++ standard i18n library is awful - (tuberculosis) - (5)
                             Don't know ICU - (jb4) - (4)
                                 ICLRPD (new thread) - (drewk)
                                 You can find it here - (tuberculosis) - (2)
                                     Time line? - (jb4) - (1)
                                         Released in 1988 - (tuberculosis)
                 Actually, Algol 68 was designed from the ground up - (Arkadiy)
         Re: What the heck is text? - (JayMehaffey) - (3)
             I must correct you - ASCII is a 7-bit encoding - (tuberculosis)
             Whoa, there. - (ubernostrum) - (1)
                 You're right mostly - (JayMehaffey)
         Using a pencil, it's unambiguous. -NT - (mmoffitt) - (3)
             You haven't seen my handwriting.... -NT - (Another Scott) - (2)
                 Uh-oh. I wouldn't confess that ;0) - (mmoffitt) - (1)
                     My father's handwriting was so bad... - (broomberg)
