It depends on the context.

Post #198,203 by Another Scott 3/12/05 12:23:39 PM	It depends on the context. The meaning of "text" depends on the context. For example, if you're doing SMS stuff - "text messaging" using the "Short Message Service" - on a mobile phone, you're limited to whatever character encoding is supported by the system and the phone. It may not be Unicode, but might be if you have a [link\|http://people.netscape.com/ftang/paper/SMS_and_Unicode.html\|GSM] phone. Saying everyone needs to understand Unicode and encodings and such is simplistic because it assumes every "Software Developer" is programming Windows or Web stuff that needs multilanguage support. The meaning depends on the context. It's my understanding that Unicode includes ASCII as a subset. A C program doesn't care about character sets; the compiler assumes ASCII. At least it did. Some discussion about extending gcc for 2-byte Unicode support is [link\|http://mail.nl.linux.org/linux-utf8/2000-08/msg00101.html\|here]. A text file in a computer context doesn't exist on its own. It's a set of bytes on some form of storage media. If the file is to present non-ASCII characters to a program or a person, then it has to have a way of representing those characters, so an encoding method must be indicated. But then I wouldn't call it "text" myself - without any qualification, I assume "text" means ASCII. That's my take, anyway. HTH. Cheers, Scott.
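To illustrate the "set of bytes" point, here is a minimal Python sketch. The byte string and the helper name `decode_as` are made up for illustration; the idea is only that the same bytes read as different text depending on which encoding you assume.

```python
# Five bytes as they might sit on disk (illustrative data, not from the thread).
raw = b"caf\xc3\xa9"

def decode_as(data: bytes, encoding: str) -> str:
    """Interpret raw bytes under a stated encoding (hypothetical helper)."""
    return data.decode(encoding)

as_utf8 = decode_as(raw, "utf-8")      # last two bytes form one character
as_latin1 = decode_as(raw, "latin-1")  # every byte becomes its own character

print(repr(as_utf8), len(as_utf8))
print(repr(as_latin1), len(as_latin1))
```

Decoding the same bytes as "ascii" raises `UnicodeDecodeError`, which is roughly the "you have to indicate an encoding" problem in practice.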
Post #198,260 by StevenYap 3/12/05 6:54:00 PM	Unicode and ASCII "It's my understanding that Unicode includes ASCII as a subset." Nitpick - A particular variable-length encoding of Unicode (UTF-8, using 1 to 6 8-bit bytes) is compatible with the 7-bit encoding of ASCII when only characters from the 7-bit ASCII set are used.
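A small Python sketch of the compatibility being described. The helper name `utf8_len` and the sample characters are illustrative choices. Note that the current UTF-8 definition (RFC 3629) caps sequences at 4 bytes; the 5- and 6-byte forms come from the older definition.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes UTF-8 uses for a single character (hypothetical helper)."""
    return len(ch.encode("utf-8"))

ascii_text = "plain text"

# For ASCII-only text, the UTF-8 bytes and the ASCII bytes are identical.
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Outside ASCII, UTF-8 needs more than one byte per character:
# U+0041 (A), U+00E9 (e-acute), U+20AC (euro sign), U+1F600 (an emoji).
lengths = [utf8_len(c) for c in ("A", "\u00e9", "\u20ac", "\U0001F600")]

print(same_bytes, lengths)
```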
Post #200,487 by jb4 3/25/05 1:40:46 PM	Re: Unicode and ASCII - Nitpick II ASCII and UNICODE each define a set of code points, a numeric value assigned to each character. As it turns out, the ASCII code points are identical to the UNICODE code points for the characters represented by ASCII. UTF-8, UTF-16 (both versions), UTF-32, UCS-2, etc. are all encoding schemes; that is, mechanisms through which the code points can be represented. In ASCII, such things are not necessary because ASCII is defined to be fully representable in a single byte. UNICODE is not, and so we have come up with all sorts of ways to represent the 97,000+ characters that UNICODE currently represents (and more coming RSN!). The encoding schemes listed above (along with UCS-4) are specifically for UNICODE. So talking about representing ASCII as UTF-8 is (pedantically) meaningless. You can represent the ASCII subset of UNICODE using UTF-8, however (it's a "null translation"), but then you're really representing UNICODE. "Both versions" means big-endian and little-endian, but you already knew that... jb4 shrub·bish (Am., from shrub* + rubbish, after the derisive name for America's 43rd president; 2003) n. 1. a form of nonsensical political doubletalk wherein the speaker attempts to defend the indefensible by lying, obfuscation, or otherwise misstating the facts; GIBBERISH. 2. any of a collection of utterances from America's putative 43rd president. cf.* BULLSHIT
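The code point versus encoding scheme distinction can be sketched in Python. The helper name `encodings_of` and the choice of "A" are illustrative; the codec names are the ones Python's standard library ships with.

```python
def encodings_of(ch: str) -> dict:
    """Hex bytes for one character under several encoding schemes (hypothetical helper)."""
    schemes = ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be")
    return {name: ch.encode(name).hex() for name in schemes}

code_point = ord("A")      # one code point, the same number in ASCII and Unicode
forms = encodings_of("A")  # several byte-level representations of that number

print(hex(code_point))
for name, hexbytes in forms.items():
    print(name, hexbytes)
```

The two UTF-16 entries differ only in byte order, which is the big-endian versus little-endian point above.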
