Re: Unicode and ASCII - Nitpick II
ASCII and UNICODE define sets of code points; that is, numeric values assigned to characters. As it turns out, the ASCII code points are identical to the UNICODE code points for the characters represented by ASCII.
UTF-8, UTF-16 (both versions*), UTF-32, UCS-2, etc. are all encoding schemes; that is, mechanisms through which the code points can be represented as bytes. In ASCII, such things are not necessary because ASCII is defined to be fully representable in a single byte. UNICODE is not, and so we have come up with all sorts of ways to represent the 97,000+ characters that UNICODE currently represents (and more coming RSN!). The encoding schemes listed above (along with UCS-4) are specifically for UNICODE. So talking about representing ASCII as UTF-8 is (pedantically) meaningless. You can represent the ASCII subset of UNICODE using UTF-8, however (it's a "null translation"), but then you're really representing UNICODE.
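A quick Python sketch (mine, not from the original post) makes the "null translation" point concrete: characters in the ASCII range produce byte-for-byte identical output under ASCII and UTF-8, while other UNICODE encoding schemes use wider units, and non-ASCII code points need multiple bytes even in UTF-8.

```python
# Characters in the ASCII range (U+0041..U+0043).
text = "ABC"

# The "null translation": ASCII bytes and UTF-8 bytes are identical
# for the ASCII subset of UNICODE.
assert text.encode("ascii") == text.encode("utf-8")

# The same code points under other UNICODE encoding schemes use
# wider code units (2 bytes for UTF-16, 4 for UTF-32):
print(text.encode("utf-16-be").hex())  # 004100420043
print(text.encode("utf-32-be").hex())  # 000000410000004200000043

# A code point outside ASCII needs more than one byte in UTF-8:
print("é".encode("utf-8").hex())       # c3a9 (U+00E9 -> two bytes)
```

So a file of pure ASCII text is already valid UTF-8, but the reverse is not true.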
* Both versions means big-endian and little-endian, but you already knew that...
jb4
shrub·bish (Am., from shrub + rubbish, after the derisive name for America's 43rd president; 2003) n. 1. a form of nonsensical political doubletalk wherein the speaker attempts to defend the indefensible by lying, obfuscation, or otherwise misstating the facts; GIBBERISH. 2. any of a collection of utterances from America's putative 43rd president. cf. BULLSHIT