New Uhhh..Not quite, Bill
First, [link|http://z.iwethey.org/forums/render/content/show?contentid=200487|see this post].

UNICODE has nothing to do with rendering; indeed in Arabic (for example) where there are positional forms, there are up to 4 glyphs (renderings) for a given code point. UNICODE does not in any way define the renderings, it simply defines a code point for the character. How that character is rendered is a function of a rendering engine (like Adobe) that knows about how a code point is supposed to be rendered.
jb4
shrub·bish (Am., from shrub + rubbish, after the derisive name for America's 43rd president; 2003) n. 1. a form of nonsensical political doubletalk wherein the speaker attempts to defend the indefensible by lying, obfuscation, or otherwise misstating the facts; GIBBERISH. 2. any of a collection of utterances from America's putative 43rd president. cf. BULLSHIT

New And that is one thing that sucks about Unicode
It is true that Unicode has nothing to do with rendering.

For instance, there are different characters in different Asian languages that have been mapped to the same codepoint in Unicode (the so-called Han unification). This means that rendering engines have to play bad games, guessing what language they are currently working in, to render them correctly on screen! (For instance, the same character can have a Han Chinese, Traditional Chinese, Taiwanese, Japanese and Korean variant.) You know, the same kind of bad games that Unicode supposedly protects us from. :-(

Further complicating things is the fact that the same visible character may be produced by multiple Unicode code points. For instance, U+0069 (Latin small letter "i"), U+2139 (information source), U+2148 (double-struck italic small i, used for the imaginary unit), and U+2170 (small Roman numeral one) are all likely to be written the same way.
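
A minimal sketch of the lookalike problem, assuming Python 3 and its standard unicodedata module. The names LOOKALIKES and describe are hypothetical, chosen only for this illustration. It prints each code point's official name and shows that NFKC compatibility normalization folds all four to a plain "i":

```python
import unicodedata

# The four lookalike code points discussed above, written in hex.
LOOKALIKES = [0x0069, 0x2139, 0x2148, 0x2170]

def describe(code_point):
    """Return (U+XXXX label, official character name, NFKC-folded form)."""
    ch = chr(code_point)
    return (
        f"U+{code_point:04X}",
        unicodedata.name(ch),
        unicodedata.normalize("NFKC", ch),
    )

for label, name, folded in map(describe, LOOKALIKES):
    # Every one of these folds to the plain Latin letter "i".
    print(label, name, "->", folded)
```

Folding like this is one way to compare strings that merely look alike, though it deliberately throws away the distinction between, say, a Roman numeral and a letter.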

Cheers,
Ben
I have come to believe that idealism without discipline is a quick road to disaster, while discipline without idealism is pointless. -- Aaron Ward (my brother)
New At least they're consistent
UNICODE doesn't consider the glyph, they only consider the underlying character. That the character can be represented by several glyphs, or that the same glyph can be used to render several different characters is not important to them.

Nor, I suspect, is it important to any other encoding scheme.
jb4

New But it is a problem
It means that you cannot conveniently have both Japanese and Chinese text in the same document, even though you're dealing with an encoding that is supposed to solve internationalization problems.

The "multiple code points" problem also creates complexity, and that complexity can lead to security holes. See [link|http://www.schneier.com/crypto-gram-0007.html#9|http://www.schneier....-gram-0007.html#9] for the kinds of security problems that could happen and [link|http://www.schneier.com/blog/archives/2005/02/unicode_url_hac_1.html|http://www.schneier....de_url_hac_1.html] for a concrete example of it being exploited in practice.

Cheers,
Ben
New Except for that full width/half width ascii thing
See, they're not even consistent - full width ascii is there specifically to support typography.



"Whenever you find you are on the side of the majority, it is time to pause and reflect"   --Mark Twain

"The significant problems we face cannot be solved at the same level of thinking we were at when we created them."   --Albert Einstein

"This is still a dangerous world. It's a world of madmen and uncertainty and potential mental losses."   --George W. Bush
New I dunno...
My understanding is that "full-width ASCII" exists to support romaji in Japanese, where such niceties as proportional spacing are just now appearing in the public marketplace. These code points (U+FF01 - U+FF5E) are supported primarily so that fonts that contain both the "standard" ASCII and the "full-width ASCII" (e.g. Monotype Andale, Arial Unicode, Mincho, etc.) can differentiate which glyph to use.
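
As a rough illustration, assuming Python 3 and its standard unicodedata module: the full-width forms of printable ASCII sit at a fixed offset of 0xFEE0 from the originals, and the East Asian Width property tells the two apart. FULLWIDTH_OFFSET and to_fullwidth are hypothetical names for this sketch only.

```python
import unicodedata

# U+FF01 (fullwidth "!") minus U+0021 (ASCII "!").
FULLWIDTH_OFFSET = 0xFEE0

def to_fullwidth(text):
    """Map printable ASCII U+0021..U+007E to its fullwidth twin.

    Anything else (including the space, whose wide counterpart lives
    elsewhere at U+3000) is passed through unchanged.
    """
    return "".join(
        chr(ord(c) + FULLWIDTH_OFFSET) if 0x21 <= ord(c) <= 0x7E else c
        for c in text
    )

wide = to_fullwidth("A1!")
print([f"U+{ord(c):04X}" for c in wide])        # ['U+FF21', 'U+FF11', 'U+FF01']
print(unicodedata.east_asian_width("A"))        # Na (narrow)
print(unicodedata.east_asian_width(wide[0]))    # F  (fullwidth)
```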

In looking up the code point range for the full-width ASCII, I also discovered that there really are code points for the various Arabic presentation forms, as well as Latin presentation forms and others. So, within UNICODE, you can explicitly specify the correct glyph for presentation, even if you have a font engine capable of "fixing it up" for you. Of course, that complicates UNICODE even more: you cannot know whether someone who is "using UNICODE" is using the presentation forms, so the rendering engine must be capable of passing the presentation forms through unchanged and "massaging" the non-presentation forms when necessary. Sheesh!
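
A small sketch of what those presentation forms look like from code, assuming Python 3 and its standard unicodedata module. It uses the Arabic letter BEH (U+0628) as the example; PRESENTATION_FORMS and positional_tag are hypothetical names for this illustration. Each of the four positional forms has its own code point, and each carries a compatibility decomposition back to the one abstract letter:

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, the abstract character

# Isolated, final, initial and medial presentation forms of BEH.
PRESENTATION_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]

def positional_tag(ch):
    """Return the compatibility tag (e.g. '<final>') of ch, or '' if none."""
    parts = unicodedata.decomposition(ch).split()
    return parts[0] if parts and parts[0].startswith("<") else ""

for form in PRESENTATION_FORMS:
    folded = unicodedata.normalize("NFKC", form)
    # Prints the code point, its positional tag, and whether it folds to BEH.
    print(f"U+{ord(form):04X}", positional_tag(form), folded == BEH)
```

A rendering engine given plain U+0628 has to pick among these shapes itself; one given U+FE90 has already been told which shape to draw.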

That's why I love IWETHEY...you learn something new even when you don't expect to....
jb4

New My personal take on it
is that supporting typography IS the job of schemes like this. For programming, 7-bit ASCII is fine, and it is historically justifiable that it should be based on the English language. For the rest of it, though, the typography of the languages in question is the raison d'être of text encoding schemes. When people attempt to come up with standards for such, typographers should be the people consulted, as there are up to several centuries' worth of experience for any given language to draw upon for what is needed underneath to support rendering a language completely.
--
-------------------------------------------------------------------
* Jack Troughton                            jake at consultron.ca *
* [link|http://consultron.ca|http://consultron.ca]                   [link|irc://irc.ecomstation.ca|irc://irc.ecomstation.ca] *
* Kingston Ontario Canada               [link|news://news.consultron.ca|news://news.consultron.ca] *
-------------------------------------------------------------------
New Perhaps, but it makes searching tricky
Finding a phone number in a database used to be annoying, as you had to make sure that you normalized the full-width ASCII to regular ASCII before storing and searching. Oracle didn't find a phone number typed in full-width characters when it was stored as regular ASCII. Made web form programming kind of fiddly.
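
A minimal sketch of that normalize-before-storing-and-searching step, assuming Python 3 and its standard unicodedata module. This is not what Oracle does internally; search_key and the sample phone number are hypothetical.

```python
import unicodedata

def search_key(text):
    """Fold compatibility variants (e.g. fullwidth digits) to one form."""
    return unicodedata.normalize("NFKC", text)

stored = "555-0100"
# The same number as typed with a full-width input method:
# fullwidth digits plus U+FF0D FULLWIDTH HYPHEN-MINUS.
typed = "\uFF15\uFF15\uFF15\uFF0D\uFF10\uFF11\uFF10\uFF10"

print(typed == stored)                          # False: a raw comparison misses
print(search_key(typed) == search_key(stored))  # True: normalized keys match
```

Applying the same function on both the write path and the query path is what makes the lookup reliable; normalizing only one side brings the mismatch back.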

I think this is better now, but it points up some inconsistencies in the Unicode standard. Clearly, "characters, not glyphs" is bogus.
New Well, if it was an easy problem
there would've been a solution found a long time ago:)

Your point about searching is very well taken. A badly designed standard could make that nightmarish.
--
-------------------------------------------------------------------
* Jack Troughton                            jake at consultron.ca *
* [link|http://consultron.ca|http://consultron.ca]                   [link|irc://irc.ecomstation.ca|irc://irc.ecomstation.ca] *
* Kingston Ontario Canada               [link|news://news.consultron.ca|news://news.consultron.ca] *
-------------------------------------------------------------------
New ICLRPD (new thread)
Created as new thread #201145 titled [link|/forums/render/content/show?contentid=201145|ICLRPD]
jb4

New Have you all seen the HUGE unicode poster?
[link|http://www.ianalbert.com/misc/unichart.php|http://www.ianalbert...misc/unichart.php]
New close enough to debug a table entry :-)
yer right the rendering engine decides how to present the unicode.
thanx,
bill
All tribal myths are true, for a given value of "true" Terry Pratchett
[link|http://boxleys.blogspot.com/|http://boxleys.blogspot.com/]

Any opinions expressed by me are mine alone, posted from my home computer, on my own time as a free american and do not reflect the opinions of any person or company that I have had professional relations with in the past 48 years. meep
questions, help? [link|mailto:pappas@catholic.org|email pappas at catholic.org]
     What the heck is text? - (systems) - (56)
         It depends on the context. - (Another Scott) - (2)
             Unicode and ASCII - (StevenYap) - (1)
                 Re: Unicode and ASCII - Nitpick II - (jb4)
         you are confusing text with display - (boxley) - (12)
             Uhhh..Not quite, Bill - (jb4) - (11)
                 And that is one thing that sucks about Unicode - (ben_tilly) - (9)
                     At least they're consistent - (jb4) - (8)
                         But it is a problem - (ben_tilly)
                         Except for that full width/half width ascii thing - (tuberculosis) - (5)
                             I dunno... - (jb4)
                             My personal take on it - (jake123) - (3)
                                 Perhaps, but it makes searching tricky - (tuberculosis) - (2)
                                     Well, if it was an easy problem - (jake123)
                                     ICLRPD (new thread) - (jb4)
                         Have you all seen the HUGE unicode poster? - (FuManChu)
                 close enough to debug a table entry :-) - (boxley)
         Text is not as simple as it seems - (ben_tilly)
         This is one thing that Java handles pretty well - (bluke)
         Rule #1 - Everything you think you know is wrong - (tuberculosis) - (29)
             Why xenophobic? - (drewk) - (28)
                 Because they didn't think... - (pwhysall)
                 Because if they had spent any time at all - (tuberculosis) - (25)
                     Now how about addressing my example - (drewk) - (17)
                         The best explanation that I've seen of why 2 digits... - (ben_tilly)
                         No, but they were xenophobic etc - (jake123) - (15)
                             xenophobic's probably the wrong word - (SpiceWare) - (14)
                                 Yeah, you're right - (jake123) - (13)
                                     How about "escessively humble"? - (drewk) - (4)
                                         Look, the point about the two digits for a year is well - (jake123) - (1)
                                             Disagree - (jb4)
                                         Maybe... - (tuberculosis) - (1)
                                             How about simply "provincial". - (a6l6e6x)
                                     The people who coded for teletypes and green terminals - (Arkadiy) - (7)
                                         Yes, a typographer - (jake123) - (3)
                                             Internationalization would not have been so easy - (ben_tilly)
                                             Text layout in 80 by 24 grid of monspaced font? - (Arkadiy) - (1)
                                                 Phone books back then - (jake123)
                                         Please don't use the letter "e" in your code. - (pwhysall) - (2)
                                             I certainly used to do without "e" - (Arkadiy)
                                             I couldn't use "e" either ... - (JimWeirich)
                     Oh, come ON already - (jb4) - (6)
                         The C++ standard i18n library is awful - (tuberculosis) - (5)
                             Dont know ICU - (jb4) - (4)
                                 ICLRPD (new thread) - (drewk)
                                 You can find it here - (tuberculosis) - (2)
                                     Time line? - (jb4) - (1)
                                         Released in 1988 - (tuberculosis)
                 Actually, Algol 68 was designed from the ground up - (Arkadiy)
         Re: What the heck is text? - (JayMehaffey) - (3)
             I must correct you - ASCII is a 7-bit encoding - (tuberculosis)
             Whoa, there. - (ubernostrum) - (1)
                 Your right mostly - (JayMehaffey)
         Using a pencil, it's unambiguous. -NT - (mmoffitt) - (3)
             You haven't seen my handwriting.... -NT - (Another Scott) - (2)
                 Uh-oh. I wouldn't confess that ;0) - (mmoffitt) - (1)
                     My father's handwriting was so bad... - (broomberg)