This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.
Thomas Bushnell BSG writes:

> This is exactly part of the reason why char=codepoint is such a lose.
> Most code doesn't *want* to see this kind of garbage; it's an encoding
> issue. I want chars where the *computer* takes care of the coding. I
> want chars that are fully-understood characters, not little pieces of
> a character.

Surrogates are a side-effect of UTF-16. Period. Application-level code just doesn't see them. This entire discussion about whether or not a CHAR should include surrogate code points is, IMHO, a waste of everyone's talents here. It's much ado about nothing.

The only time you should see a surrogate value is if the input text is malformed. Otherwise the lower-level transcoders should have converted it to the appropriate astral-plane codepoint. If the text is malformed, big deal. It is not difficult to handle this case.

FWIW, I've been working in Unicode since before UTF-16 was developed. Most of my work is in Asian languages, where I would expect to see characters outside the BMP. The reality is that they are just not that common. You don't see them. The only time I do see them is once in a while when dealing with texts from Hong Kong that are encoded in UTF-16. But the transcoding layers make these go away, and I just have the full codepoint.

If you are a developer and you lose sleep over surrogates, I envy you.

    -tree

--
Tom Emerson
Basis Technology Corp.
Software Architect
http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"
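[Editor's note: the transcoding step Emerson describes — a lower layer combining a UTF-16 surrogate pair into the full astral-plane codepoint before application code ever sees it — can be sketched as follows. This is an illustrative example, not code from the thread; the function name and the sample character are the editor's choice.]

```python
def combine_surrogates(high, low):
    """Combine a UTF-16 surrogate pair into one Unicode code point.

    high: lead surrogate in the range U+D800..U+DBFF
    low:  trail surrogate in the range U+DC00..U+DFFF
    This is the arithmetic a UTF-16 transcoder performs, per the
    Unicode standard, so that application code only ever sees
    full code points.
    """
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        # Malformed input (an unpaired or reversed surrogate) is the
        # only case where a raw surrogate value would leak upward.
        raise ValueError("malformed surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# U+20BB7, a CJK Extension B character of the kind found in
# Hong Kong text, is encoded in UTF-16 as the pair D842 DFB7:
assert combine_surrogates(0xD842, 0xDFB7) == 0x20BB7
```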