On Friday 13 February 2004 11:07 pm, Bradd W. Szonye wrote:
> > Ken Dickey wrote:
> > Scheme does not IMPLEMENT Unicode.
> Bradd wrote:
> *Any* program that handles Unicode data implements Unicode! That
> includes Scheme compilers that support Unicode sources.
Ok. Pick an example. According to the docs, Gambit 3.0 supports Unicode.
> (define great (string-ref "\x5927" 0)) ;; "(U+5927)"
#\*** ERROR -- IO error on #<output-port (stdout)>
> >> In other words, recognizing canonically-equivalent characters *is*
> >> the responsibility of the reader, if it claims to implement the
> >> Unicode character set.
> > Who cares?
> Anybody who wants to claim that his compiler supports Unicode. It's a
> licensing issue. Unicode is a trademark, and you can't claim that you
> "support" Unicode unless you actually conform to the standard.
So does Gambit support Unicode or is the consortium going after somebody for
While Gambit reads Unicode files, I don't believe it does normalization.
It does allow kanji identifiers:
(ã? -bã?? 5) => 120
Does Gambit conform?
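
To make concrete what "doing normalization" would ask of a reader, here is a toy sketch of canonical composition over lists of code points. The names toy-compositions, toy-compose and toy-equivalent? are made up for illustration and are not part of Gambit or any SRFI. The table holds only three real base+mark pairs. A conforming implementation would also need the full data from UnicodeData.txt, combining-class reordering, and Hangul handling, none of which is attempted here.

```scheme
;; Toy canonical composition table: (base combining-mark) . precomposed
;; Only three pairs; a real table would be generated from UnicodeData.txt.
(define toy-compositions
  '(((#x0065 #x0301) . #x00E9)    ; e + COMBINING ACUTE ACCENT
    ((#x0041 #x030A) . #x00C5)    ; A + COMBINING RING ABOVE
    ((#x006F #x0308) . #x00F6)))  ; o + COMBINING DIAERESIS

;; Replace each adjacent (base mark) pair found in the toy table with its
;; precomposed code point.  Handles at most one mark per base and does no
;; reordering, so this is NOT real NFC.
(define (toy-compose cps)
  (cond ((or (null? cps) (null? (cdr cps))) cps)
        ((assoc (list (car cps) (cadr cps)) toy-compositions)
         => (lambda (hit) (cons (cdr hit) (toy-compose (cddr cps)))))
        (else (cons (car cps) (toy-compose (cdr cps))))))

;; Two spellings count as the same identifier if they compose alike.
(define (toy-equivalent? a b)
  (equal? (toy-compose a) (toy-compose b)))

;; "cafe" + combining acute vs. precomposed e-acute:
;; (toy-equivalent? '(#x63 #x61 #x66 #x65 #x301)
;;                  '(#x63 #x61 #x66 #xE9))      ;=> #t
```

A reader that skips this step would treat those two spellings as different identifiers.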
> > It is desirable that a Scheme with support for extended identifiers
> > should not be large or expensive to implement.
> Normalization is not difficult or expensive in a batch program like a
Huh? There are plenty of small Scheme interpreters out there. The binary for
TinyScheme is ~100KB.
There are plenty of interactive compilers out there. I almost never use a
Scheme compiler in a batch mode unless I am (re)building a runtime system.
[Bad choice of words?]
> In particular, if you're carrying around the data for "Is this
> a letter or a number?" it's trivial to also provide the canonical
> compositions and decompositions. I don't know where you got the idea
> that it's expensive.
I think it is the "if you're carrying around the data for" part that I am
worried about. Blocks are one thing, but I see that the Unihan.txt file is
25 MB and I am worried that large tables could double or triple the size of a
small Scheme implementation.
> I suspect that you simply don't understand what a "Unicode
> implementation" is.
You are probably right.
I am currently hacking up some code (part time) to do this. I should
understand after I have written the code.
So far I am only up to processing CaseFold.txt and generating things like:
(case-fold (integer->uni-char #x00DF)) ;=> (#\U+0073 #\U+0073)
(uni-char=? #\A (integer->uni-char (char->integer #\A))) ;=> #t
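
For anyone following along, the CaseFold.txt step can be sketched roughly as below. This is not my actual code: the helper names (split-on, words, strip-comment, hex->integer, parse-case-fold-line) are invented for illustration, and the sketch assumes the usual data-line layout of "code; status; mapping; # name" with space-separated hex values. It turns one line into a list of integers and leaves building the uni-char table out entirely.

```scheme
;; Split STR at every DELIM character; empty fields are kept.
(define (split-on str delim)
  (let loop ((chars (string->list str)) (field '()) (fields '()))
    (cond ((null? chars)
           (reverse (cons (list->string (reverse field)) fields)))
          ((char=? (car chars) delim)
           (loop (cdr chars) '() (cons (list->string (reverse field)) fields)))
          (else
           (loop (cdr chars) (cons (car chars) field) fields)))))

;; Space-separated tokens of STR, with empty tokens dropped.
(define (words str)
  (let loop ((parts (split-on str #\space)) (acc '()))
    (cond ((null? parts) (reverse acc))
          ((string=? (car parts) "") (loop (cdr parts) acc))
          (else (loop (cdr parts) (cons (car parts) acc))))))

;; Drop everything from the first #\# onward.
(define (strip-comment line)
  (let loop ((i 0))
    (cond ((= i (string-length line)) line)
          ((char=? (string-ref line i) #\#) (substring line 0 i))
          (else (loop (+ i 1))))))

(define (hex->integer s) (string->number s 16))

;; Returns (code status (mapped-code ...)), or #f for comment and
;; blank lines.
(define (parse-case-fold-line line)
  (let ((fields (split-on (strip-comment line) #\;)))
    (if (< (length fields) 3)
        #f
        (list (hex->integer (car (words (car fields))))
              (car (words (cadr fields)))
              (map hex->integer (words (caddr fields)))))))

;; (parse-case-fold-line
;;   "00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S")
;; ;=> (223 "F" (115 115))
```

The status field ("C", "F", "S" or "T") is kept as a string so that a caller can decide which kinds of folding to load.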
Hey. I am a slow learner. I learn by doing.