>> *Any* program that handles Unicode data implements Unicode! That
>> includes Scheme compilers that support Unicode sources.
Ken Dickey wrote:
> Ok. Pick an example.
Why? Any process that claims to support Unicode must conform to the
standard.
> According to the docs, Gambit 3.0 supports Unicode.
> > (define great (string-ref "\x5927" 0)) ;; "(U+5927)"
> > great
> #\*** ERROR -- IO error on #<output-port (stdout)>
I have no idea whether that indicates conformance or not. Is "\x5927"
valid Gambit syntax for the Unicode codepoint U+5927? If not, then this
example is meaningless. Does the output port use a Unicode encoding by
default? If not, this example is meaningless.
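For comparison only (Python, not Gambit; Gambit's escape rules may differ), here is why the spelling of the escape matters: in Python, `\xNN` consumes exactly two hex digits, so `"\x5927"` does not denote U+5927 at all.

```python
# Python, for illustration only -- not Gambit syntax.
# "\xNN" takes exactly two hex digits, so "\x5927" is
# U+0059 ('Y') followed by the literal characters '2' and '7'.
s = "\x5927"
print(s == "Y27")            # True

# The codepoint U+5927 needs the four-digit "\uNNNN" form:
great = "\u5927"
print(great == chr(0x5927))  # True
```

If Gambit's reader treats `\x5927` the same way, the example above proves nothing about its Unicode support either way.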
>>> Who cares?
>> Anybody who wants to claim that his compiler supports Unicode. It's a
>> licensing issue. Unicode is a trademark, and you can't claim that you
>> "support" Unicode unless you actually conform to the standard.
> So does Gambit support Unicode or is the consortium going after
> somebody for non-compliance?
They might. I don't know what their enforcement policy is. I don't even
know for certain whether they have one (although that's usually how it
works when you trademark the name of the standard).
> While Gambit reads unicode files, I don't believe it does normalization.
I don't think normalization is required, but "reading Unicode files"
does demand that it recognize when graphemes are canonically identical.
> It does allow kanji identifiers
> ([kanji] 5) => 120
> Does Gambit conform?
That isn't nearly enough information to judge. And I don't know what
point you're trying to make here, but you're being extremely rude about
it. C'mon, you just asked a completely ridiculous question. You can't
judge conformance to a large standard from a small example like this,
unless the example demonstrates obvious *non*conformance. Why are you
being so antagonistic?
>> Normalization is not difficult or expensive in a batch program like a
>> compiler.
> Huh? There are plenty of small Scheme interpreters out there. The
> binary for TinyScheme is ~100KB.
Interpreters *are* compilers. They just target a software VM instead of
a hardware machine. See EOPL.
> There are plenty of interactive compilers out there.
"Batch" was a bad choice of words, perhaps. Anyway, processing Unicode
isn't any more difficult or expensive in an interactive process.
>> In particular, if you're carrying around the data for "Is this a
>> letter or a number?" it's trivial to also provide the canonical
>> compositions and decompositions. I don't know where you got the idea
>> that it's expensive.
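As a sketch of the point (again in Python, purely for illustration): the general category and the canonical decomposition come out of the same Unicode character database, so an implementation already consulting one has the other at hand.

```python
import unicodedata

# "Is this a letter or a number?" -- the general category...
print(unicodedata.category("\u00E9"))  # 'Ll', lowercase letter
print(unicodedata.category("5"))       # 'Nd', decimal digit

# ...and the canonical decomposition live in the same database,
# so carrying one table practically gives you the other.
print(unicodedata.decomposition("\u00E9"))  # '0065 0301'
```

The data needed for normalization is a small fraction of the full database; the 25 MB UniHan file mentioned below is mostly readings and dictionary cross-references, not normalization tables.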
> I think it is the "if you're carrying around the data for" part that I
> am worried about. Blocks are one thing, but I see that the UniHan.txt
> file is 25 MB and I am worried that large tables could double or
> triple the size of a small Scheme implementation.
On many systems, the Scheme implementation doesn't need to carry the
data around. It's part of the operating system interface. If it isn't,
and that's a problem, then *don't implement Unicode.* But don't make a
half-assed implementation and claim that you "support" it.
Look, if a terminal claimed to support ANSI X3.64, but it didn't honor
the clear-screen function, you'd call it a crappy, non-conforming
implementation, wouldn't you? It's exactly the same with a compiler that
claims to support Unicode but doesn't recognize when two encodings are
canonically equivalent. I don't know whether you're upset that I
poo-pooed your idea or what, but you're being unreasonable and rude.
Bradd W. Szonye