
Re: Encodings.

This page is part of the web mail archives of SRFI 52 from before July 7th, 2015. The new archives for SRFI 52 contain all messages, not just those from before July 7th, 2015.



On Friday 13 February 2004 11:07 pm, Bradd W. Szonye wrote:
>> Ken Dickey wrote:
> > Scheme does not IMPLEMENT Unicode.

> Bradd wrote:
> *Any* program that handles Unicode data implements Unicode! That
> includes Scheme compilers that support Unicode sources.

Ok.  Pick an example.  According to the docs, Gambit 3.0 supports Unicode.  

But..

> (define great (string-ref "\x5927" 0)) ;; "(U+5927)"
> great
#\*** ERROR -- IO error on #<output-port (stdout)>

> >> In other words, recognizing canonically-equivalent characters *is*
> >> the responsibility of the reader, if it claims to implement the
> >> Unicode character set.
..
> > Who cares?
>
> Anybody who wants to claim that his compiler supports Unicode. It's a
> licensing issue. Unicode is a trademark, and you can't claim that you
> "support" Unicode unless you actually conform to the standard.

So does Gambit support Unicode or is the consortium going after somebody for 
non-compliance?  

While Gambit reads Unicode files, I don't believe it does normalization.

It does allow kanji identifiers

(ã? -bã?? 5) => 120

Does Gambit conform?



> > It is desirable that a Scheme with support for extended identifiers
> > should not be large or expensive to implement.
>
> Normalization is not difficult or expensive in a batch program like a
> compiler. 

Huh?  There are plenty of small Scheme interpreters out there.  The binary for 
TinyScheme is ~100KB.  

There are plenty of interactive compilers out there.  I almost never use a 
Scheme compiler in a batch mode unless I am (re)building a runtime system.

[Bad choice of words?]


> In particular, if you're carrying around the data for "Is this
> a letter or a number?" it's trivial to also provide the canonical
> compositions and decompositions. I don't know where you got the idea
> that it's expensive.

I think it is the "if you're carrying around the data for" part that I am 
worried about.  Blocks are one thing, but I see that the Unihan.txt file is 
25 MB and I am worried that large tables could double or triple the size of a 
small Scheme implementation.


> I suspect that you simply don't understand what a "Unicode
> implementation" is.

You are probably right.

I am currently hacking up some code (part time) to do this.  I should 
understand after I have written the code.

So far I am only up to processing CaseFold.txt and generating things like:
(case-fold (integer->uni-char #x00DF)) ;=> (#\U+0073 #\U+0073)
(uni-char=? #\A (integer->uni-char (char->integer #\A))) ;=> #t
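
For illustration, here is a minimal sketch of how one data line of
CaseFold.txt might be parsed.  It assumes the file's
"<code>; <status>; <mapping>; # <name>" layout and that comment and blank
lines have already been filtered out.  The names split-on, words, hex and
parse-case-fold-line are hypothetical helpers made up for this sketch; they
are not the procedures shown above.

```scheme
;; Sketch only: all procedure names here are hypothetical.
;; Assumed input: one data line of CaseFold.txt, for example
;;   "00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S"

;; Split STR at every occurrence of the character DELIM.
(define (split-on str delim)
  (let loop ((chars (string->list str)) (field '()) (fields '()))
    (cond ((null? chars)
           (reverse (cons (list->string (reverse field)) fields)))
          ((char=? (car chars) delim)
           (loop (cdr chars) '()
                 (cons (list->string (reverse field)) fields)))
          (else
           (loop (cdr chars) (cons (car chars) field) fields)))))

;; Split STR on spaces, dropping the empty strings that leading,
;; trailing, or repeated spaces leave behind.
(define (words str)
  (let loop ((fields (split-on str #\space)) (acc '()))
    (cond ((null? fields) (reverse acc))
          ((string=? (car fields) "") (loop (cdr fields) acc))
          (else (loop (cdr fields) (cons (car fields) acc))))))

;; Read a string of hexadecimal digits as an integer.
(define (hex str) (string->number str 16))

;; Return (code status (mapped-code ...)): the code points as integers
;; and the status as a one-letter string such as "C", "F", "S" or "T".
(define (parse-case-fold-line line)
  (let* ((fields  (split-on line #\;))
         (code    (hex (car (words (list-ref fields 0)))))
         (status  (car (words (list-ref fields 1))))
         (mapping (map hex (words (list-ref fields 2)))))
    (list code status mapping)))
```

With these definitions,
(parse-case-fold-line "00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S")
evaluates to (223 "F" (115 115)), i.e. U+00DF folds to U+0073 U+0073.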

Hey.  I am a slow learner.  I learn by doing.

Cheers,
-KenD