Re: character strings versus byte strings

This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 contain all messages, not just those from before July 7th, 2015.

To: tb@xxxxxxxxxx
Subject: Re: character strings versus byte strings
From: Tom Lord <lord@xxxxxxx>
Date: Mon, 22 Dec 2003 14:30:00 -0800 (PST)
Cc: mflatt@xxxxxxxxxxx, srfi-50@xxxxxxxxxxxxxxxxx
Delivered-to: srfi-50@xxxxxxxxxxxxxxxxx
In-reply-to: <87vfo8k3ef.fsf@xxxxxxxxxxxxxxxxx> (tb@xxxxxxxxxx)
References: <20031222141633.829B7828@xxxxxxxxxxxxxxxxx> <87vfo8k3ef.fsf@xxxxxxxxxxxxxxxxx>

    > From: tb@xxxxxxxxxx (Thomas Bushnell, BSG)

    > Matthew Flatt <mflatt@xxxxxxxxxxx> writes:

    > >  * For Scheme characters, pick a specific encoding, probably one of
    > >    UTF-16, UTF-32, UCS-2, or UCS-4 (but I don't know which is the right
    > >    choice).

    > Wrong.  A Scheme character should be a codepoint.  The representation
    > of code points as sequences of bytes should be under the hood.

Misleading.

It isn't obvious that Scheme characters should be _Unicode_
codepoints.  For (much) more inclusive definitions of "codepoint",
that characters should be codepoints is tautologically true.

There's a serious problem regarding Scheme and Unicode: for any sane
definition of "character" in Unicode, the character type in R5RS is
not sanely isomorphic to it.

I think that the best way to handle that in an FFI is to remain
agnostic about the range of the Scheme CHAR? type when mapped into C.
I _guess_ that the error-signalling-on-range-error property of
SCHEME_EXTRACT_CHARACTER satisfies this, but it could certainly be
rounded out and made more useful.
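
To make the range-error idea concrete, here is a rough sketch in C.
It is not the SRFI 50 interface: the names scheme_char_value,
extract_status and extract_to_uchar are hypothetical, and the sketch
assumes only that a Scheme character can be presented to C as some
unsigned integer whose range C code should not presume. The point is
that narrowing to a C type reports a range error instead of silently
truncating.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the integer value an implementation hands to C for a
   Scheme character.  Its meaning and range are left to the
   implementation; this is an assumption made for illustration. */
typedef uint32_t scheme_char_value;

/* Hypothetical status codes for the extraction. */
typedef enum {
    EXTRACT_OK = 0,
    EXTRACT_RANGE_ERROR = 1
} extract_status;

/* Narrow a Scheme character value to a C unsigned char.
   Stores the value in *out and returns EXTRACT_OK when it fits;
   returns EXTRACT_RANGE_ERROR and leaves *out untouched otherwise. */
static extract_status
extract_to_uchar(scheme_char_value v, unsigned char *out)
{
    assert(out != NULL);
    if (v > UCHAR_MAX)
        return EXTRACT_RANGE_ERROR;   /* signal instead of truncating */
    *out = (unsigned char)v;
    return EXTRACT_OK;
}
```

On a platform where unsigned char is 8 bits, extracting the value
0x41 succeeds, while extracting 0x3BB reports a range error. The C
caller never needs to know how wide the Scheme character type is.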

-t
