
Re: the "Unicode Background" section

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.

To: Thomas Lord <lord@xxxxxxx>
Subject: Re: the "Unicode Background" section
From: Matthew Flatt <mflatt@xxxxxxxxxxx>
Date: Thu, 21 Jul 2005 17:52:28 -0600
Cc: srfi-75@xxxxxxxxxxxxxxxxx
Delivered-to: srfi-75@xxxxxxxxxxxxxxxxx
In-reply-to: <1121985934.4501.46.camel@xxxxxxxxxxxxxx>
References: <1121985934.4501.46.camel@xxxxxxxxxxxxxx>

At Thu, 21 Jul 2005 15:45:34 -0700, Thomas Lord wrote:
> If CHARs are codepoints, more basic Unicode algorithms translate
> into Scheme cleanly.   

I don't see what you mean. Can you provide an example?

> What is gained by forcing surrogates to be unrepresentable as CHAR?

Every string is representable in UTF-8, UTF-16, etc.
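As a rough illustration of that point (not part of the original message, and in Python rather than Scheme): a lone surrogate code point has no valid UTF-8 or UTF-16 encoding, so a string type that admits surrogates as characters can hold strings that cannot be encoded. The helper names `is_scalar_value` and `encodable` below are hypothetical, chosen only for this sketch.

```python
def is_scalar_value(cp: int) -> bool:
    """True if cp is a Unicode scalar value: a code point in
    0..0x10FFFF that is not a surrogate (0xD800..0xDFFF)."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def encodable(s: str, encoding: str) -> bool:
    """True if s can be encoded with the given codec."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Python's str happens to allow lone surrogates, which makes it a
# convenient way to show what goes wrong when they are admitted.
ordinary = "\u00e9\U0001F600"   # only scalar values
lone_surrogate = "\ud800"       # a surrogate code point on its own

print(encodable(ordinary, "utf-8"), encodable(ordinary, "utf-16"))
print(encodable(lone_surrogate, "utf-8"), encodable(lone_surrogate, "utf-16"))
```

If characters are restricted to scalar values, the second case cannot arise, which is the guarantee described above.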

> What kind of code will I wind up with if I want to iterate over
> a large range of CHAR values? 

Two loops: one from 0 to #xD7FF, and one from #xE000 to #x10FFFF.
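A minimal sketch of the two-range iteration, again in Python for illustration only; the function name `scalar_values` is an assumption of this sketch and does not come from the SRFI or from MzScheme. The two ranges are chained so that callers see a single sequence.

```python
from itertools import chain


def scalar_values():
    """Yield every Unicode scalar value in increasing order,
    skipping the surrogate block 0xD800..0xDFFF."""
    # range() excludes its upper bound, so these cover
    # 0..0xD7FF and 0xE000..0x10FFFF respectively.
    return chain(range(0x0000, 0xD800), range(0xE000, 0x110000))


# Example use: count the scalar values and check the gap is skipped.
count = sum(1 for _ in scalar_values())
has_surrogate = any(0xD800 <= cp <= 0xDFFF for cp in scalar_values())
print(count, has_surrogate)
```

Wrapping the two loops in one helper keeps the surrogate gap in a single place, so code that walks a large range of characters does not need to repeat the bounds.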

> It's not as if by excluding surrogates we arrive at a CHAR definition
> that is significantly more "linguistic" than if we don't.

True, but we arrive at a definition that is more standards-friendly,
and that's part of the overall compromise.

FWIW: MzScheme originally supported a larger set of characters, mainly
because extra bits are available in my implementation. The resulting bad
experience convinced me to define characters in terms of scalar values
instead.

Matthew

Follow-Ups:
- Re: the "Unicode Background" section
  - From: John.Cowan

References:
- the "Unicode Background" section
  - From: Thomas Lord
