
Re: character strings versus byte strings

This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 contain all messages, not just those from before July 7th, 2015.



At Mon, 22 Dec 2003 09:09:44 -0800, Per Bothner wrote:
> Matthew Flatt wrote:
> 
> >  * Where "char *" is used for strings (e.g., "expected_explanation" for
> >    a type error), define it to be an ASCII or Latin-1 encoding (I
> >    prefer the latter).
> 
> No, it should be UTF-8.

I think you're right.

> So if I was designing a Scheme dialect for internationalization,
> I'd do away with mutable strings.

That sounds right, too.

So one straightforward approach is for C code to mutate only byte
strings, and for string operations in the C API to use UTF-8. (I think
some particular encoding has to be chosen, even with the performance
implications.)
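A minimal C sketch of what "string operations in the C API use UTF-8" could mean on the C side, under stated assumptions: the names utf8_encode and utf8_char_offset are hypothetical helpers, not part of SRFI 50 or any Scheme implementation's API. The second function illustrates one of the performance implications mentioned above. Because UTF-8 is variable-width, finding the k-th character takes a linear scan instead of O(1) indexing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8 into
   out[0..3]. Returns the number of bytes written (1-4), or 0 if cp is
   not a Unicode scalar value (a surrogate, or above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110xxx + three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: return the byte offset of the k-th character
   (0-based) in a well-formed UTF-8 buffer of len bytes, or len if k is
   past the end. This is a linear scan, so character indexing costs
   O(n) rather than O(1). */
size_t utf8_char_offset(const unsigned char *s, size_t len, size_t k)
{
    assert(s != NULL || len == 0);
    size_t i = 0;
    while (i < len && k > 0) {
        i++;                              /* step past the lead byte */
        while (i < len && (s[i] & 0xC0) == 0x80)
            i++;                          /* skip continuation bytes (10xxxxxx) */
        k--;
    }
    return i;
}
```

As an example of why the encoding of a `char *` argument such as "expected_explanation" has to be pinned down: the Latin-1 byte 0xE9 (é) means something different in UTF-8, where é is the two-byte sequence 0xC3 0xA9.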

Matthew