Re: character strings versus byte strings

At Mon, 22 Dec 2003 09:09:44 -0800, Per Bothner wrote:
> Matthew Flatt wrote:
> >  * Where "char *" is used for strings (e.g., "expected_explanation" for
> >    a type error), define it to be an ASCII or Latin-1 encoding (I
> >    prefer the latter).
> No, it should be UTF-8.

I think you're right.

> So if I was designing a Scheme dialect for internationalization,
> I'd do away with mutable strings.

That sounds right, too.

So, one straightforward approach is that C code only mutates byte
strings, and string operations in the C API use UTF-8. (I think some
particular encoding has to be chosen, even with the performance