
Re: Surrogates and character representation

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.



Okay, thanks for clearing up my misunderstanding.

> but in general using UTF-8 as an internal representation is
> a bad idea.

Using UTF-8 internally for a Scheme on a Plan 9 system is not obviously a bad idea. Sure, you don't have direct indexing, but you avoid conversion when you talk to the C library and OS.

Using UTF-16 internally doesn't give you direct indexing because of characters outside the BMP, but it might make sense on Windows boxes for precisely the same reason.
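To make the indexing point concrete, here is a small Python sketch (Python is chosen only for illustration; the helper name utf16_code_units is hypothetical). It counts the 16-bit code units needed for a character inside the BMP and for one outside it. Because the second needs a surrogate pair, a code-unit offset no longer equals a character offset.

```python
def utf16_code_units(text: str) -> int:
    """Return the number of 16-bit code units in the UTF-16 encoding of text."""
    # "utf-16-le" emits no byte order mark, so every 2 bytes is one code unit.
    return len(text.encode("utf-16-le")) // 2


bmp_char = "A"               # U+0041, inside the BMP: one code unit
non_bmp_char = "\U0001D11E"  # U+1D11E MUSICAL SYMBOL G CLEF: a surrogate pair

print(utf16_code_units(bmp_char))      # 1
print(utf16_code_units(non_bmp_char))  # 2

# In a mixed string, the character after the clef sits at character index 2
# but at code-unit index 3, so direct indexing by code unit goes wrong.
mixed = "a" + non_bmp_char + "b"
print(len(mixed), utf16_code_units(mixed))  # 3 4
```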

Using UTF-32 (UCS-4) internally in these cases would involve translation to talk to the library and OS and would further make my emacs use about four times as much memory as it does now (which brings us back to the representation for infinity).
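The "about four times" figure can be checked with a rough Python sketch, assuming the text is mostly ASCII, as source code in an editor buffer typically is. The function name encoded_sizes and the sample string are hypothetical, and real editors add their own overhead on top of the raw encoding.

```python
def encoded_sizes(text: str) -> dict:
    """Return the encoded size in bytes of text under each Unicode encoding form."""
    # The "-le" variants are used so that no byte order mark inflates the counts.
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


# Pure ASCII sample: 1 byte per character in UTF-8, 2 in UTF-16, 4 in UTF-32.
ascii_text = "(define (square x) (* x x))"
sizes = encoded_sizes(ascii_text)

for encoding, size in sizes.items():
    print(encoding, size, "bytes")

# For ASCII-only text the UTF-32 form is exactly four times the UTF-8 form.
print(sizes["utf-32-le"] // sizes["utf-8"])  # 4
```

For text dominated by non-ASCII characters the ratio shrinks, since UTF-8 then needs two to four bytes per character.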

In general, any single representation is a bad idea in some circumstances.

Regards,

Alan
--
Dr Alan Watson
Centro de Radioastronomía y Astrofísica
Universidad Nacional Autónoma de México