
Re: Surrogates and character representation



By the same token, random-access disks are a useless feature, for they
can be replaced by sequential-access DECtapes that can be rewound and
selectively rewritten.  But at a price.

Files actually provide a fairly close analogy to the commonest means of representing Unicode strings.

Imagine a file system that implements files as streams of bytes. Now imagine that you want to read the Nth *line*. The only way to do this is to read through the file until you have encountered N-1 newlines. This is like finding the Nth character when using UTF-8 for strings.

Now imagine a file system that implements files as enumerated random-access records and uses exactly one record for each line. You can directly read the Nth line. This is like finding the Nth character when using UTF-32 for strings.

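Now imagine a file system that implements files as enumerated random-access records and uses one or more records for each line. You can no longer read the Nth line directly, because the Nth record need not be the start of the Nth line. This is like using UTF-16 for strings.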
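
To make the analogy concrete, here is a rough Python sketch of finding the Nth character (0-based) in each encoding. It is only an illustration: the function names are my own, and it assumes well-formed input, little-endian byte order, no byte-order mark, and that "character" means a Unicode code point.

```python
def _utf8_len(lead: int) -> int:
    """Length in bytes of a UTF-8 sequence, judged from its lead byte.
    Assumes the input is valid UTF-8."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    return 4


def nth_char_utf8(data: bytes, n: int) -> str:
    """Scan from the start, skipping n variable-length sequences."""
    i = 0
    for _ in range(n):
        i += _utf8_len(data[i])
    return data[i:i + _utf8_len(data[i])].decode("utf-8")


def nth_char_utf32(data: bytes, n: int) -> str:
    """Every code point occupies exactly 4 bytes, so index directly."""
    return data[4 * n:4 * n + 4].decode("utf-32-le")


def _utf16_len(data: bytes, i: int) -> int:
    """Length in bytes of the code point starting at byte offset i:
    4 if it begins with a high surrogate, otherwise 2."""
    unit = int.from_bytes(data[i:i + 2], "little")
    return 4 if 0xD800 <= unit <= 0xDBFF else 2


def nth_char_utf16(data: bytes, n: int) -> str:
    """Scan from the start, since any earlier code point may be a pair."""
    i = 0
    for _ in range(n):
        i += _utf16_len(data, i)
    return data[i:i + _utf16_len(data, i)].decode("utf-16-le")


# 'a', e-acute, euro sign, musical G clef (outside the BMP), 'z'
text = "a\u00e9\u20ac\U0001D11Ez"
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")
utf32 = text.encode("utf-32-le")
```

Only nth_char_utf32 avoids the loop. The UTF-16 version has to scan just as the UTF-8 version does, even though most of its "records" are a single 16-bit unit.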

Regards,

Alan
--
Dr Alan Watson
Centro de Radioastronomía y Astrofísica
Universidad Astronómico Nacional de México