This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.
(1) Are your "random" accesses into your corpus linguistics strings really random, do they have significant locality, or could they be arranged to have significant locality?

Speaking for myself, I would say they are as close to random as makes no difference.
Thanks for your answer. I think I'm convinced that plain UTF-8 is a losing representation for strings in this application. Or, generalizing, this application really needs strings that have constant-time random access and not just linear-time traversal.
If I wanted to rescue UTF-8 (because I really really really want to keep conversion to UTF-8 as a constant-time operation), I could maintain a vector of byte offsets to every Nth character.
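To make the offset-vector idea concrete, here is a minimal sketch in Python (not Scheme, and not anything proposed in the SRFI). The class name `IndexedUtf8`, the method `char_at`, and the `stride` parameter (the "N" above) are all hypothetical names chosen for illustration. It assumes the stored bytes are well-formed UTF-8.

```python
class IndexedUtf8:
    """UTF-8 bytes plus a table of byte offsets for every Nth character.

    Hypothetical illustration: `stride` plays the role of N.
    """

    def __init__(self, text, stride=4):
        self.data = text.encode("utf-8")   # the string is stored as UTF-8
        self.stride = stride
        self.offsets = []                  # byte offset of characters 0, N, 2N, ...
        count = 0
        for pos, byte in enumerate(self.data):
            # Any byte that is not a continuation byte (10xxxxxx)
            # marks the start of a character.
            if byte & 0xC0 != 0x80:
                if count % stride == 0:
                    self.offsets.append(pos)
                count += 1
        self.length = count                # length in characters

    def char_at(self, k):
        """Return character k, scanning at most stride - 1 characters."""
        if not 0 <= k < self.length:
            raise IndexError(k)
        # Jump to the nearest indexed character at or before k.
        pos = self.offsets[k // self.stride]
        # Walk forward over the remaining k mod N characters.
        for _ in range(k % self.stride):
            pos += 1
            while self.data[pos] & 0xC0 == 0x80:
                pos += 1
        # Find the end of the character that starts at pos.
        end = pos + 1
        while end < len(self.data) and self.data[end] & 0xC0 == 0x80:
            end += 1
        return self.data[pos:end].decode("utf-8")
```

In this sketch, conversion to UTF-8 stays constant-time because `data` already is the UTF-8 encoding. A lookup scans at most N - 1 characters after the jump, so it is bounded by a constant for fixed N, and the index costs one offset per N characters. A smaller N trades memory for shorter scans.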
Regards,

Alan

--
Dr Alan Watson
Centro de Radioastronomía y Astrofísica
Universidad Astronómico Nacional de México