This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.
Dr Alan Watson quoting me: > > Suppose each string s is represented by a vector of 2^21 elements, > > where element i consists of a list of numbers, in IEEE double > > precision, that represent the indexes within s at which the > > character c appears, where c is the Unicode scalar value f(i), > > where f is represented by a global association list that maps > > scalar values to indexes (i.e. f-inverse). SRFI-75 allows that > > representation, yet penalizes it. I also consider it to be a > > poor representation. > > You consider this a poor representation because it does not > allow constant-time random access or because it does not allow > linear-time traversal? Or simply because of the space cost? Yes. And don't forget the implementation complexity, cost of mutation, potential for race conditions in multithreaded systems, and incompatibility with Java, C#, and C++ cultural expectations. > Consider an implementation that internally represents strings > as if they were doubly-linked lists and cached the position of > the last index. This allows linear-time traversal but not > constant-time random access. Would you consider this a poor > implementation? For some purposes, perhaps, but not for others. IMO it is far more reasonable than the representation I described above. BTW, I once used a doubly-linked-list-of-lines representation to represent the text of editor buffers in an Emacs-like editor I wrote in Scheme. Some of the editor's commands treated the text as a single long linear array of characters; other commands treated the text as an array of strings, with double indexing (line+column) to obtain a character. Aided by a couple of cached index translations, this representation ran acceptably fast on an 8 MHz 68000 with 1 Mby of RAM, and ran very nicely on the faster and larger Macintoshes that came later. It was never released as a product, but I used it at home for more than a decade. That experience is one of the reasons I disregard some of the criticisms I have read here concerning SRFI-75's alleged bias against interesting representations. I have expressed my opinion that implementations should provide multiple representations for strings, transparently and adaptively, using the API described in SRFI-75 and to be described in future SRFIs. Not every one of those representations has to do all things well. Indeed, the impossibility of doing all things well with a single representation is the reason I favor a multiplicity. Will