[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: freshman-level Boyer-Moore fast string search

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.



William D Clinger wrote:
 > Out of curiosity, what string representations does SRFI-75 penalize
 > which you consider to be poor?

Suppose each string s is represented by a vector of 2^21 elements,
where element i consists of a list of numbers, in IEEE double
precision, that represent the indexes within s at which the
character c appears, where c is the Unicode scalar value f(i),
where f is represented by a global association list that maps
scalar values to indexes (i.e. f-inverse).  SRFI-75 allows that
representation, yet penalizes it.  I also consider it to be a
poor representation.

You consider this a poor representation because it does not allow constant-time random access or because it does not allow linear-time traversal? Or simply because of the space cost?

Consider an implementation that internally represents strings as if they were doubly-linked lists and cached the position of the last index. This allows linear-time traversal but not constant-time random access. Would you consider this a poor implementation?

The thing is, SRFI-75 does not mention how one accesses the characters in a string, so I presume that R5RS character indices will remain the only portable way to do this.

Regards,

Alan
--
Dr Alan Watson
Centro de Radioastronomía y Astrofísica
Universidad Astronómico Nacional de México