This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.
On 7/28/05, William D Clinger <cesura@xxxxxxxxxxx> wrote: > > You certainly don't need character offsets to do a string > search, but the naive algorithm without random access to > characters is O(mn). The Boyer-Moore algorithm improves > this to O(n/m) in many cases. I believe one can construct > artificial examples to show that some O(n/m) cases would > degrade to an intermediate complexity, or even back to O(mn), > in UTF-8 or UTF-16 without character offsets. This is not correct. Any search algorithm that works on bytes will work on on UTF-8 strings. That is, given a C function that searches for a char* within a char* (e.g. strstr) then that will return the correct result if the arguments are UTF-8 encoded, no matter what algorithm is used. It is in fact UTF-32 that has additional overhead for Boyer-Moore searches as mentioned in my previous mail. -- Alex