This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.
> But I'm puzzled, since I think Alex's point is this: A Boyer-Moore > implementation that works on 8-bits characters (e.g. Latin-1) will > work unchanged on UTF-8 characters. Alex was right about that, and I was wrong. (I was wrong for several reasons, which varied over time. At one point I thought "unchanged" meant the Scheme code would remain unchanged while STRING-REF was changed from an O(1) to an O(n) operation, or vice versa. What he really meant was that the C code would remain unchanged while the data was changed from UCS-4 to UTF-8, or something like that.) > Naively, one would think it > would have the same performance characteristics.... It does, roughly speaking. Alex, however, claimed that the Boyer-Moore algorithm actually performs *better* on UTF-8, because he thought the preprocessing phase requires time and space proportional to the size of the alphabet. That was what I understood to be his main point, and it was the reason I felt I had to respond to him. He was wrong about that. > > As Per Bothner agreed, that can degrade the asympototic > > complexity of the algorithm from O(n/m) to O(n). > > I did? No, you didn't. I misunderstood someone else's post; on top of that, I saw your name in the cc: field and thought it was the from: field. Sorry. Will