This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.
Alex Shinn quoting me: > > I acknowledged that the algorithm will still work. My point > > is that its asymptotic complexity may be degraded. > > I'm sorry, perhaps I'm just misunderstanding, but if the exact > same algorithm, in fact the exact same code, can be used, > how is the asymptotic complexity affected? The different representations for the string being searched (UTF-8 vs UTF-32) change the problem. The main difference is that UTF-32 admits random access of characters, while UTF-8 does not. UTF-8 admits random access of bytes, but there is no way to convert a byte value obtained by random access into a UTF-8 string into the character offset on which the Boyer-Moore algorithm depends, short of reading all the characters that precede the randomly accessed byte. As Per Bothner agreed, that can degrade the asympototic complexity of the algorithm from O(n/m) to O(n). > Please note that I said "additional overhead." Nowhere did I say > or suggest that UTF-32 incurred an asymptotic penalty. However, > the c1 and c2 are not just constants - they are typically larger than > m, and therefore should not be ignored. Your belief that c1 and c2 are larger than m depends upon your implicit assumption that the automaton will be represented in some absurdly naive way. With any reasonable representation, c1 and c2 are negligibly small. The Boyer-Moore fast string search algorithm is an important algorithm that serves as a well-known example of the issue we are discussing, but its asymptotic complexity (both time and space) is not trivial to analyze, and that has been an impediment to our discussion. In another message I will present and analyze the Freshman-Level Boyer-Moore algorithm (FLBM), which is a simplified version that is simple enough for me to explain in freshmen-level courses, yet exhibits the asymptotic behavior that is relevant to this discussion. Will