[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.

*To*: srfi-75@xxxxxxxxxxxxxxxxx*Subject*: Re: freshman-level Boyer-Moore fast string search*From*: William D Clinger <cesura@xxxxxxxxxxx>*Date*: Fri, 29 Jul 2005 07:32:10 +0200*Delivered-to*: srfi-75@xxxxxxxxxxxxxxxxx*User-agent*: Gnus/5.110003 (No Gnus v0.3) XEmacs/21.5-b20 (berkeley-unix)

In my previous message concerning the Freshman-Level Boyer-Moore algorithm, I wrote: > In the worst case, the FLBM algorithm requires O(mn lg m) time. > In the best case, however, the FLBM algorithm runs in O(n/m lg m) > time. This is also the usual case in practice. That last sentence isn't true. It's true for the real Boyer-Moore algorithm, but not for the FLBM. Alex Shinn wrote: > What you describe is not the Boyer-Moore algorithm. It is a > (freshman level variant of) a variant of the Boyer-Moore algorithm My message was perfectly clear about that. > which trades a time and space cost of Sigma in the preprocessing > step for a constant factor of lg(m) in both steps. Why do you continue to speak as though the Boyer-Moore algorithm's preprocessing must be done so stupidly? This isn't rocket science. My description of the FLBM algorithm illustrates the basic idea of a less stupid representation, and you can find it worked out in more detail at http://www.codeproject.com/csharp/bmsearch.asp for the real Boyer-Moore algorithm. If, as you say, the real Boyer-Moore algorithm can be made to run just as fast with UTF-8, then I was wrong to give it as an example of an algorithm that performs better with constant-time random access to characters than with constant-time random access to the bytes of a UTF-8 encoding. In that case, please accept the Freshman-Level Boyer-Moore algorithm as my example instead. > However, the point of the discussion is that people are objecting to > explicitly forcing strings to be character arrays. That may be *your* point, but it isn't *my* point. So far as I can tell, absolutely no one has proposed, whether explicitly or implicitly, that strings be forced to be character arrays. In particular, SRFI-75 does not propose that. I got into this because Per Bothner stated that "Random accesses to a position in a string that has not been previously accessed is not in itself useful." I cited the Boyer-Moore string search to show that Bothner's statement is untrue. My dispute with you arose over your claim that "the true cost of BM includes a preprocessing step of O(m+sigma) in both time and space, where m is the pattern length and sigma is the size of the alphabet", and that this implies that "UTF-8 wins hands down for string searching." Your first claim is utter nonsense, leaving the second without support. > If we have string-pointers in Scheme then the above conversion to > index need not even be performed. Why penalize the fastest > implementation? In general, there is no fastest implementation. Implementations are not totally ordered by speed or by asymptotic time complexity. The fastest implementation for one problem is unlikely to be the fastest for all. Even if there were a fastest implementation, I am unaware of any proposal to penalize it. I must admit, however, that I am in the habit of ignoring proposals that I cannot take seriously. If you think there is a serious proposal on the table that would penalize what you regard as the fastest implementation of something, then I suggest you identify the proposal, the fastest implementation, and the something. Will

- Prev by Date:
**freshman-level Boyer-Moore fast string search** - Next by Date:
**Re: freshman-level Boyer-Moore fast string search** - Previous by thread:
**Re: freshman-level Boyer-Moore fast string search** - Next by thread:
**Re: freshman-level Boyer-Moore fast string search** - Index(es):