This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 contain all messages, not just those from before July 7th, 2015.
> From: Alex Shinn <foof@xxxxxxxxxxxxx> > > > ??? Balanced trees are still trees, and if the tree is exp(n) long > > > there are still O(n) links to follow from the root to a leaf. Can > > > you explain what you mean? > > I mean that you should (maybe, don't know enough about your apps and > > environment) use splay trees so that string algorithms displaying a > > reasonable degree of locality will be able to locate indexes in > > expected-case O(1). > OK, in my reply on c.l.s. I had incorrectly assumed you meant keeping > some sort of reference to that last index in the tree, I should have > asked for clarification. > However, in either case you still have two problems: > 1) Both are very different from unconditional O(1) access. The language I recommend for R6RS says "expected case O(1)", not unconditional. It's not a requirement, just guidance -- so it doesn't prevent any implementation from conforming. It's the vague "expected case" -- so there's pretty wide liberty for claiming you followed the guidance. One kind of implementation it specifically discourages: implementations that use nothing but (simplistically coded) UTF-8 internally and expect to be used for string-intensive processing on anything other than short strings. Another kind of implementation it specifically discourages is the same as that but s/UTF-8/a shift encoding/. If you want to use UTF-8 or a shift encoding or, really, any variable length encoding internally: the guidance is just to encourage you to either say "this implementation isn't expected to be winning for string-intensive programs" or represent your strings cleverly to overcome the shortcomings of those encodings. > 2) Both effectively prevent shared strings as subtrees. I'm aware of two kinds of shared substring you might want. One is a "copy-on-write" shared substring -- the kind of substring that SUBSTRING could return. I'm not sure what generalizations of this can be formed, but you could certainly represent COW-shared substrings as subtrees in splay-tree implementations. I've done so in the past. The other kind of shared substring would be a "mutation sharing" shared substring. True: I don't see any natural "substring == subtree" implementation for those using splay-tree strings. That doesn't mean, though, that I don't think they can be implemented reasonably -- just not by that technique. > > Of course if memory constraints and usage guarantee that your > > particular non-splay trees or guaranteed to be shallow -- that's just > > as good. > Probably not good enough if you want to work with strings the size Bear > is talking about. Bear seems to be just throwing huge amounts of memory at the problem. My impression is that the leaves of his tree are generally big -- the trees, therefore, shallow. > The problem with recommending O(1) on STRING-REF/SET! is that it > encourages programming style that cannot be optimized in some string > implementations. However, by the simple suggestion of Shiro's where you > use a string-pointer object you can achieve O(1) where you need it. I don't actually believe that. If you want to chat about why, we could take it to pika-dev@xxxxxxxxxx or offlist. -t