[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: strings draft
> From: Alex Shinn <foof@xxxxxxxxxxxxx>
> > > ??? Balanced trees are still trees, and if the tree is exp(n) long
> > > there are still O(n) links to follow from the root to a leaf. Can
> > > you explain what you mean?
> > I mean that you should (maybe, don't know enough about your apps and
> > environment) use splay trees so that string algorithms displaying a
> > reasonable degree of locality will be able to locate indexes in
> > expected-case O(1).
> OK, in my reply on c.l.s. I had incorrectly assumed you meant keeping
> some sort of reference to that last index in the tree, I should have
> asked for clarification.
> However, in either case you still have two problems:
> 1) Both are very different from unconditional O(1) access.
The language I recommend for R6RS says "expected case O(1)", not
It's not a requirement, just guidance -- so it doesn't prevent any
implementation from conforming.
It's the vague "expected case" -- so there's pretty wide liberty for
claiming you followed the guidance.
One kind of implementation it specifically discourages:
implementations that use nothing but (simplistically coded) UTF-8
internally and expect to be used for string-intensive processing on
anything other than short strings.
Another kind of implementation it specifically discourages is the same
as that but s/UTF-8/a shift encoding/.
If you want to use UTF-8 or a shift encoding or, really, any variable
length encoding internally: the guidance is just to encourage you to
either say "this implementation isn't expected to be winning for
string-intensive programs" or represent your strings cleverly to
overcome the shortcomings of those encodings.
> 2) Both effectively prevent shared strings as subtrees.
I'm aware of two kinds of shared substring you might want.
One is a "copy-on-write" shared substring -- the kind of substring
that SUBSTRING could return. I'm not sure what generalizations of
this can be formed, but you could certainly represent COW-shared
substrings as subtrees in splay-tree implementations. I've done so
in the past.
The other kind of shared substring would be a "mutation sharing"
shared substring. True: I don't see any natural "substring ==
subtree" implementation for those using splay-tree strings. That
doesn't mean, though, that I don't think they can be implemented
reasonably -- just not by that technique.
> > Of course if memory constraints and usage guarantee that your
> > particular non-splay trees or guaranteed to be shallow -- that's just
> > as good.
> Probably not good enough if you want to work with strings the size Bear
> is talking about.
Bear seems to be just throwing huge amounts of memory at the problem.
My impression is that the leaves of his tree are generally big -- the
trees, therefore, shallow.
> The problem with recommending O(1) on STRING-REF/SET! is that it
> encourages programming style that cannot be optimized in some string
> implementations. However, by the simple suggestion of Shiro's where you
> use a string-pointer object you can achieve O(1) where you need it.
I don't actually believe that. If you want to chat about why, we
could take it to pika-dev@xxxxxxxxxx or offlist.