[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: strings draft

    > From: Alex Shinn <foof@xxxxxxxxxxxxx>

    > >     > ???  Balanced trees are still trees, and if the tree is exp(n) long
    > >     > there are still O(n) links to follow from the root to a leaf.  Can
    > >     > you explain what you mean?

    > > I mean that you should (maybe, don't know enough about your apps and
    > > environment) use splay trees so that string algorithms displaying a
    > > reasonable degree of locality will be able to locate indexes in
    > > expected-case O(1).

    > OK, in my reply on c.l.s. I had incorrectly assumed you meant keeping
    > some sort of reference to that last index in the tree, I should have
    > asked for clarification.

    > However, in either case you still have two problems:

    >   1) Both are very different from unconditional O(1) access.

The language I recommend for R6RS says "expected case O(1)", not

It's not a requirement, just guidance -- so it doesn't prevent any
implementation from conforming.

It's the vague "expected case" -- so there's pretty wide liberty for
claiming you followed the guidance.

One kind of implementation it specifically discourages:
implementations that use nothing but (simplistically coded) UTF-8
internally and expect to be used for string-intensive processing on
anything other than short strings.

Another kind of implementation it specifically discourages is the same
as that but s/UTF-8/a shift encoding/.

If you want to use UTF-8 or a shift encoding or, really, any variable
length encoding internally: the guidance is just to encourage you to
either say "this implementation isn't expected to be winning for
string-intensive programs" or represent your strings cleverly to
overcome the shortcomings of those encodings.

    >   2) Both effectively prevent shared strings as subtrees.

I'm aware of two kinds of shared substring you might want.

One is a "copy-on-write" shared substring -- the kind of substring
that SUBSTRING could return.  I'm not sure what generalizations of
this can be formed, but you could certainly represent COW-shared
substrings as subtrees in splay-tree implementations.   I've done so
in the past.

The other kind of shared substring would be a "mutation sharing"
shared substring.  True: I don't see any natural "substring ==
subtree" implementation for those using splay-tree strings.  That
doesn't mean, though, that I don't think they can be implemented
reasonably -- just not by that technique.

    > > Of course if memory constraints and usage guarantee that your
    > > particular non-splay trees or guaranteed to be shallow -- that's just
    > > as good.

    > Probably not good enough if you want to work with strings the size Bear
    > is talking about.

Bear seems to be just throwing huge amounts of memory at the problem.
My impression is that the leaves of his tree are generally big -- the
trees, therefore, shallow.

    > The problem with recommending O(1) on STRING-REF/SET! is that it
    > encourages programming style that cannot be optimized in some string
    > implementations.  However, by the simple suggestion of Shiro's where you
    > use a string-pointer object you can achieve O(1) where you need it.

I don't actually believe that.   If you want to chat about why, we
could take it to pika-dev@xxxxxxxxxx or offlist.