[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: strings draft

This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 contain all messages, not just those from before July 7th, 2015.




    > From: Alex Shinn <foof@xxxxxxxxxxxxx>

    > >     > ???  Balanced trees are still trees, and if the tree is exp(n) long
    > >     > there are still O(n) links to follow from the root to a leaf.  Can
    > >     > you explain what you mean?

    > > I mean that you should (maybe, don't know enough about your apps and
    > > environment) use splay trees so that string algorithms displaying a
    > > reasonable degree of locality will be able to locate indexes in
    > > expected-case O(1).

    > OK, in my reply on c.l.s. I had incorrectly assumed you meant keeping
    > some sort of reference to that last index in the tree, I should have
    > asked for clarification.

    > However, in either case you still have two problems:

    >   1) Both are very different from unconditional O(1) access.

The language I recommend for R6RS says "expected case O(1)", not
unconditional.

It's not a requirement, just guidance -- so it doesn't prevent any
implementation from conforming.

It's the vague "expected case" -- so there's pretty wide liberty for
claiming you followed the guidance.

One kind of implementation it specifically discourages:
implementations that use nothing but (simplistically coded) UTF-8
internally and expect to be used for string-intensive processing on
anything other than short strings.

Another kind of implementation it specifically discourages is the same
as that but s/UTF-8/a shift encoding/.

If you want to use UTF-8 or a shift encoding or, really, any variable
length encoding internally: the guidance is just to encourage you to
either say "this implementation isn't expected to be winning for
string-intensive programs" or represent your strings cleverly to
overcome the shortcomings of those encodings.



    >   2) Both effectively prevent shared strings as subtrees.

I'm aware of two kinds of shared substring you might want.

One is a "copy-on-write" shared substring -- the kind of substring
that SUBSTRING could return.  I'm not sure what generalizations of
this can be formed, but you could certainly represent COW-shared
substrings as subtrees in splay-tree implementations.   I've done so
in the past.

The other kind of shared substring would be a "mutation sharing"
shared substring.  True: I don't see any natural "substring ==
subtree" implementation for those using splay-tree strings.  That
doesn't mean, though, that I don't think they can be implemented
reasonably -- just not by that technique.




    > > Of course if memory constraints and usage guarantee that your
    > > particular non-splay trees or guaranteed to be shallow -- that's just
    > > as good.

    > Probably not good enough if you want to work with strings the size Bear
    > is talking about.

Bear seems to be just throwing huge amounts of memory at the problem.
My impression is that the leaves of his tree are generally big -- the
trees, therefore, shallow.


    > The problem with recommending O(1) on STRING-REF/SET! is that it
    > encourages programming style that cannot be optimized in some string
    > implementations.  However, by the simple suggestion of Shiro's where you
    > use a string-pointer object you can achieve O(1) where you need it.

I don't actually believe that.   If you want to chat about why, we
could take it to pika-dev@xxxxxxxxxx or offlist.


-t