Re: Strings, one last detail.






On Fri, 30 Jan 2004, Tom Lord wrote:

> Bear wrote:
>    > What's missing is an explicit declaration that it is unspecified
>    > whether or not values written into the buffer pointed at by the
>    > result of SCHEME_EXTRACT_STRING mutate the scheme string that
>    > was originally referred to,
>
>Interesting conclusion.  I conclude that EXTRACT must allocate string
>data which the C code must explicitly free.  I arrived at this through
>a fairly systematic exploration of the design space (described below).

It's true that a multi-step procedure, in which the C code asks for
the string length, allocates a buffer, asks for the characters to be
copied into it, does whatever it does, and then disposes of the buffer
when it is no longer needed, would be more stable, more general, and
more easily portable across scheme implementations. That was
what I initially proposed (that only values, and not pointers, should
cross the FFI), and that's what I'd still rather see.
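
For concreteness, here is a minimal sketch of that multi-step
procedure.  Nothing in it is proposed API: scheme_string,
scheme_string_length, scheme_string_copy_out and copy_scheme_string
are hypothetical names, and the struct is only a stand-in for whatever
opaque handle a real runtime would expose.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Scheme string; a real runtime would
   hide its representation behind an opaque handle. */
typedef struct {
    const char *chars;
    size_t len;
} scheme_string;

/* Hypothetical FFI call, step 1: report the length in characters. */
static size_t scheme_string_length(const scheme_string *s)
{
    return s->len;
}

/* Hypothetical FFI call, step 3: copy at most cap - 1 characters into
   the caller's buffer, NUL-terminate it, and return the count copied.
   Only values cross the boundary; no runtime pointer escapes. */
static size_t scheme_string_copy_out(const scheme_string *s,
                                     char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);
    size_t n = s->len < cap - 1 ? s->len : cap - 1;
    memcpy(buf, s->chars, n);
    buf[n] = '\0';
    return n;
}

/* Caller side: ask for the length, allocate, ask for the copy.
   The caller owns the result and must free() it (step 5). */
static char *copy_scheme_string(const scheme_string *s)
{
    size_t len = scheme_string_length(s);
    char *buf = malloc(len + 1);
    if (buf == NULL)
        return NULL;
    scheme_string_copy_out(s, buf, len + 1);
    return buf;
}
```

Because the buffer belongs to the C side, writes to it can never reach
the scheme string, and neither GC nor scheme-side mutation can
invalidate it.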

However, the general response, as I understood it, was that while
string-copying or string-translating costs for SCHEME_EXTRACT_STRING
are inevitable for implementations that use odd string
representations, most people felt that it is not acceptable to impose
a string-copying cost on scheme runtimes that *do* represent strings
in some form
comprehensible to C systems.  So, basically, I thought that the "copy
everything" approach that you and I were advocating had been
eliminated from discussion.

*IF* the copy-everything approach is not on the table, and
implementations that store strings internally in a C-comprehensible
format are supposed to be spared the overhead of copying, then we need
to warn developers that the pointer they get is unstable, and might
cease to be valid on any string mutation from the scheme side or on
garbage collection, and that writing to the buffer is not guaranteed
to cause mutation to the scheme string.
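
To illustrate the last point, here is a rough sketch of a runtime that
cannot hand out its own storage.  The representation (one 32-bit code
point per character), the function signature, and the rule that the
caller frees the result are all assumptions made for this example, not
anything the proposal specifies.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical runtime that stores one code point per character, a
   form C string functions cannot use directly. */
typedef struct {
    unsigned int *code_points;
    size_t len;
} scheme_string;

/* Hypothetical extract for such a runtime: it has to translate into a
   fresh char buffer.  Caller frees (an assumption of this sketch). */
static char *scheme_extract_string(const scheme_string *s)
{
    assert(s != NULL);
    char *buf = malloc(s->len + 1);
    if (buf == NULL)
        return NULL;
    for (size_t i = 0; i < s->len; i++) {
        unsigned int cp = s->code_points[i];
        /* Lossy on purpose: anything outside ASCII becomes '?'. */
        buf[i] = cp < 128 ? (char)cp : '?';
    }
    buf[s->len] = '\0';
    return buf;
}
```

C code that writes into the returned buffer changes only the
translated copy; the code points held by the runtime stay as they
were.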

We need to warn them of this because the write-through question is not
possible to solve just one way or the other.  Explicitly supporting
direct write-through mutation is in fact not even possible for
implementations that must provide SCHEME_EXTRACT_STRING by means of
copying/translating some other internal representation, or which may
change internal representations (and locations) on mutation, or which
have copying garbage collectors.  Conversely, preventing
direct write-through mutation entirely is impossible for most
implementations that store strings in a form that the C code *can*
understand and which implement SCHEME_EXTRACT_STRING without copying
the string buffer.  Essentially this seems to partition every
easily-possible implementation into three classes: either it cannot
guarantee to support write-through mutation (like mine), or it cannot
guarantee to prevent it (like S48), or it can guarantee neither (like
a scheme that uses byte strings but may relocate them on mutation or
GC).  Shiro's suggestion of forcing the C code to regard this buffer
as const chars seems to be the best solution.
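
A sketch of how the const approach might look from the C side.  The
signature of scheme_extract_string shown here is my guess, not
something taken from the proposal, and the byte-string runtime and the
upcased_copy helper are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical runtime that stores strings as NUL-terminated bytes,
   so it can expose them without copying. */
typedef struct {
    char *bytes;
} scheme_string;

/* Assumed signature: the result is read-only and is valid only until
   the next scheme-side mutation or garbage collection. */
static const char *scheme_extract_string(const scheme_string *s)
{
    assert(s != NULL);
    return s->bytes;  /* no copy here; another runtime might translate */
}

/* Portable client code: read through the const view, and take a
   private copy before modifying or keeping anything. */
static char *upcased_copy(const scheme_string *s)
{
    const char *view = scheme_extract_string(s);
    /* view[0] = 'X';   rejected at compile time: view points to const */
    size_t len = strlen(view);
    char *mine = malloc(len + 1);
    if (mine == NULL)
        return NULL;
    for (size_t i = 0; i <= len; i++)   /* i == len copies the NUL */
        mine[i] = (char)toupper((unsigned char)view[i]);
    return mine;
}
```

Code written this way behaves the same on all three classes of
implementation, since it never depends on whether a write would have
gone through.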

But if the copy-everything approach is still on the table, then I agree
with you that it's definitely a more general, stable, and portable approach,
and I would support it.  However, I'd also agree with its detractors
that it imposes a copying overhead on some implementations that could
have provided visibility to their strings without copying them.  I
regard this as an entirely acceptable cost, but then I may be biased as
I couldn't possibly have avoided it anyway.

				Bear