
Re: Strings, one last detail.

This page is part of the web mail archives of SRFI 50 from before July 7th, 2015. The new archives for SRFI 50 contain all messages, not just those from before July 7th, 2015.




    > From: bear <bear@xxxxxxxxx>

    >> Interesting conclusion.  I conclude that EXTRACT must allocate string
    >> data which the C code must explicitly free.  I arrived at this through
    >> a fairly systematic exploration of the design space (described below).

    > It's true that having a multi-step procedure where C code asks for the
    > string length, allocates the buffer, then asks for the characters to be
    > copied into the buffer, does whatever it does, and then disposes of
    > the buffer when it doesn't need it anymore, would be more stable and
    > general and easily portable to more scheme implementations. That was
    > what I initially proposed (that only values, and not pointers, should
    > cross the FFI), and that's what I'd still rather see.

    > However, the general response, as I understood it, was that while string
    > copying or string-translating costs for SCHEME_EXTRACT_STRING are
    > inevitable for implementations that use odd string representations,
    > most people felt that it is not acceptable to impose a string-copying
    > cost on scheme runtimes that *do* represent strings in some form
    > comprehensible to C systems.  So, basically, I thought that the "copy
    > everything" approach that you and I were advocating had been
    > eliminated from discussion.

Right, but I think that was irrational and have tried to show why.

I agree with the sentiment that there's an eventual need for a
portable FFI API which includes a possibly-shared extract-like
routine.

I would also say that there's an eventual need for a shared-writes 
extract-like routine.   In fact, I think that the possibly-shared
interface and the shared-writes interface should be the same thing.

In particular, I like the design pattern illustrated by this design
sketch:

  scm_lock_ascii_string (&string_data, &string_length, instance, string);

  [.... do stuff to string data .... no FFI calls permitted ...
   .... must be a short (timewise) path through the code ...]

  scm_unlock_ascii_string (instance, string, string_data, string_length);


and

  scm_lock_ascii_string2 (&s1_data, &s1_len, &s2_data, &s2_len,
                          instance, string1, string2);

  [.... do stuff to string data .... no FFI calls permitted ...
   .... must be a short (timewise) path through the code ...]

  scm_unlock_ascii_string2 (instance,
                            string1, s1_data, s1_len,
                            string2, s2_data, s2_len);


Those are highly useful interfaces, have realistic relationships to
other threads, have realistic relationships to the execution of
arbitrary Scheme code, play nicely with async code execution, permit
actual sharing without requiring it, and have a shared-write semantic
for the C programmer.
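
To make the single-string pattern concrete, here is a minimal sketch of
how an implementation that already stores ASCII bytes might provide it
by sharing its own buffer.  Everything named toy_* is hypothetical and
exists only for illustration: the types, the lock counter standing in
for whatever the runtime really does to hold off GC and other threads,
and the toy_upcase client.  Only the argument order is taken from the
sketch above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy types; a real runtime would define its own. */
typedef struct {
  int strings_locked;   /* stand-in for "GC and other threads held off" */
} toy_instance;

typedef struct {
  char *data;           /* ASCII bytes, not NUL-terminated */
  size_t length;
} toy_string;

/* Sharing implementation: hand out the runtime's own buffer. */
static void
toy_lock_ascii_string (char **data_out, size_t *length_out,
                       toy_instance *instance, toy_string *string)
{
  instance->strings_locked++;
  *data_out = string->data;
  *length_out = string->length;
}

static void
toy_unlock_ascii_string (toy_instance *instance, toy_string *string,
                         char *data, size_t length)
{
  /* Nothing to copy back: writes already landed in the string itself. */
  assert (data == string->data && length == string->length);
  instance->strings_locked--;
}

/* Example client: a short path with no FFI calls between lock and unlock. */
static void
toy_upcase (toy_instance *instance, toy_string *string)
{
  char *data;
  size_t length;
  size_t i;

  toy_lock_ascii_string (&data, &length, instance, string);
  for (i = 0; i < length; i++)
    data[i] = (char) toupper ((unsigned char) data[i]);
  toy_unlock_ascii_string (instance, string, data, length);
}
```

The client code never learns whether the pointer it was given is shared
or a copy; it only promises to stay between the lock and the unlock.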


    > *IF* the copy-everything approach is not on the table, and
    > implementations that store strings internally in a C-comprehensible
    > format are supposed to be spared the overhead of copying, then we need
    > to warn developers that the pointer they get is unstable, and might
    > cease to be valid on any string mutation from the scheme side or on
    > garbage collection, and that writing to the buffer is not guaranteed
    > to cause mutation to the scheme string.

As my little case-wise analysis tries to show, the list of warnings is
such that the functionality would be basically useless in a portable FFI.


    > We need to warn them of this because the write-through question is not
    > possible to solve just one way or the other.  Explicitly supporting
    > direct write-through mutation is in fact not even possible for
    > implementations that must provide SCHEME_EXTRACT_STRING by means of
    > copying/translating some other internal representation, 

The lock/unlock pattern can solve it -- not in the general case but in
a useful way -- and is quite portable.
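
As a rough illustration of that claim, here is a sketch of the same
pattern on a runtime that does not store strings in a C-friendly form.
It assumes, purely for the example, that strings are arrays of 32-bit
code points.  Lock hands out a narrowed copy and unlock writes the
copy back, so the C programmer still sees shared-write behaviour.  The
wide_* names and the int return value are my assumptions, and the
instance argument is left out for brevity.  The failure case for
non-ASCII strings is one way of reading "not in the general case".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "odd" internal representation: one code point per cell. */
typedef struct {
  uint32_t *chars;
  size_t length;
} wide_string;

/* Copying implementation of lock: returns 0 on success, -1 if the
   string holds a non-ASCII character or memory runs out. */
static int
wide_lock_ascii_string (char **data_out, size_t *length_out,
                        wide_string *string)
{
  size_t i;
  char *copy = malloc (string->length ? string->length : 1);

  if (copy == NULL)
    return -1;
  for (i = 0; i < string->length; i++)
    {
      if (string->chars[i] > 127)
        {
          free (copy);
          return -1;
        }
      copy[i] = (char) string->chars[i];
    }
  *data_out = copy;
  *length_out = string->length;
  return 0;
}

/* Unlock writes the (possibly modified) copy back, then frees it. */
static void
wide_unlock_ascii_string (wide_string *string, char *data, size_t length)
{
  size_t i;

  assert (length == string->length);
  for (i = 0; i < length; i++)
    string->chars[i] = (uint32_t) (unsigned char) data[i];
  free (data);
}
```

Because the buffer is released by unlock rather than by the caller, the
same client code works whether the implementation shares or copies.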

-t