
Re: SRFI withdrawn; comments on the possible future

This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.

Matthew Flatt <mflatt@xxxxxxxxxxx> writes:

>     * `string-normalize-nfd', `string-normalize-nfkd',
>       `string-normalize-nfc', and `string-normalize-nfkc', which each
>       accept a string and produce its normalization according to normal
>       form D, KD, C, or KC, respectively.

If the basic concept of the SRFI - a string being a sequence of
code points - does not change, I do think these procedures are
useful (contrary to bear and Alex Shinn). An implementation can
still normalize internally in the "usual case", and if the
programmer enforces a different normalization, that's eir problem.
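
To make the four forms concrete, here is a minimal sketch in Python,
using the standard unicodedata module rather than the Scheme procedures
proposed above. The sample string is an assumption chosen so that all
four results differ.

```python
import unicodedata

# Sample input (an assumption for illustration): precomposed "é" (U+00E9)
# followed by the "fi" ligature (U+FB01).
s = "\u00e9\ufb01"

nfd = unicodedata.normalize("NFD", s)    # canonical decomposition
nfc = unicodedata.normalize("NFC", s)    # canonical decomposition, then composition
nfkd = unicodedata.normalize("NFKD", s)  # compatibility decomposition
nfkc = unicodedata.normalize("NFKC", s)  # compatibility decomposition, then composition

# Print each form as a list of code points so the differences are visible.
for name, form in [("NFD", nfd), ("NFC", nfc), ("NFKD", nfkd), ("NFKC", nfkc)]:
    print(name, ["U+%04X" % ord(c) for c in form])
```

Only the K forms split the ligature into "f" and "i"; the canonical
forms leave it alone and differ only in how "é" is represented.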

STRING=? and similar procedures need to define which kind of
normalization they work on (or just "the same normalization for
all arguments").
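
A rough Python sketch of why this matters: comparing code point by code
point gives a different answer than comparing after bringing both
arguments to the same normal form. The name string_equal and the choice
of NFC as the default are assumptions for illustration, not part of the
SRFI.

```python
import unicodedata

def string_equal(a, b, form="NFC"):
    # Hypothetical stand-in for STRING=?: normalize both arguments to the
    # same form, then compare code point by code point.
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

precomposed = "\u00e9"   # "é" as one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

code_point_equal = precomposed == decomposed              # raw comparison
normalized_equal = string_equal(precomposed, decomposed)  # comparison after NFC
```

Here code_point_equal is false while normalized_equal is true, so the
specification has to say which of the two STRING=? means.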

STRING-DOWNCASE, STRING-APPEND etc. need to define whether they
may normalize their arguments, and if so, which normalization they
return. If the normalization shouldn't be prescribed, another
procedure, STRING-NORMALIZE (or similar), needs to be added to
return the normalization the implementation prefers.
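
The STRING-APPEND case can be sketched in Python: two strings that are
each in NFC can concatenate to a string that is not. The helper is_nfc
is a hypothetical name introduced only for this example.

```python
import unicodedata

def is_nfc(s):
    # A string is in NFC if normalizing it to NFC leaves it unchanged.
    return unicodedata.normalize("NFC", s) == s

left = "e"         # already in NFC
right = "\u0301"   # a lone COMBINING ACUTE ACCENT, also in NFC on its own
joined = left + right

# Re-normalizing the result composes the pair into the single code point U+00E9.
renormalized = unicodedata.normalize("NFC", joined)
```

Both inputs pass is_nfc, but joined does not, so an append that
promises normalized output has to re-normalize at the seam.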

A higher-level string API can (and should) be built on top of the
strings defined in this SRFI.

> The #\newline character
> -----------------------
> It is likely that #\newline will be removed from Scheme leaving only
> #\linefeed. Since R6RS will pin down characters to Unicode scalar
> values, the right name for the character is #\linefeed.

I'm always in favor of breaking stuff to get a clean result.

> Another view is that #\newline can serve as an abstraction of the
> end-of-line character sequence which is returned by read-char
> when the end-of-line character sequence is read (be it
> #\linefeed, or #\return, or #\return followed by #\linefeed).
> So even though #\newline and #\linefeed are the same characters,
> Scheme programs might use #\newline to highlight that the
> character is being used to denote the end-of-line sequence. The
> name #\newline would also reinforce the link with the escape
> sequence "\n" in strings.
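
The reading behaviour described in the quoted paragraph could look
roughly like the following Python sketch. The function read_chars is a
hypothetical stand-in for repeated calls to read-char, and it handles
only the three sequences named in the quote.

```python
def read_chars(raw):
    # Yield the characters of raw, collapsing each end-of-line sequence
    # (LF, CR, or CR followed by LF) into a single "\n".
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\r":
            # Swallow the LF of a CR LF pair so it is reported only once.
            if i + 1 < len(raw) and raw[i + 1] == "\n":
                i += 1
            yield "\n"
        else:
            yield ch
        i += 1

normalized = "".join(read_chars("a\r\nb\rc\nd"))
```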

If #\newline is considered to be some kind of abstraction of the
end-of-line character sequence, please remember that Unicode
canonical new line code points, to finally get rid of all these

> Escape sequences
> ----------------

>    with semi-colon terminator          without terminator
>    "A\x42;C" = "ABC"                   "A\x42\x43" = "ABC"
>    "\x41;\x42;\x43;" = "ABC"           "\x41\x42\x43" = "ABC"
>    "\x03BB;x.x" = "λx.x"               "\x03BBx.x" = "λx.x"

I agree with bear that the semicolon is a bad choice - why not use
the colon?
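
As a minimal sketch of what the terminator buys, here is a Python
decoder for the \x escapes with a configurable terminator. The function
name and the greedy treatment of the unterminated case are assumptions
for illustration; the proposal may define the unterminated variant
differently.

```python
import re

def decode_hex_escapes(text, terminator=";"):
    # Replace each \x<hex digits><terminator> with the code point it names.
    pattern = re.compile(r"\\x([0-9A-Fa-f]+)" + re.escape(terminator))
    return pattern.sub(lambda m: chr(int(m.group(1), 16)), text)

with_semicolon = decode_hex_escapes(r"\x03BB;x.x")    # semicolon terminator
with_colon = decode_hex_escapes(r"\x03BB:x.x", ":")   # colon terminator
# With no terminator and a greedy reading, the letters "BC" are taken as
# hex digits, so the result is the single code point U+41BC, not "ABC".
greedy = decode_hex_escapes(r"\x41BC", "")
```

Any terminator removes that ambiguity; which character to use is then
mostly a question of readability.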


> Using less-than and greater-than characters, which are not actual
> brackets, avoids this problem:
>     #\x<03BB> = #\λ

Braces have been offered as an alternative:


> However, they become somewhat more difficult to read when multiple
> escapes appear in a string:
>    "\x<41>\x<42>\x<43>" = "ABC"


> In either case, the trade-off is that Scheme strings are unlikely to be
> compatible with any other language's string syntax. A consequence is
> that there is additional burden on the programmer who must learn yet
> another string and character syntax.

I do think it's good that we don't go with bad decisions made by
other languages just because the decision has been made by them.

> Symbol characters
> -----------------
> [...]
> Meanwhile, the symbol escapes are similar yet not identical to the
> escapes in strings and characters, so there is a potential for mistakes
> if the programmer is not careful. For example one might expect a\nb to
> be a valid symbol, but it is an error.

Why not allow the same escapes in symbols and in strings?

All in all I like the changes you propose (modulo the comments
above). Thanks for the good work!

        -- Jorgen

((email . "forcer@xxxxxxxxx") (www . "http://www.forcix.cx/")
 (gpg   . "1024D/028AF63C")   (irc . "nick forcer on IRCnet"))