This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.
Thomas Lord scripsit:

> My plan (and stalled code) works that way.  If a
> string contains only codepoints in 0..255, store it as bytes.
> 0..ffff, use 16-bits, otherwise, use 32.

This is a plausible design.  If you are willing to pay more time to
save some more space, you could have multiple flavors of single-byte
strings based on SCSU dynamic windows.  Keep a single overhead byte T
with each single-byte string that indicates the meaning of the byte
range 80-FF:

Value of T   Unicode offset   Comment
01..67       x*80             half-blocks from U+0080 to U+3380
68..A7       x*80+AC00        half-blocks from U+E000 to U+FF80
F9           00C0             Latin-1 letters + half of Latin Extended-A
FA           0250             IPA Extensions
FB           0370             Greek
FC           0530             Armenian
FD           3040             Hiragana
FE           30A0             Katakana
FF           FF60             Halfwidth Katakana

So your byte strings (range U+0000..U+00FF) would have a T byte of 01.

Of course there is no requirement to implement this entire scheme;
you can cherry-pick particular T values that make sense.

-- 
As you read this, I don't want you to feel      John Cowan
sorry for me, because, I believe everyone       jcowan@xxxxxxxxxxxxxxxxx
will die someday.                               http://www.reutershealth.com
        --From a Nigerian-type scam spam        http://www.ccil.org/~cowan
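
The T-byte scheme in the table above could be sketched roughly as follows. This is only an illustration in Python, not code from the thread: the names window_offset, decode, choose_t and encode are hypothetical, and it assumes that bytes 00-7F always stand for U+0000..U+007F while bytes 80-FF are shifted into the 128-code-point window selected by T.

```python
from typing import Optional, Tuple

# Fixed offsets for the special T values listed in the table.
SPECIAL_OFFSETS = {
    0xF9: 0x00C0, 0xFA: 0x0250, 0xFB: 0x0370, 0xFC: 0x0530,
    0xFD: 0x3040, 0xFE: 0x30A0, 0xFF: 0xFF60,
}
# Every T value the table defines, in ascending order.
VALID_T = list(range(0x01, 0xA8)) + sorted(SPECIAL_OFFSETS)


def window_offset(t: int) -> int:
    """Return the code point that byte 0x80 maps to for overhead byte T."""
    if 0x01 <= t <= 0x67:
        return t * 0x80
    if 0x68 <= t <= 0xA7:
        return t * 0x80 + 0xAC00
    if t in SPECIAL_OFFSETS:
        return SPECIAL_OFFSETS[t]
    raise ValueError(f"T value {t:#04x} is not defined in the table")


def decode(t: int, data: bytes) -> str:
    """Expand a single-byte string using the window selected by T."""
    offset = window_offset(t)
    # Assumption: bytes below 0x80 are plain ASCII code points.
    return "".join(chr(b) if b < 0x80 else chr(offset + b - 0x80) for b in data)


def choose_t(s: str) -> Optional[int]:
    """Return the lowest T whose window covers every non-ASCII char, else None."""
    high = [ord(c) for c in s if ord(c) >= 0x80]
    for t in VALID_T:
        offset = window_offset(t)
        if all(offset <= cp < offset + 0x80 for cp in high):
            return t
    return None


def encode(s: str) -> Optional[Tuple[int, bytes]]:
    """Return (T, bytes) if s fits a single-byte flavor, else None."""
    t = choose_t(s)
    if t is None:
        return None
    offset = window_offset(t)
    return t, bytes(ord(c) if ord(c) < 0x80 else ord(c) - offset + 0x80 for c in s)
```

A string for which encode returns None would fall back to the 16-bit or 32-bit representation from the quoted plan. An implementation that cherry-picks only a few T values could shrink VALID_T accordingly.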