This page is part of the web mail archives of SRFI 75 from before July 7th, 2015. The new archives for SRFI 75 contain all messages, not just those from before July 7th, 2015.
Alex Shinn writes:
> You're missing Per's point. Those features have to have been
> assigned by some previous text processing, which had to know
> the location in the text in order to choose a tag. Those locations
> could just as easily be represented by opaque pointers as by
> codepoint offsets. To store these pointers in a separate file they
> just need to be serializable. The obvious pointer representation
> for UTF-8 strings would be the byte offset, an integer, which
> serializes as is.

I'm not missing his point, actually. The stand-off markup may be
generated by someone else, say the data provider (in the case of data
acquired from the LDC or ELDA) and hence I do not have any Scheme
serialized data, rather character offsets into a UTF-8 scheme.

    -tree

-- 
Tom Emerson                                Basis Technology Corp.
Software Architect                         http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"
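
To make the distinction in this exchange concrete: stand-off markup supplied by a third party typically gives character (codepoint) offsets, while the "obvious pointer" for a UTF-8 buffer is a byte offset. The following is only an illustrative sketch in Python, not anything from the SRFI or from either poster; the function names `codepoint_to_byte_offset` and `resolve_span` are hypothetical. It shows that mapping a codepoint offset onto a byte offset requires a linear scan of the UTF-8 data.

```python
def codepoint_to_byte_offset(data: bytes, cp_offset: int) -> int:
    """Return the byte offset in UTF-8 `data` where codepoint number
    `cp_offset` (0-based) starts. An offset equal to the number of
    codepoints maps to len(data), so it can serve as an exclusive end.
    Assumes `data` is well-formed UTF-8 (hypothetical helper)."""
    if cp_offset < 0:
        raise ValueError("offset must be non-negative")
    seen = 0
    for i, byte in enumerate(data):
        # Bytes of the form 10xxxxxx are continuation bytes; every other
        # byte begins a new codepoint.
        if (byte & 0xC0) != 0x80:
            if seen == cp_offset:
                return i
            seen += 1
    if seen == cp_offset:
        return len(data)
    raise IndexError("codepoint offset past end of text")


def resolve_span(data: bytes, start: int, end: int) -> str:
    """Extract the text covered by a stand-off annotation given as
    codepoint offsets [start, end) into UTF-8 `data`."""
    b_start = codepoint_to_byte_offset(data, start)
    b_end = codepoint_to_byte_offset(data, end)
    return data[b_start:b_end].decode("utf-8")


if __name__ == "__main__":
    text = "naïve café"
    data = text.encode("utf-8")
    # An annotation on "café" expressed as codepoint offsets 6..10.
    print(resolve_span(data, 6, 10))
```

Each lookup here costs time proportional to the length of the buffer, which is the practical cost of receiving codepoint offsets when the text is stored as UTF-8 bytes.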