Title

Enhanced Array Literals

Author

Per Bothner

Status

This SRFI is currently in final status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-163@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Abstract

This is a specification of a reader form (literals) for multi-dimensional arrays. It is an extension of the Common Lisp array reader syntax to handle non-zero lower bounds, optional explicit bounds, and optional uniform element types (compatible with SRFI 4). It can be used in conjunction with SRFI 25, SRFI 122, or SRFI 164. These extensions were implemented in Guile (except the handling of rank-0 arrays), and later in Kawa.

There are recommendations for output formatting and a suggested format-array procedure.

Rationale

It is desirable to have a read and write syntax for multi-dimensional arrays. Basing it on the Common Lisp syntax makes sense, and is what has been done by all known existing implementations. However, Common Lisp arrays do not support non-zero lower bounds, and Common Lisp's handling of specialized (uniform) arrays is very different from that of known Scheme implementations. Various Scheme extensions have been proposed. SRFI 58 is one proposal, but it does not handle non-zero lower bounds, and its type-specifier syntax is verbose and stylistically incompatible with the literals proposed for uniform vectors in SRFI 4.

This specification is basically that of Guile. However, Guile requires the single element of rank-0 arrays to be enclosed in parentheses. While this may slightly enhance readability, it is somewhat inconsistent in that otherwise the nesting of parentheses (of atom-valued arrays) is equal to the rank. It is also inconsistent with Common Lisp and SRFI 58. So this specification (and the Kawa implementation) diverge from Guile in this respect.

Specification

Reader syntax

An array literal starts with # followed by its rank, followed by a tag that describes the underlying vector (by default a), optionally followed by information about its shape, and finally followed by the cells, organized into dimensions using parentheses.

For example, #2a((11 12 13) (21 22 23)) is a rank-2 array (a matrix) whose upper index bounds (shape) is #(2 3). It could be pretty-printed (using the non-normative format-array helper function below):

#2a:2:3══╗
║11│12│13║
╟──┼──┼──╢
║21│22│23║
╚══╧══╧══╝

array-literal ::= array-literal-header datum
array-literal-header ::= # rank vectag array-bound* 
array-bound ::= [@lower]:length | @lower
vectag ::= a | type-tag

The vectag specifies the type of the elements of the array. The letter a means a general array where each element can be an arbitrary Scheme datum. Alternatively, a type-tag can be used for arrays whose elements are restricted to some specific type. An implementation that supports the literal syntax of SRFI 4 (or its proposed update SRFI 160) should allow the TAG from that specification as a type-tag. For example #2u32((10 11) (20 21)) is a 2x2 array of 32-bit unsigned integers. An implementation may support other values for type-tag as long as there is no ambiguity.

Following the vectag you can optionally include information about the shape: For each dimension you can optionally specify the lower bounds (after the character @), followed by the length of the dimension (after the character ":"). The shape information is required if a lower bound is non-zero, or any length is zero. If any array-bound is specified, there must be exactly rank of them.

The datum contains the elements in a nested-list format: a rank-1 array uses a single list, a rank-2 array uses a list of lists, and so on. The elements are in lexicographic order.

The array-literal-header must be terminated by a delimiter. This is trivially the case for rank-1 and above (since the left parenthesis of a list is a delimiter). For rank-0 arrays, a single space before datum is recommended style, even if the datum starts with a delimiter. (Compatibility note: Common Lisp does not require a space after #0a.)

A uniform u32 array of rank 2 with index ranges 2..3 and 3..4:

#2u32@2@3((1 2) (2 3))

Rank 0 arrays:

#0a sym
#0f32 237.0

Empty arrays:

#2a:0:2()
#2a:2:0(() ())
#3a:2:0:3(() ())
#3a:2:3:0((() () ()) (() () ()))

Possible extension: It might make source code more readable if it could contain array literals in the form produced by format-array. An implementation has not been attempted, but it seems doable: If the first non-whitespace character after the header is a box drawing character, then use those box-drawing characters to figure out each element cell. (Nested arrays make this a bit more complicated.) If a cell has multiple lines, convert it to a string, with a newline between each line. Then recursively read each element.

Output

When an array is printed with the write function, the result should be an array-literal. For rank-0 arrays (only), a single space should be printed before the datum. In an implementation where vector is a subtype of array, rank-1 arrays with zero lower bound may be printed as vectors.

Printing with display may format the array in the same way as write (except using display for each element), or in some more readable way, perhaps by using the format-array procedure.

The format-array utility procedure (non-normative)

(format-array value [port])

Produce a nice pretty display for value, which is usually an array. Using Unicode "box drawing" characters is suggested but not required.

If port is an output port, the formatted output is written into that port. Otherwise, port must be a boolean (one of #t or #f). If the port is #t, output is to the (current-output-port). If the port is #f or no port is specified, the output is returned as a string. If the port is specified and is #t or an output-port, the result of the format-array procedure is unspecified. (This convention matches that of SRFI 48 format.)

The top line includes an array-literal-header. A lower bound is only printed if non-zero. The dimension lengths are printed if there is room, or if one of them is zero.

(define arr
  #2a@1:2@1:3((#2a((1 2) (3 4)) 9 #2a((3 4) (5 6)))
              (#(42 43) #2a((8 7 6)) #2a((90 91) (100 101)))))
arr ⇨
#2a@1:2@1:3═════╤═════════╗
║#2a═╗  │      9│#2a═╗    ║
║║1│2║  │       │║3│4║    ║
║╟─┼─╢  │       │╟─┼─╢    ║
║║3│4║  │       │║5│6║    ║
║╚═╧═╝  │       │╚═╧═╝    ║
╟───────┼───────┼─────────╢
║#1a:2═╗│#2a:1:3│#2a:2:2═╗║
║║42│43║│║8│7│6║│║ 90│ 91║║
║╚══╧══╝│╚═╧═╧═╝│╟───┼───╢║
║       │       │║100│101║║
║       │       │╚═══╧═══╝║
╚═══════╧═══════╧═════════╝

If the rank is more than 2, then each “layer” is printed separated by double lines.

#3a(((1 2 3 4) (5 6 7 8))
    ((9 10 11 12) (13 14 15 16))
    ((17 18 19 20) (21 22 23 24))) ⇨
#3a:3:2:4═══╗
║ 1│ 2│ 3│ 4║
╟──┼──┼──┼──╢
║ 5│ 6│ 7│ 8║
╠══╪══╪══╪══╣
║ 9│10│11│12║
╟──┼──┼──┼──╢
║13│14│15│16║
╠══╪══╪══╪══╣
║17│18│19│20║
╟──┼──┼──┼──╢
║21│22│23│24║
╚══╧══╧══╧══╝

Implementations may extend format-array so it can be called with more than 2 arguments, but this is not specified here. For example, implementations could allow a formatting procedure or combinator. An implementation that supports SRFI 48 format may support an optional element-format parameter, which would be interpreted as a format string used for each array element. This is implemented in the Kawa reference implementation, as illustrated here:

(format-array arr "~4,2f") ⇨
#2a@1:2@1:3═══╤════════════════╤═══════════════╗
║#2a:2:2═══╗  │            9.00│#2a:2:2═══╗    ║
║║1.00│2.00║  │                │║3.00│4.00║    ║
║╟────┼────╢  │                │╟────┼────╢    ║
║║3.00│4.00║  │                │║5.00│6.00║    ║
║╚════╧════╝  │                │╚════╧════╝    ║
╟─────────────┼────────────────┼───────────────╢
║#1a:2═╤═════╗│#2a:1:3════╤═══╗│#2a:2:2═══════╗║
║║42.00│43.00║│║8.00│7.00│6.00║│║ 90.00│ 91.00║║
║╚═════╧═════╝│╚════╧════╧════╝│╟──────┼──────╢║
║             │                │║100.00│101.00║║
║             │                │╚══════╧══════╝║
╚═════════════╧════════════════╧═══════════════╝

Implementation

This syntax is implemented in Guile (except for the handling of rank-0 arrays) and Kawa.

The Kawa implementation for the reader syntax is in gnu.kawa.lispexpr.LispReader.

Kawa also provides format-array, implemented as a Scheme wrapper around the print method of the gnu.kawa.functions.ArrayPrint class.

The Kawa implementation includes tests in arr-test.scm. They are not standalone or portable, as they depend on Kawa functionality, including SRFI-164-style arrays and SRFI 109 (to make the result strings more readable).

Acknowledgements

This specification is based on prior art, primarily Kawa, Guile, and Common Lisp.

Copyright

Copyright (C) Per Bothner 2019

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Editor: Arthur A. Gleckler