276: Type-specific Flonum Libraries

by Peter McGoron

Status

This SRFI is currently in draft status. Here is an explanation of each status that a SRFI can hold. To provide input on this SRFI, please send email to srfi-276@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Abstract

This SRFI is an updated version of SRFI 144 that allows an implementation to support multiple flonum representations. Each flonum has its own separate library. Each library also has the ability to inspect properties of the flonum operations, such as rounding mode and deviations from IEEE 754 arithmetic. New flonum operations are also available, such as random number generation and serialization.

Issues

Rationale

This section is non-normative.

Standard Scheme doesn’t give specifics about the precision and range of inexact numbers. From the R4RS onward, implementations could use s, f, d, and l to denote inexact constants of different precisions. The R6RS and SRFI 144 included “flonum” operations. However, these specifications do not specify what the format of the flonum is. The flonum might not be an IEEE format number and operations may differ from implementation to implementation.

This SRFI proposes a variant of SRFI 144 that is organized into representation-specific libraries. The functions exported from a specific library operate on a precisely defined number format. For example, if one wanted to operate on binary64 floating point numbers, one can import (srfi 276 binary64).

Speed and Portability vs. Ease of Use

One reason to use type-specific procedures is speed: the function :sqrt from (srfi 276 binary32) can be compiled to a single FSQRT.S instruction on a RISC-V processor. One could also compile multiple square roots to a single vectorized SQRTPS instruction on an x86_64 processor with SSE2.

Another reason to use type-specific procedures is portability. Given the same rounding mode, format, and IEEE 754 conformance flag below, operations like +, -, and √ will always return the same value given the same inputs.

Most programmers do not have the speed and portability of floating-point operations as their top priorities. They care more that their floating-point calculations work well than about blazing speed or bit-for-bit reproducibility across architectures. Basically, floating-point should do “what they want.” A type-flexible system is more likely to do what the non-numerically inclined programmer wants: see Kahan 1997 p. 29 and Kahan and Darcy 1998 pp. 60ff.

Scheme’s module system, lack of special arithmetic syntax, and latent typing allow us to separate strict correctness and “do what I want.” Programmers who wish for their programs to do “what they want” should use Scheme’s generic arithmetic. An implementation is free to do things like widen operands or optimize expressions (for example, using the SSE instruction RSQRTPS for (/ 1 (sqrt x))) without worrying about strict reproducibility or the absolute fastest speed.

Specification

This SRFI is organized into these parts:

  1. The type-specific libraries.
  2. Optimization considerations.
  3. Relationships between type-specific libraries and number vectors.
  4. Recommendations for lexical syntax.

Terminology

All references to IEEE 754 refer to its 2019 revision.

A representation is a type of inexact number that has fixed properties, like exponent range and mantissa width. Examples include binary32, binary64, and posit32 (Gustafson 2022).

An operation is correctly rounded if the returned value is the same as if the operation were calculated to infinite precision, and then rounded to fit in the resulting representation according to the current rounding mode.

Operations are non-stop if all functions return a flonum in the same format as the input, even if the result is a subnormal, infinity, or NaN.

Square brackets [] are used to denote a group of optional arguments: either all arguments in the group are present or all are absent. If one pair of square brackets is nested in another pair, then the nested pair is optional even when the other arguments are supplied.

In procedure arguments, it is an error if endianness is not the symbol little, big, or an endianness supplied by the macro in (rnrs bytevectors). When endianness is not supplied, it is the native endianness.

Requirements on implementations using RFC 2119 terminology are marked up in strong text.

Type-specific libraries

The following library names, if available, must implement the library described in the sections below.

(srfi 276 binary16)
Operates on IEEE 754 binary16 (AKA half-precision floating-point) values.
(srfi 276 binary32)
Operates on IEEE 754 binary32 (AKA single-precision floating-point) values.
(srfi 276 binary64)
Operates on IEEE 754 binary64 (AKA double-precision floating-point) values.
(srfi 276 binary128)
Operates on IEEE 754 binary128 (AKA quadruple-precision floating-point) values.
(srfi 276 binary256)
Operates on IEEE 754 binary256 (AKA octuple-precision floating-point) values.

The implementation must export (srfi 276), which implements the same library. It should be the same as one of the above libraries.

The following library names are reserved (where ⟨n⟩ is a base-10 numeral). They are reserved because some of the functions in the flonum library may not be appropriate for numbers in these formats. A future SRFI or Report will define operations on these representations.

(srfi 276 decimal⟨n⟩)
Operates on IEEE 754 decimal formats.
(srfi 276 complex-⟨format⟩⟨n⟩) where ⟨format⟩ is either binary or decimal
Operates on complex numbers represented as two values in that IEEE format.
(srfi 276 binary⟨n⟩) for ⟨n⟩ not previously defined
Reserved for future IEEE 754 revisions.

An implementation may provide libraries with different names than the ones above. Such a library should implement all of the procedures described below. For example, an implementation could provide (srfi 276 posit32) for operations on posits, with a similar API to the one below. However, posits do not have infinite values, so infinite? would not be exported.

IEEE binary floating-point library

The binary floating-point libraries export the identifiers of SRFI 144, with the following modifications:

Rationale: Multiple floating-point libraries may be pulled into the same library. To disambiguate them, one would prefix them differently. This would mean that procedures would look like f32:fl+, which is redundant. Hence this SRFI uses the shorter : prefix. Then the above can be imported as f32:+ using f32 as a prefix.

The operations that must return correctly rounded values are the ones that IEEE 754 mandates to be correctly rounded.

Library summary

The following identifiers are exported:

:e :1/e :e-2 :e-pi/4 :log2-e :log10-e :log-2 :1/log-2 :log-3 :log-pi :log-10 :1/log-10 :pi :1/pi :2pi :pi/2 :pi/4 :2/sqrt-pi :pi-squared :degree :2/pi :sqrt-2 :sqrt-3 :sqrt-5 :sqrt-10 :1/sqrt-2 :cbrt-2 :cbrt-3 :4thrt-2 :phi :log-phi :1/log-phi :euler :e-euler :sin-1 :cos-1 :gamma-1/2 :gamma-1/3 :gamma-2/3 :greatest :least :epsilon :integer-exponent-zero :integer-exponent-nan :flonum :adjacent :copysign :make-flonum :integer-fraction :exponent :integer-exponent :normalized-fraction-exponent :sign-bit :flonum? :=? :<? :>? :<=? :>=? :unordered? :max :min :integer? :zero? :positive? :negative? :odd? :even? :finite? :infinite? :nan? :normal? :subnormal? :+ :* :+* :- :/ :abs :absdiff :posdiff :sgn :numerator :denominator :floor :ceiling :round :truncate :exp :exp2 :exp-1 :square :sqrt :cbrt :hypot :expt :log :log1+ :log2 :log10 :make-log-base :sin :cos :tan :asin :acos :atan :sinh :cosh :tanh :asinh :acosh :atanh :quotient :remainder :remquo :gamma :loggamma :first-bessel :second-bessel :erf :erfc :rounding-mode :features :read-random-flonum :round/ties-to-away :byte-width :bytevector-flonum-ref :bytevector-flonum-set! :string->flonum :flonum->string

New identifiers

The examples use the optional reader syntax suggestions to denote values of different representations.

(srfi 276 ⟨library⟩)
procedure
(:rounding-mode)

Returns the current rounding mode for this flonum type. This SRFI defines the following symbols which can be returned from this procedure. The SRFI defers to the IEEE 754 standard for the complete definition of these rounding modes. An implementation may add other rounding modes, which should be symbols. For example, an implementation with support for GNU MPFR may add MPFR's additional rounding modes.

round-to-nearest/ties-to-even
Operations are rounded to the nearest representable value, with ties broken by returning the value with an even least significant digit. For representations where that is ambiguous, the returned value is the larger of the tie in magnitude. (IEEE 754 roundTiesToEven)
round-to-nearest/ties-to-away
Operations are rounded to the nearest representable value, with ties broken by returning the tie value with the largest magnitude. (IEEE 754 roundTiesToAway)
round-towards-positive
Operations are rounded to the closest representable value not less than the infinitely precise value. (IEEE 754 roundTowardPositive)
round-towards-negative
Operations are rounded to the closest representable value not greater than the infinitely precise value. (IEEE 754 roundTowardNegative)
round-towards-zero
Operations are rounded to the closest representable value not greater in magnitude than the infinitely precise value. (IEEE 754 roundTowardZero)
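
The tie-breaking rule of round-to-nearest/ties-to-even can be observed with generic arithmetic. The following is a minimal sketch, not part of this SRFI. It assumes that the implementation's inexact reals are binary64 and that the rounding mode is round-to-nearest/ties-to-even. The names ulp, half-ulp, tie-below, and tie-above are chosen for illustration only.

```scheme
(import (scheme base))

;; Assumption: inexact reals are binary64, rounded ties-to-even.
;; Spacing between adjacent flonums in the interval [1, 2).
(define ulp (/ 1.0 (inexact (expt 2 52))))
(define half-ulp (/ ulp 2.0))

;; 1 + half-ulp lies exactly halfway between 1 and 1 + ulp.
;; The significand of 1 is even, so the tie resolves to 1.
(define tie-below (+ 1.0 half-ulp))

;; (1 + ulp) + half-ulp lies exactly halfway between 1 + ulp (odd
;; significand) and 1 + 2 ulp (even significand), so the tie
;; resolves to 1 + 2 ulp.
(define tie-above (+ (+ 1.0 ulp) half-ulp))
```

Under those assumptions, (= tie-below 1.0) and (= tie-above (+ 1.0 (* 2.0 ulp))) both evaluate to #t. Under round-to-nearest/ties-to-away, the first sum would instead round to 1 + ulp.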

Note: This SRFI provides no portable way to change the rounding mode because it is a major implementation burden with little benefit. In a vacuum, the rounding mode is best represented as a dynamic variable that can be parameterized. However, the rounding mode is generally a global variable, and can sometimes be attached to individual instructions (RISC-V is an example). Modifying the rounding mode is a pretty rare operation: an analysis of RISC-V code saw no use of any mode besides roundTiesToEven for arithmetic operations [Zurstraßen 2023].

In a similar vein, this SRFI provides no way of inspecting and raising IEEE 754 exceptions. An example of an implementation that has both IEEE 754 exception handling and rounding mode control is MIT Scheme.

The rounding mode is independent of the behavior of the round function.

(srfi 276 ⟨library⟩)
procedure
(:features)

Returns a list containing information about the floating-point operations in this library. The following symbols have defined meanings. An implementation may add other features, which should be symbols.

subnormals-are-zero
Subnormal numbers are treated as zero. (This is sometimes called “DAZ,” or “denormals are zero” mode, for historical reasons.)
flush-to-zero
An operation that would underflow and create a subnormal number instead creates a zero. (This is sometimes called “FTZ.”)
ieee-754-2019
Arithmetic complies with IEEE 754. In particular, the operations that IEEE 754 requires to be correctly rounded are correctly rounded. Must not appear when subnormals-are-zero or flush-to-zero appear.
non-stop
Arithmetic is non-stop (see above).
fast-fma
The function (:+* x y z) is at least as fast as (:+ (:* x y) z). (Fused multiply-add must be rounded correctly when IEEE 754 compliance mode is on, regardless of whether fast-fma is available.)
⟨name⟩-correctly-rounded where ⟨name⟩ is a procedure from the library without : prefixed
The function ⟨name⟩ is always correctly rounded. (When ieee-754-2019 appears, then features corresponding to functions that IEEE 754 requires to be correctly rounded must not appear.)

Note: DAZ/FTZ modes are usually enabled by the compiler, or are baked-in features of the architecture. As such, this SRFI does not provide a portable way to manipulate this mode.

This should not be confused with the features procedure in the R7RS. This is a run-time procedure that reports on the run-time environment, and the flags may change over the runtime of the program. These flags are not accessible through cond-expand.
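
The constraint between ieee-754-2019 and the two subnormal-handling features can be checked mechanically. The following is a minimal sketch that assumes a hypothetical helper named consistent-features? (not part of this SRFI). It takes a list of symbols shaped like the one returned by (:features).

```scheme
(import (scheme base))

;; Hypothetical helper, not exported by any (srfi 276 ...) library.
;; Returns #f when ieee-754-2019 is listed together with
;; subnormals-are-zero or flush-to-zero, and #t otherwise.
(define (consistent-features? features)
  (not (and (memq 'ieee-754-2019 features)
            (or (memq 'subnormals-are-zero features)
                (memq 'flush-to-zero features)))))
```

For example, (consistent-features? '(ieee-754-2019 non-stop)) evaluates to #t, while (consistent-features? '(ieee-754-2019 flush-to-zero)) evaluates to #f. Because the flags may change at run time, such a check would have to be repeated whenever the program depends on it.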

(srfi 276 ⟨library⟩)
procedure
(:read-random-flonum binary-input-port [start [end]])

Returns a random flonum between start (default 0) exclusive and end (default 1) exclusive calculated from the bytes from binary-input-port. If the bytes from the port are uniformly distributed, then the resulting flonum is drawn from a uniform distribution of flonums between the two supplied numbers, to the best extent possible.

Rationale: floating-point random number generators may take a variable number of bytes to return an answer: see for example Campbell 2014. Because of this, this procedure cannot take a bytevector. This procedure could take an SRFI 158 generator, but those have issues as described in SRFI 271.

This procedure requires that any flonum between the two ends may be returned with roughly equal probability. This precludes some methods such as filling in the lower 52 bits of a binary64 number, because that does not sample all possible exponents. If one wants this faster (and less accurate) sampling method, one can directly manipulate the structure of the flonum using a bytevector and :bytevector-flonum-ref.

It is not possible to pick a random flonum between two arbitrary finite flonums uniformly. It is possible in special cases, including the important case of (0,1): see Goualard 2022 for discussion and an algorithm that attempts to sample from arbitrary intervals as uniformly as possible.
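
To illustrate why fixed-width methods fall short of the requirement above, here is a minimal sketch of one such method. It is not a conforming implementation of :read-random-flonum, and it is a variant of the bit-filling approach rather than that approach itself: it scales a 53-bit integer by 2^-53. The name naive-unit-flonum is hypothetical, and the sketch assumes that inexact reals are binary64.

```scheme
(import (scheme base))

;; Hypothetical, non-conforming sampler: reads exactly 7 bytes
;; (56 bits) from a binary input port, keeps the top 53 bits, and
;; scales the result by 2^-53.
(define (naive-unit-flonum port)
  (let loop ((i 0) (acc 0))
    (if (= i 7)
        ;; Drop the low 3 bits, then divide by 2^53. Both
        ;; conversions are exact on binary64.
        (/ (inexact (quotient acc 8)) (inexact (expt 2 53)))
        (let ((byte (read-u8 port)))
          (if (eof-object? byte)
              (error "naive-unit-flonum: port exhausted")
              (loop (+ i 1) (+ (* acc 256) byte)))))))
```

The smallest nonzero value this sketch can return is 2^-53, so none of the flonums between 0 and 2^-53 is ever produced. It can also return 0.0, which violates the exclusive lower bound.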

(srfi 276 ⟨library⟩)
procedure
(:round/ties-to-away fl)

Round fl to an integer flonum, with ties broken as in roundTiesToAway. (C99 round).

(:round/ties-to-away 2.5) ⇒ 3.0
(:round 2.5) ⇒ 2.0
(:round/ties-to-away 3.5) ⇒ 4.0
(:round 3.5) ⇒ 4.0

Note: The flround procedure in the R6RS and SRFI 144 implements Scheme’s round ties-to-even behavior, which is the behavior of roundeven in C23.
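
The difference between the two rounding procedures can be sketched with generic arithmetic. The following is an illustration only, not the definition of :round/ties-to-away. The name round/ties-to-away is a stand-in, and the sketch assumes that the implementation's inexact reals follow IEEE 754 binary arithmetic.

```scheme
(import (scheme base))

;; Stand-in for :round/ties-to-away, written with generic arithmetic.
;; The fractional part (- x t) is exactly representable, so comparing
;; it with 0.5 avoids the double rounding of (floor (+ x 0.5)).
(define (round/ties-to-away x)
  (let* ((t (truncate x))
         (frac (abs (- x t))))
    (if (>= frac 0.5)
        (+ t (if (negative? x) -1.0 1.0))
        t)))
```

With this sketch, (round/ties-to-away 2.5) evaluates to 3.0 and (round/ties-to-away -2.5) to -3.0, whereas the R7RS procedure round returns 2.0 and -2.0 for the same inputs.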

(srfi 276 ⟨library⟩)
value
:byte-width

Size of the flonum in bytes.

(srfi 276 ⟨library⟩)
procedure
(:bytevector-flonum-ref bv k [endianness])

It is an error if k to k + :byte-width are not valid indices of bv. If endianness is not supplied, it is an error if k is not a multiple of :byte-width.

Read the bytes in bv at k as a flonum of this type, with the endianness.

If the value is a NaN, then the NaN should not be coerced into another NaN.

(import (scheme base) (prefix (srfi 276 binary64) f64))

(define bv (make-bytevector f64:byte-width))
(bytevector-u8-set! bv 0 #b01000000)
(bytevector-u8-set! bv 1 #b00001001)
(bytevector-u8-set! bv 2 #b00100001)
(bytevector-u8-set! bv 3 #b11111011)
(bytevector-u8-set! bv 4 #b01010100)
(bytevector-u8-set! bv 5 #b01000100)
(bytevector-u8-set! bv 6 #b00101101)
(bytevector-u8-set! bv 7 #b00011000)
(f64:bytevector-flonum-ref bv 0 'big) ⇒ 3.141592653589793116d0

Rationale: Some implementations, in particular those that use NaN boxing, may only be able to represent a limited set of NaNs. There were few requirements on quiet versus signalling NaN formats until 2019. Different systems may have different canonical NaNs. For these reasons portable code should not expect that different NaNs are distinguishable.
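
To make the byte layout in the example above concrete, the following sketch decodes a big-endian binary64 value by hand using only (scheme base). It is not how an implementation is expected to provide :bytevector-flonum-ref. The names bytes->integer and decode-binary64/big are hypothetical. The sketch assumes that inexact reals are binary64 and that exact rationals are supported. It also collapses every NaN into +nan.0, which is the kind of loss the rationale above warns about.

```scheme
(import (scheme base))

;; Hypothetical helper: read WIDTH bytes of BV starting at K as one
;; big-endian unsigned integer.
(define (bytes->integer bv k width)
  (let loop ((i 0) (acc 0))
    (if (= i width)
        acc
        (loop (+ i 1)
              (+ (* acc 256) (bytevector-u8-ref bv (+ k i)))))))

;; Hypothetical decoder for one big-endian binary64 value at index K.
(define (decode-binary64/big bv k)
  (let* ((bits (bytes->integer bv k 8))
         (sign (if (>= bits (expt 2 63)) -1.0 1.0))             ; bit 63
         (biased (quotient (remainder bits (expt 2 63))
                           (expt 2 52)))                         ; bits 62-52
         (frac (remainder bits (expt 2 52))))                    ; bits 51-0
    (cond ((= biased 2047)             ; infinity or NaN (payload dropped)
           (if (zero? frac) (* sign +inf.0) +nan.0))
          ((= biased 0)                ; zero or subnormal
           (* sign (inexact (* frac (expt 2 -1074)))))
          (else                        ; normal: implicit leading 1 bit
           (* sign (inexact (* (+ frac (expt 2 52))
                               (expt 2 (- biased 1075)))))))))
```

Applied to the eight bytes #x40 #x09 #x21 #xfb #x54 #x44 #x2d #x18 at index 0, this sketch returns 3.141592653589793, the same value as in the example above.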

(srfi 276 ⟨library⟩)
procedure
(:bytevector-flonum-set! bv k fl [endianness])

It is an error if k to k + :byte-width are not valid indices of bv. If endianness is not supplied, it is an error if k is not a multiple of :byte-width.

Write fl to bv at k with endianness.

This procedure and :bytevector-flonum-ref must round-trip on all non-NaNs. That is, given a non-NaN flonum fl,


  (let ((bv (make-bytevector :byte-width)))
    (:bytevector-flonum-set! bv 0 fl)
    (eqv? fl (:bytevector-flonum-ref bv 0)))

always evaluates to #t. These procedures should round-trip NaNs.

(import (scheme base) (prefix (srfi 276 binary32) f32))

(define bv (make-bytevector f32:byte-width))
(f32:bytevector-flonum-set! bv 0 1.41421353816986083984f0 'little)
bv ⇒ #u8(#xf3 #x04 #xb5 #x3f)

(srfi 276 ⟨library⟩)
procedure
(:string->flonum string [radix])

It is an error if radix is not 2, 8, 10, or 16. The value of radix defaults to 10.

Read string as a number in that representation.

(import (scheme base)
        (prefix (srfi 276 binary64) f64)
        (prefix (srfi 276 binary128) f128))
(f128:string->flonum "1e400") ⇒ 1l400
(f64:string->flonum "1e400") ⇒ #fl(binary64 +inf.0)

(srfi 276 ⟨library⟩)
procedure
(:flonum->string fl [radix])

It is an error if radix is not 2, 8, 10, or 16. The value of radix defaults to 10.

Return a string that represents fl in radix. This procedure must round-trip fl with :string->flonum in the way that the R7RS specifies for number->string.

Implications of IEEE arithmetic for optimizers

When an implementation advertises that it implements, e.g. sqrt with one rounding, then it must not reorder or optimize the program if it would return a different result. For example, (/ 1 (sqrt x)) may return a different result if implemented as two operations literally, versus as one inverse square root operation. Implementations should offer modes that do not optimize mathematical operations at the expense of reproducibility.

Given the same rounding mode and input values, and with ieee-754-2019 and non-stop among the features, any sequence of correctly rounded operations produces the same answers on one conforming implementation as on any other that has the same rounding mode and the same features implying correct rounding.
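
Reordering matters because floating-point addition is not associative. The following minimal sketch uses generic arithmetic and assumes that inexact reals are binary64 with round-to-nearest/ties-to-even. The names left-to-right and right-to-left are for illustration only.

```scheme
(import (scheme base))

;; The same three addends, grouped in two different ways.
(define left-to-right (+ (+ 0.1 0.2) 0.3))
(define right-to-left (+ 0.1 (+ 0.2 0.3)))
```

Under those assumptions, left-to-right is 0.6000000000000001 and right-to-left is 0.6, so (= left-to-right right-to-left) evaluates to #f. An optimizer that regroups the additions therefore changes the observable result.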

Examples

This section is non-normative.

(import (scheme base) (prefix (srfi 276 binary32) f32))
(unless (member 'ieee-754-2019 (f32:features))
  (error "requires IEEE 754 arithmetic"))

(define (f32:kahan-sum lst)
  (do ((sum (f32:flonum 0.0))
       (c (f32:flonum 0.0))
       (lst lst (cdr lst)))
      ((null? lst) sum)
    (let* ((y (f32:- (car lst) c))
           (t (f32:+ sum y)))
      (set! c (f32:- (f32:- t sum) y))
      (set! sum t))))

This code will always calculate the correct results with the desired algorithmic properties on any conforming implementation. In particular, a conforming implementation will not reorder operations in a way that makes the output values differ.

Considerations for inexact number vectors

SRFI 4 specifies f32vectors and f64vectors, and SRFI 160 specifies c64vectors and c128vectors. Implementors should make the elements of each vector the corresponding representation in the table. If the corresponding cond-expand feature is available, then the elements of the number vector must be that type.

Vector       Representation           cond-expand feature
f32vector    binary32                 f32vector-is-binary32
f64vector    binary64                 f64vector-is-binary64
c64vector    each part is binary32    c64vector-is-binary32
c128vector   each part is binary64    c128vector-is-binary64

Reader syntax suggestions

On implementations with binary floating point of the corresponding precisions, the exponent specifiers in the table should map to the corresponding representation:

Exponent   Representation
s          binary16
f          binary32
d          binary64

Because there is not a lot of hardware with binary128 support, this SRFI makes no recommendation for the l exponent. Some formats that could be used for l include binary128, x87 long double, and so-called “double-double” arithmetic (see Dekker 1971).

On implementations with wildly varying representations, such as decimal floats or posit numbers, one may wish to specify the number format in a precise and portable manner. One way to do so is to modify the grammar of the R7RS to be the following:

  ⟨real R⟩ → ⟨real numeral R⟩
           | ⟨represented flonum R⟩
  ⟨real numeral R⟩ → ⟨sign⟩ ⟨ureal R⟩ | ⟨infnan⟩
  ⟨represented flonum R⟩ → #fl( ⟨representation name⟩ ⟨real numeral R⟩ )
  ⟨representation name⟩ → binary16 | binary32 | …

For example, #fl(binary256 1e400) reads as a finite number, while #fl(binary64 1e400) reads the same as #fl(binary64 +inf.0). The syntax allows for complex numbers to be written with mixed precision: for example, #fl(binary32 1.0)+#fl(binary64 2.0)i.

Implementation

An implementation should use the floating-point operations available in hardware as much as possible. Most implementations only have one floating-point type (binary64), and those implementations can copy most of their SRFI 144 implementation to (srfi 276 binary64) with minor renamings.

The simplest way to implement the inspection portion of this SRFI is an FFI to C’s fenv.h. Checking the FTZ/DAZ mode (for example, on Intel CPUs) requires intrinsics to check the MXCSR register.

Although it is possible to implement :bytevector-flonum-ref and :bytevector-flonum-set! in terms of :normalized-fraction-exponent and :make-flonum, it is much easier to manipulate the byte representation of the flonum directly.

A sample implementation will be provided that wraps MIT Scheme’s floating-point environment API.

Bibliography

  1. Taylor Campbell. 2014. Uniform random floats: How to generate a double-precision floating-point number in [0, 1] uniformly at random given a uniform random source of bits. Retrieved from https://mumble.net/~campbell/2014/04/28/uniform-random-float on 2026-06-20.
  2. T. J. Dekker. 1971. A Floating-Point Technique for Extending the Available Precision. Numer. Math, 18, 224-242. doi:10.1007/BF01397083.
  3. Laurent Fousse et al. 2007. MPFR: A multiple-precision binary floating-point library with correct rounding. ACM Trans. Math. Softw., 33, 2. doi:10.1145/1236463.1236468. The version referenced in this SRFI is 4.2.2.
  4. Frédéric Goualard. 2022. Drawing random floating-point numbers from an interval. ACM Transactions on Modeling and Computer Simulation, 32 (3). hal-03282794v2
  5. John Gustafson et al. 2022. Standard for Posit Arithmetic. Retrieved from https://posithub.org/docs/posit_standard-2.pdf on 2026-06-20.
  6. IEEE Computer Society. 2019. IEEE Standard for Floating-Point Arithmetic (IEEE STD 754-2019). doi:10.1109/IEEESTD.2019.8766229. ISBN 978-1-5044-5924-2.
  7. William Kahan. 1997. Lecture Notes on the Status of IEEE Standard 754 for Binary Floating-Point Arithmetic. Retrieved from https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF on 2026-06-20.
  8. William Kahan and Joseph Darcy. 1998. How Java’s Floating-Point Hurts Everyone. Retrieved from https://people.eecs.berkeley.edu/~wkahan/JAVAhurt.pdf on 2026-06-20.
  9. Massachusetts Institute of Technology. 2022. Fixnum and Flonum Operations in MIT/GNU Scheme. Retrieved from https://www.gnu.org/software/mit-scheme/documentation/stable/mit-scheme-ref/Fixnum-and-Flonum-Operations.html on 2026-06-20.
  10. Niko Zurstraßen. 2023. Evaluation of the RISC-V Floating Point Extensions F/D. Retrieved from https://www.chciken.com/risc-v/2023/08/06/evaluation-riscv-fd.html on 2026-06-20.

Acknowledgements

Thanks to those in Working Group 2 for discussing the semantics of this SRFI. In particular, I would like to thank Zhu Zihao for lots of information gathering.

I thank Bradley Lucier for his input.

I thank the authors of SRFI 144, as this work builds on theirs.

I also thank William Kahan, whose work on IEEE 754 and his many complaints about how programming language designers fail to understand it influenced the design of this SRFI (even if I could not incorporate all of his suggestions).

© 2026 Peter McGoron.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Editor: Arthur A. Gleckler