Title

Ephemerons

Author

John Cowan

Status

This SRFI is currently in final status. Here is an explanation of each status that a SRFI can hold.

To provide input on this SRFI, please send email to srfi-124@nospamsrfi.schemers.org. To subscribe to the list, follow these instructions. You can access previous messages via the mailing list archive.

Abstract

An ephemeron is an object with two components called its key and its datum. It differs from an ordinary pair as follows: if the garbage collector (GC) can prove that there are no references to the key except from the ephemeron itself and possibly from the datum, then it is free to break the ephemeron, dropping its reference to both key and datum. In other words, an ephemeron can be broken when nobody else cares about its key. Ephemerons can be used to construct weak vectors or lists and (possibly in combination with finalizers) weak hash tables.

Much of this specification is derived with thanks from the MIT Scheme Reference Manual.

Issues

None at present.

Rationale

Weak references are a mechanism for building data structures that point at objects without protecting them from garbage collection. An example of such a data structure might be an entry in a lookup table that should be removed if the rest of the program does not reference its key. Such an entry must still point at its key to carry out comparisons, but should not in itself prevent its key from being garbage collected.

A weak reference is a reference that points at an object without preventing it from being garbage collected. The term strong reference is used to distinguish normal references from weak ones. If there is no path of strong references to some object, the garbage collector will reclaim that object and mark any weak references to it to indicate that it has been reclaimed.

If there is a path of strong references from an object A to an object B, A is said to hold B strongly. If there is a path of references from an object A to an object B, but every such path traverses at least one weak reference, A is said to hold B weakly.

Ephemerons, unlike simple weak pairs, allow the correct handling of data structures in which the value points to the key. For example, a record may have a single field holding the key; if it's desired to use these records as values in a weak hash table using this internal key, ephemerons are necessary or the table will not really be weak.

Ephemerons are considerably heavier-weight than simple weak pairs, because garbage-collecting ephemerons is more complicated than garbage-collecting weak pairs. In MIT Scheme, each ephemeron needs five words of storage rather than the two words needed by a weak pair. However, while the GC needs to spend more time on ephemerons than on other objects, the amount of time it spends on ephemerons can be made to scale linearly with the number of live ephemerons, which is how a copying GC's running time scales with the total number of live objects anyway.

Specification

(ephemeron? object)

Returns #t if object is an ephemeron; otherwise returns #f.

(make-ephemeron key datum)

Returns a newly allocated ephemeron, with components key and datum. Note that if key and datum are the same in the sense of eq?, the ephemeron is effectively a weak reference to the object.

(ephemeron-broken? ephemeron)

Returns #t if ephemeron has been broken; otherwise returns #f.

This procedure must be used with care. If it returns #f, that guarantees only that prior evaluations of ephemeron-key or ephemeron-datum yielded the key or datum that was stored in ephemeron. However, it makes no guarantees about subsequent calls to ephemeron-key or ephemeron-datum, because the GC may run and break the ephemeron immediately after ephemeron-broken? returns. Thus, the correct idiom to fetch an ephemeron's key and datum and use them if the ephemeron is not broken is:

     (let ((key (ephemeron-key ephemeron))
           (datum (ephemeron-datum ephemeron)))
       (if (ephemeron-broken? ephemeron)
           ... broken case ...
           ... code using key and datum ...))

(ephemeron-key ephemeron)

(ephemeron-datum ephemeron)

These return the key or datum component, respectively, of ephemeron. If ephemeron has been broken, these operations return #f, but they can also return #f if that is what was stored as the key or datum.

(reference-barrier key)

This procedure ensures that the garbage collector does not break an ephemeron containing an unreferenced key before a certain point in a program. The program can invoke a reference barrier on the key by calling this procedure, which guarantees that even if the program does not use the key, it will be considered strongly reachable until after reference-barrier returns.

Because one cannot reliably operate on ephemerons in a portable way without calling the procedure reference-barrier, reference-barrier is mandatory.

The following is an example where reference-barrier is needed:

;; Return the datum component of the ephemeron if its key component is
KEY. Return #f otherwise.
(define (ephemeron-ref ephemeron key)
  (let ((k (ephemeron-key ephemeron))
        (d (ephemeron-datum ephemeron)))
    (and (not (ephemeron-broken? ephemeron))
         (eq? key k)
         (reference-barrier k)
         d)))

In this procedure, the evaluation of (eq? key k) is side-effect-free, so it can be evaluated at any time after k has been retrieved, in particular before the call to ephemeron-broken?. Without the call to reference-barrier, key may, therefore, be no longer referenced when ephemeron-broken? is called. In that case, this predicate would return #f. In other words, without the call to reference-barrier, a Scheme implementation may implement the above procedure as (define (ephemeron-ref ephemeron key) #f).

Implementation

Ephemerons are currently available in MIT Scheme, in Racket, and in Chibi, though current versions of Chibi have a bug; the Chibi GC will not break an ephemeron properly if its datum refers to its key. The MIT version allows ephemeron keys and datums to be mutated using set-ephemeron-key! and set-ephemeron-datum!, but this SRFI does not require this capacity, as it is tricky to implement on systems where the GC runs in a separate thread. The effect of mutable ephemerons can be achieved using immutable ephemerons that hold SRFI 111 mutable boxes, provided care is taken to always point to the box and not to its contents.

The implementation in ephemerons-trivial.scm does not have any hooks to the GC, but it is still a correct implementation, because there are no guarantees that the GC will ever break any ephemerons, or run at all, or even exist. (Thanks to Will Clinger for this insight.) It is portable to any R7RS-small or R5RS + SRFI 9 system.

The implementation in ephemerons-racket.scm layers this SRFI's semantics on top of Racket's native ephemerons. The idea here is that the native-level ephemeron value is a pair containing the key and the datum, so that the key can be reliably retrieved and a broken ephemeron can be distinguished from one whose key or value is #f. Reference barriers are not supported.

The files ephemeron-hash.scm and ephemeron-tree.scm are by Taylor Campbell, and constitute a demonstration written in pseudo-Scheme of how ephemerons can be implemented in full. He has offered to provide assistance in transforming them into Pre-Scheme for Scheme48.

Ephemerons with built-in finalizers are also available in GHC's implementation of Haskell under the name of weak pointers.

References

The original paper on ephemerons is Barry Hayes, "Ephemerons: a New Finalization Mechanism", Object-Oriented Languages, Programming, Systems, and Applications, 1997.

A useful implementation paper is Bruno Haible, "Weak References: Data Types and Implementation" posted 2005-04-24.

"Eliminating Cycles in Weak Tables" is about the addition of ephemeron tables to Lua 5.2. It explains how Lua's tricolor mark-sweep GC is extended to handle them, unfortunately using the O(N^2) algorithm of Hayes 1997.

"Extending Garbage Collection to Complex Data Structures" extends ephemerons, which have a fixed GC strategy, to a new data structure called blobs that explain to the GC how they are to be processed, thus allowing an arbitrary number of key-like and value-like pointers.

Verse

Reclaimer, spare that tree!
Take not a single bit!
It used to point to me,
Now I'm protecting it.
It was the reader's cons
That made it, paired by dot;
Now, GC, for the nonce,
Thou shalt reclaim it not.

That old familiar tree,
Whose cdrs and whose cars
Are spread o'er memory —
And wouldst thou it unparse?
GC, cease and desist!
In it no free list store;
Oh spare that moby list
Now pointing throughout core!

It was my parent tree
When it was circular;
It pointed then to me:
I was its cadadr.
My cdr was a list,
My car a dotted pair —
That tree will sore be missed
If it remains not there.

And now I to thee point,
A saving root, old friend!
Thou shalt remain disjoint
From free lists to the end.
Old tree! The sweep still brave!
And, GC, mark this well:
While I exist to save,
Thou shan't reclaim one cell.

The Great Quux (with apologies to George Pope Morris)

(Pedantic note: In Scheme, pairs constructed by read are officially immutable, so this scenario is technically impossible.)

Copyright

Copyright (C) John Cowan (2015).

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


Editor: Arthur A. Gleckler