Universal and Complement Sets in Dunaj
My latest open source project called Dunaj aims to provide an alternative core API for Clojure. I’ve written a lot about it’s main features and goals, but Dunaj is also full of small useful features and improvements. One of such features is the introduction of universal and complement sets, an addition which enables the representation of identity elements in reduction functions that handle set items.
But first, a little detour.
The reduction of a collection is among the most important features
in Clojure, and a lot of effort has been made to make the reduction
simple, efficient and easy to use. While the reduce
function is
in the Clojure since it’s beginning, each new Clojure version further
enhances the reduction process.
One particularly notable thing is a gradual change of the reduction
semantics with regards to the initial value.
The story of reduce
The reduce
function, also known as left fold, takes 2 or 3 arguments, with initial value being the
optional one.
reduce
to compute the sum of vector of numbers
1
2
3
4
5
(reduce + 0 [1 2 3])
;;=> 6
(reduce + [1 2 3])
;;=> 6
The reduction process iteratively applies provided function f
to
two arguments: the intermediate result of the reduction and the next
unprocessed item in a given collection coll
. At the beginning of
the reduction, the initial value val
is used in place of the
intermediate result.
The first code snippet above is efectivelly the same as
calling (+ (+ (+ 0 1) 2) 3)
1
2
3
4
(-> (+ 0 1)
(+ 2)
(+ 3))
;;=> 6
The initial value val
does not have to be provided.
Clojure follows the semantics of the Common Lisp, that states:
If initial-value
is supplied, it is logically placed before the
subsequence and included in the reduction operation.
Common Lisp HyperSpec
This means that the reduction process starts by applying f
to the first two collection items, and the initial value is treated as
a first item in the provided collection.
Easy at first, this behavior has multiple edge cases that have
to be treated specially, and Clojure consistently follows Common Lisp
in the way how these cases are handled.
val |
coll |
reduce behavior |
---|---|---|
supplied |
not empty |
normal operation |
supplied |
empty |
returns |
not given |
empty |
|
not given |
> 1 items |
|
not given |
1 item only |
first item is returned, |
The rules are not that hard to understand and the behavior is not
surprising. What this complicates is however the underlying
implementation of reduce
.
Simple, easy and fast. Pick two.
The initial implementation of reduce
converted input coll
into seq and handled all the special cases by itself. As the
generic reduction of seqs is not a very efficient operation,
collection types that can do better could implement
IReduce
interface and provide more performant
implementation. It is important to note that regardless of the
underlying collection type, the conversion to seq happened every
time and the custom implementations of IReduce
had to provide 2
separate implementations of reduce, based on whether the initial value
was given or not.
In the version 1.1, The role of IReduce was diminished and chunked
sequences were introduced in an attempt to provide fast reductions
with less boilerplate. Collection types implementing
IChunk
interface only needed to provide one version
of reduce, and could assume that the initial value is always given.
Clojure 1.3 have increased the flexibility of custom reduce
implementations by providing a level of indirection and introduced
InternalReduce
protocol. Once again, custom
implementations only had to handle case with the initial value
provided.
In an attempt to remove the initial step of conversion to seq
completely, a new protocol called
CollReduce
has been created in Clojure 1.4,
which however has 2 methods, one without initial value and one
for cases where the initial value was provided.
What is a collection anyway?
The concept of reducers introduced in Clojure 1.5 made the reduction process a central part of the collection related API. The collection itself was abstracted as something that is reducible, and whole new API was built around the concept of composing and reducing such collections. There was no longer a universal reduce algorithm and the way how collection is reduced was pushed completely to the collection itself.
With this however, the old semantics of reduce
and its edge cases
complicate the implementation and hinder the composability of
reducers. To solve this, a separate clojure.core.reducers/reduce
function was introduced, with a slight change in
semantics. A function f
is called with no arguments whenever
there is no initial value provided.
The upcoming transducers feature in Clojure 1.7 will again change
the way how reduce
works. Clojure has come a full circle and
it once again checks whether coll
implements IReduce
interface. It is now understood that the new reduce semantics
introduced in 1.5 are superior and a separate interface called
IReduceInit
was created to handle new reduction
semantics in a more clean and simple way.
Rediscovering monoids
reduce
function takes three arguments: the reduction function f
,
the initial value val
and the (reducible) collection coll
.
reduce
1
2
(reduce f coll)
(reduce f val coll)
The current trend in Clojure is that the reduction function f
should provide both a binary reduction operation and an identity
element, returned when the f
is called with no arguments.
This will be even more desirable with the introduction of
transducers that are coming in Clojure 1.7.
Clojure’s collection API is now slowly being enriched with
support for identity
elements into functions such as conj.
By introducing Universal and Complement Sets, Dunaj enables the creation of reduction functions that work on sets and provide sane identity element. This allows for seamless and streamlined set handling in reducers and transducers.
1
2
(reduce dunaj.set/intersection [#{0 1 2} #{1 2 3} #{2 3 4}])
;;=> #{2}
Universal and Complement Sets
Universal set is defined as a set that contains all objects.
Defined in dunaj.set/U
,
the universal set can be used in any collection or set related
functions. For cases where the usage of universal set is not
appropriate, an exception is thrown.
Dunaj uses ๐
as a notation for the universal set.
1
2
3
4
5
6
7
8
dunaj.set/U
;;=> ๐
(conj dunaj.set/U :foo)
;;=> ๐
(seq dunaj.set/U)
;; Unhandled java.lang.UnsupportedOperationException: seq is not supported on universal set
Universal set is used as an identity element for dunaj.set/intersection
function.
In addition to the Universal set, Dunaj provides the implementation
for absolute complement sets, that represent sets that contain all
objects except ones that are explicitly mentioned by enumeration.
Dunaj uses a superscript แถ
suffix for the notation of complement
sets.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
(ns foo.bar
(:api dunaj)
(:require [dunaj.set :as ds]))
(ds/set-complement #{})
;;=> ๐
(ds/difference ds/U #{1 2})
;;=> #{1 2}แถ
(ds/union #{1} (disj ds/U 3) (disj ds/U 4 3) #{4})
;;=> #{3}แถ
(ds/union (disj ds/U 3 4) (disj ds/U 4) #{4})
;;=> ๐
(ds/intersection (disj ds/U 3) (disj ds/U 4 3))
;;=> #{4 3}แถ
(ds/intersection #{4 5} (disj ds/U 3 5) #{3 4})
;;=> #{4}
The API related to sets is in Dunaj defined in the dunaj.set
namespace. The implementation
of universal and complement sets can be found in the respective
dunaj/set.clj
file.
This post was published on April 2015. Back to the Blog home