Universal and Complement Sets in Dunaj

Homepage Blog Consulting Services RSS Feed

My latest open source project called Dunaj aims to provide an alternative core API for Clojure. I’ve written a lot about it’s main features and goals, but Dunaj is also full of small useful features and improvements. One of such features is the introduction of universal and complement sets, an addition which enables the representation of identity elements in reduction functions that handle set items.

But first, a little detour.

The reduction of a collection is among the most important features in Clojure, and a lot of effort has been made to make the reduction simple, efficient and easy to use. While the reduce function is in the Clojure since it’s beginning, each new Clojure version further enhances the reduction process. One particularly notable thing is a gradual change of the reduction semantics with regards to the initial value.

The story of reduce

The reduce function, also known as left fold, takes 2 or 3 arguments, with initial value being the optional one.

Using reduce to compute the sum of vector of numbers


  1
2
3
4
5

  (reduce + 0 [1 2 3])
;;=> 6

(reduce + [1 2 3])
;;=> 6

The reduction process iteratively applies provided function f to two arguments: the intermediate result of the reduction and the next unprocessed item in a given collection coll. At the beginning of the reduction, the initial value val is used in place of the intermediate result. The first code snippet above is efectivelly the same as calling (+ (+ (+ 0 1) 2) 3)

Illustrating the reduction process with a threading macro


  1
2
3
4

  (-> (+ 0 1)
    (+ 2)
    (+ 3))
;;=> 6

The initial value val does not have to be provided. Clojure follows the semantics of the Common Lisp, that states:

If initial-value is supplied, it is logically placed before the subsequence and included in the reduction operation.

— Function REDUCE
Common Lisp HyperSpec

This means that the reduction process starts by applying f to the first two collection items, and the initial value is treated as a first item in the provided collection. Easy at first, this behavior has multiple edge cases that have to be treated specially, and Clojure consistently follows Common Lisp in the way how these cases are handled.

Table 1. Different behavior of `reduce`
`val`	`coll`	`reduce` behavior
supplied	not empty	normal operation
supplied	empty	returns `val`, `f` is not called
not given	empty	`f` is called with no arguments
not given	> 1 items	`f` is called on first 2 arguments, then proceed as usual
not given	1 item only	first item is returned, `f` is not called

The rules are not that hard to understand and the behavior is not surprising. What this complicates is however the underlying implementation of reduce.

Simple, easy and fast. Pick two.

The initial implementation of reduce converted input coll into seq and handled all the special cases by itself. As the generic reduction of seqs is not a very efficient operation, collection types that can do better could implement IReduce interface and provide more performant implementation. It is important to note that regardless of the underlying collection type, the conversion to seq happened every time and the custom implementations of IReduce had to provide 2 separate implementations of reduce, based on whether the initial value was given or not.

In the version 1.1, The role of IReduce was diminished and chunked sequences were introduced in an attempt to provide fast reductions with less boilerplate. Collection types implementing IChunk interface only needed to provide one version of reduce, and could assume that the initial value is always given. Clojure 1.3 have increased the flexibility of custom reduce implementations by providing a level of indirection and introduced InternalReduce protocol. Once again, custom implementations only had to handle case with the initial value provided.

In an attempt to remove the initial step of conversion to seq completely, a new protocol called CollReduce has been created in Clojure 1.4, which however has 2 methods, one without initial value and one for cases where the initial value was provided.

What is a collection anyway?

The concept of reducers introduced in Clojure 1.5 made the reduction process a central part of the collection related API. The collection itself was abstracted as something that is reducible, and whole new API was built around the concept of composing and reducing such collections. There was no longer a universal reduce algorithm and the way how collection is reduced was pushed completely to the collection itself.

With this however, the old semantics of reduce and its edge cases complicate the implementation and hinder the composability of reducers. To solve this, a separate clojure.core.reducers/reduce function was introduced, with a slight change in semantics. A function f is called with no arguments whenever there is no initial value provided.

The upcoming transducers feature in Clojure 1.7 will again change the way how reduce works. Clojure has come a full circle and it once again checks whether coll implements IReduce interface. It is now understood that the new reduce semantics introduced in 1.5 are superior and a separate interface called IReduceInit was created to handle new reduction semantics in a more clean and simple way.

Collections in Dunaj

As Dunaj provides an alternative core API, it could break free from the complicated set of reduction related protocols and interfaces, and provides single IRed protocol that performs the reduction. The resulting design is far more simpler and easier to understand, and together with batched reduction provides even more performant reductions.

For more information, see write-ups about reducers first approach and host optimizations that were added in Dunaj. More details about mechanisms of Dunaj’s handling of collection will be subject to upcoming blog posts.

Rediscovering monoids

reduce function takes three arguments: the reduction function f, the initial value val and the (reducible) collection coll.

Function signature for reduce


  1
2

  (reduce f coll)
(reduce f val coll)

The current trend in Clojure is that the reduction function f should provide both a binary reduction operation and an identity element, returned when the f is called with no arguments. This will be even more desirable with the introduction of transducers that are coming in Clojure 1.7. Clojure’s collection API is now slowly being enriched with support for identity elements into functions such as conj.

By introducing Universal and Complement Sets, Dunaj enables the creation of reduction functions that work on sets and provide sane identity element. This allows for seamless and streamlined set handling in reducers and transducers.


  1
2

  (reduce dunaj.set/intersection [#{0 1 2} #{1 2 3} #{2 3 4}])
;;=> #{2}

Universal and Complement Sets

Universal set is defined as a set that contains all objects. Defined in dunaj.set/U, the universal set can be used in any collection or set related functions. For cases where the usage of universal set is not appropriate, an exception is thrown. Dunaj uses 𝕌 as a notation for the universal set.


  1
2
3
4
5
6
7
8

  dunaj.set/U
;;=> 𝕌

(conj dunaj.set/U :foo)
;;=> 𝕌

(seq dunaj.set/U)
;; Unhandled java.lang.UnsupportedOperationException: seq is not supported on universal set

Universal set is used as an identity element for dunaj.set/intersection function.

In addition to the Universal set, Dunaj provides the implementation for absolute complement sets, that represent sets that contain all objects except ones that are explicitly mentioned by enumeration. Dunaj uses a superscript ᶜ suffix for the notation of complement sets.


  1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

  (ns foo.bar
  (:api dunaj)
  (:require [dunaj.set :as ds]))

(ds/set-complement #{})
;;=> 𝕌

(ds/difference ds/U #{1 2})
;;=> #{1 2}ᶜ

(ds/union #{1} (disj ds/U 3) (disj ds/U 4 3) #{4})
;;=> #{3}ᶜ

(ds/union (disj ds/U 3 4) (disj ds/U 4) #{4})
;;=> 𝕌

(ds/intersection (disj ds/U 3) (disj ds/U 4 3))
;;=> #{4 3}ᶜ

(ds/intersection #{4 5} (disj ds/U 3 5) #{3 4})
;;=> #{4}

The API related to sets is in Dunaj defined in the dunaj.set namespace. The implementation of universal and complement sets can be found in the respective dunaj/set.clj file.

This post was published on April 2015. Back to the Blog home