A note on confluent relations
Or, what's so nice about filtered colimits
It’s been a while since I daydreamed about lambda calculus, but old habits die hard. The other day I made an interesting observation connecting a notion from category theory, filtered colimits, with a notion from proof theory / programming language theory, that of confluence. I assume this connection is well-known to professionals, though my brain just never explicitly made it. Plus, as usual, this whole post is just an excuse to write about some fun math!
Reminder on colimits of sets
To give some context, recall that colimits in general are all about “gluing” objects together. See my past blogpost for more on that:
This becomes particularly clear in the category of sets, where colimits can be formed by taking the disjoint union of all the sets in the diagram, followed by quotienting (or “identifying”) elements according to some rule, determined by the arrows of the diagram.
Let $\mathbf{I}$ be some small category (the shape of the diagram) and $D : \mathbf{I} \to \mathbf{Set}$ a functor (the diagram itself). Then the colimit of $D$ is, up to isomorphism, given by
\[\coprod_{i\in\mathbf{I}}{D(i)}\bigg/\sim\]where $\sim$ is the equivalence relation generated by saying: for each morphism $f : i_1 \to i_2$ in $\mathbf{I}$, and each element $x \in D(i_1)$, we take $x \sim D(f)x$.
Pay attention to the phrase “generated by”: the relation $\sim$ as written here is not, in general, an equivalence relation on its own. Firstly, it’s not symmetric in general, but that’s not such a big deal to fix. (It is always reflexive, thanks to the identity morphisms in $\mathbf{I}$.) The real issue is transitivity.
Example: A pushout is a colimit of a diagram whose shape is a “span”; $\mathbf{I} = \{\bullet \leftarrow \bullet \rightarrow \bullet\}$. Consider the following pushout of sets:
The two maps in the span are the unique ones possible: $x \mapsto y$ and $x \mapsto z$. Hence the definition of $\sim$ readily gives $x \sim y$ and $x \sim z$. However, without passing to the equivalence relation generated by $\sim$, we would never get $y \sim z$: the two sets containing them are not even connected by any morphism!
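To make the “generated by” step concrete, here’s how one might compute this pushout in Python (the encoding of sets and maps is my own, nothing standard): take the disjoint union, then merge classes with a union-find, one merge per morphism-element pair.

```python
# Compute a colimit of sets: disjoint union, then the equivalence
# relation *generated* by x ~ D(f)(x), via union-find.

def find(parent, a):
    while parent[a] != a:
        parent[a] = parent[parent[a]]  # path compression
        a = parent[a]
    return a

def colimit(sets, morphisms):
    # sets: {name: iterable of elements}
    # morphisms: list of (src_name, dst_name, dict mapping src elt -> dst elt)
    parent = {(n, e): (n, e) for n, elts in sets.items() for e in elts}
    for src, dst, f in morphisms:
        for e, fe in f.items():
            ra, rb = find(parent, (src, e)), find(parent, (dst, fe))
            parent[ra] = rb  # merge the two classes
    classes = {}  # group elements by their representative
    for node in parent:
        classes.setdefault(find(parent, node), set()).add(node)
    return list(classes.values())

# The pushout of {y} <- {x} -> {z}:
sets = {"A": ["x"], "B": ["y"], "C": ["z"]}
morphisms = [("A", "B", {"x": "y"}), ("A", "C", {"x": "z"})]
print(colimit(sets, morphisms))  # one class containing x, y, and z
```

Running this gives a single equivalence class $\{x, y, z\}$: the union-find closure is exactly what supplies $y \sim z$.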
This example might motivate us to look for a well-behaved class of diagrams, whose equivalence relation can be obtained more directly from their shape. Clearly the issue in pushouts is that there’s one element ($x$) which “diverges” to two targets ($y$ and $z$) without being able to “converge” back.
Confluence
In proof theory & programming language theory (PLT), one of the fundamental notions is that of confluence. To explain what it means, I shall briefly review some basics.
Our goal in PLT is to define “programming languages” as formal languages, amenable to rigorous study by mathematical methods, and to derive interesting/desirable properties of such languages. The staple example of a programming language is the so-called lambda calculus, which has one basic structure at its core, known as the lambda abstraction. This is written as $\lambda x. e$ where $x$ is a variable name, and $e$ is some expression which may contain occurrences of the variable $x$. Mathematicians would more traditionally write this as $x \mapsto e$. Programmers would call such a thing an “anonymous function”, and the particular notation varies between languages.
There’s essentially one thing you can do with a lambda abstraction, namely, apply it to an argument. If we denote application by juxtaposition, then we expect evaluation to look like so:
\[(\lambda x. e)t \qquad\rightsquigarrow\qquad e[t/x]\]The square bracket notation is for substitution1; replace every occurrence of the variable $x$ in the function’s body by the new parameter $t$. In professional lingo, we’d call this rule a $\beta$-reduction. Often you’d want to have another rule, known as $\eta$-reduction, which is sometimes interpreted as extensionality for lambda functions:
\[\lambda x. f\; x \qquad\rightsquigarrow\qquad f\]That is, if your lambda function just takes a variable and immediately passes it to another function, then you can just take that inner function directly. Another type of rule you might worry about is $\alpha$-equivalence, which says that if you rename some variable in a consistent manner (replace all its occurrences, and ensure the new name doesn’t clash with existing variable names), then the resulting expression should be equivalent to the original one.
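As a concrete toy model, here is a minimal Python sketch of the two rules. Caveats: the term encoding is my own, and the substitution is naive (not capture-avoiding), so it’s only safe when bound names don’t clash with free names of the argument.

```python
# Terms: ("var", name) | ("lam", name, body) | ("app", fun, arg)

def subst(term, name, repl):
    # naive e[repl/name]: assumes no variable capture can occur
    if term[0] == "var":
        return repl if term[1] == name else term
    if term[0] == "lam":
        if term[1] == name:              # the binder shadows `name`: stop here
            return term
        return ("lam", term[1], subst(term[2], name, repl))
    return ("app", subst(term[1], name, repl), subst(term[2], name, repl))

def free_in(name, term):
    if term[0] == "var":
        return term[1] == name
    if term[0] == "lam":
        return term[1] != name and free_in(name, term[2])
    return free_in(name, term[1]) or free_in(name, term[2])

def step(term):
    """Try a beta or eta rule at the root; return None if neither applies."""
    if term[0] == "app" and term[1][0] == "lam":   # beta: (λx. e) t → e[t/x]
        _, x, body = term[1]
        return subst(body, x, term[2])
    if term[0] == "lam":                           # eta: λx. f x → f, if x not free in f
        x, body = term[1], term[2]
        if body[0] == "app" and body[2] == ("var", x) and not free_in(x, body[1]):
            return body[1]
    return None

# beta: (λx. x x) f  →  f f
print(step(("app", ("lam", "x", ("app", ("var", "x"), ("var", "x"))), ("var", "f"))))
# eta: λx. f x  →  f
print(step(("lam", "x", ("app", ("var", "f"), ("var", "x")))))
```

Note the side condition on $\eta$: the rule only applies when $x$ does not occur free in $f$, which is why `free_in` is needed.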
As you add more features to your programming language you’d have more and more sorts of expressions that should be considered equal, while being syntactically distinct. We’d like to have then some equivalence relation on the set of syntactic expressions which encodes all these reductions. Mathematically, this would be defined by three steps:
- Define the “individual reductions” directly on language constructs – that’s what we did above.
- Extend this to arbitrary expressions by allowing those reductions to be applied to subterms (e.g. if $e$ can be $\beta$-reduced to $e'$, then $f\;e$ should be reducible to $f\;e'$).
- Finally, to get an actual equivalence relation, take the reflexive-transitive closure.
For many purposes, like studying the computational behavior of your language or, you know, coding a real life compiler for it, this is a tad too abstract. What we really want is some procedure which determines in finite time whether two expressions are convertible to one another using the $\beta\eta$-reduction rules (or whatever else the language supports).
Here’s an idea for how this might look. A subexpression is called a redex if there is some “individual reduction” applicable to it. An expression is called normal or in normal form if it does not contain any redex. Note that reductions in this sense are directed, i.e. we can always “unreduce” $f \rightsquigarrow \lambda x. f\; x$ but this in some sense increases the complexity of the expression, so we don’t actually wanna do that. Now the desired algorithm can be described as follows:
Normalization algorithm: given an expression $e$ in your language, recursively look at all of its subexpressions until you find a redex. Then, apply on it the appropriate reduction rule. Repeat this until the expression contains no more redexes. Finally, two expressions would be equivalent if and only if they have the same normal form.
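Here is a minimal Python sketch of that algorithm, restricted to $\beta$-reduction and with an explicit fuel bound, since termination is exactly what’s at stake. The encoding and the naive (non-capture-avoiding) substitution are my own simplifications.

```python
# Terms: ("var", name) | ("lam", name, body) | ("app", fun, arg)

def subst(term, name, repl):
    # naive substitution; assumes no variable capture can occur
    if term[0] == "var":
        return repl if term[1] == name else term
    if term[0] == "lam":
        if term[1] == name:
            return term
        return ("lam", term[1], subst(term[2], name, repl))
    return ("app", subst(term[1], name, repl), subst(term[2], name, repl))

def reduce_once(term):
    """Contract the leftmost-outermost beta-redex; None if term is normal."""
    if term[0] == "app":
        if term[1][0] == "lam":                      # (λx. e) t → e[t/x]
            _, x, body = term[1]
            return subst(body, x, term[2])
        for i in (1, 2):                             # otherwise recurse into subterms
            r = reduce_once(term[i])
            if r is not None:
                return ("app", r, term[2]) if i == 1 else ("app", term[1], r)
    if term[0] == "lam":
        r = reduce_once(term[2])
        if r is not None:
            return ("lam", term[1], r)
    return None

def normalize(term, fuel=1000):
    for _ in range(fuel):
        nxt = reduce_once(term)
        if nxt is None:
            return term                              # no redexes left: normal form
        term = nxt
    raise RuntimeError("out of fuel: maybe not strongly normalizing")

# (λx. x)((λy. y) z) normalizes to z
t = ("app", ("lam", "x", ("var", "x")),
     ("app", ("lam", "y", ("var", "y")), ("var", "z")))
print(normalize(t))  # prints ('var', 'z')
```

The fuel parameter is doing real work here, as the next example shows.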
Sounds easy, doesn’t it? Well, the proposed algorithm has two glaring problems. First, what guarantees that it terminates? Take for instance the following lambda term:
\[(\lambda x. x\; x)(\lambda y. y\; y)\]If you apply $\beta$-reduction, you would obtain:
\[\stackrel{\beta}{\rightsquigarrow}\quad (x\; x)[(\lambda y. y\; y)/x] \quad\rightsquigarrow\quad (\lambda y. y\; y)(\lambda y. y\; y)\]Using $\alpha$-equivalence, the result may be rewritten in a slightly friendlier form: rename the variable $y$ of the first lambda abstraction to $x$. The result is
\[(\lambda x. x\; x)(\lambda y. y\; y)\]which is exactly what we started from! This seems bad, but there is a solution: namely, introduce a type system to your language. Though that’s already outside the scope of this post. The point is that we can slightly restrict our language to ensure that any sequence of reductions will terminate after a finite number of steps. This property of a language is called strong normalization.
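We can watch this loop happen mechanically. The sketch below (my own encoding) uses de Bruijn indices, under which $\alpha$-equivalent terms are literally equal; a single $\beta$-step on this term then returns the exact same term.

```python
# De Bruijn terms: ("var", n) | ("lam", body) | ("app", fun, arg)

def shift(t, d, cutoff=0):
    # add d to every free variable (index >= cutoff)
    if t[0] == "var":
        return ("var", t[1] + d) if t[1] >= cutoff else t
    if t[0] == "lam":
        return ("lam", shift(t[1], d, cutoff + 1))
    return ("app", shift(t[1], d, cutoff), shift(t[2], d, cutoff))

def subst(t, s, j=0):
    # t[s/j], simultaneously removing the binder at level j
    if t[0] == "var":
        if t[1] == j:
            return s
        return ("var", t[1] - 1) if t[1] > j else t
    if t[0] == "lam":
        return ("lam", subst(t[1], shift(s, 1), j + 1))
    return ("app", subst(t[1], s, j), subst(t[2], s, j))

def beta_root(t):
    # (λ. e) a  →  e[a/0]
    if t[0] == "app" and t[1][0] == "lam":
        return subst(t[1][1], t[2])
    return None

self_app = ("lam", ("app", ("var", 0), ("var", 0)))   # λ. 0 0, i.e. λx. x x
omega = ("app", self_app, self_app)                   # Ω = (λx. x x)(λy. y y)
print(beta_root(omega) == omega)  # prints True: one beta step gives Ω back
```

In de Bruijn notation the $\alpha$-renaming step from the text is invisible: $\lambda x. x\,x$ and $\lambda y. y\,y$ are both `("lam", ("app", ("var", 0), ("var", 0)))`.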
More rigorously, we can discuss the reduction graph of any expression $e$: this is a directed graph whose nodes are the expressions obtainable from $e$ via a sequence of reductions, with an edge between two expressions if one is obtained from the other by a single reduction. Then $e$ is called strongly normalizing if every reduction sequence starting at $e$ terminates; since the graph is finitely branching, this amounts to the reduction graph being finite and acyclic.
The second problem with my proposal lies in the precise structure of the reduction graph: what happens if it has more than one terminal node? That is, what if an expression has more than one normal form? This can occur when there are two reductions $e \rightsquigarrow e_1$ and $e \rightsquigarrow e_2$, but there does not exist any $e'$ such that $e_1 \rightsquigarrow^* e'$ and $e_2 \rightsquigarrow^* e'$ (where $\rightsquigarrow^*$ denotes a sequence of zero or more reductions). This should certainly remind you of the “divergence” issue that we already encountered in pushouts!
A reduction relation such as $\rightsquigarrow$ is called confluent if such an $e'$ always exists. Fortunately, languages based on simply-typed lambda calculus tend to be confluent. For instance, if you start with an expression like $f\; e$, then you might be able to reduce $f\;e \rightsquigarrow f'\; e$ or $f\;e \rightsquigarrow f\; e'$. But those results both reduce to $f'\; e'$, so there’s no problem. Note that this also has practical significance in terms of determinism: when you implement a normalization algorithm, it shouldn’t matter which redex you start from.
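We can test this joining behavior on a tiny example: take a term with two distinct redexes, enumerate all one-step reducts, and check that they all reach the same normal form. (Python sketch with my own encoding and naive substitution, as in the earlier examples.)

```python
# Terms: ("var", name) | ("lam", name, body) | ("app", fun, arg)

def subst(term, name, repl):
    # naive substitution; fine here since no capture can occur
    if term[0] == "var":
        return repl if term[1] == name else term
    if term[0] == "lam":
        if term[1] == name:
            return term
        return ("lam", term[1], subst(term[2], name, repl))
    return ("app", subst(term[1], name, repl), subst(term[2], name, repl))

def reducts(term):
    """All terms reachable by contracting exactly one beta-redex."""
    out = []
    if term[0] == "app":
        if term[1][0] == "lam":
            _, x, body = term[1]
            out.append(subst(body, x, term[2]))
        out += [("app", r, term[2]) for r in reducts(term[1])]
        out += [("app", term[1], r) for r in reducts(term[2])]
    elif term[0] == "lam":
        out += [("lam", term[1], r) for r in reducts(term[2])]
    return out

def normal_form(term):
    rs = reducts(term)
    return term if not rs else normal_form(rs[0])

# f e with f = λx. x and e = (λy. y) z : two distinct redexes
t = ("app", ("lam", "x", ("var", "x")),
     ("app", ("lam", "y", ("var", "y")), ("var", "z")))
one_step = reducts(t)
print(len(one_step))                       # 2 distinct one-step reducts...
print({normal_form(r) for r in one_step})  # ...but a single normal form
```

Both one-step reducts converge to $z$, exactly the diamond-shaped picture that confluence promises.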
To summarize,
Define the normalization algorithm as before. If your language is confluent, and has strong normalization, then the relation defined by comparing normalizations is an equivalence relation, and it coincides with the three-step definition that we started from.
Aside: If you’d like to learn more on the confluence of $\lambda$-calculi, I can recommend the first chapter of Basic Proof Theory by Troelstra-Schwichtenberg. Also apparently back in 2023 I started writing a formal proof in the proof assistant Agda for confluence of simply-typed lambda calculus, enhanced with boolean types and if-else statements, but then sadly I completely forgot about it and left it unfinished. Still, it contains a (hopefully-)intuitive explanation of the main ideas behind the proof, so you might be able to get something out of it:
Filtered colimits of sets
At last, let’s see the definition of filtered colimits. This is very similar to the definition of “confluence” given above, except we have to be a bit careful. In the olden days, back when filtered colimits were still called “direct limits”, their underlying diagram was always a poset. In our language, this means the category $\mathbf{I}$ has at most one morphism in each homset. We write $i_1 \leq i_2$ if $\operatorname{Hom}_{\mathbf{I}}(i_1,i_2)$ is nonempty. Such categories $\mathbf{I}$ are sometimes called posetal, since $\leq$ is then a partial order on their objects.
Being filtered (which is called directed in the poset case) is completely parallel to being confluent:
Definition: A poset $\mathbf{I}$ is called directed if for any two elements $i_1,i_2$ there exists $i'$ such that $i_1 \leq i'$ and $i_2 \leq i'$.
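For a finite poset this is a finite check. A quick Python sketch (the divisibility examples are my own):

```python
from itertools import product

def is_directed(elements, leq):
    # directed: every pair of elements has a common upper bound in the poset
    elements = list(elements)
    return all(any(leq(a, c) and leq(b, c) for c in elements)
               for a, b in product(elements, repeat=2))

divides = lambda a, b: b % a == 0

# Divisibility on {1, 2, 3, 6} is directed: 6 bounds every pair
print(is_directed([1, 2, 3, 6], divides))   # True
# ...but on {1, ..., 6} it is not: nothing lies above both 4 and 6
print(is_directed(range(1, 7), divides))    # False
```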
In general, we may wanna take colimits over non-posetal diagrams, so this definition is not enough.
Definition: An arbitrary small category $\mathbf{I}$ is called filtered if:
- It is nonempty.
- For any two objects $i_1,i_2$ there exist an object $i'$ and morphisms $i_1 \to i'$ and $i_2 \to i'$.
- For any parallel pair of morphisms $f_1,f_2 : i \rightrightarrows i'$, there exist an object $i''$ and a morphism $g : i' \to i''$ such that $g\circ f_1 = g\circ f_2$.
These conditions are together equivalent to requiring that every finite subdiagram $S : \mathbf{J} \to \mathbf{I}$ (where $\mathbf{J}$ is a finite category) admits a cocone, i.e. an object $n \in \mathbf{I}$ (called the nadir of the cocone) and a natural transformation $S \Rightarrow \mathrm{const}_n$ from $S$ to the constant functor at $n$.
The main result about filtered colimits, which by now should be obscenely unsurprising, is:
Theorem: If $\mathbf{I}$ is a filtered small category and $D : \mathbf{I} \to \mathbf{Set}$ is a diagram of sets, then
\[\mathop{\operatorname{colim}}_{i\in \mathbf{I}}{D(i)} \cong \coprod_{i\in \mathbf{I}}{D(i)}\bigg/\sim\]where $\sim$ is the equivalence relation defined by setting: for all $x \in D(i_1)$ and $y \in D(i_2)$, decree $x \sim y$ iff there are an object $i'$ and morphisms $f_1 : i_1 \to i'$ and $f_2 : i_2 \to i'$ such that $D(f_1)(x) = D(f_2)(y)$.
Part of the theorem is that this definition of $\sim$ really does give an equivalence relation.
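For a finite filtered (here, directed posetal) diagram we can verify this by brute force: compute the generated equivalence relation with a union-find, and compare it against the one-shot definition from the theorem. The toy diagram below, over the poset $a \le c \ge b$, is my own; identity maps are included so that the direct relation is reflexive.

```python
# Union-find to generate the equivalence relation by closure:
def find(parent, a):
    while parent[a] != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a

# A diagram over the directed poset a <= c >= b (identity maps included):
sets = {"a": [1, 2], "b": [3], "c": [4, 5]}
maps = {("a", "c"): {1: 4, 2: 5}, ("b", "c"): {3: 4},
        ("a", "a"): {1: 1, 2: 2}, ("b", "b"): {3: 3}, ("c", "c"): {4: 4, 5: 5}}

parent = {(n, e): (n, e) for n, elts in sets.items() for e in elts}
for (src, dst), f in maps.items():
    for e, fe in f.items():
        parent[find(parent, (src, e))] = find(parent, (dst, fe))

def same_generated(p, q):
    return find(parent, p) == find(parent, q)

# The one-shot definition from the theorem: x ~ y iff two morphisms
# into a common object send x and y to the same element.
def same_direct(p, q):
    (i1, x), (i2, y) = p, q
    return any(s1 == i1 and s2 == i2 and d1 == d2
               and maps[(s1, d1)][x] == maps[(s2, d2)][y]
               for (s1, d1) in maps for (s2, d2) in maps)

points = list(parent)
agree = all(same_generated(p, q) == same_direct(p, q) for p in points for q in points)
print(agree)  # prints True: the direct relation already equals the generated one
```

On this diagram the classes are $\{(a,1),(b,3),(c,4)\}$ and $\{(a,2),(c,5)\}$, and no closure step is needed, just as the theorem predicts for filtered shapes.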
This whole discussion was just about diagrams of sets, but the same theorem tends to hold in other categories whose objects have an underlying set. For some examples, see Theorem 7.4 & the surrounding discussion in Altman-Kleiman’s commutative algebra book.
Footnotes
-
Be warned that substitution has WAY TOO MANY NOTATIONS… The notation I use seems to be the most standard nowadays. How I remember it is by pretending like it’s actual multiplication by a fraction; the denominator gets cancelled out, and the numerator takes its place. ↩