Data
- Title: Equivariant neural networks and piecewise linear representation theory
- Authors: Joel Gibson, Daniel Tubbenhauer and Geordie Williamson
- Status: To appear in Contemp. Math. Last update: Thu, 1 Aug 2024 23:08:37 UTC
- ArXiv link: https://arxiv.org/abs/2408.00949
Abstract
Equivariant neural networks are neural networks with symmetry. Motivated by the theory of group representations, we decompose the layers of an equivariant neural network into simple representations. The nonlinear activation functions lead to interesting nonlinear equivariant maps between simple representations. For example, the rectified linear unit (ReLU) gives rise to piecewise linear maps. We show that these considerations lead to a filtration of equivariant neural networks, generalizing Fourier series. This observation might provide a useful tool for interpreting equivariant neural networks.
A few extra words
Neural networks provide flexible and powerful means of approximating a
function. In many applications, one wants to learn a function that is
invariant or equivariant with respect to some symmetries. A
prototypical example is image recognition, where problems are often
invariant under translation. Equivariant neural networks
provide a flexible framework for learning such invariant or
equivariant functions.
Equivariant neural networks can be studied using representation theory. (The
mathematical concept of a representation is different from the
typical meaning of "representation" in machine learning;
in this paper we exclusively use the term in the mathematical sense.)
In representation theory, simple representations provide the
irreducible atoms of the theory. A main strategy in representation
theory is to take a problem, decompose it into simple representations, and study
the problem on these basic pieces separately. As we will see, this doesn't quite work
for equivariant neural networks: their nonlinear nature allows for
interaction between simple representations, which is
impossible in the linear world.
However, we will argue in this paper that decomposing
the layers of an equivariant neural network into
simple representations is still a very interesting
thing to do. We are led
naturally to the study of piecewise linear maps
between simple representations and \emph{piecewise linear representation theory}.
In concrete terms, the decomposition
into simple representations leads to a new basis of the
layers of a neural network, generalizing the
Fourier transform. We hope that this
new basis provides a useful tool to understand and interpret
equivariant neural networks.
As an example, consider a small vanilla neural network (we often omit labels): the neurons carry values in \(\mathbb{R}\), the edges between layers carry weights \(w\), and each neuron applies an activation function \(f\).
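As a minimal sketch in code (the layer sizes and random weights here are our own illustrative choices, not from the paper), such a network alternates linear maps \(w\) with the pointwise nonlinearity \(f\):

```python
import numpy as np

def relu(x):
    # Rectified linear unit, applied pointwise.
    return np.maximum(x, 0.0)

def vanilla_network(x, weights):
    # Alternate the linear maps w with the nonlinearity f = ReLU;
    # the final layer is conventionally left linear.
    for w in weights[:-1]:
        x = relu(w @ x)
    return weights[-1] @ x

# A 1 -> 3 -> 3 -> 1 network with random weights (illustrative only).
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 1)), rng.normal(size=(3, 3)), rng.normal(size=(1, 3))]
print(vanilla_network(np.array([0.7]), weights))
```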
The key motivation for building equivariant neural networks is to replace \(\mathbb{R}\), \(w\) and \(f\) by more complicated objects with more symmetry. For example, consider the replacement (a code sketch follows the list):
- \(\mathbb{R}\) \(\rightsquigarrow\) a suitable space \(Fun\) of functions on \(\mathbb{R}\),
- \(w\) \(\rightsquigarrow\) a convolution operator \(c_{\ast}\colon Fun\to Fun\),
- \(f\) \(\rightsquigarrow\) an activation operator \(Fun\to Fun\) with \(\gamma\mapsto f\circ\gamma\).
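In code, after discretizing to \(n\) equally spaced sample points, one such layer is a circular convolution followed by the pointwise activation. A minimal sketch (the sample count and kernel are our own illustrative choices); note that cyclically shifting the input commutes with the layer, which is exactly the translation equivariance we want:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def equivariant_layer(gamma, c):
    # Convolve with the kernel c (circular convolution, done via the FFT),
    # then apply the activation pointwise: gamma -> f o (c * gamma).
    conv = np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(gamma)))
    return relu(conv)

n = 64
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
gamma = np.sin(x) + 0.3 * np.cos(2 * x)   # a sampled periodic function
c = np.exp(-np.arange(n) / 4.0)           # some convolution kernel

# Equivariance check: translating the input commutes with the layer.
shift = 5
print(np.allclose(equivariant_layer(np.roll(gamma, shift), c),
                  np.roll(equivariant_layer(gamma, c), shift)))  # True
```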
Now assume for a moment that our functions are periodic with period \(2\pi\). It is very natural to ask what happens to our neural network when we expand it in terms of Fourier series. A fundamental result in Fourier theory is that convolution operators become diagonal in the Fourier basis. Hence, in order to understand how signals flow through the neural network, it remains to understand how the activation function acts on the fundamental frequencies.
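This diagonalization is easy to verify numerically in the discretized setting: the discrete Fourier transform turns circular convolution into coordinatewise multiplication. A quick check with arbitrary random signals:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 16
gamma, c = rng.normal(size=n), rng.normal(size=n)

# Circular convolution, computed directly from the definition.
conv = np.array([sum(c[j] * gamma[(k - j) % n] for j in range(n)) for k in range(n)])

# In the Fourier basis the convolution operator acts diagonally:
print(np.allclose(np.fft.fft(conv), np.fft.fft(c) * np.fft.fft(gamma)))  # True
```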
A basic, but key, observation is that the Fourier series of \(f(\sin(x))\) only involves terms of equal or higher resonant frequency. (Figure: the first few Fourier series terms of \(f(\sin(x))\) when \(f\) is the rectified linear unit ReLU.) This is very similar to what happens when we pluck a string on a guitar: one has a fundamental frequency corresponding to the note played, as well as higher frequencies (overtones) which combine to give the distinctive timbre of the guitar.
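This is easy to check numerically. Sampling \(\mathrm{ReLU}(\sin(x))\) and taking its Fourier coefficients, only the frequencies \(0, 1, 2, 4, 6, \dots\) appear, matching the classical half-wave rectifier series \(\mathrm{ReLU}(\sin x) = \tfrac{1}{\pi} + \tfrac{1}{2}\sin x - \tfrac{2}{\pi}\sum_{k\geq 1}\tfrac{\cos(2kx)}{4k^2-1}\):

```python
import numpy as np

n = 256
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
coeffs = np.fft.rfft(np.maximum(np.sin(x), 0.0)) / n  # Fourier coefficients of ReLU(sin x)

for freq in range(8):
    print(freq, round(abs(coeffs[freq]), 4))
# The constant term and the even frequencies 2, 4, 6 appear alongside
# the fundamental frequency 1; the odd frequencies 3, 5, 7 vanish.
```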
We show in general that in equivariant neural networks one has a flow from lower to higher resonant frequencies, but not conversely.
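As a sanity check of this one-way flow (our own numerical illustration, not from the paper): feeding ReLU a pure tone of frequency \(k\) produces only the constant term and multiples of \(k\), never a lower nonzero frequency.

```python
import numpy as np

n, k = 256, 3
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
coeffs = np.abs(np.fft.rfft(np.maximum(np.sin(k * x), 0.0))) / n

# The threshold filters tiny aliasing artifacts of the discretization.
print([f for f in range(20) if coeffs[f] > 1e-3])  # [0, 3, 6, 12, 18]
```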