Data

  • Title: Equivariant neural networks and piecewise linear representation theory
  • Authors: Joel Gibson, Daniel Tubbenhauer and Geordie Williamson
  • Status: preprint. Last update: Thu, 1 Aug 2024 23:08:37 UTC
  • Code and (possibly empty) Erratum: Click and Click
  • ArXiv link: https://arxiv.org/abs/2408.00949

Abstract

Equivariant neural networks are neural networks with symmetry. Motivated by the theory of group representations, we decompose the layers of an equivariant neural network into simple representations. The nonlinear activation functions lead to interesting nonlinear equivariant maps between simple representations. For example, the rectified linear unit (ReLU) gives rise to piecewise linear maps. We show that these considerations lead to a filtration of equivariant neural networks, generalizing Fourier series. This observation might provide a useful tool for interpreting equivariant neural networks.

A few extra words

Neural networks provide flexible and powerful means of approximating a function. In many applications, one wants to learn a function that is invariant or equivariant with respect to some symmetries. A prototypical example is image recognition, where problems are often invariant under translation. Equivariant neural networks provide a flexible framework for learning such invariant or equivariant functions.
Equivariant neural networks can be studied using the mathematical field of representation theory. (The mathematical concept of a representation is different from the typical meaning of "representation" in machine learning. In this paper we exclusively use the term in the mathematical sense.) In representation theory, simple representations are the irreducible atoms of the theory. A main strategy in representation theory is to take a problem, decompose it into simple representations, and study the problem on these basic pieces separately. As we will see, this does not quite work for equivariant neural networks: their nonlinear nature allows for interaction between simple representations, which is impossible in the linear world.
However, we will argue in this paper that decomposing the layers of an equivariant neural network into simple representations is still a very interesting thing to do. We are led naturally to the study of piecewise linear maps between simple representations and \emph{piecewise linear representation theory}. In concrete terms, the decomposition into simple representations leads to a new basis of the layers of a neural network, generalizing the Fourier transform. We hope that this new basis provides a useful tool to understand and interpret equivariant neural networks.
As an example, consider the small vanilla neural network (we often omit labels):

As usual, each node represents a copy of \(\mathbb{R}\), each arrow is labeled by a weight \(w\), and the result of each linear map between layers is composed with a nonlinear activation function \(f\) before proceeding to the next layer.
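To fix ideas, here is a minimal numerical sketch of such a vanilla network; the layer sizes, the random weights, and the choice of ReLU as the activation \(f\) are ours, purely for illustration.

import numpy as np

# A toy version of the vanilla network above: each node is a copy of R,
# each arrow carries a scalar weight, and the nonlinearity f (here ReLU)
# is applied after every linear map between layers.

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))  # weights from a 3-node layer to a 4-node layer
W2 = rng.normal(size=(2, 4))  # weights from the 4-node layer to a 2-node layer

def forward(x):
    return relu(W2 @ relu(W1 @ x))

print(forward(np.array([1.0, -0.5, 2.0])))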
The key motivation for building equivariant neural networks is to replace \(\mathbb{R}\) and \(w\) by more complicated objects with more symmetry. For example, consider the replacement
  • \(\mathbb{R}\) \(\rightsquigarrow\) a suitable space \(Fun\) of functions on \(\mathbb{R}\),
  • \(w\) \(\rightsquigarrow\) a convolution operator \(c_{\ast}\colon Fun\to Fun\),
  • \(f\) \(\rightsquigarrow\) an activation function \(Fun\to Fun\) with \(\gamma\mapsto f\circ\gamma\).
We can depict this as:
The actual implementation of this structure on a computer would be impossible (the layers are now infinite-dimensional spaces of functions), but bear with us!
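Still, a crude finite discretization conveys the idea (this sketch, including the choice of kernel and input, is ours and not from the paper): sample a \(2\pi\)-periodic function at \(N\) points, replace the convolution operator by a circular convolution, and apply \(f\) pointwise.

import numpy as np

# One discretized "equivariant" layer: gamma is a 2*pi-periodic function
# sampled at N points, the weight is a circular convolution (a translation-
# equivariant linear map), and the activation acts pointwise, gamma -> f o gamma.
N = 256
x = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
gamma = np.sin(x)                                      # input signal
kernel = np.exp(-np.minimum(x, 2.0 * np.pi - x) ** 2)  # a sample convolution kernel

def layer(gamma, kernel):
    convolved = np.real(np.fft.ifft(np.fft.fft(kernel) * np.fft.fft(gamma)))  # circular convolution
    return np.maximum(convolved, 0.0)                  # pointwise ReLU

out = layer(gamma, kernel)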
Now assume for a moment that the functions in \(Fun\) are periodic of period \(2\pi\). It is very natural to ask what happens to our neural network when we expand everything in terms of Fourier series. A fundamental result in Fourier theory is that convolution operators become diagonal in the Fourier basis. Hence, in order to understand how signals flow through the neural network, it remains to understand how the activation function acts on the individual frequencies.
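Concretely, writing \(\hat{\gamma}(n)\) for the \(n\)-th Fourier coefficient of a \(2\pi\)-periodic function \(\gamma\), the convolution theorem says that, up to a normalization constant depending on conventions,
\[
\widehat{c\ast\gamma}(n)=\hat{c}(n)\,\hat{\gamma}(n),
\]
so the convolution operator \(c_{\ast}\) rescales each frequency separately, i.e. it is diagonal in the Fourier basis.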
A basic, but key, observation is that the Fourier series of \(f(\sin(x))\) only involves terms of higher resonant frequency:
(This shows the first few Fourier series terms of \(f(\sin(x))\) when \(f\) is the rectified linear unit ReLU.) This is very similar to what happens when we pluck a string on a guitar: one has a fundamental frequency corresponding to the note played, as well as higher frequencies (overtones, similar to the bottom three pictures above) which combine to give the distinctive timbre of the guitar.
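This observation is easy to check numerically; the following quick computation (ours, not taken from the paper) approximates the Fourier coefficients of \(\mathrm{ReLU}(\sin(x))\):

import numpy as np

# Approximate the Fourier coefficients of ReLU(sin(x)) on [0, 2*pi).
N = 4096
x = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
signal = np.maximum(np.sin(x), 0.0)   # ReLU applied to the fundamental frequency sin(x)

coeffs = np.fft.rfft(signal) / N      # approximate coefficients c_0, c_1, c_2, ...
for n in range(8):
    print(f"|c_{n}| = {abs(coeffs[n]):.4f}")
# Up to numerical error, the nonzero coefficients sit at n = 0, 1, 2, 4, 6, ...:
# a constant offset, the fundamental, and its even overtones.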
We show in general that in equivariant neural networks one has a flow from lower frequencies to higher resonant frequencies, but not conversely: