This page is about the dihedral group. For the cyclic group or the symmetric group click on the associated picture below.
Everything is based on fabulous work of Joel Gibson related to Joel's
Lievis. Explanations for what is going on can be found at the end of the page:
jump to background. See also some slides.

For a nice illustration, and much more, why linear maps do not go very well with neural networks see here.

# Interactive failure of Schur's lemma

Put your cursor over a diagram, then the program will take that vector in \(L_{i}\), include it into \(R\), run ReLU or any of the other maps, then project back down to all of the simples. The trivial and sign representations are included along the real axis (any imaginary component of the cursor is ignored). The scaling checkbox attempts to scale the outputs so that the length of the image of one is equal to one in each simple. This is recommended as the coefficients of the projectors scale the maps quite drastically. For example, PReLU is almost linear and one cannot tell the difference to \(f\colon x\mapsto x\) in an unscaled plot.

# Interaction graphs

# Background

Here is a bit of mathematical background on what is going on.

- Our ground field is \(\mathbb{R}\). Fix \(n\in\mathbb{Z}_{\geq 1}\), and depending on parity, let \(m\) be defined by \(n=2m\) or \(n=2m+1\). Let \(\theta=2\pi/n\); rotation is always clockwise.
- Let \(D_{2\cdot n}=\langle a,b|a^{n}=b^{2}=(ab)^{2}=1\rangle\) be the dihedral group of order \(2n\).
- This group acts on the \(2n\) dimensional \(\mathbb{R}\)-vector space \(R=\mathbb{R}\{x_{0},\dots,x_{n-1},y_{0},\dots,y_{n-1}\}\) by \(a\centerdot x_{i}=x_{i+1},a\centerdot y_{i}=y_{i-1}\) and \(b\centerdot x_{i}=y_{i},b\centerdot y_{i}=x_{i}\), reading indexes modulo \(n\). This \(D_{2\cdot n}\)-representation is called the regular representation.
- For \(n\) even there are precicely \(m+2\) nonequivalent simple real \(D_{2\cdot n}\)-representations: \[L_{0}\leftrightsquigarrow\text{trivial},L_{0}^{\star}\leftrightsquigarrow\text{a acts trivial, b by a sign},L_{1}\leftrightsquigarrow\text{rotation by }\theta,\dots\] \[\dots,L_{m-1}\leftrightsquigarrow\text{rotation by }(m-1)\theta, L_{m}\leftrightsquigarrow\text{a acts by a sign, b by trivial},L_{m}^{\star}\leftrightsquigarrow\text{a acts by a sign, b by a sign}\] These are of dimensions \(1,1,2,\dots,2,1,1\). Here rotation is with respect to the action of the generator \(a\); the generator \(b\) acts always by reflection.
- For \(n\) odd the same is true, but \(L_{m}\) and \(L_{m}^{\star}\) are replaced by a two dimensional \(D_{2\cdot n}\)-representation corresponding to \(m\theta\). The dimension sequence is \(1,1,2,\dots,2\).
- The Artin–Wedderburn decomposition of \(R\) into simple real \(D_{2\cdot n}\)-representation is as follows: \[R\cong \underbrace{L_{0}}_{1d}\oplus\underbrace{L_{0}^{\star}}_{1d}\oplus \underbrace{L_{1}^{\oplus 2}\oplus\dots\oplus L_{m-1}^{\oplus 2}}_{2d}\oplus \underbrace{L_{m}}_{1d}\oplus \underbrace{L_{m}^{\star}}_{1d}\] for \(n\) even, and \[R\cong \underbrace{L_{0}}_{1d}\oplus\underbrace{L_{0}^{\star}}_{1d}\oplus \underbrace{L_{1}^{\oplus 2}\oplus\dots\oplus L_{m}^{\oplus 2}}_{2d}\] for \(n\) odd.
- To distinguish the two copies of the two dimensional \(D_{2\cdot n}\)-representations in the Artin–Wedderburn decomposition, we denote them using a superscript \(\pm 1\).
- The base change matrices realizing these Artin–Wedderburn decompositions are easy to get from the character table of \(D_{2\cdot n}\). However, there is no canonical choice as some representations appear twice. For the below we took a particular choice as follows: \[\text{$n$ even or odd}\colon p_{L_{0}^{(\star)}}=\frac{1}{2n}\sum_{i=1}^{n}(a^{i}+\epsilon a^{i}b).\] \[\text{$n$ even only}\colon p_{L_{m}^{(\star)}}=\frac{1}{2n} \big( 1+\sum_{i=1}^{n}(-1)^{i}\epsilon a^{i}b +\sum_{i=1}^{m-1}(-1)^{i}(a^{i}+a^{-i}) +(-1)^{m}a^{m}\big).\] Here \((\star)\) means either star or not, with \(\epsilon=1\) for the non-star and \(\epsilon=-1\) for the star case. For the two dimensional simple real \(C_{n}\)-representation we additionally need the rotation angle \(\theta\), and we keep the sign \(\epsilon\). \[p_{L_{k}}^{\epsilon}= \frac{1}{n}\big(id_{2\cdot n}+\sum_{i=1}^{n}\cos(ik\theta)a^{i}+\epsilon \sin(ik\theta)a^{i}b\big).\] These are expressions in the group ring \(\mathbb{R}[D_{2\cdot n}]\).

In the demonstration above shows what happens to certain piecewise linear maps upon decomposition, that is, the above calculates the map \[L_{i}\xrightarrow{include}R\xrightarrow{\hspace{0.5cm}f\hspace{0.5cm}}R\xrightarrow{project}L_{j}\] for the maps \(f\colon x\mapsto x\) (this is a linear map), \(ReLU\colon x\mapsto\max(x,0)\) (this map is called ReLU) and \(Abs\colon x\mapsto\max(x,-x)\) (this map is called absolute value), and \(PReLU\colon x\mapsto\max(x,0.99x)\) (this map is a special case of the map called parametric ReLU). Using the above basis, these maps are extended componentwise to \(R\), and the resulting maps are \(D_{2\cdot n}\)-equivariant. Any two of these form a basis over \(\mathbb{R}\) of the space of all piecewise linear maps \(\mathbb{R}\to\mathbb{R}\). The order below means the order of the action matrix of the generator \(a\).

The graphsThe interaction graph (on \(R\)) of a \(D_{2\cdot n}\)-equivariant map \(f\) is the graph with vertices indexed by the simple real constituents \(L_{i}\) of \(R\), counted with multiplicities, and an edge from \(L_{i}\) to \(L_{j}\) if the map \[L_{i}\xrightarrow{include}R\xrightarrow{\hspace{0.5cm}f\hspace{0.5cm}}R\xrightarrow{project}L_{j}\] is not zero.

The interaction graph for \(f\colon x\mapsto x\) is trivial (i.e. we only have isolated vertices with loops), while the intersection graph for PReLU is the same as for ReLU itself. Above are the first few interaction graphs for ReLU (first) and the absolute value (second), where the vertices are labeled \(i\) instead of \(L_{i}\).

The theorem for ReLU is as follows. Define the adjusted order as the one above divided by two if the order itself is even. Call this \(ord^{\prime}\). Then every vertex in the intersection graph has a loop and there is a non-loop edge from \(L_{i}\) to \(L_{j}\) if and only if \(ord(L_{j})\) divides \(ord^{\prime}(L_{i})\) (note that the order and the adjusted order appear here). For the absolute value the theorem reads the same but dropping the forced existence of loops (the order condition itself might give loops as well).