Tensor Product of Vector Spaces, Step by Step

19 May 2016

I came across an unusual definition of the tensor product of two matrices today, and after awhile found it to be incorrect. In the process of doing so, I had to improve my understanding of the tensor product - there’s a lot of in-depth material on the Internet, but I don’t see a clear intro or an example of the operation itself when it comes to matrices (of which vectors are a special case with $n$ rows and 1 column), so here’s my contribution.

In quantum computing, a vector space describes one quantum system; the tensor product of two vector spaces is therefore a way to create a new space representing the combined states of two quantum systems [Yanofsky 68]. I will first describe the tensor product of two vector spaces as simply as I can, using recursion as a teaching aide. I then generalize to matrices further down, and end by correcting a definition of the tensor product of two matrices that I came across.

Tensor Product of Two Vector Spaces, with Recursion as a Guide

First off, we have a vector space $\mathbb{V}$ of dimension $m$, and another vector space $\mathbb{V}’$ with dimension $n$.

Now envision some vector $V \in \mathbb{V}$ paired with some vector $V’ \in \mathbb{V}’$. We could use the Cartesian product $V \times V’$, which would simply pair them up in tuple form: $<V, V’>$. But say we want to combine them in a way that creates a new vector, each entry of which pairs one element of $V$ with another in $V’$. “Pair” is the key word here: pairing $m$ entries in $V$ with $n$ entries in $V’$ yields $m \bullet n$ pairs. In basic English, that is what the tensor product of two vector spaces does: it creates a new vector space $\mathbb{V}’’$ with dimension $m \bullet n$, and each vector in $\mathbb{V}’’$ is the vector that results from simply multiplying each entry of a vector $V \in \mathbb{V}$ with each entry of another vector $V’ \in \mathbb{V}’$.

What I’ve noticed is that at this point, definitions start getting complicated because vectors can be written in a basis other than the canonical basis. Thus, to be formally correct, these definitions have to show that for any vector space $\mathbb{V}$, you can “decompose” its vectors into the particular coefficients that match up with one of the bases of $\mathbb{V}$, which does not have to be the canonical one. Why is this so important? In [Yanofsky 68], after a good bit of formalism, here’s how they define an element of $\mathbb{V} \otimes \mathbb{V}’$:

\( (c_0 \times c’_0)(B_0 \otimes B’_0) + (c_0 \times c’_1)(B_0 \otimes B’_1) + … + (c_{m-1} \times c’_{n-1})(B_{m-1} \otimes B’_{n-1}) \)

When I first saw this, my programmer’s mind almost exploded thanks to the seemingly recursive use of the tensor product operator in this definition of the tensor product operator itself. At first I only understood the first half of each term in their definition: $(c_0 \times c’_0) … (c_{m-1} \times c’_{n-1})$ is simply the pairwise multiplication I described above. Everything clicked when I realized that the tensor product has to be defined for vectors written in any basis of the supplied vector spaces, not just the canonical one.

To illustrate how simple it is to compute the tensor product of two vectors written in the canonical basis, let’s take the tensor product of a vector $V \in \mathbb{R}^{3}$, written in terms of the canonical basis of $\mathbb{R}^{3}$:

\( V = \begin{bmatrix}1 \\ 2 \\ 3\end{bmatrix} = 1 \bullet \begin{bmatrix}1 \\ 0 \\ 0\end{bmatrix} + 2 \bullet \begin{bmatrix}0 \\ 1 \\ 0\end{bmatrix} + 3 \bullet \begin{bmatrix}0 \\ 0 \\ 1\end{bmatrix} \)

with another vector $V’ \in \mathbb{R}^{2}$, also written in terms of the canonical basis of $\mathbb{R}^{2}$:

\( V’ = \begin{bmatrix}4 \\ 5 \end{bmatrix} = 4 \bullet \begin{bmatrix}1 \\ 0 \end{bmatrix} + 5 \bullet \begin{bmatrix}0 \\ 1 \end{bmatrix} \)

The result of $V \otimes V’$ is clearly an element of $\mathbb{V} \otimes \mathbb{V}’$, so we use the definition I cited above, starting with the first term:

\( (c_0 \times c’_0)(B_0 \otimes B’_0) = (1 \times 4)(\begin{bmatrix}1 \\ 0 \\ 0\end{bmatrix} \otimes \begin{bmatrix}1 \\ 0 \end{bmatrix}) \)

When initally pondering this, what I didn’t immediately grasp is that the tensor product of two canonical basis vectors serves as a kind of recursive base case: given a canonical basis vector $E_j$, where $j$ represents the location in $E_j$ of the single entry of value 1, and another canonical basis vector $E_k$, where $k$ represents the same location in $E_k$, the tensor product $E_j \otimes E_k$ is simply $E_{j \bullet k}$. So for the term above, we have $E_1 \otimes E_1 = E_{1 \bullet 1} = E_1$, expanded to dimension $m \bullet n$:

\( (c_0 \times c’_0)(B_0 \otimes B’_0) = (1 \times 4)(\begin{bmatrix}1 \\ 0 \\ 0\end{bmatrix} \otimes \begin{bmatrix}1 \\ 0 \end{bmatrix}) = 4 \bullet \begin{bmatrix}1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0\end{bmatrix} \)

The second term proceeds in the same way:

\( (c_0 \times c’_1)(B_0 \otimes B’_1) = (1 \times 5)(\begin{bmatrix}1 \\ 0 \\ 0\end{bmatrix} \otimes \begin{bmatrix}0 \\ 1 \end{bmatrix}) = 5 \bullet \begin{bmatrix}0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0\end{bmatrix} \)

This continues for the remaining four terms, with the final summation as follows:

\( V \otimes V’ = \begin{bmatrix}4 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0\end{bmatrix} + \begin{bmatrix}0 \\ 5 \\ 0 \\ 0 \\ 0 \\ 0\end{bmatrix} + \begin{bmatrix}0 \\ 0 \\ 8 \\ 0 \\ 0 \\ 0\end{bmatrix} + \begin{bmatrix}0 \\ 0 \\ 0 \\ 10 \\ 0 \\ 0\end{bmatrix} + \begin{bmatrix}0 \\ 0 \\ 0 \\ 0 \\ 12 \\ 0\end{bmatrix} + \begin{bmatrix}0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 15\end{bmatrix} = \begin{bmatrix}4 \\ 5 \\ 8 \\ 10 \\ 12 \\ 15\end{bmatrix} \)

That was simple enough, given that we wrote $V$ and $V’$ in terms of their canonical bases. Now, we should also get the same answer even if we rewrite $V’$ in terms of another, non-canonical basis of $\mathbb{R}^{2}$, keep $V$ the same, and re-compute the tensor product $V \otimes V’_D$. Below, we rewrite $V’$ in terms of the non-canonical basis $D = \{[1, 1]^{T}, [1, 0]^{T}\}$, yielding $V’_D$:

\( V’ = \begin{bmatrix}4 \\ 5 \end{bmatrix} = 5 \bullet \begin{bmatrix}1 \\ 1 \end{bmatrix} + (-1) \bullet \begin{bmatrix}1 \\ 0 \end{bmatrix} \)

\( V’_D = \begin{bmatrix}5 \\ -1 \end{bmatrix} \)

Just as above, the result of $V \otimes V’_D$ is clearly an element of $\mathbb{V} \otimes \mathbb{V}’$, so we again use the definition I cited above, starting with the first term:

\( (c_0 \times c’_0)(B_0 \otimes B’_0) = (1 \times 5)(\begin{bmatrix}1 \\ 0 \\ 0\end{bmatrix} \otimes \begin{bmatrix}1 \\ 1 \end{bmatrix}) \)

Of course, now we can’t simply compute $E_{j \bullet k}$: we have $E_j$, but no $E_k$. With no recursive base case to help us, we’re forced to make a recursive application to compute the tensor product of the two basis vectors here. For this nested computation of the tensor product of $W \in \mathbb{R}^{3}$ and $W’ \in \mathbb{R}^{2}$, we choose to write both $W$ and $W’$ in terms of their canonical bases, as to do so otherwise would only prolong our recursive sadism:

\( W = \begin{bmatrix}1 \\ 0 \\ 0\end{bmatrix} = 1 \bullet \begin{bmatrix}1 \\ 0 \\ 0\end{bmatrix} + 0 \bullet \begin{bmatrix}0 \\ 1 \\ 0\end{bmatrix} + 0 \bullet \begin{bmatrix}0 \\ 0 \\ 1\end{bmatrix} \)

\( W’ = \begin{bmatrix}1 \\ 1 \end{bmatrix} = 1 \bullet \begin{bmatrix}1 \\ 0 \end{bmatrix} + 1 \bullet \begin{bmatrix}0 \\ 1 \end{bmatrix} \)

Beginning our nested computation of $W \otimes W’$, and starting with the first term:

\( (c_0 \times c’_0)(B_0 \otimes B’_0) = (1 \times 1)(\begin{bmatrix}1 \\ 0 \\ 0\end{bmatrix} \otimes \begin{bmatrix}1 \\ 0 \end{bmatrix}) = 1 \bullet \begin{bmatrix}1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0\end{bmatrix} \)

Thankfully, we defined $W$ and $W’$ in terms of the canonical basis, allowing us to make use of the recursive base case. The second term:

\( (c_0 \times c’_1)(B_0 \otimes B’_1) = (1 \times 1)(\begin{bmatrix}1 \\ 0 \\ 0\end{bmatrix} \otimes \begin{bmatrix}0 \\ 1 \end{bmatrix}) = 1 \bullet \begin{bmatrix}0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0\end{bmatrix} \)

You can verify for yourself that the final summation yields:

\( W \otimes W’ = \begin{bmatrix}1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0\end{bmatrix} \)

allowing us to finally compute the term we wanted earlier for $V \otimes V’_D$:

\( (c_0 \times c’_0)(B_0 \otimes B’_0) = (1 \times 5)(\begin{bmatrix}1 \\ 0 \\ 0\end{bmatrix} \otimes \begin{bmatrix}1 \\ 1 \end{bmatrix}) = 5 \bullet \begin{bmatrix}1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0\end{bmatrix} \)

Needless to say, we’ve just done a ton of work considering we’re only one-sixth of the way through the computation for $V \otimes V’_D$, all because we chose to write one of our vectors in a non-canonical basis. The remaining five terms of the computation for $V \otimes V’_D$ are as follows, with the nested tensor product computations $W \otimes W’$ omitted for brevity:

\( (c_0 \times c’_1)(B_0 \otimes B’_1) = (1 \times -1)(\begin{bmatrix}1 \\ 0 \\ 0\end{bmatrix} \otimes \begin{bmatrix}1 \\ 0 \end{bmatrix}) = -1 \bullet \begin{bmatrix}1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0\end{bmatrix} \)

\( (c_1 \times c’_0)(B_1 \otimes B’_0) = (2 \times 5)(\begin{bmatrix}0 \\ 1 \\ 0\end{bmatrix} \otimes \begin{bmatrix}1 \\ 1 \end{bmatrix}) = 10 \bullet \begin{bmatrix}0 \\ 0 \\ 1 \\ 1 \\ 0 \\ 0\end{bmatrix} \)

\( (c_1 \times c’_1)(B_1 \otimes B’_1) = (2 \times -1)(\begin{bmatrix}0 \\ 1 \\ 0\end{bmatrix} \otimes \begin{bmatrix}1 \\ 0 \end{bmatrix}) = -2 \bullet \begin{bmatrix}0 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0\end{bmatrix} \)

\( (c_2 \times c’_0)(B_2 \otimes B’_0) = (3 \times 5)(\begin{bmatrix}0 \\ 0 \\ 1\end{bmatrix} \otimes \begin{bmatrix}1 \\ 1 \end{bmatrix}) = 15 \bullet \begin{bmatrix}0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1\end{bmatrix} \)

\( (c_2 \times c’_1)(B_2 \otimes B’_1) = (3 \times -1)(\begin{bmatrix}0 \\ 0 \\ 1\end{bmatrix} \otimes \begin{bmatrix}1 \\ 0 \end{bmatrix}) = -3 \bullet \begin{bmatrix}0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0\end{bmatrix} \)

Finally, summation of the six terms yields:

\( V \otimes V’_D = \begin{bmatrix}5 \\ 5 \\ 0 \\ 0 \\ 0 \\ 0\end{bmatrix} + \begin{bmatrix}-1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0\end{bmatrix} + \begin{bmatrix}0 \\ 0 \\ 10 \\ 10 \\ 0 \\ 0\end{bmatrix} + \begin{bmatrix}0 \\ 0 \\ -2 \\ 0 \\ 0 \\ 0\end{bmatrix} + \begin{bmatrix}0 \\ 0 \\ 0 \\ 0 \\ 15 \\ 15\end{bmatrix} + \begin{bmatrix}0 \\ 0 \\ 0 \\ 0 \\ -3 \\ 0\end{bmatrix} = \begin{bmatrix}4 \\ 5 \\ 8 \\ 10 \\ 12 \\ 15\end{bmatrix} \)

which is exactly the same result we computed for $V \otimes V’$ above, as expected. If you came here for a walk-through of an actual tensor product computation, I hope mine is crystal clear to you. I personally found the notion of recursion to be very helpful here, and I haven’t seen it explained this way anywhere else.

Tensor Product of Two Matrices

Vectors are a special case of matrices, with columnar dimension fixed to 1. When thinking about the tensor product of two matrices, rather than getting formal from the start, I think it helps to visualize chunking up the two matrices into columns, taking the tensor product of each pair of columns (which are vectors), and assembling them into a single resulting matrix. So for matrix $A \in \mathbb{R}^{2x2}$ and matrix $B \in \mathbb{R}^{3x3}$, $A \otimes B$ is:

\( \begin{bmatrix} \begin{bmatrix}a_{0,0} \\ a_{1,0}\end{bmatrix} \otimes \begin{bmatrix}b_{0,0} \\ b_{1,0} \\ b_{2,0}\end{bmatrix}, \begin{bmatrix}a_{0,0} \\ a_{1,0}\end{bmatrix} \otimes \begin{bmatrix}b_{0,1} \\ b_{1,1} \\ b_{2,1}\end{bmatrix}, \begin{bmatrix}a_{0,0} \\ a_{1,0}\end{bmatrix} \otimes \begin{bmatrix}b_{0,2} \\ b_{1,2} \\ b_{2,2}\end{bmatrix}, \begin{bmatrix}a_{0,1} \\ a_{1,1}\end{bmatrix} \otimes \begin{bmatrix}b_{0,0} \\ b_{1,0} \\ b_{2,0}\end{bmatrix}, \begin{bmatrix}a_{0,1} \\ a_{1,1}\end{bmatrix} \otimes \begin{bmatrix}b_{0,1} \\ b_{1,1} \\ b_{2,1}\end{bmatrix}, \begin{bmatrix}a_{0,1} \\ a_{1,1}\end{bmatrix} \otimes \begin{bmatrix}b_{0,2} \\ b_{1,2} \\ b_{2,2}\end{bmatrix} \end{bmatrix} \)

Clearly this gets wild fast, even for matrices of small dimensions. We need a concise formula for determining the value of any particular entry $(A \otimes B)[j,k]$, and [Yanofsky 72] attempts to supply us with one:

\( \otimes : \mathbb{C}^{m \bullet m’} \times \mathbb{C}^{n \bullet n’} \rightarrow \mathbb{C}^{mn \bullet m’n’} \
(A \otimes B)[j,k] = A[j/n, k/m] \times B[j\%n, k\%m] \)

Unfortunately, the second half of this definition is not correct. For a simple proof by contradiction let’s use $A$ and $B$, giving us $m = 2, m’ = 2, n = 3, n’ = 3$, then calculate:

\( (A \otimes B)[5,5] = A[5/3, 5/2] \times B[5\%3, 5\%2] = A[1, 2] \times B[2, 1] \)

It’s easy to see that this is false; we expect $(A \otimes B)[5,5]$ to be the product of $A$’s final entry ($A[1,1]$) and $B$’s final entry ($B[2,2]$). Upon closer inspection, the problem with this definition is its use of $m$, which should be replaced with $n’$ as follows:

$(A \otimes B)[j,k] = A[j/n, k/n’] \times B[j\%n, k\%n’]$

The reason $A$’s dimensions should not appear here has its roots in the definition of the tensor product itself. Recall the definition provided earlier from [Yanofsky 68] for an element of $\mathbb{V} \otimes \mathbb{V}’$:

\( (c_0 \times c’_0)(B_0 \otimes B’_0) + (c_0 \times c’_1)(B_0 \otimes B’_1) + … + (c_{m-1} \times c’_{n-1})(B_{m-1} \otimes B’_{n-1}) \)

Notice how this definition pairs the first element $c_0 \bullet B_0$ of $V \in \mathbb{V}$ with each element $c’_{0..n-1} \bullet B’_{0..n-1}$ of $V’ \in \mathbb{V’}$, then moves on to pair the second element of $V$ with each element of $V’$, and so on. Visually, you can see that we iterate “slowly” over the elements of $V$, while iterating “rapidly” over the elements of $V’$. Our corrected definition, restated here:

$(A \otimes B)[j,k] = A[j/n, k/n’] \times B[j\%n, k\%n’]$

illustrates this: the rows of matrix $A$ are divided up into $n$ “sections”, its columns are divided up into $n’$ “sections”. Each of these “sections” ends up being a single element that gets multiplied with each of $B$’s elements, which we iterate “rapidly” over using the modulo operation.

One final note about this definition: like any programmer with some modicum of sense, I had to convince myself that the bounds of $A$ and $B$ are never exceeded. We iterate over $A$’s rows using $j/n$; does $j/n < m$ hold? Yes, by rearranging to $j < mn$, which holds from the first half of [Yanofsky 72] above. We iterate over $A$’s columns using $k/n’$; does $k/n’ < m’$ hold? Yes, by rearranging to $k < m’n’$, which holds from the first half of [Yanofsky 72] above as well. Iteration over $B$ is trivial to check: $j \% n < n$ holds by definition of the modulo operation, as does $k \% n’ < n’$.

Final Notes

What appears to be a minor typo on [Yanofsky 72] motivated me to jump down the rabbit hole of tensor products, which turned out to be rewarding. Overall, I highly recommend Quantum Computing for Computer Scientists by Yanofsky and Mannucci, it’s the best introductory resource on the subject I’ve found. This post turned out to be more about the tensor product in general than about that particular typo, so if you’re looking for a step-by-step walk-through, I hope this helps you out.

References

Yanofsky, Noson S. and Mirco A. Mannucci. Quantum Computing for Computer Scientists. Cambridge: Cambridge University Press, 2008.