Boost your intuition for matrix multiplication from the ground up
When I first learned about matrix multiplication I was surprised by how hard it was for me to develop intuition about the operation. The usual definition of matrix multiplication hides a lot of interesting facts that are easier to recognize when you look from different points of view. In this post I will describe matrix multiplication from three points of view: columns, rows, and combinations of both. I will also discuss some simple facts to help test your intuition. After reading you will hopefully gain a deeper understanding of matrix multiplication, rows, and columns. This post was inspired by a linear algebra course taught by the great Gilbert Strang (MIT).
This post lists three ways to interpret matrix multiplication. For each of these interpretations we will discuss the following.
- Interpretation: What does this mean?
- Why does this work?: How does this interpretation arise from the definition of matrix multiplication?
- Check your intuition: A list of facts that you can use to check your intuition for the interpretation we are considering.
I occasionally mention the concepts of linear combination, linear dependence, linear independence, dot product. If you would like to quickly refresh your memory on these topics then see my article 3 basic concepts in linear algebra.
Example. Say we have three barflies: Lars, Fatima, and Georgia. On a night out Lars bought 2 beers 1 cocktail, Fatima bought 1 beer 2 cocktails, and Georgia had 4 beers and no cocktails. Beers cost $7 and cocktails cost $10. We can model their spending for the night with matrix multiplication.
How were the numbers on the right computed?
Our goal is to understand the properties of matrix multiplication with more generality so throughout this post we will consider the product of a 3×3 matrix A and a 3×2 matrix B. The result will be a 3×2 matrix C.
The definition of matrix multiplication
Jacques Philippe Marie Binet …recognized as the first to derive the rule for multiplying matrices in 1812. — Oliver Knill
The usual way to define matrix multiplication is as a summation or, more compactly, a dot product of rows of A and columns of B. The dot product of row 1 of A and column 1 of B will give the first entry of C.
In general the ij-th entry of C is the i-th row of A dotted with the j-th column of B.
Example. Find the third row and second column of the product C.
The answer is (1)(1)+(2)(2) +(3)(1) = 8. Try using the definition to find the rest of entries of C.
Interpretation: An entry of C is the dot product of a row of A and a column of B. Zero entries in C correspond to a row of A and a column of B that are orthogonal (at right angles to each other).
Check your intuition: From this point of view several facts become clearer.
- The number of columns of A must equal the number of rows of B. Otherwise the sums in the definition won’t be defined.
- The product AB will be a matrix with the same number of columns as A and the same number of rows as B.
- A zero entry in C means that the correspond row of A and column of B are orthogonal. Orthogonal vectors are linearly independent. But not all pairs of linearly independent vectors are orthogonal.
#1 The column point of view
The first thing to notice about AB = C is that the columns of the matrix C are related to the columns of the matrix A in an important way.
Interpretation Each of the column of C is the matrix A multiplied by a column of B. The effect of this is that the columns of C are linear combinations of the columns of A with weights given by columns of B.
Why does this work? To see why the columns of C are linear combinations of the columns of A let’s take a close look at how we calculate the first column of C.
Check your intuition: From this point of view several facts become clearer.
- A matrix multiplied by a vector, Ax, is simply a linear combination of the columns of a by the entries of x. So the columns of A are linearly independent if and only if equation Ax = 0 has only the zero solution.
- We can view the columns of C as the results of applying a linear transformation, defined by B, to columns of A.
- Suppose the columns of A are linearly independent. Then, if C has a column of zeros, B must also have a column of zeros.
- If the columns of C are linearly dependent and the columns of B are linearly independent, then the columns of A are dependent. This follows from that fact that if x is a non-trivial solution of Cx = 0 then Bx is a non-trivial solution of Ax = 0.
- If the equation Ax = b has no solution then ABx = Cx = b has no solution. After all, the columns of C are just combinations of columns of A.
- The span of the columns of C are contained in the span of the columns of A. Therefore, rank(AB) ≤ rank(A).
- If B is invertible with inverse B’ then the columns of A and AB have the same span. We can prove this from the previous fact, rank(AB) ≤ rank(A), combined with the fact that rank(A) = rank(AI) = rank(ABB’) ≤ rank(AB).
So matrix multiplication from the point of view of columns. Now let’s move on to rows?
#2 The row point of view
Interpretation The rows of C are rows of A multiplied by the matrix B. Therefore, the rows of C are linear combinations of the rows of B with weights given by rows of A.
Why does this work? To see why the rows of C are linear combinations of the rows of B let’s take a close look at how we calculate the first row of C using the definition of matrix multiplication.
Check your intuition: Once again let’s list some facts about rows that lead from this interpretation of matrix multiplication.
- For AB = C, if the rows of C are linearly independent then so are the rows of B. Warning: the converse is not necessarily true.
- If A has a row of zeros then AB has a row of zeros.
- The the span of the rows of B contains the span the rows of C.
- If E is an invertible n×n matrix and B is any n×m matrix. Then EB has the same row space as E. In particular, elementary row operations preserve the row space.
We can use the row and column interpretations the help sketch a proof of an interesting result about the dimension of the row space and column space of an m×n matrix. The dimension of the span of the columns of a matrix is called its rank. The dimension of the span of the rows is called the rowrank.
Claim: The rank and rowrank of an m×n matrix C are equal.
There are many m×r matrices A and r×n matrices B such that C = AB. Choose A and B such that r is minimum. The r columns of A span the column space of C. The r rows of B span the row space of C. Since we chose r to be the smallest such number, the rank(C) = rowrank(C) = r.
Claim: If A and B are square matrices and AB = I then BA = I. Therefore B is the inverse of A.
We have AB = I. Therefore the columns of A are linearly independent. Therefore the equation Ax = 0 has only the trivial solution. multiply the first equation on the right by A to get ABA = A. Then ABA-A = A(BA-I)=0. Therefore BA = I.
#3 Columns and rows
Our last interpretation gives us a way to decompose a product of two matrices into a sum of matrices.
Interpretation The matrix C is the sum of matrices that are columns of A multiplied by rows of B. The matrices that make up the sum have columns that are scalar multiples of a column of A.
Why does this work? To see why this is true consider what happens when you write the matrix A as the sum of matrices and compute AB by distributing B.
Check your intuition:
- Each of the matrices in the summand have 1-dimensional column spaces.
- You can interchange two columns of A and get the same product AB as long as you interchange the corresponding rows of B.
We talked about three different ways to understand matrix multiplication.
- A matrix multiplied by columns
- A rows multiplied by matrices
- And columns multiplied by rows
We used these different interpretations to review some basic facts about matrix multiplication, independence, and span.