Projecting onto Projections

The first time I saw the expression \int_C \mathbf{F} \cdot \mathbf{n}\ d\mathbf{r}, I thought, “Why should that dot product be in there?” By the time I saw \iint_S \mathbf{F} \cdot\ d\mathbf{S}, I had resigned myself to the fact that there was always a dot product in these seemingly random integrals. At some point, I decided that the dot products are in there to turn vectors (or vector fields) into scalar functions, which is something we know how to integrate. More recently, I’ve decided that the purpose of these dot products is to capture the projection of one vector onto the other.

For example, if I apply a constant force \mathbf{F} to an object, then the work done by that force in moving the object a certain distance in a given direction (denote this shift by \mathbf{v}) is \mathbf{F} \cdot \mathbf{v}. If the force is not constant over some curve parametrized by \mathbf{r}(t) (a \leq t \leq b), then we compute the work by evaluating the integral \int_a^b \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t)\ dt since, at any given point, our \mathbf{v} from above is just a small step along the tangent vector to the curve at that point, i.e., \mathbf{r}'(t)\ dt.
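To make the work integral concrete, here is a minimal numerical sketch in Python. Everything in it is my own illustrative choice rather than anything from a textbook: the helper name `work_along_curve`, the midpoint rule, the unit-circle parametrization, and the two sample force fields (`constant_push` and `swirl`) are all assumptions made for the example.

```python
import math

def work_along_curve(F, r, r_prime, a, b, n=10_000):
    """Approximate the integral of F(r(t)) . r'(t) dt over [a, b] (midpoint rule)."""
    dt = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * dt          # midpoint of the i-th subinterval
        force = F(r(t))                 # force at the current point on the curve
        velocity = r_prime(t)           # tangent vector r'(t)
        # Dot product of force and tangent, times the small time step.
        total += sum(f * v for f, v in zip(force, velocity)) * dt
    return total

# Hypothetical curve: the unit circle, r(t) = (cos t, sin t).
def unit_circle(t):
    return (math.cos(t), math.sin(t))

def unit_circle_tangent(t):
    return (-math.sin(t), math.cos(t))

# Hypothetical force fields chosen only for illustration.
def constant_push(p):
    return (1.0, 0.0)                   # constant force in the x-direction

def swirl(p):
    x, y = p
    return (-y, x)                      # force that always points along the circle

# Quarter circle from (1, 0) to (0, 1) under the constant force.
work_constant = work_along_curve(constant_push, unit_circle, unit_circle_tangent,
                                 0.0, math.pi / 2)
# Full trip around the circle under the swirling force.
work_swirl = work_along_curve(swirl, unit_circle, unit_circle_tangent,
                              0.0, 2 * math.pi)
print(work_constant, work_swirl)
```

For the constant force, the result should come out close to -1, which agrees with \mathbf{F} \cdot \mathbf{v} for the net displacement \mathbf{v} = \langle -1, 1 \rangle. For the swirling force, the integrand is identically 1, so the work should be close to 2\pi.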

If you understand multivariable calculus, then you are probably laughing at me. “Duh. Why did it take you so long to figure that out?”

Here is my answer: We (or maybe just I) improperly motivate the dot product. This semester, I’m using Stewart for Multivariable Calculus*. He introduces vectors in a way that seems fairly standard for math texts.

Definition: The dot product of \langle x_1, \ldots, x_n \rangle and \langle y_1, \ldots, y_n \rangle is x_1y_1 + \cdots + x_ny_n.

The great thing about this definition is that it is bloody easy to compute and understand.

Theorem: If \mathbf{a} and \mathbf{b} are vectors with angle \theta between them, then \mathbf{a} \cdot \mathbf{b} = ||\mathbf{a}||\ ||\mathbf{b}|| \cos(\theta).

The beauty here is that you can use the dot product to help compute angles and it is immediately obvious that the dot product of orthogonal vectors is 0.
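As a quick illustration of using the theorem to compute angles, here is a small Python sketch. The function names `dot`, `norm`, and `angle_between` are hypothetical helpers I made up for this example, and the sample vectors are arbitrary.

```python
import math

def dot(a, b):
    """Component-wise definition: x_1*y_1 + ... + x_n*y_n."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Length of a vector, computed as the square root of a . a."""
    return math.sqrt(dot(a, a))

def angle_between(a, b):
    """Solve a . b = ||a|| ||b|| cos(theta) for theta (in radians)."""
    cos_theta = dot(a, b) / (norm(a) * norm(b))
    # Clamp to [-1, 1] so floating-point round-off cannot break acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)

theta = angle_between((1.0, 0.0), (1.0, 1.0))       # should be close to pi/4
orthogonal_check = dot((1.0, 2.0), (-2.0, 1.0))     # orthogonal vectors give 0
print(math.degrees(theta), orthogonal_check)
```

Note that `angle_between` assumes neither vector is the zero vector; otherwise the division fails.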

*This wasn’t my choice, but rather the choice of my department. Oh, did I mention I got a new job? Indeed, I finally gave up on East Coast living and moved back to California. I am now in the mathematics department at California State University, Fullerton.

I’ve heard that in physics textbooks, they switch the order of the above, i.e., they define the dot product via the cosine formula and then prove the above definition as a theorem. As a mathematician, I always went with the first definition. Now, I am not so sure. What follows is the introduction to the dot product I plan to give to my students (until I come up with something better, anyway*).

*In the comments, please do set me straight about the real purpose of the dot product or how you think it best to introduce it in this context.

Let’s start with two vectors, joined by their tails.

I am interested in how far \mathbf{b} extends along \mathbf{a}, so I drop a line perpendicular to \mathbf{a} from the end of \mathbf{b}.

At this point, I’m already confused by what would happen if I had tried to see how far \mathbf{a} goes along \mathbf{b}, but I decide that I could simply extend \mathbf{b} and at least draw the following picture:

Awesome, I have a couple of right triangles. And, heck, since they are right triangles that share the angle (let’s call it \theta) between \mathbf{a} and \mathbf{b}, they are similar triangles. Let’s give some names to the important sides.

The comment about similar triangles implies that \dfrac{h}{||\mathbf{b}||} = \dfrac{k}{||\mathbf{a}||}. Ugh, let’s clear denominators to get h||\mathbf{a}|| = k||\mathbf{b}||. On the other hand, \cos(\theta) = \dfrac{h}{||\mathbf{b}||}, and so if we multiply by ||\mathbf{a}||\ ||\mathbf{b}||, we get

||\mathbf{a}||\ ||\mathbf{b}||\cos(\theta) = h||\mathbf{a}||

The moral is that this important quantity—h||\mathbf{a}|| = k||\mathbf{b}||—coming from projecting the vectors onto each other, has a very simple reformulation as ||\mathbf{a}||\ ||\mathbf{b}||\cos(\theta) which only relies on knowing the original vectors and the angle between them. Since this projection property is so important to us physically, we give a short name to this expression: \mathbf{a} \cdot \mathbf{b}, and call it the dot product of \mathbf{a} and \mathbf{b}.
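A numerical sanity check of this moral, as a rough Python sketch. The two vectors are arbitrary choices of mine, and I compute \theta from the directions of the vectors (via `math.atan2`) rather than from the dot product, so that the comparison with the component formula is not circular.

```python
import math

# Hypothetical example vectors in the plane (acute angle between them).
a = (3.0, 1.0)
b = (1.0, 2.0)

norm_a = math.hypot(*a)
norm_b = math.hypot(*b)

# Angle between the vectors, measured from their directions.
theta = math.atan2(b[1], b[0]) - math.atan2(a[1], a[0])

h = norm_b * math.cos(theta)   # how far b extends along a
k = norm_a * math.cos(theta)   # how far a extends along b

geometric = norm_a * norm_b * math.cos(theta)
componentwise = a[0] * b[0] + a[1] * b[1]

# All four numbers should agree up to round-off.
print(h * norm_a, k * norm_b, geometric, componentwise)
```

For these particular vectors the component formula gives 3 \cdot 1 + 1 \cdot 2 = 5, so the other three quantities should land on 5 as well, up to floating-point error.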

If \mathbf{b} is orthogonal to \mathbf{a}, then the projection should be 0, which of course it is since the cosine of 90^\circ is 0.

At this point one can go about proving that the dot product is obtained directly from the components, i.e., without knowing the angle between the vectors. Of course, there is still the issue of what happens when \theta is obtuse, and it will probably be helpful to cover that case as well. Geometrically it will look a bit different, but the algebra and trig will be almost the same*.

*You do get to use the fact that the cosine of an angle is the negative of the cosine of its supplementary angle!
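For the step from the cosine formula to the components, one standard route (a sketch, and not necessarily the one I will end up using in class) goes through the Law of Cosines applied to the triangle with sides \mathbf{a}, \mathbf{b}, and \mathbf{a} - \mathbf{b}. It makes no assumption about whether \theta is acute or obtuse.

```latex
\begin{align*}
||\mathbf{a} - \mathbf{b}||^2
  &= ||\mathbf{a}||^2 + ||\mathbf{b}||^2
     - 2\,||\mathbf{a}||\ ||\mathbf{b}||\cos(\theta)
  && \text{(Law of Cosines)} \\
||\mathbf{a} - \mathbf{b}||^2
  &= \sum_{i=1}^{n} (x_i - y_i)^2
   = ||\mathbf{a}||^2 + ||\mathbf{b}||^2 - 2\sum_{i=1}^{n} x_i y_i
  && \text{(expand in components)} \\
||\mathbf{a}||\ ||\mathbf{b}||\cos(\theta)
  &= x_1 y_1 + \cdots + x_n y_n
  && \text{(compare the two lines)}
\end{align*}
```

Here \mathbf{a} = \langle x_1, \ldots, x_n \rangle and \mathbf{b} = \langle y_1, \ldots, y_n \rangle, matching the notation in the definition above.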

There is nothing really new here, but I think the ordering is important. Students’ first impression of the dot product should convey the purpose of the dot product, not just the easiest algorithm for computing it. As it stands, the projection of a vector onto another vector gets a somewhat token reference at the end of the dot product chapter. As ubiquitous as the idea is throughout the end of the class, it deserves its time in the sun.