Math 208

Topics for first exam

Chapter 11: Functions of Several Variables

§ 1:: Functions of two variables

Function of one variable: one number in, one number out. Picture a black box; one input and one output.

Function of several variables: several inputs, one output. Picture a quantity which depends on several different quantities. E.g., distance from the origin in the plane:

distance = $d = \sqrt{x^2+y^2}$

depends on both the x- and y-coordinates of our point.

Our goal is to understand functions of several variables, in much the same way that the tools of calculus allow us to understand functions of one variable. Our basic tool will be to think of a function of several variables as a function of one variable (at a time!), so that we can use those tools to good effect.

§ 2:: 3-space

One of the best ways to understand a function is to graph it. For one variable, y = f(x), this means plotting all of the points (x,f(x)), where x is in the domain of f. For functions of two variables, z = f(x,y), we need to plot all triples (x,y,f(x,y)), so we need to build our graphs in 3-space.

Cartesian coordinates: three axes, each perpendicular to one another, all meeting at the origin (0,0,0). Axes are labelled by the right hand rule; the thumb, forefinger, and middle finger of the right hand point in the x, y, and z directions, respectively. The point (a,b,c) is the one reached from the origin by moving a units in the direction of the (positive) x-axis, then b units in the y direction, and then c units in the z direction.

Figure

The distance between points (a,b,c) and (x,y,z) is given by a formula very similar to the one for the plane:

$d = \sqrt{(x-a)^2+(y-b)^2+(z-c)^2}$

So, e.g., the points satisfying x²+y²+z² = 9 are all of the points with distance 3 from the origin (0,0,0), i.e., they form a sphere of radius 3, centered at the origin.
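As a quick check of the distance formula, here is a minimal Python sketch. The function name `distance` and the sample point (1,2,2) are illustrative choices, not part of the notes.

```python
import math

def distance(p, q):
    """Euclidean distance between two points given as (x, y, z) tuples."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

origin = (0.0, 0.0, 0.0)
point = (1.0, 2.0, 2.0)   # 1 + 4 + 4 = 9, so this point lies on the sphere of radius 3

print(distance(origin, point))  # 3.0
```

Any point whose coordinates satisfy x²+y²+z² = 9 should give the same value, 3.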

§ 3:: graphs of functions of two variables

We know what such a graph is; but how do we see what it looks like? One answer is to think of it as a function of one variable (at a time!).

If we set y = c=constant, and look at z = f(x,c) , we are looking at a function of one variable, x, which we can (in theory) graph. This graph is what we would see when the plane y = c meets the graph z = f(x,y) ; this is a (vertical) cross section of our graph (parallel to the plane y = 0, the xz-plane). Similarly, if we set x = d=constant, and look at the graph of z = f(d,y) (as a function of y), we are seeing vertical cross sections of our original graph, parallel to the yz-plane. Several of these x- and y-cross sections together can give a very good picture of the general shape of the graph of our function z=f(x,y) .

Some of the simplest functions to describe are linear functions: functions having equations of the form z = ax+by+c. Their cross sections are all lines; the x-cross sections (x = constant) all have the same slope b, and the y-cross sections (y = constant) all have slope a.
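The claim about slopes of cross sections can be checked on a sample linear function. The sketch below assumes the hypothetical choice a = 2, b = 3, c = 1; the helper names are made up for illustration.

```python
def f(x, y):
    # Hypothetical linear function z = a*x + b*y + c with a = 2, b = 3, c = 1.
    return 2 * x + 3 * y + 1

def slope_along_x(y_fixed):
    """Slope of the cross section y = y_fixed (z viewed as a function of x)."""
    return f(1, y_fixed) - f(0, y_fixed)

def slope_along_y(x_fixed):
    """Slope of the cross section x = x_fixed (z viewed as a function of y)."""
    return f(x_fixed, 1) - f(x_fixed, 0)

print([slope_along_x(c) for c in (-1, 0, 5)])  # [2, 2, 2]
print([slope_along_y(d) for d in (-1, 0, 5)])  # [3, 3, 3]
```

Whichever constant we fix, the cross sections in a given direction all have the same slope; only their heights differ.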

Another simple type of graph is a cylinder; these come from functions like f(x,y) = y², which we think of as functions of x and y but whose output does not depend on one of the inputs. Cross sections of such a function, taken by setting the variable that does not affect the output equal to a constant, are all identical, so the graph looks like copies of the exact same curve, stacked side-by-side.

§ 4:: Contour diagrams

The cross sections of the graph of a function are obtained by slicing our graph with vertical planes (parallel to one of our coordinate planes). But we could also use horizontal planes, that is, the planes z = constant. In other words, we graph f(x,y) = c for different values of the constant c. These are called contour lines or level curves for the function f, since they represent all of the points on the graph of f which lie on the same horizontal level (the term contour line is borrowed from topographic maps; the lines represent the level curves of the height of land). These have the advantage that they can be graphed together in the xy-plane, for different values of c, because the level curves corresponding to different values c and d cannot meet (a point (x,y) on both level curves would satisfy c = f(x,y) = d, so f would not be a function).

A collection of level curves also gives a good picture of what the graph of our function f looks like; we can imagine wandering through the domain of our function, reading off the value of the function f by looking at what level curve we are standing on. We usually draw level curves for equally spaced values of c; that way, if the level curves are close together, we know that the function is changing values rapidly in that region, while if they are far apart, the values of the function are not varying by a large amount in that area.

Figure

We usually, for convenience, draw the level curves of f on a single xy-plane (since we can keep them somewhat separate), labelling each curve with its z-value. We could reconstruct a picture of the graph of f by simply drawing the level curve f(x,y) = c on the horizontal plane z = c in 3-space.

§ 5:: Linear functions

Just as lines play an important role in the 1-variable theory of calculus (e.g., as tangent lines to functions), linear functions play an important role in the several variable theory.

The most general equation for a plane in 3-space is

ax+by+cz = d , where a,b,c and d are constants,

although typically, our planes will come in the form z = ax+by+c. The number a is called the x-slope, since it tells us how much z changes if we move 1 unit to the right in the x-direction. For similar reasons, b is called the y-slope.

Typically, there is exactly one plane passing through any three given points in 3-space (unless they happen to lie on a single line, in which case many planes are possible). Later we will see how to determine an equation for this plane.

We can also completely describe a plane by knowing a point (x₀,y₀,z₀) on the plane, and its x-slope m and y-slope n; the equation for the plane is then

z = z₀+m(x-x₀)+n(y-y₀)

This will often be our method for finding equations of planes, since it is these three pieces of information which we will know when trying to compute the tangent plane to the graph of a function, as we shall see.
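The point-and-slopes form of a plane translates directly into code. This is a minimal sketch; the function name and the sample data (the point (1,2,3), x-slope 4, y-slope -1) are hypothetical.

```python
def plane_from_point_and_slopes(x0, y0, z0, m, n):
    """Return the function z(x, y) = z0 + m*(x - x0) + n*(y - y0)."""
    def z(x, y):
        return z0 + m * (x - x0) + n * (y - y0)
    return z

# Hypothetical data: the plane through (1, 2, 3) with x-slope 4 and y-slope -1.
z = plane_from_point_and_slopes(1, 2, 3, 4, -1)

print(z(1, 2))  # 3: the plane passes through the given point
print(z(2, 2))  # 7: one unit in the x-direction raises z by the x-slope 4
print(z(1, 3))  # 2: one unit in the y-direction changes z by the y-slope -1
```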

If we look at the level curves of a linear function, z = ax+by+c = d = constant, they are a collection of parallel lines; if we choose equally spaced horizontal levels, they will be equally spaced parallel lines.

§ 6:: Functions of more than two variables

There is of course no reason to stop with two variables for a function. An expression like F = F(M,m,r) = GMm/r² can be thought of as describing F as a function of M, m, and r (and G!). When we think of the graph of this function (as a function of the first three variables), its graph will live in 4-space! However, we can still get an impression of what the function looks like, by graphing F(M,m,r) = c = constant, for various values of c. These are level surfaces for the function F. We can get a picture of what the level surfaces look like by taking cross sections! Or we could look at each level surface's level curves.

Chapter 12: Vectors

§ 1,2:: Vectors

In one-variable calculus, we make a distinction between speed and velocity; velocity has a direction (left or right), while speed doesn't. It is the size of the velocity. This distinction is even more important in several variable calculus, and motivates the notion of a vector.

Basically, a vector $\vec{v}$ is an arrow pointing from one point in the plane (or 3-space or ...) to another. A vector is thought of as pointing from its tail to its head. If it points from P to Q, we call the vector $\vec{v} = \overrightarrow{PQ}$.

A vector has both a size (= length = distance from P to Q) and a direction. Vectors that have the same size and point in the same direction are often thought of as the same, even if they have different tails (and heads). Put differently, by picking up the vector and translating it so that its tail is at the origin (0,0), we can identify $\vec{v}$ with a point in the plane, namely its head (x,y), and write $\vec{v} = (x,y)$. If $\vec{v}$ goes from (a,b) to (c,d), then we would have $\vec{v} = (c-a,d-b)$. The length of $\vec{v} = (a,b)$ is then $\|\vec{v}\| = \sqrt{a^2+b^2}$.

In 3-space we have three special vectors, pointing in the direction of each coordinate axis (in the plane there are, analogously, two); these are called

$\vec{i} = (1,0,0)$, $\vec{j} = (0,1,0)$, and $\vec{k} = (0,0,1)$

These come in especially handy when we start to add vectors. There are several different points of view to vector addition:

(1) move the vector $\vec{w}$ so that its tail is on the head of $\vec{v}$; then the vector $\vec{v}+\vec{w}$ has tail equal to the tail of $\vec{v}$ and head equal to the head of $\vec{w}$;

(2) move $\vec{v}$ and $\vec{w}$ so that their tails are both at the origin, and build the parallelogram which has sides equal to $\vec{v}$ and $\vec{w}$; then $\vec{v}+\vec{w}$ is the vector that goes from the origin to the opposite corner of the parallelogram;

(3) if $\vec{v} = (a,b)$ and $\vec{w} = (c,d)$, then $\vec{v}+\vec{w} = (a+c,b+d)$.

Figure

We can also subtract vectors; if they share the same tail, $\vec{v}-\vec{w}$ is the vector that points from the head of $\vec{w}$ to the head of $\vec{v}$ (so that $\vec{w}+(\vec{v}-\vec{w}) = \vec{v}$). In coordinates, we simply subtract the coordinates.

We can also rescale vectors, i.e., multiply them by a constant factor; $a\vec{v}$ is the vector pointing in the same direction as $\vec{v}$, but |a| times as long. (We use the convention that if a < 0, then $a\vec{v}$ points in the opposite direction from $\vec{v}$.)

Using coordinates, this means that a(x,y) = (ax,ay). To distinguish a from the coordinates of the vector, we call a a scalar.

All of these operations satisfy all of the usual properties you would expect:

$\vec{v}+\vec{w} = \vec{w}+\vec{v}$

$(\vec{v}+\vec{w})+\vec{u} = \vec{v}+(\vec{w}+\vec{u})$

$a(b\vec{v}) = (ab)\vec{v}$

$a(\vec{v}+\vec{w}) = a\vec{v}+a\vec{w}$

Of course there is nothing special in all of this about vectors in the plane; all of these ideas work for vectors in 3-space, or 4-space, or .....
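In coordinates, the vector operations above are one-liners. The following Python sketch represents vectors as tuples; the helper names `add`, `sub`, `scale`, and `length` are illustrative assumptions, and the sample vectors are arbitrary.

```python
import math

def add(v, w):
    """Coordinatewise sum of two vectors of the same dimension."""
    return tuple(vi + wi for vi, wi in zip(v, w))

def sub(v, w):
    """Coordinatewise difference v - w."""
    return tuple(vi - wi for vi, wi in zip(v, w))

def scale(a, v):
    """Scalar multiple a*v."""
    return tuple(a * vi for vi in v)

def length(v):
    """Length of v: the square root of the sum of squared coordinates."""
    return math.sqrt(sum(vi * vi for vi in v))

v = (3, 4)
w = (1, -2)

print(add(v, w))               # (4, 2)
print(sub(v, w))               # (2, 6)
print(scale(-2, v))            # (-6, -8)
print(length(v))               # 5.0
print(add(w, sub(v, w)) == v)  # True: w + (v - w) = v
```

Because the helpers loop over coordinates, the same code works unchanged for vectors in 3-space or higher.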

§ 3:: Dot products

One thing we haven't done yet is multiply vectors together. It turns out that there are two ways to reasonably do this, serving two very different sorts of purposes.

The first, the dot product, is intended to measure the extent to which two vectors $\vec{v}$ and $\vec{w}$ are pointing in the same direction. It takes a pair of vectors $\vec{v} = (v_1,\ldots,v_n)$ and $\vec{w} = (w_1,\ldots,w_n)$, and gives us a scalar $\vec{v}\cdot\vec{w} = v_1w_1+\cdots+v_nw_n$.

Note that $\vec{v}\cdot\vec{v} = v_1^2+\cdots+v_n^2 = \|\vec{v}\|^2$. In general, $\vec{v}\cdot\vec{w} = \|\vec{v}\|\cdot\|\vec{w}\|\cdot\cos(\theta)$, where $\theta$ is the angle between the vectors $\vec{v}$ and $\vec{w}$ (when they have the same tail); this can be seen by an application of the Law of Cosines. This in turn allows us to compute this angle:

The angle $\theta$ between $\vec{v}$ and $\vec{w}$ = the angle (between 0 and $\pi$) with $\cos(\theta) = \frac{\vec{v}\cdot\vec{w}}{\|\vec{v}\|\cdot\|\vec{w}\|}$
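The dot product and the angle formula can be sketched in a few lines of Python. The function names are illustrative, and the clamp before `math.acos` is an added safeguard against rounding error, not something the notes call for.

```python
import math

def dot(v, w):
    """Dot product: the sum of products of matching coordinates."""
    return sum(vi * wi for vi, wi in zip(v, w))

def angle(v, w):
    """Angle in [0, pi] between two nonzero vectors."""
    cos_theta = dot(v, w) / (math.sqrt(dot(v, v)) * math.sqrt(dot(w, w)))
    # Clamp in case rounding pushes the value slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_theta)))

print(dot((1, 0, 0), (0, 1, 0)))            # 0: the coordinate axes are perpendicular
print(math.degrees(angle((1, 0), (1, 1))))  # approximately 45
```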

The dot product satisfies some properties which justify calling it a product:

$\vec{v}\cdot\vec{w} = \vec{w}\cdot\vec{v}$

$(k\vec{v})\cdot\vec{w} = k(\vec{v}\cdot\vec{w})$

$\vec{v}\cdot(\vec{w}+\vec{u}) = \vec{v}\cdot\vec{w}+\vec{v}\cdot\vec{u}$

Two vectors are orthogonal (= perpendicular) if the angle $\theta$ between them is $\pi/2$, so $\cos(\theta) = 0$; this means that $\vec{v}\cdot\vec{w} = 0$. We write $\vec{v}\perp\vec{w}$.

We've seen that it usually takes three pieces of information to describe a plane in 3-space (3 points in the plane, or a point and the x- and y-slopes). However, using dot products, we can describe it using only two:

Every plane has a normal vector $\vec{n}$; $\vec{n}$ is orthogonal to the vector $\overrightarrow{PQ}$ for any pair of points P and Q in the plane. For example, the vector $\vec{k}$ is perpendicular to the xy-plane, since it is perpendicular to every vector of the form (a,b,0). Given such a normal vector $\vec{n}$ and a point $(x_0,y_0,z_0)$ in the plane, every other point (x,y,z) in the plane must satisfy $\vec{n}\cdot(x-x_0,y-y_0,z-z_0) = 0$; writing this in coordinates gives the equation for the plane:

$a(x-x_0)+b(y-y_0)+c(z-z_0) = 0$, where $\vec{n} = (a,b,c)$

This means that if we are given the equation of the plane, we can quickly read off what the normal vector for the plane is, as well.

Another application: projecting one vector onto another.

The idea is to figure out how much of one vector $\vec{v}$ points in the direction of another vector $\vec{w}$. The dot product measures to what extent they are pointing in the same direction, so it is only natural that it plays a role.

What we wish to do is to write $\vec{v} = c\vec{w} + \vec{u}$, where $\vec{u}\perp\vec{w}$ (i.e., write $\vec{v}$ as the part pointing in the direction of $\vec{w}$ plus the part $\perp\vec{w}$). By solving the equation $(\vec{v}-c\vec{w})\cdot\vec{w} = 0$, we find that $c = \frac{\vec{v}\cdot\vec{w}}{\vec{w}\cdot\vec{w}}$.

We write $c\vec{w} = \mathrm{proj}_{\vec{w}}\vec{v} = \frac{\vec{v}\cdot\vec{w}}{\vec{w}\cdot\vec{w}}\,\vec{w} = \left(\frac{\vec{v}\cdot\vec{w}}{\|\vec{w}\|}\right)\frac{\vec{w}}{\|\vec{w}\|}$ = the (orthogonal) projection of $\vec{v}$ onto $\vec{w}$,

and then $\vec{u} = \vec{v}-c\vec{w}$.
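Here is a small Python sketch of the projection recipe: compute the scalar c, form the projection, and subtract to get the perpendicular part. The names `dot` and `project` and the sample vectors are illustrative assumptions.

```python
def dot(v, w):
    """Dot product of two vectors given as tuples."""
    return sum(vi * wi for vi, wi in zip(v, w))

def project(v, w):
    """Orthogonal projection of v onto the nonzero vector w."""
    c = dot(v, w) / dot(w, w)
    return tuple(c * wi for wi in w)

v = (3, 4)
w = (2, 0)

p = project(v, w)                           # the part of v along w
u = tuple(vi - pi for vi, pi in zip(v, p))  # the part of v perpendicular to w

print(p, u, dot(u, w))  # (3.0, 0.0) (0.0, 4.0) 0.0
```

The final dot product being 0 confirms that the leftover piece is perpendicular to the vector we projected onto.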

§ 4:: The cross product

We saw how to use the dot product to give an equation for a plane, using a normal vector for the plane. The only question is, how do we find the normal vector from the usual pieces of information we will know about the plane? One answer is given by the cross product.

One ingredient we will need: the area of a parallelogram with sides the vectors $\vec{v}$ and $\vec{w}$:

Area = (base)×(height) = $\|\vec{w}\|\cdot h = \|\vec{w}\|\cdot\|\vec{v}\|\cdot\sin(\theta)$ (from trigonometry), where $\theta$ is the angle between $\vec{v}$ and $\vec{w}$.

Now, for no apparent reason, we define, for $\vec{v} = (v_1,v_2,v_3)$ and $\vec{w} = (w_1,w_2,w_3)$,

$\vec{v}\times\vec{w} = (v_2w_3-v_3w_2,\ -(v_1w_3-v_3w_1),\ v_1w_2-v_2w_1)$

$= (v_2w_3-v_3w_2)\vec{i}-(v_1w_3-v_3w_1)\vec{j}+(v_1w_2-v_2w_1)\vec{k}$

We do this because, it turns out, $(\vec{v}\times\vec{w})\perp\vec{v}$ and $(\vec{v}\times\vec{w})\perp\vec{w}$. [This is just a long tedious computation; take the dot products!] Also, a similar computation will produce

$\|\vec{v}\times\vec{w}\| = \|\vec{v}\|\cdot\|\vec{w}\|\cdot\sin(\theta)$ = the area of that parallelogram!

How do you remember this formula? Most people remember it using the notation

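$\vec{v}\times\vec{w} = \begin{vmatrix} \vec{i} & \vec{j} & \vec{k} \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix} = \begin{vmatrix} v_2 & v_3 \\ w_2 & w_3 \end{vmatrix}\vec{i} - \begin{vmatrix} v_1 & v_3 \\ w_1 & w_3 \end{vmatrix}\vec{j} + \begin{vmatrix} v_1 & v_2 \\ w_1 & w_2 \end{vmatrix}\vec{k}$

where $\begin{vmatrix} v_2 & v_3 \\ w_2 & w_3 \end{vmatrix} = v_2w_3-v_3w_2$, etc.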

The cross product satisfies the following equalities:

$\vec{v}\times\vec{w} = -\vec{w}\times\vec{v}$

$(k\vec{v})\times\vec{w} = k(\vec{v}\times\vec{w})$
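The component formula for the cross product, together with the perpendicularity and sign-flip properties, can be checked on an example. This is a minimal sketch with illustrative function names and arbitrary sample vectors.

```python
def dot(v, w):
    """Dot product of two vectors given as tuples."""
    return sum(vi * wi for vi, wi in zip(v, w))

def cross(v, w):
    """Cross product of two vectors in 3-space, straight from the component formula."""
    v1, v2, v3 = v
    w1, w2, w3 = w
    return (v2 * w3 - v3 * w2, -(v1 * w3 - v3 * w1), v1 * w2 - v2 * w1)

v = (1, 2, 3)
w = (4, 5, 6)
n = cross(v, w)

print(n)                      # (-3, 6, -3)
print(dot(n, v), dot(n, w))   # 0 0: n is perpendicular to both v and w
print(cross(w, v))            # (3, -6, 3): swapping the order flips the sign
```

For these vectors the squared length of the cross product is 54, which matches ||v||²·||w||² - (v·w)² = 14·77 - 32², as the area formula predicts.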

One immediate application is the equation for a plane:

To find the plane through the three points P, Q, and R in 3-space, look at the vectors $\vec{v} = \overrightarrow{PQ}$ and $\vec{w} = \overrightarrow{PR}$. These are vectors between points in our plane, and so they give a pair of directions in the plane. They then must both be perpendicular to the normal vector $\vec{n}$ for the plane. But we know that they are both perpendicular to $\vec{v}\times\vec{w}$, and so $\vec{v}\times\vec{w}$ must be perpendicular to the plane. In other words, we can choose our normal vector to be $\vec{v}\times\vec{w}$. Using one of our original points (P, say) as a point in the plane, we can write down the equation for the plane using our dot product equation, above.
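The three-point recipe can be sketched as follows. The function `plane_through` and its return convention (coefficients a, b, c, d of ax+by+cz = d) are illustrative assumptions; the collinearity check is an added safeguard for the degenerate case.

```python
def dot(v, w):
    """Dot product of two vectors given as tuples."""
    return sum(vi * wi for vi, wi in zip(v, w))

def cross(v, w):
    """Cross product of two vectors in 3-space."""
    v1, v2, v3 = v
    w1, w2, w3 = w
    return (v2 * w3 - v3 * w2, -(v1 * w3 - v3 * w1), v1 * w2 - v2 * w1)

def plane_through(P, Q, R):
    """Return (a, b, c, d) with a*x + b*y + c*z = d for the plane through P, Q, R."""
    v = tuple(q - p for p, q in zip(P, Q))   # vector from P to Q
    w = tuple(r - p for p, r in zip(P, R))   # vector from P to R
    n = cross(v, w)                          # normal vector to the plane
    if n == (0, 0, 0):
        raise ValueError("the three points lie on a line; the plane is not unique")
    # n . (X - P) = 0 rearranges to n . X = n . P
    return n + (dot(n, P),)

print(plane_through((1, 0, 0), (0, 1, 0), (0, 0, 1)))  # (1, 1, 1, 1), i.e. x + y + z = 1
```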

Chapter 13: Differentiation

§ 1:: Partial derivatives

In one-variable calculus, the derivative of a function y = f(x) is defined as the limit of difference quotients:

$f'(x) = \lim_{h\to 0}\frac{f(x+h)-f(x)}{h}$

and interpreted as an instantaneous rate of change, or slope of tangent line.

But a function of two variables has two variables; which one do you increment by h to get your difference quotient? The answer is both of them, one at a time. In other words, a function of two variables z = f(x,y) has two (partial) derivatives:

$\frac{\partial f}{\partial x} = \lim_{h\to 0}\frac{f(x+h,y)-f(x,y)}{h}$ and $\frac{\partial f}{\partial y} = \lim_{h\to 0}\frac{f(x,y+h)-f(x,y)}{h}$

Essentially, $\frac{\partial f}{\partial x}$ is the derivative of f, thought of solely as a function of x (i.e., pretending that y is a constant), while $\frac{\partial f}{\partial y}$ is the derivative of f, thought of solely as a function of y.
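The difference quotients in the definition can be evaluated numerically with a small h to estimate partial derivatives. The sketch below uses a hypothetical example f(x,y) = x²y, whose partials (2xy and x²) are easy to compute by hand; the step size h = 1e-6 is an arbitrary choice.

```python
def partial_x(f, x, y, h=1e-6):
    """Difference quotient in x, holding y constant."""
    return (f(x + h, y) - f(x, y)) / h

def partial_y(f, x, y, h=1e-6):
    """Difference quotient in y, holding x constant."""
    return (f(x, y + h) - f(x, y)) / h

def f(x, y):
    # Hypothetical example: f(x, y) = x^2 * y, so f_x = 2*x*y and f_y = x^2.
    return x ** 2 * y

print(partial_x(f, 3.0, 2.0))  # close to 2*3*2 = 12
print(partial_y(f, 3.0, 2.0))  # close to 3^2 = 9
```

These are only approximations; the exact partial derivatives are the limits as h goes to 0.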

Different viewpoint, same result:

For one variable calculus, $f'(x)$ is the slope of the tangent line to the graph of f. As we shall see, the graph of a function of two variables has something we would naturally call a tangent plane, and one way to describe a plane is by computing its x- and y-slopes, i.e., the rate of change of f solely in the x- and y-directions. But this is precisely what the limits above calculate; so $\frac{\partial f}{\partial x}$ will be the x-slope of the tangent plane, and $\frac{\partial f}{\partial y}$ will be the y-slope.

The basic picture here is:

Figure

Just as with one variable, there are lots of different notations for describing the partial derivatives: for z = f(x,y),

$\frac{\partial f}{\partial x} = \frac{\partial}{\partial x}(f) = \frac{\partial z}{\partial x} = \frac{\partial}{\partial x}(z) = D_x(f) = D_x(z) = f_x = z_x$

§ 2:: The algebra of partial derivatives

The basic idea is that since a partial derivative is `really' the derivative of a function of one variable (the other `variable' is really a constant), all of our usual differentiation rules can be applied. So, e.g.,

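$\frac{\partial}{\partial x}(f+g) = \frac{\partial f}{\partial x} + \frac{\partial g}{\partial x}$ and $\frac{\partial}{\partial y}(f+g) = \frac{\partial f}{\partial y} + \frac{\partial g}{\partial y}$

$\frac{\partial}{\partial x}(c\cdot f) = c\,\frac{\partial f}{\partial x}$ and $\frac{\partial}{\partial y}(c\cdot f) = c\,\frac{\partial f}{\partial y}$

$\frac{\partial}{\partial x}(f\cdot g) = \frac{\partial f}{\partial x}\,g + f\,\frac{\partial g}{\partial x}$ (etc.)

$\frac{\partial}{\partial x}\left(\frac{f}{g}\right) = \frac{\frac{\partial f}{\partial x}\,g - f\,\frac{\partial g}{\partial x}}{g^2}$ (etc.)

$\frac{\partial}{\partial x}\big(h(f(x,y))\big) = h'(f(x,y))\cdot\frac{\partial f}{\partial x}$ (etc.)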

In the end, the way we should get used to taking a partial derivative is exactly the same as for functions of one variable; just read from the outside in, applying each rule as it is appropriate. The only difference now is that when taking a derivative of a function z = f(x,y), we need to remember that

$\frac{\partial}{\partial x}(y) = 0$ and $\frac{\partial}{\partial y}(x) = 0$

§ 3:: Tangent planes

In one-variable calculus, we can convince ourselves that a function has a tangent line at a point by zooming in on that point of the graph; the closer we look, the `straighter' the graph appears to be. At extreme magnification, the graph looks just like a line - its tangent line.

Functions of two variables are really no different; as we zoom in, the graph of our function f starts to look like a plane - the graph's tangent plane. Finding the equation of this tangent plane is really a matter of determining its x- and y-slopes, which is precisely what the partial derivatives of f do. The x-slope is the rate of change of the function in the x-direction, i.e., the partial derivative with respect to x; and similarly for the y-slope.

So the equation for the tangent plane to the graph of z = f(x,y) at the point (a,b,f(a,b)) is

$z = \frac{\partial f}{\partial x}(a,b)\,(x-a)+\frac{\partial f}{\partial y}(a,b)\,(y-b)+f(a,b)$

And just as with one-variable calculus, one use we put this to is to find good approximations to f(x,y) at points near (a,b);

$f(x,y) \approx \frac{\partial f}{\partial x}(a,b)\,(x-a)+\frac{\partial f}{\partial y}(a,b)\,(y-b)+f(a,b)$, for (x,y) near (a,b)

As with one variable, this also goes hand-in-hand with the idea of differentials:

df = f_x(a,b) dx + f_y(a,b)dy = differential of f at (a,b)

And as before, $f(x,y)-f(a,b) \approx df$, when dx = x-a and dy = y-b are small.
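To see how good the tangent plane approximation is, we can compare it with the true value at a nearby point. The sketch below again assumes the hypothetical example f(x,y) = x²y with base point (a,b) = (3,2); the partial derivatives are entered by hand.

```python
def f(x, y):
    # Hypothetical example function.
    return x ** 2 * y

def f_x(x, y):
    return 2 * x * y   # partial derivative of x^2 * y with respect to x

def f_y(x, y):
    return x ** 2      # partial derivative of x^2 * y with respect to y

def tangent_plane(a, b):
    """Return L(x, y) = f_x(a,b)*(x-a) + f_y(a,b)*(y-b) + f(a,b)."""
    def L(x, y):
        return f_x(a, b) * (x - a) + f_y(a, b) * (y - b) + f(a, b)
    return L

L = tangent_plane(3.0, 2.0)

print(f(3.01, 1.98))  # true value, about 17.939
print(L(3.01, 1.98))  # tangent plane estimate, about 12*0.01 + 9*(-0.02) + 18 = 17.94
```

The two values differ by roughly 0.001, which is small compared with the changes 0.01 and 0.02 in the inputs.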

§ 4:: The gradient

$\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ measure the instantaneous rate of change of f in the x- and y-directions, respectively. But what if we want to know the rate of change of f in the direction of the vector $3\vec{i}-4\vec{j}$? By thinking of the partial derivatives in a slightly different way, we can get a clue to how to answer this question.

By writing $f_x(a,b) = \lim_{h\to 0}\frac{f((a,b)+h(1,0)) - f(a,b)}{h}$

and $f_y(a,b) = \lim_{h\to 0}\frac{f((a,b)+h(0,1)) - f(a,b)}{h}$,

we can make the two derivatives look the same; which motivates us to define the directional derivative of f at (a,b), in the direction of the vector $\vec{u}$, as

$f_{\vec{u}}(a,b) = D_{\vec{u}}(f)(a,b) = \lim_{h\to 0}\frac{f((a,b)+h\vec{u}) - f(a,b)}{h}$

[Technically, we need $\vec{u}$ to be a unit vector, $\|\vec{u}\| = 1$; for other vectors $\vec{v}$, we would define $D_{\vec{v}}(f) = D_{\vec{v}/\|\vec{v}\|}(f)$.]

But running to the limit definition all of the time would take up way too much of our time; we need a better way to calculate directional derivatives! We can figure out how to do this using differentials:

For $\vec{u} = (u_1,u_2)$, $f((a,b)+h\vec{u}) - f(a,b) \approx df = f_x(a,b)\,hu_1 + f_y(a,b)\,hu_2$, so

$\frac{f((a,b)+h\vec{u}) - f(a,b)}{h} \approx f_x(a,b)\,u_1 + f_y(a,b)\,u_2$

and so taking the limit, we find that $f_{\vec{u}}(a,b) = f_x(a,b)\,u_1 + f_y(a,b)\,u_2 = (f_x(a,b),f_y(a,b))\cdot\vec{u}$.

The vector (f_x(a,b),f_y(a,b)) is going to come up often enough that we will give it its own name;

$(f_x(a,b),f_y(a,b)) = \nabla f(a,b) = \mathrm{grad}(f)(a,b)$ = the gradient of f

So the derivative of f in the direction of $\vec{u}$ is the dot product of $\vec{u}$ with the gradient of f. This means that (when $\theta$ is the angle between $\nabla f$ and $\vec{u}$), $D_{\vec{u}}(f) = \|\nabla f\|\cdot\|\vec{u}\|\cdot\cos(\theta) = \|\nabla f\|\cos(\theta)$, since $\|\vec{u}\| = 1$. This is largest when $\cos(\theta) = 1$, i.e., $\theta = 0$, i.e., $\vec{u}$ points in the same direction as $\nabla f$. So $\nabla f$ points in the direction of largest increase for the function f, at every point (a,b). Its length is this maximum rate of increase.

On the other hand, when $\vec{u}$ points in the same direction as the level curve through the point (a,b) (i.e., it is tangent to the level curve), then the rate of change of f in that direction is 0; so $\nabla f\cdot\vec{u} = 0$, i.e., $\nabla f\perp\vec{u}$. This means that $\nabla f$ is perpendicular to the level curves of f, at every point (a,b).
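The gradient formula for directional derivatives can be sketched in Python and compared against the limit definition. The example function f(x,y) = x²y and the base point (1,2) are hypothetical; the direction 3i-4j is the one asked about at the start of this section, and it is rescaled to a unit vector first.

```python
import math

def f(x, y):
    # Hypothetical example function.
    return x ** 2 * y

def grad_f(x, y):
    """Gradient of x^2 * y, computed by hand: (2xy, x^2)."""
    return (2 * x * y, x ** 2)

def directional_derivative(grad, v):
    """Dot product of the gradient with the unit vector in the direction of v."""
    norm = math.hypot(v[0], v[1])
    u = (v[0] / norm, v[1] / norm)
    return grad[0] * u[0] + grad[1] * u[1]

g = grad_f(1.0, 2.0)                       # (4.0, 1.0)
rate = directional_derivative(g, (3, -4))  # 4*(3/5) + 1*(-4/5), approximately 1.6

# Compare with the difference quotient from the definition, using a small h.
h = 1e-6
quotient = (f(1.0 + h * 0.6, 2.0 - h * 0.8) - f(1.0, 2.0)) / h

print(rate, quotient)                # both approximately 1.6
print(math.hypot(g[0], g[1]))        # maximum rate of increase: the length of the gradient
```

Moving perpendicular to the gradient, for instance in the direction (1,-4) here, gives a directional derivative of 0, as expected along a level curve.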

§ 5:: Gradients for functions of 3 variables

For functions of 3 variables, everything works pretty much the same. We can make a similar construction of the directional derivative of w = f(x,y,z); using the differential of f,

$df = f_x(a,b,c)\,dx + f_y(a,b,c)\,dy + f_z(a,b,c)\,dz$

we can compute that $D_{\vec{u}}(f) = \nabla f\cdot\vec{u}$, where $\nabla f = (f_x,f_y,f_z)$ is the gradient of f. For the exact same reasons, this means that $\nabla f$ points in the direction of maximal increase for f, and $\nabla f$ is perpendicular to the level surfaces for f.

We can use the gradient of functions of 3 variables to help us understand the graphs of functions of two variables, since we can think of the graph of a function of two variables, z = f(x,y), as a level surface of a function of 3 variables

g(x,y,z) = f(x,y)-z = 0

The gradient of g is perpendicular to its level surfaces, so it is perpendicular to the graph of f, and so gives us the normal vector for the tangent plane to the graph of f. Computing, we find that

$\nabla g = \left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y},-1\right) = \vec{n}$

which means that the equation for the tangent plane to the graph of z = f(x,y) at the point (a,b,f(a,b)) is

$\left(\frac{\partial f}{\partial x}(a,b),\frac{\partial f}{\partial y}(a,b),-1\right)\cdot(x-a,y-b,z-f(a,b)) = 0$

§ 6:: The Chain Rule

If f is a function of the variables x and y, and both x and y depend on a single variable t, then in a certain sense, f is a function of t; f(x,y) = f(x(t),y(t)); it is a composition. To find its derivative with respect to t, we can turn to differentials:

$df = f_x\,dx+f_y\,dy$, while $dx = \frac{dx}{dt}\,dt$ and $dy = \frac{dy}{dt}\,dt$. Putting these together we get

$df = \left(\frac{\partial f}{\partial x}\frac{dx}{dt}+\frac{\partial f}{\partial y}\frac{dy}{dt}\right)dt = \frac{df}{dt}\,dt$, which implies that $\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt}+\frac{\partial f}{\partial y}\frac{dy}{dt}$

This is the (or rather, one of the) Chain Rule(s) for functions of several variables. A similar line of reasoning would lead us to:

If z = f(u,v) and u = u(x,y) and v = v(x,y), then

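$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial x}+\frac{\partial f}{\partial v}\frac{\partial v}{\partial x}$. A similar formula would hold for $\frac{\partial f}{\partial y}$.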
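The one-parameter chain rule for df/dt can be sanity-checked numerically. The sketch below assumes the hypothetical choices f(x,y) = x²y, x(t) = cos t, y(t) = sin t, and compares the chain rule value with a centered difference quotient of the composition.

```python
import math

def f(x, y):
    # Hypothetical example function.
    return x ** 2 * y

def x_of(t):
    return math.cos(t)

def y_of(t):
    return math.sin(t)

def df_dt_chain(t):
    """df/dt = f_x * dx/dt + f_y * dy/dt, with every derivative entered by hand."""
    x, y = x_of(t), y_of(t)
    f_x, f_y = 2 * x * y, x ** 2
    dx_dt, dy_dt = -math.sin(t), math.cos(t)
    return f_x * dx_dt + f_y * dy_dt

def df_dt_numeric(t, h=1e-6):
    """Centered difference quotient of the composition t -> f(x(t), y(t))."""
    return (f(x_of(t + h), y_of(t + h)) - f(x_of(t - h), y_of(t - h))) / (2 * h)

t = 0.7
print(df_dt_chain(t), df_dt_numeric(t))  # the two values agree to several digits
```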

In general, we can imagine a composition of functions of several variables as a picture with each variable linked by a line going up to functions it is a variable of, and linked by a line going down to variables it is a function of, with the original function f at the top. To find the derivative of f with respect to a variable, one finds all paths leading down from f to the variable, multiplies together, along each path, the partial derivatives of each variable w.r.t. the variable below it, and adds these products together, one for each path. This can, as before, be verified using differentials.
