Chapter 11: Functions of Several Variables
Function of several variables: several inputs, one output. Picture a quantity which depends on several different quantities. E.g., the distance from the origin in the plane,
$d = \sqrt{x^2+y^2}$,
depends on both the x- and y-coordinates of our point.
Our goal is to understand functions of several variables, in much the same way that the tools of calculus allow us to understand functions of one variable. And our basic tool is going to be to think of a function of several variables as a function of one variable (at a time!), so that we can use those tools to good effect.
Cartesian coordinates: three axes, each perpendicular to one another, all meeting at the origin (0,0,0). Axes are labelled by the right hand rule; the thumb, forefinger, and middle finger of the right hand point in the x, y, and z directions, respectively. The point (a,b,c) is the one reached from the origin by moving a units in the direction of the (positive) x-axis, then b units in the y direction, and then c units in the z direction.
The distance between points (a,b,c) and (x,y,z) is given by a formula very similar to the one for the plane:
$d = \sqrt{(x-a)^2+(y-b)^2+(z-c)^2}$
So, e.g., the points satisfying $x^2+y^2+z^2 = 9$ are all of the points with distance 3 from the origin (0,0,0), i.e., they form a sphere of radius 3, centered at the origin.
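As a small illustration, the distance formula can be sketched in Python. The function name `distance` and the sample point are choices made here for illustration, not part of the notes.

```python
import math

def distance(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

origin = (0.0, 0.0, 0.0)
point = (1.0, 2.0, 2.0)       # 1^2 + 2^2 + 2^2 = 9
d = distance(origin, point)   # distance 3, so the point lies on the sphere of radius 3
```

The same function works in the plane, since it only assumes both points have the same number of coordinates.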
If we set y = c = constant, and look at z = f(x,c), we are looking at a function of one variable, x, which we can (in theory) graph. This graph is what we would see when the plane y = c meets the graph z = f(x,y); this is a (vertical) cross section of our graph (parallel to the plane y = 0, the xz-plane). Similarly, if we set x = d = constant, and look at the graph of z = f(d,y) (as a function of y), we are seeing vertical cross sections of our original graph, parallel to the yz-plane. Several of these x- and y-cross sections together can give a very good picture of the general shape of the graph of our function z = f(x,y).
Some of the simplest functions to describe are linear functions: functions having equations of the form z = ax+by+c. Their cross sections are all lines; the x-cross sections (x = constant) all have the same slope b, and the y-cross sections (y = constant) all have slope a.
Another simple type of graph is a cylinder: this is the graph of a function like $f(x,y) = y^2$ which, although we think of it as a function of x and y, has an output that does not depend on one of the inputs. Cross sections of such a function, taken by setting the variable that does not affect the output equal to a constant, are all identical, so the graph looks like copies of the exact same curve, stacked side-by-side.
A level curve of f is the set of points (x,y) in the domain where f(x,y) = c, for a fixed constant c. A collection of level curves also gives a good picture of what the graph of our function f looks like; we can imagine wandering through the domain of our function, reading off the value of the function f by looking at what level curve we are standing on. We usually draw level curves for equally spaced values of c; that way, if the level curves are close together, we know that the function is changing values rapidly in that region, while if they are far apart, the values of the function are not varying by a large amount in that area.
We usually, for convenience, draw the level curves of f on a single xy-plane (since we can keep them somewhat separate), labelling each curve with its z-value. We could reconstruct a picture of the graph of f by simply drawing the level curve f(x,y) = c on the horizontal plane z = c in 3-space.
The most general equation for a plane in 3-space is
$Ax+By+Cz = D$,
although typically, our planes will come in the form z = ax+by+c. The number a is called the x-slope, since it tells us how much z changes if we move 1 unit in the positive x-direction. For similar reasons, b is called the y-slope.
Typically, there is exactly one plane passing through any three particular points in 3-space (unless the points happen to lie on a single line, in which case many planes pass through them). Later we will see how to determine an equation for this plane.
We can also completely describe a plane by knowing a point $(x_0,y_0,z_0)$ on the plane, and its x-slope m and y-slope n; the equation for the plane is then
$z = z_0 + m(x-x_0) + n(y-y_0)$.
This will often be our method for finding equations of planes, since it is these three pieces of information which we will know when trying to compute the tangent plane to the graph of a function, as we shall see.
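A minimal sketch of this description of a plane, assuming the point-slope form $z = z_0 + m(x-x_0) + n(y-y_0)$; the function name `plane_from_point_and_slopes` and the sample numbers are hypothetical.

```python
def plane_from_point_and_slopes(x0, y0, z0, m, n):
    """Return the function z(x, y) whose graph is the plane through
    (x0, y0, z0) with x-slope m and y-slope n."""
    def z(x, y):
        return z0 + m * (x - x0) + n * (y - y0)
    return z

z = plane_from_point_and_slopes(1.0, 2.0, 3.0, m=2.0, n=-1.0)
at_point = z(1.0, 2.0)   # the plane passes through the given point
step_x = z(2.0, 2.0)     # moving 1 unit in x changes z by the x-slope m
step_y = z(1.0, 3.0)     # moving 1 unit in y changes z by the y-slope n
```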
If we look at the level curves of a linear function, z = ax+by+c = d = constant, they are a collection of parallel lines; if we choose equally spaced horizontal levels, they will be equally spaced parallel lines.
Chapter 12: Vectors
Basically, a vector $\vec{v}$ is an arrow pointing from one point in the plane (or 3-space or ...) to another. A vector is thought of as pointing from its tail to its head. If it points from P to Q, we call the vector $\vec{v} = \overrightarrow{PQ}$.
A vector has both a size (= length = distance from P to Q) and a direction. Vectors that have the same size and point in the same direction are often thought of as the same, even if they have different tails (and heads). Put differently, by picking up the vector and translating it so that its tail is at the origin (0,0), we can identify $\vec{v}$ with a point in the plane, namely its head (x,y), and write $\vec{v} = (x,y)$. If $\vec{v}$ goes from (a,b) to (c,d), then we would have $\vec{v} = (c-a,d-b)$. The length of $\vec{v} = (a,b)$ is then $\|\vec{v}\| = \sqrt{a^2+b^2}$.
In 3-space we have three special vectors, pointing in the direction of each coordinate axis (in the plane there are, analogously, two); these are called
$\vec{i} = (1,0,0)$, $\vec{j} = (0,1,0)$, and $\vec{k} = (0,0,1)$.
These come in especially handy when we start to add vectors. There are several different points of view to vector addition:
(1) move the vector $\vec{w}$ so that its tail is on the head of $\vec{v}$; then the vector $\vec{v}+\vec{w}$ has tail equal to the tail of $\vec{v}$ and head equal to the head of $\vec{w}$;
(2) move $\vec{v}$ and $\vec{w}$ so that their tails are both at the origin, and build the parallelogram which has sides equal to $\vec{v}$ and $\vec{w}$; then $\vec{v}+\vec{w}$ is the vector that goes from the origin to the opposite corner of the parallelogram;
(3) if $\vec{v} = (a,b)$ and $\vec{w} = (c,d)$, then $\vec{v}+\vec{w} = (a+c,b+d)$.
We can also subtract vectors; if they share the same tail, $\vec{v}-\vec{w}$ is the vector that points from the head of $\vec{w}$ to the head of $\vec{v}$ (so that $\vec{w}+(\vec{v}-\vec{w}) = \vec{v}$). In coordinates, we simply subtract the coordinates.
We can also rescale vectors, i.e., multiply them by a constant factor; $a\vec{v}$ is the vector pointing in the same direction as $\vec{v}$, but $|a|$ times as long. (We use the convention that if a < 0, then $a\vec{v}$ points in the opposite direction from $\vec{v}$.)
Using coordinates, this means that a(x,y) = (ax,ay). To distinguish a from the coordinates or the vector, we call a a scalar.
All of these operations satisfy all of the usual properties you would expect:
$\vec{v}+\vec{w} = \vec{w}+\vec{v}$
$(\vec{v}+\vec{w})+\vec{u} = \vec{v}+(\vec{w}+\vec{u})$
$a(b\vec{v}) = (ab)\vec{v}$
$a(\vec{v}+\vec{w}) = a\vec{v}+a\vec{w}$
Of course there is nothing special in all of this about vectors in the plane; all of these ideas work for vectors in 3-space, or 4-space, or .....
The first way to multiply two vectors, the dot product, is intended to measure the extent to which two vectors $\vec{v}$ and $\vec{w}$ are pointing in the same direction. It takes a pair of vectors $\vec{v} = (v_1,\ldots,v_n)$ and $\vec{w} = (w_1,\ldots,w_n)$, and gives us a scalar $\vec{v}\cdot\vec{w} = v_1w_1+\cdots+v_nw_n$.
Note that $\vec{v}\cdot\vec{v} = v_1^2+\cdots+v_n^2 = \|\vec{v}\|^2$. In general, $\vec{v}\cdot\vec{w} = \|\vec{v}\|\,\|\vec{w}\|\cos\theta$, where $\theta$ is the angle between the vectors $\vec{v}$ and $\vec{w}$ (when they have the same tail); this can be seen by an application of the Law of Cosines. This in turn allows us to compute this angle:
The angle $\theta$ between $\vec{v}$ and $\vec{w}$ is the angle between 0 and $\pi$ with $\cos\theta = \dfrac{\vec{v}\cdot\vec{w}}{\|\vec{v}\|\,\|\vec{w}\|}$.
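The angle computation can be sketched as follows. The helper names `dot`, `norm`, and `angle` are chosen here for illustration, and the clamp guards against floating-point rounding pushing the cosine slightly outside [-1, 1].

```python
import math

def dot(v, w):
    """Dot product of two vectors given as equal-length tuples."""
    return sum(vi * wi for vi, wi in zip(v, w))

def norm(v):
    """Length of a vector, using v . v = ||v||^2."""
    return math.sqrt(dot(v, v))

def angle(v, w):
    """Angle in radians (between 0 and pi) between nonzero vectors v and w."""
    c = dot(v, w) / (norm(v) * norm(w))
    c = max(-1.0, min(1.0, c))   # clamp to the domain of acos
    return math.acos(c)

quarter = angle((1.0, 0.0), (1.0, 1.0))             # about pi/4
right = angle((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))     # about pi/2
```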
The dot product satisfies some properties which justify calling it a product:
$\vec{v}\cdot\vec{w} = \vec{w}\cdot\vec{v}$
$(k\vec{v})\cdot\vec{w} = k(\vec{v}\cdot\vec{w})$
$\vec{v}\cdot(\vec{w}+\vec{u}) = \vec{v}\cdot\vec{w}+\vec{v}\cdot\vec{u}$
Two vectors are orthogonal (= perpendicular) if the angle $\theta$ between them is $\pi/2$, so $\cos\theta = 0$; this means that $\vec{v}\cdot\vec{w} = 0$. We write $\vec{v}\perp\vec{w}$.
We've seen that it usually takes three pieces of information to describe a plane in 3-space (3 points in the plane, or a point and the x- and y-slopes). Using dot products, however, we can describe it using only two:
Every plane has a normal vector $\vec{n}$; $\vec{n}$ is orthogonal to the vector $\overrightarrow{PQ}$ for any pair of points P and Q in the plane. For example, the vector $\vec{k}$ is perpendicular to the xy-plane, since it is perpendicular to every vector of the form (a,b,0). Given such a normal vector $\vec{n} = (A,B,C)$ and a point $(x_0,y_0,z_0)$ in the plane, every other point (x,y,z) in the plane must satisfy $\vec{n}\cdot(x-x_0,y-y_0,z-z_0) = 0$; writing this in coordinates gives the equation for the plane:
$A(x-x_0)+B(y-y_0)+C(z-z_0) = 0$, i.e., $Ax+By+Cz = D$ with $D = Ax_0+By_0+Cz_0$.
This means that if we are given the equation of the plane, we can quickly read off what the normal vector for the plane is, as well: it is the vector of coefficients of x, y, and z.
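A short sketch of the normal-vector description, assuming the plane is stored as coefficients (A, B, C, D) of $Ax+By+Cz = D$ with normal vector (A, B, C); the function names are hypothetical.

```python
def plane_from_normal(n, p0):
    """Coefficients (A, B, C, D) of the plane A*x + B*y + C*z = D
    with normal vector n passing through the point p0."""
    A, B, C = n
    D = A * p0[0] + B * p0[1] + C * p0[2]   # n . p0
    return (A, B, C, D)

def on_plane(coeffs, p, tol=1e-12):
    """True when the point p satisfies the plane equation up to tol."""
    A, B, C, D = coeffs
    return abs(A * p[0] + B * p[1] + C * p[2] - D) <= tol

# The xy-plane: normal vector k = (0, 0, 1), passing through the origin.
xy_plane = plane_from_normal((0.0, 0.0, 1.0), (0.0, 0.0, 0.0))
inside = on_plane(xy_plane, (3.0, -2.0, 0.0))    # a point of the form (a, b, 0)
outside = on_plane(xy_plane, (0.0, 0.0, 1.0))    # a point off the plane
```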
Another application: projecting one vector onto another.
The idea is to figure out how much of one vector $\vec{v}$ points in the direction of another vector $\vec{w}$. The dot product measures to what extent they are pointing in the same direction, so it is only natural that it plays a role.
What we wish to do is to write $\vec{v} = c\vec{w}+\vec{u}$, where $\vec{u}\perp\vec{w}$ (i.e., write $\vec{v}$ as the part pointing in the direction of $\vec{w}$ plus the part perpendicular to $\vec{w}$). By solving the equation $(\vec{v}-c\vec{w})\cdot\vec{w} = 0$, we find that $c = (\vec{v}\cdot\vec{w})/(\vec{w}\cdot\vec{w})$.
We write $c\vec{w} = \mathrm{proj}_{\vec{w}}\vec{v} = \dfrac{\vec{v}\cdot\vec{w}}{\vec{w}\cdot\vec{w}}\,\vec{w} = \dfrac{\vec{v}\cdot\vec{w}}{\|\vec{w}\|}\,\dfrac{\vec{w}}{\|\vec{w}\|}$, the (orthogonal) projection of $\vec{v}$ onto $\vec{w}$.
The perpendicular part is then $\vec{u} = \vec{v}-c\vec{w}$.
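The decomposition into a parallel and a perpendicular part can be sketched like this, using the scalar c = (v . w)/(w . w); the names `project`, `proj`, and `perp` are chosen here for illustration.

```python
def dot(v, w):
    """Dot product of two vectors given as equal-length tuples."""
    return sum(vi * wi for vi, wi in zip(v, w))

def project(v, w):
    """Split v into its projection onto w and the part perpendicular to w.
    Assumes w is not the zero vector."""
    c = dot(v, w) / dot(w, w)
    proj = tuple(c * wi for wi in w)
    perp = tuple(vi - pi for vi, pi in zip(v, proj))
    return proj, perp

proj, perp = project((3.0, 4.0), (1.0, 0.0))
check = dot(perp, (1.0, 0.0))   # zero, since perp is orthogonal to w
```

Adding `proj` and `perp` coordinate by coordinate recovers the original vector.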
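One ingredient we will need: the area of a parallelogram with sides the vectors $\vec{v}$ and $\vec{w}$:
Area = (base)×(height) = $\|\vec{w}\|\cdot h = \|\vec{w}\|\,\|\vec{v}\|\sin\theta$ (from trigonometry).
Now, for no apparent reason, we define, for $\vec{v} = (v_1,v_2,v_3)$ and $\vec{w} = (w_1,w_2,w_3)$, the cross product
$\vec{v}\times\vec{w} = (v_2w_3-v_3w_2,\ v_3w_1-v_1w_3,\ v_1w_2-v_2w_1)$.
We do this because, it turns out, $(\vec{v}\times\vec{w})\perp\vec{v}$ and $(\vec{v}\times\vec{w})\perp\vec{w}$. [This is just a long tedious computation; take the dot products!] Also, a similar computation will produce
$\|\vec{v}\times\vec{w}\| = \|\vec{v}\|\,\|\vec{w}\|\sin\theta$,
which is the area of the parallelogram with sides $\vec{v}$ and $\vec{w}$.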
How do you remember this formula? Most people remember it using the notation
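$$\vec{v}\times\vec{w} = \begin{vmatrix} \vec{i} & \vec{j} & \vec{k} \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix} = \begin{vmatrix} v_2 & v_3 \\ w_2 & w_3 \end{vmatrix}\vec{i} - \begin{vmatrix} v_1 & v_3 \\ w_1 & w_3 \end{vmatrix}\vec{j} + \begin{vmatrix} v_1 & v_2 \\ w_1 & w_2 \end{vmatrix}\vec{k}$$
where $\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad-bc$.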
The cross product satisfies the following equalities:
$\vec{v}\times\vec{w} = -\vec{w}\times\vec{v}$
$(k\vec{v})\times\vec{w} = k(\vec{v}\times\vec{w})$
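A quick numerical check of these facts, assuming the standard component formula for the cross product of two vectors in 3-space; the function names and sample vectors are illustrative.

```python
def dot(v, w):
    """Dot product of two vectors given as equal-length tuples."""
    return sum(vi * wi for vi, wi in zip(v, w))

def cross(v, w):
    """Cross product of two vectors in 3-space."""
    v1, v2, v3 = v
    w1, w2, w3 = w
    return (v2 * w3 - v3 * w2, v3 * w1 - v1 * w3, v1 * w2 - v2 * w1)

v = (1, 2, 3)
w = (4, 5, 6)
n = cross(v, w)
perp_to_v = dot(n, v)   # zero: v x w is perpendicular to v
perp_to_w = dot(n, w)   # zero: v x w is perpendicular to w
flipped = cross(w, v)   # the negative of v x w
```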
One immediate application is the equation for a plane:
To find the plane through the three points P, Q, and R in 3-space, look at the vectors $\vec{v} = \overrightarrow{PQ}$ and $\vec{w} = \overrightarrow{PR}$. These are vectors between points in our plane, and so they give a pair of directions in the plane. They then must both be perpendicular to the normal vector $\vec{n}$ for the plane. But we know that they are both perpendicular to $\vec{v}\times\vec{w}$, and so $\vec{v}\times\vec{w}$ must be perpendicular to the plane. In other words, we can choose our normal vector to be $\vec{v}\times\vec{w}$. Using one of our original points (P, say) as a point in the plane, we can write down the equation for the plane using our dot product equation, above.
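The recipe for the plane through three points can be sketched as below. The function name `plane_through` and the output convention (coefficients of $Ax+By+Cz = D$) are assumptions made for this example.

```python
def plane_through(P, Q, R, tol=1e-12):
    """Coefficients (A, B, C, D) of the plane A*x + B*y + C*z = D
    through the points P, Q, R, using n = PQ x PR as the normal vector."""
    v = tuple(q - p for p, q in zip(P, Q))   # vector from P to Q
    w = tuple(r - p for p, r in zip(P, R))   # vector from P to R
    n = (v[1] * w[2] - v[2] * w[1],
         v[2] * w[0] - v[0] * w[2],
         v[0] * w[1] - v[1] * w[0])
    if all(abs(c) <= tol for c in n):
        # PQ and PR are parallel, so the points lie on one line.
        raise ValueError("points are collinear; the plane is not unique")
    D = sum(ni * pi for ni, pi in zip(n, P))  # n . P
    return n + (D,)

coeffs = plane_through((1, 0, 0), (0, 1, 0), (0, 0, 1))   # the plane x + y + z = 1
```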
Chapter 13: Differentiation
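In one-variable calculus, the derivative of a function y = f(x) is defined as a limit of difference quotients,
$f'(x) = \lim_{h\to 0}\dfrac{f(x+h)-f(x)}{h}$,
and interpreted as an instantaneous rate of change, or slope of tangent line.
But a function of two variables has two variables; which one do you increment by h to get your difference quotient? The answer is both of them, one at a time. In other words, a function of two variables z = f(x,y) has two (partial) derivatives:
$\dfrac{\partial f}{\partial x} = \lim_{h\to 0}\dfrac{f(x+h,y)-f(x,y)}{h}$ and $\dfrac{\partial f}{\partial y} = \lim_{h\to 0}\dfrac{f(x,y+h)-f(x,y)}{h}$
Essentially, $\dfrac{\partial f}{\partial x}$ is the derivative of f, thought of solely as a function of x (i.e., pretending that y is a constant), while $\dfrac{\partial f}{\partial y}$ is the derivative of f, thought of solely as a function of y.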
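To see the "one variable at a time" idea numerically, here is a rough sketch that approximates each partial derivative by a difference quotient with a small fixed h instead of taking the limit. The sample function and the step size are arbitrary choices for illustration.

```python
def partial_x(f, x, y, h=1e-6):
    """Difference quotient in x, holding y constant."""
    return (f(x + h, y) - f(x, y)) / h

def partial_y(f, x, y, h=1e-6):
    """Difference quotient in y, holding x constant."""
    return (f(x, y + h) - f(x, y)) / h

def f(x, y):
    return x**2 * y + y**3

# By hand: f_x = 2xy and f_y = x^2 + 3y^2, so at (1, 2) we expect 4 and 13.
approx_fx = partial_x(f, 1.0, 2.0)
approx_fy = partial_y(f, 1.0, 2.0)
```

The approximations agree with the hand-computed values only up to an error on the order of h.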
Different viewpoint, same result:
For one-variable calculus, $f'(x)$ is the slope of the tangent line to the graph of f. As we shall see, the graph of a function of two variables has something we would naturally call a tangent plane, and one way to describe a plane is by computing its x- and y-slopes, i.e., the rate of change of f solely in the x- and y-directions. But this is precisely what the limits above calculate; so $\dfrac{\partial f}{\partial x}$ will be the x-slope of the tangent plane, and $\dfrac{\partial f}{\partial y}$ will be the y-slope.
The basic picture here is:
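Just as with one variable, there are lots of different notations for describing the partial derivatives: for z = f(x,y),
$\dfrac{\partial f}{\partial x} = f_x = \dfrac{\partial z}{\partial x} = z_x$ and $\dfrac{\partial f}{\partial y} = f_y = \dfrac{\partial z}{\partial y} = z_y$.
The familiar differentiation rules carry over:
$\dfrac{\partial}{\partial x}(f+g) = \dfrac{\partial f}{\partial x}+\dfrac{\partial g}{\partial x}$ and $\dfrac{\partial}{\partial y}(f+g) = \dfrac{\partial f}{\partial y}+\dfrac{\partial g}{\partial y}$
$\dfrac{\partial}{\partial x}(c\cdot f) = c\,\dfrac{\partial f}{\partial x}$ and $\dfrac{\partial}{\partial y}(c\cdot f) = c\,\dfrac{\partial f}{\partial y}$
$\dfrac{\partial}{\partial x}(f\cdot g) = \dfrac{\partial f}{\partial x}\,g+f\,\dfrac{\partial g}{\partial x}$ (etc.)
$\dfrac{\partial}{\partial x}(f/g) = \left(\dfrac{\partial f}{\partial x}\,g-f\,\dfrac{\partial g}{\partial x}\right)/g^2$ (etc.)
$\dfrac{\partial}{\partial x}\big(h(f(x,y))\big) = h'(f(x,y))\cdot\dfrac{\partial f}{\partial x}$ (etc.)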
In the end the way we should get used to taking a partial derivative is exactly the same as for functions of one variable; just read from the outside in, applying each rule as it is appropriate. The only difference now is that when taking a derivative of a function z = f(x,y), we need to remember which variable we are differentiating with respect to, and treat the other variable as a constant.
Functions of two variables are really no different; as we zoom in, the graph of our function f starts to look like a plane - the graph's tangent plane. Finding the equation of this tangent plane is really a matter of determining its x- and y-slopes, which is precisely what the partial derivatives of f do. The x-slope is the rate of change of the function in the x-direction, i.e., the partial derivative with respect to x; and similarly for the y-slope.
So the equation for the tangent plane to the graph of z = f(x,y) at the point (a,b,f(a,b)) is
$z = f(a,b)+f_x(a,b)(x-a)+f_y(a,b)(y-b)$.
And just as with one-variable calculus, one use we put this to is to find good approximations to f(x,y) at points near (a,b):
$f(x,y) \approx f(a,b)+f_x(a,b)(x-a)+f_y(a,b)(y-b)$.
As with one variable, this also goes hand-in-hand with the idea of differentials:
$df = f_x(a,b)\,dx+f_y(a,b)\,dy$.
And as before, $f(x,y)-f(a,b) \approx df$, when dx = x-a and dy = y-b are small.
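As a worked illustration of the tangent plane approximation, here is a sketch using the distance function f(x,y) = sqrt(x^2+y^2) near (3,4), where f = 5, f_x = 3/5, and f_y = 4/5 (computed by hand). The helper name `linearize` is hypothetical.

```python
import math

def linearize(f_ab, fx_ab, fy_ab, a, b):
    """Return L(x, y) = f(a,b) + f_x(a,b)(x - a) + f_y(a,b)(y - b)."""
    def L(x, y):
        return f_ab + fx_ab * (x - a) + fy_ab * (y - b)
    return L

def f(x, y):
    return math.sqrt(x**2 + y**2)

L = linearize(5.0, 3.0 / 5.0, 4.0 / 5.0, 3.0, 4.0)
estimate = L(3.1, 3.9)        # 5 + 0.6*(0.1) + 0.8*(-0.1), about 4.98
exact = f(3.1, 3.9)           # sqrt(24.82)
error = abs(exact - estimate) # small, because (3.1, 3.9) is close to (3, 4)
```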
By writing $f_x(a,b) = \lim_{h\to 0}\dfrac{f((a,b)+h(1,0))-f(a,b)}{h}$
and $f_y(a,b) = \lim_{h\to 0}\dfrac{f((a,b)+h(0,1))-f(a,b)}{h}$,
we can make the two derivatives look the same; which motivates us to define the directional derivative of f at (a,b), in the direction of the vector $\vec{u}$, as
$f_{\vec{u}}(a,b) = D_{\vec{u}}f(a,b) = \lim_{h\to 0}\dfrac{f((a,b)+h\vec{u})-f(a,b)}{h}$
[Technically, we need $\vec{u}$ to be a unit vector, $\|\vec{u}\| = 1$; for other vectors $\vec{v}$, we would define $D_{\vec{v}}(f) = D_{\vec{v}/\|\vec{v}\|}(f)$.]
But running to the limit definition all of the time would take up way too much of our time; we need a better way to calculate directional derivatives! We can figure out how to do this using differentials:
For $\vec{u} = (u_1,u_2)$, $f((a,b)+h\vec{u})-f(a,b) \approx df = f_x(a,b)\,hu_1+f_y(a,b)\,hu_2$, so
$\dfrac{f((a,b)+h\vec{u})-f(a,b)}{h} \approx f_x(a,b)\,u_1+f_y(a,b)\,u_2$,
and so taking the limit, we find that $f_{\vec{u}}(a,b) = f_x(a,b)\,u_1+f_y(a,b)\,u_2 = (f_x(a,b),f_y(a,b))\cdot\vec{u}$.
The vector $(f_x(a,b),f_y(a,b))$ is going to come up often enough that we will give it its own name:
$\nabla f(a,b) = (f_x(a,b),f_y(a,b))$, the gradient of f.
So the derivative of f in the direction of $\vec{u}$ is the dot product of $\vec{u}$ with the gradient of f. This means that (when $\theta$ is the angle between $\nabla f$ and $\vec{u}$), $D_{\vec{u}}(f) = \|\nabla f\|\,\|\vec{u}\|\cos\theta = \|\nabla f\|\cos\theta$. This is largest when $\cos\theta = 1$, i.e., $\theta = 0$, i.e., $\vec{u}$ points in the same direction as $\nabla f$. So $\nabla f$ points in the direction of largest increase for the function f, at every point (a,b). Its length is this maximum rate of increase.
On the other hand, when $\vec{u}$ points in the same direction as the level curve through the point (a,b) (i.e., it is tangent to the level curve), then the rate of change of f in that direction is 0; so $\nabla f\cdot\vec{u} = 0$, i.e., $\nabla f\perp\vec{u}$. This means that $\nabla f$ is perpendicular to the level curves of f, at every point (a,b).
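These properties of the gradient can be checked on a small example. The sketch below uses the sample function f(x,y) = x^2 + 3xy, with gradient (2x+3y, 3x) computed by hand; the function names are chosen for illustration, and the direction vector is normalized inside the helper so that it need not be a unit vector.

```python
import math

def f(x, y):
    return x**2 + 3 * x * y

def grad_f(x, y):
    # Partial derivatives computed by hand: f_x = 2x + 3y, f_y = 3x.
    return (2 * x + 3 * y, 3 * x)

def directional_derivative(grad, u):
    """Gradient dotted with the unit vector in the direction of u."""
    length = math.hypot(u[0], u[1])
    return (grad[0] * u[0] + grad[1] * u[1]) / length

g = grad_f(1.0, 2.0)                                     # (8.0, 3.0)
along_u = directional_derivative(g, (3.0, 4.0))          # 8*0.6 + 3*0.8
along_grad = directional_derivative(g, g)                # the length of the gradient
along_level = directional_derivative(g, (-g[1], g[0]))   # direction perpendicular to the gradient

# Compare with the difference quotient from the limit definition, using a small fixed h.
h = 1e-6
quotient = (f(1.0 + h * 0.6, 2.0 + h * 0.8) - f(1.0, 2.0)) / h
```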
For a function of three variables, f(x,y,z), we can compute in the same way that $D_{\vec{u}}(f) = \nabla f\cdot\vec{u}$, where $\nabla f = (f_x,f_y,f_z)$ is the gradient of f. For the exact same reasons, this means that $\nabla f$ points in the direction of maximal increase for f, and $\nabla f$ is perpendicular to the level surfaces for f.
We can use the gradient of functions of 3 variables to help us understand the graphs of functions of two variables, since we can think of the graph of a function of two variables, z = f(x,y), as a level surface of a function of 3 variables:
$g(x,y,z) = f(x,y)-z$, whose level surface g = 0 is the graph of f.
The gradient of g is perpendicular to its level surfaces, so it is perpendicular to the graph of f, and so gives us a normal vector for the tangent plane to the graph of f. Computing, we find that
$\nabla g = (f_x(x,y),\ f_y(x,y),\ -1)$,
which means that the equation for the tangent plane to the graph of z = f(x,y) at the point (a,b,f(a,b)) is
$f_x(a,b)(x-a)+f_y(a,b)(y-b)-(z-f(a,b)) = 0$, i.e., $z = f(a,b)+f_x(a,b)(x-a)+f_y(a,b)(y-b)$.
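If x = x(t) and y = y(t) are themselves functions of t, then f(x(t),y(t)) is a function of t. Here $df = f_x\,dx+f_y\,dy$, while $dx = \dfrac{dx}{dt}\,dt$ and $dy = \dfrac{dy}{dt}\,dt$. Putting these together we get
$df = \left(\dfrac{\partial f}{\partial x}\dfrac{dx}{dt}+\dfrac{\partial f}{\partial y}\dfrac{dy}{dt}\right)dt = \dfrac{df}{dt}\,dt$, which implies that $\dfrac{df}{dt} = \dfrac{\partial f}{\partial x}\dfrac{dx}{dt}+\dfrac{\partial f}{\partial y}\dfrac{dy}{dt}$.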
This is the (or rather, one of the) Chain Rule(s) for functions of several variables. A similar line of reasoning would lead us to:
If z = f(u,v) and u = u(x,y) and v = v(x,y), then
$\dfrac{\partial f}{\partial x} = \dfrac{\partial f}{\partial u}\dfrac{\partial u}{\partial x}+\dfrac{\partial f}{\partial v}\dfrac{\partial v}{\partial x}$. A similar formula would hold for $\dfrac{\partial f}{\partial y}$.
In general, we can imagine a composition of functions of several variables as a picture with each variable linked by a line going up to functions it is a variable of, and linked by a line going down to variables it is a function of, with the original function f at the top. To find the derivative of f with respect to a variable, one finds all paths leading down from f to the variable, multiplying together all of the partial derivatives of one variable w.r.t. the variable below it, and adding these products together, one for each path. This can, as before, be verified using differentials.
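The simplest chain rule, df/dt = f_x dx/dt + f_y dy/dt, can be checked numerically. The sketch below uses the sample choices f(x,y) = x^2 y, x = cos t, y = sin t (all assumptions for illustration), and compares the chain rule value against a central difference quotient of the composite function.

```python
import math

def f(x, y):
    return x**2 * y

def fx(x, y):
    return 2 * x * y      # partial derivative of f with respect to x

def fy(x, y):
    return x**2           # partial derivative of f with respect to y

def x_of(t):
    return math.cos(t)

def y_of(t):
    return math.sin(t)

def df_dt_chain(t):
    """df/dt from the chain rule: one product per path from f down to t."""
    x, y = x_of(t), y_of(t)
    dx_dt = -math.sin(t)
    dy_dt = math.cos(t)
    return fx(x, y) * dx_dt + fy(x, y) * dy_dt

def df_dt_numeric(t, h=1e-6):
    """Central difference quotient of the composite function t -> f(x(t), y(t))."""
    return (f(x_of(t + h), y_of(t + h)) - f(x_of(t - h), y_of(t - h))) / (2 * h)

t0 = 0.7
chain_value = df_dt_chain(t0)
numeric_value = df_dt_numeric(t0)   # should agree with chain_value up to a small error
```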