Temperature varies across a room—warmer near the radiator, cooler near the window. Pressure varies across the atmosphere—higher at lower altitudes. Profit depends on both price and quantity. Functions of multiple variables are everywhere in nature and engineering, and calculus must extend to handle them. The gradient—the multivariable generalization of the derivative—captures how a function changes in all directions simultaneously, and it is the foundation of optimization, machine learning, and mathematical physics.

Partial Derivatives

For a function f(x, y), the partial derivative ∂f/∂x measures how f changes as x varies while y is held fixed—a 'slice' of the full rate of change. Similarly, ∂f/∂y measures change in the y-direction with x fixed. Computing partial derivatives applies the familiar single-variable rules while treating other variables as constants. For f(x, y) = x²y + sin(xy): ∂f/∂x = 2xy + y·cos(xy) and ∂f/∂y = x² + x·cos(xy). Partial derivatives describe local behavior in coordinate-aligned directions, but real problems often require knowing rates of change in arbitrary directions.

∂f/∂x: differentiate w.r.t. x, treat y as constant
∂f/∂y: differentiate w.r.t. y, treat x as constant
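The analytic partial derivatives above can be checked numerically: hold one variable fixed and take a central difference in the other. A minimal sketch for the example f(x, y) = x²y + sin(xy):

```python
import math

def f(x, y):
    return x**2 * y + math.sin(x * y)

def partial_x(x, y, h=1e-6):
    # central difference in x, with y held fixed
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(x, y, h=1e-6):
    # central difference in y, with x held fixed
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

x, y = 1.0, 2.0
analytic_fx = 2 * x * y + y * math.cos(x * y)  # ∂f/∂x = 2xy + y·cos(xy)
analytic_fy = x**2 + x * math.cos(x * y)       # ∂f/∂y = x² + x·cos(xy)
print(abs(partial_x(x, y) - analytic_fx) < 1e-5)  # True
print(abs(partial_y(x, y) - analytic_fy) < 1e-5)  # True
```

The agreement to five decimal places confirms both formulas from the text.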

The Gradient

The gradient ∇f assembles all partial derivatives into a vector: ∇f = (∂f/∂x, ∂f/∂y, ∂f/∂z, …). This vector points in the direction of steepest increase of f at any point, with magnitude equal to the rate of increase in that direction. On a topographic map, the gradient at any point is perpendicular to the contour lines and points uphill. In machine learning, the negative gradient points in the direction of steepest local decrease of a loss function, which is precisely why gradient descent works: stepping against the gradient reduces the function most rapidly, and repeated small steps carry the parameters toward a minimum.

∇f = (∂f/∂x, ∂f/∂y, ∂f/∂z)
Points in the direction of steepest increase
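Gradient descent follows directly from this picture. The sketch below, using an illustrative quadratic f(x, y) = (x − 1)² + (y + 2)² chosen for this example, repeatedly steps against the gradient and converges to the minimum at (1, −2):

```python
def grad_f(x, y):
    # gradient of f(x, y) = (x - 1)^2 + (y + 2)^2
    return (2 * (x - 1), 2 * (y + 2))

x, y = 5.0, 5.0   # arbitrary starting point
lr = 0.1          # step size (learning rate)
for _ in range(100):
    gx, gy = grad_f(x, y)
    x, y = x - lr * gx, y - lr * gy  # step in the negative gradient direction

print(abs(x - 1.0) < 1e-6 and abs(y + 2.0) < 1e-6)  # True: near (1, -2)
```

Each iteration shrinks the distance to the minimum by a constant factor here, since the gradient of a quadratic is linear in the displacement from the minimum.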

Directional Derivatives

The directional derivative D_u f measures how f changes in any direction specified by a unit vector u: D_u f = ∇f · u. This dot product has a maximum when u aligns with ∇f (steepest ascent), a minimum when u points opposite ∇f (steepest descent), and equals zero when u is perpendicular to ∇f. The zero case is important: the gradient is perpendicular to level sets (contour curves in 2D, level surfaces in 3D). This geometric fact underlies the method of Lagrange multipliers for constrained optimization.

Directional derivative: D_u f = ∇f · û (û is a unit vector)
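The extreme and zero cases of D_u f = ∇f · û can be verified directly. Assuming an example gradient ∇f = (2, 3), say from f(x, y) = x² + 3y at the point (1, 1), the derivative along the gradient direction equals |∇f|, and the derivative in a perpendicular direction vanishes:

```python
import math

# gradient of f(x, y) = x^2 + 3y at (1, 1): (2x, 3) -> (2, 3)
grad = (2.0, 3.0)

def directional(grad, u):
    # D_u f = grad · u for a unit vector u
    return grad[0] * u[0] + grad[1] * u[1]

norm = math.hypot(grad[0], grad[1])           # |grad| = sqrt(13)
uphill = (grad[0] / norm, grad[1] / norm)     # unit vector along the gradient
perp = (-grad[1] / norm, grad[0] / norm)      # unit vector perpendicular to it

print(abs(directional(grad, uphill) - norm) < 1e-12)  # True: equals |grad|
print(abs(directional(grad, perp)) < 1e-12)           # True: zero along level set
```

The perpendicular direction is tangent to the level curve through the point, which is the geometric fact Lagrange multipliers exploit.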

The Chain Rule in Multiple Dimensions

The multivariable chain rule handles compositions of functions. If z = f(x, y) and x = x(t), y = y(t), then dz/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt). In general, if z depends on several intermediate variables each depending on several parameters, the chain rule produces a sum of products—one for each path from the parameters to z through the intermediate variables. This general chain rule is essential in backpropagation, the algorithm for training neural networks, which is simply the chain rule applied systematically to compute gradients through many composed functions.
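The single-path case dz/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt) can be checked against a numerical derivative. A sketch using the illustrative composition z = x²y with x = cos t, y = sin t:

```python
import math

def f(x, y):
    return x**2 * y

def z(t):
    # z(t) = f(cos t, sin t)
    return f(math.cos(t), math.sin(t))

def dz_dt_chain(t):
    x, y = math.cos(t), math.sin(t)
    # dz/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt)
    #       = (2xy)(-sin t) + (x^2)(cos t)
    return (2 * x * y) * (-math.sin(t)) + (x**2) * math.cos(t)

t, h = 0.7, 1e-6
numeric = (z(t + h) - z(t - h)) / (2 * h)  # central-difference derivative
print(abs(dz_dt_chain(t) - numeric) < 1e-6)  # True
```

Backpropagation applies exactly this sum-of-products rule, layer by layer, through the many composed functions of a neural network.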

Applications: Optimization

At a local minimum or maximum of f, all partial derivatives vanish: ∇f = 0. These critical points are found by solving the system ∂f/∂x = 0, ∂f/∂y = 0, … simultaneously. The second derivative test classifies critical points using the Hessian matrix of second partial derivatives: if the Hessian is positive definite at a critical point, it's a local minimum; negative definite means a local maximum; indefinite means a saddle point. This framework underlies optimization throughout economics, engineering, and machine learning where objectives depend on many variables simultaneously.
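In two variables, the definiteness conditions on the Hessian reduce to the familiar determinant test: with D = f_xx·f_yy − f_xy², D > 0 with f_xx > 0 gives a minimum, D > 0 with f_xx < 0 a maximum, and D < 0 a saddle. A small sketch (the helper name `classify_2d` is illustrative):

```python
def classify_2d(fxx, fyy, fxy):
    # Second-derivative test in 2D via the Hessian determinant
    det = fxx * fyy - fxy**2
    if det > 0:
        return "local minimum" if fxx > 0 else "local maximum"
    if det < 0:
        return "saddle point"
    return "inconclusive"  # det == 0: test gives no verdict

# f(x, y) = x^2 + y^2 at its critical point (0, 0): fxx=2, fyy=2, fxy=0
print(classify_2d(2, 2, 0))    # local minimum
# f(x, y) = x^2 - y^2 at its critical point (0, 0): fxx=2, fyy=-2, fxy=0
print(classify_2d(2, -2, 0))   # saddle point
```

In higher dimensions the same idea requires checking the signs of the Hessian's eigenvalues rather than a single determinant.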

Conclusion

Multivariable calculus extends single-variable ideas to the functions of multiple inputs that dominate real applications. The gradient unifies partial derivatives into a single geometric object—the steepest ascent direction—that drives optimization algorithms, describes physical fields, and underlies mathematical physics. From weather prediction to machine learning to economic optimization, the gradient is the essential tool for understanding how multi-input systems change and how to move them toward desired states.