Sergei Yakovenko's blog: on Math and Teaching

Sunday, March 25, 2018

Lecture 2. March 25, 2018

Differentiable maps

  • Definition of differentiability at a point. Maps f:U\to W between open subsets U\subseteq \mathbb R^n,\ W\subseteq\mathbb R^m of Euclidean spaces, smooth on their domain.
  • Tangent spaces T_a U, tangent bundle TU=\bigcup_{a\in U}T_a U\simeq U\times\mathbb R^n.
  • Differential of a smooth map: \mathrm df:TU\to TW.
  • What is the derivative? (answer: exists only when n=m=1). Partial derivatives.
  • How do we define functions “having more than one derivative”?

Algebraic formalism:

  • Algebra C^\infty(U) of functions infinitely smooth in a domain U\subseteq\mathbb R^n
  • Pullback morphism of algebras f^*:C^\infty(W)\to C^\infty(U).

Vector fields: smooth maps v:U\to TU, such that v(a)\in T_a U.
Lie (directional, flow) derivations L_v:C^\infty(U)\to C^\infty(U). The Leibniz rule (algebra) and its meaning (“Any Leibniz linear map of C^\infty(U) to itself is a Lie derivative along some vector field”).
Commutator of two vector fields (to be discussed more in the future).
Push-forward of vector fields by smooth invertible maps.


Tuesday, November 15, 2016

Lecture 2 (Nov. 14, 2016).

Filed under: Calculus on manifolds course — Sergei Yakovenko @ 5:07

Tangent vectors, vector fields, integration and derivations

Continued discussion of calculus in domains of \mathbb R^n.

  • Tangent vector: vector attached to a point, formally a pair (a,v):\ a\in U\subseteq\mathbb R^n, \ v\in\mathbb R^n. Tangent space T_a U=\ \{a\}\times\mathbb R^n.
  • Differential of a smooth map F: U\to V at a point a\in U: the linear map from T_a U to T_b V,\ b=F(a).
  • Vector field: a smooth map v(\cdot): a\mapsto v(a)\in T_a U.  Vector fields as a module \mathscr X(U) over C^\infty(U).
  • Special features of \mathbb R^1\simeq\mathbb R_{\text{field}}. Special role of functions as maps f:\ U\to \mathbb R_{\text{field}} and curves as maps \gamma: \mathbb R_{\text{field}}\to U.
  • Integral curves and derivations.
  • Algebra of smooth functions C^\infty(U). Contravariant functor F \mapsto F^* which associates with each smooth map F:U\to V a homomorphism of algebras F^*:C^\infty(V)\to C^\infty(U). Composition of maps vs. composition of morphisms.
  • Derivation: a \mathbb R-linear map L:C^\infty(U)\to C^\infty(U) which satisfies the Leibniz rule L(fg)=f\cdot Lg+g\cdot Lf.
  • Vector fields as derivations, v\simeq L_v. Action of diffeomorphisms on vector fields (push-forward F_*).
  • Flow map of a vector field: a smooth map F: \mathbb R\times U\to U (caveat: may be undefined for some combinations of t and a unless certain precautions are taken) such that for each point a\in U the curve
    \gamma_a=F|_{\mathbb R\times \{a\}} is an integral curve of v passing through a at t=0. The “deterministic law” F^t\circ F^s=F^{t+s}\ \forall t,s\in\mathbb R.
  •  One-parametric (commutative) group of self-homomorphisms A^t=(F^t)^*: C^\infty(U)\to C^\infty(U). Consistency: L=\left.\frac{\mathrm d}{\mathrm dt}\right|_{t=0}A^t=\lim_{t\to 0}\frac{A^t-\mathrm{id}}t is a derivation (satisfies the Leibniz rule). If A^t=(F^t)^* is associated with the flow map of a vector field v, then L=L_v.
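
The last two items can be checked numerically on a toy case. The sketch below assumes the vector field v(x)=x on U=\mathbb R (an example chosen here, not taken from the lecture), whose flow is F^t(x)=e^t x, and compares the difference quotient (A^t f-f)/t with L_v f=x f'(x) for f(x)=x^2.

```python
import math

def flow(t, x):
    """Flow F^t of the (assumed) vector field v(x) = x on the real line."""
    return math.exp(t) * x

def lie_derivative(f, x, t=1e-6):
    """Approximate (A^t f - f)/t at the point x, where A^t f = f o F^t."""
    return (f(flow(t, x)) - f(x)) / t

def f(x):
    return x * x

x0, s, t = 1.5, 0.3, 0.4

# The "deterministic law": F^t(F^s(x)) should coincide with F^(t+s)(x).
group_law_gap = abs(flow(t, flow(s, x0)) - flow(t + s, x0))

# For v(x) = x the derivation is L_v f = x f'(x), which equals 2 x^2 here.
derivation_gap = abs(lie_derivative(f, x0) - 2 * x0 * x0)

print(group_law_gap, derivation_gap)
```

Both gaps should be small; the second one shrinks roughly in proportion to the step t.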

Update The corrected and amended notes for the first two lectures can be found here. This file replaces the previous version.

Monday, November 7, 2016

Lecture 1 (Nov 7, 2016)

Crash course on linear algebra and multivariate calculus

Real numbers as complete ordered field. Finite dimensional linear spaces over \mathbb R. Linear maps. Linear functionals, the dual space. Linear operators (self-maps of linear space), invertibility via determinant. Affine maps, affine spaces.

Polynomial nonlinear maps and functions, re-expansion as a tool to construct linear (affine) approximation. Differential. Differentiability of maps, smoothness of functions.

Inverse function theorem.

Vector fields, parameterized curves, differential equations.

The first set of notes is available here.

Saturday, December 19, 2015

Lecture 8, Dec 15

Higher derivatives and better approximation

We discussed a few issues:

  • Lagrange interpolation formula: how to estimate the difference f(b)-f(a) through the derivative f'?
  • Consequence: vanishing of several derivatives at a point means that a function has a “root of high order” at this point (with explanation, what does that mean).
  • Taylor formula for polynomials: if you know all derivatives of a polynomial at some point, then you know it everywhere.
  • Peano formula for C^n-smooth functions: approximation by the Taylor polynomial with asymptotic bound for the error.
  • Lagrange formula: explicit estimate for the error.
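
As an illustration of the last two items, here is a minimal sketch for the function \sin at the point 0 (an example chosen here, not necessarily the one used in class). Since all derivatives of \sin are bounded by 1, the Lagrange formula gives the error bound |x|^{n+1}/(n+1)! for the Taylor polynomial of degree n.

```python
import math

def taylor_sin(x, n):
    """Taylor polynomial of sin at 0, keeping terms of degree <= n."""
    total = 0.0
    for k in range(1, n + 1, 2):          # only odd powers appear
        total += (-1) ** ((k - 1) // 2) * x ** k / math.factorial(k)
    return total

def lagrange_bound(x, n):
    """Bound |x|^(n+1)/(n+1)!, valid because |sin^(k)| <= 1 for every k."""
    return abs(x) ** (n + 1) / math.factorial(n + 1)

x = 1.0
degrees = (1, 3, 5, 7)
errors = {n: abs(math.sin(x) - taylor_sin(x, n)) for n in degrees}
bounds = {n: lagrange_bound(x, n) for n in degrees}

for n in degrees:
    print(n, errors[n], bounds[n])
```

In each row the actual error should stay below the explicit bound.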

The notes (updated) are available here.

Saturday, December 12, 2015

Lecture 7, Dec 8, 2015

Differentiability and derivative

Continuity of functions (and maps) means that they can be nicely approximated by constant functions (maps) in a sufficiently small neighborhood of each point. Yet the constant maps (easy to understand as they are) are not the only “simple” maps.

Linear maps

Linear maps naturally live on vector spaces, sets equipped with a special structure. Recall that \mathbb R is algebraically a field: real numbers can be added, subtracted and multiplied between themselves, and the ratio \alpha/\beta is well defined for \beta\ne0.

Definition. A set V is said to be a vector space (over \mathbb R), if the operations of addition/subtraction V\owns u,v\mapsto u\pm v and multiplication by constant V\owns v,\ \mathbb R\owns \alpha\mapsto \alpha v are defined on it and obey the obvious rules of commutativity, associativity and distributivity. Some people prefer to call vector spaces linear spaces: the two terms are identical.

Warning. There is no “natural” multiplication V\times V\to V!


  1. The field \mathbb R itself. If we want to stress that it is considered as a vector space, we write \mathbb R^1.
  2. The set of tuples \mathbb R^n=\{(x_1,\dots,x_n):\ x_i\in\mathbb R\} is the Euclidean n-space. For n=2,3 it can be identified with the “geometric” plane and space, using coordinates.
  3. The set of all polynomials of bounded degree \leq d with real coefficients.
  4. The set of all polynomials \mathbb R[x] without any control over the degree.
  5. The set C([0,1]) of all continuous functions on the segment [0,1].

Warning. The two last examples are special: the corresponding spaces are not finite-dimensional (we did not have time to discuss what is the dimension of a linear space in general…)

Let V,Z be two (different or identical) vector spaces and let f:V\to Z be a function (map) between them.
Definition. The map f is linear, if it preserves the operations on vectors, i.e., \forall v,w\in V,\ \alpha\in\mathbb R,\quad f(v+w)=f(v)+f(w),\ f(\alpha v)=\alpha f(v).

Sometimes we will use the notation V\overset f\longrightarrow Z.

Obvious properties of linearity.

  • f(0)=0 (Note: the two zeros may lie in different spaces!)
  • For any two given spaces V,W the linear maps between them can be added and multiplied by constants in a natural way! If V\overset {f,g}\longrightarrow W, then we define (f+g)(v)=f(v)+g(v) for any v\in V (define \alpha f yourselves). The result will be again a linear map between the same spaces.
  • If V\overset f\longrightarrow W and W\overset g\longrightarrow Z, then the composition g\circ f:V\overset f\longrightarrow W\overset g\longrightarrow Z is well defined and again linear.


  1. Any linear map \mathbb R^1\overset f\longrightarrow \mathbb R^1 has the form x\mapsto ax, \ a\in\mathbb R (do you understand why the notations \mathbb R, \mathbb R^1 are used?)
  2. Any linear map \mathbb R^n\overset f\longrightarrow \mathbb R^1 has the form (x_1,\dots,x_n)\mapsto a_1x_1+\cdots+a_nx_n for some numbers a_1,\dots,a_n. Argue that all such maps form a linear space isomorphic to \mathbb R^n back again.
  3. Explain how linear maps from \mathbb R^n to \mathbb R^m can be recorded using n\times m-matrices. How is the composition of linear maps related to the multiplication of matrices?
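
A small sketch for the last question. It assumes the convention (one of several in use) that a map \mathbb R^n\to\mathbb R^m is stored as a list of m rows of length n; the helper names and the sample matrices are made up for the illustration.

```python
def apply(matrix, vector):
    """Apply the linear map stored row by row in `matrix` to `vector`."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def matmul(A, B):
    """Matrix product A*B for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

F = [[1, 2], [0, 1], [3, -1]]   # a linear map R^2 -> R^3 (3 rows, 2 columns)
G = [[1, 0, 2], [-1, 1, 0]]     # a linear map R^3 -> R^2 (2 rows, 3 columns)
v = [5, 7]

composed = apply(G, apply(F, v))     # first F, then G
via_product = apply(matmul(G, F), v) # the single matrix G*F applied to v

print(composed, via_product)
```

With this convention the composition g\circ f corresponds to the product of the matrices taken in the same order.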

The first example shows that linear maps of \mathbb R^1 to itself are “labeled” by real numbers (“multiplicators”). Composition of linear maps corresponds to multiplication of the corresponding multiplicators (whence the name). A linear 1-dim map is invertible if and only if the multiplicator is nonzero.

Corollary. Invertible linear maps \mathbb R^1\to\mathbb R^1 constitute a commutative group (by composition) isomorphic to the multiplicative group \mathbb R^*=\mathbb R\smallsetminus \{0\}.


Maps of the form V\to V, \ v\mapsto v+h for a fixed vector h\in V (the source and the target coincide!) are called shifts (a.k.a. translations). Warning: The shifts are not linear unless h=0! Composition of two shifts is again a shift.

Prove that all translations form a commutative group (by composition) isomorphic to the space V itself. (Hint: this is a tautological statement).

Affine maps

A map f:V\to W between two vector spaces is called affine, if it is a composition of a linear map and translations.

Any affine map \mathbb R^1\to\mathbb R^1 has the form x\mapsto ax+b for some a,b\in\mathbb R. Sometimes it is more convenient to write the map under the form x\mapsto a(x-c)+b: this is possible for any point c\in\mathbb R^1. Note that the composition of affine maps in dimension 1 is not commutative anymore.
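
To see the non-commutativity concretely, one can store an affine map x\mapsto ax+\alpha as the pair (a,\alpha) and compose two such pairs. This is only a sketch; the representation and the sample coefficients are chosen for the illustration.

```python
def compose(f, g):
    """Return f o g for affine maps given as (slope, intercept) pairs."""
    a, alpha = f
    b, beta = g
    # f(g(x)) = a*(b*x + beta) + alpha = (a*b)*x + (a*beta + alpha)
    return (a * b, a * beta + alpha)

f = (2, 1)   # x -> 2x + 1
g = (3, 5)   # x -> 3x + 5

fg = compose(f, g)   # x -> f(g(x))
gf = compose(g, f)   # x -> g(f(x))

print(fg, gf)
```

The slopes of the two compositions agree (both are the product of the slopes), while the constant terms differ.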

Key computation. Assume you are given a map f:\mathbb R^1\to\mathbb R^1 in the sense that you can evaluate it at any point c\in\mathbb R^1. Suppose an oracle tells you that this map is affine. How can you restore the explicit formula f(x)=a(x-c)+b for f?

Obviously, b=f(c). To find \displaystyle a=\frac{f(x)-b}{x-c}, we have to plug into it any point x\ne c and the corresponding value f(x). Given that b=f(c), we have \displaystyle a=\frac{f(x)-f(c)}{x-c} for any choice of x\ne c.

The expression a_c(x)=\displaystyle \frac{f(x)-f(c)}{x-c} for a non-affine function f is in general non-constant: it depends on the choice of the point x.

Definition. A function f:\mathbb R^1\to\mathbb R^1 is called differentiable at the point c, if the above expression for a_c(x), albeit non-constant, has a limit as x\to c:\ a_c(x)=a+s_c(x), where s_c(x) is a function which tends to zero. The number a is called the derivative of f at the point c and denoted by f'(c) (and also by half a dozen other symbols: \frac{df}{dx}(c),Df(c), D_xf(c), f_x(c), …).

Existence of the limit means that near the point c the function f admits a reasonable approximation by an affine function \ell(x)=a(x-c)+b: f(x)=\ell(x)+s_c(x)(x-c), i.e., the “non-affine part” s_c(x)\cdot (x-c) is small not just by itself, but also relative to small difference x-c.
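
The definition can be watched at work numerically. The sketch below takes f=\sin and c=0 (an arbitrary choice for the illustration), evaluates the slope a_c(x) at points approaching c, and records the remainder s_c(x)=a_c(x)-f'(c), where f'(0)=\cos 0=1.

```python
import math

def difference_quotient(f, c, x):
    """Slope a_c(x) of the secant through (c, f(c)) and (x, f(x)), x != c."""
    return (f(x) - f(c)) / (x - c)

f = math.sin
c = 0.0

# Sample points x = c + 10^(-k) approaching c from the right.
slopes = [difference_quotient(f, c, c + 10.0 ** (-k)) for k in range(1, 6)]

# |s_c(x)| for each sample, using the known value f'(0) = 1.
residuals = [abs(s - 1.0) for s in slopes]

print(slopes)
print(residuals)
```

The residuals should decrease towards zero as x approaches c.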

Differentiability and algebraic operations

See the notes and their earlier version.

The only non-obvious moment is differentiability of the product: the product (unlike the composition) of affine functions is not affine anymore, but is immediately differentiable:

[b+a(x-c)]\cdot[q+p(x-c)]=bq+(aq+bp)(x-c)+ap(x-c)^2, but the quadratic term is vanishing relative to x-c, so the entire sum is differentiable.

Exercise. Derive the Leibniz rule for the derivative of the product.

Derivative and the local study of functions

Affine functions have no (strong) maxima or minima, unless restricted on finite segments. Yet absence of the extremum is a strong property which descends from the affine approximation to the original function. Details here and here.

Monday, February 1, 2010

Lecture 11 (Jan 19, 2010)

Functions of two variables and their derivation. Application to the study of planar curves.

The idea of linear approximation (and differentiability) can be easily adapted for functions of more than one variable. Apart from the usual (scalar) functions f:\mathbb R^1\to\mathbb R^1 we will consider maps \mathbb R^1\to\mathbb R^2 (parametrized curves) and functions \mathbb R^2\to\mathbb R^1 of two variables.

A curve t\mapsto (x(t),y(t)) can be considered in kinematic (mechanical) terms as the description of a travel in the plane with coordinates x,y, parametrized by the time t. The notions of limit, continuity, differentiability are defined as the corresponding properties of the coordinate functions t\mapsto x(t) and t\mapsto y(t).


  1. The graph of any continuous (differentiable) function f of one variable is a continuous (resp., differentiable) curve t\mapsto (t,f(t)).
  2. The circle x^2+y^2=1 is the image of the curve t\mapsto (\cos t,\sin t). Note that this image is “covered” infinitely many times: preimage of any point on the circle is an entire arithmetical progression with the difference 2\pi.
  3. The cusp (חוֹד) y^2=x^3 is the image of the differentiable curve t\mapsto (t^2,t^3). Yet the cusp itself is not “smooth-looking”:

    Cusp point

The “linear approximation” at a point \tau\in\mathbb R^1 takes the unit “positive” vector tangent to \mathbb R^1, attached to the point \tau, to the vector (v,w)\in\mathbb R^2, tangent to the plane at the point (a,b)=(x(\tau),y(\tau)), with the coordinates v=\frac{dx}{dt}(\tau),~w=\frac{dy}{dt}(\tau). The vector (v,w)\in\mathbb R^2 is called the velocity vector, or the tangent vector to the curve.

When is a differentiable parametrized curve smooth? And what is smoothness?

Definition. A planar curve is smooth at a point (a,b)\in\mathbb R^2, if inside some sufficiently small square |x-a|<\delta,~|y-b|<\delta this curve can be represented as the graph of a function y=f(x) or x=g(y) differentiable at the point a (resp., b).

Proposition. A differentiable (parametrized) curve with nonzero velocity vector is smooth at the corresponding point.

Proof. Assume that v\ne 0 (the case w\ne 0 is treated in the same way, with the roles of x and y exchanged); then the function t\mapsto x(t) has nonzero derivative and hence is locally invertible, so that t can be expressed as a differentiable function of x, t=\phi(x). Then the curve is the graph of the function y(\phi(x)).

Problem. Give an example of a differentiable (parametrized) curve with zero velocity at some point, which is nonetheless smooth.

Differentiability of functions of two variables

While the definitions of the limit and continuity for a function of two variables are rather straightforward (open intervals |x-a|<\delta in the one-variable definition should be replaced by open squares |x-a|<\delta,~|y-b|<\delta with the rest staying the same), the notion of differentiability cannot so easily be reduced to one-dimensional case.

We want the function of two variables f(x,y) to be approximable by a linear map near the point (a,b)\in\mathbb R^2. This means that there exist two constants A,B (coordinates of the approximating map) such that the difference goes to zero fast,

\displaystyle [f(x,y)-f(a,b)]-[A(x-a)+B(y-b)]=o(x,y),\qquad \frac{o(x,y)}{|x-a|+|y-b|}\to0\quad\text{as }(x,y)\to(a,b).

Denote by dx (resp., dy) the linear functions which take the value v (resp., w) on the vector with the coordinates (v,w)\in\mathbb R^2. Then the linear map df, approximating the function f, can be written as df=A\,dx+B\,dy. To compute the coefficients A,B, consider the functions h(x)=f(x,b) and g(y)=f(a,y) of one argument each. Then A=h'(a), B=g'(b) are the partial derivatives of f with respect to the variables x,y at the point (a,b).

Definition. For a function of two variables f(x,y) the partial derivative with respect to the variable x at a point (a,b)\in\mathbb R^2  is the limit (if it exists)

\displaystyle \lim_{h\to 0}\frac{f(a+h,b)-f(a,b)}{h}=\frac{\partial f}{\partial x}(a,b)=f_x(a,b)=D_xf(a,b)=\cdots.

Example. Consider the function f(x,y)=x^2+y^2. Its differential is equal to 2x\,dx+2y\,dy and at a point (a,b) on the unit circle it vanishes on the vector with coordinates (-b,a) tangent to the level curve f(x,y)=\text{const} (circle) passing through the point (a,b).
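
This example is easy to verify numerically. The sketch below approximates both partial derivatives by symmetric difference quotients at the point (a,b)=(0.6,0.8) of the unit circle (a point picked for the illustration) and evaluates df on the vector (-b,a).

```python
def partial_x(f, a, b, h=1e-6):
    """Symmetric difference quotient approximating the x-partial at (a, b)."""
    return (f(a + h, b) - f(a - h, b)) / (2 * h)

def partial_y(f, a, b, h=1e-6):
    """Symmetric difference quotient approximating the y-partial at (a, b)."""
    return (f(a, b + h) - f(a, b - h)) / (2 * h)

def f(x, y):
    return x * x + y * y

a, b = 0.6, 0.8                 # a point on the unit circle
A = partial_x(f, a, b)          # should be close to 2a
B = partial_y(f, a, b)          # should be close to 2b

# Value of df = A dx + B dy on the vector (v, w) = (-b, a).
df_on_tangent = A * (-b) + B * a

print(A, B, df_on_tangent)
```

The last number should vanish up to rounding, in agreement with the claim that (-b,a) is tangent to the level curve.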

Functions of two variables are nicely represented by their level curves (קווי גובה). Typically these look like smooth curves, but eventually one can observe “singularities”.

Examples. Draw the level curves of the functions f(x,y)=x^2+y^2 and g(x,y)=x^2-y^2. In the second case you may help yourself by changing variables from x,y to X,Y as follows, X=x-y,~Y=x+y and look at the picture below,

Definition. A point (a,b)\in\mathbb R^2 inside the domain of a differentiable function F(x,y) is called critical, if the linear approximation (the differential) is identically zero, i.e., if both partial derivatives \frac{\partial F}{\partial x}(a,b),~\frac{\partial F}{\partial y}(a,b) are zeros. Otherwise the point is called regular.

If we replace a nonlinear function F(x,y) by its affine approximation \ell(x,y)=c+A(x-a)+B(y-b), c=F(a,b),~A=\frac{\partial F}{\partial x}(a,b),~B=\frac{\partial F}{\partial y}(a,b) at a regular point, then the level curves of \ell form a family of parallel lines. It turns out that for regular points these lines give a linear approximation for the (nonlinear) level curves of the initial nonlinear function. In other words, the regularity condition (a form of nondegeneracy)  is the condition which guarantees that the linear approximation works nicely also when studying the level curves.

Theorem. If a point (a,b) is regular for a differentiable function F, then there exists a small square around (a,b) such that the piece of the level curve of F passing through this point is a smooth curve tangent to the level line of the affine approximation \ell.

Zero level curve of a nonlinear function and the zero level line of its affine approximation

This claim is often called the Implicit Function Theorem  (משפט הפונקציות הסתומות), however,  the traditional formulation of the Implicit Function Theorem looks quite different!

Theorem. If at a point (a,b) the partial derivative \frac{\partial F}{\partial y}(a,b)  of a differentiable function F is nonzero, then the equation F(x,y)=c, c=F(a,b)\in\mathbb R, can be locally resolved with respect to y: there exists a differentiable function f(x) defined on a sufficiently small interval |x-a|<\delta, such that F(x,f(x))\equiv c and f(a)=b. The derivative f'(a) of this function is given by the ratio,

\displaystyle f'(a)=-\frac{\frac{\partial F}{\partial x}(a,b)}{\frac{\partial F}{\partial y}(a,b)}.
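
A quick numerical check of this formula, assuming the example F(x,y)=x^2+y^2 and the point (a,b)=(0.6,0.8) on the level curve F=1 (both chosen here for the illustration). Near this point the level curve is the graph of f(x)=\sqrt{1-x^2}, so the slope can be computed in two independent ways.

```python
import math

def F(x, y):
    return x * x + y * y

def f(x):
    """Explicit solution of F(x, y) = 1 with y > 0 near the chosen point."""
    return math.sqrt(1.0 - x * x)

a, b = 0.6, 0.8
h = 1e-6

# Partial derivatives of F at (a, b) by symmetric difference quotients.
Fx = (F(a + h, b) - F(a - h, b)) / (2 * h)
Fy = (F(a, b + h) - F(a, b - h)) / (2 * h)

slope_from_formula = -Fx / Fy                       # the ratio in the theorem
slope_direct = (f(a + h) - f(a - h)) / (2 * h)      # derivative of the graph

print(slope_from_formula, slope_direct)
```

Both numbers should agree with -a/b up to the discretization error.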

Exercise. Prove that the two formulations of the Implicit Function Theorem are in fact equivalent.

Saturday, January 2, 2010

Lecture 9 (Dec 29, 2009)

Approximation by linear functions, differentiability and derivative

Recall: linear spaces, linear functions…

A vector (or linear) space (מרחב וקטורי, sometimes we add explicitly, space over \mathbb R) is a set V equipped with two operations: addition/subtraction V\owns v,w\mapsto v\pm w and multiplication by (real) numbers, \lambda\in\mathbb R,~ v\in V\mapsto \lambda v\in V. These operations obey all the natural rules. The simplest example is the real line \mathbb R itself: to “distinguish” it from the “usual” real numbers, we denote it by \mathbb R^1. The plane \mathbb R^2 is the next simplest case.

A function f:V\to\mathbb R defined on a vector space, is called linear, if it respects both operations, f(u\pm v)=f(u)\pm f(v),\ f(\lambda u)=\lambda f(u).  The set of all linear functions on the given space V is itself a linear space (called dual space, מרחב הדואלי, with the natural operations of addition f+g and rescaling \lambda f on the functions).

Linear functions on \mathbb R^1 can be easily described.

Example. Let f:\mathbb R^1\to\mathbb R be a linear function. Denote by a\in\mathbb R its value at 1: a=f(1). Then for any other point x\in\mathbb R^1, we have x=x\cdot 1 (meaning: vector = number \cdot vector in \mathbb R^1), so by linearity f(x)=x\cdot f(1)=a\cdot x=ax.

Question. Prove that any linear function of two variables has the form f(x,y)=ax+by, where a=f(1,0) and b=f(0,1). Prove that the dual space to the plane \mathbb R^2 is again the plane of vectors (a,b) as above.

Warning!! In the elementary geometry and algebra, a linear function is a function whose graph is a real line. Such functions have the general form f(x)=ax+b,\ a,b\in\mathbb R^1, and are linear in the above sense only when b=0. We will call such functions affine (פונקציות אפיניות). The coefficient a will be called the slope (שיפוע) of the affine function.

We first look at the linear functions of one variable only and identify each function f(x)=ax with its coefficient a\in\mathbb R, called multiplicator (מכפיל): it acts on the real line by multiplication by a. The product of two linear functions is non-linear, yet their composition is linear, does not depend on the order and the multiplicator can be easily computed as the product of the individual multiplicators:

f(x)=ax,\ g(x)=bx \implies (f\circ g)(x)=a\,g(x)=abx=(ab)\cdot x=bax=g(f(x))=(g\circ f)(x)

Problem. Compute the composition of two affine functions f(x)=ax+\alpha and g(x)=bx+\beta. Prove that the slope of the composition is the product of the slopes. Is it true that they also always commute? Find an affine function g that commutes (in the sense of the composition) with any affine function f.

Obviously, linear functions are continuous and bounded on any compact set. To know a linear function of one variable, it is enough to know its value at a single nonzero point (for affine functions, two distinct points are sufficient).

Approximation by a linear function near the origin

Let f:\mathbb R^1\to\mathbb R be a (nonlinear) function. In some (“good”) cases, the graph of such function looks almost like a straight line under sufficiently large magnification.

Example. Consider f(x)=2x+x^2: this function is obviously nonlinear, and f(0)=0, so that its graph passes through the point (0,0)\in\mathbb R^2. Let \varepsilon be a small positive number. The transformation (change of variables) X=x/\varepsilon,\ Y=y/\varepsilon magnifies the small square [-\varepsilon,+\varepsilon]^2 to the square [-1,1]^2. After this magnification we see that the equation of the curve becomes Y=2X+\varepsilon X^2. Clearly, as \varepsilon \to 0^+, the magnified curve converges (uniformly on |X|\le 1) to the graph of the linear function Y=2X.

In other words, we see that

\displaystyle \frac{f(\varepsilon X)- \ell(\varepsilon X)}{\varepsilon}=\frac1\varepsilon f(\varepsilon X)-\ell (X)\to 0,

as \varepsilon \to 0, where \ell(X)=2X is the linear approximation to the function f. In particular, we can set X=1 and see that the limit f(\varepsilon)/\varepsilon exists and is equal to 2, the multiplicator of the linear function \ell.
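
The magnification procedure is easy to imitate numerically. The sketch below evaluates \frac1\varepsilon f(\varepsilon X) for f(x)=2x+x^2 on a few sample points of the segment |X|\le 1 (the sample grid is an arbitrary choice) and measures the largest deviation from \ell(X)=2X.

```python
def f(x):
    return 2 * x + x * x

def magnified(eps, X):
    """Value Y of the magnified curve over the point X, i.e. f(eps*X)/eps."""
    return f(eps * X) / eps

X_SAMPLES = [-1.0, -0.5, 0.0, 0.5, 1.0]

def max_gap(eps):
    """Largest deviation of the magnified curve from the line Y = 2X."""
    return max(abs(magnified(eps, X) - 2 * X) for X in X_SAMPLES)

for eps in (1e-1, 1e-2, 1e-3):
    print(eps, max_gap(eps))
```

The deviation equals \varepsilon X^2 and hence is at most \varepsilon on the whole segment, which is the uniform convergence claimed above.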

Example. Consider the function f(x)=|x| and treat it by the same magnification procedure. Will there be any limit behavior? Will the limit function be linear?

Approximation near an arbitrary point. Derivative

What if we want to find a linear approximation to the function f(x) at a point a different from the origin, and without the assumption that f(a)=0? One has to change first the coordinates to \hat x=x-a,\ \hat y=y-f(a). In the new “hat” coordinates we can perform the same calculation and see that existence of a linear approximation for \hat f(\hat x)=f(a+\hat x)-f(a) is equivalent to the existence of the limit \displaystyle\frac{ f(a+\varepsilon)-f(a)}{\varepsilon} as \varepsilon\to 0.

Definition. If a\in\mathbb R is an interior point of the domain of a function f and the limit

\displaystyle f'(a)=\lim_{\varepsilon\to0}\frac{ f(a+\varepsilon)-f(a)}{\varepsilon}

exists, then the function f is called differentiable at a and the value of the limit (denoted by f'(a)) is called the derivative of f at this point. The function f':a\mapsto f'(a), defined wherever this limit exists, is called the derivative (function) of f.

Warning! Despite everything, the derivative f' is not a linear function (and even not affine!)  The value f'(a) is just the multiplicator (the principal coefficient) of the affine function \ell_a(x) which approximates f near the point a and depends on a.

Notation. There are several notations for the derivative, used in different sources; on different occasions some are more convenient than others. They include (but are not limited to):

\displaystyle f',\quad \frac{df}{dx},\quad \partial_x f,\quad Df,\quad\dots

We will explain the origin (and convenience) of some of these notations in due time.

First rules of derivation. The “derivation map” f\mapsto f' (or the “differential operator” D:f\mapsto Df) is linear: D(f\pm g)=Df\pm Dg, and D(\lambda f)=\lambda Df, assuming that f,g are differentiable on the common interval. Indeed, the sum and the multiple of affine functions approximating the initial ones, are again affine.

The derivative of an affine function D(\alpha x+\beta) is constant and equal to \alpha, since this function ideally approximates itself at each point. In particular, the derivative of a constant is identically zero.

Leibniz rule. The product of two affine functions is not affine anymore, yet admits easy approximation. At the origin a=0 the product of two affine functions \ell_\alpha(x)=\alpha x+p and \ell_\beta(x)=\beta x+q, \ \alpha,\beta,p,q\in\mathbb R, is the quadratic function Q(x)=(\alpha\beta)x^2+[\alpha q+\beta p]x+pq which is approximated by the affine function [\alpha q+\beta p]x+pq.  Noting that the four constants are the values of the initial functions and their derivatives at the origin, we can write the last formula as

D(f\cdot g)(0)=f'(0)g(0)+f(0)g'(0),\quad f=\ell_\alpha,\ g=\ell_\beta.

This is called the Leibniz formula. To show that it is true for the product of any two differentiable functions, note that any such function can be written under the form f(x)=\ell(x)+o(x), where \ell(x)  is  an affine function and o(x) is a small function such that \lim_{x\to0}\frac{o(x)}x=0. If g(x) is any bounded function, and o(x) is such a small function, then the linear approximation to the product g(x)o(x) is identically zero (prove it!). Use the linearity of the approximation to complete the proof of the Leibniz formula for arbitrary differentiable f,g.
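
Before proving the formula in general, one may wish to test it on a non-affine pair. The sketch below takes f=\sin, g=\exp and the point a=0.7 (all chosen for the illustration) and approximates the derivatives by symmetric difference quotients.

```python
import math

def derivative(func, a, h=1e-6):
    """Symmetric difference quotient approximating func'(a)."""
    return (func(a + h) - func(a - h)) / (2 * h)

f = math.sin
g = math.exp
a = 0.7

def product(x):
    return f(x) * g(x)

lhs = derivative(product, a)                            # D(f*g)(a)
rhs = derivative(f, a) * g(a) + f(a) * derivative(g, a) # f'(a)g(a) + f(a)g'(a)
exact = math.cos(a) * math.exp(a) + math.sin(a) * math.exp(a)

print(lhs, rhs, exact)
```

The three numbers should coincide up to the discretization error of the difference quotients.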

Chain rule of differentiation. Let f,g be two differentiable functions and h=g\circ f their composition. Let a\in\mathbb R be an arbitrary point and b=f(a) its f-image. To compute the derivative of h at a, we replace both f,g by their affine approximation at the points a and b respectively. The composition of the affine approximations is again an affine map (see above) and its slope is equal to the product of the slopes. Thus we obtain the result

h'(a)= (g\circ f)'(a)=g'(b)\cdot f'(a)=g'(f(a))\cdot f'(a),\qquad\text{as }b=f(a).

An easy computation shows that adding the small nonlinear terms o(x-a),\ o(y-b) does not change the computation: the derivative of a composition is the product of the derivatives at the appropriate points.
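
The same kind of numerical test works for the chain rule. In the sketch below f(x)=x^2+1 and g(y)=\sqrt y (an arbitrary choice), so that h(x)=\sqrt{x^2+1} and the expected value is h'(a)=a/\sqrt{a^2+1}.

```python
import math

def derivative(func, a, h=1e-6):
    """Symmetric difference quotient approximating func'(a)."""
    return (func(a + h) - func(a - h)) / (2 * h)

def f(x):
    return x * x + 1.0

g = math.sqrt

def h_comp(x):
    """The composition h = g o f."""
    return g(f(x))

a = 2.0
b = f(a)                                       # the f-image of a

direct = derivative(h_comp, a)                 # h'(a) computed directly
by_chain = derivative(g, b) * derivative(f, a) # g'(b) * f'(a)
exact = a / math.sqrt(a * a + 1.0)

print(direct, by_chain, exact)
```

Note that g is differentiated at b=f(a), not at a: this is the “appropriate points” clause of the rule.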

Problem. Consider n differentiable functions f_1,\dots,f_n and their composition h=f_n\circ\cdots\circ f_1. Prove that h'(a_1)=f'_1(a_1)\cdot f_2'(a_2)\cdots f_n'(a_n), where a_{k+1}=f_k(a_k),\ k=1,2,\dots,n-1.

In particular, for the pair of mutually inverse functions such that g\circ f(x)\equiv x, the derivatives at the points a and b=f(a) are reciprocal.

Example. Let f(x)=x^n, n\in\mathbb N. Then by induction one can prove that f'(x)=nx^{n-1}. The inverse function g(y)=\sqrt[n]y=y^{1/n} has the derivative \displaystyle\frac1{nx^{n-1}} at the point y=x^n. Substituting x=\sqrt[n]y, we see that

\displaystyle g'(y)=\frac1{ny^{(n-1)/n}}=\mu y^{\mu-1},\qquad\text{when }\mu=\frac1n.

This allows one to prove that (x^\mu)'=\mu x^{\mu-1} for all rational powers \mu\in\mathbb Q.
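
A numerical sanity check of the formula for the root, with n=3 and y=8 (values chosen for the illustration): the expected derivative is \mu y^{\mu-1}=\frac13\cdot 8^{-2/3}=\frac1{12}.

```python
def derivative(func, a, h=1e-6):
    """Symmetric difference quotient approximating func'(a)."""
    return (func(a + h) - func(a - h)) / (2 * h)

n = 3
mu = 1.0 / n

def g(y):
    """The inverse of x -> x^n on positive numbers, i.e. the n-th root."""
    return y ** mu

y0 = 8.0
x0 = g(y0)                                 # the point x with x^n = y0

numeric = derivative(g, y0)                # g'(y0) from the definition
reciprocal = 1.0 / (n * x0 ** (n - 1))     # 1 / f'(x0) with f(x) = x^n
power_rule = mu * y0 ** (mu - 1.0)         # mu * y^(mu - 1)

print(numeric, reciprocal, power_rule)
```

All three values should agree, which illustrates that the derivatives of mutually inverse functions at corresponding points are reciprocal.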
