# Differentiability and derivative

Continuity of functions (and maps) means that they can be nicely approximated by constant functions (maps) in a sufficiently small neighborhood of each point. Yet the constant maps (easy to understand as they are) are not the only “simple” maps.

## Linear maps

Linear maps naturally live on vector spaces, sets equipped with a special structure. Recall that $\mathbb R$ is algebraically a field: real numbers can be added and subtracted between themselves, and the ratio $\alpha/\beta$ is well defined for $\beta\ne0$.

Definition. A set $V$ is said to be a vector space (over $\mathbb R$), if the operations of addition/subtraction $V\owns u,v\mapsto u\pm v$ and multiplication by constant $V\owns v,\ \mathbb R\owns \alpha\mapsto \alpha v$ are defined on it and obey the obvious rules of commutativity, associativity and distributivity. Some people prefer to call vector spaces linear spaces: the two terms are identical.

Warning. There is no “natural” multiplication $V\times V\to V$!

Examples.

1. The field $\mathbb R$ itself. If we want to stress that it is considered as a vector space, we write $\mathbb R^1$.
2. The set of tuples $\mathbb R^n=\{(x_1,\dots,x_n):\ x_i\in\mathbb R\}$ is the Euclidean $n$-space. For $n=2,3$ it can be identified with the “geometric” plane and space, using coordinates.
3. The set of all polynomials of bounded degree $\leq d$ with real coefficients.
4. The set of all polynomials $\mathbb R[x]$ without any control over the degree.
5. The set $C([0,1])$ of all continuous functions on the segment $[0,1]$.

Warning. The two last examples are special: the corresponding spaces are not finite-dimensional (we did not have time to discuss what the dimension of a linear space is in general…)

Let $V,W$ be two (different or identical) vector spaces and let $f:V\to W$ be a function (map) between them.
Definition. The map $f$ is linear, if it preserves the operations on vectors, i.e., $\forall v,w\in V,\ \alpha\in\mathbb R,\quad f(v+w)=f(v)+f(w),\ f(\alpha v)=\alpha f(v)$.

Sometimes we will use the notation $V\overset f\longrightarrow W$.

Obvious properties of linearity.

• $f(0)=0$ (Note: the two zeros may lie in different spaces!)
• For any two given spaces $V,W$ the linear maps between them can be added and multiplied by constants in a natural way! If $V\overset {f,g}\longrightarrow W$, then we define $(f+g)(v)=f(v)+g(v)$ for any $v\in V$ (define $\alpha f$ yourselves). The result will be again a linear map between the same spaces.
• If $V\overset f\longrightarrow W$ and $W\overset g\longrightarrow Z$, then the composition $g\circ f:V\overset f\longrightarrow W\overset g\longrightarrow Z$ is well defined and again linear.

Examples.

1. Any linear map $\mathbb R^1\overset f\longrightarrow \mathbb R^1$ has the form $x\mapsto ax, \ a\in\mathbb R$ (do you understand why the notations $\mathbb R, \mathbb R^1$ are used?)
2. Any linear map $\mathbb R^n\overset f\longrightarrow \mathbb R^1$ has the form $(x_1,\dots,x_n)\mapsto a_1x_1+\cdots+a_nx_n$ for some numbers $a_1,\dots,a_n$. Argue that all such maps form a linear space isomorphic to $\mathbb R^n$ back again.
3. Explain how linear maps from $\mathbb R^n$ to $\mathbb R^m$ can be recorded using $m\times n$-matrices. How is the composition of linear maps related to the multiplication of matrices?
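The correspondence in the last example can be sketched numerically. Below is a minimal illustration in plain Python (the helper names `apply` and `matmul` and the sample matrices are our own choices): a linear map $\mathbb R^n\to\mathbb R^m$ is stored as an $m\times n$ matrix, and composing two maps corresponds to multiplying their matrices.

```python
def apply(matrix, vector):
    """Apply an m x n matrix to a vector of length n."""
    return [sum(row[j] * vector[j] for j in range(len(vector))) for row in matrix]

def matmul(A, B):
    """Matrix product A*B, representing the composition 'A after B'."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# f: R^2 -> R^3 and g: R^3 -> R^2, recorded as a 3x2 and a 2x3 matrix
F = [[1, 2], [3, 4], [5, 6]]
G = [[1, 0, 1], [0, 1, 1]]

v = [1, -1]
# Applying g after f agrees with applying the single product matrix G*F
assert apply(G, apply(F, v)) == apply(matmul(G, F), v)
```

Note that `matmul(G, F)` is a $2\times 2$ matrix: the composition $\mathbb R^2\to\mathbb R^2$ is again linear, as claimed above.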

The first example shows that linear maps of $\mathbb R^1$ to itself are “labeled” by real numbers (“multiplicators”). Composition of linear maps corresponds to multiplication of the corresponding multiplicators (whence the name). A linear 1-dim map is invertible if and only if the multiplicator is nonzero.

Corollary. Invertible linear maps $\mathbb R^1\to\mathbb R^1$ constitute a commutative group (by composition) isomorphic to the multiplicative group $\mathbb R^*=\mathbb R\smallsetminus \{0\}$.

## Shifts

Maps of the form $V\to V, \ v\mapsto v+h$ for a fixed vector $h\in V$ (the source and the target coincide!) are called shifts (a.k.a. translations). Warning: The shifts are not linear unless $h=0$! Composition of two shifts is again a shift.

Exercise.
Prove that all translations form a commutative group (by composition) isomorphic to the space $V$ itself. (Hint: this is a tautological statement).

## Affine maps

Definition.
A map $f:V\to W$ between two vector spaces is called affine, if it is a composition of a linear map and translations.

Example.
Any affine map $\mathbb R^1\to\mathbb R^1$ has the form $x\mapsto ax+b$ for some $a,b\in\mathbb R$. Sometimes it is more convenient to write the map in the form $x\mapsto a(x-c)+b$: this is possible for any point $c\in\mathbb R^1$. Note that the composition of affine maps in dimension 1 is not commutative anymore.

Key computation. Assume you are given a map $f:\mathbb R^1\to\mathbb R^1$ in the sense that you can evaluate it at any point $c\in\mathbb R^1$. Suppose an oracle tells you that this map is affine. How can you restore the explicit formula $f(x)=a(x-c)+b$ for $f$?

Obviously, $b=f(c)$. To find $\displaystyle a=\frac{f(x)-b}{x-c}$, plug in any point $x\ne c$ and the corresponding value $f(x)$. Given that $b=f(c)$, we have $\displaystyle a=\frac{f(x)-f(c)}{x-c}$ for any choice of $x\ne c$.
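This recovery procedure can be run literally. A minimal Python sketch (the “oracle” function `f` and the sample points are our own illustrative choices): we evaluate $f$ at $c$ and at one extra point, and the two numbers $a,b$ come out.

```python
# Hypothetical "oracle": we only evaluate it, never read its coefficients.
def f(x):
    return 3.0 * (x - 2.0) + 5.0   # secretly a = 3, b = 5 around c = 2

c = 2.0
b = f(c)                      # b = f(c)
x = 7.0                       # any point x != c works
a = (f(x) - f(c)) / (x - c)   # slope from one extra evaluation

assert (a, b) == (3.0, 5.0)   # the affine map is fully restored
```

Two evaluations suffice because an affine map on $\mathbb R^1$ has exactly two free parameters.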

The expression $a_c(x)=\displaystyle \frac{f(x)-f(c)}{x-c}$ for a non-affine function $f$ is in general not constant and depends on the choice of the point $x$.

Definition. A function $f:\mathbb R^1\to\mathbb R^1$ is called differentiable at the point $c$, if the above expression for $a_c(x)$, albeit non-constant, has a limit as $x\to c:\ a_c(x)=a+s_c(x)$, where $s_c(x)$ is a function which tends to zero. The number $a$ is called the derivative of $f$ at the point $c$ and denoted by $f'(c)$ (and also by half a dozen other symbols: $\frac{df}{dx}(c),Df(c), D_xf(c), f_x(c)$, …).

Existence of the limit means that near the point $c$ the function $f$ admits a reasonable approximation by an affine function $\ell(x)=a(x-c)+b$: $f(x)=\ell(x)+s_c(x)(x-c)$, i.e., the “non-affine part” $s_c(x)\cdot (x-c)$ is small not just by itself, but also relative to the small difference $x-c$.
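The convergence $a_c(x)\to a$ is easy to watch numerically. A short sketch (our choice of test function $f(x)=x^2$ at $c=1$, where $f'(1)=2$): for this $f$ the quotient simplifies to $a_c(x)=x+c$, so the residual $s_c(x)=x-c$ shrinks linearly.

```python
# Difference quotient a_c(x) = (f(x) - f(c))/(x - c) for f(x) = x**2 at c = 1.
def a_c(f, c, x):
    return (f(x) - f(c)) / (x - c)

f = lambda x: x * x
c = 1.0
steps = (0.1, 0.01, 0.001)
quotients = [a_c(f, c, c + h) for h in steps]

# (x^2 - c^2)/(x - c) = x + c, so the residual s_c(x) = x - c here
residuals = [q - 2.0 for q in quotients]
assert all(abs(r) <= h * 1.01 for r, h in zip(residuals, steps))
```

The quotients come out as roughly 2.1, 2.01, 2.001: the limit $a=f'(1)=2$ with residual $s_c$ tending to zero.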

# Differentiability and algebraic operations

See the notes and their earlier version.

The only non-obvious point is differentiability of the product: the product (unlike the composition) of affine functions is not affine anymore, yet it is immediately differentiable:

$[b+a(x-c)]\cdot[q+p(x-c)]=bq+(aq+bp)(x-c)+ap(x-c)^2$, and the quadratic term vanishes relative to $x-c$, so the entire sum is differentiable.

Exercise. Derive the Leibniz rule for the derivative of the product.
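The rule the exercise asks for, $(fg)'=f'g+fg'$, can at least be sanity-checked numerically. A sketch (the sample functions, point, and the symmetric difference quotient are our own choices; derivatives of $f,g$ are known in closed form here):

```python
# Numerical check of the Leibniz rule (f*g)' = f'*g + f*g'
def num_deriv(func, x, h=1e-6):
    return (func(x + h) - func(x - h)) / (2 * h)  # symmetric difference quotient

f = lambda x: x ** 2       # f' = 2x
g = lambda x: 3 * x + 1    # g' = 3
fg = lambda x: f(x) * g(x)

x0 = 1.5
lhs = num_deriv(fg, x0)
rhs = num_deriv(f, x0) * g(x0) + f(x0) * num_deriv(g, x0)
assert abs(lhs - rhs) < 1e-4
```

Here $fg(x)=3x^3+x^2$, so both sides should be close to $9x_0^2+2x_0=23.25$.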

# Derivative and the local study of functions

Affine functions have no (strong) maxima or minima, unless restricted to finite segments. Yet absence of the extremum is a strong property which descends from the affine approximation to the original function. Details here and here.

## Approximation by linear functions, differentiability and derivative

### Recall: linear spaces, linear functions…

A vector (or linear) space (מרחב וקטורי, sometimes we add explicitly, space over $\mathbb R$) is a set $V$ equipped with two operations: addition/subtraction $V\owns v,w\mapsto v\pm w$ and multiplication by (real) numbers, $\lambda\in\mathbb R,~ v\in V\mapsto \lambda v\in V$. These operations obey all the natural rules. The simplest example is the real line $\mathbb R$ itself: to “distinguish” it from the “usual” real numbers, we denote it by $\mathbb R^1$. The plane $\mathbb R^2$ is the next simplest case.

A function $f:V\to\mathbb R$ defined on a vector space, is called linear, if it respects both operations, $f(u\pm v)=f(u)\pm f(v),\ f(\lambda u)=\lambda f(u)$.  The set of all linear functions on the given space $V$ is itself a linear space (called dual space, מרחב הדואלי, with the natural operations of addition $f+g$ and rescaling $\lambda f$ on the functions).

Linear functions on $\mathbb R^1$ can be easily described.

Example. Let $f:\mathbb R^1\to\mathbb R$ be a linear function. Denote by $a\in\mathbb R$ its value at 1: $a=f(1)$. Then for any other point $x\in\mathbb R^1$, we have $x=x\cdot 1$ (meaning: vector = number $\cdot$ vector in $\mathbb R^1$), so by linearity $f(x)=x\cdot f(1)=a\cdot x=ax$.

Question. Prove that any linear function of two variables has the form $f(x,y)=ax+by$, where $a=f(1,0)$ and $b=f(0,1)$. Prove that the dual space to the plane $\mathbb R^2$ is again a plane, of vectors $(a,b)$ as above.

Warning!! In elementary geometry and algebra, a linear function is a function whose graph is a straight line. Such functions have the general form $f(x)=ax+b,\ a,b\in\mathbb R^1$, and are linear in the above sense only when $b=0$. We will call such functions affine (פונקציות אפיניות). The coefficient $a$ will be called the slope (שיפוע) of the affine function.

We first look at the linear functions of one variable only and identify each function $f(x)=ax$ with its coefficient $a\in\mathbb R$, called multiplicator (מכפיל): it acts on the real line by multiplication by $a$. The product of two linear functions is non-linear, yet their composition is linear: it does not depend on the order, and its multiplicator is simply the product of the individual multiplicators:

$f(x)=ax,\ g(x)=bx \implies (f\circ g)(x)=a\,g(x)=abx=(ab)\cdot x=bax=g(f(x))=(g\circ f)(x)$

Problem. Compute the composition of two affine functions $f(x)=ax+\alpha$ and $g(x)=bx+\beta$. Prove that the slope of the composition is the product of the slopes. Is it true that they also always commute? Find an affine function $g$ that commutes (in the sense of the composition) with any affine function $f$.
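The algebra behind this Problem fits in a few lines of Python. A sketch (the pair representation and the sample coefficients are our own choices): an affine map $x\mapsto ax+\alpha$ is stored as the pair $(a,\alpha)$, and `compose` works out $f(g(x))=a(bx+\beta)+\alpha$.

```python
# Affine maps stored as (slope, intercept) pairs: x -> a*x + alpha.
def compose(f, g):
    a, alpha = f
    b, beta = g
    # f(g(x)) = a*(b*x + beta) + alpha = (a*b)*x + (a*beta + alpha)
    return (a * b, a * beta + alpha)

f = (2.0, 1.0)   # 2x + 1
g = (3.0, -4.0)  # 3x - 4

fg = compose(f, g)
gf = compose(g, f)
assert fg[0] == gf[0] == 6.0   # slopes always multiply...
assert fg != gf                # ...but the compositions differ in general

identity = (1.0, 0.0)          # x -> x commutes with every affine map
assert compose(f, identity) == compose(identity, f) == f
```

The intercepts $a\beta+\alpha$ and $b\alpha+\beta$ differ in general, which is why affine maps (unlike linear ones in dimension 1) fail to commute.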

Obviously, linear functions are continuous and bounded on any compact set. To know a linear function, it is enough to know its value at a single nonzero point (for affine functions, two distinct points are sufficient).

### Approximation by a linear function near the origin

Let $f:\mathbb R^1\to\mathbb R$ be a (nonlinear) function. In some (“good”) cases, the graph of such a function looks almost like a straight line under sufficiently large magnification.

Example. Consider $f(x)=2x+x^2$: this function is obviously nonlinear, and $f(0)=0$, so that its graph passes through the point $(0,0)\in\mathbb R^2$. Let $\varepsilon$ be a small positive number. The transformation (change of variables) $X=x/\varepsilon,\ Y=y/\varepsilon$ magnifies the small square $[-\varepsilon,+\varepsilon]^2$ to the square $[-1,1]^2$. After this magnification we see that the equation of the curve becomes $Y=2X+\varepsilon X^2$. Clearly, as $\varepsilon \to 0^+$, the magnified curve converges (uniformly on $|X|\le 1$) to the graph of the linear function $Y=2X$.

In other words, we see that

$\displaystyle \frac{f(\varepsilon X)- \ell(\varepsilon X)}{\varepsilon}=\frac1\varepsilon f(\varepsilon X)-\ell (X)\to 0,$

as $\varepsilon \to 0$, where $\ell(X)=2X$ is the linear approximation to the function $f$. In particular, we can set $X=1$ and see that the limit $f(\varepsilon)/\varepsilon$ exists and is equal to 2, the multiplicator of the linear function $\ell$.
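The special case $X=1$ is easy to confirm numerically. A sketch (the shrinking sequence of $\varepsilon$ values is our own choice): for $f(x)=2x+x^2$ the ratio $f(\varepsilon)/\varepsilon$ equals $2+\varepsilon$ exactly, so the gap to the multiplicator 2 is $\varepsilon$ itself.

```python
# For f(x) = 2x + x**2 the ratio f(eps)/eps tends to 2, the
# multiplicator of the linear approximation l(X) = 2X.
f = lambda x: 2 * x + x ** 2

eps_values = (0.1, 0.01, 0.001)
ratios = [f(eps) / eps for eps in eps_values]

# f(eps)/eps = 2 + eps, so the gap to 2 is exactly eps (up to rounding)
gaps = [r - 2.0 for r in ratios]
assert all(abs(g - eps) < 1e-9 for g, eps in zip(gaps, eps_values))
```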

Example. Consider the function $f(x)=|x|$ and treat it by the same magnification procedure. Will there be any limit behavior? Will the limit function be linear?

### Approximation near an arbitrary point. Derivative

What if we want to find a linear approximation to the function $f(x)$ at a point $a$ different from the origin, and without the assumption that $f(a)=0$? One first has to change the coordinates to $\hat x=x-a,\ \hat y=y-f(a)$. In the new “hat” coordinates we can perform the same calculation and see that existence of a linear approximation for $\hat f(\hat x)=f(a+\hat x)-f(a)$ is equivalent to the existence of the limit $\displaystyle\frac{ f(a+\varepsilon)-f(a)}{\varepsilon}$ as $\varepsilon\to 0$.

Definition. If $a\in\mathbb R$ is an interior point of the domain of a function $f$ and the limit

$\displaystyle f'(a)=\lim_{\varepsilon\to0}\frac{ f(a+\varepsilon)-f(a)}{\varepsilon}$

exists, then the function $f$ is called differentiable at $a$ and the value of the limit (denoted by $f'(a)$) is called the derivative of $f$ at this point. The function $f':a\mapsto f'(a)$, defined wherever the limit exists, is called the derivative (function) of $f$.

Warning! Despite everything, the derivative $f'$ is not a linear function (and not even affine!) The value $f'(a)$ is just the multiplicator (the principal coefficient) of the affine function $\ell_a(x)$ which approximates $f$ near the point $a$ and depends on $a$.

Notation. There are several notations for the derivative, used in different sources and on different occasions; some are more convenient than others. They include (but are not limited to):

$\displaystyle f',\quad \frac{df}{dx},\quad \partial_x f,\quad Df,\quad\dots$

We will explain the origin (and convenience) of some of these notations in due time.

First rules of derivation. The “derivation map” $f\mapsto f'$ (or the “differential operator” $D:f\mapsto Df$) is linear: $D(f\pm g)=Df\pm Dg$, and $D(\lambda f)=\lambda Df$, assuming that $f,g$ are differentiable on the common interval. Indeed, the sum and the multiple of affine functions approximating the initial ones, are again affine.

The derivative of an affine function $D(\alpha x+\beta)$ is constant and equal to $\alpha$, since this function ideally approximates itself at each point. In particular, the derivative of a constant is identically zero.

Leibniz rule. The product of two affine functions is not affine anymore, yet admits an easy approximation. At the origin $a=0$ the product of two affine functions $\ell_\alpha(x)=\alpha x+p$ and $\ell_\beta(x)=\beta x+q, \ \alpha,\beta,p,q\in\mathbb R$, is the quadratic function $Q(x)=(\alpha\beta)x^2+[\alpha q+\beta p]x+pq$ which is approximated by the affine function $[\alpha q+\beta p]x+pq$. Since the four constants are the values of the initial functions and their derivatives at the origin, we can write the last formula as

$D(f\cdot g)(0)=f'(0)g(0)+f(0)g'(0),\quad f=\ell_\alpha,\ g=\ell_\beta.$

This is called the Leibniz formula. To show that it is true for the product of any two differentiable functions, note that any such function can be written in the form $f(x)=\ell(x)+o(x)$, where $\ell(x)$ is an affine function and $o(x)$ is a small function such that $\lim_{x\to0}\frac{o(x)}x=0$. If $g(x)$ is any bounded function, and $o(x)$ is such a small function, then the linear approximation to the product $g(x)o(x)$ is identically zero (prove it!). Use the linearity of the approximation to complete the proof of the Leibniz formula for arbitrary differentiable $f,g$.

Chain rule of differentiation. Let $f,g$ be two differentiable functions and $h=g\circ f$ their composition. Let $a\in\mathbb R$ be an arbitrary point and $b=f(a)$ its $f$-image. To compute the derivative of $h$ at $a$, we replace both $f,g$ by their affine approximation at the points $a$ and $b$ respectively. The composition of the affine approximations is again an affine map (see above) and its slope is equal to the product of the slopes. Thus we obtain the result

$h'(a)= (g\circ f)'(a)=g'(b)\cdot f'(a)=g'(f(a))\cdot f'(a),\qquad\text{as }b=f(a).$

An easy computation shows that adding the small nonlinear terms $o(x-a),\ o(y-b)$ does not change the computation: the derivative of a composition is the product of the derivatives at the appropriate points.
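As with the Leibniz rule, the chain rule admits a quick numerical sanity check. A sketch (the functions $f(x)=x^3$, $g(y)=\sin y$, the point $a=0.5$, and the symmetric difference quotient are our own choices; both derivatives are known in closed form):

```python
# Numerical check of the chain rule (g o f)'(a) = g'(f(a)) * f'(a)
import math

a = 0.5
f, df = lambda x: x ** 3, lambda x: 3 * x ** 2   # f and its derivative
g, dg = math.sin, math.cos                       # g and its derivative

h = lambda x: g(f(x))
num = (h(a + 1e-6) - h(a - 1e-6)) / 2e-6   # symmetric difference quotient
exact = dg(f(a)) * df(a)                   # chain-rule prediction
assert abs(num - exact) < 1e-6
```

Here $b=f(a)=0.125$, so the prediction is $\cos(0.125)\cdot 0.75$, matching the difference quotient.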

Problem. Consider $n$ differentiable functions $f_1,\dots,f_n$ and their composition $h=f_n\circ\cdots\circ f_1$. Prove that $h'(a_0)=f_1'(a_0)\cdot f_2'(a_1)\cdots f_n'(a_{n-1})$, where $a_0$ is the initial point and $a_k=f_k(a_{k-1}),\ k=1,2,\dots,n$.

In particular, for the pair of mutually inverse functions such that $g\circ f(x)\equiv x$, the derivatives at the points $a$ and $b=f(a)$ are reciprocal.

Example. Let $f(x)=x^n$, $n\in\mathbb N$. Then by induction one can prove that $f'(x)=nx^{n-1}$. The inverse function $g(y)=\sqrt[n]y=y^{1/n}$ has the derivative $\displaystyle\frac1{nx^{n-1}}$ at the point $y=x^n$. Substituting $x=\sqrt[n]y$, we see that

$\displaystyle g'(y)=\frac1{ny^{(n-1)/n}}=\mu y^{\mu-1},\qquad\text{when }\mu=\frac1n.$

This allows one to prove that $(x^\mu)'=\mu x^{\mu-1}$ for all rational powers $\mu\in\mathbb Q$.
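The rational-power rule is easy to test at a concrete point. A sketch (the exponent $\mu=2/3$, the point $x_0=4$, and the symmetric difference quotient are our own illustrative choices):

```python
# Check (x**mu)' = mu * x**(mu - 1) numerically for mu = 2/3 at x0 > 0,
# where the formula is valid.
mu = 2.0 / 3.0
x0 = 4.0

pw = lambda x: x ** mu
num = (pw(x0 + 1e-6) - pw(x0 - 1e-6)) / 2e-6   # symmetric difference quotient
exact = mu * x0 ** (mu - 1.0)                  # the claimed closed form
assert abs(num - exact) < 1e-8
```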
