# Sergei Yakovenko's blog: on Math and Teaching

## Approximation by linear functions, differentiability and derivative

### Recall: linear spaces, linear functions…

A vector (or linear) space (מרחב וקטורי; sometimes we say explicitly: a space over $\mathbb R$) is a set $V$ equipped with two operations: addition/subtraction $V\owns v,w\mapsto v\pm w$ and multiplication by (real) numbers, $\lambda\in\mathbb R,~ v\in V\mapsto \lambda v\in V$. These operations obey all the natural rules. The simplest example is the real line $\mathbb R$ itself: to “distinguish” it from the “usual” real numbers, we denote it by $\mathbb R^1$. The plane $\mathbb R^2$ is the next simplest case.

A function $f:V\to\mathbb R$ defined on a vector space, is called linear, if it respects both operations, $f(u\pm v)=f(u)\pm f(v),\ f(\lambda u)=\lambda f(u)$.  The set of all linear functions on the given space $V$ is itself a linear space (called dual space, מרחב הדואלי, with the natural operations of addition $f+g$ and rescaling $\lambda f$ on the functions).

Linear functions on $\mathbb R^1$ can be easily described.

Example. Let $f:\mathbb R^1\to\mathbb R$ be a linear function. Denote by $a\in\mathbb R$ its value at 1: $a=f(1)$. Then for any other point $x\in\mathbb R^1$, we have $x=x\cdot 1$ (meaning: vector = number $\cdot$ vector in $\mathbb R^1$), so by linearity $f(x)=x\cdot f(1)=a\cdot x=ax$.

Question. Prove that any linear function of two variables has the form $f(x,y)=ax+by$, where $a=f(1,0)$ and $b=f(0,1)$. Prove that the dual space to the plane $\mathbb R^2$ is again a plane, consisting of the vectors $(a,b)$ as above.

Warning!! In elementary geometry and algebra, a linear function is a function whose graph is a straight line. Such functions have the general form $f(x)=ax+b,\ a,b\in\mathbb R^1$, and are linear in the above sense only when $b=0$. We will call such functions affine (פונקציות אפיניות). The coefficient $a$ will be called the slope (שיפוע) of the affine function.

We first look at linear functions of one variable only and identify each function $f(x)=ax$ with its coefficient $a\in\mathbb R$, called the multiplicator (מכפיל): the function acts on the real line by multiplication by $a$. The product of two linear functions is non-linear, yet their composition is linear; it does not depend on the order, and its multiplicator is simply the product of the individual multiplicators:

$f(x)=ax,\ g(x)=bx \implies (f\circ g)(x)=a\,g(x)=abx=(ab)\cdot x=bax=g(f(x))=(g\circ f)(x)$
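The computation above is easy to confirm numerically. Here is a minimal sketch (the helper name `linear` and the sample coefficients are my own choices, not from the text): composing two linear maps gives a linear map with multiplicator $ab$, in either order.

```python
# Sketch: composing two linear functions f(x) = a*x and g(x) = b*x.
# The composition is again linear, commutes, and has multiplicator a*b.

def linear(a):
    """Return the linear function x -> a*x with multiplicator a."""
    return lambda x: a * x

a, b = 3.0, -2.0
f, g = linear(a), linear(b)

x = 1.5
print(f(g(x)), g(f(x)), (a * b) * x)  # all three equal: -9.0
```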

Problem. Compute the composition of two affine functions $f(x)=ax+\alpha$ and $g(x)=bx+\beta$. Prove that the slope of the composition is the product of the slopes. Is it true that they also always commute? Find an affine function $g$ that commutes (in the sense of the composition) with any affine function $f$.

Obviously, linear functions are continuous and bounded on any compact set. To know a linear function of one variable, it is enough to know its value at a single (nonzero) point; for affine functions, two points suffice.

### Approximation by a linear function near the origin

Let $f:\mathbb R^1\to\mathbb R$ be a (nonlinear) function. In some (“good”) cases, the graph of such a function looks almost like a straight line under sufficiently large magnification.

Example. Consider $f(x)=2x+x^2$: this function is obviously nonlinear, and $f(0)=0$, so its graph passes through the point $(0,0)\in\mathbb R^2$. Let $\varepsilon$ be a small positive number. The transformation (change of variables) $X=x/\varepsilon,\ Y=y/\varepsilon$ magnifies the small square $[-\varepsilon,+\varepsilon]^2$ to the square $[-1,1]^2$. After this magnification the equation of the curve becomes $Y=2X+\varepsilon X^2$. Clearly, as $\varepsilon \to 0^+$, the magnified curve converges (uniformly on $|X|\le 1$) to the graph of the linear function $Y=2X$.

In other words, we see that

$\displaystyle \frac{f(\varepsilon X)- \ell(\varepsilon X)}{\varepsilon}=\frac1\varepsilon f(\varepsilon X)-\ell (X)\to 0,$

as $\varepsilon \to 0$, where $\ell(X)=2X$ is the linear approximation to the function $f$. In particular, we can set $X=1$ and see that the limit $f(\varepsilon)/\varepsilon$ exists and is equal to 2, the multiplicator of the linear function $\ell$.
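The magnification argument can be watched in action with a short computation (the grid size is my own choice): after rescaling, $Y=f(\varepsilon X)/\varepsilon=2X+\varepsilon X^2$, so the deviation from the line $Y=2X$ on $|X|\le 1$ is at most $\varepsilon$ and shrinks with it.

```python
# Sketch of the magnification argument for f(x) = 2x + x^2 (the function
# from the example above). The rescaled graph Y = f(eps*X)/eps deviates
# from the line Y = 2X by at most eps on the segment |X| <= 1.

def f(x):
    return 2 * x + x ** 2

def max_deviation(eps, n=200):
    """Max of |f(eps*X)/eps - 2X| over a grid of points X in [-1, 1]."""
    grid = [-1 + 2 * k / n for k in range(n + 1)]
    return max(abs(f(eps * X) / eps - 2 * X) for X in grid)

for eps in (0.1, 0.01, 0.001):
    print(eps, max_deviation(eps))  # deviation shrinks proportionally to eps
```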

Example. Consider the function $f(x)=|x|$ and treat it by the same magnification procedure. Will there be any limit behavior? Will the limit function be linear?

### Approximation near an arbitrary point. Derivative

What if we want to find a linear approximation to the function $f(x)$ at a point $a$ different from the origin, and without the assumption that $f(a)=0$? One first has to change the coordinates to $\hat x=x-a,\ \hat y=y-f(a)$. In the new “hat” coordinates we can perform the same calculation and see that the existence of a linear approximation for $\hat f(\hat x)=f(a+\hat x)-f(a)$ is equivalent to the existence of the limit $\displaystyle\frac{ f(a+\varepsilon)-f(a)}{\varepsilon}$ as $\varepsilon\to 0$.

Definition. If $a\in\mathbb R$ is an interior point of the domain of a function $f$ and the limit

$\displaystyle f'(a)=\lim_{\varepsilon\to0}\frac{ f(a+\varepsilon)-f(a)}{\varepsilon}$

exists, then the function $f$ is called differentiable at $a$, and the value of the limit (denoted by $f'(a)$) is called the derivative of $f$ at this point. The function $f':a\mapsto f'(a)$, defined wherever the limit exists, is called the derivative (function) of $f$.
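To see the definition at work, here is a small numerical sketch (the sample function $x^3$ and the point $a=2$ are my own choices): the difference quotients visibly stabilize near $f'(2)=3\cdot 2^2=12$.

```python
# Illustration of the definition of the derivative: the difference
# quotient (f(a + eps) - f(a)) / eps approaches f'(a) as eps -> 0.

def diff_quotient(f, a, eps):
    return (f(a + eps) - f(a)) / eps

cube = lambda x: x ** 3  # f'(x) = 3*x**2, so f'(2) = 12
for eps in (1e-2, 1e-4, 1e-6):
    print(eps, diff_quotient(cube, 2.0, eps))  # tends to 12
```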

Warning! Despite everything, the derivative $f'$ is not a linear function (and not even affine!). The value $f'(a)$ is just the multiplicator (the principal coefficient) of the affine function $\ell_a(x)$ which approximates $f$ near the point $a$; this coefficient depends on $a$.

Notation. There are several notations for the derivative, used in different sources; on different occasions some are more convenient than others. They include (but are not limited to):

$\displaystyle f',\quad \frac{df}{dx},\quad \partial_x f,\quad Df,\quad\dots$

We will explain the origin (and convenience) of some of these notations in due time.

First rules of derivation. The “derivation map” $f\mapsto f'$ (or the “differential operator” $D:f\mapsto Df$) is linear: $D(f\pm g)=Df\pm Dg$ and $D(\lambda f)=\lambda Df$, assuming that $f,g$ are differentiable on a common interval. Indeed, the sum and a scalar multiple of the affine functions approximating the initial ones are again affine.

The derivative of an affine function, $D(\alpha x+\beta)$, is constant and equal to $\alpha$, since this function ideally approximates itself at each point. In particular, the derivative of a constant is identically zero.
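For an affine function the difference quotient needs no limit at all: in exact arithmetic it equals the slope for every point and every $\varepsilon$. A quick sketch (the coefficients $4$ and $7$ are my own sample choices):

```python
# Sketch: the difference quotient of an affine function is (up to
# floating-point rounding) its slope, at every point and for every eps;
# for a constant function it is zero.

def diff_quotient(f, a, eps=1e-6):
    return (f(a + eps) - f(a)) / eps

affine = lambda x: 4 * x + 7  # affine with slope 4
const = lambda x: 7           # constant function

print(diff_quotient(affine, 0.0), diff_quotient(affine, -3.0))  # both near 4.0
print(diff_quotient(const, 1.0))  # derivative of a constant: 0.0
```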

Leibniz rule. The product of two affine functions is not affine anymore, yet it admits an easy approximation. At the origin $a=0$ the product of two affine functions $\ell_\alpha(x)=\alpha x+p$ and $\ell_\beta(x)=\beta x+q,\ \alpha,\beta,p,q\in\mathbb R$, is the quadratic function $Q(x)=(\alpha\beta)x^2+[\alpha q+\beta p]x+pq$, which is approximated by the affine function $[\alpha q+\beta p]x+pq$. Since the four constants are the values of the initial functions and their derivatives at the origin, we can write the last formula as

$D(f\cdot g)(0)=f'(0)g(0)+f(0)g'(0),\quad f=\ell_\alpha,\ g=\ell_\beta.$

This is called the Leibniz formula. To show that it is true for the product of any two differentiable functions, note that any such function can be written in the form $f(x)=\ell(x)+o(x)$, where $\ell(x)$ is an affine function and $o(x)$ is a small function such that $\lim_{x\to0}\frac{o(x)}x=0$. If $g(x)$ is any bounded function and $o(x)$ is such a small function, then the linear approximation to the product $g(x)o(x)$ is identically zero (prove it!). Use the linearity of the approximation to complete the proof of the Leibniz formula for arbitrary differentiable $f,g$.
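One can also check the Leibniz formula numerically with difference quotients; a minimal sketch, with sample functions of my own choosing:

```python
# Numerical check of the Leibniz formula (f*g)' = f'*g + f*g',
# using difference quotients to approximate each derivative.

def d(fn, a, eps=1e-6):
    return (fn(a + eps) - fn(a)) / eps

f = lambda x: x ** 2 + 1
g = lambda x: 3 * x - 2
fg = lambda x: f(x) * g(x)

a = 1.5
print(d(fg, a))                         # direct derivative of the product
print(d(f, a) * g(a) + f(a) * d(g, a))  # Leibniz formula, same value
```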

Chain rule of differentiation. Let $f,g$ be two differentiable functions and $h=g\circ f$ their composition. Let $a\in\mathbb R$ be an arbitrary point and $b=f(a)$ its $f$-image. To compute the derivative of $h$ at $a$, we replace both $f,g$ by their affine approximation at the points $a$ and $b$ respectively. The composition of the affine approximations is again an affine map (see above) and its slope is equal to the product of the slopes. Thus we obtain the result

$h'(a)= (g\circ f)'(a)=g'(b)\cdot f'(a)=g'(f(a))\cdot f'(a),\qquad\text{as }b=f(a).$

An easy computation shows that adding the small nonlinear terms $o(x-a),\ o(y-b)$ does not change the result: the derivative of a composition is the product of the derivatives at the appropriate points.
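The chain rule is just as easy to test numerically. A sketch (the sample functions and the point $a=2$ are my own choices): both sides of the formula agree.

```python
# Numerical check of the chain rule (g o f)'(a) = g'(f(a)) * f'(a),
# using difference quotients to approximate the derivatives.

def d(fn, a, eps=1e-6):
    return (fn(a + eps) - fn(a)) / eps

f = lambda x: x ** 2   # inner function, f'(2) = 4
g = lambda y: y ** 3   # outer function, g'(4) = 48
h = lambda x: g(f(x))  # composition: h(x) = x**6, so h'(2) = 192

a = 2.0
b = f(a)
print(d(h, a), d(g, b) * d(f, a))  # both approach 192
```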

Problem. Consider $n$ differentiable functions $f_1,\dots,f_n$ and their composition $h=f_n\circ\cdots\circ f_1$. Prove that $h'(a_0)=f_1'(a_0)\cdot f_2'(a_1)\cdots f_n'(a_{n-1})$, where $a_0$ is the initial point and $a_k=f_k(a_{k-1}),\ k=1,2,\dots,n-1$.

In particular, for the pair of mutually inverse functions such that $g\circ f(x)\equiv x$, the derivatives at the points $a$ and $b=f(a)$ are reciprocal.

Example. Let $f(x)=x^n$, $n\in\mathbb N$. Then by induction one can prove that $f'(x)=nx^{n-1}$. The inverse function $g(y)=\sqrt[n]y=y^{1/n}$ has the derivative $\displaystyle\frac1{nx^{n-1}}$ at the point $y=x^n$. Substituting $x=\sqrt[n]y$, we see that

$\displaystyle g'(y)=\frac1{ny^{(n-1)/n}}=\mu y^{\mu-1},\qquad\text{when }\mu=\frac1n.$

This allows one to prove that $(x^\mu)'=\mu x^{\mu-1}$ for all rational powers $\mu\in\mathbb Q$.
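As a final sanity check (the sample values $\mu=1/3$ and $x=8$ are my own), the power rule for a rational exponent agrees with the difference quotient: the derivative of the cube root at $8$ is $\frac13\cdot 8^{-2/3}=\frac1{12}$.

```python
# Numerical check of (x^mu)' = mu * x^(mu - 1) for a rational power:
# mu = 1/3 at x = 8, where the exact answer is 1/12.

mu = 1.0 / 3.0
f = lambda x: x ** mu  # the cube root

a, eps = 8.0, 1e-6
quotient = (f(a + eps) - f(a)) / eps  # difference quotient at a
exact = mu * a ** (mu - 1)            # power-rule prediction
print(quotient, exact)  # both close to 1/12 = 0.0833...
```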