Sergei Yakovenko's blog: on Math and Teaching

Sunday, January 17, 2010

Problem set no. 5

Problem set no. 5 is attached to this post. If questions about the problems come up before the next tutorial session, you can ask them here.

Important note: there was an error in the wording of question 4; the corrected version is available at the link above.

Monday, January 4, 2010

Lecture 10 (Jan 5, 2010)

Understanding and using the derivative

A few exotic examples

1. The function f(x)=x^2 h(x), where h(x) is the Dirichlet function (equal to 1 when x is rational and 0 when x is irrational), is differentiable at only one point. Why?

2. The function g(x)=x^2\sin (x^k), k\in\mathbb Z, is a source of many examples. Its derivative for x\ne 0 can be computed by the usual rules: g'(x)=2x \sin (x^k)+x^2 \cos (x^k)\cdot kx^{k-1}=2x\sin(x^k)+kx^{k+1}\cos(x^k). The derivative g' has a limit at x=0 for k\ge0, is bounded near the origin but has no limit there for k=-1, and is unbounded (without any limit, infinite or not) for k\le -2.

On the other hand, regardless of k, the function g (extended by g(0)=0 when k<0) is always differentiable at the origin, with g'(0)=0, because |g(x)|\le x^2. Thus there exist functions differentiable at each point whose derivative is discontinuous!
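A quick numerical sanity check of Example 2 for k=-1 may help here. It is only a sketch: it assumes the extension g(0)=0, and the helper names g, g_prime and difference_quotient are illustrative, not part of the lecture. The difference quotients at the origin shrink to zero, while the derivative sampled along x_n=1/(n\pi) keeps alternating between values close to -1 and +1.

```python
import math

def g(x, k=-1):
    """g(x) = x^2 sin(x^k), extended by g(0) = 0 (an assumption needed for k < 0)."""
    return 0.0 if x == 0 else x * x * math.sin(x ** k)

def g_prime(x, k=-1):
    """Derivative of g for x != 0, obtained from the product and chain rules."""
    return 2 * x * math.sin(x ** k) + k * x ** (k + 1) * math.cos(x ** k)

def difference_quotient(x, k=-1):
    """(g(x) - g(0)) / x; its limit as x -> 0 is g'(0)."""
    return g(x, k) / x

# |g(x)/x| <= |x|, so these quotients tend to 0, i.e. g'(0) = 0.
quotients = [difference_quotient(10.0 ** -n) for n in range(1, 8)]

# Along x_n = 1/(n*pi) we have cos(1/x_n) = (-1)^n, so g'(x_n) is close to -(-1)^n:
# the derivative has no limit at the origin.
samples = [g_prime(1 / (n * math.pi)) for n in range(100, 106)]
```

Printing `quotients` and `samples` side by side shows the two behaviours: the first list decays, the second keeps changing sign.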

3. There is no function f(x) differentiable everywhere on [-1,+1] such that f' is continuous on [-1,0)\cup(0,+1] and has two unequal limits \lim_{x\to 0^-}f'(x)\ne\lim_{x\to 0^+}f'(x). Why?

Derivative and monotonicity

For a function continuous at a point a\in\mathbb R, the sign of the value f(a), if it is nonzero (i.e., \text{sign} f(a)=\pm 1) determines the sign of f at all sufficiently near points (where f is defined, of course). In a similar way, the sign of the derivative of a differentiable function, if it is nonzero, determines the monotonicity direction of f at a: from the definition of the derivative, it follows that

f'(a)>0\implies \exists\delta>0:\quad \forall x_-,x_+\ \text{such that } a-\delta<x_-<a<x_+<a+\delta,\quad f(x_-)<f(a)<f(x_+)

and in a similar way, the function f(x)-f(a) changes its sign from +1 to -1 if f'(a)<0.

Note that this result does not mean that f(x) is monotonically increasing on [a-\delta,a+\delta] if f'(a)>0! (Give an example, tuning Example 2 above…) However, this result does imply that for differentiable functions certain points (in a sense, the majority) cannot be extrema.


  1. An interior point at which the function is differentiable can be an extremum only if the derivative vanishes at this point.
  2. A left endpoint at which the function is (one-sidedly) differentiable can be a minimum (resp., maximum) only if the derivative there is \ge 0 (resp., \le 0).
  3. A right endpoint at which the function is (one-sidedly) differentiable can be a minimum (resp., maximum) only if the derivative there is \le 0 (resp., \ge 0).

These rules can be easily memorized if instead of the word “function” one substitutes the words “affine function”, when the behavior is obvious. The first (principal) assertion of the theorem means that an affine function can reach an extremum (always non-strict!) only if it has zero slope (i.e., it is a constant).

Warning: if the function is non-differentiable, the theorem says nothing! Consider the functions f(x)=|x| and g(x)=|x|+3x.

Warning: The fact that the derivative vanishes (at an interior point or at an endpoint) does not guarantee that this point is indeed an extremum. However, if the derivative has a definite sign at an endpoint, this guarantees that the endpoint is indeed a (local) extremum, of the type prescribed by the theorem.


The derivative as a function f':a\mapsto f'(a) is usually non-linear and not even affine, contrary to the declared goal of finding a linear approximation. On the other hand, the affine approximation \ell(x)=\alpha (x-a)+\beta=\alpha x+(\beta-\alpha a),\ \alpha=f'(a),\ \beta=f(a), is in general non-linear and does depend on the point a, the center of approximation. If we want to have a function that would be at the same time linear and explicitly depend on a, we need a function of two variables (and not one). This function is called the differential.

Definition. The tangent space to the real line \mathbb R at a point a\in\mathbb R is the vector space of all pairs \{(a,v):\ v\in\mathbb R^1\} with the operations

(a,v)\pm(a,w)=(a,v\pm w),\quad \lambda (a,v)=(a,\lambda v).

An element of this space is called a vector attached to the point a. It differs from the usual, “free” vector, by its “memory”: the attached vector remembers where it grows from. Vectors attached to different points should not, in general, be added to each other: such addition assumes that we can always “translate” vectors from one point to another. While this is still easy on the plane, in more complicated situations such translation may be problematic (think about translating a vector tangent to a circle at one point to another point on the circle).

Definition. Let a\in\mathbb R be a point at which the function f is differentiable, and let v\in\mathbb R^1 be a vector attached to the point a. The differential df is a function, linear in the second argument, which realizes the linear approximation to f, i.e., sends the pair (a,v) into the number f'(a)\cdot v\in\mathbb R.

For animation see the Wolfram page. Note that we treat df as an indivisible symbol for the function of two arguments, though later we will show that it can be considered as the result of application of some operator d to the function f.

How to write the differentials, if their arguments  are “vectors” (even attached to points)? Introduce the “units of measurements” and compare!

Example. Let dx (again, an indivisible symbol) be the function which sends the “attached vector” (a,v),\ a\in\mathbb R,\ v\in\mathbb R^1, into the number v\in\mathbb R\simeq\mathbb R^1. Since any two nonzero linear maps on \mathbb R^1 are proportional, any other linear map has the form A\,dx, where A\in\mathbb R is the coefficient (slope), which in general may depend on the point a. The linear map approximating a differentiable function f has the form f'(a)\,dx at the point a, and f'(x)\,dx at a general point x\in\mathbb R. We write the result of this computation symbolically as

df=f'(x)\,dx,\qquad df(a,v)=f'(a)\,dx(a,v)=f'(a)\cdot v.
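The two-argument nature of the differential can be made concrete in a few lines. This is only an illustrative sketch: the names dx, differential and df mirror the notation above, and the derivative f' is assumed to be supplied by the caller (here for the sample function f(x)=x^3, which is not taken from the lecture).

```python
def dx(a, v):
    """The basic differential: it ignores the base point a and returns v."""
    return v

def differential(f_prime):
    """Given the derivative f', return df as the map (a, v) -> f'(a) * dx(a, v)."""
    def df(a, v):
        return f_prime(a) * dx(a, v)
    return df

# Sample function f(x) = x^3, whose derivative is f'(x) = 3x^2.
df = differential(lambda x: 3 * x ** 2)
```

For a fixed base point a the map v \mapsto df(a, v) is linear, while its dependence on a is as nonlinear as f' itself.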

Invariance of differentials by the change of variables

The derivative of a function depends on the name of the independent variable. The velocity of the same motion in km/h and in ft/sec is completely different. The notion of the differential is assembled of two parts, both involving the notation of the variable. Yet in a miraculous (well-conceived!) way, the differential is independent of which units (even non-uniform) are used for measurement.

  • Change of variables.
  • Action of differentiable changes of variables on points and on tangent vectors.
  • Example: velocity of the motion along the line. Traveling along the mountain road: height vs. length; height vs. geographic location.
  • Action of differentiable changes of variables on functions. Non-invariance of the derivative.
  • Invariance of the differential. This allows us to write df rather than df(x).
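The outline above can be checked on a small example. The sketch below assumes a hypothetical non-uniform change of variable x=\varphi(t)=t^3 and the sample function f(x)=x^2; neither is taken from the lecture. A vector w attached to t is carried to the vector \varphi'(t)\,w attached to x=\varphi(t). The derivatives in the two coordinates differ, yet the values of the differential on corresponding attached vectors coincide.

```python
def f_prime(x):
    """Derivative of f(x) = x^2 in the coordinate x."""
    return 2 * x

def phi(t):
    """Hypothetical change of variable x = phi(t) = t^3."""
    return t ** 3

def phi_prime(t):
    return 3 * t ** 2

def F_prime(t):
    """Derivative of F = f o phi, i.e. of F(t) = t^6, in the coordinate t."""
    return 6 * t ** 5

def push_forward(t, w):
    """Carry the attached vector (t, w) into the x coordinate."""
    return phi(t), phi_prime(t) * w

t0, w = 2.0, 0.5
x0, v = push_forward(t0, w)

dF_value = F_prime(t0) * w   # differential evaluated in the t coordinate
df_value = f_prime(x0) * v   # differential evaluated in the x coordinate
```

Here F_prime(t0) and f_prime(x0) are different numbers, which is the non-invariance of the derivative, while dF_value and df_value agree.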

Saturday, January 2, 2010

Lecture 9 (Dec 29, 2009)

Approximation by linear functions, differentiability and derivative

Recall: linear spaces, linear functions…

A vector (or linear) space (מרחב וקטורי, sometimes we add explicitly, space over \mathbb R) is a set V equipped with two operations: addition/subtraction V\owns v,w\mapsto v\pm w\in V and multiplication by (real) numbers, \lambda\in\mathbb R,~ v\in V\mapsto \lambda v\in V. These operations obey all the natural rules. The simplest example is the real line \mathbb R itself: to “distinguish” it from the “usual” real numbers, we denote it by \mathbb R^1. The plane \mathbb R^2 is the next simplest case.

A function f:V\to\mathbb R defined on a vector space, is called linear, if it respects both operations, f(u\pm v)=f(u)\pm f(v),\ f(\lambda u)=\lambda f(u).  The set of all linear functions on the given space V is itself a linear space (called dual space, מרחב הדואלי, with the natural operations of addition f+g and rescaling \lambda f on the functions).

Linear functions on \mathbb R^1 can be easily described.

Example. Let f:\mathbb R^1\to\mathbb R be a linear function. Denote by a\in\mathbb R its value at 1: a=f(1). Then for any other point x\in\mathbb R^1, we have x=x\cdot 1 (meaning: vector = number \cdot vector in \mathbb R^1), so by linearity f(x)=x\cdot f(1)=a\cdot x=ax.

Question. Prove that any linear function of two variables has the form f(x,y)=ax+by, where a=f(1,0) and b=f(0,1). Prove that the dual space to the plane \mathbb R^2 is again a plane, that of the vectors (a,b) as above.

Warning!! In elementary geometry and algebra, a linear function is a function whose graph is a straight line. Such functions have the general form f(x)=ax+b,\ a,b\in\mathbb R, and are linear in the above sense only when b=0. We will call such functions affine (פונקציות אפיניות). The coefficient a will be called the slope (שיפוע) of the affine function.

We first look at the linear functions of one variable only and identify each function f(x)=ax with its coefficient a\in\mathbb R, called multiplicator (מכפיל): it acts on the real line by multiplication by a. The product of two linear functions is non-linear, yet their composition is linear, does not depend on the order and the multiplicator can be easily computed as the product of the individual multiplicators:

f(x)=ax,\ g(x)=bx \implies (f\circ g)(x)=a\,g(x)=abx=(ab)\cdot x=bax=g(f(x))=(g\circ f)(x)

Problem. Compute the composition of two affine functions f(x)=ax+\alpha and g(x)=bx+\beta. Prove that the slope of the composition is the product of the slopes. Is it true that they also always commute? Find an affine function g that commutes (in the sense of the composition) with any affine function f.

Obviously, linear functions are continuous and bounded on any compact set. To know a linear function of one variable, it is enough to know its value at a single nonzero point (for affine functions, two distinct points are sufficient).

Approximation by a linear function near the origin

Let f:\mathbb R^1\to\mathbb R be a (nonlinear) function. In some (“good”) cases, the graph of such function looks almost like a straight line under sufficiently large magnification.

Example. Consider f(x)=2x+x^2: this function is obviously nonlinear, and f(0)=0, so that its graph passes through the point (0,0)\in\mathbb R^2. Let \varepsilon be a small positive number. The transformation (change of variables) X=x/\varepsilon,\ Y=y/\varepsilon magnifies the small square [-\varepsilon,+\varepsilon]^2 to the square [-1,1]^2. Substituting x=\varepsilon X,\ y=\varepsilon Y into y=2x+x^2 and dividing by \varepsilon, we see that after this magnification the equation of the curve becomes Y=2X+\varepsilon X^2. Clearly, as \varepsilon \to 0^+, the magnified curve converges (uniformly on |X|\le 1) to the graph of the linear function Y=2X.

In other words, we see that

\displaystyle \frac{f(\varepsilon X)- \ell(\varepsilon X)}{\varepsilon}=\frac1\varepsilon f(\varepsilon X)-\ell (X)\to 0,

as \varepsilon \to 0, where \ell(X)=2X is the linear approximation to the function f. In particular, we can set X=1 and see that the limit f(\varepsilon)/\varepsilon exists and is equal to 2, the multiplicator of the linear function \ell.
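The magnification argument is easy to test numerically. The following sketch (the function names magnified and max_gap and the grid size are illustrative choices) measures the largest vertical gap on |X|\le 1 between the magnified graph of f(x)=2x+x^2 and the line Y=2X. Since the gap equals \varepsilon X^2, its maximum should be \varepsilon itself.

```python
def f(x):
    return 2 * x + x ** 2

def magnified(eps, X):
    """Y-coordinate of the graph of f after the rescaling X = x/eps, Y = y/eps."""
    return f(eps * X) / eps

def max_gap(eps, samples=201):
    """Largest distance between the magnified curve and Y = 2X on a grid in [-1, 1]."""
    grid = [-1 + 2 * i / (samples - 1) for i in range(samples)]
    return max(abs(magnified(eps, X) - 2 * X) for X in grid)

# The gap is eps * X^2, so it is largest at X = +-1 and shrinks linearly with eps.
gaps = {eps: max_gap(eps) for eps in (0.1, 0.01, 0.001)}
```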

Example. Consider the function f(x)=|x| and treat it by the same magnification procedure. Will there be any limit behavior? Will the limit function be linear?

Approximation near an arbitrary point. Derivative

What if we want to find a linear approximation to the function f(x) at a point a different from the origin, and without the assumption that f(a)=0? One first has to change the coordinates to \hat x=x-a,\ \hat y=y-f(a). In the new “hat” coordinates we can perform the same calculation and see that existence of a linear approximation for \hat f(\hat x)=f(a+\hat x)-f(a) is equivalent to the existence of the limit \displaystyle\frac{ f(a+\varepsilon)-f(a)}{\varepsilon} as \varepsilon\to 0.

Definition. If a\in\mathbb R is an interior point of the domain of a function f and the limit

\displaystyle f'(a)=\lim_{\varepsilon\to0}\frac{ f(a+\varepsilon)-f(a)}{\varepsilon}

exists, then the function f is called differentiable at a and the value of the limit (denoted by f'(a)) is called the derivative of f at this point. The function f':a\mapsto f'(a), defined wherever this limit exists, is called the derivative (function) of f.
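The definition can be explored with finite values of \varepsilon. In this sketch (the helper name difference_quotient and the sample points are illustrative) the quotient for f(x)=2x+x^2 at a=1 approaches f'(1)=4 from both sides, whereas for |x| at the origin the right and left quotients stay at +1 and -1, so no two-sided limit exists.

```python
def difference_quotient(f, a, eps):
    """(f(a + eps) - f(a)) / eps, the expression whose limit defines f'(a)."""
    return (f(a + eps) - f(a)) / eps

def f(x):
    return 2 * x + x ** 2   # here f'(a) = 2 + 2a

# Quotients at a = 1 for eps = +-1e-2, +-1e-4, +-1e-6; each equals 4 + eps up to rounding.
smooth = [difference_quotient(f, 1.0, s * 10.0 ** -n)
          for n in (2, 4, 6) for s in (1, -1)]

# For |x| at the origin the one-sided quotients disagree.
corner_right = difference_quotient(abs, 0.0, 1e-6)
corner_left = difference_quotient(abs, 0.0, -1e-6)
```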

Warning! Despite everything, the derivative f' is not a linear function (and even not affine!)  The value f'(a) is just the multiplicator (the principal coefficient) of the affine function \ell_a(x) which approximates f near the point a and depends on a.

Notation. There are several notations for the derivative, used in different sources; on different occasions some are more convenient than others. They include (but are not limited to):

\displaystyle f',\quad \frac{df}{dx},\quad \partial_x f,\quad Df,\quad\dots

We will explain the origin (and convenience) of some of these notations in due time.

First rules of derivation. The “derivation map” f\mapsto f' (or the “differential operator” D:f\mapsto Df) is linear: D(f\pm g)=Df\pm Dg, and D(\lambda f)=\lambda Df, assuming that f,g are differentiable on the common interval. Indeed, the sum and the multiple of affine functions approximating the initial ones, are again affine.

The derivative of an affine function D(\alpha x+\beta) is constant and equal to \alpha, since this function ideally approximates itself at each point. In particular, the derivative of a constant is identically zero.

Leibniz rule. The product of two affine functions is not affine anymore, yet admits easy approximation. At the origin a=0 the product of two affine functions \ell_\alpha(x)=\alpha x+p and \ell_\beta(x)=\beta x+q, \ \alpha,\beta,p,q\in\mathbb R, is the quadratic function Q(x)=(\alpha\beta)x^2+[\alpha q+\beta p]x+pq, which is approximated by the affine function [\alpha q+\beta p]x+pq. Since the four constants are the values of the initial functions and of their derivatives at the origin, we can write the last formula as

D(f\cdot g)(0)=f'(0)g(0)+f(0)g'(0),\quad f=\ell_\alpha,\ g=\ell_\beta.

This is called the Leibniz formula. To show that it is true for the product of any two differentiable functions, note that any such function can be written in the form f(x)=\ell(x)+o(x), where \ell(x) is an affine function and o(x) is a small function such that \lim_{x\to0}\frac{o(x)}x=0. If g(x) is any bounded function, and o(x) is such a small function, then the linear approximation to the product g(x)o(x) is identically zero (prove it!). Use the linearity of the approximation to complete the proof of the Leibniz formula for arbitrary differentiable f,g.
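A numerical check does not replace the proof, but it is a useful test of the formula. The sketch below uses a symmetric difference quotient as a stand-in for the derivative (a different approximation from the one-sided limit in the definition, chosen here only for accuracy), and the sample functions \sin and \exp at the point a=0.7 are arbitrary choices.

```python
import math

def derivative(f, a, eps=1e-6):
    """Symmetric difference quotient, a numerical stand-in for f'(a)."""
    return (f(a + eps) - f(a - eps)) / (2 * eps)

f, g = math.sin, math.exp
a = 0.7

# Left-hand side: derivative of the product f*g at a.
lhs = derivative(lambda x: f(x) * g(x), a)

# Right-hand side of the Leibniz formula: f'(a) g(a) + f(a) g'(a).
rhs = derivative(f, a) * g(a) + f(a) * derivative(g, a)
```

The two numbers agree up to the error of the numerical differentiation.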

Chain rule of differentiation. Let f,g be two differentiable functions and h=g\circ f their composition. Let a\in\mathbb R be an arbitrary point and b=f(a) its f-image. To compute the derivative of h at a, we replace both f,g by their affine approximation at the points a and b respectively. The composition of the affine approximations is again an affine map (see above) and its slope is equal to the product of the slopes. Thus we obtain the result

h'(a)= (g\circ f)'(a)=g'(b)\cdot f'(a)=g'(f(a))\cdot f'(a),\qquad\text{as }b=f(a).

An easy computation shows that adding the small nonlinear terms o(x-a),\ o(y-b) does not change the computation: the derivative of a composition is the product of the derivatives at the appropriate points.
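The chain rule can be tested in the same spirit. This sketch again uses a symmetric difference quotient as a numerical stand-in for the derivative; the sample functions f(x)=x^2+1 and g=\sin and the point a=0.5 are arbitrary illustrative choices.

```python
import math

def derivative(f, a, eps=1e-6):
    """Symmetric difference quotient, a numerical stand-in for f'(a)."""
    return (f(a + eps) - f(a - eps)) / (2 * eps)

def f(x):
    return x ** 2 + 1

g = math.sin

def h(x):
    """The composition h = g o f."""
    return g(f(x))

a = 0.5
b = f(a)   # the f-image of a, where g must be differentiated

lhs = derivative(h, a)                      # h'(a)
rhs = derivative(g, b) * derivative(f, a)   # g'(b) * f'(a)
```

Note that g is differentiated at b=f(a), not at a; evaluating it at a instead gives a visibly different number.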

Problem. Consider n differentiable functions f_1,\dots,f_n and their composition h=f_n\circ\cdots\circ f_1. Prove that h'(a_1)=f'_1(a_1)\cdot f_2'(a_2)\cdots f_n'(a_n), where a_{k+1}=f_k(a_k),\ k=1,2,\dots,n-1.

In particular, for the pair of mutually inverse functions such that g\circ f(x)\equiv x, the derivatives at the points a and b=f(a) are reciprocal.

Example. Let f(x)=x^n, n\in\mathbb N. Then by induction one can prove that f'(x)=nx^{n-1}. The inverse function g(y)=\sqrt[n]y=y^{1/n} has the derivative \displaystyle\frac1{nx^{n-1}} at the point y=x^n. Substituting x=\sqrt[n]y, we see that

\displaystyle g'(y)=\frac1{ny^{(n-1)/n}}=\mu y^{\mu-1},\qquad\text{when }\mu=\frac1n.

This allows one to prove that (x^\mu)'=\mu x^{\mu-1} for all rational powers \mu\in\mathbb Q.
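The power rule for rational exponents can be spot-checked numerically. The sketch below compares a symmetric difference quotient (a numerical stand-in for the derivative) with \mu x^{\mu-1} at x=2 for a few arbitrarily chosen rational exponents; it illustrates the statement and proves nothing.

```python
from fractions import Fraction

def derivative(f, a, eps=1e-6):
    """Symmetric difference quotient, a numerical stand-in for f'(a)."""
    return (f(a + eps) - f(a - eps)) / (2 * eps)

def power(mu):
    """Return the function x -> x^mu for a rational exponent mu (x > 0)."""
    return lambda x: x ** float(mu)

exponents = [Fraction(1, 2), Fraction(2, 3), Fraction(5, 4), Fraction(-3, 2)]
x = 2.0

# Absolute discrepancy between the numerical derivative and mu * x^(mu - 1).
errors = [abs(derivative(power(mu), x) - float(mu) * x ** (float(mu) - 1))
          for mu in exponents]
```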
