Sergei Yakovenko's blog: on Math and Teaching

Monday, February 8, 2010

Take-home exam

The take-home exam for the course is attached. You do not have to solve all the questions in order to pass the exam, but try to solve as many questions as possible (you are also encouraged to write down ideas that you did not manage to develop into a rigorous proof). Questions about the exam can be asked here or by email to me or to Sergei. Good luck to everyone!


Lecture 13 (Feb 2, 2010)

Integral: antiderivation and area

If f:[a,b]\to\mathbb R is a function, can we find a differentiable function F:[a,b]\to\mathbb R such that the derivative of F is f? If yes, then how many such functions exist? If we have a complete record of the car speedometer, can we trace the route? Can we compute the end point of the trip?

“Uniqueness” of solution is easy: if there are two solutions F_1(x),F_2(x) such that F'_{1,2}=f, then the derivative of their difference F=F_1-F_2 is identically zero. By the finite difference lemma, for any point x\in[a,b] there exists an intermediate point z\in[a,x] such that F(x)-F(a)=F'(z)(x-a)=0, thus F is a constant.

Clearly, if F(x) is a solution, then F(x)+c,c\in\mathbb R, is also a solution. In particular, we can choose this constant so that F(a) takes any specified value.

Example. If f(x)\equiv\lambda\in\mathbb R, then F(x)=F(a)+\lambda (x-a), as one can check by direct differentiation.

Example. If f can be found in the right hand side of any table of derivatives (possibly with a constant coefficient), then F can be found in the same table. E.g., if f(x)=\mu x^{\mu-1},~\mu\ne 0, then F(x)=x^\mu. Denoting \mu-1=\nu and dividing both sides by \mu=\nu+1, we conclude that if f(x)=x^\nu,~\nu\ne -1, then F(x)=\frac1{\nu+1}\,x^{\nu+1}. The case of \nu=-1 occurs on another line in the table: if f(x)=\frac1x,~x>0, then F(x)=\ln x.

What to do if f cannot be so easily found? A good idea is to try approximating f by some simple functions and see what happens.

Example. Let [a,b]=[0,N] and assume that the function f takes the same constant value \lambda_i on the (semi)interval [i,i+1). Then one can easily check that the following function,

F(x)=\lambda_0+\lambda_1+\cdots+\lambda_{i-1}+\lambda_i(x-i),\qquad x\in [i,i+1),\quad i=0,\dots,N-1,\qquad(1)

satisfies the following conditions:

  1. continuous on the entire segment [0,N],
  2. differentiable everywhere except the integer points 1,2,\dots,N-1,
  3. the derivative of F coincides with f wherever it exists,
  4. the graph of F is a broken line (קו שבור).

Instead of the cumbersome (מורכות) formula (1), one can use the equivalent verbal description:

F(x)=area under the graph of f between the vertical segments a and x.\qquad (2)

Indeed, if h is small enough, then both x and x+h belong to the same interval (i,i+1) and hence F(x+h)-F(x)=\lambda_i h, thus the derivative F'(x)=\lambda_i=f(x).
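The formula (1) is easy to try out numerically. The following is only a minimal sketch: the function name step_antiderivative and the sample values \lambda_0,\lambda_1,\lambda_2 are made up for illustration and are not part of the lecture.

```python
def step_antiderivative(lams, x):
    """F(x) from formula (1) for f equal to lams[i] on [i, i+1)."""
    n = len(lams)
    if not 0 <= x <= n:
        raise ValueError("x must lie in [0, N]")
    i = min(int(x), n - 1)           # index of the interval containing x
    # Full rectangles to the left of x plus the partial rectangle over [i, x].
    return sum(lams[:i]) + lams[i] * (x - i)


lams = [2.0, -1.0, 0.5]              # hypothetical values lambda_0..lambda_2
x, h = 1.3, 1e-6
slope = (step_antiderivative(lams, x + h) - step_antiderivative(lams, x)) / h
total = step_antiderivative(lams, 3.0)   # area over the whole segment [0, 3]
print(slope, total)
```

Here slope is the difference quotient of F at an interior point of [1,2), so it should be close to \lambda_1, while total is the sum of all three values.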

Clearly, an arbitrary interval [a,b] can be subdivided into N equal subintervals: the formula (1) will be changed, yet its meaning (2) would obviously remain the same.

This observation suggests that  the formula (2) gives the answer also in the general case, provided that the expression “the area under the graph” makes sense. Clearly, this is the case when f(x) is piecewise constant  (“the step function”, פונקציית מדרגה, as above) and even when f is piecewise linear (קו שבור). Note that we allow only for step functions having only finitely many intervals of continuity, to avoid summing infinitely many areas of rectangles!

To define the area in the general case, we appeal to the following intuitively clear observation: if X\subseteq Y are two nested subsets of the plane, then the areas of these sets, if they are well defined, should satisfy the inequality s(X)\leq s(Y). For polygons this is an elementary theorem.

Let f:[a,b]\to\mathbb R be a bounded function on a finite interval. For any step function g_-(x):[a,b]\to\mathbb R which is everywhere less or equal to f, g_-(x)\le f(x), the area s(g_-) under the graph of g_- is called a lower sum for f on [a,b]. In a similar way, an upper sum for f is the area under the graph of any step function g_+(x) such that f(x)\le g_+(x) on [a,b]. Clearly, for any two such functions g_-,g_+, the inequality s(g_-)\le s(g_+) holds, thus any lower sum is less or equal to any upper sum.

Definition. A function  f:[a,b]\to\mathbb R is called integrable on the segment [a,b], if there exists a single number s=s(f) which separates the lower and the upper sums, so that s_-\le s \le s_+ for any pair of lower/upper sums s_\pm=s(g_\pm). This number is called the integral (האינטגרל המסוים) of the function f(x) on the interval [a,b] and denoted by \displaystyle \int_a^b f(x)\,dx.♦

An equivalent definition of integrability requires that for any \varepsilon>0 there exist an upper and a lower sum which differ by less than \varepsilon:

\forall\varepsilon>0~~\exists g_-,g_+\text{ step functions }:g_-(x)\le f(x)\le g_+(x),~~s(g_+)-s(g_-)<\varepsilon.

Exercise. Prove that the two definitions are indeed equivalent.

Exercise. Prove that the function f(x)=px+q,~p,q\in\mathbb R, is integrable (in the above sense) on any interval and its integral is equal to the (geometrically calculated) area of the corresponding trapezoid (טרפז).

Proposition. Any monotone (bounded) function is integrable.

Proof. Assume for definiteness that f is nondecreasing (the nonincreasing case is symmetric). Consider the partition of [a,b] into N equal parts by the points a=x_0<x_1<\cdots<x_N=b and denote by \lambda_i=f(x_i) the values of f at these points. Then the two step functions, g_-(x)=\lambda_{i},~~x\in [x_i,x_{i+1}) and g_+(x)=\lambda_{i+1},~~x\in [x_i,x_{i+1}), squeeze f between them. The lower sum is s_-=\frac{b-a}N(\lambda_0+\lambda_1+\cdots+\lambda_{N-1}) and the upper sum is s_+=\frac{b-a}N(\lambda_1+\cdots+\lambda_N). Their difference is equal to (b-a)(\lambda_N-\lambda_0)/N and it becomes less than any positive number \varepsilon when N is large enough.
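The squeeze in this proof can be watched numerically. The sketch below assumes a nondecreasing function and uses the exponential on [0,1] as an arbitrary test case; the helper name monotone_sums is hypothetical.

```python
import math


def monotone_sums(f, a, b, n):
    """Lower and upper sums of a nondecreasing f on n equal parts of [a, b]."""
    width = (b - a) / n
    values = [f(a + i * width) for i in range(n + 1)]   # lambda_0 .. lambda_n
    lower = width * sum(values[:-1])    # step function built from left endpoints
    upper = width * sum(values[1:])     # step function built from right endpoints
    return lower, upper


lower, upper = monotone_sums(math.exp, 0.0, 1.0, 1000)
gap = upper - lower                     # telescopes to (b-a)*(f(b)-f(a))/n
print(lower, upper, gap)
```

The gap depends only on the values of f at the endpoints and on N, which is exactly why monotonicity alone (without continuity) suffices.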

Theorem. Any continuous function (on the closed interval) is integrable.

Proof. The idea is to construct two step functions with the common set of jump points a=x_0<x_1<\cdots<x_N=b such that on each interval [x_i,x_{i+1}) these functions squeeze f between themselves and differ by no more than a given \varepsilon>0. The points x_i should be close enough to each other so that the continuity of f implies the required proximity of the corresponding values.

More precisely, for any point c\in[a,b] there exists a small interval U_c\subseteq[a,b] containing c such that on this interval the function is squeezed between f(c)-\frac12\varepsilon and f(c)+\frac12\varepsilon. The union of all these small intervals covers [a,b] which is compact. Hence there exists a finite covering of [a,b] by these intervals. The endpoints of these intervals subdivide [a,b] into finitely many intervals U_i in such a way that on each interval the function f is squeezed between two constants, \lambda_-\le f(x) \le \lambda_+, such that \lambda_+-\lambda_-\le\varepsilon. The corresponding upper and lower sums differ by no more than \sum (\lambda_+-\lambda_-)(x_{i+1}-x_i)\le \varepsilon \sum (x_{i+1}-x_i)= \varepsilon (b-a). This difference can be as small as required if \varepsilon is chosen small enough. ♦

Clearly, continuity is a sufficient but not a necessary assumption for integrability: the step functions are by definition integrable, though they are discontinuous. The accurate description of all integrable functions goes beyond the scope of these notes, yet there are certainly many non-integrable functions.

Example. The Dirichlet function is non-integrable on [0,1]. Indeed, any upper sum must be at least 1, and any lower sum at most 0, hence the gap between these values cannot be bridged (complete all details of this proof!).


  1. Prove that any function on [0,1] which differs from the identical zero at only finitely many points is integrable and its integral is zero, no matter what the values at these points are.
  2. Prove that a function that differs from the identical zero at countably many points x_1,x_2,\dots,x_n,\dots, is integrable and the integral is zero if \lim_{k\to\infty}f(x_k)=0. Can one drop the limit assumption?
  3. Prove that the Riemann function equal to f(x)=1/q at the rational points x=p/q (assuming p,q mutually prime) and zero at irrational points, is integrable on [0,1] and its integral is zero.

Problem. Is the function f(x)=\sin \frac1x integrable on [0,1]? (Do not try to compute the integral :-))

Sin(1/x) for x near the origin

The Newton-Leibniz formula

Quite obviously, if f:[a,b]\to\mathbb R^1 is integrable on [a,b], then it is also integrable on any sub-interval [a,c], a\le c \le b. The formula (2) relating the area (definite integral) with the antiderivative F of an integrable function f will take the form

\displaystyle F(c)=F(a)+\int_a^c f(x)\,dx \iff \int_a^c f(x)\,dx=F(c)-F(a).

This allows one to express the area under the graph of a function in terms of its antiderivative (if the latter exists and is known).

However, integrability is a weaker condition than existence of the antiderivative: if f is a step function with the partition points x_1,\dots,x_N, then the corresponding area function F(z)=\int_0^z f(x)\,dx is continuous, but it is differentiable everywhere except at these points.

Theorem. If f(x) is continuous at a point c\in[a,b], then the area function F(z)=\int_a^z f(x)\,dx is differentiable at c and F'(c)=f(c). ♦
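Both the Newton-Leibniz formula and this theorem can be checked numerically. The sketch below approximates the area by the midpoint rule (an assumption of this illustration, the lecture does not prescribe any numerical method) and takes f=\cos, whose antiderivative \sin is known; the name area is hypothetical.

```python
import math


def area(f, a, z, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, z]."""
    width = (z - a) / n
    return width * sum(f(a + (i + 0.5) * width) for i in range(n))


f = math.cos                      # sample continuous integrand
a, c, h = 0.0, 1.0, 1e-3

# Newton-Leibniz: the area over [a, c] versus F(c) - F(a) with F = sin.
newton_leibniz_error = abs(area(f, a, c) - (math.sin(c) - math.sin(a)))

# Derivative of the area function at c, via a symmetric difference quotient.
quotient = (area(f, a, c + h) - area(f, a, c - h)) / (2 * h)
print(newton_leibniz_error, quotient, f(c))
```

The difference quotient of the area function should be close to f(c), in agreement with F'(c)=f(c).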

Change of the independent variable in the integrals

If f(x) is a function defined on the interval x\in [a,b], and z is a new variable which is obtained from x by a monotone differentiable transformation, z=h(x), then this transformation maps the interval [a,b] bijectively (one-to-one) onto the interval [h(a),h(b)]. After such a change the function f(x) becomes a new function g(z) of the new variable: g(z)=f(x) if and only if z=h(x), i.e., the two functions take equal values at the two points “connected” by the transformation h.

The “formal” relationship between these functions is easier written “in the opposite direction”, expressing f via g:

f(x)=g(h(x))=(g\circ h)(x).

The graphs of the functions f and g are obtained from each other by a “non-uniform stretch along the horizontal axis” which keeps the vertical direction. In particular, any step function with the partition points x_1<x_2<\cdots<x_N will be transformed into the step function with the partition points z_1<z_2<\cdots<z_N, z_i=h(x_i), with the same values. Moreover, if f(x) is squeezed between two step functions, f_-\le f\le f_+, then its transform is squeezed between the transforms g_\pm of these functions.

Thus it is sufficient to study how the change of variables affects areas under step functions, which are equal to finite sums \sum_1^N \lambda_i(x_{i}-x_{i-1}) and their transforms \sum_1^N \lambda_i(z_i-z_{i-1}). The heights \lambda_i are unchanged, and the widths z_i-z_{i-1}, by the finite difference lemma, are equal to the initial widths multiplied by the derivative h'(c_i) computed at some intermediate points c_i\in[x_{i-1},x_i]. The result is as if instead of the step function f(x) we integrated the function f(x)\cdot h'(x). Passing to the limit, we conclude that

\displaystyle \int_a^b g(h(x))\,h'(x)\,dx=\int_{h(a)}^{h(b)}g(z)\,dz.

Change of independent variable and integral of a step function

Of course, this is equivalent to the chain rule of differentiation for the primitive functions: if F(X)=\int_a ^X f(x)\,h'(x)\,dx with f=g\circ h, and G(Z)=\int_{h(a)}^Z g(z)\,dz, then F(X)=G(h(X)) and F'(X)=G'(h(X))\cdot h'(X).

The formula for change of variables of integrals can be easily memorized using the existing notation: in the formula \int g(z)\,dz one has to transform not just the integrand g(z) by substituting z=h(x), but also the differential dz should be transformed using the formula dz=h'(x)\,dx.
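A quick numerical check of the change of variables formula may help to memorize it. The sketch below assumes the sample choice g(z)=\cos z and h(x)=x^2 on [0,1.5] (where h is monotone) and again uses the midpoint rule; all names are hypothetical.

```python
import math


def integrate(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / n
    return width * sum(f(a + (i + 0.5) * width) for i in range(n))


def g(z):
    return math.cos(z)


def h(x):
    return x * x          # monotone on [0, 1.5]


def dh(x):
    return 2 * x          # derivative of h


a, b = 0.0, 1.5
left = integrate(lambda x: g(h(x)) * dh(x), a, b)   # integral in the variable x
right = integrate(g, h(a), h(b))                    # integral in the variable z
print(left, right, math.sin(h(b)) - math.sin(h(a)))
```

Both numbers should agree with each other and with \sin(2.25), the value given by the antiderivative of \cos.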

Monday, February 1, 2010

Lecture 12 (Jan 26, 2010)

Higher derivatives

In this lecture we return to functions of one variable, defined on an open or closed interval on the real axis.

Definition. A function f:[a,b]\to\mathbb R is called twice (two times) differentiable, if it is differentiable at all points of the segment [a,b]\subset\mathbb R^1, and the derivative g=f' considered as a scalar (numeric) function on this segment, is also differentiable.

Iterating this construction, we say that a function is k times differentiable, if it is differentiable and its derivative is k-1 times differentiable. This is an inductive definition.

Variations. Sometimes it is convenient to say that a function is 0 times differentiable, if it is continuous on the segment. If this agreement is used as the base of induction, then this would define a slightly more restricted class of k times differentiable functions (those whose k-th derivative is continuous), usually denoted by C^k[a,b].


  1. Give a non-polynomial example of a function which is infinitely many times differentiable.
  2. Give an example of a function that has exactly 7 derivatives, but not 8, on the segment [-1,1].

As was already established, existence of the first derivative allows one to construct the linear approximation for a given function. If this approximation is not degenerate (i.e., the derivative is non-vanishing), it allows one to study the extremal properties of the function, in particular,

  1. Guarantee a certain type of the extremum at the endpoints a,b of the segment;
  2. Guarantee the absence of extremum at the interior points of the interval (a,b).

It turns out that higher order derivatives allow one to construct approximations of functions by polynomials of higher order, and such an approximation sometimes guarantees presence/type or absence of extremum. In cases it does not, one needs more of the same.

Theorem. If a function f(x) is n times differentiable on a segment [a,b] containing the origin x=0, then there exists a unique polynomial p(x)=c_0+c_1x+\cdots+c_n x^n of degree \le n which gives an approximation of f at the origin with the accuracy o(x^n),

\displaystyle f(x)=p(x)+o(x^n) \iff \lim_{x\to 0}\frac{f(x)-p(x)}{x^n}=0.

The coefficients c_k of this polynomial are proportional to the respective higher derivatives f^{(k)}(0) at the origin, \displaystyle c_k=\frac{f^{(k)}(0)}{k!},~k=0,1,\dots,n.

Remark. There is nothing mysterious in the formulas: if f(x)=p_n(x)=c_0+c_1x+\cdots +c_n x^n is a polynomial itself, then the higher order derivatives can be easily computed for each monomial separately:

\displaystyle (x^i)^{(k)}(0)=\left.i(i-1)\cdots(i-k+1) x^{i-k}\right|_{x=0}=k!\quad\text{if }i=k\text{ and }0\text{ otherwise}.

This proves the formulas for the coefficients c_k via the derivatives of the polynomial.
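The meaning of the accuracy o(x^n) can be illustrated numerically. The sketch below takes f=\exp and n=3 as a sample case (every derivative of the exponential at the origin equals 1); the helper name taylor_poly is hypothetical.

```python
import math


def taylor_poly(derivs_at_zero, x):
    """Evaluate the sum of f^(k)(0)/k! * x^k over the given derivatives."""
    return sum(d / math.factorial(k) * x**k
               for k, d in enumerate(derivs_at_zero))


n = 3
derivs = [1.0] * (n + 1)          # f^(k)(0) = 1 for f = exp, k = 0..n

# The ratio |f(x) - p(x)| / |x|^n should tend to zero together with x.
ratios = [abs(math.exp(x) - taylor_poly(derivs, x)) / abs(x)**n
          for x in (0.1, 0.01, 0.001)]
print(ratios)
```

The ratios shrink roughly in proportion to x, as expected when the first omitted term is of order x^4.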

For the proof we need the following lemma, which is very much like the intermediate value theorem.

Lemma (on finite differences). For a function differentiable on the interval [a,b] the normalized finite difference \lambda=\frac{f(b)-f(a)}{b-a} coincides with the derivative f'(c) at some intermediate point c\in[a,b].

Proof of the Lemma. Consider the auxiliary function g(x)=f(x)-\lambda x. Then g(a)=g(b) and the same finite difference for this function is equal to zero. Since the function takes equal values at the endpoints, either its maximum or its minimum is achieved at some point c inside the interval (a,b). By the Fermat rule, g'(c)=0. By construction, f'(c)=\lambda.
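The Lemma only asserts that the intermediate point c exists. In simple cases it can be located numerically; the sketch below does so by bisection, under the extra assumption (not needed in the Lemma) that f' is continuous and f'-\lambda changes sign on [a,b]. The sample function f(x)=x^3 and the name mean_value_point are hypothetical.

```python
def mean_value_point(df, lam, a, b, tol=1e-12):
    """Bisection for df(c) = lam on [a, b].

    Assumes df is continuous and df - lam has opposite signs at a and b.
    """
    lo, hi = a, b
    if (df(lo) - lam) * (df(hi) - lam) > 0:
        raise ValueError("df - lam does not change sign on [a, b]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (df(lo) - lam) * (df(mid) - lam) <= 0:
            hi = mid              # the sign change is in the left half
        else:
            lo = mid              # the sign change is in the right half
    return 0.5 * (lo + hi)


def f(x):
    return x**3


def df(x):
    return 3 * x**2


a, b = 0.0, 2.0
lam = (f(b) - f(a)) / (b - a)     # normalized finite difference
c = mean_value_point(df, lam, a, b)
print(lam, c, df(c))
```

For this sample the equation 3c^2=4 gives c=2/\sqrt3, an interior point of the interval.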

Proof of the Theorem. The formulas for the coefficients imply that the difference h(x)=f(x)-p_n(x) is a function which is n times differentiable and vanishes at the origin together with all its derivatives up to order n. We have to prove that \lim_{x\to 0}|h(x)|/|x|^n=0.

For n=1 this is obvious: by definition of differentiability, for a function with h(0)=0 and h'(0)=0 we have h(x)=h(0)+h'(0)x+o(x)=o(x). Reasoning by induction, consider the function h and its derivative g=h', which is n-1 times differentiable and vanishes at the origin together with all its derivatives up to order n-1. By the inductive assumption, |g(x)|\le \varepsilon |x|^{n-1} for an arbitrary \varepsilon>0 on a sufficiently small interval around the origin.

By the Lemma, h(x)=h(x)-h(0)=g(c)x for some c between 0 and x. Using the inequality |g(c)|\le \varepsilon |c|^{n-1}\le \varepsilon |x|^{n-1}, we conclude that |h(x)|\le\varepsilon |x|^n. Since \varepsilon can be arbitrarily small, we conclude that the limit \lim |h(x)|/|x|^n is smaller than any positive number, i.e., must be zero.

Definition. The polynomial p_n(x) above is called the Taylor polynomial of order n for the function f at the origin x=0. One can easily modify this to become the definition of the Taylor polynomial at an arbitrary point x=c.

Application to the investigation of functions

Any n times differentiable function can be written (near the origin, but this is not a restriction!) as its Taylor polynomial of degree n plus an error term which is fast decreasing (faster than the highest degree term of the polynomial). Hence, as long as the highest degree term does not vanish, this error term cannot affect the extremal properties of the polynomial. In particular, if n=2 and the Taylor polynomial has a strict minimum (maximum) at x=0, so does the function itself. For quadratic polynomials the branches of the parabola go up (resp., down) if its principal coefficient is positive (resp., negative). This immediately proves the following result.

Second-order sufficient condition. If f is a twice differentiable function and x=c is a critical point, f'(c)=0, then the following holds:

  1. If f''(c)>0, then c is a local minimum,
  2. If f''(c)<0, then c is a local maximum,
  3. If f''(c)=0, everything depends on the Taylor polynomial of degree 3.
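The three cases can be written down as a tiny decision procedure. This is only a sketch: the function name and the sample functions are made up for illustration, and the caller is assumed to supply the exact value of f''(c) at a point where f'(c)=0.

```python
def classify_critical_point(second_derivative):
    """Apply the second-order test to the value f''(c) at a critical point c."""
    if second_derivative > 0:
        return "local minimum"
    if second_derivative < 0:
        return "local maximum"
    return "inconclusive"         # higher order terms have to be examined


# Sample functions, all with the critical point c = 0:
print(classify_critical_point(2.0))    # f(x) = x^2,   f''(0) = 2
print(classify_critical_point(-1.0))   # f(x) = cos x, f''(0) = -1
print(classify_critical_point(0.0))    # f(x) = x^3,   f''(0) = 0
```

The last sample shows why the third case is genuinely undecided: x^3 has no extremum at the origin, while x^4 (also with zero second derivative) has a minimum there.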

Problem. Find conditions for a degree 3 polynomial p(x)=\alpha x^3 to have a local maximum/minimum on (a) interval |x|<\delta, (b) semi-interval 0\le x\le \delta.  Formulate the third order necessary/sufficient conditions for extremum in the interior point (resp., left/right endpoint) of the domain of a non-polynomial function f.

Lecture 11 (Jan 19, 2010)

Functions of two variables and their derivation. Application to the study of planar curves.

The idea of linear approximation (and differentiability) can be easily adapted for functions of more than one variable. Apart from the usual (scalar) functions f:\mathbb R^1\to\mathbb R^1 we will consider maps \mathbb R^1\to\mathbb R^2 (parametrized curves) and functions \mathbb R^2\to\mathbb R^1 of two variables.

A curve t\mapsto (x(t),y(t)) can be considered in kinematic (mechanical) terms as the description of a travel in the plane with coordinates x,y, parametrized by the time t. The notions of limit, continuity, differentiability are defined as the corresponding properties of the coordinate functions t\mapsto x(t) and t\mapsto y(t).


  1. The graph of any continuous (differentiable) function f of one variable is a continuous (resp., differentiable) curve t\mapsto (t,f(t)).
  2. The circle x^2+y^2=1 is the image of the curve t\mapsto (\cos t,\sin t). Note that this image is “covered” infinitely many times: preimage of any point on the circle is an entire arithmetical progression with the difference 2\pi.
  3. The cusp (חוֹד) y^2=x^3 is the image of the differentiable curve t\mapsto (t^2,t^3). Yet the cusp itself is not “smooth-looking”:

    Cusp point

The “linear approximation” at a point \tau\in\mathbb R^1 takes the unit “positive” vector of \mathbb R^1, attached to the point \tau, to the vector (v,w)\in\mathbb R^2, tangent to the plane at the point (a,b)=(x(\tau),y(\tau)), with the coordinates v=\frac{dx}{dt}(\tau),~w=\frac{dy}{dt}(\tau). The vector (v,w)\in\mathbb R^2 is called the velocity vector, or the tangent vector to the curve.

When is a differentiable parametrized curve smooth? And what is smoothness?
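The velocity vector is easy to approximate numerically by differentiating each coordinate separately. The sketch below uses symmetric difference quotients (an assumption of this illustration) and compares the circle with the cusp at t=0; the helper names are hypothetical.

```python
import math


def velocity(curve, t, h=1e-6):
    """Symmetric-difference approximation of (dx/dt, dy/dt) at time t."""
    x1, y1 = curve(t + h)
    x0, y0 = curve(t - h)
    return ((x1 - x0) / (2 * h), (y1 - y0) / (2 * h))


def circle(t):
    return (math.cos(t), math.sin(t))


def cusp(t):
    return (t**2, t**3)


v_circle = velocity(circle, 0.0)   # exact velocity is (-sin 0, cos 0) = (0, 1)
v_cusp = velocity(cusp, 0.0)       # exact velocity is (2t, 3t^2) = (0, 0) at t = 0
print(v_circle, v_cusp)
```

The velocity of the cusp curve vanishes at t=0, which is precisely the point where the image fails to look smooth.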

Definition. A planar curve is smooth at a point (a,b)\in\mathbb R^2, if inside some sufficiently small square |x-a|<\delta,~|y-b|<\delta this curve can be represented as the graph of a function y=f(x) or x=g(y) differentiable at the point a (resp., b).

Proposition. A differentiable (parametrized) curve with nonzero velocity vector is smooth at the corresponding point.

Proof. Assume that v\ne 0; then the function t\mapsto x(t) has nonzero derivative and hence is locally invertible, so that t can be expressed as a differentiable function of x, t=\phi(x). Then the curve is the graph of the function y(\phi(x)). The case w\ne 0 is treated in the same way, with the roles of x and y exchanged.

Problem. Give an example of a differentiable (parametrized) curve with zero velocity at some point, which is nonetheless smooth.

Differentiability of functions of two variables

While the definitions of the limit and continuity for a function of two variables are rather straightforward (open intervals |x-a|<\delta in the one-variable definition should be replaced by open squares |x-a|<\delta,~|y-b|<\delta with the rest staying the same), the notion of differentiability cannot so easily be reduced to the one-dimensional case.

We want the function of two variables f(x,y) to be approximable by a linear map near the point (a,b)\in\mathbb R^2. This means that there exist two constants A,B (coordinates of the approximating map) such that the difference tends to zero fast,

\displaystyle [f(x,y)-f(a,b)]-[A(x-a)+B(y-b)]=o(x,y),\qquad \frac{o(x,y)}{|x-a|+|y-b|}\to0\quad\text{as }(x,y)\to(a,b).

Denote by dx (resp., dy) the linear functions which take the value v (resp., w) on the vector with the coordinates (v,w)\in\mathbb R^2. Then the linear map df, approximating the function f, can be written as df=A\,dx+B\,dy. To compute the coefficients A,B, consider the functions g(y)=f(a,y) and h(x)=f(x,b) of one argument each. Then A=h'(a), B=g'(b) are the partial derivatives of f with respect to the variables x,y at the point (a,b).

Definition. For a function of two variables f(x,y) the partial derivative with respect to the variable x at a point (a,b)\in\mathbb R^2  is the limit (if it exists)

\displaystyle \lim_{h\to 0}\frac{f(a+h,b)-f(a,b)}{h}=\frac{\partial f}{\partial x}(a,b)=f_x(a,b)=D_xf(a,b)=\cdots.

Example. Consider the function f(x,y)=x^2+y^2. Its differential is equal to 2x\,dx+2y\,dy and at a point (a,b) on the unit circle it vanishes on the vector with coordinates (-b,a) tangent to the level curve f(x,y)=\text{const} (circle) passing through the point (a,b).
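This example can be verified numerically. The sketch below approximates the partial derivatives by symmetric difference quotients and evaluates the differential on the tangent vector (-b,a); the sample point (0.6,0.8) on the unit circle and the helper name partials are chosen only for illustration.

```python
def partials(f, a, b, h=1e-6):
    """Symmetric-difference approximations of the partial derivatives at (a, b)."""
    fx = (f(a + h, b) - f(a - h, b)) / (2 * h)   # vary x, keep y = b fixed
    fy = (f(a, b + h) - f(a, b - h)) / (2 * h)   # vary y, keep x = a fixed
    return fx, fy


def f(x, y):
    return x**2 + y**2


a, b = 0.6, 0.8                      # a point on the unit circle
A, B = partials(f, a, b)             # should be close to (2a, 2b)
df_on_tangent = A * (-b) + B * a     # df = A dx + B dy applied to (-b, a)
print(A, B, df_on_tangent)
```

The value of the differential on the tangent vector should be zero up to rounding errors.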

Functions of two variables are nicely represented by their level curves (קווי גובה). Typically these look like smooth curves, but occasionally one can observe “singularities”.

Examples. Draw the level curves of the functions f(x,y)=x^2+y^2 and g(x,y)=x^2-y^2. In the second case you may help yourself by changing variables from x,y to X,Y as follows: X=x-y,~Y=x+y, so that g=XY.

Definition. A point (a,b)\in\mathbb R^2 inside the domain of a differentiable function F(x,y) is called critical, if the linear approximation (the differential) is identically zero, i.e., if both partial derivatives \frac{\partial F}{\partial x}(a,b),~\frac{\partial F}{\partial y}(a,b) are zeros. Otherwise the point is called regular.

If we replace a nonlinear function F(x,y) by its affine approximation \ell(x,y)=c+A(x-a)+B(y-b), c=F(a,b),~A=\frac{\partial F}{\partial x}(a,b),~B=\frac{\partial F}{\partial y}(a,b) at a regular point, then the level curves of \ell form a family of parallel lines. It turns out that for regular points these lines give a linear approximation for the (nonlinear) level curves of the initial nonlinear function. In other words, the regularity condition (a form of nondegeneracy) is the condition which guarantees that the linear approximation works nicely also when studying the level curves.

Theorem. If a point (a,b) is regular for a differentiable function F, then there exists a small square around (a,b) such that the piece of the level curve of F passing through this point is a smooth curve tangent to the level line of the affine approximation \ell.

Zero level curve of a nonlinear function and the zero level line of its affine approximation

This claim is often called the Implicit Function Theorem  (משפט הפונקציות הסתומות), however,  the traditional formulation of the Implicit Function Theorem looks quite different!

Theorem. If at a point (a,b) the partial derivative \frac{\partial F}{\partial y}(a,b) of a differentiable function F is nonzero, then the equation F(x,y)=c, c=F(a,b)\in\mathbb R, can be locally resolved with respect to y: there exists a differentiable function f(x) defined on a sufficiently small interval |x-a|<\delta, such that F(x,f(x))\equiv c and f(a)=b. The derivative f'(a) of this function is given by the ratio,

\displaystyle f'(a)=-\frac{\frac{\partial F}{\partial x}(a,b)}{\frac{\partial F}{\partial y}(a,b)}.
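For the circle the implicit function is known explicitly, which allows a quick check of this formula. The sketch below takes F(x,y)=x^2+y^2 at the sample point (0.6,0.8), where the equation F=1 resolves as y=\sqrt{1-x^2}; the function names are hypothetical.

```python
import math


def implicit_slope(Fx, Fy):
    """Slope f'(a) = -F_x / F_y of the implicit function, assuming F_y != 0."""
    if Fy == 0:
        raise ValueError("the partial derivative in y must be nonzero")
    return -Fx / Fy


def explicit(x):
    return math.sqrt(1 - x * x)      # upper half of the unit circle


a, b = 0.6, 0.8
slope = implicit_slope(2 * a, 2 * b)             # F_x = 2x, F_y = 2y

# Compare with a symmetric difference quotient of the explicit solution.
h = 1e-6
numeric = (explicit(a + h) - explicit(a - h)) / (2 * h)
print(slope, numeric)
```

Both values should be close to -a/b=-0.75.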

Exercise. Prove that the two formulations of the Implicit Function Theorem are in fact equivalent.
