Sergei Yakovenko's blog: on Math and Teaching

Tuesday, December 29, 2015

Lecture 10, Dec 29, 2015

Elementary transcendental functions as solutions to simple differential equations

The way logarithmic, exponential and trigonometric functions are usually introduced is not very satisfactory and appears artificial. For instance, even the definition of the non-integer power x^a,\ a\notin\mathbb Z, is problematic. For a=1/n,\ n\in\mathbb N, one can define the value as the root \sqrt[n]x, but the choice of the branch (sign) and the possibility of defining it for negative x are debatable. For instance, the functions x^{\frac12} and x^{\frac 24} may turn out to be different, depending on whether the latter is defined as \sqrt[4]{x^2} (which makes sense for negative x) or as (\sqrt[4]x)^2 (which makes sense only for positive x). But even if we agree to restrict the domain of x^a to positive arguments, there remains the big question of why, for two close values a=\frac12 and a=\frac{499}{1000}, the values, say, \sqrt 2 and \sqrt[1000]{2^{499}} should also be close…

The right way to introduce these functions is by looking at the differential equations which they satisfy.

A differential equation (of the first order) is a relation, usually rational, involving the unknown function y(x), its derivative y'(x) and some known rational functions of the independent variable x. If the relation involves higher derivatives, we speak of higher-order differential equations. One can also consider systems of differential equations, involving several relations between several unknown functions and their derivatives.

Example. Any relation of the form P(x, y)=0 implicitly defines y as a function of x and can be considered as a trivial equation of order zero.

Example. The equation y'=f(x) with a known function f is a very simple differential equation. If f is integrable (say, continuous), then its solution is given by the integral with variable upper limit, \displaystyle y(x)=\int_p^x f(t)\,\mathrm dt for any meaningful choice of the lower limit p. Any two solutions differ by a constant.

Example. The equation y'=a(x)y with a known function a(x). Even in the case where a(x)=a is a nonzero constant, this equation has no polynomial solution (why?), except for the trivial one y(x)\equiv0. This equation is linear: together with any two solutions y_1(x),y_2(x) and any constant \lambda, the functions \lambda y_1(x) and y_1(x)\pm y_2(x) are also solutions.

Example. The equation y'=y^2 has a family of solutions \displaystyle y(x)=-\frac1{x-c} for any choice of the constant c\in\mathbb R (check it!). However, any such solution “explodes” at the point x=c, while the equation itself has no special “misbehavior” at this point (in fact, the equation does not depend on x at all).
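The “check it!” can be backed by a quick numerical sanity check. The following is a minimal Python sketch, not part of the original lecture: the helper names `y` and `residual`, the step `h=1e-6` and the value c=1 are illustrative assumptions. It compares a central-difference estimate of y' with y^2 and shows the values of y growing as x approaches c.

```python
def y(x: float, c: float) -> float:
    """The candidate solution y(x) = -1/(x - c); undefined at x == c."""
    return -1.0 / (x - c)


def residual(x: float, c: float, h: float = 1e-6) -> float:
    """Central-difference estimate of y'(x) minus y(x)**2 (should be close to 0)."""
    dy = (y(x + h, c) - y(x - h, c)) / (2.0 * h)
    return dy - y(x, c) ** 2


c = 1.0
for x in [-2.0, 0.0, 0.5, 0.9, 0.99]:
    # The residual stays small while y itself grows without bound as x -> c.
    print(f"x = {x:5.2f}   y = {y(x, c):9.3f}   residual = {residual(x, c):+.1e}")
```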


The transcendental function y(x)=\ln x satisfies the differential equation y'=x^{-1}: this is the only case of the equation y'=x^n,\ n\in\mathbb Z, which has no rational solution. In fact, all properties of the logarithm follow from the fact that it satisfies the above equation, with the constant of integration chosen so that y(1)=0. In other words, we show that the function defined as the integral \displaystyle \ell(x)=\int_1^x \frac1t\,\mathrm dt has all the desired properties, namely:

  1. \ell(x) is defined for all x>0 and is monotonically increasing from -\infty to +\infty as x varies from 0 to +\infty.
  2. \ell(x) is infinitely differentiable, concave.
  3. \ell transforms the operation of multiplication (of positive numbers) into the addition: \ell(\lambda x)=\ell(\lambda)+\ell(x) for any x,\lambda>0.
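As a rough numerical illustration (a sketch, not the lecture's method), one can approximate \ell(x) by the composite Simpson rule and observe the listed properties. The function name `ell`, the number of subintervals `n=2000` and the sample points are assumptions chosen for illustration; `math.log` is used only as a reference value in the checks.

```python
import math


def ell(x: float, n: int = 2000) -> float:
    """Approximate ell(x) = integral from 1 to x of dt/t by the composite Simpson rule.

    n (even) is the number of subintervals; for x < 1 the step h is negative,
    which automatically gives the oriented integral.
    """
    if x <= 0:
        raise ValueError("ell(x) is only defined for x > 0")
    h = (x - 1.0) / n
    total = 1.0 + 1.0 / x               # values of 1/t at the endpoints t = 1 and t = x
    for k in range(1, n):
        weight = 4 if k % 2 else 2      # Simpson weights 4, 2, 4, ..., 4
        total += weight / (1.0 + k * h)
    return total * h / 3.0


# Additivity: ell(lam * x) should equal ell(lam) + ell(x) up to quadrature error.
for lam, x in [(2.0, 3.0), (0.5, 4.0), (1.5, 1.5)]:
    print(lam, x, ell(lam * x) - (ell(lam) + ell(x)))
```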


The above properties of the logarithm ensure that \ell has an inverse function, provisionally denoted by E(x): \ell(E(x))=x. This function is defined for all real x\in\mathbb R, takes positive values and transforms addition into multiplication: E(\lambda+x)=E(\lambda)\cdot E(x). Denoting the value E(1) by e, we conclude that E(n)=e^n for all n\in\mathbb Z, and E(x)=e^x for all rational values x=\frac pq. Thus the function E(x), defined as the inverse of \ell, interpolates the exponential to all real arguments. A simple calculation (the derivative of the inverse function, E'(x)=1/\ell'(E(x))=E(x)) shows that E satisfies the differential equation y'=y with the initial condition y(0)=1.
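The inverse E can likewise be sketched numerically. Here, as an assumption for illustration, \ell is approximated by the composite Simpson rule and inverted by bisection, which works because \ell is increasing; the names `ell` and `E` and the tolerance `rel_tol` are hypothetical choices, not the lecture's construction.

```python
import math


def ell(x: float, n: int = 2000) -> float:
    """Composite Simpson approximation of ell(x) = integral from 1 to x of dt/t (x > 0)."""
    h = (x - 1.0) / n
    total = 1.0 + 1.0 / x
    for k in range(1, n):
        total += (4 if k % 2 else 2) / (1.0 + k * h)
    return total * h / 3.0


def E(y: float, rel_tol: float = 1e-12) -> float:
    """Invert ell by bisection: find x > 0 with ell(x) = y, using that ell increases."""
    lo, hi = 1.0, 1.0
    while ell(lo) > y:       # move the left end towards 0 until ell(lo) <= y
        lo /= 2.0
    while ell(hi) < y:       # move the right end outwards until ell(hi) >= y
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if ell(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(E(1.0), math.e)                  # E(1) should approximate e
print(E(0.5 + 1.5), E(0.5) * E(1.5))   # addition turns into multiplication
```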


Consider the integral operator \Phi which sends any (continuous) function f:\mathbb R\to\mathbb R to the function g=\Phi(f) defined by the formula \displaystyle g(x)=f(0)+\int_0^x f(t)\,\mathrm dt. Applying this operator to the function E(x) and using the differential equation, we see that E is a “fixed point” of the transformation \Phi: \Phi(E)=E. This suggests the following approach to computing the function E: choose a function f_0 and build the sequence of functions f_n=\Phi(f_{n-1}),\ n=1,2,3,\dots. If this sequence has a limit f_* (say, uniformly on every segment, so that \Phi can be passed to the limit), then f_*=\lim f_{n+1}=\lim \Phi(f_n)=\Phi(f_*), i.e., the limit is a fixed point of \Phi.

Note that the integral part of \Phi acts very simply on monomials: \displaystyle \int_0^x\frac{t^k}{k!}\,\mathrm dt=\frac{x^{k+1}}{(k+1)!} (check it!), while the term f(0) only adds a constant. Therefore, if we start with f_0(x)=1, then f_n(0)=1 for all n, and we obtain the functions \displaystyle f_n(x)=1+x+\frac12 x^2+\cdots+\frac1{n!}x^n. This sequence converges (uniformly on every bounded segment) to the sum of the infinite series \displaystyle\sum_{n=0}^\infty\frac1{n!}x^n, which represents the solution E(x) on the entire real line (check that). This series can be used for fast approximate calculation of the number e=E(1)=\sum_{n=0}^\infty \frac1{n!}.
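The iteration f_n=\Phi(f_{n-1}) can be carried out exactly on polynomials. The sketch below uses hypothetical helpers (`phi`, `picard`, `evaluate` are illustrative names), storing a polynomial as a list of `Fraction` coefficients [c_0, c_1, \dots] so that no rounding occurs; it reproduces the coefficients 1/k! and evaluates f_{15}(1) as an approximation of e.

```python
import math
from fractions import Fraction


def phi(coeffs):
    """Apply Phi(f)(x) = f(0) + integral_0^x f(t) dt to a polynomial f.

    A polynomial is a list [c0, c1, c2, ...] meaning c0 + c1*x + c2*x**2 + ...
    """
    integral = [Fraction(0)] + [c / (k + 1) for k, c in enumerate(coeffs)]  # x^k -> x^(k+1)/(k+1)
    integral[0] += coeffs[0]                                                 # add the constant f(0)
    return integral


def picard(n):
    """Return f_n = Phi applied n times to the starting function f_0(x) = 1."""
    f = [Fraction(1)]
    for _ in range(n):
        f = phi(f)
    return f


def evaluate(coeffs, x):
    """Evaluate a coefficient list at x by Horner's scheme."""
    value = Fraction(0)
    for c in reversed(coeffs):
        value = value * x + c
    return value


print(picard(4))                                  # coefficients 1, 1, 1/2, 1/6, 1/24
print(float(evaluate(picard(15), 1)) - math.e)    # tiny: f_15(1) approximates e
```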

Differential equations in the complex domain

The function E(\mathrm ix)=e^{\mathrm ix} satisfies the differential equation y'=\mathrm iy. The corresponding “motion on the complex plane”, x\mapsto e^{\mathrm ix}, is rotation along the (unit) circle with unit speed: the velocity \mathrm iy is always perpendicular to the position y, so |y| stays equal to |y(0)|=1, and the speed is |y'|=|y|=1. Hence the real and imaginary parts of e^{\mathrm ix} are the cosine and sine, respectively. In fact, the “right” definition of these functions is exactly this:

\displaystyle \cos x=\textrm{Re}\,e^{\mathrm ix},\quad \sin x=\textrm{Im}\,e^{\mathrm ix} \iff e^{\mathrm ix}=\cos x+\mathrm i\sin x,\qquad x\in\mathbb R.

Thus Euler's formula (“cis”) is in fact the definition of sine and cosine. Of course, it can also be “proved” by substituting the imaginary value into the Taylor series of the exponential, collecting the real and imaginary parts and comparing them with the Taylor series of sine and cosine.
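A minimal numerical check of this “proof by series”, assuming nothing beyond the Python standard library: the hypothetical helper `exp_series` sums the first 40 terms of the exponential series at z=\mathrm ix, and the loop compares the real and imaginary parts with `math.cos` and `math.sin`, and the modulus with 1.

```python
import math


def exp_series(z: complex, terms: int = 40) -> complex:
    """Partial sum of the exponential series sum_{n >= 0} z**n / n! with `terms` terms."""
    total, term = 0j, 1 + 0j
    for n in range(terms):
        total += term
        term *= z / (n + 1)      # turn z**n/n! into z**(n+1)/(n+1)!
    return total


for x in [0.0, 1.0, math.pi / 2, math.pi, 2.5]:
    w = exp_series(1j * x)
    # Real part vs cos x, imaginary part vs sin x, and the distance from the origin.
    print(f"x={x:.4f}  Re-cos={w.real - math.cos(x):+.1e}  "
          f"Im-sin={w.imag - math.sin(x):+.1e}  |w|={abs(w):.15f}")
```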

In fact, both sine and cosine are in turn solutions of a real differential equation: differentiating the equation y'=\mathrm iy, one concludes that y''=\mathrm i^2y=-y. This can be used to calculate the Taylor coefficients of sine and cosine.
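The recursion hidden in y''=-y can be made explicit: substituting y=\sum a_kx^k gives (k+1)(k+2)a_{k+2}=-a_k. The sketch below (the function name `taylor_from_ode` is an illustrative assumption) uses the initial conditions of the real and imaginary parts of e^{\mathrm ix}: y(0)=1,\ y'(0)=0 for cosine and y(0)=0,\ y'(0)=1 for sine.

```python
from fractions import Fraction


def taylor_from_ode(y0, dy0, n):
    """First n Taylor coefficients at 0 of the solution of y'' = -y with y(0) = y0, y'(0) = dy0.

    Substituting y = sum a_k x**k into y'' = -y gives (k+1)(k+2) a_{k+2} = -a_k.
    """
    a = [Fraction(y0), Fraction(dy0)]
    while len(a) < n:
        k = len(a) - 2
        a.append(-a[k] / ((k + 1) * (k + 2)))
    return a[:n]


cos_coeffs = taylor_from_ode(1, 0, 8)   # cosine: y(0) = 1, y'(0) = 0
sin_coeffs = taylor_from_ode(0, 1, 8)   # sine:   y(0) = 0, y'(0) = 1
print([str(c) for c in cos_coeffs])
print([str(c) for c in sin_coeffs])
```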

For more details see the lecture notes.

Not completely covered in class: solution of linear equations with constant coefficients, and resonances.

Sunday, December 27, 2015

Lecture 9, Dec 22, 2015

Integral and antiderivative

  1. Area under the graph as a paradigm
  2. Definitions (upper and lower sums, integrability).
  3. Integrability of continuous functions.
  4. Newton–Leibniz formula: integral and antiderivative.
  5. Elementary rules of antiderivation (linearity, anti-Leibniz rule of “integration by parts”).
  6. Anti-chain rule, change of variables in the integral and its geometric meaning.
  7. Riemann–Stieltjes integral and change of variables in it.
  8. Integrability of discontinuous functions.
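To illustrate items 2 and 4, here is a small sketch (not from the notes) computing lower and upper sums for a uniform partition and comparing them with the value F(b)-F(a) given by the Newton–Leibniz formula. It assumes the integrand is non-decreasing, so that the infimum and supremum on each piece sit at its endpoints; the example f(x)=x^3 on [0,2] and the name `darboux_sums` are illustrative choices. For this f the gap between the two sums is exactly (f(2)-f(0))\cdot\frac{2}{n}=\frac{16}{n}.

```python
def darboux_sums(f, a, b, n):
    """Lower and upper sums of a NON-DECREASING function f on [a, b]
    for the uniform partition into n pieces.

    For a non-decreasing f, the infimum and supremum on each piece are
    attained at its left and right endpoints, so no optimization is needed.
    """
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    lower = sum(f(xs[k]) * (xs[k + 1] - xs[k]) for k in range(n))
    upper = sum(f(xs[k + 1]) * (xs[k + 1] - xs[k]) for k in range(n))
    return lower, upper


f = lambda x: x ** 3          # increasing on [0, 2]
F = lambda x: x ** 4 / 4      # an antiderivative of f
exact = F(2.0) - F(0.0)       # Newton-Leibniz: the integral of x^3 over [0, 2] is 4
for n in [10, 100, 1000]:
    lower, upper = darboux_sums(f, 0.0, 2.0, n)
    print(n, lower, upper, upper - lower, exact)
```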

Not covered in class: the Lebesgue theorem and the motivation for the transition from the Riemann to the Lebesgue integral.

The sketchy notes are available here.

Saturday, December 19, 2015

Lecture 8, Dec 15

Higher derivatives and better approximation

We discussed a few issues:

  • Lagrange mean value theorem (the formula of finite increments): how to estimate the difference f(b)-f(a) through the derivative f'?
  • Consequence: if a function and several of its derivatives vanish at a point, then the function has a “root of high order” at this point (with an explanation of what that means).
  • Taylor formula for polynomials: if you know all derivatives of a polynomial at some point, then you know it everywhere.
  • Peano formula for C^n-smooth functions: approximation by the Taylor polynomial with asymptotic bound for the error.
  • Lagrange formula: explicit estimate for the error.
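A numerical sketch of the last two items, under the assumption that we test them on f=\sin at the point 0: since all derivatives of \sin are bounded by 1, the Lagrange form of the remainder gives |\sin x-T_n(x)|\le\frac{|x|^{n+1}}{(n+1)!}, where T_n is the Taylor polynomial of degree n. The helper names `sin_taylor` and `lagrange_bound` are hypothetical.

```python
import math


def sin_taylor(x: float, n: int) -> float:
    """Taylor polynomial of sin at 0 of degree n: sum over odd k <= n of (-1)^((k-1)/2) x^k / k!."""
    return sum((-1) ** ((k - 1) // 2) * x ** k / math.factorial(k)
               for k in range(1, n + 1, 2))


def lagrange_bound(x: float, n: int) -> float:
    """Lagrange bound |x|^(n+1) / (n+1)! on the error, valid for sin since |sin^(k)| <= 1."""
    return abs(x) ** (n + 1) / math.factorial(n + 1)


x = 1.5
for n in [1, 3, 5, 7, 9]:
    err = abs(math.sin(x) - sin_taylor(x, n))
    print(f"n={n}:  actual error {err:.3e}  <=  Lagrange bound {lagrange_bound(x, n):.3e}")
```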

The notes (updated) are available here.

Saturday, December 12, 2015

Lecture 7, Dec 8, 2015

Differentiability and derivative

Continuity of functions (and maps) means that they can be nicely approximated by constant functions (maps) in a sufficiently small neighborhood of each point. Yet the constant maps (easy to understand as they are) are not the only “simple” maps.

Linear maps

Linear maps naturally live on vector spaces, sets equipped with a special structure. Recall that \mathbb R is algebraically a field: real numbers can be added, subtracted and multiplied, and the ratio \alpha/\beta is well defined for \beta\ne0.

Definition. A set V is said to be a vector space (over \mathbb R) if the operations of addition/subtraction V\owns u,v\mapsto u\pm v and multiplication by a constant V\owns v,\ \mathbb R\owns \alpha\mapsto \alpha v are defined on it and obey the obvious rules of commutativity, associativity and distributivity. Some people prefer to call vector spaces linear spaces: the two terms are synonymous.

Warning. There is no “natural” multiplication V\times V\to V!

Examples.
  1. The field \mathbb R itself. If we want to stress that it is considered as a vector space, we write \mathbb R^1.
  2. The set \mathbb R^n=\{(x_1,\dots,x_n)\colon x_i\in\mathbb R\} of n-tuples is the Euclidean n-space. For n=2,3 it can be identified with the “geometric” plane and space, using coordinates.
  3. The set of all polynomials of degree \leq d with real coefficients.
  4. The set of all polynomials \mathbb R[x] without any control over the degree.
  5. The set C([0,1]) of all continuous functions on the segment [0,1].

Warning. The last two examples are special: the corresponding spaces are not finite-dimensional (we did not have time to discuss what the dimension of a linear space is in general…)

Let V,W be two (different or identical) vector spaces and f:V\to W a function (map) between them.
Definition. The map f is linear if it preserves the operations on vectors, i.e., \forall v,w\in V,\ \alpha\in\mathbb R,\quad f(v+w)=f(v)+f(w),\ f(\alpha v)=\alpha f(v).

Sometimes we will use the notation V\overset f\longrightarrow W.

Obvious properties of linearity.

  • f(0)=0 (Note: the two zeros may lie in different spaces!)
  • For any two given spaces V,W the linear maps between them can be added and multiplied by constants in a natural way! If V\overset {f,g}\longrightarrow W, then we define (f+g)(v)=f(v)+g(v) for any v\in V (define \alpha f yourselves). The result will be again a linear map between the same spaces.
  • If V\overset f\longrightarrow W and W\overset g\longrightarrow Z, then the composition g\circ f:V\overset f\longrightarrow W\overset g\longrightarrow Z is well defined and again linear.

Examples.
  1. Any linear map \mathbb R^1\overset f\longrightarrow \mathbb R^1 has the form x\mapsto ax, \ a\in\mathbb R (do you understand why the notations \mathbb R, \mathbb R^1 are used?)
  2. Any linear map \mathbb R^n\overset f\longrightarrow \mathbb R^1 has the form (x_1,\dots,x_n)\mapsto a_1x_1+\cdots+a_nx_n for some numbers a_1,\dots,a_n. Argue that all such maps form a linear space isomorphic to \mathbb R^n again.
  3. Explain how linear maps from \mathbb R^n to \mathbb R^m can be recorded using m\times n matrices (m rows, n columns). How is the composition of linear maps related to the multiplication of matrices?
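For item 3, one common convention (assumed here) stores a linear map \mathbb R^n\to\mathbb R^m as an m\times n matrix acting on column vectors; composition then corresponds to the matrix product. The following plain-Python sketch with hypothetical helpers `apply` and `matmul` checks this on one example.

```python
def apply(A, x):
    """Apply the linear map R^n -> R^m recorded by the m x n matrix A (a list of m rows)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def matmul(B, A):
    """Matrix of the composition 'first A, then B': entry (i, j) = sum_k B[i][k] * A[k][j]."""
    n = len(A[0])
    return [[sum(B[i][k] * A[k][j] for k in range(len(A))) for j in range(n)]
            for i in range(len(B))]


A = [[1, 2, 0],
     [0, 1, -1]]   # a map R^3 -> R^2, recorded as a 2 x 3 matrix
B = [[3, 1],
     [0, 2],
     [1, 1]]       # a map R^2 -> R^3, recorded as a 3 x 2 matrix
x = [1, -1, 2]

print(apply(B, apply(A, x)))   # composition computed step by step: [-6, -6, -4]
print(apply(matmul(B, A), x))  # the same vector via the product matrix B*A: [-6, -6, -4]
```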

The first example shows that linear maps of \mathbb R^1 to itself are “labeled” by real numbers (“multiplicators”). Composition of linear maps corresponds to multiplication of the corresponding multiplicators (whence the name). A linear map \mathbb R^1\to\mathbb R^1 is invertible if and only if its multiplicator is nonzero.

Corollary. Invertible linear maps \mathbb R^1\to\mathbb R^1 constitute a commutative group (by composition) isomorphic to the multiplicative group \mathbb R^*=\mathbb R\smallsetminus \{0\}.


Maps of the form V\to V,\ v\mapsto v+h for a fixed vector h\in V (the domain and the target coincide!) are called shifts (a.k.a. translations). Warning: shifts are not linear unless h=0! The composition of two shifts is again a shift.

Prove that all translations form a commutative group (by composition) isomorphic to the space V itself. (Hint: this is a tautological statement).

Affine maps

A map f:V\to W between two vector spaces is called affine, if it is a composition of a linear map and translations.

Any affine map \mathbb R^1\to\mathbb R^1 has the form x\mapsto ax+b for some a,b\in\mathbb R. Sometimes it is more convenient to write the map in the form x\mapsto a(x-c)+b (with a suitably changed b): this is possible for any point c\in\mathbb R^1. Note that, unlike the linear case, composition of affine maps of \mathbb R^1 is no longer commutative.

Key computation. Assume you are given a map f:\mathbb R^1\to\mathbb R^1 in the sense that you can evaluate it at any point c\in\mathbb R^1. Suppose an oracle tells you that this map is affine. How can you restore the explicit formula f(x)=a(x-c)+b for f?

Obviously, b=f(c). To find \displaystyle a=\frac{f(x)-b}{x-c}, we have to plug into it any point x\ne c and the corresponding value f(x). Given that b=f(c), we have \displaystyle a=\frac{f(x)-f(c)}{x-c} for any choice of x\ne c.
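The key computation translates directly into code. In this sketch the oracle is modeled, as an assumption, by an ordinary Python function; `recover_affine`, `oracle` and `square` are hypothetical names. For a non-affine function such as t\mapsto t^2 the same quotient changes with x.

```python
def recover_affine(f, c, x=None):
    """Recover (a, b) with f(t) = a*(t - c) + b from an oracle f known to be affine.

    b is the value at c; a is the difference quotient for any second point x != c.
    """
    if x is None:
        x = c + 1.0
    if x == c:
        raise ValueError("the second point must differ from c")
    b = f(c)
    a = (f(x) - b) / (x - c)
    return a, b


oracle = lambda t: -2.5 * t + 4.0            # hypothetical affine "black box"
print(recover_affine(oracle, c=3.0))          # (-2.5, -3.5)
print(recover_affine(oracle, c=3.0, x=-7.0))  # the same pair: the choice of x does not matter

square = lambda t: t * t                      # not affine: the quotient now depends on x
print([recover_affine(square, c=3.0, x=x)[0] for x in (4.0, 3.5, 3.1)])
```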

For a non-affine function f, the expression \displaystyle a_c(x)=\frac{f(x)-f(c)}{x-c} is in general not constant and depends on the choice of the point x.

Definition. A function f:\mathbb R^1\to\mathbb R^1 is called differentiable at the point c if the above expression a_c(x), albeit non-constant, has a limit as x\to c: a_c(x)=a+s_c(x), where s_c(x)\to0 as x\to c. The number a is called the derivative of f at the point c and is denoted by f'(c) (and also by half a dozen other symbols: \frac{df}{dx}(c), Df(c), D_xf(c), f_x(c), …).

Existence of the limit means that near the point c the function f admits a good approximation by the affine function \ell(x)=a(x-c)+b with b=f(c): f(x)=\ell(x)+s_c(x)(x-c), i.e., the “non-affine part” s_c(x)\cdot (x-c) is small not just by itself, but also relative to the small difference x-c.
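A small numerical sketch of the definition, with f(x)=x^3 and c=1 chosen for illustration (here f'(1)=3 and, in exact arithmetic, s_c(x)=(x-1)(x+2)): as x\to c, the quantity s_c(x) tends to zero and the non-affine part s_c(x)(x-c) shrinks even faster. The helper name `difference_quotient` is an assumption.

```python
def difference_quotient(f, c, x):
    """a_c(x) = (f(x) - f(c)) / (x - c), defined for x != c."""
    return (f(x) - f(c)) / (x - c)


f = lambda t: t ** 3
c, a = 1.0, 3.0                 # for f(t) = t**3 the limit of a_c(x) at c = 1 is 3
for d in [1e-1, 1e-2, 1e-3, 1e-4]:
    x = c + d
    s = difference_quotient(f, c, x) - a   # s_c(x); in exact arithmetic (x - c)(x + 2)
    print(f"x - c = {d:.0e}   s_c(x) = {s:.3e}   s_c(x)*(x - c) = {s * d:.3e}")
```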

Differentiability and algebraic operations

See the notes and their earlier version.

The only non-obvious point is the differentiability of the product: the product (unlike the composition) of affine functions is no longer affine, but it is immediately differentiable:

[b+a(x-c)]\cdot[q+p(x-c)]=bq+(aq+bp)(x-c)+ap(x-c)^2, and the quadratic term is small relative to x-c, so the product is differentiable at c with derivative aq+bp.
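The expansion can be double-checked by multiplying coefficient lists in the variable t=x-c. This is a sketch with hypothetical sample values b=2,\ a=5,\ q=-3,\ p=7 and a hypothetical helper `poly_mul`; the middle coefficient aq+bp is the derivative of the product at c.

```python
def poly_mul(u, v):
    """Multiply polynomials in t = x - c given as coefficient lists (constant term first)."""
    out = [0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj
    return out


b, a = 2, 5      # first affine function:  b + a*t
q, p = -3, 7     # second affine function: q + p*t
print(poly_mul([b, a], [q, p]))   # [b*q, a*q + b*p, a*p] = [-6, -1, 35]
```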

Exercise. Derive the Leibniz rule for the derivative of the product.

Derivative and the local study of functions

Affine functions have no strict (strong) maxima or minima unless restricted to finite segments. The absence of an extremum is a robust property that passes from the affine approximation to the original function: if f'(c)\ne0, then f has no local extremum at c. Details here and here.

Sunday, December 6, 2015

Lecture 6, Dec 1, 2015

Two properties preserved by continuous maps: compactness and connectivity

These two properties are key to the existence of solutions to infinitely many problems in mathematics and physics.


Compactness (of a subset A\subseteq \mathbb R^n) is the “nearest approximation” to finiteness of A. Obviously, if A is a finite set of points, then

  1. Any infinite sequence \{a_n\}_{n=1}^\infty\subseteq A has an infinite stationary (constant) subsequence;
  2. A is bounded and closed;
  3. If \bigcup_\alpha U_\alpha\supseteq A is an arbitrary covering of A by open subsets U_\alpha, then one can always choose a finite subcovering U_{\alpha_1}\cup\cdots\cup U_{\alpha_N}\supseteq A.

The first two properties are obvious, and so is the third: for each of the points a_1,\dots,a_N\in A it is enough to choose just one open set U_{\alpha_i} containing this point. The union of these (finitely many) sets covers all of A.

Definition. The following three properties of a set A\subseteq \mathbb R^n are equivalent; a set possessing them is called compact:

  1. Any infinite sequence \{a_n\}_{n=1}^\infty\subseteq A has a partial limit (i.e., the limit of an infinite subsequence), which is again in A;
  2. A is bounded and closed;
  3. If \bigcup_\alpha U_\alpha\supseteq A is an arbitrary covering of A by open subsets U_\alpha, then one can always choose a finite subcovering U_{\alpha_1}\cup\cdots\cup U_{\alpha_N}\supseteq A.

Example. The closed segment, say, [0,1]\subset\mathbb R^1 possesses all three properties.

  1. The standard trick of dividing into halves and each time choosing the half that contains infinitely many members of the sequence allows one to construct a partial limit for any sequence confined to [0,1].
  2. Obvious.
  3. Assume (by contradiction) that there exists a very perverse covering of [0,1] which does not admit a finite subcovering. Then at least one of the two segments [0,\frac12],\ [\frac12,1] also suffers from the same problem (if both admitted finite subcoverings, their union would be a finite subcovering of the initial segment [0,1]). Continuing this way, we construct an infinite nested sequence of closed segments of lengths 2^{-k} which do not admit a finite subcovering. Their intersection is a single point a\in[0,1], which must be covered by at least one open set U_\alpha. But then this single set also covers all sufficiently small segments from our nested sequence. Contradiction.

Problem. Prove (using the Example) that the three conditions are indeed equivalent. Hint: any bounded set can be confined to a cube x_i\in [-C_i,C_i],\ i=1,\dots, n. Use the closedness of A to prove that the partial limit of any sequence is again in A.

Theorem. If f\colon A\to \mathbb R^m is a continuous map and A is compact, then f(A) is also compact.

Corollary. Any continuous function restricted to a compact set is bounded and attains its extremal values.


A subset A\subseteq \mathbb R^n is called connected if it cannot be split into two disjoint parts “apart from each other”. How can this be formalized?

Example (proto-Definition). A subset A\subseteq [0,1] is called connected, if together with any two points a,b\in A it contains all points x such that a\le x\le b.

All connected subsets of the real line can be easily described (Do it!).

How can we treat subsets A\subseteq \mathbb R^n for n>1? Two ways can be suggested.

Definition. A set A\subseteq \mathbb R^n is called path connected, if for any two points a,b\in A there exists a continuous map f\colon [0,1]\to A such that f(0)=a,\ f(1)=b.

This definition mimics the one-dimensional construction. However, this is not the only possibility to say that a set cannot be split into smaller parts.

Definition. A subset A\subseteq \mathbb R^n is called disconnected if there exist two disjoint open sets U_1,U_2\subseteq\mathbb R^n,\ U_1\cap U_2=\varnothing, such that A\subseteq U_1\cup U_2 and both parts A\cap U_i,\ i=1,2, are nonempty. If such a partition is impossible, then A is called connected.

Problem. Prove that for subsets on the real line the two definitions coincide.

Problem. Consider the subset A of the plane which consists of the graph of y=\sin \frac1x,\ x>0, together with the point (0,0). Prove that it is connected but not path connected.

Further reading

Chapter 3 of Abbott, Understanding Analysis, especially sections 3.2 (open/closed sets), 3.3 (compact sets) and 3.4 (connected sets). Pay attention to the exercises!
