Chapter 1 covered mathematical theorems from calculus and some definitions:

An example of a $C^1$ function is $F(x) = \int_a^x f(u) du$ when $f$ is continuous. This example can be re-integrated to get examples of $C^k$ functions.

- the intermediate value theorem

- the extreme value theorem

- the mean value theorem. (We formulated this in an extended sense due to Cauchy.)

Let $f$ be a $C^{k+1}$ function on $[a,b]$. Let $a < c < b$ and $x$ be in $(a,b)$. Then there exists a $\xi$ between $c$ and $x$ satisfying:

$$~ f(x) = T_k(x) + \frac{f^{(k+1)}(\xi)}{(k+1)!} (x - c)^{k+1}, ~$$

where $T_k(x) = f(c) + f'(c)(x-c) + \cdots + f^{(k)}(c)/k! \cdot (x-c)^k$.

$T_k$ is the Taylor polynomial of degree $k$ for $f$ about $c$.

There are other ways to write the remainder term, and somewhat relaxed assumptions on $f$ that are possible, but this is the easiest to remember.
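As a numeric sanity check of the remainder formula, here is a sketch for $f = \sin$ and $c = 0$ (the degree $k = 5$, the evaluation point $x = 0.3$, and the crude bound $|\sin^{(6)}(\xi)| \leq 1$ are illustrative choices, not from the text):

```julia
# Degree-5 Taylor polynomial of sin about c = 0, checked at x = 0.3.
# Since |sin^(6)(ξ)| <= 1, the theorem bounds the error by x^6 / 6!.
x  = 0.3
T5 = x - x^3/6 + x^5/120       # T_5(x) for f = sin, c = 0
err   = abs(sin(x) - T5)       # observed error
bound = x^6 / factorial(6)     # remainder bound from the theorem
@show err
@show bound
@show err <= bound
```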

A less precise, but still true, form of the above is:

$$~ f(x) = T_k(x) + \mathcal{O}((x-c)^{k+1}). ~$$

- There exists a $\xi$ in $[a,b]$ with $f(\xi) = 0$.

- There exists a $\xi$ in $[a,b]$ with $f'(\xi) = 0$.

- There exists a $\xi$ in $[a,b]$ with $f'(\xi) \cdot (b-a) = f(b) - f(a)$.

What are the assumptions on $f$ used in your proof?

- $\sin(x)$ at $c=0$

- $\log(1+x)$ at $c=0$

- $1 / (1 + x)$ at $c=0$

- $\arctan(x)$ at $c=0$

- $\sin(x)$
- $e^x$

We needed a *very large* value of $k$. What if we tried this over a smaller interval, say $0 \leq x \leq 1/2$, instead? How big would $k$ need to be then?

We used $f^{(k)}(x)/k! = \pm 1 / (k (1+x)^k)$.
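A quick search for the needed $k$ can be done with the resulting remainder bound $|R_k| \leq x^{k+1}/(k+1)$ (valid for $x \geq 0$, since $(1+\xi)^{k+1} \geq 1$); the target tolerance $10^{-16}$ and the name `rbound` are my own choices for illustration:

```julia
# Remainder bound for log(1+x) on [0, x], x >= 0: |R_k| <= x^(k+1)/(k+1).
rbound(k, x) = x^(k + 1) / (k + 1)

# first k making the bound smaller than 1e-16 at the right endpoint x = 1/2
k = findfirst(k -> rbound(k, 0.5) < 1e-16, 1:100)
@show k
@show rbound(k, 0.5)
```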

Chapter 2 deals with floating point representation of real numbers. Some basic things we saw along the way:

We saw how the non-negative integers $0, \dots, 2^n-1$ can fit in $n$ bits in a simple manner.

We saw how the integers $-2^{n-1}, \dots, 0, \dots, 2^{n-1}-1$ can fit in $n$ bits using two's complement for the negative numbers. The advantage of this storage is fast addition and subtraction.

The basic storage uses:

- a sign bit

- $p$ bits to store the significand, which is *normalized* to be $1.ddd\cdots d$

- some bits to store the exponent, $e_{min} \leq m \leq e_{max}$

and all this is put together to create the floating point numbers of the form

$$~ \pm 1.ddddd\cdots d \cdot 2^m. ~$$

Some details:

- the sign bit comes first and uses `1` for minus and `0` for plus.

- the exponent is stored as an unsigned integer ($0, \cdots, 2^k - 1$) and there is an implicit bias to be subtracted. The value $000\cdots 0$ is special and used for $0.0$ (or $-0.0$) and subnormal numbers. The value $111\cdots 1$ is used for `Inf`, `-Inf`, and various types of `NaN`.

- the significand has an implicit $1$ in front, except for the special numbers (e.g., `0`, `Inf`, and `NaN`).
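This layout can be inspected directly with Julia's `bitstring`. A sketch with a `Float16` (1 sign bit, 5 exponent bits with bias 15, 10 significand bits; the value $6.5$ is an arbitrary choice):

```julia
x = Float16(6.5)                   # 6.5 is 1.101 in binary times 2^2
b = bitstring(x)                   # 16 characters of 0s and 1s
s = b[1:1]                         # sign bit
e = b[2:6]                         # exponent field (biased)
m = b[7:16]                        # significand bits (implicit leading 1)
E = parse(Int, e; base = 2) - 15   # subtract the bias of 15
@show b
@show s
@show e
@show m
@show E
```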

What is $\delta$? Some number between $-\epsilon$ and $\epsilon$. What is $\epsilon$? Good question.

The machine epsilon, `eps`, is defined through $\epsilon = 1^+ - 1$, where $1^+$ is the next largest floating point number after $1$. We saw $\epsilon = 2^{-p}$. (With this $\epsilon$, we could have said $\epsilon/2$ in the definition of $\delta$.)

We saw that if $x$ and $y$ are real numbers, the *relative* error of the floating point result of $x-y$ can be large if $x$ is close to $y$.

We saw a theorem that says even if there is no rounding error, the subtraction of $y$ from $x$ can introduce a loss of precision. Basically, if $x$ and $y$ agree to $p$ binary digits, then a shift of $p$ units is necessary. More concretely: if $x > y > 0$ and $1 - y/x \leq 2^{-p}$, then at least $p$ significant binary bits are lost in forming $x-y$.
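A small illustration of this loss of precision (the particular values are illustrative choices, not from the book):

```julia
# fl(1 + 1e-15) rounds 1e-15 into the last few bits of 1.0; the
# subtraction below is then exact, yet the result has a large
# relative error as an approximation to 1e-15.
a = 1.0 + 1e-15
d = a - 1.0
@show d
@show abs(d - 1e-15) / 1e-15   # relative error is on the order of 0.1
```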

We saw that, if possible, we should avoid big numbers, as the errors can then be bigger. (This is why the book suggests finding $(a+b)/2$ using $a + (b-a)/2$, for example.)

We saw that, when possible, we should cut down on the operations used. (One reason why Horner's method for polynomial evaluations is preferred.)
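A sketch of Horner's method (the function name and coefficient-order convention are my own; Base Julia's `evalpoly` provides the same thing):

```julia
# Evaluate a polynomial with one multiplication and one addition
# per coefficient, rather than forming each power of x separately.
function horner(coeffs, x)         # coeffs[1] is the constant term
    s = zero(x)
    for a in reverse(coeffs)
        s = s * x + a              # fold in one coefficient per step
    end
    s
end
@show horner([-1, 0, 1], 2.0)      # x^2 - 1 evaluated at x = 2
```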

We saw that errors can accumulate. In particular we discussed this theorem:

If $x_i$ are positive, the relative error in a naive summation of $\sum x_i$ is $\mathcal{O}(n\epsilon)$.
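For instance (summing $10^6$ copies of `0.1` is an illustrative choice; `foldl` forces the left-to-right, naive order of accumulation):

```julia
naive_sum(xs) = foldl(+, xs)       # left-to-right accumulation
xs  = fill(0.1, 10^6)              # the exact sum would be 10^5
rel = abs(naive_sum(xs) - 10^5) / 10^5
@show rel                          # small, but nonzero
@show 10^6 * eps()                 # the O(n*eps) scale
```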

Evaluation of a function when the input is uncertain. That is, we evaluate $f(x+h)$ when we want to find $f(x)$. (It could be $x + h = x(1+\delta)$, say.) For this we have

$$~ \frac{f(x+h) - f(x)}{f(x)} \approx \frac{x f'(x)}{f(x)} \cdot \frac{h}{x}. ~$$

That is, the relative error in the image is the relative error in the domain times a factor $xf'(x)/f(x)$.

Evaluation of a perturbed function (which can happen with polynomials that have rounded coefficients). For this, we have $F(x) = f(x) + \epsilon g(x)$. The example we had is if $r$ is a root of $f$ and $r+h$ is a root of $F$. What can we say about $h$? We can see that

$$~ h \approx -\epsilon \frac{g(r)}{f'(r)}, ~$$

which can be big. The example in the book uses the Wilkinson polynomial and $r=20$. (The Wilkinson polynomial actually is exactly this case, as there is necessary rounding to get its coefficients into floating point.)
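A toy version of this estimate, with my own choices of $f$, $g$, and the perturbation size (not the book's Wilkinson example):

```julia
# f(x) = x^2 - 2 has root r = sqrt(2); perturb it to F = f + ep*g, g(x) = x^3.
f(x) = x^2 - 2
g(x) = x^3
ep   = 1e-6
F(x) = f(x) + ep * g(x)
r    = sqrt(2)
h_pred = -ep * g(r) / (2r)          # the first-order estimate -ep*g(r)/f'(r)

# locate the nearby root of F with a few Newton steps
function refine(F, Fp, x, n)
    for _ in 1:n
        x -= F(x) / Fp(x)
    end
    x
end
x = refine(F, x -> 2x + 3ep * x^2, r, 5)
@show h_pred
@show x - r                         # observed shift of the root
```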

If $1 = 1.00 \cdot 10^0$, what is $\epsilon$?

What is $3.14 \cdot 10^0 - 3.15 \cdot 10^0$?

What is $4.00 \cdot 10^0$ times $3.00 \cdot 10^1$?

What is $\delta$ (where $fl(x \cdot y) = (x\cdot y)\cdot (1 + \delta)$) when computing $1.23 \cdot 10^4$ times $4.32 \cdot 10^1$?

How many total numbers are representable in this form ($0$ is not)?

What is $\epsilon$?

What is $1.11 \cdot 2^1 - 1.00 \cdot 2^0$?

Convert the number $-1.01 \cdot 2^{-2}$ to decimal.

Let $x=1.11 \cdot 2^0$ and $y=1.11 \cdot 2^1$. Find $\delta$ in $fl(x \cdot y) = (x \cdot y)(1 + \delta)$.

Consider the 16-bit floating point value with bits `0101000101000000`. The first bit, `0`, is the sign bit; the exponent bits are `10100`; and the significand bits are `0101000000`. Can you find the number? Remember the exponent is encoded and you'll need to subtract `01111`, then convert.

`E = expm1(x)` is the more precise version of $e^x - 1$:

$$~
(1/2) \cdot (E + E/(E+1))
~$$
Can you think of why the direct approach might cause issues for some values of $x$ in that range?
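The advantage of `expm1` is easy to see for a small $x$ (the value `1e-12` is an arbitrary choice):

```julia
x = 1e-12
@show exp(x) - 1    # suffers cancellation: only a few digits are correct
@show expm1(x)      # computes e^x - 1 accurately
```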

- $\log(x) - \log(y)$
- $x^{-3} (\sin(x) - x)$
- $\sin(x) - \tan(x)$

What value of $k$ will ensure that the error over $[0, 1/4]$ is no more than $10^{-3}$?

That is, floating point multiplication is not associative. You can verify by testing `(0.1 * 0.2)*0.3` and `0.1 * (0.2 * 0.3)`.
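For example:

```julia
@show (0.1 * 0.2) * 0.3
@show 0.1 * (0.2 * 0.3)
# the two results differ in the last bit
@show (0.1 * 0.2) * 0.3 == 0.1 * (0.2 * 0.3)
```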

That is, if you computed the difference quotient $(f(x+h)-f(x))/h$ in floating point, would you expect smaller and smaller values of $h$ to converge? Why?
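A quick experiment (the function $\sin$, the point $x=1$, and the $h$ values are illustrative):

```julia
# Forward-difference approximations to sin'(1) = cos(1) as h shrinks:
# the error first decreases, then grows; at h = 1e-16 the computed
# quotient is 0 because 1.0 + 1e-16 rounds to 1.0.
f(x) = sin(x)
x = 1.0
for h in (1e-4, 1e-8, 1e-12, 1e-16)
    println((h, (f(x + h) - f(x)) / h - cos(x)))
end
```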

This chapter is about solving for zeros of a real-valued, scalar function $f(x)$.

This came from:

Intermediate value theorem. If $f(x)$ is continuous on $[a,b]$, then for any $y$ in the interval between $f(a)$ and $f(b)$, there exists $c$ such that $f(c) = y$.

The special case is when $f(a) \cdot f(b) < 0$ ($[a,b]$ is a bracket), then there is a $c$ where $f(c) = 0$.

A proof follows by successively bisecting the interval. Number $a_0, b_0 = a, b$, and set $c_0 = (a_0 + b_0)/2$. Then $f(c_0)$ is positive, negative, or $0$. If $0$, we can stop. If not, then either $[a_0,c_0]$ or $[c_0,b_0]$ will be a bracket. Call this $[a_1,b_1]$ and define $c_1$ as its midpoint. We repeat and get a sequence $c_0, c_1, \dots$. If this terminates, we are done. Otherwise, since it can be shown that $|c_n - c_{n+k}| \leq 2^{-n}|b_0 - a_0|$, the $c_i$ have a limit $c$. This limit will be the zero. We can't have $f(c) > 0$: the values of $c_i$ where $f(c_i) < 0$ will also have limit $c$, and by continuity $f(c) \leq 0$. (This is provided there is an infinite sequence of $c_i$s with $f(c_i) < 0$, which requires proof.) Similarly, we can't have $f(c) < 0$. So it must be $0$.

The point of the proof is that there is a bound on the error:

$$~ |c_n - c| \leq \frac{1}{2} |b_n - a_n| = \frac{1}{2^{(n+1)}} |b_0 - a_0|. ~$$

We discussed various means to find the midpoint:

- using `(a + b)/2`

- using `a + (b-a)/2`

- using `a - a/2 + b/2`

- using the midpoint after reinterpreting the floating point values as ordered integers
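A minimal sketch of the bisection algorithm itself (the function name and the fixed step count are my choices; a real implementation would use a stopping tolerance):

```julia
# Bisection: repeatedly halve a bracketing interval [a, b] with f(a)*f(b) < 0.
function bisection(f, a, b; n = 60)
    for _ in 1:n
        c = a + (b - a) / 2        # midpoint, written to avoid overflow in a+b
        if f(c) == 0
            return c
        elseif f(a) * f(c) < 0     # [a, c] is the new bracket
            b = c
        else                       # [c, b] is the new bracket
            a = c
        end
    end
    a + (b - a) / 2
end
@show bisection(x -> x^2 - 2, 1.0, 2.0)   # approximates sqrt(2)
```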

The bound shows issues can occur if a) the initial error is big (a poor guess), b) the first derivative is close to $0$ (near a min or max), or c) the concavity is large (the function curves too much to be well approximated by a line).

We saw three facts:

- For a simple zero, we could find $\delta > 0$ so that convergence was quadratic. (Find $\delta$ so that $\delta C(\delta) < 1$.)

- For a concave up, increasing $C^2$ function, we had guaranteed convergence.

- For $f(x) = (g(x))^k$, we had linear convergence.

The secant method had convergence between linear and quadratic, but uses only 1 function call per step, unlike Newton's method, which uses 2.
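For comparison, a minimal Newton iteration (the function name and fixed step count are mine; note that each step calls both $f$ and $f'$):

```julia
# Newton's method: x_{n+1} = x_n - f(x_n)/f'(x_n)
function newton(f, fp, x; n = 10)
    for _ in 1:n
        x -= f(x) / fp(x)
    end
    x
end
@show newton(x -> x^2 - 2, x -> 2x, 1.5)   # approximates sqrt(2)
```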

Let $f(x) = x^2 - 2$. Starting with $a_0, b_0 = 1, 2$, find $a_4, b_4$.

To compute $\pi$ as a solution to $\sin(x) = 0$, one might use the bisection method with $a_0, b_0 = 3,4$. Were you to do so, how many steps would it take to find an error of no more than $10^{-16}$?

Let $e_n$ be $c_n - c$. The order of convergence of $c_n$ is $q$ provided

$$~ \lim_{n \rightarrow \infty} \frac{|e_{n+1}|}{|e_n|^q} = C \neq 0. ~$$

Newton's method to solve $\sin(x) = 0$ starting at $3$ has an error term that follows $|e_{n+1}| = C |e_n|^3$. What is the order of convergence?

Explain why the bisection method is no help in finding the zeros of $f(x) = (x-1)^2 \cdot e^x$.

In floating point, the computation of the midpoint via $(a+b)/2$ is discouraged and using $a/2 + b/2$ or $a + (b-a)/2$ is suggested. Why?

Mathematically, if $a < b$, it is always the case that there exists a $c = (a+b)/2$ with $a < c < b$. Is this also *always* the case in floating point? Can you think of an example of when it wouldn't be?

A simple zero for a function $f(x)$ is one where $f'(x) \neq 0$. Some algorithms have different convergence properties for functions with only simple zeros as compared to those with non-simple zeros. Would the bisection algorithm have a difference?

If you answered yes above, you could still be right, even though you'd be wrong mathematically. (Why? Look at the bound on the error and the assumptions on $f$.) This is because for functions with non-simple zeros, a lot of numeric issues can creep in. The book gives an example with a function like $f(x) = (x-1)^5$. Explain what is going on with this graph near $x=1$:

```julia
using Plots
f(x) = x^5 - 5x^4 + 10x^3 - 10x^2 + 5x - 1
plot(f, 0.999, 1.001)
```

For $f(x) = x^2 - 2$, and $x_0 = 1$ and $x_1 = 1.5$, compute 3 steps of a) the bisection method, b) Newton's method, c) the secant method.

The function $f(x) = \sin(x)$ has $[3,4]$ as a bracketing interval. Give a bound on the error $c_{10} - r$ after 10 steps of the bisection method.

The function $f(x) = x^2 - s$, $s > 0$, has a solution $\sqrt{s}$. Compute $1/2 \cdot f''(\xi)/f'(x_0)$ for $x_0 = s$. Compute the error, $e_1$.

For $f(x) = \sin(x)$, find an interval $[-\delta, \delta]$ for which the Newton iterates will converge *quadratically* to $0$.

Newton's method is applied to the function $f(x) = \log(x) - s$ to find $e^s$. If $x_n < e^s$ show $x_{n+1} < e^s$. If $e_0 > 0$ yet $x_1 > 0$, show $e_1 < 0$. (mirror the proof of one of the theorems)