Differential calculus in Banach spaces is an important part of analysis. In particular, the study of partial differential equations, such as the heat equation, relies heavily on this theory. Such equations are naturally posed in function spaces, which are typically Banach spaces: for example, the Lebesgue spaces $L^p$ or the space of bounded continuous functions. Hence the importance of differentiation in Banach spaces.

A Banach space $E$ is a vector space endowed with a norm $\|\cdot\|$ for which every Cauchy sequence converges in $E$. Equivalently, we say that $E$ is complete with respect to the norm $\|\cdot\|$.

We denote by $\mathcal{L}(E)$ the space of continuous linear maps from $E$ into $E$, also called bounded operators. If $T\in \mathcal{L}(E)$, there exists a constant $M\ge 0$ such that $$\|Tx\|\le M \|x\|,\quad \forall x\in E.$$ As we will see in the sequel, the differential of a map at a point is a continuous linear map, in contrast with the one-dimensional case, where the derivative is just a real number.
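In finite dimension every linear map is bounded, which gives an easy way to see the inequality above in action. The following NumPy sketch assumes $E=\mathbb{R}^3$ with the Euclidean norm and a randomly chosen matrix `T` (both illustrative choices, not from the text). For this norm, the best constant $M$ is the spectral norm of `T`, i.e. its largest singular value.

```python
import numpy as np

# Illustrative bounded operator on E = R^3 (random matrix, fixed seed).
rng = np.random.default_rng(0)
T = rng.standard_normal((3, 3))

# For the Euclidean norm, the smallest admissible constant M is the
# spectral norm of T (largest singular value).
M = np.linalg.norm(T, 2)

# Check ||T x|| <= M ||x|| on many random vectors x (rows of xs).
xs = rng.standard_normal((1000, 3))
ratios = np.linalg.norm(xs @ T.T, axis=1) / np.linalg.norm(xs, axis=1)
print(ratios.max() <= M + 1e-12)  # True
```

In infinite dimension no such computation is available, and continuity of a linear map has to be proved by hand.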

## Differentiation on the real line

Let us start by defining differentiability for functions of one real variable. We say that a function $f:D\subset \mathbb{R}\to \mathbb{R}$ is differentiable at $c\in D$ if the limit\begin{align*}\ell=\lim_{x\to c,\ x\neq c}\frac{f(x)-f(c)}{x-c}\end{align*} exists and is finite. In this case, we set $f'(c):=\ell$, called the derivative of $f$ at the point $c$.

Equivalently, we can write \begin{align*} f(c+h)=f(c)+f'(c)h+\varepsilon(h),\end{align*} where $\varepsilon(h)/h \to 0$ as $h\to 0$.
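A quick numerical illustration of the difference quotient and of the remainder $\varepsilon(h)$, assuming the example $f=\sin$ at $c=1$ (so that $f'(c)=\cos 1$); this choice of function is ours, and the sketch uses plain Python only. The quotient should approach $f'(c)$, and for this smooth $f$ the ratio $\varepsilon(h)/h$ shrinks roughly like $h$.

```python
import math

c = 1.0
fprime_c = math.cos(c)  # exact derivative of sin at c

ratios = []  # |eps(h)/h| for decreasing h
for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    quotient = (math.sin(c + h) - math.sin(c)) / h      # difference quotient
    eps = math.sin(c + h) - math.sin(c) - fprime_c * h  # remainder eps(h)
    ratios.append(abs(eps / h))
    print(f"h={h:.0e}  quotient={quotient:.6f}  |eps/h|={abs(eps / h):.1e}")
```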

We say that $f$ is differentiable on $D$ if $f$ is differentiable at every point of $D$. In this case, we define the derivative function $f':D\to \mathbb{R}$, $x\mapsto f'(x)$.

The important thing to remember is that the derivative at a point $c$ is a real number, that is $f'(c)\in\mathbb{R}$.

## Differential calculus in Banach spaces

In this part, we work with a Banach space $(E,\|\cdot\|)$, possibly infinite-dimensional. Let $f: E\to E$ be a map, not necessarily linear, and let $x_0\in E$. We say that $f$ is differentiable at $x_0$ if there exists a continuous linear map $L\in \mathcal{L}(E)$ such that, for all $h\in E$,\begin{align*}f(x_0+h)=f(x_0)+Lh+\varepsilon(h),\qquad\text{with}\qquad\lim_{h\to 0}\frac{\|\varepsilon(h)\|}{\|h\|}=0.\end{align*}

In this case, the map $L$ is unique; it is called the differential (or Fréchet derivative) of $f$ at $x_0$ and is denoted by $Df(x_0):=L$.

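Unlike the real case, the differential of $f$ at a point $x_0$ is a continuous linear map from $E$ to $E$, that is, $Df(x_0)\in\mathcal{L}(E)$. For $E=\mathbb{R}$, $Df(c)$ is the map $h\mapsto f'(c)h$, so the two notions agree. The same definition applies verbatim to a map $f:U\to F$ defined on an open subset $U$ of $E$ with values in another Banach space $F$; then $Df(x_0)\in\mathcal{L}(E,F)$, the space of continuous linear maps from $E$ to $F$. This is the setting of the examples below, where $F=\mathbb{R}$ or $F=E$.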
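To see the definition at work in a finite-dimensional Banach space, the following sketch takes $E=\mathbb{R}^2$ with the Euclidean norm and a hypothetical example map $f(x,y)=(x^2y,\ \sin x+y)$, whose candidate differential at $x_0$ is its Jacobian matrix. It only checks the ratio $\|\varepsilon(h)\|/\|h\|$ along one random direction, so it illustrates the definition rather than proving differentiability.

```python
import numpy as np

def f(v):
    # Hypothetical example map f: R^2 -> R^2.
    x, y = v
    return np.array([x**2 * y, np.sin(x) + y])

def Df(v):
    # Jacobian matrix of f at v: the candidate linear map L = Df(v).
    x, y = v
    return np.array([[2 * x * y, x**2],
                     [np.cos(x), 1.0]])

x0 = np.array([1.0, 2.0])
L = Df(x0)

rng = np.random.default_rng(1)
d = rng.standard_normal(2)
d /= np.linalg.norm(d)  # unit direction along which h -> 0

ratios = []
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * d
    eps = f(x0 + h) - f(x0) - L @ h  # remainder epsilon(h)
    ratios.append(np.linalg.norm(eps) / np.linalg.norm(h))
print(ratios)  # decreases roughly like t
```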

## Illustrating the definition: differentiability in matrix spaces

To this end, denote by $E=\mathcal{M}_{m,n}(\mathbb{R})$ the Euclidean space of real matrices of order $m\times n$, endowed with the inner product\begin{align*}(A,B)\in E\times E\mapsto \langle A,B \rangle:={\rm Tr}(A^T B),\end{align*}where ${\rm Tr}$ denotes the trace. For a function $f:U\subset E\to \mathbb{R}$ defined on an open set $U$ of $E$ and differentiable at $X\in U$, we denote by $\nabla f(X)$ the gradient of $f$ at $X$, that is, the unique matrix $\nabla f(X)\in E$ such that\begin{align*}Df(X)H=\langle \nabla f(X),H\rangle,\qquad \forall H\in E.\end{align*}
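The trace inner product is simply the usual dot product of $\mathbb{R}^{mn}$ written entrywise: ${\rm Tr}(A^TB)=\sum_{i,j}a_{ij}b_{ij}$. Consequently, the gradient $\nabla f(X)$ is the matrix of partial derivatives $\partial f/\partial x_{ij}$. A small NumPy check with random $3\times 4$ matrices (an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4))  # m = 3, n = 4
B = rng.standard_normal((3, 4))

trace_form = np.trace(A.T @ B)  # <A, B> = Tr(A^T B)
entrywise = np.sum(A * B)       # sum over i, j of a_ij * b_ij
print(np.isclose(trace_form, entrywise))  # True
```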

### General matrices

Fix $A\in E$ and let $f_A$ be the map\begin{align*}f_A: E\to \mathbb{R},\quad X\mapsto f_A(X)={\rm Tr}(A^T X).\end{align*}Let us show that $f_A$ is differentiable on $E$ and determine $\nabla f_A(X)$. Since ${\rm Tr}(\cdot)$ is linear, $f_A$ is a linear map; it is continuous because $E$ is finite-dimensional (alternatively, by the Cauchy–Schwarz inequality, $|f_A(X)|\le \|A\|\,\|X\|$). A continuous linear map is differentiable everywhere and coincides with its own differential, so $f_A$ is differentiable on $E$ and $Df_A(X)=f_A$ for every $X\in E$. We then have\begin{align*}Df_A(X)H=f_A(H)={\rm Tr}(A^T H)=\langle A,H \rangle,\qquad \forall H\in E.\end{align*}Hence we obtain\begin{align*}\nabla f_A(X)=A,\qquad \forall X\in E.\end{align*}
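The identity $\nabla f_A(X)=A$ can be checked numerically with central finite differences on each entry. This is a NumPy sketch; the helper `num_grad` is hypothetical (not a library function), and the sizes, seed and step are arbitrary choices.

```python
import numpy as np

def num_grad(f, X, step=1e-6):
    """Hypothetical helper: central-difference gradient of a scalar function f
    at the matrix X, entry by entry (i.e. w.r.t. <A, B> = Tr(A^T B))."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        Delta = np.zeros_like(X)
        Delta[idx] = step
        G[idx] = (f(X + Delta) - f(X - Delta)) / (2 * step)
    return G

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 3))
X = rng.standard_normal((2, 3))

f_A = lambda Y: np.trace(A.T @ Y)  # f_A(Y) = Tr(A^T Y)
G = num_grad(f_A, X)
print(np.allclose(G, A, atol=1e-6))  # True: grad f_A(X) = A
```

Since $f_A$ is linear, the central difference is exact up to rounding, and the result does not depend on the point $X$.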

### A function defined on the set of invertible matrices

Assume now that $m=n$, so that we work with square matrices of order $n$, and let $A\in E:=\mathcal{M}_n(\mathbb{R})$. Denote by $U$ the set of invertible matrices of $E$; it is an open subset of $E$, since $U=\{X\in E:\ \det X\neq 0\}$ and $\det$ is continuous. Let $g_A:U\to \mathbb{R}$ be defined by\begin{align*}g_A(X)={\rm Tr}(X^{-1} A).\end{align*}We now prove that $g_A$ is differentiable at every $X\in U$ and compute $\nabla g_A(X)$. We write\begin{align*}g_A=\psi\circ \varphi,\end{align*}where $\varphi: U\to E$ and $\psi:E\to \mathbb{R}$ are given by $\varphi(X)=X^{-1}$ and $\psi(Y)={\rm Tr}(YA)$. The map $\psi$ is linear and continuous, hence differentiable at every $Y\in E$ with $D\psi(Y)K={\rm Tr}(KA)$ for $K\in E$. On the other hand, we recall that the map $\varphi$ is differentiable at every $X\in U$ and\begin{align*}D\varphi(X)H=-X^{-1}HX^{-1},\qquad \forall H\in E.\end{align*}By the chain rule, $g_A=\psi\circ\varphi$ is differentiable at every $X\in U$ and, using the invariance of the trace under cyclic permutations,\begin{align*}Dg_A(X)H&=D\psi(\varphi(X))(D\varphi(X)H)\cr &= {\rm Tr}(-X^{-1}HX^{-1} A)\cr &= -{\rm Tr}(X^{-1}AX^{-1} H).\end{align*}Since $-{\rm Tr}(X^{-1}AX^{-1}H)=\langle -(X^{-1}AX^{-1})^T,H\rangle$, this implies that\begin{align*}\nabla g_A(X)=-(X^{-1}AX^{-1})^T.\end{align*}
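A finite-difference sanity check of both $D\varphi(X)H=-X^{-1}HX^{-1}$ and $\nabla g_A(X)=-(X^{-1}AX^{-1})^T$. This is a sketch assuming NumPy, a hypothetical helper `num_grad`, a random $A$ and a fixed invertible $3\times 3$ matrix $X$ (determinant $13$), all chosen only for illustration.

```python
import numpy as np

def num_grad(f, X, step=1e-6):
    """Hypothetical helper: central-difference gradient of a scalar function f
    at the matrix X, entry by entry (i.e. w.r.t. <A, B> = Tr(A^T B))."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        Delta = np.zeros_like(X)
        Delta[idx] = step
        G[idx] = (f(X + Delta) - f(X - Delta)) / (2 * step)
    return G

rng = np.random.default_rng(4)
n = 3
A = rng.standard_normal((n, n))
X = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 2.0]])  # det X = 13, so X is invertible
Xinv = np.linalg.inv(X)

# 1) Differential of phi(X) = X^{-1}: the remainder should be O(t^2).
H = rng.standard_normal((n, n))
t = 1e-6
remainder = np.linalg.inv(X + t * H) - Xinv + t * (Xinv @ H @ Xinv)
rel_remainder = np.linalg.norm(remainder) / np.linalg.norm(t * H)
print(rel_remainder)  # small, of order t

# 2) Gradient of g_A(X) = Tr(X^{-1} A): numerical vs closed form.
g_A = lambda Y: np.trace(np.linalg.inv(Y) @ A)
G_formula = -(Xinv @ A @ Xinv).T
G_numeric = num_grad(g_A, X)
print(np.allclose(G_numeric, G_formula, atol=1e-6))  # True
```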

### Using the canonical inner product of $\mathbb{R}^n$

Let $a\in\mathbb{R}^n$, and let $U$ and $E$ be as in the previous example. We define the map $h_a:U\to \mathbb{R}$ by\begin{align*}h_a(X):=\langle X^{-1}a,a\rangle,\end{align*}where $\langle\cdot,\cdot\rangle$ is the canonical inner product of $\mathbb{R}^n$. Let us prove that $h_a$ is differentiable at every $X\in U$ and determine $\nabla h_a(X)$. We write $h_a=\xi\circ \varphi$, where $\varphi(X)=X^{-1}$ as before and\begin{align*}\xi: E\to \mathbb{R},\quad \xi(Y)=\langle Ya,a\rangle.\end{align*}Note that $\xi$ is linear and continuous, so it is differentiable on $E$ with $D\xi(Y)=\xi$ for every $Y\in E$. By the chain rule, $h_a$ is differentiable at every $X\in U$ and\begin{align*}Dh_a(X)H&=\xi\left(D\varphi(X)H\right)=\langle -X^{-1}HX^{-1}a,a\rangle\cr &= -a^T X^{-1}HX^{-1}a\cr &= -{\rm Tr}(X^{-1}aa^T X^{-1}H).\end{align*}Hence\begin{align*}\nabla h_a(X)=-(X^{-1}aa^T X^{-1})^T.\end{align*}
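The closed form $\nabla h_a(X)=-(X^{-1}aa^TX^{-1})^T$ can be compared with a finite-difference gradient in the same way. This NumPy sketch assumes a hypothetical helper `num_grad`, a random vector $a$ and the same kind of fixed invertible $3\times 3$ matrix $X$, chosen only for illustration.

```python
import numpy as np

def num_grad(f, X, step=1e-6):
    """Hypothetical helper: central-difference gradient of a scalar function f
    at the matrix X, entry by entry (i.e. w.r.t. <A, B> = Tr(A^T B))."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        Delta = np.zeros_like(X)
        Delta[idx] = step
        G[idx] = (f(X + Delta) - f(X - Delta)) / (2 * step)
    return G

rng = np.random.default_rng(5)
a = rng.standard_normal(3)
X = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 2.0]])  # invertible (det X = 13)
Xinv = np.linalg.inv(X)

# h_a(Y) = <Y^{-1} a, a>; solve(Y, a) computes Y^{-1} a without forming the inverse.
h_a = lambda Y: np.dot(np.linalg.solve(Y, a), a)
G_formula = -(Xinv @ np.outer(a, a) @ Xinv).T  # -(X^{-1} a a^T X^{-1})^T
G_numeric = num_grad(h_a, X)
print(np.allclose(G_numeric, G_formula, atol=1e-6))  # True
```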