-
-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathgeilo_workshop.tex
157 lines (106 loc) · 4.52 KB
/
geilo_workshop.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
%% LyX 2.3.7 created this file. For more info, see http://www.lyx.org/.
%% Do not edit unless you really know what you are doing.
\documentclass[english]{article}
\usepackage[latin9]{inputenc}
\usepackage{amsmath}
\usepackage{babel}
\begin{document}
\section{Differentiation of Loss Functions Containing ODE Definitions}
Why do we need to caculate the derivative of the simulator?
Let's assume that $S(p)$ is the solution to the simulator given parameters
$p.$ And let's assume that we are solving the inverse problem by
minimizing a cost function $C(p)=\Vert S(p)-d\Vert$. Question: what
is $\frac{dC}{dp}$?
$C(p)=\sum_{i=1}^{n}(S(p)_{i}-d_{i})^{2}$
$\frac{dC}{dp}=\sum_{i=1}^{n}2(S(p)_{i}-d_{i})\frac{dS_{i}}{dp}$
where $\frac{dS}{dp}$ is the change in the simulation output when
changing your parameters.
\section{Forward Sensitivity Analysis}
\subsection{Forward Sensitivity Analysis of an ODE}
$\frac{du(t,p)}{dt}=f(u(t,p),p,t)$
Differentiate both sides by $p$:
$\frac{d}{dt}\frac{du}{dp}=\frac{d}{dp}\frac{du(t,p)}{dt}=\frac{d}{dp}f(u(t,p),p,t)=\frac{\partial f}{\partial u}\frac{du}{dp}+\frac{\partial f}{\partial p}$
Now let's let $s=sensitivty=\frac{du(t,p)}{dp}$
$\frac{ds}{dt}=s'=\frac{\partial f(u(t,p),p,t)}{\partial u}s+\frac{\partial f(u(t,p),p,t)}{\partial p}$
\begin{align*}
u' & =f(u,p,t)\\
s' & =\frac{\partial f}{\partial u}s+\frac{\partial f}{\partial p}
\end{align*}
System of differential equations known as the forward sensitivity
equation.
For this to be an ODE, you need $s(0)$. What is $s(0)$?
$s(0)=\frac{du(0,p)}{dp}=$the derivative of the initial conditions
w.r.t. the parameters.
$\frac{d}{dt}\frac{dx}{d\alpha}=\frac{\partial f_{1}}{\partial x}\frac{dx}{d\alpha}+\frac{\partial f_{1}}{\partial\alpha}$
Take an ODE system of size $n$ with $p$ parameters, and turn it
into a new ODE of size $(n+1)p$, what you get is a time series $\frac{du(t)}{dp}$.
Downsides? Cost $\mathcal{O}(np)$
Question: can you do this in $\mathcal{O}(n+p)$?
\subsection{Forward Sensitivity Analysis of Nonlinear Solving}
So assume you have a nonlinear system:
Solve for $u^{\ast}(p)$ s.t. $f(u^{\ast}(p),p)=0$. So by definition:
\begin{align*}
f(u^{\ast}(p),p) & =0\\
\frac{d}{dp} & f(u^{\ast}(p),p)=0\\
\frac{\partial f}{\partial u} & \frac{du^{\ast}}{dp}+\frac{\partial f}{\partial p}=0\\
Ax & =b
\end{align*}
where $b=-\frac{\partial f}{\partial p}$, $A=\frac{\partial f}{\partial u}$
so then $\frac{du^{\ast}}{dp}=-\frac{\partial f}{\partial u}^{-1}\vert_{u^{\ast}}\frac{\partial f}{\partial p}$
$A(p)x=b$
$\frac{dx}{dp}$
Let $S(p)$ be the solution to solving an optimization problem with
parameters $p$. What is $\frac{dS}{dp}$?
\section{What is Forward-Mode Automatic Differentiation?}
So forward mode automatic differentiation is a scheme for computing
derivatives via dual number arithmatic. (Some formulations of forward
mode AD do not use dual numbers, though this is equivalent to forms
via computational graphs).
Definition: a dual number is a number $d=x+y\epsilon$ (where $\epsilon$
is a dimensional signifier, $(x,y)$, $x$ is known as the primal
and $y$ is the partial) which satisfies the algebra defined by:
\begin{align*}
f(d) & =f(x)+yf'(x)\epsilon\\
\epsilon^{2} & =0
\end{align*}
Think of $x$ as ``carrying forward the original value'', and $y$
as ``the derivative of the program w.r.t. $x$''.
Let's check:
$f(x+1\epsilon)=f(x)+f'(x)\epsilon$
\begin{align*}
(x+y\epsilon)+(v+w\epsilon) & =(x+v)+(y+w)\epsilon\\
(x+y\epsilon)(v+w\epsilon) & =xv+(yv+wx)\epsilon
\end{align*}
$g(f(d))=g(f(x)+yf'(x)\epsilon)=g(f(x))+g'(f(x))f'(x)y\epsilon$
$\Vert d\Vert=\Vert x\Vert$
$\Vert d\Vert=\Vert x\Vert+\Vert y\Vert$
\subsection{What does automatic differentiation do to complex codes, like a solver
for an ordinary differential equation?}
Let's figure out why the freaky magic works by using Euler's method:
$u'=f(u)$
We solve this via the iteration:
\begin{align*}
u_{n+1} & =u_{n}+hf(u_{n})\\
\end{align*}
Now let $u_{n}=x_{n}+y_{n}\epsilon$. Plug it in:
\begin{align*}
x_{n+1}+y_{n+1}\epsilon & =x_{n}+y_{n}\epsilon+hf(x_{n}+y_{n}\epsilon,)\\
x_{n+1}+y_{n+1}\epsilon & =x_{n}+y_{n}\epsilon+h(f(x_{n})+y_{n}f'(x_{n})\epsilon)
\end{align*}
now I split by taking the terms with and without epsilon:
\begin{align*}
x_{n+1} & =x_{n}+hf(x_{n})\\
y_{n+1} & =y_{n}+hy_{n}f'(x_{n})
\end{align*}
now let $h\rightarrow0$, then this is Euler's method for solving
what ODE?
\begin{align*}
x' & =f(x)\\
y' & =f'(x)y
\end{align*}
Let's change notation: $x=u$, $y=s$ then...
\begin{align*}
u' & =f(u)\\
s' & =\frac{\partial f}{\partial u}s
\end{align*}
\end{document}