Inverse Problem Theory
Inverse problems are the counterpart of direct (forward) problems.
Here the theory is denoted by \(g\), the model by \(\vec{m}\) and the data by \(d\).
The direct problem
In a direct problem:
\[
\text{set of parameters} \rightarrow \text{theory} \rightarrow \text{prediction}
\]
i.e. \(g(\vec{m}) = d\)
This means that a direct problem has a single solution.
Example of a direct problem: calculating the travel time \(t\) of a P-wave between a source \((x_{s}, z_{s})\) and a receiver \((x_{r}, z_{r})\) in a medium of velocity \(V\), with origin time \(t_{0}\):
\[
t = t_{0} \ + \ \frac{\sqrt{(x_{s} - x_{r})^{2} + (z_{s} - z_{r})^{2} }}{V}
\]
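For instance, a minimal sketch of this forward problem (the source/receiver geometry, velocity, and origin time below are illustrative values, not taken from a specific exercise):

```python
import math

def p_wave_travel_time(xs, zs, xr, zr, v, t0=0.0):
    """Forward problem g(m): travel time of a direct P-wave from a
    source (xs, zs) to a receiver (xr, zr) in a homogeneous medium
    of velocity v, with origin time t0."""
    distance = math.hypot(xs - xr, zs - zr)
    return t0 + distance / v

# Illustrative values: source at 2 km depth, receiver at the surface
# 5 km away, V = 6 km/s.
t = p_wave_travel_time(xs=0.0, zs=2.0, xr=5.0, zr=0.0, v=6.0)
print(f"predicted travel time: {t:.3f} s")
```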
The inverse problem
In an inverse problem:
\[
\text{observed or synthetic data} \rightarrow \text{model}
\]
i.e.
\[\vec{m}=g^{-1}(d)\]
- This means that in an inverse problem, there can be multiple solutions.
- \(g\) is not bijective.
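A quick illustration of this non-uniqueness, reusing the travel-time example above: two different models (here, arbitrarily chosen depth/velocity pairs) predict exactly the same datum, so the datum alone cannot distinguish between them.

```python
import math

def travel_time(depth, v, offset=5.0, t0=0.0):
    """Same forward problem as above, with the receiver at the surface."""
    return t0 + math.hypot(offset, depth) / v

# Two distinct models m = (depth, V) producing the same predicted datum d:
t1 = travel_time(depth=2.0, v=6.0)
t2 = travel_time(depth=3.0, v=6.0 * math.hypot(5.0, 3.0) / math.hypot(5.0, 2.0))
print(t1, t2)   # identical travel times from two different models
```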
Under the Gaussian hypothesis, a datum \(d\) with expected value \(E(d)\) and standard deviation \(\sigma\) is described by the probability density \(\rho\):
\[
\rho(d) = \frac{1}{\sigma \sqrt{2\pi}} \exp \left( -\frac{(d - E(d))^{2}}{2 \sigma^{2}} \right)
\]
Knowing the model \(\vec{m}\), so that \(E(d) = g(\vec{m})\), this becomes:
\[
\rho(d \mid \vec{m}) = \frac{1}{\sigma \sqrt{2\pi}} \exp \left( -\frac{(d - g(\vec{m}))^{2}}{2 \sigma^{2}} \right)
\]
For \(N\) independent data:
\[
\rho(\vec{d} \mid \vec{m}) = \prod_{i=1}^N \rho(d_i \mid \vec{m}) = \prod_{i=1}^N \frac{1}{\sigma_i \sqrt{2\pi}} \exp \left( -\frac{(d_i - g_i(\vec{m}))^{2}}{2 \sigma_i^{2}} \right)
\]
When \(d = d_{obs}\), this expression is also known as the likelihood:
\[
\rho(\vec{d_{obs}} \mid \vec{m}) = \prod_{i=1}^N \rho(d_{i, obs} \mid \vec{m}) = \prod_{i=1}^N \frac{1}{\sigma_i \sqrt{2\pi}} \exp \left( -\frac{(d_{i, obs} - g_i(\vec{m}))^{2}}{2 \sigma_i^{2}} \right)
\]
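A sketch of evaluating this likelihood numerically for a candidate model; the observed data, the predictions \(g_i(\vec{m})\), and the uncertainties \(\sigma_i\) below are illustrative assumptions:

```python
import numpy as np

def likelihood(d_obs, d_pred, sigma):
    """Gaussian likelihood rho(d_obs | m) for N independent data."""
    d_obs, d_pred, sigma = map(np.asarray, (d_obs, d_pred, sigma))
    norm = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return np.prod(norm * np.exp(-0.5 * ((d_obs - d_pred) / sigma) ** 2))

# Illustrative example: three observed travel times, the predictions
# g_i(m) of some candidate model m, and the uncertainties sigma_i.
d_obs  = [0.90, 1.21, 1.52]
d_pred = [0.92, 1.20, 1.55]
sigma  = [0.05, 0.05, 0.05]
print(likelihood(d_obs, d_pred, sigma))
```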
Bayes' theorem and development
The most general expression of an inverse problem is given by Bayes' theorem:
\[
\rho (\vec{m} \mid \vec{d_{obs}}) = \frac{\rho(\vec{m}) \rho( \vec{d_{obs}} \mid \vec{m})}{\int\ \rho(\vec{m}) \rho( \vec{d_{obs}} \mid \vec{m}) \ d\vec{m}} = \mathcal{K} \ \rho(\vec{m}) \rho( \vec{d_{obs}} \mid \vec{m})
\]
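A sketch of this Bayesian update for a single-parameter model, where the normalising constant \(\mathcal{K}\) (the reciprocal of the denominator) is obtained by numerical integration over the model space; the uniform prior, its bounds, and the datum below are illustrative assumptions:

```python
import numpy as np

# Single-parameter model: a P-wave velocity V, with forward problem
# g(V) = distance / V (illustrative distance, datum, and uncertainty).
dist, d_obs, sigma = 10.0, 1.7, 0.1
v_grid = np.linspace(3.0, 9.0, 601)          # model space to explore
dv = v_grid[1] - v_grid[0]

prior = np.ones_like(v_grid)                 # uniform prior rho(m)
likelihood = np.exp(-0.5 * ((d_obs - dist / v_grid) / sigma) ** 2) \
             / (sigma * np.sqrt(2.0 * np.pi))

unnormalised = prior * likelihood
evidence = np.sum(unnormalised) * dv         # the integral in the denominator
posterior = unnormalised / evidence          # i.e. K = 1 / evidence

print("posterior maximum near V =", v_grid[np.argmax(posterior)], "km/s")
```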
Substituting the Gaussian likelihood:
\[
\rho (\vec{m} \mid \vec{d_{obs}}) = \mathcal{K} \ \rho(\vec{m}) \left( \prod_{i=1}^N \frac{1}{\sigma_i \sqrt{2\pi}} \right) \exp \left( -\frac{1}{2} \sum_{i=1}^N \frac{(d_{i, obs} - g_{i}(\vec{m}))^2}{\sigma_i^2} \right)
\]
This defines the misfit function \(\mathcal{X}(\vec{m})\):
\[
\mathcal{X}(\vec{m}) = \sum_{i=1}^N \frac{(d_{i, obs} - g_{i}(\vec{m}))^2}{\sigma_i^2}
\]
Note that \(\mathcal{X}(\vec{m})\) can also be written using a matrix form:
\[
\mathcal{X}(\vec{m}) =
(\vec{d_{obs}} - \vec{g}(\vec{m}))^T
\begin{pmatrix}
1/\sigma_1^2 & & 0 \\
 & \ddots & \\
0 & & 1/\sigma_N^2
\end{pmatrix}
(\vec{d_{obs}} - \vec{g}(\vec{m}))
\]
\[
\mathcal{X}(\vec{m}) =
(\vec{d_{obs}} - \vec{g}(\vec{m}))^T
\ C_{d}^{-1} \
(\vec{d_{obs}} - \vec{g}(\vec{m}))
\]
\(C_d\) is the covariance matrix of the data (and \(C_d^{-1}\) its inverse). If the data are independent, it only has diagonal coefficients; otherwise, off-diagonal coefficients appear.
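As a quick check, a sketch showing that the sum form and the matrix form of \(\mathcal{X}(\vec{m})\) coincide for independent data (diagonal \(C_d\)); the residuals and uncertainties are illustrative values:

```python
import numpy as np

d_obs  = np.array([0.90, 1.21, 1.52])    # observed data
d_pred = np.array([0.92, 1.20, 1.55])    # g(m) for some candidate model
sigma  = np.array([0.05, 0.05, 0.05])    # data uncertainties

# Sum form: sum_i (d_i,obs - g_i(m))^2 / sigma_i^2
chi_sum = np.sum(((d_obs - d_pred) / sigma) ** 2)

# Matrix form: (d_obs - g(m))^T C_d^-1 (d_obs - g(m)), with diagonal C_d
C_d_inv = np.diag(1.0 / sigma ** 2)
residual = d_obs - d_pred
chi_mat = residual @ C_d_inv @ residual

print(chi_sum, chi_mat)   # identical for independent (uncorrelated) data
```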
Substituting \(\mathcal{X}(\vec{m})\) back into the posterior:
\[
\rho (\vec{m} \mid \vec{d_{obs}}) = \mathcal{K} \ \rho(\vec{m}) \left( \prod_{i=1}^N \frac{1}{\sigma_i \sqrt{2\pi}} \right) \exp \left( - \frac{1}{2} \mathcal{X}(\vec{m}) \right)
\]
Written with the covariance matrix \(C_d\) (diagonal or not), this becomes:
\[
\rho (\vec{m} \mid \vec{d_{obs}}) = \frac{\mathcal{K} \rho(\vec{m})}{(2\pi)^{\frac{N}{2}} \sqrt{\det C_d}}
\exp \left( - \frac{1}{2} (\vec{d_{obs}} - \vec{g}(\vec{m}))^T
\ C_{d}^{-1} \
(\vec{d_{obs}} - \vec{g}(\vec{m})) \right)
\]
There are now several approaches to solving the inverse problem when \(\sigma_i\) is known (the first two are sketched below):
- The deterministic approach, which consists of minimizing \(\mathcal{X}(\vec{m})\). The final result is a single model whose predictions are closest to the observed data \(d_{i, obs}\).
- The probabilistic approach, which consists of exploring the model space and describing the shape of \(\mathcal{X}(\vec{m})\) (and hence of the posterior \(\rho(\vec{m} \mid \vec{d_{obs}})\)).
- The stochastic approach, which tests every set of parameters (inefficient).
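A minimal sketch contrasting the deterministic and probabilistic approaches on a one-parameter travel-time problem (the observed times, uncertainties, and velocity bounds are illustrative assumptions, and SciPy is used for the minimisation):

```python
import numpy as np
from scipy.optimize import minimize_scalar

dist = 10.0                              # illustrative source-receiver distance
d_obs = np.array([1.62, 1.70, 1.66])     # illustrative observed travel times
sigma = np.array([0.05, 0.05, 0.05])     # their uncertainties

def chi(v):
    """Misfit X(m) for a one-parameter model m = (V,)."""
    return np.sum(((d_obs - dist / v) / sigma) ** 2)

# Deterministic approach: minimise X(m) -> a single best-fitting model.
best = minimize_scalar(chi, bounds=(3.0, 9.0), method="bounded")
print("best-fitting velocity:", best.x)

# Probabilistic approach: explore the model space and map exp(-X(m)/2),
# i.e. the shape of the (unnormalised) posterior.
v_grid = np.linspace(3.0, 9.0, 601)
dv = v_grid[1] - v_grid[0]
posterior = np.exp(-0.5 * np.array([chi(v) for v in v_grid]))
posterior /= np.sum(posterior) * dv
print("posterior mean velocity:", np.sum(v_grid * posterior) * dv)
```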