Simulated Annealing

Links

Quote

7.5 Simulated Annealing | Advanced Statistical Computing (bookdown.org)

Theory

Simulated annealing is a techniaue for minimizing functions using Markov Chain Monte Carlo (MCMC) samplers. It is very usefull for wiggly functions that classic Newton-style optimizers can not handle.

Let $h(\theta)$ be the function for which we want to find the minimum. Let $\theta$ be the vector of parameters in the space $S$.

The idea is to build successive approximations of the target density $\pi(\theta)$ until it is very close to the final one.

Let the uniform distribution over all the global minimizers be such that:

\[ S\star = \{ \theta \in S : h(\theta) = min_{\theta}( h(\theta)) \} \]

and define

\[ \left\{ \begin{array}{11} \pi(\theta) \propto 1 && \forall \theta \in S \star \\ \pi(\theta) = 0 &&\forall \theta \notin S \star \end{array} \right. \]

The uttimate goal is to find a way to sample from ${\pi(\theta)}$.

First begin to build am approximate density called $\pi_{T} (\theta)$ where

\[ \pi_{T}(\theta) \propto exp{\left( \frac{-h(\theta)}{T} \right)} \]

T is the temperature.

As $T \rightarrow \infty$, $\pi(T)$ approaches to unifrom density
As $T \rightarrow 0$, $\pi(T) \rightarrow \pi(\theta)$

The aim is to draw many samples from $\pi_{T}(\theta)$ initially with a large value of $T$, and to lower $T$ towards $0$ slowly. As we lower $T$, the density $\pi_{T}(\theta)$ will become more and more concentrated around the minima of $h(\theta)$.

The general stategy is to start with a large temperature to explore the whole sample space and then decrease slowly $T$ until the solution converges to a minima.

The sample preocedure is the following: Choose a symmetric proposal density $q(. \mid \theta)$, then for the iteration $n$ with state $\theta _{n}$:

Sample $\quad \theta \star \sim q(\theta \mid \theta_{n})$
Sample $\quad U \sim \mathcal{U}(0, 1)$
Compute $$ \alpha(\theta \star \mid \theta_{n}) = min \left( \frac{exp(-h(\theta \star)/T)}{exp(-h(\theta_{n})/T)}, 1 \right) = min \left( exp \left( \frac{-(h(\theta \star - h(\theta_{n})}{T} \right), 1 \right) $$
Accept $\theta \star$ as the next state if $\quad U \leq \alpha (\theta \star \mid \theta_{n})$
Decrease the temperature T

If $h(\theta \star) \leq h(\theta_{n})$ then the proposal is always accepted so that the solution converges to the minimum. If $h(\theta \star) \geq h(\theta_{n})$ then the proposal $\theta \star$ has still a chance to be accepted (even if this solution is greater and we want to find the minima). This allows to avoid getting stuck in a local minima. As T decreses it becomes less likely that an uphill proposal is accepted (in order to converge to a minima).

A cooling schedule for T has to be choosen, and in general:

\[ T_{n} = \frac{a}{log(n+b)} \quad \forall a, b > 0 \]

This make simulated annealing a very slow algorithm to converge.