The proposal distribution Q proposes the next point to which the random walk might move.

In mathematics and physics, the Metropolis-Hastings algorithm is a Markov chain Monte Carlo (MCMC) method used to generate a sequence of samples from a probability distribution that is difficult to sample from directly. This sequence can be used to approximate the distribution (i.e., to generate a histogram) or to compute an integral (such as an expected value). The algorithm was named in reference to Nicholas Metropolis, who published it in 1953 for the specific case of the Boltzmann distribution, and W. K. Hastings,[1] who generalized it in 1970. The Gibbs sampling algorithm is a special case of the Metropolis-Hastings algorithm which is usually faster and easier to use but is less generally applicable.

The Metropolis-Hastings algorithm can draw samples from any probability distribution P(x), requiring only that a function proportional to the density can be calculated at x. In Bayesian applications, the normalization factor is often extremely difficult to compute, so the ability to generate a sample without knowing this constant of proportionality is a major virtue of the algorithm. The algorithm generates a Markov chain in which each state $x^{t+1}$ depends only on the previous state $x^t$. It uses a proposal density $Q(x'; x^t)$, which depends on the current state $x^t$, to generate a new proposed sample $x'$. This proposal is "accepted" as the next value ($x^{t+1} = x'$) if α drawn from U(0,1) satisfies

$\alpha < \frac{P(x')Q(x^t|x')}{P(x^t)Q(x'|x^t)} \,\!.$

If the proposal is not accepted, then the current value of x is retained: xt + 1 = xt.
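The accept/reject rule above can be sketched as a single update step. This is an illustrative sketch, not code from the article: the names `log_p`, `propose`, and `log_q` are assumptions, and log densities are used so the ratio stays numerically stable.

```python
import math
import random

def metropolis_hastings_step(x_t, log_p, propose, log_q):
    """One Metropolis-Hastings update (illustrative names).

    log_p(x)    -- log of a function proportional to the target density P
    propose(x)  -- draws a proposal x' from Q(x'; x)
    log_q(a, b) -- log Q(a; b), the proposal density of a given b
    """
    x_new = propose(x_t)
    # log of the acceptance ratio P(x')Q(x^t|x') / (P(x^t)Q(x'|x^t))
    log_ratio = (log_p(x_new) + log_q(x_t, x_new)
                 - log_p(x_t) - log_q(x_new, x_t))
    # alpha ~ U(0,1); capping the ratio at 1 does not change the decision
    if random.random() < math.exp(min(0.0, log_ratio)):
        return x_new      # accept: move to the proposed state
    return x_t            # reject: retain the current value
```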

For example, the proposal density could be a Gaussian function centred on the current state xt:

$Q( x'; x^t ) \sim N( x^t, \sigma^2 I) \,\!$

reading $Q(x'; x^t)$ as the probability density function for $x'$ given the previous value $x^t$. This proposal density would generate samples centred around the current state with variance σ²I. The original Metropolis algorithm requires the proposal density to be symmetric ( Q(x;y) = Q(y;x) ); the generalization by Hastings lifts this restriction. It is also permissible for $Q(x'; x^t)$ not to depend on the current state $x^t$ at all, in which case the algorithm is called "Independence Chain Metropolis-Hastings" (as opposed to "Random Walk Metropolis-Hastings"). The Independence Chain M-H algorithm with a suitable proposal density function can offer higher accuracy than the random walk version, but it requires some a priori knowledge of the distribution.
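A minimal random-walk sampler with this Gaussian proposal might look as follows. Because the Gaussian proposal is symmetric, the Hastings correction Q(x^t; x')/Q(x'; x^t) cancels. This is a sketch under stated assumptions: the function and argument names are illustrative, and the target is a one-dimensional standard normal known only up to a constant.

```python
import math
import random

def random_walk_metropolis(log_p, x0, sigma, n_steps, seed=0):
    """Random-walk Metropolis with proposal x' ~ N(x^t, sigma^2).

    log_p is the log of a function proportional to the target P(x);
    the symmetric Gaussian proposal makes the Hastings ratio equal 1.
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        x_new = rng.gauss(x, sigma)           # propose x' ~ N(x^t, sigma^2)
        log_ratio = log_p(x_new) - log_p(x)   # log of P(x')/P(x^t)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = x_new                         # accept the proposal
        samples.append(x)                     # on rejection, x^{t+1} = x^t
    return samples

# Target: standard normal, known only up to its normalization constant
samples = random_walk_metropolis(lambda x: -0.5 * x * x, 0.0, 1.0, 20000)
```

In practice the early portion of `samples` would be discarded as burn-in before computing histograms or expectations.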

## Step-by-step instructions

Suppose the most recent value sampled is $x^t\,$. To follow the Metropolis-Hastings algorithm, we next draw a new proposal state $x'\,$ with probability $Q(x'; x^t)\,$, and calculate a value

$a = a_1 a_2\,$

where

$a_1 = \frac{P(x')}{P(x^t)} \,\!$

is the likelihood ratio between the proposed sample $x'\,$ and the previous sample $x^t\,$, and

$a_2 = \frac{Q( x^t; x' )}{Q(x';x^t)}$

is the ratio of the proposal density in two directions (from $x^t\,$ to $x'\,$ and vice versa). This is equal to 1 if the proposal density is symmetric. Then the new state $x^{t+1}\,$ is chosen according to the following rules.

$\begin{matrix}\mbox{If } a \geq 1: & \\& x^{t+1} = x',\end{matrix}$
$\begin{matrix}\mbox{and if } a < 1: & \\& x^{t+1} = \left\{ \begin{matrix} x'\mbox{ with probability }a \\ x^t\mbox{ with probability }1-a. \end{matrix} \right.\end{matrix}$
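The two rules above can be combined into one selection function; `p` and `q` are illustrative names for a function proportional to P(x) and for the proposal density Q, not identifiers from the article.

```python
import random

def accept_new_state(x_t, x_new, p, q, rng=random):
    """Choose x^{t+1} given the proposal x_new, following the rules above.

    p(x)    -- function proportional to the target density P(x)
    q(a, b) -- proposal density Q(a; b)
    """
    a1 = p(x_new) / p(x_t)               # likelihood ratio P(x')/P(x^t)
    a2 = q(x_t, x_new) / q(x_new, x_t)   # proposal ratio (1 if Q is symmetric)
    a = a1 * a2
    if a >= 1:
        return x_new                     # accept unconditionally
    return x_new if rng.random() < a else x_t  # accept with probability a
```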

The Markov chain is started from a random initial value $x^0$ and the algorithm is run for many iterations until this initial state is "forgotten". These initial samples are discarded and are known as burn-in. The remaining set of accepted values of x represents a sample from the distribution P(x).

The result of three Markov chains running on the 3D Rosenbrock function using the Metropolis-Hastings algorithm. The figure shows how the algorithm samples from regions where the posterior probability is high and how the chains begin to mix in these regions. The approximate position of the maximum has been illuminated.

The algorithm works best if the proposal density matches the shape of the target distribution P(x), that is $Q(x'; x^t) \approx P(x') \,\!$, but in most cases this is unknown. If a Gaussian proposal density Q is used, the variance parameter σ² has to be tuned during the burn-in period. This is usually done by calculating the acceptance rate, which is the fraction of proposed samples that is accepted in a window of the last N samples. The desired acceptance rate depends on the target distribution; however, it has been shown theoretically that the ideal acceptance rate is approximately 50% for a one-dimensional Gaussian target distribution, decreasing to approximately 23% for an N-dimensional Gaussian target distribution. If σ² is too small, the chain will mix slowly (i.e., the acceptance rate will be high, but successive samples will move around the space slowly, and the chain will converge to P(x) slowly). If σ² is too large, the acceptance rate will be very low because the proposals are likely to land in regions of much lower probability density, so a1 will be very small.
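The effect of the proposal scale on the acceptance rate can be illustrated with a small experiment. This is a sketch, assuming a one-dimensional standard-normal target; the function names and the chosen σ values are illustrative.

```python
import math
import random

def acceptance_rate(log_p, x0, sigma, n_steps, seed=0):
    """Fraction of Gaussian random-walk proposals accepted for a given sigma."""
    rng = random.Random(seed)
    x, accepted = x0, 0
    for _ in range(n_steps):
        x_new = rng.gauss(x, sigma)  # propose x' ~ N(x^t, sigma^2)
        if rng.random() < math.exp(min(0.0, log_p(x_new) - log_p(x))):
            x, accepted = x_new, accepted + 1
    return accepted / n_steps

log_p = lambda x: -0.5 * x * x  # standard normal target, up to a constant
# Too-small sigma: nearly every proposal is accepted, but the chain mixes slowly.
# Too-large sigma: most proposals land in low-density regions and are rejected.
rates = {s: acceptance_rate(log_p, 0.0, s, 20000) for s in (0.1, 2.4, 25.0)}
```

Monitoring a rate table like this over a window of recent samples is one way to tune σ² toward the target acceptance rates quoted above.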