There is a critical (and still highly relevant) paper on sequential Monte Carlo (SMC) samplers which deserves a read from all those interested in probabilistic modeling and approximate inference.
The paper is Sequential Monte Carlo samplers by Pierre Del Moral, Arnaud Doucet, and Ajay Jasra (DMDJ) - whose contributions include laying the conceptual foundations for modern variants of SMC by carefully developing a methodology for sampling sequentially from a sequence of probability distributions defined on the same space.
In this note, I’ll be working through the paper - elaborating where I may, and deferring to other material as required.
Sampling from probability distributions is a commonly desired operation in probabilistic modeling and inference. Especially in the latter case, direct sampling can be intractable - either because the distribution in question is defined indirectly through marginalization of an existing density (a computation which is often intractable, although sometimes not), or because standard methods like inverse-transform sampling do not apply to densities which lack an analytic CDF.
Practitioners have constructed clever approximation techniques to get around these limitations, including Monte Carlo methods - algorithms which internally use randomness to approximately perform optimization, numerical integration, and sampling from target probability distributions.
Sequential Monte Carlo (SMC) is one such method, with a unique set of computational trade-offs compared to other approximate inference techniques. One way of understanding sequential Monte Carlo steps is as incremental importance sampling, where particle populations are evolved and re-weighted at each step - including the effects of conditioning on new observations. Formally, SMC allows us to think about the transition:

$$p(x_{1:t} \mid y_{1:t}) \longrightarrow p(x_{1:t+1} \mid y_{1:t+1})$$

if I have a particle representation of the LHS, I can perform an SMC update to target the RHS.
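To make the update concrete, here is a minimal sketch of one generic SMC step - propagate, reweight, resample-if-degenerate. This is my own illustrative skeleton, not the paper's algorithm; `propagate` and `log_incremental_weight` are hypothetical model-specific callables standing in for the proposal kernel and the incremental importance weight.

```python
import numpy as np

def smc_update(particles, log_weights, propagate, log_incremental_weight,
               ess_threshold=0.5):
    """One generic SMC step: propagate, reweight, resample if degenerate.

    `propagate` and `log_incremental_weight` are placeholders for the
    model-specific proposal kernel and incremental importance weight.
    """
    n = len(particles)
    # Propagate each particle under the proposal kernel.
    new_particles = np.array([propagate(p) for p in particles])
    # Reweight: fold in the incremental importance weight (in log space).
    log_weights = log_weights + np.array(
        [log_incremental_weight(p, q) for p, q in zip(particles, new_particles)]
    )
    # Normalize and compute the effective sample size (ESS).
    w = np.exp(log_weights - log_weights.max())
    w /= w.sum()
    ess = 1.0 / np.sum(w ** 2)
    if ess < ess_threshold * n:
        # Multinomial resampling resets the weights to uniform.
        idx = np.random.choice(n, size=n, p=w)
        new_particles = new_particles[idx]
        log_weights = np.zeros(n)
    return new_particles, log_weights
```

The ESS-triggered resampling is one common design choice; resampling at every step is also valid but discards weight information unnecessarily when the population is healthy.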
As part of the motivation, (DMDJ) discuss a number of computational comparisons between SMC samplers and Markov chain Monte Carlo (MCMC) samplers.
One easy way to understand why this technique is useful is to imagine that you want to update a distribution in light of new evidence in an online fashion (one new observation every so often).
Traditionally, SMC algorithms were developed to solve filtering problems in dynamical systems. Let's look at the decomposition of the joint for such a state-space model:

$$p(x_{1:T}, y_{1:T}) = p(x_1)\, p(y_1 \mid x_1) \prod_{t=2}^{T} p(x_t \mid x_{t-1})\, p(y_t \mid x_t)$$
Now, from this joint, we could use SMC to compute approximations to the sequence of distributions

$$\pi_t(x_{1:t}) = p(x_{1:t} \mid y_{1:t}), \qquad t = 1, \dots, T$$

where the (unnormalized) importance weights are updated recursively as

$$w_t = w_{t-1}\, \frac{p(y_t \mid x_t)\, p(x_t \mid x_{t-1})}{q(x_t \mid x_{1:t-1}, y_{1:t})}$$

From the importance weights, it's easy to spot the "absolute continuity" condition: the proposal must place mass wherever the target does, so that the ratio is well-defined:

$$p(x_{1:t} \mid y_{1:t}) > 0 \implies q(x_t \mid x_{1:t-1}, y_{1:t}) > 0$$
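The weight recursion above simplifies nicely when the transition prior is used as the proposal - the incremental weight reduces to the likelihood. Here is a sketch of that bootstrap particle filter on a toy linear-Gaussian model of my own choosing (not an example from the paper): $x_t = 0.9\, x_{t-1} + v_t$, $y_t = x_t + e_t$, with $v_t, e_t \sim \mathcal{N}(0, 1)$.

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_filter(ys, n_particles=500):
    """Bootstrap particle filter for the toy model
    x_t = 0.9 x_{t-1} + v_t, y_t = x_t + e_t, with v, e ~ N(0, 1).

    Using the transition prior p(x_t | x_{t-1}) as the proposal, the
    incremental importance weight is just the likelihood p(y_t | x_t).
    Returns the filtering means E[x_t | y_{1:t}]."""
    x = rng.normal(size=n_particles)          # particles for x_1 ~ N(0, 1)
    means = []
    for y in ys:
        # Reweight by the likelihood p(y_t | x_t) = N(y_t; x_t, 1).
        logw = -0.5 * (y - x) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(np.sum(w * x))           # weighted filtering mean
        # Resample, then propagate through the transition prior.
        x = x[rng.choice(n_particles, size=n_particles, p=w)]
        x = 0.9 * x + rng.normal(size=n_particles)
    return np.array(means)
```

Note that this proposal choice trivially satisfies the absolute continuity condition, since the transition prior is supported everywhere the posterior is.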
This presentation is originally from Doucet, de Freitas, and Gordon, whose text should also be praised for its treatment of Monte Carlo sampling in general (and its computational comparison with techniques like numerical integration).
In this section, I’d like to discuss a modification to the SMC framework discussed above which interleaves MCMC steps between SMC update steps. The resultant algorithm class is profoundly interesting. I discuss this class in more depth in
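To give a flavor of what such an interleaved step looks like - as a hypothetical illustration, not the construction from the paper - here is a vectorized random-walk Metropolis "rejuvenation" pass. Applied after resampling, a few moves that leave the current target invariant restore diversity to a particle population that has collapsed onto duplicates. `log_target` is a placeholder for the unnormalized log density of whatever target the SMC step is aiming at.

```python
import numpy as np

rng = np.random.default_rng(0)

def mh_rejuvenate(particles, log_target, step=0.5, n_moves=3):
    """MCMC rejuvenation: a few random-walk Metropolis moves per particle.

    Each move leaves the current target invariant, so the particle
    approximation is preserved while duplicate particles (from resampling)
    are spread back out. `log_target` is a vectorized unnormalized
    log density (a placeholder for the model at hand)."""
    x = particles.copy()
    lp = log_target(x)
    for _ in range(n_moves):
        # Propose a Gaussian perturbation for every particle at once.
        prop = x + step * rng.normal(size=x.shape)
        lp_prop = log_target(prop)
        # Accept/reject each proposal independently.
        accept = np.log(rng.uniform(size=x.shape)) < lp_prop - lp
        x = np.where(accept, prop, x)
        lp = np.where(accept, lp_prop, lp)
    return x
```

Because each Metropolis kernel leaves the target invariant, no weight correction is needed for these moves - which is precisely what makes this interleaving so attractive.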
There are quite a few relevant works which have contributed to my own presentation here: