Overview (Tutorial): Vectorized Probabilistic Programming #

To introduce our language, consider the task of polynomial regression: given a dataset of pairs $(x_i, y_i) \in \mathbb{R}^2$ for $1 \leq i \leq n$, we wish to infer a polynomial relating $x$ and $y$. In the following sections, we illustrate how to solve this problem using generative functions and programmable inference in GenJAX.

Vectorizing Generative Functions with vmap

The first figure depicts a generative model for quadratic regression. The goal is, given a noisy dataset $(x_i, y_i)_{1 \leq i \leq n}$, to infer a quadratic function that plausibly governs the relationship between $x$ and $y$. Our model for this task is built by composing generative functions, each written as a @gen-decorated Python function. A key feature of GenJAX is that each random choice is assigned a string-valued name using the syntax dist @ "name". The polynomial generative function describes a prior distribution on the coefficients (a, b, c) of the underlying quadratic function. The point generative function models how an individual datapoint is generated based on those coefficients. Finally, npoint_curve calls polynomial to generate coefficients and maps the point generative function over a vector of inputs.

This is our first use of vmap: we use it to generate multiple y values in parallel, exploiting the fact that datapoints are generated conditionally independently of one another, given the coefficients. This is an instance of a general pattern that appears in many probabilistic programs, and it is one key place where vectorization yields significant speed-ups: parts of the generative model that are conditionally independent can be executed in parallel.

Generative functions
# Basic polynomial model
@gen
def polynomial():
  # @ denotes introduction of
  # random choices
  a = normal(0, 1) @ "a"
  b = normal(0, 1) @ "b"
  c = normal(0, 1) @ "c"
  return (a, b, c)

# Point model with noise
@gen
def point(x, a, b, c):
  y_mean = a + b * x + c * x ** 2
  y = normal(y_mean, 0.2) @ "obs"
  return y
Vectorization with vmap
@gen
def npoint_curve(xs):
  (a, b, c) = polynomial() @ "curve"
  # Vectorization for modeling: here, over data points
  ys = point.vmap(args_mapped=0)(xs, a, b, c) @ "ys"
  return (a, b, c), ys

# Vectorized sampling from the generative function
# using the simulate interface.
xs = array([0.1, 0.3, 0.4, 0.6])
traces = vmap(simulate(npoint_curve), repeat=4)(xs)

# Vectorized evaluation of the pointwise density
# using the assess interface.
xs = traces.get_args()
chms = get_choices(traces)
densities, retvals = (
    vmap(assess(npoint_curve), args_mapped=0)(
        chms, xs
    )
)
Figure. Vectorization of generative functions. Left: Probabilistic programs encoding a prior over quadratic functions and a single-datapoint likelihood. Right: vmap can be used to parallelize the likelihood; the same program that works for a single point works for many points. Inference operations are also compatible with vmap.

When vmap transforms a generative function, it induces a corresponding transformation on the values in the trace: the structure of the trace is preserved, scalars become arrays, and the result is a single trace in struct-of-arrays representation.
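For example, the following sketch (using the get_choices accessor shown in the interface figure later in this section; expected shapes are given as comments) illustrates the batched layout produced by the vmapped point model inside npoint_curve:

# A minimal sketch: the "curve" choices remain scalars, while the
# vmapped "ys" choices carry one entry per input x.
xs = array([0.1, 0.3, 0.4, 0.6])
tr = simulate(npoint_curve)(xs)
chm = get_choices(tr)
chm["curve"]["a"].shape    # ()    scalar coefficient
chm["ys"]["obs"].shape     # (4,)  one observation per input x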

Vectorized Programmable Inference

Generative functions are compiled to implementations of the generative function interface, which includes methods like simulate and assess. The simulate method runs a generative function and yields an execution trace, while assess computes the joint probability density of a generative function at a given assignment of its random choices (a choice map). These methods can be composed to implement inference algorithms. For example, likelihood weighting simulates many candidate traces from the prior and assesses them under the likelihood.
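Concretely, for latent choices $z$ (here, the curve coefficients) proposed from a distribution $q$ and observations $y$, the single-particle importance weight is

$$\log w \;=\; \log p(z, y) \;-\; \log q(z),$$

the difference between the model's joint log density (computed with assess) and the proposal's log density (the score recorded by simulate); this is the quantity computed in the figure below.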

By vectorizing the compiled simulate and assess methods, we can generate or assess many traces at once. We can use vmap to scale the number of particles, automatically transforming single-particle code into a many-particle vectorized version. When the algorithm executes in parallel on a GPU, the number of particles can be increased freely as long as the GPU has free memory: the runtime remains nearly constant as the particle count grows, while accuracy improves toward convergence.
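The average of the resulting weights, $\frac{1}{N} \sum_{i=1}^{N} w_i$, is an unbiased estimate of the marginal likelihood $p(y)$; in log space,

$$\log \hat{p}(y) \;=\; \operatorname{logsumexp}(\log w_1, \ldots, \log w_N) \;-\; \log N,$$

which is what the lmle helper in the figure computes.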

Single particle importance sampling
# Single particle importance sampling.
def importance_sampling(ys, xs):
  # Propose latents (curve coefficients) from the default proposal.
  trace = simulate(default_proposal)(xs)
  # Assess the model's joint density at the proposed latents
  # merged with the observed data.
  chm = {**get_choices(trace), "ys": {"obs": ys}}
  logp, _ = assess(npoint_curve)(chm, xs)
  # Importance weight: model joint density over proposal density.
  w = logp - trace.get_score()
  return (trace, w)
Vectorized over N particles
# Vectorized over N particles.
def vectorized_importance_sampling(ys, xs, N):
  # vmap automatically batches over N independent copies
  return vmap(
      importance_sampling,
      repeat=N
  )(ys, xs)

# Compute log marginal likelihood estimate.
def lmle(ws, N):
  return logsumexp(ws) - log(N)
Scaling behavior of vectorized importance sampling
Posterior approximations at different particle counts
Figure. Vectorized programmable inference. Top left: Single-particle importance sampling with a proposal (the default proposal is the prior in the npoint_curve model, excluding the "obs" random variable) implemented using generative function interface methods (simulate and assess). Top right: Using vmap, we can automatically transform the single-particle version into a many-particle vectorized version. Middle: The vectorized version runs in parallel on GPUs; the runtime is nearly constant as long as the GPU has memory to spare. Increasing the number of particles increases accuracy. Bottom: Posterior approximations for different numbers of particles N.
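Putting the pieces together, a typical invocation looks like the following sketch (the name ys_obs and the particle count are illustrative placeholders for the observed responses at the inputs xs):

# Run N particles in parallel and estimate the log marginal likelihood.
N = 1000
traces, ws = vectorized_importance_sampling(ys_obs, xs, N)
log_ml = lmle(ws, N)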

Improving Robustness Using Stochastic Branching

In real-world data, the assumptions of simple polynomial regression are often violated. Our polynomial model assumes every data point follows the same noise model — but what if 10% of our measurements follow a different distribution? We can improve robustness by using stochastic branching, which allows us to account for outlier observations through heterogeneous mixture modeling. Each data point gets a latent outlier flag. If the flag is true, the observation comes from a uniform distribution; if false, it follows the noisy polynomial curve.
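Marginalizing out the flag, each observation follows a two-component mixture: with probability $0.1$ it is drawn uniformly on $[-2, 2]$, and with probability $0.9$ it follows a (truncated) normal centered on the curve,

$$p(y \mid x, a, b, c) \;=\; 0.1 \cdot \mathrm{Uniform}(y; -2, 2) \;+\; 0.9 \cdot \mathcal{N}_{\mathrm{trunc}}\!\left(y;\; a + b x + c x^{2},\; \sigma^{2}\right),$$

where $\sigma$ denotes the inlier noise scale used in the code below.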

Outlier-robust observation model
# Outlier-robust observation model
@gen
def point_with_outliers(x, a, b, c):
  outlier_flag = bernoulli(0.1) @ "outlier"
  y_mean = a + b * x + c * x ** 2
  # Branch on the latent flag: outliers are uniform on [-2, 2],
  # inliers follow a truncated normal centered on the curve.
  return cond(outlier_flag,
    lambda mu: uniform(-2.0, 2.0),
    lambda mu: trunc_norm(mu, 0.05, 2.0),
    y_mean,
  ) @ "obs"
Vectorized curve model with outliers
# Vectorized curve model with outliers
@gen
def npoint_curve_with_outliers(xs):
  (a, b, c) = polynomial() @ "curve"
  ys = point_with_outliers.vmap(
      args_mapped=0,
  )(xs, a, b, c) @ "ys"
  return ys
Outlier detection comparison
Figure. Robust modeling with stochastic branching. Stochastic branching allows us to extend our models to explain more complex data, including data with outliers. Circle markers depict observed data points; the shading of the marker denotes the estimated posterior probability that the point is an outlier. Bottom, left: Using importance sampling to construct a posterior in our original model results in a poor explanation of the data. Bottom, middle: Extending the model to explicitly represent outliers as random variables should allow us to produce better explanations, but results in a harder inference problem which importance sampling cannot effectively solve. Bottom, right: Changing inference to vectorized MCMC using Gibbs sampling (to infer outliers) and Hamiltonian Monte Carlo (to infer continuous parameters) finds better explanations of the data, i.e., more accurate posterior approximations.

Improving Inference Accuracy Using Programmable Inference

Even when a model's assumptions are sensible, inference can fail to find good explanations of a given dataset. Importance sampling applied to the outlier model identifies likely outliers, but has wide uncertainty over the possible curves, and several curves do not seem to explain the data well. This is a kind of underfitting: by adding new latent variables to our model, we have made inference more challenging, and the "guess and check" approach of importance sampling runs into limitations, even with $N = 10^5$ particles — the limit where our GPU memory begins to saturate.

The right panel of the outlier figure illustrates the results of a custom hybrid algorithm, which combines Gibbs sampling and Hamiltonian Monte Carlo (HMC). The algorithm uses Gibbs sampling to identify which points are outliers, and HMC to sample from the posterior distribution over curves, given the inliers. This algorithm generates much more accurate posterior samples that explain the data well.

Vectorized Gibbs Sampling and the Generative Function Interface

We present the GenJAX implementation of the Gibbs sampling step of our hybrid algorithm. Our implementation highlights generative function interface methods, including trace manipulation and getter methods. In our outlier model, we apply Gibbs sampling to update the vector of outlier choices, keeping all other random choices fixed. Because each outlier choice is conditionally independent of the others (given all the non-outlier choices), the outlier updates can be vectorized. For each element, we enumerate the unnormalized posterior density of each possible value of the outlier indicator, and then sample a new value from the resulting categorical distribution.
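Because the joint density factorizes over data points given the curve, the conditional distribution of a single indicator $o_i$ is proportional to the per-point joint evaluated at each setting,

$$p(o_i = v \mid y_i, x_i, a, b, c) \;\propto\; p(o_i = v,\, y_i \mid x_i, a, b, c), \qquad v \in \{\mathrm{false}, \mathrm{true}\},$$

which is what the per-point assess calls below compute; the categorical sample then normalizes over the two values.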

Enumerative Gibbs update for single point
def gibbs_outlier(subtrace):
  def _assess(v):
    (x, a, b, c) = subtrace.get_args()
    chm = {"outlier": v,
           "obs": subtrace["obs"]}
    log_prob, _ = assess(point_with_outliers)(
      chm, x, a, b, c
    )
    return log_prob

  # Enumerate both settings of the outlier indicator.
  log_probs = vmap(_assess)(
    array([False, True])
  )
  # Sample an index from the normalized categorical;
  # index 1 corresponds to "outlier" (True).
  return categorical(log_probs) == 1
Vectorized enumerative Gibbs
# `trace` is a single trace object
# whose fields store batched values.
def enumerative_gibbs(trace):
  xs = trace.get_args()
  # `subtrace` refers to the struct-of-arrays
  # view for the "ys" addresses.
  subtrace = trace.get_subtrace("ys")
  new_outliers = vmap(gibbs_outlier)(subtrace)
  # `update` is the generative function interface
  # method that edits a trace with new choices.
  new_trace, weight, _ = update(npoint_curve_with_outliers)(
    trace,
    {"ys": {"outlier": new_outliers}},
    xs,
  )
  return new_trace
Figure. Vectorized enumerative Gibbs sampling for outlier detection. Left: Enumerative Gibbs update for a single data point's outlier indicator. For each possible value (inlier/outlier), we compute the log probability under the model (proportional to the unnormalized posterior) and sample a new indicator using categorical sampling. Right: Vectorized Gibbs sampling step that applies the single-point update across all data points using vmap, then updates the trace with the new outlier indicators.
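The Gibbs step above composes with an HMC move on the continuous choices to form the full hybrid kernel. The following sketch assumes a hypothetical hmc_update helper (not shown in the figures) that applies Hamiltonian Monte Carlo to the choices under the "curve" address:

# Sketch of the hybrid kernel: alternate enumerative Gibbs on the
# discrete outlier flags with an HMC move on the curve coefficients.
# `hmc_update` is a hypothetical helper, not part of the figures.
def hybrid_kernel(trace):
  trace = enumerative_gibbs(trace)
  trace = hmc_update(trace, address="curve")
  return trace

def run_chain(trace, n_steps):
  for _ in range(n_steps):
    trace = hybrid_kernel(trace)
  return trace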
simulate: sampling
# Unconstrained sampling of a trace
tr = simulate(npoint_curve)(xs)
assess: density evaluation
# Evaluate log density at traced sample
chm = get_choices(tr)
logp, retval = assess(npoint_curve)(chm, xs)
generate: importance sampling
# Constrained sampling of a trace
partial_chm = {"ys": {"obs": data}}
tr_, weight = generate(npoint_curve)(
    partial_chm, xs
)
update: trace modification
# Modify a trace given constraints
new_chm = {"curve": {"a": 1.0}}
tr_, w, discard = update(npoint_curve)(
    tr, new_chm, xs
)
Figure. Generative function interface methods. GenJAX's generative functions provide several methods for programmable inference, a way to extend the system with new variants of inference using high-level interfaces. For authoring programmable algorithms which use proposal distributions (like sequential Monte Carlo), the simulate method performs unconstrained sampling and reciprocal density evaluation. The assess method evaluates the log joint density of a generative function at a complete choice map of traced samples. The generate interface performs constrained sampling (using importance weighting), allowing construction of a trace from observation constraints. The update method modifies a trace with provided choices, returning an updated trace and an incremental importance weight; it is used by algorithms like Gibbs sampling or Hamiltonian Monte Carlo to modify traces.
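Under this reading of the interface, the simulate-and-assess pattern from the importance sampling figure can also be written as a single generate call; the sketch below assumes that generate's returned weight plays the role of the importance weight computed there:

# Sketch: constrained generation bundles proposal and weighting.
def importance_sampling_via_generate(ys, xs):
  tr, w = generate(npoint_curve)({"ys": {"obs": ys}}, xs)
  return (tr, w)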

Formal Model (Theory): $\gen$ — A Core Calculus #

In this section, we give the syntax and semantics of a core calculus for traced probabilistic programming with vectors, and formalize a program transformation that vectorizes probabilistic programs. The formal model distills key ideas from our actual implementation in JAX, described in Section 6.

Syntax of $\gen$

$\gen$ is a simply-typed lambda calculus which extends a standard array programming calculus in two main ways: (1) a probability monad for stochastic computations; and (2) a graded monad of generative functions, or traced probabilistic programs. Generative functions can be automatically compiled to the density functions and stochastic traced simulation procedures necessary for inference.

$$ \begin{aligned} B &::= \mathbb{B} \mid \mathbb{R} \mid \mathbb{R}_{>0} \\ T &::= B \mid T[n] \\ \eta &::= 1 \mid T \mid \eta_1 \times \eta_2 \mid \{k_1 : \eta_1, \ldots, k_n : \eta_n\} \\ \tau &::= \eta \mid \tau_1 \rightarrow \tau_2 \mid \tau_1 \times \tau_2 \mid \Dm{\eta} \mid \Pm{\eta} \mid \Gm{\gamma}{\eta} \end{aligned} $$
Figure. Core type grammar for $\gen$: base types, batched types, ground types, and computation types.

Denotational Semantics

We give a denotational semantics for $\gen$ using quasi-Borel spaces (QBS), a standard mathematical framework for higher-order probabilistic programming. We assign to each type a space and to each term a map from the interpretation of the environment to the interpretation of its return type. A generative function is interpreted as a pair of a measure on traces and a return value function that computes the program's output given values for all random choices.
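Schematically, writing $M$ for the measure monad on quasi-Borel spaces and $\llbracket \gamma \rrbracket$ for the space of traces of type $\gamma$, this interpretation can be summarized as

$$\llbracket \Gm{\gamma}{\eta} \rrbracket \;\cong\; M\,\llbracket \gamma \rrbracket \;\times\; \left( \llbracket \gamma \rrbracket \to \llbracket \eta \rrbracket \right),$$

a measure on traces paired with a deterministic return-value map.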

Programmable Inference Transformations

The formal model characterizes how generative functions support programmable inference by compiling to simulation and density-evaluation procedures. These transformations provide the foundation for implementing inference algorithms via high-level interfaces such as simulate, assess, generate, and update, which we use throughout the system.

Vectorization as Program Transformation

We introduce vmap as a program transform for vectorization and prove its correctness for deterministic, probabilistic, and generative computations. The proofs show how vectorization preserves distributions and trace structure and how it interacts with programmable inference interfaces.
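Informally, the headline property for generative computations can be read as follows: writing $(\mu_{f(x)}, r_{f(x)})$ for the trace measure and return map of a generative computation $f$ at input $x$ (as in the semantics above), mapping $f$ over a batch of inputs $x_{1:n}$ denotes the product of the per-element trace measures together with the elementwise return map,

$$\mu_{\mathrm{vmap}(f)(x_{1:n})} \;=\; \bigotimes_{i=1}^{n} \mu_{f(x_i)}, \qquad r_{\mathrm{vmap}(f)(x_{1:n})}\bigl(t_{1:n}\bigr) \;=\; \bigl( r_{f(x_1)}(t_1), \ldots, r_{f(x_n)}(t_n) \bigr).$$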

Compiler Connection

The formal results justify the implementation strategy in the compiler section: inference on a vectorized model can be implemented by vectorizing inference applied to the model. Section 6 describes the compiler architecture and how these ideas are realized in a JAX-based implementation.
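As a concrete illustration of this claim (a sketch in the notation of the tutorial figures, not a literal API guarantee), the two routes below produce batched traces with the same struct-of-arrays layout:

# Route 1: vectorize the model, then simulate it.
traces_a = simulate(point.vmap(args_mapped=0))(xs, a, b, c)
# Route 2: simulate the model, then vectorize the sampler.
traces_b = vmap(simulate(point), args_mapped=0)(xs, a, b, c)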