Single Level Adaptive Sampling
This tutorial will show you how to extend an initial experimental design for a Gaussian process (GP) emulator, using an adaptive sampling technique. The idea is to take our current GP and sequentially find new design points (or batches of points) that will have 'the greatest impact' on improving the fit of the GP.
This tutorial will show you how to:
- Use the function compute_single_level_loo_samples to extend an initial experimental design with a new design point (or batch of new design points).
- Repeatedly add new design points using this method.
If you are unfamiliar with how to train a GP using the EXAUQ-Toolbox, you may want to first work through the tutorial, Training A Gaussian Process Emulator.
Note
The function compute_single_level_loo_samples implements the cross-validation-based adaptive sampling for GPs described in Mohammadi, H. et al. (2022) "Cross-Validation-Based Adaptive Sampling for Gaussian Process Models". DOI: https://doi.org/10.1137/21M1404260.
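In outline, the method scores each training point of the GP by a normalised leave-one-out (LOO) error. Roughly, and simplifying the paper's notation, for the \(i\)-th training pair \((x_i, y_i)\) this is $$ e_i = \frac{y_i - \mu_{-i}(x_i)}{\sigma_{-i}(x_i)} $$ where \(\mu_{-i}\) and \(\sigma_{-i}\) denote the predictive mean and standard deviation of the GP refitted without the \(i\)-th point. Large values of \(|e_i|\) flag regions where the fit leans heavily on a single simulator run, and the sampling criterion targets new design points towards such regions while repelling them from existing design points; see the paper for the precise criterion.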
Setup
We'll work with the same toy simulator function found in the tutorial, Training A Gaussian Process Emulator. This is the function $$ f(x_1, x_2) = x_2 + x_1^2 + x_2^2 - \sqrt{2} + \sin(2\pi x_1) + \sin(4\pi x_1 x_2) $$ with simulator domain defined as the rectangular input space \(\mathcal{D}\) consisting of points \((x_1, x_2)\) where \(0 \leq x_1 \leq 1\) and \(-0.5 \leq x_2 \leq 1.5\). We can express this in code as follows:
from exauq.core.modelling import SimulatorDomain, Input
import numpy as np
# The bounds define the lower and upper bounds on each coordinate
domain = SimulatorDomain(bounds=[(0, 1), (-0.5, 1.5)])
def sim_func(x: Input) -> float:
    return (
        x[1] + x[0]**2 + x[1]**2 - np.sqrt(2)
        + np.sin(2 * np.pi * x[0]) + np.sin(4 * np.pi * x[0] * x[1])
    )
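As a quick check, we can evaluate the simulator at a point in the domain. At \((0.5, 0.5)\) both sine terms vanish, so the value reduces to \(0.5 + 0.25 + 0.25 - \sqrt{2} \approx -0.414\):
# Evaluate the toy simulator at a sample point of the domain
sim_func(Input(0.5, 0.5))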
Initial design
To perform adaptive sampling, we need to begin with a Gaussian process (GP) emulator trained with an initial design. We'll do this by using the Latin hypercube designer oneshot_lhs (with the aid of scipy) and a GP with a Matern 5/2 kernel. The approach below is a condensed version of that found in the tutorial Training A Gaussian Process Emulator.
from exauq.core.designers import oneshot_lhs
from exauq.core.modelling import TrainingDatum
from exauq.core.emulators import MogpEmulator
# Create Latin hypercube sample, setting a seed to make the sampling repeatable.
lhs_inputs = oneshot_lhs(domain, 8, seed=1)
# Calculate simulator outputs, using our toy simulator function.
outputs = [sim_func(x) for x in lhs_inputs]
# Create the training data of input/output pairs.
data = [TrainingDatum(x, y) for x, y in zip(lhs_inputs, outputs)]
# Define a GP with a Matern 5/2 kernel and fit to the data.
gp = MogpEmulator(kernel="Matern52")
gp.fit(data)
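Before sampling adaptively, it's worth sense-checking that the GP now holds the eight input/output pairs from the initial design:
# The GP should hold the 8 training records from the Latin hypercube design
print("Number of training data:", len(gp.training_data))
Number of training data: 8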
Extend the design using leave-one-out adaptive sampling
Let's now find a new design point using the leave-one-out adaptive design methodology. The idea is to take our current GP and find a new design point (or batch of points) that will have 'the greatest impact' on improving the fit of the GP, when combined with the corresponding simulator output (or outputs in the batch case). We use the function compute_single_level_loo_samples to do this. This function requires two arguments:
- The GP to find the new design point for.
- The SimulatorDomain describing the domain on which the simulator is defined.
By default, a batch consisting of a single new design point will be calculated:
from exauq.core.designers import compute_single_level_loo_samples
# Suppress warnings that arise from mogp_emulator
import warnings
warnings.filterwarnings("ignore")
new_design_pts = compute_single_level_loo_samples(gp, domain)
new_design_pts[0]
Input(np.float64(0.30008247810129557), np.float64(1.4999999763299239))
If instead we want to compute multiple new design points in one go, we can do this by specifying a different batch size:
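For example, the following computes a batch of five new design points (assuming here that the batch size is set via the batch_size keyword argument of compute_single_level_loo_samples):
new_design_pts = compute_single_level_loo_samples(gp, domain, batch_size=5)
new_design_pts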
(Input(np.float64(0.30009243552526466), np.float64(1.4999999915116702)),
Input(np.float64(0.9999999952136182), np.float64(0.6827122696205354)),
Input(np.float64(0.9999999937092748), np.float64(-0.175846942366223)),
Input(np.float64(5.0418493324766445e-08), np.float64(-0.0924045134669913)),
Input(np.float64(0.5336808912907997), np.float64(0.6070657987941341)))
Note how the new design points all lie within the simulator domain we defined earlier, i.e. they all lie in the rectangular input space \(\mathcal{D}\).
It's worth pointing out that these design points are not equal to any of the training inputs for the GP:
training_inputs = [datum.input for datum in gp.training_data]
for x in new_design_pts:
    assert not any(x == x_train for x_train in training_inputs)
Update the GP
The final step is to update the fit of the GP using the newly-calculated design points. This first requires us to compute the simulator values at the design points (in our case, using the toy function defined earlier) in order to create new training data:
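# Compute the simulator outputs at the new design points
new_outputs = [sim_func(x) for x in new_design_pts]
new_outputs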
[np.float64(2.790343974686608),
np.float64(1.4829391189039718),
np.float64(-1.361853917365586),
np.float64(-1.498079223487694),
np.float64(-1.1652631341361346)]
Then, to update the GP, we create a list of TrainingDatum objects to pass to the update method:
training_data = [TrainingDatum(x, y) for x, y in zip(new_design_pts, new_outputs)]
gp.update(training_data)
# Sense-check that the updated GP now has the combined data
print("Number of training data:", len(gp.training_data))
Number of training data: 13
This completes one adaptive sampling 'iteration'. It's important to note that, when creating multiple new design points in a batch, the fit of the GP is not updated between the creation of each new point in the batch.
Repeated application
In general, we can perform multiple adaptive sampling iterations to further improve the fit of the GP with newly sampled design point(s). The following code goes through five more sampling iterations, producing a single new design point at each iteration.
for i in range(5):
    # Find a new design point adaptively (via a batch of length 1)
    x = compute_single_level_loo_samples(gp, domain)[0]
    # Compute the simulator output at the new design point
    y = sim_func(x)
    # Package the new data as a list containing a single TrainingDatum
    training_data = [TrainingDatum(x, y)]
    # Update the fit of the GP
    gp.update(training_data)
    # Print the design point found
    print(f"==> Updated with new design point {x}")
==> Updated with new design point (np.float64(0.8212602129815404), np.float64(-0.4998740751551899))
==> Updated with new design point (np.float64(0.47680520971659945), np.float64(1.499992711020324))
==> Updated with new design point (np.float64(0.17661845387506958), np.float64(-0.4999871781310343))
==> Updated with new design point (np.float64(0.05372041955223805), np.float64(1.4668686638487973))
==> Updated with new design point (np.float64(0.8184861882847371), np.float64(1.3799447992037384))
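As a final check that these iterations are worthwhile, we can compare the GP's predictions with the simulator at a handful of fresh points. The following is a minimal sketch, assuming the emulator exposes a predict method whose return value has an estimate attribute (as in the toolbox's other tutorials), rather than a formal validation procedure:
# Compare emulator predictions with simulator outputs at held-out points
test_points = oneshot_lhs(domain, 5, seed=99)
for x in test_points:
    prediction = gp.predict(x)
    print(f"simulator: {sim_func(x):.4f}, emulator: {prediction.estimate:.4f}")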