Gaussian Process Models

The models module contains Gaussian Process implementations used for surrogate modeling in Bayesian optimization.

Overview

FCVOpt uses Gaussian Processes to model the relationship between hyperparameters and cross-validation performance. Two main model types are available:

  • GPR: Standard single-task Gaussian Process regression

  • HGP: Hierarchical Gaussian Process for modeling fold-wise correlations

Key Concepts

Hierarchical Gaussian Processes (HGP): The core innovation in FCVOpt. Instead of treating each CV fold independently, HGP models the correlation structure between folds, enabling accurate prediction of performance on unevaluated folds.

Multi-task Learning: HGP treats each CV fold as a separate “task” and learns correlations between tasks, dramatically reducing the number of fold evaluations needed.

Kernel Functions: Both models support various kernel functions for modeling different types of relationships:

  • Matern kernels for smooth functions

  • Constant kernels for bias terms

  • Hamming kernels for categorical variables

  • Multi-task kernels for fold correlations
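To build intuition for the categorical case, a Hamming kernel scores two configurations by how many categorical entries they share, decaying as entries differ. The sketch below is a generic illustration of the idea, not fcvopt's actual kernel implementation:

```python
import numpy as np

def hamming_kernel(x1, x2, lengthscale=1.0):
    """Illustrative Hamming kernel for categorical inputs.

    k(x1, x2) = exp(-d(x1, x2) / lengthscale), where d is the
    fraction of categorical dimensions on which x1 and x2 differ.
    """
    x1, x2 = np.asarray(x1), np.asarray(x2)
    mismatch = np.mean(x1 != x2)
    return float(np.exp(-mismatch / lengthscale))

# Identical configurations have maximal similarity (1.0);
# similarity decays as more entries differ.
print(hamming_kernel(["adam", "relu"], ["adam", "relu"]))  # 1.0
print(hamming_kernel(["adam", "relu"], ["sgd", "relu"]))
```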

Usage in FCVOpt

These models are used internally by the optimizers and typically don’t need to be instantiated directly:

# HGP is used automatically when using FCVOpt
from fcvopt.optimizers import FCVOpt

optimizer = FCVOpt(cv_obj, config_space, acq='kg')
# Internally uses HGP to model fold correlations

# Standard GP is used with BayesOpt for full CV
from fcvopt.optimizers import BayesOpt

optimizer = BayesOpt(cv_obj, config_space, acq='lcb')
# Internally uses GPR for standard modeling

Advanced Usage

For custom implementations or research purposes, the models can be used directly:

from fcvopt.models import HGP
import torch

# Create hierarchical GP model. Per the constructor signature, the fold
# index of each observation is packed into the last column of train_x,
# giving it dimensions (N x (D + 1)).
train_x = torch.cat([train_configs, fold_ids.unsqueeze(-1)], dim=-1)

model = HGP(
    train_x=train_x,       # configurations with fold index as last column
    train_y=train_scores,  # holdout scores for each observation, (N x 1)
)

# Put the model in training mode (hyperparameters are then fit
# with a standard GPyTorch training loop)
model.train()

# Make predictions on new configurations
posterior = model.posterior(test_configs)
mean = posterior.mean
variance = posterior.variance

API Reference

Hierarchical Gaussian Process

class fcvopt.models.HGP(train_x, train_y, warp_input=False)[source]

Bases: GPR

Hierarchical GP model for modeling the CV loss function

This model is the sum of a main GP f, which models the CV loss function, and a delta GP that models the deviation of the individual fold holdout losses from the CV loss.

The model is defined as

\[\begin{equation*} y_j(x) = f(x) + \delta_j(x) + \epsilon_j(x) \end{equation*}\]

where \(\delta_j(x)\) is the deviation of the individual fold holdout losses from the CV loss, and \(\epsilon_j(x)\) is the observation noise.
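A toy numpy simulation makes the decomposition concrete: each fold's holdout loss is the shared CV loss plus a fold-specific deviation plus noise, so averaging over folds recovers the CV loss up to those deviations. All names below (`f`, `y`, `fold_offsets`) are illustrative, not fcvopt code:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # True CV loss surface, shared across folds
    return np.sin(3 * x)

num_folds = 5
# delta_j: fold-specific deviation, here a constant offset per fold
fold_offsets = rng.normal(scale=0.1, size=num_folds)

def y(x, j, noise=0.01):
    # Observed holdout loss of fold j: f(x) + delta_j(x) + eps_j(x)
    return f(x) + fold_offsets[j] + rng.normal(scale=noise)

x = 0.4
fold_losses = [y(x, j) for j in range(num_folds)]
# The fold average estimates f(x) up to the fold deviations and noise
print(np.mean(fold_losses), f(x))
```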

Parameters:
  • train_x (Tensor) – training data with dimensions (N x (D + 1)) where the last column contains the fold indices

  • train_y (Tensor) – training targets with dimensions (N x 1)

  • warp_input (bool) – whether to apply input warping to the inputs. Default: False
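Since `train_x` carries the fold index in its last column, preparing the training data amounts to stacking each configuration next to the fold it was evaluated on. A minimal sketch of that (N x (D + 1)) layout, with illustrative values:

```python
import numpy as np

# Three hyperparameter configurations, D = 2 dimensions each
configs = np.array([[0.1, 0.5],
                   [0.3, 0.2],
                   [0.9, 0.7]])

# The fold each configuration was evaluated on
fold_ids = np.array([0, 1, 0])

# HGP expects (N x (D + 1)) inputs: fold index appended as the last column
train_x = np.hstack([configs, fold_ids[:, None].astype(float)])
print(train_x.shape)  # (3, 3)

# Targets are the corresponding holdout losses, shaped (N x 1)
train_y = np.array([[0.42], [0.37], [0.45]])
```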

condition_on_observations(X, Y, **kwargs)[source]

Returns a new model for the true CV loss function \(f(\cdot)\) conditioned on new observations \(Y\) at the input locations \(X\).

This is used for fantasy modeling in the knowledge gradient and batch acquisition functions, where we want a new model with these observations added to the existing training data, without modifying the original model.

Parameters:
  • X (Tensor) – input locations with dimensions (N x D), where N is the number of observations and D is the number of input dimensions.

  • Y (Tensor) – targets at the input locations with dimensions (N x 1)

  • **kwargs – not used here

Return type:

GPR
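The effect of conditioning on fantasy observations can be illustrated with a plain exact GP: adding an observation near a test location shrinks the posterior variance there. This is a generic numpy sketch of GP conditioning, not fcvopt code:

```python
import numpy as np

def rbf(a, b, ls=0.5):
    # Squared-exponential kernel matrix between 1-D point sets a and b
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def posterior_var(train_x, test_x, noise=1e-4):
    # Posterior variance of an exact GP at test_x, given inputs train_x
    K = rbf(train_x, train_x) + noise * np.eye(len(train_x))
    k_star = rbf(test_x, train_x)
    return 1.0 - np.sum(k_star @ np.linalg.inv(K) * k_star, axis=1)

train_x = np.array([0.0, 1.0])
test_x = np.array([0.5])

var_before = posterior_var(train_x, test_x)
# "Fantasize" an observation at 0.5 and recompute:
# the variance at 0.5 drops sharply
var_after = posterior_var(np.append(train_x, 0.5), test_x)
print(var_before[0] > var_after[0])  # True
```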

forward(x, fold_idx)[source]

Forward pass of the model, returning the distribution of \(y_{j}(x)\) at hyperparameters x with fold index \(j\) given by fold_idx

Parameters:
  • x (Tensor) – A tensor of input locations with dimensions (N x D)

  • fold_idx (Tensor) – A tensor of fold indices with dimensions (N x 1)

Return type:

MultivariateNormal

forward_f(x)[source]

Forward pass of the main GP f for \(f(x)\) at x

Parameters:

x (Tensor) – A tensor of input locations with dimensions (N x D)

Return type:

MultivariateNormal

posterior(X, observation_noise=False, **kwargs)[source]

Returns the posterior distribution of the CV loss \(f(\cdot)\) at the input locations X

Note

This method returns the posterior distribution of the main GP f, not the posterior distribution of the individual fold holdout losses.

Parameters:
  • X (Tensor) – input locations with dimensions (N x D). Note that this should not include the fold indices.

  • observation_noise (bool) – whether to include the observation noise in the posterior. We recommend setting this to False. Default: False

  • kwargs – not used

Return type:

GPyTorchPosterior

Standard Gaussian Process Regression

class fcvopt.models.GPR(train_x, train_y, warp_input=False, covar_kernel=None)[source]

Bases: ExactGP, GPyTorchModel, FantasizeMixin

forward(x)[source]

Forward pass of the GP, returning the prior distribution at the input locations x.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance itself rather than forward directly, since the instance call runs the registered hooks while a direct forward call silently skips them.

predict(x, return_std=False)[source]

Returns the predicted mean and optionally the standard deviation of the model at the given input points, conditioned on the training data.

Parameters:
  • x (Tensor) – Input points of shape (N x D)

  • return_std (bool) – If True, also returns the standard deviation. Default is False.

Return type:

Union[Tensor, Tuple[Tensor, Tensor]]
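For intuition, the quantities predict returns correspond to the standard exact-GP posterior mean and standard deviation. The following generic numpy sketch (hypothetical `gp_predict` helper, not fcvopt's implementation) computes both with an RBF kernel:

```python
import numpy as np

def gp_predict(train_x, train_y, test_x, ls=0.5, noise=1e-2, return_std=False):
    # Exact GP posterior mean (and optionally std) with an RBF kernel
    def rbf(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)
    K = rbf(train_x, train_x) + noise * np.eye(len(train_x))
    k_star = rbf(test_x, train_x)
    alpha = np.linalg.solve(K, train_y)
    mean = k_star @ alpha
    if not return_std:
        return mean
    var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))

train_x = np.array([0.0, 0.5, 1.0])
train_y = np.sin(3 * train_x)
mean, std = gp_predict(train_x, train_y, np.array([0.25]), return_std=True)
print(mean.shape, std.shape)  # (1,) (1,)
```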

reset_parameters()[source]

Resets the model's trainable hyperparameters to fresh initial values, e.g. before refitting.