Gaussian Process Models
The models module contains Gaussian Process implementations used for surrogate modeling in Bayesian optimization.
Overview
FCVOpt uses Gaussian Processes to model the relationship between hyperparameters and cross-validation performance. Two main model types are available:
GPR: Standard single-task Gaussian Process regression
HGP: Hierarchical Gaussian Process for modeling fold-wise correlations
Key Concepts
Hierarchical Gaussian Processes (HGP): The core innovation in FCVOpt. Instead of treating each CV fold independently, HGP models the correlation structure between folds, enabling accurate prediction of performance on unevaluated folds.
Multi-task Learning: HGP treats each CV fold as a separate “task” and learns correlations between tasks, dramatically reducing the number of fold evaluations needed.
Kernel Functions: Both models support various kernel functions for modeling different types of relationships:
Matern kernels for smooth functions
Constant kernels for bias terms
Hamming kernels for categorical variables
Multi-task kernels for fold correlations
Usage in FCVOpt
These models are used internally by the optimizers and typically don’t need to be instantiated directly:
# HGP is used automatically when using FCVOpt
from fcvopt.optimizers import FCVOpt
optimizer = FCVOpt(cv_obj, config_space, acq='kg')
# Internally uses HGP to model fold correlations
# Standard GP is used with BayesOpt for full CV
from fcvopt.optimizers import BayesOpt
optimizer = BayesOpt(cv_obj, config_space, acq='lcb')
# Internally uses GPR for standard modeling
Advanced Usage
For custom implementations or research purposes, the models can be used directly:
from fcvopt.models import HGP
import torch
# Create the hierarchical GP model: train_x has dimensions (N x (D + 1)), with
# the fold index of each observation in its last column; train_y is (N x 1)
model = HGP(
    train_x=train_x,  # hyperparameter configurations with fold indices appended
    train_y=train_y,  # fold holdout losses
)
# Put the model in training mode before fitting its hyperparameters
model.train()
# Make predictions on new configurations
posterior = model.posterior(test_configs)
mean = posterior.mean
variance = posterior.variance
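Given the posterior mean and variance from a model like the one above, a standard lower confidence bound (the kind of criterion suggested by BayesOpt's acq='lcb') can be computed directly. This is a hedged, self-contained sketch using plain tensors in place of a real posterior; the variable names and the exploration weight kappa are illustrative, not FCVOpt internals:

```python
import torch

# Posterior summaries for 3 candidate configurations (placeholder values)
mean = torch.tensor([[0.30], [0.25], [0.40]])      # posterior means
variance = torch.tensor([[0.04], [0.09], [0.01]])  # posterior variances
kappa = 2.0                                        # exploration weight (assumed name)

# Lower confidence bound: mean minus kappa standard deviations.
# Since CV loss is minimized, the candidate with the lowest LCB is chosen.
lcb = mean - kappa * variance.sqrt()
best = torch.argmin(lcb)  # index of the most promising candidate
```

Increasing kappa favors candidates with high posterior uncertainty (exploration); decreasing it favors low predicted loss (exploitation).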
API Reference
Hierarchical Gaussian Process
- class fcvopt.models.HGP(train_x, train_y, warp_input=False)[source]
Bases: GPR
Hierarchical GP model for modeling the CV loss function.
This model is the sum of a main GP f, which models the CV loss function, and a delta GP that models the deviation of the individual fold holdout losses from the CV loss.
The model is defined as
\[\begin{equation*} y_j(x) = f(x) + \delta_j(x) + \epsilon_j(x) \end{equation*}\]
where \(\delta_j(x)\) is the deviation of the individual fold holdout losses from the CV loss, and \(\epsilon_j(x)\) is the observation noise.
- Parameters:
train_x (Tensor) – training data with dimensions (N x (D + 1)), where the last column contains the fold indices
train_y (Tensor) – training targets with dimensions (N x 1)
warp_input (bool) – whether to apply input warping to the inputs. Default: False
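A minimal, self-contained sketch (plain PyTorch, no FCVOpt calls) of the input layout described above: the fold index is appended as the last column of train_x, and the synthetic targets follow the decomposition \(y_j(x) = f(x) + \delta_j(x) + \epsilon_j(x)\). The stand-in functions for f and \(\delta_j\) are invented for illustration only:

```python
import torch

torch.manual_seed(0)
N, D, num_folds = 8, 3, 5

configs = torch.rand(N, D)                       # hyperparameter configurations
fold_idx = torch.randint(0, num_folds, (N, 1))   # fold of each observation

# train_x: (N x (D + 1)), with the fold index in the last column
train_x = torch.cat([configs, fold_idx.float()], dim=1)

# Synthetic targets following y_j(x) = f(x) + delta_j(x) + epsilon_j(x)
f = configs.sum(dim=1, keepdim=True)             # stand-in for the shared CV loss f(x)
delta = 0.1 * torch.randn(num_folds, 1)          # per-fold deviation, simplified to a constant shift
eps = 0.01 * torch.randn(N, 1)                   # observation noise epsilon_j(x)
train_y = f + delta[fold_idx.squeeze(-1)] + eps  # (N x 1) fold holdout losses
```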
- condition_on_observations(X, Y, **kwargs)[source]
Returns a new model for the true CV loss function \(f(\cdot)\) conditioned on new observations \(Y\) at the input locations \(X\).
This is used for fantasy modeling for the knowledge gradient and batch acquisition functions, where we want to obtain a new model with these observations added to the existing training data without modifying the original model.
- Parameters:
X (Tensor) – input locations with dimensions (N x D), where N is the number of observations and D is the number of input dimensions
Y (Tensor) – targets at the input locations with dimensions (N x 1)
**kwargs – not used here
- Return type:
- forward(x, fold_idx)[source]
Forward pass of the model for \(y_{fold\_idx}(x)\) at hyperparameters x and fold index fold_idx
- Parameters:
x (Tensor) – A tensor of input locations with dimensions (N x D)
fold_idx (Tensor) – A tensor of fold indices with dimensions (N x 1)
- Return type:
MultivariateNormal
- forward_f(x)[source]
Forward pass of the main GP f for \(f(x)\) at x
- Parameters:
x (Tensor) – A tensor of input locations with dimensions (N x D)
- Return type:
MultivariateNormal
- posterior(X, observation_noise=False, **kwargs)[source]
Returns the posterior distribution of the CV loss \(f(\cdot)\) at the input locations X
Note
This method returns the posterior distribution of the main GP f, not the posterior distribution of the individual fold holdout losses.
- Parameters:
X (Tensor) – input locations with dimensions (N x D). Note that this should not include the fold indices.
observation_noise (bool) – whether to include the observation noise in the posterior. We recommend setting this to False. Default: False
**kwargs – not used
- Return type:
GPyTorchPosterior
Standard Gaussian Process Regression
- class fcvopt.models.GPR(train_x, train_y, warp_input=False, covar_kernel=None)[source]
Bases: ExactGP, GPyTorchModel, FantasizeMixin
- forward(x)[source]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- predict(x, return_std=False)[source]
Returns the predicted mean and optionally the standard deviation of the model at the given input points, conditioned on the training data.
- Parameters:
x (Tensor) – Input points of shape (N x D)
return_std (bool) – If True, also returns the standard deviation. Default is False.
- Return type:
Union[Tensor, Tuple[Tensor, Tensor]]