Introduction to FCVOpt
This notebook walks through the FCVOpt API for efficient hyperparameter optimization using fractional cross-validation. We tune a Random Forest classifier on a synthetic dataset to illustrate the core concepts and workflow.
What is FCVOpt?
FCVOpt addresses a fundamental tension in hyperparameter optimization: K-fold cross-validation is more reliable than a single train-test split, but fitting K models per configuration makes optimization prohibitively expensive.
The key insight is that CV folds are not independent—configurations that perform well on one fold tend to perform well on others. FCVOpt exploits this structure via a hierarchical Gaussian process (HGP) that jointly models performance across all folds. This allows the optimizer to evaluate just a single fold per configuration while still reasoning about full K-fold performance, yielding substantial speedups with little loss in quality.
In contrast, standard Bayesian optimization with K-fold CV requires all K folds to be evaluated at each candidate configuration before a decision can be made.
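To make the cost difference concrete, the back-of-the-envelope calculation below uses the setup from this notebook (10 folds, 50 optimization trials) and counts only model fits, ignoring the comparatively small overhead of fitting the GP:

# Rough cost comparison (model fits only) for 50 trials with 10-fold CV
n_trials, n_folds = 50, 10

standard_bo_fits = n_trials * n_folds   # every fold evaluated per candidate
fcvopt_fits = n_trials * 1              # one fold evaluated per candidate

print(f"Standard BO with full K-fold CV: {standard_bo_fits} model fits")
print(f"FCVOpt (one fold per trial)    : {fcvopt_fits} model fits")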
[1]:
# Import required libraries
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import zero_one_loss
from fcvopt.optimizers import FCVOpt
from fcvopt.crossvalidation import SklearnCVObj
from fcvopt.configspace import ConfigurationSpace
from ConfigSpace import Integer, Float
Generating the Data
We generate a synthetic binary classification dataset with 1,500 samples and 50 features, of which only 10 are truly informative and 25 are linear combinations of those. A 10% label noise rate (flip_y=0.1) makes the task non-trivial.
[2]:
# Generate sample classification data
X, y = make_classification(
    n_samples=1500,
    n_features=50,
    n_informative=10,
    n_redundant=25,
    n_classes=2,
    flip_y=0.1,
    random_state=42
)
print(f"Shape of features matrix: {X.shape}")
print(f"Class distribution: {np.bincount(y)}")
Shape of features matrix: (1500, 50)
Class distribution: [761 739]
The FCVOpt API
FCVOpt follows a simple three-step workflow:
1. Define a Cross-Validation Objective ← what to evaluate and how
↓
2. Define a Hyperparameter Search Space ← what to optimize over
↓
3. Run the Optimizer ← find the best configuration
Each step is covered in detail below.
Step 1: Define the Cross-Validation Objective
The CV objective bundles together everything needed to evaluate a hyperparameter configuration:
- Estimator — the model to tune (RandomForestClassifier)
- Data — the features and labels (X, y)
- Loss metric — the quantity to minimize (misclassification rate)
- CV scheme — how to split the data (10-fold stratified CV)
For scikit-learn–compatible estimators, FCVOpt provides SklearnCVObj as a convenient wrapper. Under the hood, calling cv_obj.cvloss(params) fits and evaluates the model on each fold and returns the average loss—this is the function the optimizer will minimize.
[3]:
# Create CV objective for Random Forest
cv_obj = SklearnCVObj(
    estimator=RandomForestClassifier(random_state=42),
    X=X, y=y,
    loss_metric=zero_one_loss,  # Minimize misclassification rate
    task='classification',
    n_splits=10,
    rng_seed=42
)
print(f"Created CV objective with {cv_obj.cv.get_n_splits()} folds")
Created CV objective with 10 folds
Step 2: Define the Hyperparameter Search Space
The configuration space declares which hyperparameters to tune and their valid ranges. We use log-scale bounds for all parameters since their effects are roughly multiplicative—e.g., increasing the number of trees by 50 matters more at 50 than at 500.
| Hyperparameter | Range | Scale | Description |
|---|---|---|---|
| n_estimators | [50, 1000] | Log | Number of trees in the forest |
| max_depth | [1, 15] | Log | Maximum depth of each tree |
| max_features | [0.01, 1.0] | Log | Fraction of features considered at each split |
| min_samples_split | [2, 200] | Log | Minimum samples required to split a node |
FCVOpt’s ConfigurationSpace extends the standard ConfigSpace with utilities for Latin Hypercube sampling and conversion between named configurations and numeric arrays used by the GP model.
[4]:
# Define hyperparameter search space
config = ConfigurationSpace()
config.add([
    Integer('n_estimators', bounds=(50, 1000), log=True),
    Integer('max_depth', bounds=(1, 15), log=True),
    Float('max_features', bounds=(0.01, 1.0), log=True),
    Integer('min_samples_split', bounds=(2, 200), log=True)
])
print(config)
Configuration space object:
Hyperparameters:
max_depth, Type: UniformInteger, Range: [1, 15], Default: 4, on log-scale
max_features, Type: UniformFloat, Range: [0.01, 1.0], Default: 0.1, on log-scale
min_samples_split, Type: UniformInteger, Range: [2, 200], Default: 20, on log-scale
n_estimators, Type: UniformInteger, Range: [50, 1000], Default: 224, on log-scale
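Before handing the space to the optimizer, it can be useful to sanity-check the objective on a configuration drawn from it. The snippet below is a minimal sketch: it assumes the standard ConfigSpace sample_configuration() method is available on FCVOpt's ConfigurationSpace, and that cv_obj.cvloss accepts a plain dictionary of hyperparameter values (consistent with the dict(best_conf) conversion used later in this notebook).

# Optional sanity check: draw one random configuration from the search
# space and evaluate it with the CV objective (sketch; see caveats above)
sample_conf = config.sample_configuration()
print(dict(sample_conf))

# Average misclassification rate over the CV folds for this configuration
sample_loss = cv_obj.cvloss(dict(sample_conf))
print(f"CV loss for sampled configuration: {sample_loss:.4f}")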
Step 3: Initialize and Run the Optimizer
With the objective and search space defined, we can create an FCVOpt instance and run the optimization loop. The key constructor arguments are:
| Argument | Description |
|---|---|
| obj | The callable loss function to minimize (here, cv_obj.cvloss) |
| n_folds | Total number of CV folds (must match the objective) |
| config | The hyperparameter search space |
| acq_function | Acquisition function: 'LCB' or 'KG' |
| tracking_dir | Local directory for MLflow logs (see below) |
| experiment | Name for this optimization run in MLflow |
| seed | Random seed for reproducibility |
Choosing an acquisition function: 'LCB' is fast and strikes a good balance between exploration and exploitation. 'KG' (Knowledge Gradient) often finds better configurations but is more computationally expensive per iteration.
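For intuition, a generic lower confidence bound scores each candidate by its GP posterior mean minus a multiple of the posterior standard deviation and then picks the smallest score, since we are minimizing loss. The NumPy sketch below illustrates the idea; the trade-off factor kappa and the toy numbers are illustrative, not FCVOpt internals.

import numpy as np

def lcb(mu, sigma, kappa=2.0):
    """Generic lower confidence bound for minimization:
    smaller values indicate more promising candidates."""
    return mu - kappa * sigma

# GP posterior mean/std of the CV loss for three hypothetical candidates
mu = np.array([0.15, 0.14, 0.18])
sigma = np.array([0.01, 0.04, 0.05])

scores = lcb(mu, sigma)
print("LCB scores:", scores)
print("Candidate to evaluate next:", int(np.argmin(scores)))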
Experiment Tracking with MLflow
MLflow is an open-source library for tracking machine learning experiments. FCVOpt uses it to automatically record everything that happens during optimization—so you can inspect, compare, and resume runs without any extra bookkeeping code.
At each iteration, FCVOpt logs to MLflow:
- Metrics (indexed by iteration): incumbent observed loss (f_inc_obs), estimated loss from the GP (f_inc_est), GP fitting time, and acquisition optimization time
- Artifacts: a per-iteration JSON snapshot with the candidate and incumbent configurations, and periodic checkpoints of the GP model weights (.pth files)
- Parameters & tags: acquisition function, seed, batch size, and other run settings
There are two ways to tell FCVOpt where to write these logs:
| Option | When to use | Example |
|---|---|---|
| tracking_dir | Local logging to a directory on disk | tracking_dir='./hpt_opt_runs/' |
| tracking_uri | Remote MLflow server, or an explicit MLflow tracking URI | |
Only one of the two should be provided. If neither is given, logs are written to ./mlruns/ in the current directory.
Once a run is complete (or even mid-run), you can browse all logged data with the MLflow UI:
mlflow ui --backend-store-uri ./hpt_opt_runs/
This opens a browser dashboard where you can plot metrics over iterations, compare different runs side by side, and download artifacts. You can also restore a previous optimizer state directly from a logged run using FCVOpt.restore_from_mlflow().
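Logged metrics can also be read back programmatically once a run has completed, using the standard MLflow client rather than any FCVOpt-specific API. The sketch below assumes the run was written to the tracking directory used later in this notebook and fetches the per-iteration f_inc_est history for the most recent run of the experiment.

# Read back logged metrics with the standard MLflow client (sketch)
import mlflow
from mlflow.tracking import MlflowClient

mlflow.set_tracking_uri("./hpt_opt_runs/")
client = MlflowClient()

exp = client.get_experiment_by_name("rf_tuning_example")
runs = client.search_runs([exp.experiment_id],
                          order_by=["attributes.start_time DESC"])

# Estimated full-CV loss of the incumbent at each iteration
history = client.get_metric_history(runs[0].info.run_id, "f_inc_est")
print([m.value for m in history])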
We run 50 trials below. Each trial selects a hyperparameter configuration via the acquisition function, evaluates it on a single held-out fold chosen by the optimizer, and updates the hierarchical GP with the new observation.
[5]:
# Initialize FCVOpt optimizer
optimizer = FCVOpt(
    obj=cv_obj.cvloss,
    n_folds=cv_obj.cv.get_n_splits(),
    config=config,
    acq_function='LCB',              # Lower Confidence Bound acquisition
    tracking_dir='./hpt_opt_runs/',  # MLflow tracking directory
    experiment='rf_tuning_example',
    seed=123
)
# run for 50 trials
best_conf = optimizer.optimize(n_trials=50)
# end run
optimizer.end_run()
Number of candidates evaluated.....: 50
Single-fold observed loss (best)...: 0.146667
Estimated full CV loss (best)......: 0.129033
Best configuration at termination:
Configuration(values={
'max_depth': 15,
'max_features': 0.3571846673984,
'min_samples_split': 6,
'n_estimators': 460,
})
Evaluating and Deploying the Best Configuration
After optimization, best_conf holds the best configuration found. The end-of-run summary prints two loss values:
- Single-fold observed loss — the raw loss measured on the single held-out fold that was evaluated for the best configuration. Because it comes from only one fold, it is a noisy (and potentially optimistic) estimate of the full CV performance.
- Estimated full CV loss — the HGP's prediction of what the full K-fold CV loss would be. It becomes more accurate as more trials accumulate observations across folds.
To obtain a reliable estimate of the full 10-fold CV performance, we call cv_obj(best_conf), which evaluates the configuration on all 10 folds and returns their average.
[6]:
# Evaluate best configuration
best_cv_mcr = cv_obj(best_conf)
print(f" 10-fold CV Misclassification Rate....:{best_cv_mcr:.6f}")
10-fold CV Misclassification Rate....:0.124667
Train the final model
Finally, we retrain on the full dataset using the best hyperparameters. This final model is what you would deploy or use for inference.
[7]:
# Get the model with the best hyperparameters found
best_model = cv_obj.construct_model(dict(best_conf))
# train the model on the data
_ = best_model.fit(X, y)
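With the final model fit on all of X and y, inference follows the usual scikit-learn pattern. The example below predicts on the training features purely for illustration; in practice you would pass new, unseen data.

# Use the final model for inference (illustrative: predicting on the
# training data; in practice, pass new, unseen samples)
predictions = best_model.predict(X)
print(f"Training-set accuracy: {(predictions == y).mean():.4f}")

# Class-membership probabilities are also available
probabilities = best_model.predict_proba(X)
print(f"Predicted probabilities for the first sample: {probabilities[0]}")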