Introduction to FCVOpt
This notebook walks through the FCVOpt API for efficient hyperparameter optimization using fractional cross-validation. We tune a Random Forest classifier on a synthetic dataset to illustrate the core concepts and workflow.
What is FCVOpt?
FCVOpt addresses a fundamental tension in hyperparameter optimization: K-fold cross-validation is more reliable than a single train-test split, but fitting K models per configuration makes optimization prohibitively expensive.
The key insight is that CV folds are not independent—configurations that perform well on one fold tend to perform well on others. FCVOpt exploits this structure via a hierarchical Gaussian process (HGP) that jointly models performance across all folds. This allows the optimizer to evaluate just a single fold per configuration while still reasoning about full K-fold performance, yielding substantial speedups with little loss in quality.
In contrast, standard Bayesian optimization with K-fold CV requires all K folds to be evaluated at each candidate configuration before a decision can be made.
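To make the cost difference concrete, the back-of-the-envelope calculation below uses the setup from this notebook (10 folds, 50 optimization trials) and counts only model fits, ignoring the comparatively small overhead of fitting the GP:

# Rough cost comparison (model fits only) for 50 trials with 10-fold CV
n_trials, n_folds = 50, 10

standard_bo_fits = n_trials * n_folds   # every fold evaluated per candidate
fcvopt_fits = n_trials * 1              # one fold evaluated per candidate

print(f"Standard BO with full K-fold CV: {standard_bo_fits} model fits")
print(f"FCVOpt (one fold per trial)    : {fcvopt_fits} model fits")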
[1]:
# Import required libraries
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import zero_one_loss
from fcvopt.optimizers import FCVOpt
from fcvopt.crossvalidation import SklearnCVObj
from fcvopt.configspace import ConfigurationSpace
from ConfigSpace import Integer, Float
Generating the Data
We generate a synthetic binary classification dataset with 1,500 samples and 50 features, of which only 10 are truly informative and 25 are linear combinations of those. A 10% label noise rate (flip_y=0.1) makes the task non-trivial.
[2]:
# Generate sample classification data
X, y = make_classification(
    n_samples=1500,
    n_features=50,
    n_informative=10,
    n_redundant=25,
    n_classes=2,
    flip_y=0.1,
    random_state=42
)
print(f"Shape of features matrix: {X.shape}")
print(f"Class distribution: {np.bincount(y)}")
Shape of features matrix: (1500, 50)
Class distribution: [761 739]
The FCVOpt API
FCVOpt follows a simple three-step workflow:
1. Define a Cross-Validation Objective ← what to evaluate and how
↓
2. Define a Hyperparameter Search Space ← what to optimize over
↓
3. Run the Optimizer ← find the best configuration
Each step is covered in detail below.
Step 1: Define the Cross-Validation Objective
The CV objective bundles together everything needed to evaluate a hyperparameter configuration:
- Estimator — the model to tune (RandomForestClassifier)
- Data — the features and labels (X, y)
- Loss metric — the quantity to minimize (misclassification rate)
- CV scheme — how to split the data (10-fold stratified CV)
For scikit-learn–compatible estimators, FCVOpt provides SklearnCVObj as a convenient wrapper. Under the hood, calling cv_obj.cvloss(params) fits and evaluates the model on each fold and returns the average loss—this is the function the optimizer will minimize.
[3]:
# Create CV objective for Random Forest
cv_obj = SklearnCVObj(
    estimator=RandomForestClassifier(random_state=42),
    X=X, y=y,
    loss_metric=zero_one_loss,  # Minimize misclassification rate
    task='classification',
    n_splits=10,
    rng_seed=42
)
print(f"Created CV objective with {cv_obj.cv.get_n_splits()} folds")
Created CV objective with 10 folds
Step 2: Define the Hyperparameter Search Space
The configuration space declares which hyperparameters to tune and their valid ranges. We use log-scale bounds for all parameters since their effects are roughly multiplicative—e.g., increasing the number of trees by 50 matters more at 50 than at 500.
| Hyperparameter | Range | Scale | Description |
|---|---|---|---|
| n_estimators | [50, 1000] | Log | Number of trees in the forest |
| max_depth | [1, 15] | Log | Maximum depth of each tree |
| max_features | [0.01, 1.0] | Log | Fraction of features considered at each split |
| min_samples_split | [2, 200] | Log | Minimum samples required to split a node |
FCVOpt’s ConfigurationSpace extends the standard ConfigSpace with utilities for Latin Hypercube sampling and conversion between named configurations and numeric arrays used by the GP model.
[4]:
# Define hyperparameter search space
config = ConfigurationSpace()
config.add([
    Integer('n_estimators', bounds=(50, 1000), log=True),
    Integer('max_depth', bounds=(1, 15), log=True),
    Float('max_features', bounds=(0.01, 1.0), log=True),
    Integer('min_samples_split', bounds=(2, 200), log=True)
])
print(config)
Configuration space object:
Hyperparameters:
max_depth, Type: UniformInteger, Range: [1, 15], Default: 4, on log-scale
max_features, Type: UniformFloat, Range: [0.01, 1.0], Default: 0.1, on log-scale
min_samples_split, Type: UniformInteger, Range: [2, 200], Default: 20, on log-scale
n_estimators, Type: UniformInteger, Range: [50, 1000], Default: 224, on log-scale
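Before handing the space to the optimizer, it can be useful to sanity-check the objective on a configuration drawn from it. The snippet below is a minimal sketch: it assumes the standard ConfigSpace sample_configuration() method is available on FCVOpt's ConfigurationSpace, and that cv_obj.cvloss accepts a plain dictionary of hyperparameter values (consistent with the dict(best_conf) conversion used later in this notebook).

# Optional sanity check: draw one random configuration from the search
# space and evaluate it with the CV objective (sketch; see caveats above)
sample_conf = config.sample_configuration()
print(dict(sample_conf))

# Average misclassification rate over the CV folds for this configuration
sample_loss = cv_obj.cvloss(dict(sample_conf))
print(f"CV loss for sampled configuration: {sample_loss:.4f}")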
Step 3: Initialize and Run the Optimizer
With the objective and search space defined, we can create an FCVOpt instance and run the optimization loop. The key constructor arguments are:
| Argument | Description |
|---|---|
| obj | The callable loss function to minimize (here, cv_obj.cvloss) |
| n_folds | Total number of CV folds (must match the objective) |
| config | The hyperparameter search space |
| acq_function | Acquisition function: 'LCB' or 'KG' |
| tracking_dir | Local directory for MLflow logs (see below) |
| experiment | Name for this optimization run in MLflow |
| seed | Random seed for reproducibility |
Choosing an acquisition function: 'LCB' is fast and strikes a good balance between exploration and exploitation. 'KG' (Knowledge Gradient) often finds better configurations but is more computationally expensive per iteration.
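For intuition, a generic lower confidence bound scores each candidate by its GP posterior mean minus a multiple of the posterior standard deviation and then picks the smallest score, since we are minimizing loss. The NumPy sketch below illustrates the idea; the trade-off factor kappa and the toy numbers are illustrative, not FCVOpt internals.

import numpy as np

def lcb(mu, sigma, kappa=2.0):
    """Generic lower confidence bound for minimization:
    smaller values indicate more promising candidates."""
    return mu - kappa * sigma

# GP posterior mean/std of the CV loss for three hypothetical candidates
mu = np.array([0.15, 0.14, 0.18])
sigma = np.array([0.01, 0.04, 0.05])

scores = lcb(mu, sigma)
print("LCB scores:", scores)
print("Candidate to evaluate next:", int(np.argmin(scores)))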
Experiment Tracking with MLflow
MLflow is an open-source library for tracking machine learning experiments. FCVOpt uses it to automatically record everything that happens during optimization—so you can inspect, compare, and resume runs without any extra bookkeeping code.
At each iteration, FCVOpt logs to MLflow:
- Metrics (indexed by iteration): incumbent observed loss (f_inc_obs), estimated loss from the GP (f_inc_est), GP fitting time, and acquisition optimization time
- Artifacts: a per-iteration JSON snapshot with the candidate and incumbent configurations, and periodic checkpoints of the GP model weights (.pth files)
- Parameters & tags: acquisition function, seed, batch size, and other run settings
There are two ways to tell FCVOpt where to write these logs:
| Option | When to use | Example |
|---|---|---|
| tracking_dir | Local logging to a directory on disk | tracking_dir='./hpt_opt_runs/' |
| tracking_uri | Remote MLflow server, or an explicit MLflow tracking URI | |
Only one of the two should be provided. If neither is given, logs are written to ./mlruns/ in the current directory.
Once a run is complete (or even mid-run), you can browse all logged data with the MLflow UI:
mlflow ui --backend-store-uri ./hpt_opt_runs/
This opens a browser dashboard where you can plot metrics over iterations, compare different runs side by side, and download artifacts. You can also restore a previous optimizer state directly from a logged run using FCVOpt.restore_from_mlflow().
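Logged metrics can also be read back programmatically once a run has completed, using the standard MLflow client rather than any FCVOpt-specific API. The sketch below assumes the run was written to the tracking directory used later in this notebook and fetches the per-iteration f_inc_est history for the most recent run of the experiment.

# Read back logged metrics with the standard MLflow client (sketch)
import mlflow
from mlflow.tracking import MlflowClient

mlflow.set_tracking_uri("./hpt_opt_runs/")
client = MlflowClient()

exp = client.get_experiment_by_name("rf_tuning_example")
runs = client.search_runs([exp.experiment_id],
                          order_by=["attributes.start_time DESC"])

# Estimated full-CV loss of the incumbent at each iteration
history = client.get_metric_history(runs[0].info.run_id, "f_inc_est")
print([m.value for m in history])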
We run 50 trials below. Each trial selects a hyperparameter configuration via the acquisition function, evaluates it on a single held-out fold chosen by the optimizer, and updates the hierarchical GP with the new observation.
[5]:
# Initialize FCVOpt optimizer
optimizer = FCVOpt(
    obj=cv_obj.cvloss,
    n_folds=cv_obj.cv.get_n_splits(),
    config=config,
    acq_function='LCB',              # Lower Confidence Bound acquisition
    tracking_dir='./hpt_opt_runs/',  # MLflow tracking directory
    experiment='rf_tuning_example',
    seed=123
)
# run for 50 trials
best_conf = optimizer.optimize(n_trials=50)
# end run
optimizer.end_run()
Number of candidates evaluated.....: 50
Single-fold observed loss (best)...: 0.146667
Estimated full CV loss (best)......: 0.129033
Best configuration at termination:
Configuration(values={
'max_depth': 15,
'max_features': 0.3571846673984,
'min_samples_split': 6,
'n_estimators': 460,
})
Evaluating and Deploying the Best Configuration
After optimization, best_conf holds the best configuration found. The end-of-run summary prints two loss values:
- Single-fold observed loss — the raw loss measured on the single held-out fold that was evaluated for the best configuration. Because it comes from only one fold, it is a noisy (and potentially optimistic) estimate of the full CV performance.
- Estimated full CV loss — the HGP's prediction of what the full K-fold CV loss would be. It becomes more accurate as more trials accumulate observations across folds.
To obtain a reliable estimate of the full 10-fold CV performance, we call cv_obj(best_conf), which evaluates the configuration on all 10 folds and returns their average.
[6]:
# Evaluate best configuration
best_cv_mcr = cv_obj(best_conf)
print(f" 10-fold CV Misclassification Rate....:{best_cv_mcr:.6f}")
10-fold CV Misclassification Rate....:0.124667
Train the final model
Finally, we retrain on the full dataset using the best hyperparameters. This final model is what you would deploy or use for inference.
[7]:
# Get the model with the best hyperparameters found
best_model = cv_obj.construct_model(dict(best_conf))
# train the model on the data
_ = best_model.fit(X, y)
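With the final model fit on all of X and y, inference follows the usual scikit-learn pattern. The example below predicts on the training features purely for illustration; in practice you would pass new, unseen data.

# Use the final model for inference (illustrative: predicting on the
# training data; in practice, pass new, unseen samples)
predictions = best_model.predict(X)
print(f"Training-set accuracy: {(predictions == y).mean():.4f}")

# Class-membership probabilities are also available
probabilities = best_model.predict_proba(X)
print(f"Predicted probabilities for the first sample: {probabilities[0]}")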