Tuning TabularResNet Hyperparameters
This example demonstrates how to use FCVOpt to tune hyperparameters for TabularResNet, a deep learning architecture designed for tabular data (Gorishniy et al., 2021).
ResNetCVObj implements the full training loop in pure PyTorch—no additional training library is required. It includes early stopping, learning-rate scheduling, and gradient clipping out of the box.
Key features:
- Uses ResNetCVObj, which subclasses CVObjective and implements fit_and_test with a self-contained PyTorch training loop
- Both architectural and optimization hyperparameters are tuned
- Batch size is fixed at construction time via the batch_size argument, not tuned
[1]:
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import StandardScaler
# FCVOpt imports
from fcvopt.crossvalidation import ResNetCVObj
from fcvopt.optimizers import FCVOpt
from fcvopt.configspace import ConfigurationSpace
from ConfigSpace import Integer, Float, Categorical
Understanding TabularResNet
TabularResNet (Gorishniy et al., 2021) is a deep learning architecture specifically designed for tabular data. It consists of:
Architecture Components:
Input Stem: A single fully-connected layer that projects input features to a hidden dimension
Residual Blocks: The core building blocks, each computing:
x_out = x_in + Dropout( Linear( Dropout( ReLU( Linear( Norm(x_in) ) ) ) ) )
Each block includes:
- Normalization: either BatchNorm or LayerNorm
- Two linear layers: the first expands the width by hidden_factor, the second projects it back
- ReLU activation: between the two linear layers
- Dropout: applied twice (hidden and residual paths)
- Residual connection: adds the input to the transformed output
Prediction Head: Norm → ReLU → Linear to output dimension
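To make the block equation above concrete, here is a minimal PyTorch sketch of one residual block. It follows the formula directly but is an illustration only, not the actual ResNetCVObj implementation; the class and argument names are hypothetical.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One TabularResNet block:
    x_out = x_in + Dropout(Linear(Dropout(ReLU(Linear(Norm(x_in))))))"""
    def __init__(self, d, hidden_factor=2.0, hidden_dropout=0.1,
                 residual_dropout=0.05, normalization='batchnorm'):
        super().__init__()
        d_hidden = int(d * hidden_factor)
        self.norm = nn.BatchNorm1d(d) if normalization == 'batchnorm' else nn.LayerNorm(d)
        self.linear1 = nn.Linear(d, d_hidden)   # expand width by hidden_factor
        self.linear2 = nn.Linear(d_hidden, d)   # project back to width d
        self.hidden_dropout = nn.Dropout(hidden_dropout)
        self.residual_dropout = nn.Dropout(residual_dropout)

    def forward(self, x):
        h = self.norm(x)
        h = self.hidden_dropout(torch.relu(self.linear1(h)))
        h = self.residual_dropout(self.linear2(h))
        return x + h  # residual connection preserves the input width

block = ResidualBlock(d=64)
out = block(torch.randn(32, 64))
print(out.shape)  # torch.Size([32, 64])
```

Because the residual connection adds the input back, every block maps width d to width d; only the hidden expansion inside the block is wider.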
Generate Sample Data
We use the same binary classification dataset as in the LightGBM notebook.
[2]:
# Generate binary classification dataset with class imbalance (90% vs 10%)
# Using 2000 samples, 25 features (5 informative, 10 redundant)
X, y = make_classification(
    n_samples=2000,
    n_features=25,
    n_informative=5,
    n_redundant=10,
    n_classes=2,
    n_clusters_per_class=2,
    weights=[0.9, 0.1],  # imbalanced classes
    random_state=23,
)
print(f"Shape of features matrix: {X.shape}")
print(f"Class distribution: {np.bincount(y)}")
Shape of features matrix: (2000, 25)
Class distribution: [1796 204]
Define Hyperparameter Search Space
TabularResNet has two categories of hyperparameters:
Architectural Hyperparameters:
| Hyperparameter | Range | Scale | Description |
|---|---|---|---|
| n_hidden | [1, 4] | log | Number of residual blocks in the network |
| layer_size | [16, 128] | log | Width of hidden representations |
| normalization | {batchnorm, layernorm} | - | Type of normalization layer |
| hidden_factor | [1.0, 4.0] | uniform | Expansion factor inside residual blocks |
| hidden_dropout | [0.0, 0.3] | uniform | Dropout rate inside residual blocks |
| residual_dropout | [0.0, 0.2] | uniform | Dropout rate on residual output |
Optimization Hyperparameters:
| Hyperparameter | Range | Scale | Description |
|---|---|---|---|
| lr | [1e-4, 1e-2] | log | Learning rate for the optimizer |
| weight_decay | [1e-6, 1e-3] | log | L2 regularization strength |
Note: Batch size is fixed via the batch_size argument of ResNetCVObj and is generally not worth tuning. We use narrower ranges than the recommended defaults for faster optimization in this example.
[3]:
# Create configuration space for hyperparameter search
config = ConfigurationSpace()
# Architectural hyperparameters
config.add([
Integer('n_hidden', bounds=(1, 4), log=True), # Number of residual blocks
Integer('layer_size', bounds=(16, 128), log=True), # Hidden layer width
Categorical('normalization', items=['batchnorm', 'layernorm']),
Float('hidden_factor', bounds=(1.0, 4.0)), # Expansion factor in blocks
Float('hidden_dropout', bounds=(0.0, 0.3)), # Dropout inside blocks
Float('residual_dropout', bounds=(0.0, 0.2)), # Dropout on residual
])
# Optimization hyperparameters
config.add([
Float('lr', bounds=(1e-4, 1e-2), log=True),
Float('weight_decay', bounds=(1e-6, 1e-3), log=True),
])
print(config)
Configuration space object:
Hyperparameters:
hidden_dropout, Type: UniformFloat, Range: [0.0, 0.3], Default: 0.15
hidden_factor, Type: UniformFloat, Range: [1.0, 4.0], Default: 2.5
layer_size, Type: UniformInteger, Range: [16, 128], Default: 45, on log-scale
lr, Type: UniformFloat, Range: [0.0001, 0.01], Default: 0.001, on log-scale
n_hidden, Type: UniformInteger, Range: [1, 4], Default: 2, on log-scale
normalization, Type: Categorical, Choices: {batchnorm, layernorm}, Default: batchnorm
residual_dropout, Type: UniformFloat, Range: [0.0, 0.2], Default: 0.1
weight_decay, Type: UniformFloat, Range: [1e-06, 0.001], Default: 3.16227766e-05, on log-scale
Define Cross-Validation Objective
ResNetCVObj implements the full training loop internally. Each call to fit_and_test for a given fold will:
1. Hold out 10% of the training data as an internal validation set
2. Train with mini-batch gradient descent (AdamW by default)
3. Apply early stopping (patience = 15 epochs) and restore the best checkpoint
4. Apply ReduceLROnPlateau learning-rate scheduling
5. Apply gradient-norm clipping (limit = 5.0)
The input_preprocessor (here StandardScaler) is fitted on the training split of each fold and applied to both train and test to avoid leakage.
The needs_proba=True flag tells the objective to pass predicted probabilities (sigmoid output) to loss_metric rather than hard class labels, as AUC requires.
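The leakage-free preprocessing pattern described above can be sketched with plain scikit-learn. This illustrates the per-fold fit/transform discipline, not fcvopt's internal code; the demo data and variable names are made up for the example.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_demo = rng.normal(loc=3.0, scale=2.0, size=(100, 5))
y_demo = rng.integers(0, 2, size=100)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X_demo, y_demo):
    scaler = StandardScaler().fit(X_demo[train_idx])  # fit on the training split only
    X_train = scaler.transform(X_demo[train_idx])     # standardized: mean 0, std 1
    X_test = scaler.transform(X_demo[test_idx])       # test statistics never touch the scaler
    # ...train the model on (X_train, y_demo[train_idx]) and score it on X_test...
```

Fitting the scaler inside the loop is the key point: statistics of the held-out fold never influence the transformation applied to it.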
[4]:
# Define loss metric: minimize (1 - AUC)
def auc_loss(y_true, y_pred):
return 1 - roc_auc_score(y_true, y_pred)
# Create CV objective that wraps TabularResNet
cv_obj = ResNetCVObj(
X=X,
y=y,
task='binary_classification', # BCEWithLogitsLoss, single output logit
loss_metric=auc_loss,
needs_proba=True, # AUC requires predicted probabilities
n_splits=10,
stratified=True, # Preserve class distribution in folds
max_epochs=50, # Early stopping may halt training earlier
batch_size=256, # Fixed; not tuned
rng_seed=42,
input_preprocessor=StandardScaler(),
)
print(f"Created ResNetCVObj")
print(f"Number of CV folds: {cv_obj.cv.get_n_splits()}")
print(f"Training samples: {len(cv_obj.y)}")
Created ResNetCVObj
Number of CV folds: 10
Training samples: 2000
Run Hyperparameter Optimization with FCVOpt
[5]:
# Initialize FCVOpt optimizer
optimizer = FCVOpt(
obj=cv_obj, # CV objective (callable)
n_folds=cv_obj.cv.get_n_splits(), # Total number of folds
config=config, # Search space
acq_function='LCB',
tracking_dir='./hpt_opt_runs/',
experiment='resnet_tuning',
seed=123,
)
# Run optimization
# Note: neural network training makes each evaluation moderately expensive
print("Starting optimization...\n")
best_conf = optimizer.optimize(n_trials=50)
Starting optimization...
Number of candidates evaluated.....: 50
Single-fold observed loss (best)...: 0.0344444
Estimated full CV loss (best)......: 0.0567568
Best configuration at termination:
Configuration(values={
'hidden_dropout': 0.1811278859421,
'hidden_factor': 4.0,
'layer_size': 53,
'lr': 0.01,
'n_hidden': 4,
'normalization': 'batchnorm',
'residual_dropout': 0.0,
'weight_decay': 1e-06,
})
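The 'LCB' acquisition function selected above is not spelled out in this notebook; as a rough numpy illustration under the usual definition, a Lower Confidence Bound for minimization scores each candidate as mu - kappa * sigma, trading off predicted loss against model uncertainty. The kappa value and the toy numbers below are illustrative assumptions, not FCVOpt internals.

```python
import numpy as np

def lcb(mu, sigma, kappa=2.0):
    """Lower Confidence Bound for minimization: a low predicted loss (mu)
    or a high posterior uncertainty (sigma) makes a candidate attractive."""
    return mu - kappa * sigma

# Posterior mean/std of the CV loss for three hypothetical candidates
mu = np.array([0.10, 0.08, 0.12])
sigma = np.array([0.01, 0.05, 0.02])

scores = lcb(mu, sigma)
print(int(np.argmin(scores)))  # 1 -> the uncertain candidate earns the exploration bonus
```

Candidate 2 has both the lowest mean and the highest uncertainty, so it is evaluated next; this is how the optimizer balances exploiting good regions against exploring poorly-modelled ones.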
Evaluate Best Configuration
Evaluate the best hyperparameters found by FCVOpt.
[6]:
# Evaluate the best configuration found by FCVOpt on all 10 folds
best_cv_loss = cv_obj(best_conf)
best_cv_auc = 1 - best_cv_loss
print(f"10-fold CV Loss (1 - AUC): {best_cv_loss:.4f}")
print(f"10-fold CV ROC-AUC: {best_cv_auc:.4f}")
print(f"\nBest hyperparameters:")
for key, value in best_conf.items():
print(f" {key}: {value}")
10-fold CV Loss (1 - AUC): 0.0734
10-fold CV ROC-AUC: 0.9266
Best hyperparameters:
hidden_dropout: 0.1811278859421
hidden_factor: 4.0
layer_size: 53
lr: 0.01
n_hidden: 4
normalization: batchnorm
residual_dropout: 0.0
weight_decay: 1e-06