Tuning TabularResNet Hyperparameters

This example demonstrates how to use FCVOpt to tune hyperparameters for TabularResNet, a deep learning architecture designed for tabular data (Gorishniy et al., 2021).

ResNetCVObj implements the full training loop in pure PyTorch—no additional training library is required. It includes early stopping, learning-rate scheduling, and gradient clipping out of the box.

Key features:

  • Uses ResNetCVObj, which subclasses CVObjective and implements fit_and_test with a self-contained PyTorch training loop

  • Both architectural and optimization hyperparameters are tuned

  • Batch size is fixed at construction time, not tuned (see batch_size argument)
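To make the early-stopping behavior concrete, here is a minimal, illustrative sketch of the patience-based logic a training loop like this typically uses. This is not ResNetCVObj's actual code; the class name and structure here are invented for illustration.

```python
# Illustrative sketch of patience-based early stopping (not ResNetCVObj's
# actual implementation): stop when validation loss fails to improve for
# `patience` consecutive epochs, remembering the best epoch for checkpointing.

class EarlyStopping:
    def __init__(self, patience=15):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.counter = 0

    def step(self, epoch, val_loss):
        """Record this epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss   # new best: a checkpoint would be saved here
            self.best_epoch = epoch
            self.counter = 0
        else:
            self.counter += 1           # no improvement this epoch
        return self.counter >= self.patience

# Example: losses plateau after epoch 2, so training stops `patience` epochs later
stopper = EarlyStopping(patience=3)
losses = [0.9, 0.7, 0.6, 0.65, 0.64, 0.61, 0.62]
for epoch, loss in enumerate(losses):
    if stopper.step(epoch, loss):
        print(f"Stopping at epoch {epoch}; best was epoch {stopper.best_epoch}")
        break
# → Stopping at epoch 5; best was epoch 2
```

After stopping, restoring the checkpoint from `best_epoch` gives the "restore the best weights" behavior described above.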

[1]:
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import StandardScaler

# FCVOpt imports
from fcvopt.crossvalidation import ResNetCVObj
from fcvopt.optimizers import FCVOpt
from fcvopt.configspace import ConfigurationSpace
from ConfigSpace import Integer, Float, Categorical

Understanding TabularResNet

TabularResNet (Gorishniy et al., 2021) is a deep learning architecture specifically designed for tabular data. It consists of:

Architecture Components:

  1. Input Stem: A single fully-connected layer that projects input features to a hidden dimension

  2. Residual Blocks: The core building blocks, each computing:

    x_out = x_in + Dropout( Linear( Dropout( ReLU( Linear( Norm(x_in) ) ) ) ) )
    

    Each block includes:

    • Normalization: Either BatchNorm or LayerNorm

    • Two Linear layers: First expands by hidden_factor, second projects back

    • ReLU activation: Between the two linear layers

    • Dropout: Applied twice (hidden and residual paths)

    • Residual connection: Adds input to transformed output

  3. Prediction Head: Norm → ReLU → Linear to output dimension
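The residual block formula above can be sketched in NumPy. This is a simplified illustration, not the library's implementation: it uses LayerNorm without learned affine parameters, omits dropout (as in eval mode), and picks arbitrary shapes for `layer_size=8` and `hidden_factor=2`.

```python
import numpy as np

def layernorm(x, eps=1e-5):
    # Normalize each sample across its features (LayerNorm, no learned affine params)
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def residual_block(x, W1, b1, W2, b2):
    """x_out = x_in + Linear2(ReLU(Linear1(Norm(x_in)))); dropout omitted (eval mode)."""
    h = layernorm(x)
    h = h @ W1 + b1           # first Linear: expand layer_size -> layer_size * hidden_factor
    h = np.maximum(h, 0.0)    # ReLU between the two linear layers
    h = h @ W2 + b2           # second Linear: project back to layer_size
    return x + h              # residual connection

# Shapes for layer_size=8, hidden_factor=2
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
W1, b1 = rng.normal(size=(8, 16)) * 0.1, np.zeros(16)
W2, b2 = rng.normal(size=(16, 8)) * 0.1, np.zeros(8)
print(residual_block(x, W1, b1, W2, b2).shape)  # → (4, 8)
```

Because the transformation is added to the input, setting both weight matrices to zero reduces the block to the identity, which is what makes deep stacks of these blocks easy to train.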

Generate Sample Data

We use the same binary classification dataset as in the LightGBM notebook.

[2]:
# Generate binary classification dataset with class imbalance (90% vs 10%)
# Using 2000 samples, 25 features (5 informative, 10 redundant)
X, y = make_classification(
    n_samples=2000,
    n_features=25,
    n_informative=5,
    n_redundant=10,
    n_classes=2,
    n_clusters_per_class=2,
    weights=[0.9, 0.1],  # imbalanced classes
    random_state=23,
)

print(f"Shape of features matrix: {X.shape}")
print(f"Class distribution: {np.bincount(y)}")
Shape of features matrix: (2000, 25)
Class distribution: [1796  204]

Define Hyperparameter Search Space

TabularResNet has two categories of hyperparameters:

Architectural Hyperparameters:

| Hyperparameter | Range | Scale | Description |
|---|---|---|---|
| n_hidden | [1, 4] | log | Number of residual blocks in the network |
| layer_size | [16, 128] | log | Width of hidden representations |
| normalization | {batchnorm, layernorm} | categorical | Type of normalization layer |
| hidden_factor | [1.0, 4.0] | uniform | Expansion factor inside residual blocks |
| hidden_dropout | [0.0, 0.3] | uniform | Dropout rate inside residual blocks |
| residual_dropout | [0.0, 0.2] | uniform | Dropout rate on residual output |

Optimization Hyperparameters:

| Hyperparameter | Range | Scale | Description |
|---|---|---|---|
| lr | [1e-4, 1e-2] | log | Learning rate for optimizer |
| weight_decay | [1e-6, 1e-3] | log | L2 regularization strength |

Note: Batch size is fixed via the batch_size argument of ResNetCVObj and is generally not worth tuning. We use narrower ranges than the recommended defaults for faster optimization in this example.
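The "log" scale in the tables above means the parameter is sampled uniformly in log space rather than in raw value space. A small stdlib-only sketch (illustrative, not FCVOpt's sampler) shows why this matters for a range like lr ∈ [1e-4, 1e-2]:

```python
import math
import random

def sample_log_uniform(low, high, rng):
    """Sample a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)
draws = [sample_log_uniform(1e-4, 1e-2, rng) for _ in range(10_000)]

# Roughly half the draws land in each decade [1e-4, 1e-3) and [1e-3, 1e-2],
# whereas a plain uniform draw over [1e-4, 1e-2] would put ~90% of samples
# above 1e-3, barely exploring the small learning rates at all.
below = sum(d < 1e-3 for d in draws) / len(draws)
print(f"fraction below 1e-3: {below:.2f}")
```

This is why learning rate, weight decay, and the integer-valued size parameters are declared with `log=True` in the next cell.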

[3]:
# Create configuration space for hyperparameter search
config = ConfigurationSpace()

# Architectural hyperparameters
config.add([
    Integer('n_hidden', bounds=(1, 4), log=True),        # Number of residual blocks
    Integer('layer_size', bounds=(16, 128), log=True),   # Hidden layer width
    Categorical('normalization', items=['batchnorm', 'layernorm']),
    Float('hidden_factor', bounds=(1.0, 4.0)),           # Expansion factor in blocks
    Float('hidden_dropout', bounds=(0.0, 0.3)),          # Dropout inside blocks
    Float('residual_dropout', bounds=(0.0, 0.2)),        # Dropout on residual
])

# Optimization hyperparameters
config.add([
    Float('lr', bounds=(1e-4, 1e-2), log=True),
    Float('weight_decay', bounds=(1e-6, 1e-3), log=True),
])

print(config)
Configuration space object:
  Hyperparameters:
    hidden_dropout, Type: UniformFloat, Range: [0.0, 0.3], Default: 0.15
    hidden_factor, Type: UniformFloat, Range: [1.0, 4.0], Default: 2.5
    layer_size, Type: UniformInteger, Range: [16, 128], Default: 45, on log-scale
    lr, Type: UniformFloat, Range: [0.0001, 0.01], Default: 0.001, on log-scale
    n_hidden, Type: UniformInteger, Range: [1, 4], Default: 2, on log-scale
    normalization, Type: Categorical, Choices: {batchnorm, layernorm}, Default: batchnorm
    residual_dropout, Type: UniformFloat, Range: [0.0, 0.2], Default: 0.1
    weight_decay, Type: UniformFloat, Range: [1e-06, 0.001], Default: 3.16227766e-05, on log-scale

Define Cross-Validation Objective

ResNetCVObj implements the full training loop internally. Each call to fit_and_test for a given fold will:

  • Hold out 10% of the training data as an internal validation set

  • Train with mini-batch gradient descent (AdamW by default)

  • Apply early stopping (patience = 15 epochs) and restore the best checkpoint

  • Apply ReduceLROnPlateau learning-rate scheduling

  • Apply gradient norm clipping (limit = 5.0)

The input_preprocessor (here StandardScaler) is fitted on the training split of each fold and applied to both train and test to avoid leakage.

The needs_proba=True flag tells the objective to pass predicted probabilities (sigmoid output) to loss_metric rather than hard labels—required for AUC.
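The per-fold preprocessing described above can be sketched in NumPy. This is a simplified stand-in for what fitting StandardScaler on each training split does (assumed behavior, not ResNetCVObj's internals): the statistics come from the training split only, so the test fold never leaks into the scaling.

```python
import numpy as np

def standardize_fold(X_train, X_test):
    """Fit mean/std on the training split only, then transform both splits.

    This mirrors fitting input_preprocessor on each fold's training data:
    the test fold never influences the scaling statistics (no leakage).
    """
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0          # guard against constant features
    return (X_train - mu) / sigma, (X_test - mu) / sigma

rng = np.random.default_rng(42)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
Xtr, Xte = standardize_fold(X_demo[:80], X_demo[80:])
print(Xtr.mean(axis=0).round(6))  # ~[0. 0. 0.]: zero mean on the training split
```

Note that the test split's mean is only approximately zero after transformation, precisely because its own statistics were never used.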

[4]:
# Define loss metric: minimize (1 - AUC)
def auc_loss(y_true, y_pred):
    return 1 - roc_auc_score(y_true, y_pred)

# Create CV objective that wraps TabularResNet
cv_obj = ResNetCVObj(
    X=X,
    y=y,
    task='binary_classification',  # BCEWithLogitsLoss, single output logit
    loss_metric=auc_loss,
    needs_proba=True,              # AUC requires predicted probabilities
    n_splits=10,
    stratified=True,               # Preserve class distribution in folds
    max_epochs=50,                 # Early stopping may halt training earlier
    batch_size=256,                # Fixed; not tuned
    rng_seed=42,
    input_preprocessor=StandardScaler(),
)

print(f"Created ResNetCVObj")
print(f"Number of CV folds: {cv_obj.cv.get_n_splits()}")
print(f"Training samples: {len(cv_obj.y)}")
Created ResNetCVObj
Number of CV folds: 10
Training samples: 2000

Run Hyperparameter Optimization with FCVOpt

[5]:
# Initialize FCVOpt optimizer
optimizer = FCVOpt(
    obj=cv_obj,                         # CV objective (callable)
    n_folds=cv_obj.cv.get_n_splits(),   # Total number of folds
    config=config,                      # Search space
    acq_function='LCB',
    tracking_dir='./hpt_opt_runs/',
    experiment='resnet_tuning',
    seed=123,
)

# Run optimization
# Note: neural network training makes each evaluation moderately expensive
print("Starting optimization...\n")
best_conf = optimizer.optimize(n_trials=50)
Starting optimization...


Number of candidates evaluated.....: 50
Single-fold observed loss (best)...: 0.0344444
Estimated full CV loss (best)......: 0.0567568

 Best configuration at termination:
 Configuration(values={
  'hidden_dropout': 0.1811278859421,
  'hidden_factor': 4.0,
  'layer_size': 53,
  'lr': 0.01,
  'n_hidden': 4,
  'normalization': 'batchnorm',
  'residual_dropout': 0.0,
  'weight_decay': 1e-06,
})

Evaluate Best Configuration

Evaluate the best hyperparameters found by FCVOpt.

[6]:
# Evaluate the best configuration found by FCVOpt on all 10 folds
best_cv_loss = cv_obj(best_conf)
best_cv_auc = 1 - best_cv_loss

print(f"10-fold CV Loss (1 - AUC): {best_cv_loss:.4f}")
print(f"10-fold CV ROC-AUC:        {best_cv_auc:.4f}")
print(f"\nBest hyperparameters:")
for key, value in best_conf.items():
    print(f"  {key}: {value}")
10-fold CV Loss (1 - AUC): 0.0734
10-fold CV ROC-AUC:        0.9266

Best hyperparameters:
  hidden_dropout: 0.1811278859421
  hidden_factor: 4.0
  layer_size: 53
  lr: 0.01
  n_hidden: 4
  normalization: batchnorm
  residual_dropout: 0.0
  weight_decay: 1e-06