{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tuning TabularResNet Hyperparameters\n", "\n", "This example demonstrates how to use FCVOpt to tune hyperparameters for TabularResNet, a deep learning architecture designed for tabular data (Gorishniy et al., 2021).\n", "\n", "`ResNetCVObj` implements the full training loop in pure PyTorch; no additional training library is required. It includes early stopping, learning-rate scheduling, and gradient clipping out of the box.\n", "\n", "Key features:\n", "- Uses `ResNetCVObj`, which subclasses `CVObjective` and implements `fit_and_test` with a self-contained PyTorch training loop\n", "- Both architectural and optimization hyperparameters are tuned\n", "- Batch size is fixed at construction time, not tuned (see `batch_size` argument)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from sklearn.datasets import make_classification\n", "from sklearn.metrics import roc_auc_score\n", "from sklearn.preprocessing import StandardScaler\n", "\n", "# FCVOpt imports\n", "from fcvopt.crossvalidation import ResNetCVObj\n", "from fcvopt.optimizers import FCVOpt\n", "from fcvopt.configspace import ConfigurationSpace\n", "from ConfigSpace import Integer, Float, Categorical" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Understanding TabularResNet\n", "\n", "TabularResNet (Gorishniy et al., 2021) is a deep learning architecture specifically designed for tabular data. It consists of:\n", "\n", "### Architecture Components:\n", "\n", "1. **Input Stem**: A single fully-connected layer that projects input features to a hidden dimension\n", " \n", "2. **Residual Blocks**: The core building blocks, each computing:\n", " ```\n", " x_out = x_in + Dropout( Linear( Dropout( ReLU( Linear( Norm(x_in) ) ) ) ) )\n", " ```\n", " \n", " Each block includes:\n", " - **Normalization**: Either BatchNorm or LayerNorm\n", " - **Two Linear layers**: First expands by `hidden_factor`, second projects back\n", " - **ReLU activation**: Between the two linear layers\n", " - **Dropout**: Applied twice (hidden and residual paths)\n", " - **Residual connection**: Adds input to transformed output\n", "\n", "3. **Prediction Head**: Norm → ReLU → Linear to output dimension" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generate Sample Data\n", "\n", "We use the same binary classification dataset as in the LightGBM notebook." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Shape of features matrix: (2000, 25)\n", "Class distribution: [1796 204]\n" ] } ], "source": [ "# Generate binary classification dataset with class imbalance (90% vs 10%)\n", "# Using 2000 samples, 25 features (5 informative, 10 redundant)\n", "X, y = make_classification(\n", " n_samples=2000,\n", " n_features=25,\n", " n_informative=5,\n", " n_redundant=10,\n", " n_classes=2, n_clusters_per_class=2,\n", " weights=[0.9, 0.1], # imbalanced data\n", " random_state=23\n", ")\n", "\n", "print(f\"Shape of features matrix: {X.shape}\")\n", "print(f\"Class distribution: {np.bincount(y)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define Hyperparameter Search Space\n", "\n", "TabularResNet has two categories of hyperparameters:\n", "\n", "**Architectural Hyperparameters**:\n", "\n", "| Hyperparameter | Range | Scale | Description |\n", "|----------------|-------|-------|-------------|\n", "| `n_hidden` | [1, 4] | log | Number of residual blocks in the network |\n", "| `layer_size` | [16, 128] | log | Width of hidden representations |\n", "| 
`normalization` | {batchnorm, layernorm} | - | Type of normalization layer |\n", "| `hidden_factor` | [1.0, 4.0] | uniform | Expansion factor inside residual blocks |\n", "| `hidden_dropout` | [0.0, 0.3] | uniform | Dropout rate inside residual blocks |\n", "| `residual_dropout` | [0.0, 0.2] | uniform | Dropout rate on residual output |\n", "\n", "**Optimization Hyperparameters**\n", "\n", "| Hyperparameter | Range | Scale | Description |\n", "|----------------|-------|-------|-------------|\n", "| `lr` | [1e-4, 1e-2] | log | Learning rate for optimizer |\n", "| `weight_decay` | [1e-6, 1e-3] | log | L2 regularization strength |\n", "\n", "Note: Batch size is fixed via the `batch_size` argument of `ResNetCVObj` and is generally not worth tuning. We use narrower ranges than the recommended defaults for faster optimization in this example." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Configuration space object:\n", " Hyperparameters:\n", " hidden_dropout, Type: UniformFloat, Range: [0.0, 0.3], Default: 0.15\n", " hidden_factor, Type: UniformFloat, Range: [1.0, 4.0], Default: 2.5\n", " layer_size, Type: UniformInteger, Range: [16, 128], Default: 45, on log-scale\n", " lr, Type: UniformFloat, Range: [0.0001, 0.01], Default: 0.001, on log-scale\n", " n_hidden, Type: UniformInteger, Range: [1, 4], Default: 2, on log-scale\n", " normalization, Type: Categorical, Choices: {batchnorm, layernorm}, Default: batchnorm\n", " residual_dropout, Type: UniformFloat, Range: [0.0, 0.2], Default: 0.1\n", " weight_decay, Type: UniformFloat, Range: [1e-06, 0.001], Default: 3.16227766e-05, on log-scale\n", "\n" ] } ], "source": [ "# Create configuration space for hyperparameter search\n", "config = ConfigurationSpace()\n", "\n", "# Architectural hyperparameters\n", "config.add([\n", " Integer('n_hidden', bounds=(1, 4), log=True), # Number of residual blocks\n", " Integer('layer_size', bounds=(16, 
128), log=True), # Hidden layer width\n", " Categorical('normalization', items=['batchnorm', 'layernorm']),\n", " Float('hidden_factor', bounds=(1.0, 4.0)), # Expansion factor in blocks\n", " Float('hidden_dropout', bounds=(0.0, 0.3)), # Dropout inside blocks\n", " Float('residual_dropout', bounds=(0.0, 0.2)), # Dropout on residual\n", "])\n", "\n", "# Optimization hyperparameters\n", "config.add([\n", " Float('lr', bounds=(1e-4, 1e-2), log=True),\n", " Float('weight_decay', bounds=(1e-6, 1e-3), log=True),\n", "])\n", "\n", "print(config)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define Cross-Validation Objective\n", "\n", "`ResNetCVObj` implements the full training loop internally. Each call to `fit_and_test` for a given fold will:\n", "\n", "- Hold out 10% of the training data as an internal validation set\n", "- Train with mini-batch gradient descent (AdamW by default)\n", "- Apply early stopping (patience = 15 epochs) and restore the best checkpoint\n", "- Apply `ReduceLROnPlateau` learning-rate scheduling\n", "- Apply gradient norm clipping (limit = 5.0)\n", "\n", "The `input_preprocessor` (here `StandardScaler`) is fitted on the training split of each fold and applied to both the training and test splits, avoiding leakage.\n", "\n", "The `needs_proba=True` flag tells the objective to pass predicted probabilities (sigmoid output) to `loss_metric` rather than hard labels, as required for AUC."
 ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Created ResNetCVObj\n", "Number of CV folds: 10\n", "Training samples: 2000\n" ] } ], "source": [ "# Define loss metric: minimize (1 - AUC)\n", "def auc_loss(y_true, y_pred):\n", " return 1 - roc_auc_score(y_true, y_pred)\n", "\n", "# Create CV objective that wraps TabularResNet\n", "cv_obj = ResNetCVObj(\n", " X=X,\n", " y=y,\n", " task='binary_classification', # BCEWithLogitsLoss, single output logit\n", " loss_metric=auc_loss,\n", " needs_proba=True, # AUC requires predicted probabilities\n", " n_splits=10,\n", " stratified=True, # Preserve class distribution in folds\n", " max_epochs=50, # Early stopping may halt training earlier\n", " batch_size=256, # Fixed; not tuned\n", " rng_seed=42,\n", " input_preprocessor=StandardScaler(),\n", ")\n", "\n", "print(\"Created ResNetCVObj\")\n", "print(f\"Number of CV folds: {cv_obj.cv.get_n_splits()}\")\n", "print(f\"Training samples: {len(cv_obj.y)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run Hyperparameter Optimization with FCVOpt" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Starting optimization...\n", "\n", "\n", "Number of candidates evaluated.....: 50\n", "Single-fold observed loss (best)...: 0.0344444\n", "Estimated full CV loss (best)......: 0.0567568\n", "\n", " Best configuration at termination:\n", " Configuration(values={\n", " 'hidden_dropout': 0.1811278859421,\n", " 'hidden_factor': 4.0,\n", " 'layer_size': 53,\n", " 'lr': 0.01,\n", " 'n_hidden': 4,\n", " 'normalization': 'batchnorm',\n", " 'residual_dropout': 0.0,\n", " 'weight_decay': 1e-06,\n", "})\n" ] } ], "source": [ "# Initialize FCVOpt optimizer\n", "optimizer = FCVOpt(\n", " obj=cv_obj, # CV objective (callable)\n", " n_folds=cv_obj.cv.get_n_splits(), # Total number of folds\n", " config=config, # Search space\n", " acq_function='LCB',\n", " tracking_dir='./hpt_opt_runs/',\n", " experiment='resnet_tuning',\n", " seed=123,\n", ")\n", "\n", "# Run optimization\n", "# Note: neural network training makes each evaluation moderately expensive\n", "print(\"Starting optimization...\\n\")\n", "best_conf = optimizer.optimize(n_trials=50)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluate Best Configuration\n", "\n", "Re-run the full 10-fold cross-validation with the best hyperparameters found by FCVOpt." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10-fold CV Loss (1 - AUC): 0.0734\n", "10-fold CV ROC-AUC: 0.9266\n", "\n", "Best hyperparameters:\n", " hidden_dropout: 0.1811278859421\n", " hidden_factor: 4.0\n", " layer_size: 53\n", " lr: 0.01\n", " n_hidden: 4\n", " normalization: batchnorm\n", " residual_dropout: 0.0\n", " weight_decay: 1e-06\n" ] } ], "source": [ "# Evaluate the best configuration found by FCVOpt on all 10 folds\n", "best_cv_loss = cv_obj(best_conf)\n", "best_cv_auc = 1 - best_cv_loss\n", "\n", "print(f\"10-fold CV Loss (1 - AUC): {best_cv_loss:.4f}\")\n", "print(f\"10-fold CV ROC-AUC: {best_cv_auc:.4f}\")\n", "print(\"\\nBest hyperparameters:\")\n", "for key, value in best_conf.items():\n", " print(f\" {key}: {value}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "fcvopt_test (3.10.19)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.19" } }, "nbformat": 4, "nbformat_minor": 4 }