{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tuning TabularResNet Hyperparameters\n", "\n", "This example demonstrates how to use FCVOpt to tune hyperparameters for TabularResNet, a deep learning architecture designed for tabular data (Gorishniy et al., 2021).\n", "\n", "`ResNetCVObj` implements the full training loop in pure PyTorch; no additional training library is required. It includes early stopping, learning-rate scheduling, and gradient clipping out of the box.\n", "\n", "Key features:\n", "- Uses `ResNetCVObj`, which subclasses `CVObjective` and implements `fit_and_test` with a self-contained PyTorch training loop\n", "- Both architectural and optimization hyperparameters are tuned\n", "- Batch size is fixed at construction time, not tuned (see `batch_size` argument)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from sklearn.datasets import make_classification\n", "from sklearn.metrics import roc_auc_score\n", "from sklearn.preprocessing import StandardScaler\n", "\n", "# FCVOpt imports\n", "from fcvopt.crossvalidation import ResNetCVObj\n", "from fcvopt.optimizers import FCVOpt\n", "from fcvopt.configspace import ConfigurationSpace\n", "from ConfigSpace import Integer, Float, Categorical" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Understanding TabularResNet\n", "\n", "TabularResNet (Gorishniy et al., 2021) is a deep learning architecture specifically designed for tabular data. It consists of:\n", "\n", "### Architecture Components:\n", "\n", "1. **Input Stem**: A single fully-connected layer that projects input features to a hidden dimension\n", " \n", "2. **Residual Blocks**: The core building blocks, each computing:\n", " ```\n", " x_out = x_in + Dropout( Linear( Dropout( ReLU( Linear( Norm(x_in) ) ) ) ) )\n", " ```\n", " \n", " Each block includes:\n", " - **Normalization**: Either BatchNorm or LayerNorm\n", " - **Two Linear layers**: First expands by `hidden_factor`, second projects back\n", " - **ReLU activation**: Between the two linear layers\n", " - **Dropout**: Applied twice (hidden and residual paths)\n", " - **Residual connection**: Adds input to transformed output\n", "\n", "3. **Prediction Head**: Norm → ReLU → Linear to output dimension" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generate Sample Data\n", "\n", "We use the same binary classification dataset as in the LightGBM notebook." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Shape of features matrix: (2000, 25)\n", "Class distribution: [1796 204]\n" ] } ], "source": [ "# Generate binary classification dataset with class imbalance (90% vs 10%)\n", "# Using 2000 samples, 25 features (5 informative, 10 redundant)\n", "X, y = make_classification(\n", " n_samples=2000,\n", " n_features=25,\n", " n_informative=5,\n", " n_redundant=10,\n", " n_classes=2, n_clusters_per_class=2,\n", " weights=[0.9, 0.1], # imbalanced data\n", " random_state=23\n", ")\n", "\n", "print(f\"Shape of features matrix: {X.shape}\")\n", "print(f\"Class distribution: {np.bincount(y)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define Hyperparameter Search Space\n", "\n", "TabularResNet has two categories of hyperparameters:\n", "\n", "**Architectural Hyperparameters**:\n", "\n", "| Hyperparameter | Range | Scale | Description |\n", "|----------------|-------|-------|-------------|\n", "| `n_hidden` | [1, 4] | log | Number of residual blocks in the network |\n", "| `layer_size` | [16, 128] | log | Width of hidden representations |\n", "| 
`normalization` | {batchnorm, layernorm} | - | Type of normalization layer |\n", "| `hidden_factor` | [1.0, 4.0] | uniform | Expansion factor inside residual blocks |\n", "| `hidden_dropout` | [0.0, 0.3] | uniform | Dropout rate inside residual blocks |\n", "| `residual_dropout` | [0.0, 0.2] | uniform | Dropout rate on residual output |\n", "\n", "**Optimization Hyperparameters**\n", "\n", "| Hyperparameter | Range | Scale | Description |\n", "|----------------|-------|-------|-------------|\n", "| `lr` | [1e-4, 1e-2] | log | Learning rate for optimizer |\n", "| `weight_decay` | [1e-6, 1e-3] | log | L2 regularization strength |\n", "\n", "Note: Batch size is fixed via the `batch_size` argument of `ResNetCVObj` and is generally not worth tuning. We use narrower ranges than the recommended defaults for faster optimization in this example." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Configuration space object:\n", " Hyperparameters:\n", " hidden_dropout, Type: UniformFloat, Range: [0.0, 0.3], Default: 0.15\n", " hidden_factor, Type: UniformFloat, Range: [1.0, 4.0], Default: 2.5\n", " layer_size, Type: UniformInteger, Range: [16, 128], Default: 45, on log-scale\n", " lr, Type: UniformFloat, Range: [0.0001, 0.01], Default: 0.001, on log-scale\n", " n_hidden, Type: UniformInteger, Range: [1, 4], Default: 2, on log-scale\n", " normalization, Type: Categorical, Choices: {batchnorm, layernorm}, Default: batchnorm\n", " residual_dropout, Type: UniformFloat, Range: [0.0, 0.2], Default: 0.1\n", " weight_decay, Type: UniformFloat, Range: [1e-06, 0.001], Default: 3.16227766e-05, on log-scale\n", "\n" ] } ], "source": [ "# Create configuration space for hyperparameter search\n", "config = ConfigurationSpace()\n", "\n", "# Architectural hyperparameters\n", "config.add([\n", " Integer('n_hidden', bounds=(1, 4), log=True), # Number of residual blocks\n", " Integer('layer_size', bounds=(16, 
128), log=True), # Hidden layer width\n", " Categorical('normalization', items=['batchnorm', 'layernorm']),\n", " Float('hidden_factor', bounds=(1.0, 4.0)), # Expansion factor in blocks\n", " Float('hidden_dropout', bounds=(0.0, 0.3)), # Dropout inside blocks\n", " Float('residual_dropout', bounds=(0.0, 0.2)), # Dropout on residual\n", "])\n", "\n", "# Optimization hyperparameters\n", "config.add([\n", " Float('lr', bounds=(1e-4, 1e-2), log=True),\n", " Float('weight_decay', bounds=(1e-6, 1e-3), log=True),\n", "])\n", "\n", "print(config)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define Cross-Validation Objective\n", "\n", "`ResNetCVObj` implements the full training loop internally. Each call to `fit_and_test` for a given fold will:\n", "\n", "- Hold out 10% of the training data as an internal validation set\n", "- Train with mini-batch gradient descent (AdamW by default)\n", "- Apply early stopping (patience = 15 epochs) and restore the best checkpoint\n", "- Apply `ReduceLROnPlateau` learning-rate scheduling\n", "- Apply gradient norm clipping (limit = 5.0)\n", "\n", "The `input_preprocessor` (here `StandardScaler`) is fitted on the training split of each fold and applied to both the training and test splits, avoiding leakage.\n", "\n", "The `needs_proba=True` flag tells the objective to pass predicted probabilities (sigmoid output) to `loss_metric` rather than hard labels, as required for AUC."
 ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Created ResNetCVObj\n", "Number of CV folds: 10\n", "Training samples: 2000\n" ] } ], "source": [ "# Define loss metric: minimize (1 - AUC)\n", "def auc_loss(y_true, y_pred):\n", " return 1 - roc_auc_score(y_true, y_pred)\n", "\n", "# Create CV objective that wraps TabularResNet\n", "cv_obj = ResNetCVObj(\n", " X=X,\n", " y=y,\n", " task='binary_classification', # BCEWithLogitsLoss, single output logit\n", " loss_metric=auc_loss,\n", " needs_proba=True, # AUC requires predicted probabilities\n", " n_splits=10,\n", " stratified=True, # Preserve class distribution in folds\n", " max_epochs=50, # Early stopping may halt training earlier\n", " batch_size=256, # Fixed; not tuned\n", " rng_seed=42,\n", " input_preprocessor=StandardScaler(),\n", ")\n", "\n", "print(\"Created ResNetCVObj\")\n", "print(f\"Number of CV folds: {cv_obj.cv.get_n_splits()}\")\n", "print(f\"Training samples: {len(cv_obj.y)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run Hyperparameter Optimization with FCVOpt" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Starting optimization...\n", "\n", "\n", "Number of candidates evaluated.....: 50\n", "Single-fold observed loss (best)...: 0.0344444\n", "Estimated full CV loss (best)......: 0.0567568\n", "\n", " Best configuration at termination:\n", " Configuration(values={\n", " 'hidden_dropout': 0.1811278859421,\n", " 'hidden_factor': 4.0,\n", " 'layer_size': 53,\n", " 'lr': 0.01,\n", " 'n_hidden': 4,\n", " 'normalization': 'batchnorm',\n", " 'residual_dropout': 0.0,\n", " 'weight_decay': 1e-06,\n", "})\n" ] } ], "source": [ "# Initialize FCVOpt optimizer\n", "optimizer = FCVOpt(\n", " obj=cv_obj, # CV objective (callable)\n", " n_folds=cv_obj.cv.get_n_splits(), # Total number of folds\n", " config=config, # Search space\n", " acq_function='LCB',\n", " tracking_dir='./hpt_opt_runs/',\n", " experiment='resnet_tuning',\n", " seed=123,\n", ")\n", "\n", "# Run optimization\n", "# Note: neural network training makes each evaluation moderately expensive\n", "print(\"Starting optimization...\\n\")\n", "best_conf = optimizer.optimize(n_trials=50)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluate Best Configuration\n", "\n", "Re-run the full 10-fold cross-validation with the best hyperparameters found by FCVOpt." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10-fold CV Loss (1 - AUC): 0.0734\n", "10-fold CV ROC-AUC: 0.9266\n", "\n", "Best hyperparameters:\n", " hidden_dropout: 0.1811278859421\n", " hidden_factor: 4.0\n", " layer_size: 53\n", " lr: 0.01\n", " n_hidden: 4\n", " normalization: batchnorm\n", " residual_dropout: 0.0\n", " weight_decay: 1e-06\n" ] } ], "source": [ "# Evaluate the best configuration found by FCVOpt on all 10 folds\n", "best_cv_loss = cv_obj(best_conf)\n", "best_cv_auc = 1 - best_cv_loss\n", "\n", "print(f\"10-fold CV Loss (1 - AUC): {best_cv_loss:.4f}\")\n", "print(f\"10-fold CV ROC-AUC: {best_cv_auc:.4f}\")\n", "print(\"\\nBest hyperparameters:\")\n", "for key, value in best_conf.items():\n", " print(f\" {key}: {value}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "fcvopt_test (3.10.19)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.19" } }, "nbformat": 4, "nbformat_minor": 4 }