Hyperparameters & tuning

The default settings in genboostgpu.enet_boosting.boosting_elastic_net() work well for exploratory runs, but large cohorts benefit from carefully tuned parameters. This page summarises the main knobs and shows how to automate the search with Optuna.

Core boosting parameters

n_iter (default 50)

Maximum boosting iterations. Early stopping typically halts before this limit; increase when signals are weak.

batch_size (default 500)

Size of the working set evaluated per iteration. Larger values improve model stability but consume more memory. For windows with many SNPs (high M), consider setting working_set={"K": 2048, "refresh": 5}.

n_trials (default 20)

Number of Optuna trials used to tune ElasticNet hyperparameters per window. When fixed_alpha/fixed_l1_ratio are provided, tuning is skipped and n_trials is coerced to 1.

alphas (default (0.1, 1.0))

Range of ElasticNet alpha values searched by Optuna. Pass a (low, high) tuple; widen the bounds to enlarge the search space.

l1_ratios (default (0.1, 0.9))

Range of ElasticNet l1_ratio values.
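To make the two ranges concrete, the sketch below shows how an (alpha, l1_ratio) pair splits into separate L1 and L2 penalty weights. It assumes the scikit-learn ElasticNet parameterisation, alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2; whether genboostgpu follows exactly this convention is an assumption, and enet_penalty_weights is a hypothetical helper, not part of the library.

```python
def enet_penalty_weights(alpha, l1_ratio):
    """Split (alpha, l1_ratio) into L1 and L2 penalty weights.

    Assumes the scikit-learn ElasticNet convention; this helper is
    illustrative and not part of genboostgpu.
    """
    if not 0.0 <= l1_ratio <= 1.0:
        raise ValueError("l1_ratio must lie in [0, 1]")
    l1_weight = alpha * l1_ratio                # multiplies ||w||_1
    l2_weight = 0.5 * alpha * (1.0 - l1_ratio)  # multiplies ||w||_2^2
    return l1_weight, l2_weight


# Corners of the default search box: alphas=(0.1, 1.0), l1_ratios=(0.1, 0.9)
for alpha in (0.1, 1.0):
    for l1_ratio in (0.1, 0.9):
        print(alpha, l1_ratio, enet_penalty_weights(alpha, l1_ratio))
```

Under this convention, l1_ratio near 1 gives a sparser, lasso-like fit, while l1_ratio near 0 approaches ridge regression.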

subsample_frac (default 0.7)

Fraction of samples used within each Optuna trial. Reducing this speeds up tuning at the cost of more variance.

ridge_grid (default (1e-3, ..., 10))

Candidate ridge regression alphas evaluated during the final refit. Provide a tuple of floats or integers.

val_frac (default 0.2)

Fraction of samples kept aside for validation when early stopping monitors val_r2.

patience / min_delta / warmup (defaults 5 / 1e-4 / 5)

Early stopping controls. warmup delays checks, min_delta is the minimum improvement, and patience counts how many stagnant iterations to allow.
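The interaction of the three controls can be sketched as follows. This is a generic early-stopping rule written for illustration; the exact bookkeeping inside boosting_elastic_net() may differ (for example in whether stagnant iterations are counted during warmup), and early_stop_iteration is a hypothetical name.

```python
def early_stop_iteration(val_scores, patience=5, min_delta=1e-4, warmup=5):
    """Return the 0-based iteration at which training would stop, or None.

    Illustrative rule only: a score counts as an improvement when it beats
    the best score so far by more than min_delta. During the first `warmup`
    iterations the best score is tracked but no stop check is made.
    """
    best = float("-inf")
    stagnant = 0
    for i, score in enumerate(val_scores):
        improved = score > best + min_delta
        if improved:
            best = score
        if i < warmup:
            continue  # warmup: never stop, do not count stagnation
        stagnant = 0 if improved else stagnant + 1
        if stagnant >= patience:
            return i
    return None


# val_r2 plateaus at 0.30 from iteration 2 onwards
scores = [0.10, 0.20, 0.30, 0.30, 0.30, 0.30, 0.30, 0.30]
print(early_stop_iteration(scores, patience=3, warmup=2))  # prints 5
```

A larger patience tolerates longer plateaus, and a larger min_delta treats small gains as stagnation, so the two should be adjusted together.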

random_state (default 13)

Seed used for validation splits and Optuna sampling.
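As an illustration of why a fixed seed matters for val_frac, the sketch below builds a reproducible hold-out split with the standard library. It shows the general idea only; the library's actual splitting code is not documented here, and split_validation is a hypothetical helper.

```python
import random


def split_validation(n_samples, val_frac=0.2, random_state=13):
    """Return (train_idx, val_idx) for a seeded random hold-out split.

    Hypothetical helper for illustration; not the genboostgpu implementation.
    """
    rng = random.Random(random_state)  # local RNG, leaves global state alone
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_val = max(1, int(round(n_samples * val_frac)))
    return sorted(indices[n_val:]), sorted(indices[:n_val])


train_idx, val_idx = split_validation(100)
print(len(train_idx), len(val_idx))  # prints 80 20
```

Calling the function twice with the same random_state returns identical index lists, which is what makes val_r2 curves comparable across runs.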

working_set

Dict with "K" (number of SNPs to evaluate) and "refresh" (how often to recompute correlations). Use it to stabilise runtimes on very large windows.
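A rough sketch of how a working set with "K" and "refresh" could behave is shown below. It assumes SNPs are ranked by the absolute value of some per-SNP score (such as correlation with the current residual) and that the ranking is recomputed every "refresh" iterations. The source does not state the ranking rule, so both helpers are hypothetical.

```python
def top_k_by_abs_score(scores, k):
    """Indices of the k entries with the largest absolute score (assumed rule)."""
    order = sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)
    return sorted(order[:k])


def refresh_iterations(n_iter, refresh):
    """Iterations at which the working set would be recomputed."""
    return [i for i in range(n_iter) if i % refresh == 0]


# Five SNPs, keep the two most strongly (anti-)correlated ones
print(top_k_by_abs_score([0.1, -0.9, 0.4, 0.05, -0.6], k=2))  # prints [1, 4]
# With refresh=5, scores are recomputed at iterations 0, 5, and 10
print(refresh_iterations(12, refresh=5))  # prints [0, 5, 10]
```

Under these assumptions, per-iteration cost scales with K rather than with the full window size, which is why the setting helps stabilise runtimes.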

Global tuning workflow

For cohort-wide defaults, use the helpers in genboostgpu.tuning:

  1. genboostgpu.tuning.select_tuning_windows() stratifies a subset of windows based on SNP counts and chromosomes.

  2. genboostgpu.tuning.global_tune_params() converts high-level targets (c_lambda, c_ridge, subsample_frac) into per-window ElasticNet parameters via genboostgpu.hyperparams.enet_from_targets(). The helper reuses the same Optuna and ridge-refit stack as the per-window runs. When multiple GPUs are present, it evaluates windows in parallel using the max_in_flight defaults, which shortens global sweeps.

  3. Pass the resulting dictionary to the fixed_params callback in genboostgpu.orchestration.run_windows_with_dask().
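The stratification idea in step 1 can be sketched with the standard library. The code below is not the select_tuning_windows() implementation, and its signature is not assumed here. The window fields ("chrom", "n_snps"), the bin edges, and the function name are all hypothetical, chosen to show sampling across chromosomes and SNP-count bins.

```python
import random
from collections import defaultdict


def stratified_window_sample(windows, per_stratum=1, bin_edges=(100, 1000), seed=13):
    """Pick up to per_stratum windows from each (chromosome, size-bin) stratum.

    Hypothetical illustration; field names and bin edges are assumptions.
    """
    def size_bin(n_snps):
        # 0 = small, 1 = medium, 2 = large for the default edges
        return sum(n_snps >= edge for edge in bin_edges)

    strata = defaultdict(list)
    for w in windows:
        strata[(w["chrom"], size_bin(w["n_snps"]))].append(w)

    rng = random.Random(seed)
    picked = []
    for key in sorted(strata):  # fixed order keeps the sample reproducible
        group = strata[key]
        picked.extend(rng.sample(group, min(per_stratum, len(group))))
    return picked


windows = [
    {"chrom": "1", "n_snps": 50},
    {"chrom": "1", "n_snps": 60},
    {"chrom": "1", "n_snps": 5000},
    {"chrom": "2", "n_snps": 500},
]
print(len(stratified_window_sample(windows)))  # prints 3
```

Sampling per stratum keeps small and large windows represented in the tuning subset, so the global defaults are not dominated by the most common window size.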

Optuna integration

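The boosting core calls optuna.create_study with a median pruner and the default sampler. Optuna has no global seed, so to obtain deterministic behaviour, wrap create_study so that every study receives a seeded sampler before launching any windows:

import functools
import optuna
from optuna.samplers import TPESampler

from genboostgpu.enet_boosting import boosting_elastic_net

# Patch the top-level name that the boosting core calls. The partial captures
# the original function before the assignment replaces it. This only takes
# effect if the core looks up optuna.create_study at call time.
optuna.create_study = functools.partial(
    optuna.create_study,
    sampler=TPESampler(seed=42, multivariate=True),
)

# X, y, and snp_ids are assumed to be already loaded for the window.
results = boosting_elastic_net(
    X, y, snp_ids,
    n_trials=30,
    alphas=(0.01, 1.0),
    l1_ratios=(0.05, 0.95),
    random_state=42,
)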

If you prefer not to monkey-patch, narrow alphas/l1_ratios to reduce run-to-run variation, or set fixed_alpha/fixed_l1_ratio to skip the stochastic search entirely. Per-window seeds can be threaded through fixed_params and stored alongside outputs for later replay.
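One way to obtain replayable per-window seeds is to derive them from a base seed and a stable window identifier, as in the sketch below. The helper name and the identifier format are hypothetical, and how fixed_params consumes a seed is not specified here, so treat this as an assumption-laden illustration.

```python
import hashlib


def window_seed(base_seed, window_id):
    """Derive a stable 32-bit seed from a base seed and a window identifier.

    Hypothetical helper; uses SHA-256 rather than hash() so the result does
    not change between Python processes.
    """
    digest = hashlib.sha256(f"{base_seed}:{window_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


# Record the seed next to each window's output so the run can be replayed
seed_log = {wid: window_seed(42, wid) for wid in ("chr1:1000-2000", "chr2:5000-9000")}
print(seed_log)
```

Because the seed depends only on the base seed and the window identifier, a single window can be rerun in isolation and still see the same random stream.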