Hyperparameters & tuning
The default settings in genboostgpu.enet_boosting.boosting_elastic_net()
work well for exploratory runs, but large cohorts benefit from carefully tuned
parameters. This page summarises the main knobs and shows how to automate the
search with Optuna.
Core boosting parameters
n_iter (default 50)
    Maximum number of boosting iterations. Early stopping usually halts before this limit; increase it when signals are weak.
batch_size (default 500)
    Size of the working set evaluated per iteration. Larger values improve model stability but consume more memory. For high-M windows (many SNPs), consider setting working_set={"K": 2048, "refresh": 5}.
n_trials (default 20)
    Number of Optuna trials used to tune the ElasticNet hyperparameters in each window. When fixed_alpha/fixed_l1_ratio are provided, tuning is skipped and n_trials is coerced to 1.
alphas (default (0.1, 1.0))
    Range of ElasticNet alpha values searched by Optuna. Provide a tuple; a wider (low, high) pair enlarges the search space.
l1_ratios (default (0.1, 0.9))
    Range of ElasticNet l1_ratio values searched by Optuna.
subsample_frac (default 0.7)
    Fraction of samples used within each Optuna trial. Lower values speed up tuning at the cost of noisier trial scores.
ridge_grid (default (1e-3, ..., 10))
    Candidate ridge regression alphas evaluated during the final refit. Provide a tuple of floats or integers.
val_frac (default 0.2)
    Fraction of samples held out for validation when early stopping monitors val_r2.
patience / min_delta / warmup (defaults 5 / 1e-4 / 5)
    Early stopping controls. warmup delays the first check, min_delta is the minimum improvement that counts as progress, and patience is the number of stagnant iterations tolerated before stopping.
random_state (default 13)
    Seed used for validation splits and Optuna sampling.
working_set
    Dict with "K" (number of SNPs to evaluate) and "refresh" (how often to recompute correlations). Use it to stabilise runtimes on very large windows.
Global tuning workflow
For cohort-wide defaults, use the helpers in genboostgpu.tuning:
- genboostgpu.tuning.select_tuning_windows() selects a stratified subset of windows based on SNP counts and chromosomes.
- genboostgpu.tuning.global_tune_params() converts high-level targets (c_lambda, c_ridge, subsample_frac) into per-window ElasticNet parameters via genboostgpu.hyperparams.enet_from_targets(). It reuses the same Optuna and ridge-refit stack as the per-window runs and, when multiple GPUs are present, parallelises evaluation using the max_in_flight defaults so global sweeps finish quickly.

Pass the resulting dictionary to the fixed_params callback in genboostgpu.orchestration.run_windows_with_dask().
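The stratification performed by select_tuning_windows() can be pictured with a small standard-library sketch: windows are grouped by chromosome and SNP-count quantile bin, and a few are drawn from each group so the tuning subset covers both small and large windows on every chromosome. The window layout (dicts with "chrom" and "n_snps"), the quantile binning and the name stratified_windows are assumptions for illustration; the real helper's signature and strategy may differ.

```python
import random
from collections import defaultdict


def stratified_windows(windows, per_stratum=2, n_bins=3, seed=13):
    """Hypothetical stand-in for select_tuning_windows().

    windows: list of dicts with at least "chrom" and "n_snps" keys.
    Returns up to `per_stratum` windows from each (chromosome, SNP-count bin) stratum.
    """
    counts = sorted(w["n_snps"] for w in windows)
    # Bin edges at the SNP-count quantiles across all windows.
    edges = [counts[int(len(counts) * k / n_bins)] for k in range(1, n_bins)]

    strata = defaultdict(list)
    for w in windows:
        bin_id = sum(w["n_snps"] >= e for e in edges)
        strata[(w["chrom"], bin_id)].append(w)

    rng = random.Random(seed)  # seeded so the same subset is chosen on every run
    chosen = []
    for key in sorted(strata):
        group = strata[key]
        chosen.extend(rng.sample(group, min(per_stratum, len(group))))
    return chosen


gen = random.Random(0)
windows = [
    {"chrom": c, "start": i * 1_000_000, "n_snps": gen.randint(50, 5000)}
    for c in (1, 2, 3)
    for i in range(40)
]
subset = stratified_windows(windows, per_stratum=2)
print(len(subset), "tuning windows")
```

Sampling per stratum rather than uniformly keeps rare, very large windows in the tuning set, which matters because memory-related settings such as batch_size and working_set behave differently there.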
Optuna integration
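The boosting core creates each study with optuna.create_study, using a median
pruner and Optuna's default sampler (TPE, unseeded). Optuna has no global seed;
determinism comes from seeding the sampler. One way to achieve it is to patch
optuna.create_study before launching any windows so every study gets a freshly
seeded sampler: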
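import functools

import optuna
from optuna.samplers import TPESampler

from genboostgpu.enet_boosting import boosting_elastic_net

_original_create_study = optuna.create_study

@functools.wraps(_original_create_study)
def _seeded_create_study(*args, **kwargs):
    # New seeded sampler per study, unless the caller passes its own. Only affects
    # lookups of optuna.create_study in this process; Dask workers need the same patch.
    kwargs.setdefault("sampler", TPESampler(seed=42, multivariate=True))
    return _original_create_study(*args, **kwargs)

optuna.create_study = _seeded_create_study

# X, y, snp_ids: genotype matrix, phenotype vector and SNP identifiers for one window
results = boosting_elastic_net(
    X, y, snp_ids,
    n_trials=30,
    alphas=(0.01, 1.0),
    l1_ratios=(0.05, 0.95),
    random_state=42,
)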
If you prefer not to monkey-patch, narrow alphas/l1_ratios and rely on
fixed_alpha/fixed_l1_ratio, which skip the Optuna search and its randomness.
Per-window seeds can be threaded through fixed_params and stored alongside the
outputs so individual windows can be replayed later.
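A replayable per-window seed can be derived deterministically from a base seed and a stable window key, then saved with that window's outputs. The sketch below uses hashlib.blake2b so the seed does not depend on Python's per-process hash randomisation. The key format and the helper name window_seed are hypothetical, and how the seed is consumed (for example as random_state when rerunning one window) is left to your own fixed_params callback.

```python
import hashlib


def window_seed(base_seed, chrom, start, end):
    """Derive a reproducible 32-bit seed from a base seed and window coordinates."""
    key = f"{base_seed}:{chrom}:{start}:{end}".encode()
    # blake2b is stable across processes and machines, unlike the built-in hash().
    digest = hashlib.blake2b(key, digest_size=4).digest()
    return int.from_bytes(digest, "little")


windows = [("1", 0, 1_000_000), ("1", 1_000_000, 2_000_000), ("2", 0, 1_000_000)]
seeds = {w: window_seed(42, *w) for w in windows}
# Store seeds[w] next to each window's results; rerunning that window with the
# same seed (e.g. random_state=seeds[w]) reproduces its splits and sampling.
for w, s in seeds.items():
    print(w, s)
```

Hashing coordinates rather than counting windows in submission order means the seed survives reordering, retries and changes to the number of Dask workers.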