Hyperparameters & tuning ======================== The default settings in :func:`genboostgpu.enet_boosting.boosting_elastic_net` work well for exploratory runs, but large cohorts benefit from carefully tuned parameters. This page summarises the main knobs and shows how to automate the search with Optuna. Core boosting parameters ------------------------ ``n_iter`` (default ``50``) Maximum boosting iterations. Early stopping typically halts before this limit; increase when signals are weak. ``batch_size`` (default ``500``) Size of the working set evaluated per iteration. Larger values improve model stability but consume more memory. For high-M windows, consider setting ``working_set={"K": 2048, "refresh": 5}``. ``n_trials`` (default ``20``) Number of Optuna trials used to tune ElasticNet hyperparameters per window. When ``fixed_alpha``/``fixed_l1_ratio`` are provided the tuning is skipped and ``n_trials`` is coerced to ``1``. ``alphas`` (default ``(0.1, 1.0)``) Range of ElasticNet ``alpha`` values searched by Optuna. Provide a tuple or a ``(low, high)`` pair to widen the space. ``l1_ratios`` (default ``(0.1, 0.9)``) Range of ElasticNet ``l1_ratio`` values. ``subsample_frac`` (default ``0.7``) Fraction of samples used within each Optuna trial. Reducing this speeds up tuning at the cost of more variance. ``ridge_grid`` (default ``(1e-3, ..., 10)``) Candidate ridge regression alphas evaluated during the final refit. Provide a tuple of floats or integers. ``val_frac`` (default ``0.2``) Fraction of samples kept aside for validation when early stopping monitors ``val_r2``. ``patience`` / ``min_delta`` / ``warmup`` (defaults ``5`` / ``1e-4`` / ``5``) Early stopping controls. ``warmup`` delays checks, ``min_delta`` is the minimum improvement, and ``patience`` counts how many stagnant iterations to allow. ``random_state`` (default ``13``) Seed used for validation splits and Optuna sampling. ``working_set`` Dict with ``"K"`` (number of SNPs to evaluate) and ``"refresh"`` (how often to recompute correlations). Use it to stabilise runtimes on very large windows. Global tuning workflow ---------------------- For cohort-wide defaults, use the helpers in :mod:`genboostgpu.tuning`: 1. :func:`genboostgpu.tuning.select_tuning_windows` stratifies a subset of windows based on SNP counts and chromosomes. 2. :func:`genboostgpu.tuning.global_tune_params` converts high-level targets (``c_lambda``, ``c_ridge``, ``subsample_frac``) into per-window ElasticNet parameters via :func:`genboostgpu.hyperparams.enet_from_targets`. The helper reuses the same Optuna ridge refit stack as the per-window runs and, when multiple GPUs are present, parallelises evaluation with the new ``max_in_flight`` defaults so global sweeps finish quickly. 3. Pass the resulting dictionary to the ``fixed_params`` callback in :func:`genboostgpu.orchestration.run_windows_with_dask`. Optuna integration ------------------ The boosting core uses ``optuna.create_study`` with a median pruner and the default sampler. To obtain deterministic behaviour, set Optuna's global seed before launching any windows: .. code-block:: python import functools import optuna from optuna.samplers import TPESampler optuna.study.create_study = functools.partial( optuna.create_study, sampler=TPESampler(seed=42, multivariate=True), ) results = boosting_elastic_net( X, y, snp_ids, n_trials=30, alphas=(0.01, 1.0), l1_ratios=(0.05, 0.95), random_state=42, ) If you prefer not to monkey-patch, constrain ``alphas``/``l1_ratios`` and rely on ``fixed_alpha``/``fixed_l1_ratio`` to avoid stochastic searches. Per-window seeds can be threaded through ``fixed_params`` and stored alongside outputs for later replay.