Reproducibility
Re-running the same GENBoostGPU experiment should yield consistent SNP sets and variance estimates. Use the checklist below to lock down sources of randomness and capture metadata.
Random seeds
Set seeds in Python’s random module, NumPy, and CuPy before invoking any pipelines:

    import random
    import numpy as np
    import cupy as cp

    random.seed(42)
    np.random.seed(42)
    cp.random.seed(42)

Pass random_state explicitly to genboostgpu.enet_boosting.boosting_elastic_net() (default 13). The orchestrator propagates this via fixed_params when you reuse tuned hyperparameters.
When performing global tuning, set the seed argument in genboostgpu.tuning.select_tuning_windows() so the sampling of windows is stable.
Optuna supports deterministic execution through OPTUNA_SEED or by monkey-patching optuna.create_study as shown in Hyperparameters & tuning.
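
The seeding steps above can be bundled into one helper so that every entry point seeds the same way. The sketch below is a minimal illustration and not part of GENBoostGPU: seed_everything is a hypothetical name, and it assumes NumPy or CuPy may be missing, in which case that library is skipped.

```python
import importlib
import random


def seed_everything(seed: int = 42) -> list[str]:
    """Seed Python's random module, plus NumPy and CuPy when importable.

    Hypothetical helper, not part of GENBoostGPU. Returns the names of the
    libraries that were actually seeded so the caller can log them.
    """
    random.seed(seed)
    seeded = ["random"]
    for name in ("numpy", "cupy"):
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # library not installed, so there is nothing to seed
        module.random.seed(seed)  # same call as np.random.seed / cp.random.seed
        seeded.append(name)
    return seeded
```

Calling seed_everything(42) once at the top of a wrapper script, and logging the returned list, records which generators were pinned for that run.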
Deterministic settings
Fix hyperparameters via fixed_alpha, fixed_l1_ratio, and fixed_subsample when you want to avoid per-window Optuna searches.
Keep the validation split deterministic by ensuring val_frac stays within (0, 0.9) so the same RNG path is followed.
Disable working-set adaptation by passing adaptive_trials=False to genboostgpu.enet_boosting.boosting_elastic_net() if you need an identical number of trials per window.
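
To illustrate why a fixed seed and a fixed val_frac reproduce the same validation split, the sketch below draws the split from a private RNG. It is an illustration under stated assumptions, not GENBoostGPU's internal logic: deterministic_split and its arguments are hypothetical, and only the (0, 0.9) bound on val_frac comes from the text above.

```python
import random


def deterministic_split(n_samples: int, val_frac: float = 0.2, seed: int = 13):
    """Return (train_idx, val_idx) that depend only on the three arguments.

    Hypothetical helper for illustration; GENBoostGPU's own split may differ.
    """
    if not 0.0 < val_frac < 0.9:
        raise ValueError("val_frac must lie strictly inside (0, 0.9)")
    rng = random.Random(seed)  # private RNG, unaffected by global random state
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_val = max(1, int(round(n_samples * val_frac)))
    # The first n_val shuffled indices form the validation set.
    return sorted(indices[n_val:]), sorted(indices[:n_val])
```

Because the generator is created inside the function, other code that consumes random numbers between runs cannot shift the split.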
Logging & artefacts
Each call to genboostgpu.vmr_runner.run_single_window() writes betas and h2 trajectories through genboostgpu.data_io.save_results(). Archive these files alongside your downstream analyses.
Append configuration dictionaries (hyperparameters, seeds, versions) to the saved TSV/Parquet outputs using the meta argument in save_results.
Capture software versions with session_info.show() (used in examples/simu_test_100n.py) and store them next to summary tables.
Use structured logging (e.g., logging.config.dictConfig) in your wrapper scripts to mirror the information produced by Dask and Optuna.
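
One way to gather the metadata listed above in a wrapper script is sketched below, using only the standard library. Everything here is an assumption for illustration: LOGGING_CONFIG, build_manifest, write_manifest, the run_manifest.json file name, and the package names queried are hypothetical and not part of GENBoostGPU.

```python
import json
import logging
import logging.config
import platform
import sys
from importlib import metadata
from pathlib import Path

# Minimal dictConfig schema: one console handler attached to the root logger.
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"}
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "plain"}
    },
    "root": {"level": "INFO", "handlers": ["console"]},
}


def package_version(name: str) -> str:
    """Return the installed version of a distribution, or 'unknown'."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "unknown"


def build_manifest(seed: int, hyperparameters: dict) -> dict:
    """Collect the settings needed to repeat a run."""
    # Distribution names are assumptions; CuPy wheels, for example, are
    # often published under CUDA-specific names.
    packages = ("numpy", "cupy", "optuna", "dask")
    return {
        "seed": seed,
        "hyperparameters": hyperparameters,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": {name: package_version(name) for name in packages},
    }


def write_manifest(manifest: dict, out_dir: Path) -> Path:
    """Write the manifest as JSON next to the result files."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "run_manifest.json"
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return path
```

In a wrapper script, logging.config.dictConfig(LOGGING_CONFIG) would be called once at startup, followed by write_manifest(build_manifest(seed, params), out_dir) after the results are saved. The same dictionary could also serve as the configuration metadata described above.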