Reproducibility

Re-running the same GENBoostGPU experiment should yield consistent SNP sets and variance estimates. Use the checklist below to lock down sources of randomness and capture metadata.

Random seeds

  • Set seeds in Python’s random module, NumPy, and CuPy before invoking any pipelines:

    import random
    import numpy as np
    import cupy as cp
    
    random.seed(42)
    np.random.seed(42)
    cp.random.seed(42)
    
  • Pass random_state explicitly to genboostgpu.enet_boosting.boosting_elastic_net() (default 13). The orchestrator propagates this via fixed_params when you reuse tuned hyperparameters.

  • When performing global tuning, set the seed argument in genboostgpu.tuning.select_tuning_windows() so the sampling of windows is stable.

  • Optuna supports deterministic execution through OPTUNA_SEED or by monkey-patching optuna.create_study as shown in Hyperparameters & tuning.

Deterministic settings

  • Fix hyperparameters via fixed_alpha, fixed_l1_ratio, and fixed_subsample when you want to avoid per-window Optuna searches.

  • Keep the validation split deterministic by ensuring val_frac stays within (0, 0.9) so the same RNG path is followed.

  • Disable working-set adaptation by passing adaptive_trials=False to genboostgpu.enet_boosting.boosting_elastic_net() if you need an identical number of trials per window.

Logging & artefacts

  • Each call to genboostgpu.vmr_runner.run_single_window() writes betas and h2 trajectories through genboostgpu.data_io.save_results(). Archive these files alongside your downstream analyses.

  • Append configuration dictionaries (hyperparameters, seeds, versions) to the saved TSV/Parquet outputs using the meta argument in save_results.

  • Capture software versions with session_info.show() (used in examples/simu_test_100n.py) and store them next to summary tables.

  • Use structured logging (e.g., logging.config.dictConfig) in your wrapper scripts to mirror the information produced by Dask and Optuna.