Workflow ======== At a high level, GENBoostGPU moves data from disk or memory, filters and scores SNPs, trains elastic net models in a boosting loop, and evaluates variance explained. The diagram below highlights the major stages. .. code-block:: text +-------------+ +------------------+ +-----------------+ +------------------+ | Data input | ---> | SNP preprocessing| ---> | Boosting elastic| ---> | Evaluation & | | (PLINK, CuPy| | (filtering, LD) | | net iterations | | persistence | +-------------+ +------------------+ +-----------------+ +------------------+ | | | | v v v v data_io.load_* snp_processing.* enet_boosting.boosting_* data_io.save_results Module responsibilities ----------------------- :mod:`genboostgpu.data_io` Reads PLINK and phenotype files, emits CuPy/cuDF objects, and saves outputs to TSV/Parquet. :mod:`genboostgpu.snp_processing` Applies zero-variance filtering, missing value imputation, cis-window selection, and LD clumping. :mod:`genboostgpu.enet_boosting` Implements the boosting loop, Optuna-based ElasticNet tuning, and final ridge refit. :mod:`genboostgpu.cpg_orchestration` CpG-centric orchestration utilities for scheduling boosting tasks across traits, chromosomes, or distributed Dask workers. :mod:`genboostgpu.orchestration` High-level entry point. Launches :func:`genboostgpu.vmr_runner.run_single_window` across windows, optionally using :class:`dask_cuda.LocalCUDACluster` for multi-GPU execution. Putting it together ------------------- 1. Build a list of windows (chromosome, start, end, phenotype ID/path). 2. Load or provide genotype/phenotype objects (:mod:`genboostgpu.data_io`). 3. Call :func:`genboostgpu.orchestration.run_windows_with_dask` to schedule work. 4. Inspect the resulting pandas DataFrame plus the saved parquet/TSV files. For more detailed orchestration examples, see :doc:`tutorials/index`.