GENBoostGPU

GPU-accelerated elastic net boosting for large-scale methylation and SNP studies. GENBoostGPU orchestrates feature preprocessing, Optuna-powered hyperparameter search, and elastic net boosting on top of RAPIDS, CuPy, and Dask so you can model thousands of genomic windows in parallel without leaving Python.

Key features

  • Adaptive window orchestration – distribute genboostgpu.orchestration jobs across one or many GPUs with auto-tuned max_in_flight concurrency.

  • Automated SNP curation – zero-variance filtering, missing data imputation, and LD clumping in genboostgpu.snp_processing.

  • Elastic net boosting core – reproducible variance decomposition and ridge refits from genboostgpu.enet_boosting.

  • Flexible I/O – load PLINK data, CuPy arrays, or parquet outputs with genboostgpu.data_io.

  • Tuning toolbox – global and per-window hyperparameter utilities in genboostgpu.tuning, including cohort-wide Optuna refits.

  • Reproducibility guardrails – documented seeding, metadata capture, and structured logging patterns for consistent reruns.
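
The idea behind a max_in_flight cap in the orchestration feature above can be illustrated with the standard library alone. The sketch below is not GENBoostGPU's scheduler, and it does not call genboostgpu.orchestration. It assumes that max_in_flight means "the largest number of window jobs submitted but not yet finished", and the names run_bounded and fit_window are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_bounded(fn, items, max_in_flight=4):
    """Apply fn to every item while keeping at most max_in_flight jobs pending.

    Hypothetical helper for illustration only; results come back in input order.
    """
    results = {}   # input index -> result
    pending = {}   # future -> input index
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        for idx, item in enumerate(items):
            # When the cap is reached, wait for at least one job to finish
            # before submitting the next one.
            if len(pending) >= max_in_flight:
                done, _ = wait(pending, return_when=FIRST_COMPLETED)
                for fut in done:
                    results[pending.pop(fut)] = fut.result()
            pending[pool.submit(fn, item)] = idx
        # Drain the jobs that are still running after the last submission.
        for fut, idx in pending.items():
            results[idx] = fut.result()
    return [results[i] for i in range(len(results))]


def fit_window(window_id):
    """Stand-in for a per-window model fit (hypothetical)."""
    return window_id * window_id


if __name__ == "__main__":
    print(run_bounded(fit_window, range(8), max_in_flight=3))
```

Capping the number of pending jobs, rather than submitting every window at once, keeps memory use bounded when the number of windows is much larger than the number of workers. A real multi-GPU run would apply the same pattern to Dask futures instead of threads.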

Supported platforms

GENBoostGPU targets Linux with NVIDIA GPUs (Ampere or newer) and CUDA 12.x. Multi-GPU orchestration requires RAPIDS cuDF/cuML 25.8 and dask-cuda 25.8 or newer. For development and documentation work on CPU-only machines, install the mock/documentation requirements instead.
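
A quick way to tell which of the two setups a machine matches is to check whether the GPU packages can be imported. The sketch below is an assumption-laden convenience, not part of GENBoostGPU: gpu_stack_available is a hypothetical helper, and it only tests that the packages are installed, not that a compatible GPU or driver is present.

```python
import importlib.util


def gpu_stack_available(modules=("cupy", "cudf", "cuml", "dask_cuda")):
    """Return (ok, missing), where missing lists the packages that cannot be found.

    Hypothetical helper: find_spec looks the package up without importing it,
    so the check is cheap and does not touch the GPU.
    """
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    return (not missing, missing)


if __name__ == "__main__":
    ok, missing = gpu_stack_available()
    if ok:
        print("GPU packages found")
    else:
        print(f"CPU-only environment; missing: {', '.join(missing)}")
```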

Get started

  • Quick start – minimal pipeline example with saved outputs.

  • Installation – environment setup for CPU docs versus GPU production.

  • User guide – deep dives on data formats, workflow, tuning, scaling, and reproducibility.

  • Tutorials – walkthroughs based on the scripts in examples/.

  • API reference – autogenerated documentation for the genboostgpu modules.

  • Troubleshooting – common fixes for CUDA, RAPIDS, and Dask issues.

  • Contributing – guidelines for development, style, and tests.

  • Changelog – highlights from each release.

Citation

If you use GENBoostGPU in academic or industrial work, please cite:

Alexis Bennett and Kynon J.M. Benjamin. GENBoostGPU: GPU-accelerated elastic net boosting for large-scale epigenomics. DOI: 10.5281/zenodo.17238798.
