Quick start

The snippet below runs a single-window analysis with genboostgpu.orchestration.run_windows_with_dask(). It generates toy CuPy arrays, seeds the NumPy and CuPy RNGs with 42, and saves results to results/.

Note

Ensure you have followed the Installation guide and that a CUDA 12-capable GPU is visible to the process (see CUDA_VISIBLE_DEVICES).
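
One way to check the second prerequisite before running the snippet is to look at CUDA_VISIBLE_DEVICES. The helper below is a hypothetical sketch (visible_gpu_ids is not part of genboostgpu) that uses only the standard library. It reports what the environment variable says and does not verify that a driver or a CUDA 12 runtime is actually installed.

```python
import os


def visible_gpu_ids(env=None):
    """Return the device IDs listed in CUDA_VISIBLE_DEVICES.

    None -> the variable is unset, so CUDA does not restrict devices.
    []   -> the variable is set but empty, which hides every GPU.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    # Entries are comma-separated; drop blanks and surrounding whitespace.
    return [tok.strip() for tok in raw.split(",") if tok.strip()]


ids = visible_gpu_ids()
if ids is None:
    print("CUDA_VISIBLE_DEVICES is unset: no restriction on visible GPUs")
elif not ids:
    print("CUDA_VISIBLE_DEVICES is empty: no GPU is visible")
else:
    print(f"Visible GPU ids: {ids}")
```

If the output shows an empty list, unset the variable or set it to the index of the GPU you want to use before launching Python.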

import cupy as cp
import numpy as np
import pandas as pd

from genboostgpu.orchestration import run_windows_with_dask

# Seed the NumPy and CuPy RNGs for reproducibility.
np.random.seed(42)
cp.random.seed(42)

# Toy genotype matrix (samples x SNPs), moved to the GPU as float32.
n_samples, n_snps = 256, 512
geno = cp.asarray(np.random.normal(size=(n_samples, n_snps)), dtype=cp.float32)

# SNP annotation table: one row per column of geno.
bim = pd.DataFrame({
    "chrom": ["21"] * n_snps,
    "snp": [f"rs{i}" for i in range(n_snps)],
    "pos": np.arange(n_snps) * 100 + 150_000,
})

# Toy phenotype vector, one value per sample.
pheno = cp.asarray(np.random.normal(size=n_samples), dtype=cp.float32)

# A single window on chromosome 21.
windows = [{
    "chrom": 21,
    "start": 150_000,
    "end": 150_000,
    "pheno": pheno,
    "pheno_id": "trait_1",
}]

results = run_windows_with_dask(
    windows,
    geno_arr=geno,
    bim=bim,
    outdir="results",
    window_size=200_000,
    n_iter=30,
    n_trials=5,
    batch_size=512,
    prefix="quickstart",
)

# Save and preview the returned summary table.
results.to_csv("results/trait_1_summary.csv", index=False)
print(results.head())

The call triggers genboostgpu.vmr_runner under the hood, which filters SNPs in the cis-window, performs boosting iterations, and writes parquet/TSV files to results/.
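
To make the cis-window filtering step concrete, here is a minimal sketch in plain Python, so it runs without a GPU. It is not the code in genboostgpu.vmr_runner: the helper names are hypothetical, and it assumes that window_size is a flank added on each side of the region and that chromosome labels are compared as strings.

```python
def cis_window_bounds(start, end, window_size):
    """Extend [start, end] by window_size on each side, clipped at 0.

    Assumption: window_size is a per-side flank; the library may define it differently.
    """
    return max(0, start - window_size), end + window_size


def snps_in_window(chroms, positions, chrom, start, end, window_size):
    """Return indices of SNPs on `chrom` whose position falls inside the cis-window."""
    lo, hi = cis_window_bounds(start, end, window_size)
    chrom = str(chrom)  # compare labels as strings so 21 and "21" match
    return [
        i
        for i, (c, p) in enumerate(zip(chroms, positions))
        if str(c) == chrom and lo <= p <= hi
    ]


# Same toy layout as the quick start: 512 SNPs on chromosome 21, 100 bp apart.
n_snps = 512
chroms = ["21"] * n_snps
positions = [i * 100 + 150_000 for i in range(n_snps)]

idx = snps_in_window(chroms, positions, 21, 150_000, 150_000, 200_000)
print(len(idx))  # 512: every toy SNP lies within 0..350,000
```

Under these assumptions the toy window keeps all 512 SNPs, because the positions span only 150,000 to 201,100.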

More to explore

  • Inspect the saved parquet at results/quickstart.summary_windows.parquet for window-level metrics.

  • Dive into Workflow to understand how each module contributes.

  • Try the richer Simulation tutorial and VMR caudate tutorial walkthroughs that reuse the scripts shipped in examples/.
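
As a starting point for the first item, the sketch below loads the window-level summary if it exists. The helper load_window_summary is hypothetical, and the file name pattern is inferred from the path quoted above (outdir="results", prefix="quickstart"). The column names are not documented here, so the example only lists whatever columns are present. Reading parquet with pandas requires pyarrow or fastparquet.

```python
from pathlib import Path


def load_window_summary(outdir="results", prefix="quickstart"):
    """Load <outdir>/<prefix>.summary_windows.parquet, or return None if it is missing."""
    path = Path(outdir) / f"{prefix}.summary_windows.parquet"
    if not path.exists():
        return None
    import pandas as pd  # imported lazily; only needed when the file exists

    return pd.read_parquet(path)


summary = load_window_summary()
if summary is None:
    print("No summary found; run the quick start snippet first.")
else:
    print(summary.columns.tolist())
    print(summary.head())
```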