Phenotype Association Analysis

This page describes two complementary ways to run phenotype-protein association analyses with covariate adjustment:

  • Discovery mode over all available phenotypes using scripts/pheno_discovery.py
  • Targeted analysis for specified phenotypes integrated into the broader comparison workflow via scripts/compare_result_oneplatform.py

Discovery: pheno_discovery.py

The discovery script scans every phenotype column, classifies each as binary or continuous, and runs per-protein association tests against the truth matrix. Each test adjusts for the age and gender covariates named by --age_col and --gender_col, except that a phenotype is never adjusted for itself: a gender phenotype is adjusted for age only, and an age phenotype for gender only. P-values are corrected for multiple testing with the Benjamini–Hochberg FDR procedure.
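The covariate rule above can be illustrated with a minimal sketch. The helper names `classify_phenotype` and `covariates_for` are hypothetical, and the classification rule (exactly two distinct non-missing values means binary) is an assumption; this page does not state how pheno_discovery.py decides, and constant columns are not treated specially here.

```python
def classify_phenotype(values):
    """Assumed rule: exactly two distinct non-missing values means binary."""
    # None and NaN (the only value for which v != v) count as missing.
    observed = {v for v in values if v is not None and v == v}
    return "binary" if len(observed) == 2 else "continuous"


def covariates_for(phenotype, age_col="V5AGE52", gender_col="GENDER"):
    """Return the adjustment covariates, never including the phenotype itself."""
    return [col for col in (age_col, gender_col) if col != phenotype]


print(classify_phenotype([0, 1, 1, 0, None]))   # binary
print(covariates_for("GENDER"))                 # ['V5AGE52']
print(covariates_for("BMI"))                    # ['V5AGE52', 'GENDER']
```

The column names GENDER and V5AGE52 are taken from the usage example on this page; substitute the names used in your own phenotype file.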

Usage:

python scripts/pheno_discovery.py \
  --truth_a PATH/to/truth.csv \
  --phenotype_file PATH/to/phenotypes.csv \
  --output_dir results_pheno \
  --gender_col GENDER --age_col V5AGE52 \
  [--transpose]

Arguments:

  • --truth_a: Truth matrix, features × samples by default (pass --transpose if the file is samples × features). The first column is the index.
  • --phenotype_file: Phenotype table with sample IDs as the index in the first column; every remaining column is a phenotype.
  • --gender_col / --age_col: Names of the covariate columns in the phenotype file.
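To make the two input orientations concrete, here is a small sketch using only the standard library. The reader function `read_matrix` and the toy CSV contents are hypothetical and only illustrate what --transpose is described as doing; the script's own loading code may differ, and missing values are not handled.

```python
import csv
import io


def read_matrix(handle, transpose=False):
    """Read a CSV whose first column is the index.

    Returns {feature: {sample: value}}. With transpose=False the file is
    features x samples; with transpose=True it is samples x features.
    """
    reader = csv.reader(handle)
    header = next(reader)[1:]  # column labels; the first cell names the index
    matrix = {}
    for row in reader:
        label = row[0]
        for col, raw in zip(header, row[1:]):
            feature, sample = (col, label) if transpose else (label, col)
            matrix.setdefault(feature, {})[sample] = float(raw)
    return matrix


# Toy data: the same matrix stored in both orientations.
features_by_samples = "protein,S1,S2,S3\nP1,0.1,0.2,0.3\nP2,1.0,2.0,3.0\n"
samples_by_features = "sample,P1,P2\nS1,0.1,1.0\nS2,0.2,2.0\nS3,0.3,3.0\n"

a = read_matrix(io.StringIO(features_by_samples))
b = read_matrix(io.StringIO(samples_by_features), transpose=True)
print(a == b)  # True: both orientations describe the same data
```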

Outputs:

  • results_pheno/summary_binary_all.csv and summary_continuous_all.csv
  • results_pheno/summary_binary_top5.csv and summary_continuous_top5.csv
  • results_pheno/summary.txt (concise report)
  • Per-phenotype result tables:
      • results_pheno/associations_binary/binary_associations_.csv
      • results_pheno/associations_continuous/continuous_associations_.csv

Each per-phenotype table contains effect estimates (odds ratios for binary; beta for continuous), standard errors, p-values, and FDR-adjusted p-values (p_adj).
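For readers unfamiliar with how a p_adj column relates to the raw p-values, the following is a minimal pure-Python sketch of the standard Benjamini–Hochberg adjustment. It is not taken from the script, which may rely on a library routine instead; the function name and the example p-values are illustrative.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative p-values for four proteins tested against one phenotype.
proteins = ["P1", "P2", "P3", "P4"]
p_adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [name for name, q in zip(proteins, p_adj) if q < 0.05]
print(significant)
```

Filtering on p_adj rather than the raw p-value controls the expected proportion of false discoveries among the proteins reported for a phenotype.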

Targeted: compare_result_oneplatform.py

scripts/compare_result_oneplatform.py runs phenotype association analysis for a user-specified set of phenotype columns as part of its broader method comparison and figure generation. Supply a phenotype file with --phenotype_file and list the columns to test with --binary_pheno and --continuous_pheno. The analysis adjusts for age and gender as covariates and produces per-phenotype result tables and summary figures.

Key flags (excerpt):

python scripts/compare_result_oneplatform.py \
  --truth_a TRUTH_A --imp_a_m1 IMP_A_M1 --imp_a_m2 IMP_A_M2 \
  --platform_a_name "Platform A" \
  --phenotype_file data/phenotypes.csv \
  --binary_pheno DIABETES HYPERTENSION \
  --continuous_pheno AGE BMI \
  [--transpose] \
  --output_dir outputs
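Because the targeted mode takes explicit column names, it can help to confirm they exist in the phenotype file before launching a long run. The check below is a hypothetical pre-flight helper, not part of compare_result_oneplatform.py; it assumes the phenotype file uses the same layout described for discovery mode (sample IDs in the first column), and the toy CSV is made up for illustration.

```python
import csv
import io


def missing_columns(phenotype_handle, requested):
    """Return the requested column names absent from the phenotype file header."""
    header = next(csv.reader(phenotype_handle))
    available = set(header[1:])  # skip the first column (sample index)
    return [name for name in requested if name not in available]


# Toy phenotype file; in practice, open data/phenotypes.csv instead.
pheno_csv = "sample,AGE,BMI,DIABETES\nS1,52,24.1,0\nS2,61,30.3,1\n"
requested = ["DIABETES", "HYPERTENSION", "AGE", "BMI"]
print(missing_columns(io.StringIO(pheno_csv), requested))  # ['HYPERTENSION']
```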

Refer to the Comparison documentation for full usage and outputs.