GWAS Report

Overview
Methods
GWAS parameters
Filtered samples and variants
Genome-wide significant markers
GCTA-COJO results
CLUMP results
Unpruned significant markers
QQ-plot for \(p\)-values
Manhattan plot
SNP density plot
Histogram of \(\beta\) values
Kernel density plot of \(\beta\) values
Regression diagnostics
- Regression coefficients obtained with glm():
Variance inflation factors
Technical information

Overview

These are the GWAS results for job run_all with 337,475 samples, run for the phenotype Current.

Methods

The association between Current and each variant with a minor allele count \(\ge\) 30 and an imputation quality metric \(r^2 \ge\) 0.8 was tested using the logistic regression model:

Current ~ marker + PC1 + PC2 + PC3 + PC4 + PC5 + PC6 + PC7 + PC8 + PC9 + PC10 + PC11 + PC12 + PC13 + PC14 + PC15 + PC16 + PC17 + PC18 + PC19 + PC20 + array + sex + age
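Written out, the formula above corresponds to the following per-marker model. This is a sketch in notation introduced here for illustration: \(g\) denotes the marker genotype (assumed to be coded additively, which this report does not state explicitly), and the \(\gamma\) symbols denote covariate coefficients.

\[
\operatorname{logit} P(\text{Current} = 1) = \beta_0 + \beta \, g + \sum_{k=1}^{20} \gamma_k \, \text{PC}_k + \gamma_{\text{array}} \, \text{array} + \gamma_{\text{sex}} \, \text{sex} + \gamma_{\text{age}} \, \text{age}
\]

Under this reading, \(\beta\) is the marker effect on the log-odds scale, and the test for each marker is a test of \(\beta = 0\).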

The association tests were performed using plink2/2.00-alpha-2-20190429 on a total of 19,064,366 markers (after filtering). Markers with a Hardy-Weinberg equilibrium exact test p-value below 1.0e-6 were filtered out.
A total of 861 genome-wide significant markers were found. The genomic inflation factor was \(\lambda =\) 1.1111664.
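For readers who want to check the inflation factor themselves, the following is a minimal sketch of the commonly used median-based definition: \(\lambda\) is the median of the observed 1-df chi-square statistics divided by the median of the chi-square distribution with 1 degree of freedom (about 0.4549). Whether this report used exactly this estimator is an assumption; the function name and the toy p-values are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Return the median-based genomic inflation factor.

    Each two-sided p-value in (0, 1] is converted back to a 1-df chi-square
    statistic via the squared standard normal quantile of p / 2.
    """
    std_normal = NormalDist()
    chi2 = [std_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Toy input: evenly spaced p-values mimic the null distribution.
    null_like = [(i + 0.5) / 1001 for i in range(1001)]
    print(round(genomic_inflation(null_like), 6))
```

A value close to 1 indicates that the bulk of the test statistics follows the null distribution; values above 1 indicate inflation, for example from population structure or polygenicity.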

GWAS parameters

The plink2 program was run with the following parameters:

Minor allele count cutoff: 30
Minor allele frequency cutoff: 0.00
Maximum variance inflation factor: 50
Sample missing call rate cutoff: 0.1
Variant missing call rate cutoff: 0.1
Minimum imputation quality: 0.8
Maximum imputation quality: 2.0
Hardy-Weinberg exact test p-value threshold: 1.0e-6

Filtered samples and variants

The table displays the number of samples and variants removed by each filter:

SGENO: Samples removed due to missing genotype. Cutoff: 0.1
VGENO: Variants removed due to missing genotype. Cutoff: 0.1
VHWE: Variants removed after Hardy-Weinberg exact test. P-value cutoff: 1.0e-6
VFREQ: Variants removed due to allele frequency threshold(s). Cutoffs: mac = 30; maf = 0.00
VIMP: Variants removed due to imputation quality filter (MACH-R2). Allowed range: 0.8 to 2.0
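The variant-level filters listed above can be illustrated with a small tally. This is a sketch only: the thresholds are taken from this report, but the record field names, the order in which filters are checked, and the rule of counting each variant under the first filter it fails are assumptions made for illustration, not a description of what plink2 does internally. The sample-level filter (SGENO) is not covered, and the maf cutoff of 0.00 is omitted because it removes nothing.

```python
from collections import Counter

# Thresholds copied from the report.
VGENO_MAX = 0.1          # maximum variant missing call rate
HWE_P_MIN = 1.0e-6       # minimum Hardy-Weinberg exact test p-value
MAC_MIN = 30             # minimum minor allele count
R2_MIN, R2_MAX = 0.8, 2.0  # allowed MACH-R2 range


def first_failed_filter(variant):
    """Return the label of the first filter a variant fails, or None.

    `variant` is a dict with the hypothetical keys
    'missing_rate', 'hwe_p', 'mac' and 'mach_r2'.
    """
    if variant["missing_rate"] > VGENO_MAX:
        return "VGENO"
    if variant["hwe_p"] < HWE_P_MIN:
        return "VHWE"
    if variant["mac"] < MAC_MIN:
        return "VFREQ"
    if not (R2_MIN <= variant["mach_r2"] <= R2_MAX):
        return "VIMP"
    return None


def tally_removed(variants):
    """Count removed variants per filter label."""
    labels = (first_failed_filter(v) for v in variants)
    return Counter(label for label in labels if label is not None)


if __name__ == "__main__":
    toy = [
        {"missing_rate": 0.00, "hwe_p": 0.5, "mac": 120, "mach_r2": 0.95},
        {"missing_rate": 0.25, "hwe_p": 0.5, "mac": 120, "mach_r2": 0.95},
        {"missing_rate": 0.00, "hwe_p": 0.5, "mac": 12, "mach_r2": 0.95},
    ]
    print(dict(tally_removed(toy)))
```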

Genome-wide significant markers

Markers with \(p \le 5 \cdot 10^{-8}\) are displayed, regardless of the corresponding \(\beta\) value.
Spreadsheet with significant markers (only available if copied separately)
ID: link to Phenoscanner
POS: link to marker position at UCSC Genome Browser
genes: link to nearest gene (UCSC Genome Browser)
Note: A1_FREQ, BETA, SE, and P are not available if only one significant marker resides on a whole chromosome.

GCTA-COJO results

GCTA-COJO analysis was not conducted.

CLUMP results

PLINK clumping was not conducted.

Unpruned significant markers

QQ-plot for \(p\)-values

Manhattan plot

SNP density plot

Histogram of \(\beta\) values

Kernel density plot of \(\beta\) values

Regression diagnostics

A regression was conducted using the linear model

Current ~ PC1 + PC2 + PC3 + PC4 + PC5 + PC6 + PC7 + PC8 + PC9 + PC10 + PC11 + PC12 + PC13 + PC14 + PC15 + PC16 + PC17 + PC18 + PC19 + PC20 + array + sex + age

Note that no marker genotype is included in this model. The data frame used as regression input is stored in run_all_Current_regression_frame.RData, and the residuals are stored in run_all_Current_regression_resid.RData.

Regression coefficients obtained with `glm()`:

Variance inflation factors

Variance inflation factors (VIF) are calculated to detect multicollinearity. The VIF of an independent variable is obtained by regressing it on all other independent variables and computing \(1 / (1 - R^2)\), where \(R^2\) is the coefficient of determination of that regression. As a rule of thumb, no variance inflation factor should exceed 10; otherwise, highly correlated variables should be removed from the model.
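To make the definition concrete, here is a minimal stdlib-only sketch of a VIF calculation. It relies on the standard identity that the VIF of predictor \(j\), \(1 / (1 - R_j^2)\), equals the \(j\)-th diagonal entry of the inverse of the predictors' correlation matrix. The function names and toy data are hypothetical, and this is not the code used to produce this report.

```python
from statistics import mean


def correlation_matrix(columns):
    """Pearson correlation matrix of equally long predictor columns."""
    centered = []
    for col in columns:
        m = mean(col)
        centered.append([x - m for x in col])
    norms = [sum(x * x for x in col) ** 0.5 for col in centered]
    k = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j]))
            / (norms[i] * norms[j])
            for j in range(k)
        ]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """Return one VIF per predictor column."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


if __name__ == "__main__":
    # Two toy predictors with correlation 0.8, so each VIF is 1 / (1 - 0.64).
    x1 = [1, 2, 3, 4, 5]
    x2 = [2, 1, 4, 3, 5]
    print(variance_inflation_factors([x1, x2]))
```

Uncorrelated predictors give a VIF of 1; the value grows without bound as a predictor becomes a linear combination of the others.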

Technical information

GWAS work folder: /castor/project/proj/GWAS_TEST/run_all
Parameter file: /castor/project/proj/GWAS_TEST/run_all/run_all_gwas_params.txt