GWAS Report

Overview

These are the GWAS results for job Test with 27,243 samples, run for the phenotype liver_fat.

Methods

The association between liver_fat and each variant with a minor allele count \(\ge\) 30 and an imputation quality metric \(r^2 \ge\) 0.8 was tested using the linear regression model:

liver_fat ~ marker + PC1 + PC2 + PC3 + PC4 + PC5 + PC6 + PC7 + PC8 + PC9 + PC10 + PC11 + PC12 + PC13 + PC14 + PC15 + PC16 + PC17 + PC18 + PC19 + PC20 + array + sex + age

The association tests were performed using plink2/2.00-alpha-2.3-20200124, including a total of 14,308,111 markers (after filtering). Markers with a Hardy-Weinberg equilibrium exact test p-value below 1.0e-6 were filtered out.
662 significant markers were found. The genomic inflation factor was \(\lambda =\) 1.0509461.
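The report does not state how \(\lambda\) was computed. A common definition is the median of the observed 1-df chi-square statistics divided by the median expected under the null (about 0.455). The sketch below assumes two-sided p-values from a 1-df test; the function name `genomic_inflation` is illustrative and not part of the pipeline.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Median observed 1-df chi-square statistic over its expected median.

    Assumes every p-value lies in (0, 1] and comes from a two-sided 1-df test.
    """
    norm = NormalDist()
    # A two-sided p-value p corresponds to |z| = -Phi^-1(p / 2); chi2 = z**2.
    chi2 = [norm.inv_cdf(p / 2.0) ** 2 for p in p_values]
    # Median of a chi-square distribution with 1 df, roughly 0.4549.
    expected_median = norm.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median


# Evenly spaced p-values mimic the null distribution, so lambda is close to 1.
null_like = [i / 1000 for i in range(1, 1000)]
print(genomic_inflation(null_like))
```

Values noticeably above 1 usually point to inflated test statistics, for example from residual population structure.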

GWAS parameters

The plink2 program was run with the following parameters:

Minor allele count cutoff: 30
Minor allele frequency cutoff: 0.00
Maximum variance inflation factor: 50
Sample missing call rate cutoff: 0.1
Variant missing call rate cutoff: 0.1
Minimum imputation quality: 0.8
Maximum imputation quality: 2.0
Hardy-Weinberg exact test p-value threshold: 1.0e-6
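The report lists the cutoffs but not the command line itself. As a rough sketch, the list above could map to plink2 options as follows. The pairing of each cutoff with a flag is an assumption, the input and output prefixes are hypothetical, and the phenotype and covariate options are left out.

```python
def build_plink2_args(pfile_prefix, out_prefix):
    """Assemble a plink2 argument list from the cutoffs listed in the report.

    pfile_prefix and out_prefix are hypothetical placeholders.
    """
    return [
        "plink2",
        "--pfile", pfile_prefix,
        "--mac", "30",                      # minor allele count cutoff
        "--maf", "0.00",                    # minor allele frequency cutoff
        "--vif", "50",                      # maximum variance inflation factor
        "--mind", "0.1",                    # sample missing call rate cutoff
        "--geno", "0.1",                    # variant missing call rate cutoff
        "--mach-r2-filter", "0.8", "2.0",   # imputation quality range
        "--hwe", "1.0e-6",                  # Hardy-Weinberg exact test threshold
        "--glm",                            # association test
        "--out", out_prefix,
    ]


print(" ".join(build_plink2_args("genotypes", "Test_liver_fat")))
```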

Filtered samples and variants

The table displays the number of samples/variants that have been removed due to different filters:

SGENO: Samples removed due to missing genotype. Cutoff: 0.1
VGENO: Variants removed due to missing genotype. Cutoff: 0.1
VHWE: Variants removed after Hardy-Weinberg exact test. P-value cutoff: 1.0e-6
VFREQ: Variants removed due to allele frequency threshold(s). Cutoffs: mac = 30; maf = 0.00
VIMP: Variants removed due to imputation quality filter (MACH-R2). Cutoff: 0.8 - 2.0

Genome-wide significant markers

Markers with \(p \le 5 \cdot 10^{-8}\) are displayed, no matter what the corresponding \(\beta\)-value is.
Spreadsheet with significant markers (only available if copied separately)
ID: navigate to Phenoscanner
POS: link to marker position at UCSC Genome Browser
genes: link to nearest gene (UCSC Genome Browser)
Note: A1_FREQ, BETA, SE, and P are not available if only one significant marker resides on a whole chromosome.
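The selection rule for this table depends on the p-value alone. A minimal sketch, assuming each marker is a dictionary with the column names `ID`, `BETA`, and `P` (the example rows are made up):

```python
GENOME_WIDE_P = 5e-8


def significant_markers(rows, threshold=GENOME_WIDE_P):
    """Keep rows whose p-value is at or below the threshold; BETA is ignored."""
    return [row for row in rows if row["P"] <= threshold]


# Hypothetical rows for illustration only.
rows = [
    {"ID": "rs_a", "BETA": 0.002, "P": 3e-9},
    {"ID": "rs_b", "BETA": 0.450, "P": 1e-4},
    {"ID": "rs_c", "BETA": -0.010, "P": 5e-8},
]
print([row["ID"] for row in significant_markers(rows)])
```

A marker with a very small effect size is kept as long as its p-value passes, while a large effect with a weaker p-value is dropped.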

GCTA-COJO results

The GCTA-COJO program was run using the following parameters:

\(p\)-value: 5.0e-8
window size: 5000 kb
collinearity threshold (\(R^2\)): 0.9
minor allele frequency cutoff (A1_FREQ): 0
reference genome: FTD_rand

Table with corresponding original results

CLUMP results

The plink-clump program was run using the following parameters:

primary \(p\)-value: 5e-8
secondary \(p\)-value: 5e-6
correlation threshold: 0.01
window size: 5000 kb
reference genome: FTD_rand
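To illustrate how these parameters interact, the following is a simplified sketch of greedy clumping, not the plink implementation. It assumes that index markers are taken in order of increasing p-value, and that the LD values come from a user-supplied function `r2`, which in practice would be computed from the reference genotypes. Boundary handling (strict or non-strict comparisons) is also an assumption.

```python
def clump(markers, r2, p1=5e-8, p2=5e-6, r2_min=0.01, window_kb=5000):
    """Greedy clumping sketch.

    markers: list of (marker_id, chrom, pos_bp, p_value) tuples.
    r2: function (id_a, id_b) -> LD r^2 between two markers (assumed given).
    Returns a dict mapping each index marker to the markers clumped with it.
    """
    clumps = {}
    assigned = set()
    # Visit candidate index markers from the smallest p-value upwards.
    for index_id, chrom, pos, p in sorted(markers, key=lambda m: m[3]):
        if p > p1 or index_id in assigned:
            continue
        assigned.add(index_id)
        members = []
        for other_id, o_chrom, o_pos, o_p in markers:
            if other_id in assigned or o_chrom != chrom:
                continue
            in_window = abs(o_pos - pos) <= window_kb * 1000
            if in_window and o_p <= p2 and r2(index_id, other_id) >= r2_min:
                members.append(other_id)
                assigned.add(other_id)
        clumps[index_id] = members
    return clumps
```

With a correlation threshold as low as 0.01, almost any marker in weak LD with an index marker inside the 5000 kb window is absorbed into its clump, which tends to leave few independent signals.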

Comparison of the markers identified by COJO and by CLUMP

The number of common markers identified with the two methods is 4.
This corresponds to 100 % of the CLUMP markers and 100 % of the COJO markers.
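The comparison amounts to a set intersection. A minimal sketch, with hypothetical marker IDs standing in for the actual results:

```python
def compare_marker_sets(cojo_ids, clump_ids):
    """Count shared markers and express them as a percentage of each set."""
    cojo, clumped = set(cojo_ids), set(clump_ids)
    common = cojo & clumped
    return {
        "n_common": len(common),
        "pct_of_clump": 100.0 * len(common) / len(clumped) if clumped else 0.0,
        "pct_of_cojo": 100.0 * len(common) / len(cojo) if cojo else 0.0,
    }


# Hypothetical IDs; both methods return the same four markers.
ids = ["rs_1", "rs_2", "rs_3", "rs_4"]
print(compare_marker_sets(ids, ids))
```

Both percentages reach 100 only when the two methods select exactly the same set of markers.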

QQ-plot for \(p\)-values

Manhattan plot

SNP density plot

Histogram of \(\beta\) values

Kernel density plot of \(\beta\) values

Regression diagnostics

A regression was conducted using the linear model

liver_fat ~ PC1 + PC2 + PC3 + PC4 + PC5 + PC6 + PC7 + PC8 + PC9 + PC10 + PC11 + PC12 + PC13 + PC14 + PC15 + PC16 + PC17 + PC18 + PC19 + PC20 + array + sex + age

(Note that no marker genotype has been included). The dataframe used as regression input is stored in Test_liver_fat_regression_frame.RData, while the residuals are in Test_liver_fat_regression_resid.RData.

Some metrics obtained using the linear regression model:

sigma is the estimated standard deviation of the noise term
Fstat is the value of the F-statistic
Rsquared is the coefficient of determination
Rsq.adj is the coefficient of determination adjusted for the number of predictors
AIC is the Akaike information criterion (which estimates the quality of a regression model, relative to others)
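To make these definitions concrete, the sketch below computes the same five quantities for a regression with a single predictor. It is an illustration in plain Python rather than the report's R code. The AIC follows the Gaussian log-likelihood convention, counting the intercept, the slope, and the noise variance as parameters.

```python
import math


def simple_lm_metrics(x, y):
    """Fit y ~ x by least squares and return the summary metrics."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    k = 1                      # number of predictors, excluding the intercept
    df_resid = n - k - 1       # residual degrees of freedom

    r2 = 1.0 - rss / tss
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1.0)
    return {
        "sigma": math.sqrt(rss / df_resid),
        "Fstat": ((tss - rss) / k) / (rss / df_resid),
        "Rsquared": r2,
        "Rsq.adj": 1.0 - (1.0 - r2) * (n - 1) / df_resid,
        "AIC": -2.0 * log_lik + 2.0 * (k + 2),
    }


# Made-up data for illustration.
print(simple_lm_metrics([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]))
```

The adjusted value never exceeds Rsquared, because it penalises each additional predictor.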

Regression coefficients obtained with `lm()`:

Variance inflation factors

Variance inflation factors (VIF) are calculated in order to discover multicollinearity. VIF can be obtained by regressing a single independent variable against all other independent variables. As a rule of thumb, no variance inflation factor should be bigger than 10. Otherwise, highly correlated variables should be removed from the model.
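Written out, the regression described above gives the standard definition of the VIF for predictor \(j\), where \(R_j^2\) is the coefficient of determination obtained by regressing predictor \(j\) on all other predictors:

\[
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\]

A VIF of 10 therefore corresponds to \(R_j^2 = 0.9\), and the maximum of 50 set in the GWAS parameters corresponds to \(R_j^2 = 0.98\).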

Histogram for the magnitudes of the residuals

According to the linear model established, the residuals should be normally distributed. Consequently, the histogram below should approximately resemble a normal distribution. The residuals have been saved to Test_liver_fat_regression_resid.RData.

Normal Q-Q plot of the residuals

This plot shows if residuals are normally distributed. It is desirable that the points displaying the residuals are located close to the straight line.

Technical information

GWAS workfolder: /castor/project/proj/GWAS_TEST/Test
Parameter file: /castor/project/proj/GWAS_TEST/Test/Test_gwas_params.txt