Overview

These are the GWAS results for job Test with 27,243 samples, run for the phenotype liver_fat.


Methods

The association between liver_fat and each variant with a minor allele count \(\ge\) 30 and an imputation quality metric \(r^2 \ge\) 0.8 was tested using the linear regression model:

       liver_fat ~ marker + PC1 + PC2 + PC3 + PC4 + PC5 + PC6 + PC7 + PC8 + PC9 + PC10 + PC11 + PC12 + PC13 + PC14 + PC15 + PC16 + PC17 + PC18 + PC19 + PC20 + array + sex + age

The association tests were performed using plink2/2.00-alpha-2.3-20200124 on a total of 14,308,111 markers (after filtering). Markers with a Hardy-Weinberg equilibrium exact test p-value below 1.0e-6 were filtered out.
662 significant markers were found. The genomic inflation factor was \(\lambda =\) 1.0509461.
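
The report does not state how \(\lambda\) was computed. A common definition is the median of the observed 1-df chi-square statistics divided by the median of the chi-square distribution with one degree of freedom (about 0.455). The sketch below assumes that definition and assumes two-sided p-values as input; the function name genomic_inflation is hypothetical and not part of the pipeline.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of the chi-square distribution with 1 degree of freedom,
# obtained as the squared 75% quantile of the standard normal (about 0.455).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Return median(observed chi-square) / median(expected chi-square).

    Assumes every p-value is two-sided and lies strictly between 0 and 1
    (a p-value of exactly 1 is also accepted).
    """
    chi2 = []
    for p in p_values:
        # Two-sided p-value -> z-score -> 1-df chi-square statistic.
        # Using the lower tail (p / 2) stays accurate for very small p.
        z = _STD_NORMAL.inv_cdf(p / 2.0)
        chi2.append(z * z)
    return median(chi2) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Evenly spaced p-values mimic the null distribution, so lambda is about 1.
    null_p = [(i - 0.5) / 999 for i in range(1, 1000)]
    print(genomic_inflation(null_p))
```

Values close to 1 indicate little inflation of the test statistics; values clearly above 1 can point to population stratification or other confounding.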


GWAS parameters

The plink2 program was run with the following parameters:


Filtered samples and variants

The table displays the number of samples/variants that have been removed due to different filters:


Genome-wide significant markers


GCTA-COJO results

The GCTA-COJO program was run using the following parameters:



Table with corresponding original results



CLUMP results

The plink-clump program was run using the following parameters:



Comparison of the markers identified by COJO and by CLUMP



QQ-plot for \(p\)-values

Manhattan plot

SNP density plot

Histogram of \(\beta\) values

Kernel density plot of \(\beta\) values


Regression diagnostics

A regression was conducted using the linear model

       liver_fat ~ PC1 + PC2 + PC3 + PC4 + PC5 + PC6 + PC7 + PC8 + PC9 + PC10 + PC11 + PC12 + PC13 + PC14 + PC15 + PC16 + PC17 + PC18 + PC19 + PC20 + array + sex + age

(Note that no marker genotype is included in this model.) The data frame used as regression input is stored in Test_liver_fat_regression_frame.RData, and the residuals are stored in Test_liver_fat_regression_resid.RData.


Some metrics obtained using the linear regression model:

  • sigma is the estimated standard deviation of the noise term
  • Fstat is the value of the F-statistic
  • Rsquared is the coefficient of determination
  • Rsq.adj is the coefficient of determination adjusted for the number of predictors
  • AIC is the Akaike information criterion (which estimates the quality of a regression model relative to others)
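
To make these definitions concrete, the following sketch computes the five metrics from observed and fitted values. It assumes an ordinary least-squares fit with an intercept and uses the Gaussian log-likelihood for the AIC, counting the error variance as a parameter. The function name regression_metrics and its arguments are hypothetical; the report itself obtains these values from R.

```python
import math


def regression_metrics(y, fitted, n_predictors):
    """Summary metrics for a least-squares fit that includes an intercept.

    n_predictors counts the slope coefficients only (intercept excluded).
    """
    n = len(y)
    mean_y = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    tss = sum((yi - mean_y) ** 2 for yi in y)               # total sum of squares
    df_resid = n - n_predictors - 1                         # residual degrees of freedom

    sigma = math.sqrt(rss / df_resid)
    r_squared = 1.0 - rss / tss
    rsq_adj = 1.0 - (1.0 - r_squared) * (n - 1) / df_resid
    fstat = ((tss - rss) / n_predictors) / (rss / df_resid)

    # Gaussian log-likelihood evaluated at the ML variance estimate rss / n.
    log_lik = -0.5 * n * (math.log(2.0 * math.pi) + math.log(rss / n) + 1.0)
    # Parameter count: slopes + intercept + error variance.
    aic = -2.0 * log_lik + 2.0 * (n_predictors + 2)

    return {
        "sigma": sigma,
        "Fstat": fstat,
        "Rsquared": r_squared,
        "Rsq.adj": rsq_adj,
        "AIC": aic,
    }


if __name__ == "__main__":
    # Toy data: y regressed on x = [0, 1, 2, 3] gives fitted = 0.1 + 0.6 * x.
    y = [0.0, 1.0, 1.0, 2.0]
    fitted = [0.1, 0.7, 1.3, 1.9]
    print(regression_metrics(y, fitted, n_predictors=1))
```

The AIC is only meaningful when compared between models fitted to the same data; its absolute value carries no interpretation.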


Regression coefficients obtained with lm():



Variance inflation factors

Variance inflation factors (VIF) are calculated in order to detect multicollinearity. The VIF of an independent variable is obtained by regressing it on all other independent variables and computing \(1/(1 - R^2)\) from the coefficient of determination \(R^2\) of that regression. As a rule of thumb, no variance inflation factor should be larger than 10; otherwise, highly correlated variables should be removed from the model.
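
The calculation can be illustrated with a small, self-contained sketch. It regresses each predictor on the remaining ones (with an intercept) through the normal equations and converts the resulting \(R^2\) into a VIF. All function names are hypothetical, and the sketch is meant for a handful of numeric predictors, not as a reproduction of the routine used to generate this report.

```python
def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(target, predictors):
    """R^2 from regressing target on the predictor columns plus an intercept."""
    n = len(target)
    cols = [[1.0] * n] + [list(p) for p in predictors]
    k = len(cols)
    xtx = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(cols[i][t] * target[t] for t in range(n)) for i in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(beta[i] * cols[i][t] for i in range(k)) for t in range(n)]
    mean_t = sum(target) / n
    rss = sum((target[t] - fitted[t]) ** 2 for t in range(n))
    tss = sum((v - mean_t) ** 2 for v in target)
    return 1.0 - rss / tss


def variance_inflation_factors(columns):
    """Map each predictor name to its VIF = 1 / (1 - R^2).

    columns: dict of name -> list of numeric values (equal lengths).
    Perfectly collinear predictors are not handled and cause a division by zero.
    """
    vifs = {}
    for name, values in columns.items():
        others = [v for other, v in columns.items() if other != name]
        vifs[name] = 1.0 / (1.0 - r_squared(values, others))
    return vifs


if __name__ == "__main__":
    # Two strongly correlated toy predictors: both VIFs come out far above 10.
    print(variance_inflation_factors({
        "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
        "x2": [1.0, 2.0, 3.0, 4.0, 6.0],
    }))
```

A VIF of 1 means the predictor is uncorrelated with the others; the value grows without bound as the predictor becomes a linear combination of the rest.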


Histogram for the magnitudes of the residuals

According to the linear model established, the residuals should be normally distributed. Consequently, the histogram below should approximately resemble a normal distribution. The residuals have been saved to Test_liver_fat_regression_resid.RData.


Normal Q-Q plot of the residuals

This plot shows whether the residuals are normally distributed. Ideally, the points representing the residuals lie close to the straight line.
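
As an illustration of what such a plot displays, the sketch below pairs the sorted, standardized residuals with the corresponding standard normal quantiles. It assumes the plotting positions \((i - 0.5)/n\), which may differ from those used to produce the plot in this report; the function name normal_qq_points is hypothetical.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(residuals):
    """Return (theoretical quantile, observed standardized residual) pairs.

    Assumes plotting positions (i - 0.5) / n for i = 1..n.
    """
    n = len(residuals)
    mu, sd = mean(residuals), stdev(residuals)
    std_normal = NormalDist()
    # Observed quantiles: residuals standardized and sorted in ascending order.
    observed = sorted((r - mu) / sd for r in residuals)
    # Theoretical quantiles of the standard normal at the plotting positions.
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, observed))


if __name__ == "__main__":
    for theo, obs in normal_qq_points([0.3, -1.2, 0.8, 2.1, -0.4]):
        print(f"{theo: .3f}  {obs: .3f}")
```

If the residuals are approximately normal, the pairs fall near the line where both coordinates are equal; systematic curvature at the ends indicates heavy or light tails.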


Technical information