Overview
These are the GWAS results for job Test with 27,243 samples, run for the phenotype liver_fat.
Methods
The association between liver_fat and each variant with a minor allele count \(\ge\) 30 and an imputation quality metric \(r^2 \ge\) 0.8 was tested using the linear regression model:
liver_fat ~ marker + PC1 + PC2 + PC3 + PC4 + PC5 + PC6 + PC7 + PC8 + PC9 + PC10 + PC11 + PC12 + PC13 + PC14 + PC15 + PC16 + PC17 + PC18 + PC19 + PC20 + array + sex + age
The association tests were performed using plink2/2.00-alpha-2.3-20200124, including a total of 14,308,111 markers (after filtering). Markers with a Hardy-Weinberg equilibrium exact test p-value below 1.0e-6 were filtered out.
662 genome-wide significant markers (\(p \le 5 \cdot 10^{-8}\)) were found. The genomic inflation factor was \(\lambda =\) 1.0509461.
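The report does not state how \(\lambda\) was computed. A common definition is the median of the observed 1-df chi-square statistics divided by the median expected under the null (about 0.4549). The sketch below assumes that definition and two-sided p-values; the function name `genomic_inflation` is hypothetical and not part of the pipeline.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Median-based genomic inflation factor from two-sided p-values in (0, 1]."""
    # A two-sided p-value p corresponds to the 1-df chi-square statistic z**2,
    # where z is the standard-normal quantile at p / 2.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced p-values mimic the null distribution, so lambda is close to 1.
null_like_p = [(i + 0.5) / 1001 for i in range(1001)]
lambda_null = genomic_inflation(null_like_p)
```

Values of \(\lambda\) noticeably above 1 usually point to residual population stratification or polygenicity; values near 1, such as the one reported here, suggest the covariates absorb most of the confounding.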
GWAS parameters
The plink2 program was run with the following parameters:
- Minor allele count cutoff: 30
- Minor allele frequency cutoff: 0.00
- Maximum variance inflation factor: 50
- Sample missing call rate cutoff: 0.1
- Variant missing call rate cutoff: 0.1
- Minimum imputation quality: 0.8
- Maximum imputation quality: 2.0
- Hardy-Weinberg exact test p-value threshold: 1.0e-6
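As an illustration of how these parameters might map onto plink2 command-line flags, the following sketch assembles an argument list. The exact invocation used by the pipeline is not shown in this report, so the file names, the `--pfile` input format, the bare `--glm` call, and the assumption that all covariates are columns of one covariate file are hypothetical.

```python
def build_plink2_args(pfile, pheno_file, covar_file, out_prefix):
    """Assemble a plink2 argument list mirroring the parameters listed above.

    All four arguments are hypothetical paths/prefixes chosen by the caller.
    """
    # Covariates of the association model: 20 principal components, array, sex, age.
    covariates = [f"PC{i}" for i in range(1, 21)] + ["array", "sex", "age"]
    return [
        "plink2",
        "--pfile", pfile,                    # assumed genotype input format
        "--pheno", pheno_file,
        "--pheno-name", "liver_fat",
        "--covar", covar_file,
        "--covar-name", *covariates,
        "--mac", "30",                       # minor allele count cutoff
        "--maf", "0.00",                     # minor allele frequency cutoff
        "--vif", "50",                       # maximum variance inflation factor
        "--mind", "0.1",                     # sample missing call rate cutoff
        "--geno", "0.1",                     # variant missing call rate cutoff
        "--mach-r2-filter", "0.8", "2.0",    # imputation quality range
        "--hwe", "1e-6",                     # Hardy-Weinberg p-value threshold
        "--glm",                             # linear regression for a quantitative trait
        "--out", out_prefix,
    ]


args = build_plink2_args("genotypes", "pheno.txt", "covar.txt", "Test_liver_fat")
```

The list can be passed to `subprocess.run(args, check=True)` on a machine where plink2 is installed.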
Filtered samples and variants
The table displays the number of samples/variants that have been removed due to different filters:
- SGENO: Samples removed due to missing genotype. Cutoff: 0.1
- VGENO: Variants removed due to missing genotype. Cutoff: 0.1
- VHWE: Variants removed after Hardy-Weinberg exact test. P-value cutoff: 1.0e-6
- VFREQ: Variants removed due to allele frequency threshold(s). Cutoffs: mac = 30; maf = 0.00
- VIMP: Variants removed due to imputation quality filter (MACH-R2). Cutoff: 0.8 - 2.0
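The variant-level filters above can be read as a sequence of threshold checks. The sketch below labels a single variant with the first filter it fails. The order of the checks and the function name `variant_filter_label` are assumptions made for illustration; plink2 applies its filters in its own internal order.

```python
def variant_filter_label(mac, maf, missing_rate, hwe_p, mach_r2,
                         min_mac=30, min_maf=0.0, max_missing=0.1,
                         hwe_threshold=1.0e-6, r2_range=(0.8, 2.0)):
    """Return the label of the first filter a variant fails, or None if it passes."""
    if missing_rate > max_missing:          # too many missing genotype calls
        return "VGENO"
    if hwe_p < hwe_threshold:               # deviates from Hardy-Weinberg equilibrium
        return "VHWE"
    if mac < min_mac or maf < min_maf:      # minor allele too rare
        return "VFREQ"
    if not (r2_range[0] <= mach_r2 <= r2_range[1]):  # poorly imputed
        return "VIMP"
    return None


# A well-imputed, common variant passes; a rare one is labelled VFREQ.
kept = variant_filter_label(mac=500, maf=0.01, missing_rate=0.0, hwe_p=0.3, mach_r2=0.95)
rare = variant_filter_label(mac=12, maf=0.0002, missing_rate=0.0, hwe_p=0.3, mach_r2=0.95)
```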
Genome-wide significant markers
- Markers with \(p \le 5 \cdot 10^{-8}\) are displayed, no matter what the corresponding \(\beta\)-value is.
- Spreadsheet with significant markers (only available if copied separately)
- ID: navigate to Phenoscanner
- POS: link to marker position at UCSC Genome Browser
- genes: link to nearest gene (UCSC Genome Browser)
- Note: A1_FREQ, BETA, SE, and P are not available if a chromosome contains only one significant marker.
GCTA-COJO results
The GCTA-COJO program was run using the following parameters:
- \(p\)-value: 5.0e-8
- window size: 5000 kb
- collinearity threshold (\(R^2\)): 0.9
- minor allele frequency cutoff (A1_FREQ): 0
- reference genome: FTD_rand
Table with corresponding original results
CLUMP results
The plink-clump program was run using the following parameters:
- primary \(p\)-value: 5e-8
- secondary \(p\)-value: 5e-6
- correlation threshold: 0.01
- window size: 5000 kb
- reference genome: FTD_rand
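To make the role of these parameters concrete, here is a simplified sketch of greedy LD-based clumping. It is not the plink implementation. The marker dictionaries, the `r2` callback (which would look up linkage disequilibrium in the reference panel), and the function name `clump` are all hypothetical.

```python
def clump(markers, r2, p1=5e-8, p2=5e-6, r2_min=0.01, window_bp=5_000_000):
    """Greedy clumping: map each index marker ID to the IDs clumped under it.

    markers: list of dicts with keys "id", "chrom", "pos", "p".
    r2: callable (id_a, id_b) -> squared correlation between two markers.
    """
    by_p = sorted(markers, key=lambda m: m["p"])
    assigned = set()
    clumps = {}
    for index in by_p:
        if index["p"] > p1:
            break                            # remaining markers cannot be index markers
        if index["id"] in assigned:
            continue                         # already absorbed by a stronger signal
        assigned.add(index["id"])
        members = []
        for m in by_p:
            if m["id"] in assigned or m["p"] > p2:
                continue
            if m["chrom"] != index["chrom"]:
                continue
            if abs(m["pos"] - index["pos"]) > window_bp:
                continue
            if r2(index["id"], m["id"]) >= r2_min:
                members.append(m["id"])
                assigned.add(m["id"])
        clumps[index["id"]] = members
    return clumps


# Toy data: A and B are close and correlated; C is far away; D is on another chromosome.
toy = [
    {"id": "A", "chrom": 1, "pos": 1_000, "p": 1e-10},
    {"id": "B", "chrom": 1, "pos": 2_000, "p": 1e-7},
    {"id": "C", "chrom": 1, "pos": 9_000_000, "p": 1e-9},
    {"id": "D", "chrom": 2, "pos": 1_500, "p": 1e-6},
]
toy_r2 = lambda a, b: 0.5 if {a, b} == {"A", "B"} else 0.0
toy_clumps = clump(toy, toy_r2)
```

With a correlation threshold as low as 0.01, almost any marker in weak LD with an index marker inside the 5000 kb window is absorbed, which yields few, largely independent index markers.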
Comparison of the markers identified by COJO and by CLUMP
- The number of common markers identified by the two methods is 4.
- This corresponds to 100 % of the CLUMP markers and 100 % of the COJO markers.
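The comparison amounts to a set intersection of the two marker lists. A minimal sketch follows, with made-up marker IDs standing in for the real ones, which are not listed in this section.

```python
def marker_overlap(clump_ids, cojo_ids):
    """Return (number of shared markers, % of CLUMP markers, % of COJO markers)."""
    clump_set, cojo_set = set(clump_ids), set(cojo_ids)
    common = clump_set & cojo_set
    pct_clump = 100.0 * len(common) / len(clump_set) if clump_set else 0.0
    pct_cojo = 100.0 * len(common) / len(cojo_set) if cojo_set else 0.0
    return len(common), pct_clump, pct_cojo


# Hypothetical IDs: both methods return the same four markers.
example_ids = ["rs_a", "rs_b", "rs_c", "rs_d"]
n_common, pct_clump, pct_cojo = marker_overlap(example_ids, example_ids)
```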
QQ-plot for \(p\)-values
Manhattan plot
SNP density plot
Histogram of \(\beta\) values
Kernel density plot of \(\beta\) values
Regression diagnostics
A regression was conducted using the linear model
liver_fat ~ PC1 + PC2 + PC3 + PC4 + PC5 + PC6 + PC7 + PC8 + PC9 + PC10 + PC11 + PC12 + PC13 + PC14 + PC15 + PC16 + PC17 + PC18 + PC19 + PC20 + array + sex + age
Note that no marker genotype is included in this model. The data frame used as regression input is stored in Test_liver_fat_regression_frame.RData, and the residuals are stored in Test_liver_fat_regression_resid.RData.
Some metrics obtained using the linear regression model:
- sigma is the estimated standard deviation of the noise term
- Fstat is the value of the F-statistic
- Rsquared is the coefficient of determination
- Rsq.adj is the coefficient of determination adjusted for the number of predictors
- AIC is the Akaike information criterion (which estimates the quality of a regression model relative to others)
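For readers who want to see how these quantities relate to each other, the sketch below computes them from observed and fitted values using the standard ordinary-least-squares formulas. It is an illustration, not the report's R code. The AIC follows the convention of counting the residual variance as an additional parameter, which is an assumption about how the reported value was obtained.

```python
import math


def regression_metrics(y, fitted, n_predictors):
    """Summary metrics of a linear model with an intercept and n_predictors predictors."""
    n = len(y)
    df_resid = n - n_predictors - 1
    mean_y = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual sum of squares
    tss = sum((yi - mean_y) ** 2 for yi in y)                # total sum of squares
    r2 = 1.0 - rss / tss
    # Gaussian log-likelihood at the maximum-likelihood variance estimate rss / n.
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return {
        "sigma": math.sqrt(rss / df_resid),
        "Fstat": (r2 / n_predictors) / ((1 - r2) / df_resid),
        "Rsquared": r2,
        "Rsq.adj": 1 - (1 - r2) * (n - 1) / df_resid,
        # Parameters counted: intercept, predictors, and the residual variance.
        "AIC": -2 * log_lik + 2 * (n_predictors + 2),
    }


# Toy example: y regressed on a binary predictor x = [0, 0, 1, 1];
# the least-squares fit is then the mean of y within each group.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5], n_predictors=1)
```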
Regression coefficients obtained with lm():
Variance inflation factors
Variance inflation factors (VIF) are calculated to detect multicollinearity. The VIF of a predictor is obtained by regressing it on all other predictors: if \(R_j^2\) is the coefficient of determination of that regression, then \(\mathrm{VIF}_j = 1/(1 - R_j^2)\). As a rule of thumb, no VIF should exceed 10; otherwise, highly correlated variables should be removed from the model.
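In the special case of only two predictors, regressing one on the other gives a coefficient of determination equal to their squared Pearson correlation \(r^2\), so both share the VIF \(1/(1 - r^2)\). The sketch below covers only this two-predictor case, with made-up data, to show the arithmetic; the full model with 23 covariates requires a multiple regression per predictor.

```python
def vif_two_predictors(x1, x2):
    """VIF shared by two predictors: 1 / (1 - r^2), r being their Pearson correlation."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    sxy = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    sxx = sum((a - m1) ** 2 for a in x1)
    syy = sum((b - m2) ** 2 for b in x2)
    r_squared = sxy * sxy / (sxx * syy)
    return 1.0 / (1.0 - r_squared)


# Made-up predictors with correlation 0.8, giving a VIF of 1 / 0.36.
vif = vif_two_predictors([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

A VIF of 10 corresponds to \(R_j^2 = 0.9\), that is, 90 % of the predictor's variance is explained by the other predictors.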
Histogram for the magnitudes of the residuals
According to the linear model, the residuals should be normally distributed, so the histogram below should approximately resemble a normal distribution. The residuals have been saved to Test_liver_fat_regression_resid.RData.
Normal Q-Q plot of the residuals
This plot shows whether the residuals are normally distributed: the points representing the residuals should lie close to the straight line.
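The points of a normal Q-Q plot pair each sorted, standardized residual with the standard-normal quantile expected at its rank. The following sketch shows one way to compute them. The plotting positions \((i + 0.5)/n\) are a common simplification and may differ slightly from those used to draw the plot in this report; the function name is hypothetical.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(residuals):
    """Return (theoretical quantile, standardized residual) pairs in ascending order."""
    n = len(residuals)
    mu, sd = mean(residuals), stdev(residuals)
    observed = sorted((r - mu) / sd for r in residuals)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, observed))


# Made-up residuals; for normally distributed data the pairs lie near the line y = x.
qq_points = normal_qq_points([-1.0, 0.0, 1.0, 2.0, -2.0])
```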