The GWAS diagnosis was conducted for rs188247550_T_C based on 27,243 observations (the number of samples common to the response variable, the predictor variable, and the covariates; see the Venn diagram below).
The Venn diagram displays which samples are common to the response and predictor variables and the covariates.
The heatmap indicates the strength of the correlations between the independent variables (use the script examine_covariates to get detailed information).
Variance inflation factors (VIF) are calculated to detect multicollinearity. The VIF of an independent variable is obtained by regressing it on all other independent variables and computing \(1 / (1 - R^2)\) from that regression. As a rule of thumb, no VIF should exceed 10; otherwise, highly correlated variables should be removed from the model.
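The VIF calculation described above can be sketched in base R as follows. The data are simulated and the names (`covs`, `vif_by_hand`, `pc1`, `pc2`) are hypothetical; this is an illustration of the definition, not the code that produced this report.

```r
# Simulated covariates (hypothetical data, not the samples of this report)
set.seed(1)
n   <- 500
age <- rnorm(n, mean = 55, sd = 8)
sex <- rbinom(n, size = 1, prob = 0.5)
pc1 <- rnorm(n)
pc2 <- 0.9 * pc1 + rnorm(n, sd = 0.3)  # deliberately correlated with pc1
covs <- data.frame(age, sex, pc1, pc2)

# VIF_j = 1 / (1 - R^2_j), where R^2_j is taken from the regression of
# predictor j on all remaining predictors.
vif_by_hand <- function(df) {
  sapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    aux    <- lm(reformulate(others, response = v), data = df)
    1 / (1 - summary(aux)$r.squared)
  })
}

vifs <- vif_by_hand(covs)
print(round(vifs, 2))
```

In this toy setup the two correlated columns get a clearly elevated VIF, while `age` and `sex` stay close to 1. The sketch assumes numeric predictors only; factors with more than two levels need a generalized VIF, as provided for example by `vif()` in the car package.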
The linear model was fitted with lm() in R using the formula liver_fat_a ~ geno + PC1 + PC2 + PC3 + PC4 + PC5 + PC6 + PC7 + PC8 + PC9 + PC10 + array + sex + age.
The scatterplot shows the genotype on the x-axis and the phenotype on the y-axis. The x-values are jittered for better visibility. Potential outliers are marked blue, while potentially influential observations (i.e., observations with high values of Cook’s D) are marked red. Note that at least the six observations with the highest values of Cook’s D are marked in the plot, regardless of whether they exceed the calculated cutoff for being influential. For details regarding Cook’s D, see the section “Cook’s distance” below.
The linear model assumes that the residuals are normally distributed. Consequently, the histogram below should approximately resemble a normal distribution. The residuals have been saved to diagnose_residuals_rs188247550_T_C_allele_T.RData.
This plot shows whether the residuals have non-linear patterns (which should not be the case). It is desirable that the residuals are equally spread around the horizontal line without distinct patterns. The p-value for the Non-constant Variance Score Test (ncvTest in R) is 1.623053e-69, which is evidence against constant residual variance.
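To make the score test mentioned above more concrete, the following base R sketch computes a non-constant variance score statistic by hand, with the variance modelled as a function of the fitted values. It is meant to mirror the default behaviour of `ncvTest` from the car package, but it is a simplified illustration on simulated data (all variable names are hypothetical), not the code used for this report.

```r
# Simulated data whose error spread grows with x (hypothetical example)
set.seed(2)
n   <- 1000
x   <- runif(n, 1, 5)
y   <- 1 + 2 * x + rnorm(n, sd = x)
fit <- lm(y ~ x)

e   <- residuals(fit)
u   <- e^2 / mean(e^2)        # squared residuals scaled by RSS / n
aux <- lm(u ~ fitted(fit))    # auxiliary regression on the fitted values

# Regression sum of squares of the auxiliary model, halved, is the score
# statistic; it is compared to a chi-square distribution with 1 df.
reg_ss  <- sum((fitted(aux) - mean(u))^2)
chisq   <- reg_ss / 2
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)

cat("Chi-square:", chisq, " p-value:", p_value, "\n")
```

A small p-value speaks against the null hypothesis of constant variance. With very large samples, even mild heteroscedasticity yields tiny p-values, so the residual plots remain useful for judging how severe the deviation is.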
This plot shows if residuals are normally distributed. It is desirable that the points displaying the residuals are located close to the straight line.
The plot shows whether the residuals are spread equally along the range of the predictors, which allows the equal variance (homoscedasticity) assumption to be checked. It is desirable to see a horizontal line with equally (randomly) spread points.
The linear model assumes that the residuals are independent. This means that the autocorrelation for any lag should be small. It is therefore desirable that all vertical lines representing the magnitudes of autocorrelation lie well inside the blue dashed lines displayed in the plot. The p-value for the Durbin-Watson test is 0.02.
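The quantities behind the autocorrelation plot and the Durbin-Watson test can be sketched in base R. The example below uses simulated residual series (hypothetical names `e_indep`, `e_ar1`, `dw_stat`); it only computes the Durbin-Watson statistic, not its p-value, and is not the code used for this report.

```r
set.seed(3)
n <- 500

# Independent residuals versus residuals with lag-1 dependence (AR(1), phi = 0.6)
e_indep <- rnorm(n)
e_ar1   <- as.numeric(stats::filter(rnorm(n), filter = 0.6, method = "recursive"))

# Durbin-Watson statistic: values near 2 indicate no lag-1 autocorrelation,
# values well below 2 indicate positive autocorrelation.
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

dw_indep <- dw_stat(e_indep)
dw_ar1   <- dw_stat(e_ar1)

# Sample autocorrelations for lags 1..20 and the approximate 95% bounds
# that plot(acf(...)) draws as blue dashed lines.
r_indep <- acf(e_indep, lag.max = 20, plot = FALSE)$acf[-1]
bound   <- qnorm(0.975) / sqrt(n)

cat("DW (independent):", dw_indep, " DW (AR1):", dw_ar1, "\n")
cat("Lags outside bounds:", sum(abs(r_indep) > bound), "of", length(r_indep), "\n")
```

Both diagnostics depend on the order of the observations, so they are informative only to the extent that the sample order carries meaning.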
Cook’s distance quantifies the influence of each observation on the regression results. It is obtained by recalculating the regression results after removing a single observation from the input dataset and summarizes how much the results change when that observation is removed. The cutoff for Cook’s distance is 0.9528287 (calculated as the median of the F-distribution with 14 and 27229 degrees of freedom). According to this cutoff, no observations are classified as influential.
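The cutoff described above can be reproduced with `qf()`, and the distances themselves with `cooks.distance()`, both from base R. The sketch below first evaluates the median of the F-distribution for the degrees of freedom stated in this report and then applies the same idea to a small simulated model with one planted influential point. The toy data and the choice of degrees of freedom for the toy model (number of coefficients and residual degrees of freedom) are assumptions made for illustration.

```r
# Median of the F-distribution with the degrees of freedom stated in the report
cutoff <- qf(0.5, df1 = 14, df2 = 27229)
cat("Cutoff for the report's degrees of freedom:", cutoff, "\n")

# Toy model with one planted high-leverage, badly fitted observation
set.seed(4)
n <- 200
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)
x[n] <- 8
y[n] <- -10
fit <- lm(y ~ x)

d <- cooks.distance(fit)

# Assumed convention for the toy model: df1 = number of coefficients,
# df2 = residual degrees of freedom.
p           <- length(coef(fit))
toy_cutoff  <- qf(0.5, df1 = p, df2 = n - p)
influential <- which(d > toy_cutoff)

cat("Observations above the toy cutoff:", influential, "\n")
```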
The plot supports the identification of influential observations. Influential observations are located in the upper right or the lower right corner of this plot. Cases outside the dashed line (Cook’s distance) might be influential to the regression results, i.e., the regression results will be altered if these observations are excluded from the model. (Note that the dashed line indicating Cook’s distance may not be visible in the plot if all observations have a magnitude of Cook’s D below the cutoff.)
Outliers were calculated using the function outlierTest in R. The number of hypothetical outliers obtained by this function was 169.
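As an illustration of what such an outlier test does, the base R sketch below computes Bonferroni-adjusted p-values for the studentized residuals of a simulated model with one planted outlier. This mirrors the general approach of `outlierTest` from the car package; the data, the 0.05 threshold, and the variable names are assumptions for this example.

```r
set.seed(5)
n <- 300
x <- rnorm(n)
y <- 1 + 0.3 * x + rnorm(n)
y[10] <- y[10] + 8            # planted outlier (hypothetical)
fit <- lm(y ~ x)

# Externally studentized residuals follow a t-distribution with
# (residual df - 1) degrees of freedom under the model.
t_i   <- rstudent(fit)
df_t  <- df.residual(fit) - 1
p_raw <- 2 * pt(abs(t_i), df = df_t, lower.tail = FALSE)

# Bonferroni adjustment for testing all n observations at once
p_bonf   <- pmin(1, p_raw * n)
outliers <- which(p_bonf < 0.05)

cat("Flagged observations:", outliers, "\n")
```

Because the adjustment multiplies by the sample size, an observation needs a very large studentized residual to be flagged in a dataset with tens of thousands of samples.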
The inverse response plot displays the response variable (i.e., the phenotype) on the x-axis and the fitted values on the y-axis. A relationship between these variables of the form \(Y_{fitted} = \beta_0 + \beta_1 \cdot Y_{response}^\lambda\) is fitted using the nls function in R. The estimated \(\lambda\) for the model considered here is -0.9750917.
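The estimation of \(\lambda\) can be sketched on simulated data. Note that this example does not use `nls`: for robustness it profiles over \(\lambda\), fitting \(\beta_0\) and \(\beta_1\) by `lm()` for each candidate value and minimizing the residual sum of squares with `optimize()`. The data, the true exponent of -1, and the search interval are assumptions for illustration.

```r
set.seed(6)
n <- 400

# Hypothetical positive response and "fitted values" generated with lambda = -1
y_response <- runif(n, 1, 3)
y_fitted   <- 2 + 3 * y_response^(-1) + rnorm(n, sd = 0.02)

# For a fixed lambda the model is linear in beta_0 and beta_1,
# so the residual sum of squares can be obtained from lm().
rss_for_lambda <- function(lambda) {
  deviance(lm(y_fitted ~ I(y_response^lambda)))
}

# One-dimensional search for the lambda with the smallest RSS
opt        <- optimize(rss_for_lambda, interval = c(-3, 3))
lambda_hat <- opt$minimum

cat("Estimated lambda:", lambda_hat, "\n")
```

An estimate close to 1 would suggest that no transformation of the response is needed, whereas an estimate far from 1 suggests that a power transformation of the response may improve the fit.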
The marker rs188247550 is located at position 19,396,616 on chromosome 19.