Fisher’s Exact test

What does this test do?

Decides if two (or more) proportions differ significantly
Example I:
- in group A, 20 out of 40 animals survived (proportion = 0.5)
- in group B, 5 out of 25 animals survived (proportion = 0.2)
- is there a statistically significant difference between these proportions?
Example II (enrichment analysis, differentially expressed genes):
- in a test group, 10 of 20 genes are up-regulated (proportion = 0.5)
- in a control group, 15 of 20 genes are up-regulated (proportion = 0.75)
- is there a statistically significant difference between these proportions?
- (or, are the up-regulated genes enriched in the control group)
can also be seen as test of independence:
- is the survival rate independent of the group membership? (example I)
- is the gene regulation independent of the group membership? (example II)
the test is practical only for relatively small counts (the exact computation becomes expensive for large ones)
for larger counts, use the prop.test() function (next chapter)
\(H_0\) : \(p_1 = p_2\) (equal proportions, independence)
a small p-value means that the null is rejected, i.e. that the proportions differ significantly
function: fisher.test()
entries should be nonnegative integers
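
As a minimal sketch, Example I above can be written as a 2 x 2 table and passed to fisher.test(). The object name `survival` and the row and column labels are chosen here for illustration only; they are not part of the original example.

```r
# Example I: survival counts per group (labels are illustrative)
survival <- matrix(c(20, 20,   # group A: 20 survived, 20 died
                      5, 20),  # group B:  5 survived, 20 died
                   ncol = 2, byrow = TRUE,
                   dimnames = list(c("A", "B"), c("Survived", "Died")))

# Proportion of survivors per group (0.5 for A, 0.2 for B)
survival[, "Survived"] / rowSums(survival)

# Test H0: equal proportions (group and survival are independent)
res <- fisher.test(survival)
res$p.value
```

Note that the table holds the counts of both outcomes (survived and died), not the group totals.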

Running the test

We use this example from Wikipedia, where the handedness of men and women was compared:

48 women and 52 men were asked whether they are left- or right-handed.
Is the proportion of left-handed persons significantly larger for one of the sexes?
This is equivalent to asking whether handedness is independent of sex, so the test can also be seen as a test of independence.

The numbers have to be arranged in a contingency table:

x <- matrix(c(43, 9, 44, 4), ncol = 2, byrow = TRUE)
colnames(x) <- c("Right-handed", "Left-handed")
rownames(x) <- c("Male", "Female")
x

##        Right-handed Left-handed
## Male             43           9
## Female           44           4

We have 9 out of 52 left-handed individuals among men, and 4 out of 48 left-handed individuals among women, i.e. the proportions are:

men: \(9/52 = 0.173\)
women: \(4/48 = 0.083\)

The proportions do differ, but we cannot tell by eye whether the difference is statistically significant. With small samples, a difference of this size can easily arise by chance; significance is reached only when the counts are large enough to make a chance difference unlikely.
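
The proportions can also be computed directly from the table rather than by hand. The following is a small sketch using prop.table() from base R; the object name `props` is chosen for illustration.

```r
# Same contingency table as above
x <- matrix(c(43, 9, 44, 4), ncol = 2, byrow = TRUE,
            dimnames = list(c("Male", "Female"),
                            c("Right-handed", "Left-handed")))

# Row-wise proportions: each row sums to 1
props <- prop.table(x, margin = 1)

# Share of left-handed individuals per sex (0.173 for men, 0.083 for women)
round(props[, "Left-handed"], 3)
```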

The test is conducted using the function fisher.test from the R stats package:

fisher.test(x)

## 
##  Fisher's Exact Test for Count Data
## 
## data:  x
## p-value = 0.2392
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.09150811 1.71527769
## sample estimates:
## odds ratio 
##  0.4378606

Reading the output

The null hypothesis is that the proportions are the same, which is equivalent to the statement that the variables sex and handedness are independent.

The p-value of 0.2392 in the output above is well above the usual 0.05 threshold, so the null is not rejected, i.e. we cannot state that there is a significant difference in handedness between women and men.

The confidence interval refers to the odds ratio. If the parameter conf.level in the function call is left unchanged, it is a 95% interval, i.e. it is constructed to cover the true odds ratio in 95% of repeated samples. We see that 1 (equal odds) is included in the interval, which is consistent with the conclusion drawn from the p-value.
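
The individual parts of the output can be accessed from the returned object, which is useful when results are processed further. The sketch below assumes the same table as above; the object name `res` is chosen for illustration.

```r
# Same contingency table as above
x <- matrix(c(43, 9, 44, 4), ncol = 2, byrow = TRUE,
            dimnames = list(c("Male", "Female"),
                            c("Right-handed", "Left-handed")))

res <- fisher.test(x)   # default: conf.level = 0.95

res$p.value             # p-value of the two-sided test
res$conf.int            # confidence interval for the odds ratio
res$estimate            # estimated odds ratio (conditional MLE)

# Does the interval contain 1, i.e. equal odds?
res$conf.int[1] < 1 && 1 < res$conf.int[2]

# A different confidence level can be requested explicitly
fisher.test(x, conf.level = 0.99)$conf.int
```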

Remarks

Arrangement of the numbers in the table

In case you wonder: it does not matter whether the groups are placed in the rows or in the columns of the contingency table:

y <- t(x)
y

##              Male Female
## Right-handed   43     44
## Left-handed     9      4

fisher.test(y)

## 
##  Fisher's Exact Test for Count Data
## 
## data:  y
## p-value = 0.2392
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.09150811 1.71527769
## sample estimates:
## odds ratio 
##  0.4378606
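
Instead of comparing the two printouts by eye, the agreement can be checked programmatically. This is a minimal sketch; the object names `res_x` and `res_y` are chosen for illustration.

```r
# Table and its transpose
x <- matrix(c(43, 9, 44, 4), ncol = 2, byrow = TRUE,
            dimnames = list(c("Male", "Female"),
                            c("Right-handed", "Left-handed")))
y <- t(x)

res_x <- fisher.test(x)
res_y <- fisher.test(y)

# The p-values agree up to numerical tolerance
all.equal(res_x$p.value, res_y$p.value)

# Estimated odds ratios side by side
c(original = unname(res_x$estimate), transposed = unname(res_y$estimate))
```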

Odds ratio in the output

An estimate of the odds ratio is displayed in the output above.
Quoting the help page of the function: “Note that the conditional Maximum Likelihood Estimate (MLE) rather than the unconditional MLE (the sample odds ratio) is used.” That means the reported odds ratio differs from the plain sample odds ratio computed from the cell counts of the contingency table.
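
To illustrate the difference, the sample odds ratio can be computed by hand and compared with the value reported by fisher.test(). The object names `sample_or` and `cond_or` are chosen for illustration.

```r
# Same contingency table as above
x <- matrix(c(43, 9, 44, 4), ncol = 2, byrow = TRUE,
            dimnames = list(c("Male", "Female"),
                            c("Right-handed", "Left-handed")))

# Unconditional MLE: the sample odds ratio (43 * 4) / (9 * 44)
sample_or <- (x[1, 1] * x[2, 2]) / (x[1, 2] * x[2, 1])

# Conditional MLE as reported by fisher.test()
cond_or <- unname(fisher.test(x)$estimate)

# The two estimates are close but not identical
c(sample = sample_or, conditional = cond_or)
```

For this table the sample odds ratio is about 0.434, whereas the output above reports 0.438.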