What does logistic regression do?


\[ \ln \left( \frac{p}{1-p} \right) = \beta_0 + \beta_1 \cdot x_1 + \beta_2 \cdot x_2 + \dots \]

\[ p = \frac{1}{1 + e^{ - \left( \beta_0 + \beta_1 \cdot x_1 + \beta_2 \cdot x_2 + \dots \right) }} \]


Assumptions


Logistic regression using glm()

Beetles data (for a given dosage of an insecticide: is beetle number x killed or not?):

The data can be summarized by registering the portion of killed beetles:

beetles = read.table("beetles.txt", header = TRUE)
beetles
##   Dose Exposed Killed
## 1 49.1      59      6
## 2 53.0      60     13
## 3 56.9      62     18
## 4 60.8      56     28
## 5 64.8      63     52
## 6 68.7      59     53
## 7 72.6      62     61
## 8 76.5      60     60

Add a column with the proportion of killed beetles:

Proportion = beetles$Killed / beetles$Exposed
beetles = cbind(beetles, Proportion)
beetles
##   Dose Exposed Killed Proportion
## 1 49.1      59      6  0.1016949
## 2 53.0      60     13  0.2166667
## 3 56.9      62     18  0.2903226
## 4 60.8      56     28  0.5000000
## 5 64.8      63     52  0.8253968
## 6 68.7      59     53  0.8983051
## 7 72.6      62     61  0.9838710
## 8 76.5      60     60  1.0000000
plot(Proportion ~ Dose, data = beetles, main = "Proportion killed vs. dosage")

If this is an S-shaped function varying between 0 and 1, then logistic regression might work !

\(\begin{aligned} p = \frac{1}{1 + e^{ - t }} = \frac{e^t}{1 + e^t} \end{aligned}\)


Logistic function


survived = beetles$Exposed - beetles$Killed
killed = beetles$Killed
exposure = beetles$Dose
logfit = glm(cbind(killed, survived) ~ exposure, family = binomial) # logit link!
summary(logfit)
## 
## Call:
## glm(formula = cbind(killed, survived) ~ exposure, family = binomial)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2746  -0.4668   0.7688   0.9544   1.2990  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -14.82300    1.28959  -11.49   <2e-16 ***
## exposure      0.24942    0.02139   11.66   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 284.2024  on 7  degrees of freedom
## Residual deviance:   7.3849  on 6  degrees of freedom
## AIC: 37.583
## 
## Number of Fisher Scoring iterations: 4


Reading the output


Draw the regression line using the glm() results

beta.zero = coef(logfit)[1]
beta.zero   # -14.82
## (Intercept) 
##     -14.823
beta.one  = coef(logfit)[2]
beta.one    # 0.25  
##  exposure 
## 0.2494156

\[ p(x) = \frac{e^{-14.82 + 0.25x}}{1 + e^{-14.82 + 0.25x}} \]

plot(Proportion ~ Dose, data = beetles)    # observed
fitted <- function(x) { exp(beta.zero + beta.one*x)/(1 + exp(beta.zero + beta.one*x)) }
lowest.x = min(beetles$Dose)
highest.x = max(beetles$Dose)
curve(fitted(x), from = lowest.x, to = highest.x, n = 201, add = TRUE, col = "red")