For the final week in the 'Regression Modeling in Practice' course, I will be testing a logistic regression model using the Marscrater dataset to examine my research question.
- Is crater diameter associated with crater depth?
In addition to my primary explanatory variable (crater diameter), I will consider the effect of a potential confounding variable (Crater Latitude) on the association between crater depth (response variable) and crater diameter.
Null Hypothesis (H0):
- Crater depth is not associated with crater diameter.
Alternative Hypothesis (H1):
- Crater depth is associated with crater diameter.
My response, explanatory and confounding variables are all originally quantitative. I have therefore converted them into binary categorical variables for the purpose of this exercise. Crater Depth was collapsed into 2 categories: 0 = Shallow Craters and 1 = Deep Craters. Crater Diameter was collapsed into 2 categories: 0 = Small Craters and 1 = Large Craters. Crater Latitude was collapsed into 2 categories: 0 = Polar Regions and 1 = Equatorial Regions.
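The recoding described above can be sketched in plain Python. This is only an illustration: the measurements are made up, and the cutoffs (the median for depth and diameter, 30 degrees of absolute latitude for the equatorial band) are assumptions, since the thresholds actually used are not stated here.

```python
from statistics import median


def binarize(values, cutoff):
    """Return 1 for values strictly above the cutoff, otherwise 0."""
    return [1 if v > cutoff else 0 for v in values]


# Hypothetical measurements (not taken from the Marscrater dataset)
depth_km = [0.1, 0.5, 1.2, 0.0, 2.3]
diameter_km = [1.5, 4.0, 30.0, 1.1, 55.0]
latitude_deg = [-72.0, 10.5, 35.0, 80.0, -5.0]

# Assumed cutoffs: median split for depth and diameter
deep = binarize(depth_km, median(depth_km))           # 1 = Deep, 0 = Shallow
large = binarize(diameter_km, median(diameter_km))    # 1 = Large, 0 = Small

# Assumed band: within 30 degrees of the equator counts as equatorial
EQUATORIAL_LIMIT_DEG = 30.0
equatorial = [1 if abs(lat) <= EQUATORIAL_LIMIT_DEG else 0 for lat in latitude_deg]

print(deep, large, equatorial)
```

Using the absolute value of latitude treats the northern and southern hemispheres symmetrically, which matches the polar versus equatorial split.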
Interpretation of Logistic Regression Models
Summary of Results
After adjusting for a potential confounding factor (Crater Latitude), the odds of having a deep crater were more than 66 times higher for large craters than for small craters (OR = 66.67, 95% CI = 55.74 to 79.73, p < 0.0001). Crater Latitude (Location) was also significantly associated with crater depth, such that craters located in the equatorial region had significantly higher odds of being deep craters (OR = 4.45, 95% CI = 3.69 to 5.35, p < 0.0001). The results support my alternative hypothesis for the association between crater diameter (primary explanatory variable) and crater depth (response variable).
Logistic Regression Between Response Variable and Primary Explanatory Variable
Based on the model results above, the logistic regression is statistically significant (p < 0.0001). The odds of being a deep crater are 66.63 times higher for large craters than for small craters (OR = 66.63). The 95% confidence interval indicates that we can be 95% confident that the true population odds ratio falls between 55.77 and 79.61.
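To make the odds ratio and its confidence interval more concrete: with a single binary explanatory variable, the odds ratio from a logistic regression equals the cross-product ratio of the 2x2 table, and a Wald 95% interval is obtained by exponentiating the log odds ratio plus or minus 1.96 standard errors. The sketch below uses hypothetical counts, not the Marscrater counts, and the function name is my own.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald CI from a 2x2 table.

    a = large & deep,  b = large & shallow,
    c = small & deep,  d = small & shallow.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only
odds_ratio, lower, upper = odds_ratio_with_ci(a=80, b=20, c=10, d=90)
print(f"OR = {odds_ratio:.2f}, 95% CI = {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1 corresponds to a statistically significant association at the 5% level.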
But what happens if I run a logistic regression between crater depth and crater diameter while controlling for crater location?
Logistic Regression Between Response Variable, Primary Explanatory Variable and Confounding Variable
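From the model results above, there is no evidence of confounding: the odds ratio for crater diameter barely changes once crater location is added to the model (66.63 before, 66.67 after). As you can see, both crater diameter and crater location are independently associated with crater depth (p < 0.0001 for both variables).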
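One common way to check for confounding is to compare the crude and adjusted odds ratios and see how much the estimate moves. The sketch below applies that idea to the two odds ratios reported in this post. The 10% threshold is a widely used rule of thumb and is an assumption here, not something taken from the course material.

```python
def percent_change(crude_or, adjusted_or):
    """Absolute percent change in the odds ratio after adjustment."""
    return abs(adjusted_or - crude_or) / crude_or * 100


crude_or = 66.63      # crater diameter only
adjusted_or = 66.67   # crater diameter, controlling for crater location

# Assumed rule of thumb: a change above 10% suggests confounding
CONFOUNDING_THRESHOLD_PCT = 10.0

change = percent_change(crude_or, adjusted_or)
is_confounded = change > CONFOUNDING_THRESHOLD_PCT
print(f"Change in OR: {change:.2f}% (confounding suspected: {is_confounded})")
```

Here the estimate moves by well under 1%, which is consistent with the conclusion that crater location does not confound the association.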
In this sample of Mars craters, the odds of being a deep crater are 66.67 times higher for large craters than for small craters, after controlling for crater location. Likewise, the odds of being a deep crater are 4.45 times higher for craters in the equatorial region than for craters in the polar regions, after controlling for crater diameter. Crater diameter is more strongly associated with crater depth than crater location is, since it has the higher odds ratio (66.67 compared to 4.45).
For the population of Mars craters, the 95% CIs indicate that the odds of being a deep crater are between 55.74 and 79.73 times higher for large craters than for small craters, and between 3.69 and 5.35 times higher for craters in the equatorial region than for craters in the polar regions. Each of these estimates is calculated after accounting for the other explanatory variable.
Python Code: Testing a Logistic Regression Model
Data Loading and Error Handling
Logistic Regression Model
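A minimal sketch of how a model like this could be fit with the statsmodels formula API. The data below are simulated, and the column names (deep, large, equatorial) are placeholders rather than the actual Marscrater column names, so the printed odds ratios will not match the results reported above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated binary data standing in for the recoded crater variables
rng = np.random.default_rng(0)
n = 2000
large = rng.integers(0, 2, n)        # 1 = Large, 0 = Small
equatorial = rng.integers(0, 2, n)   # 1 = Equatorial, 0 = Polar

# Assumed true log-odds, chosen only so the simulation has an effect to find
log_odds = -2.0 + 2.5 * large + 1.0 * equatorial
deep = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))  # 1 = Deep, 0 = Shallow

df = pd.DataFrame({"deep": deep, "large": large, "equatorial": equatorial})

# Logistic regression with the primary explanatory variable and the
# potential confounder
model = smf.logit("deep ~ large + equatorial", data=df).fit(disp=0)
print(model.summary())

# Exponentiate coefficients and confidence limits to get odds ratios
ci = model.conf_int()
ci.columns = ["Lower CI", "Upper CI"]
ci["OR"] = model.params
odds_ratios = np.exp(ci)
print(odds_ratios)
```

Dropping `equatorial` from the formula gives the unadjusted model, so the two odds ratios for `large` can be compared side by side.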
Posted on February 28, 2016 by Okechukwu Ossai