Assignment 2_DAT: Running a Chi-Square Test of Independence

This week, I used the Chi-square test of independence to test my Week 1 hypothesis about the relationship between crater depth and crater location in the Marscrater dataset. The research question, as before, is: "Does crater depth depend on crater location (crater latitude)?"

For this analysis, I used a categorical explanatory variable with 3 levels (MARS_REGION), obtained by collapsing the crater latitude variable into 3 categories: MARS_REGION 1 = North Pole, MARS_REGION 2 = Near Equatorial Region, MARS_REGION 3 = South Pole.

The response variable is also categorical (DEPTH_CATEGORY), with 2 levels obtained by collapsing the crater depth variable into 2 categories: DEPTH_CATEGORY 0 = Shallow Craters, DEPTH_CATEGORY 1 = Deep Craters. Shallow craters are those less than 1 km deep, while deep craters are those 1 km deep or more.
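A minimal sketch of the two recodes, assuming plain Python numbers as input. The 1 km depth threshold comes from the text above; the latitude boundaries between the three regions are not stated in this post, so the ±30° cut-offs below are hypothetical placeholders, as are the function names.

```python
# Hypothetical region boundaries in degrees latitude; the post does not
# state the cut-offs actually used to build MARS_REGION.
NORTH_BOUNDARY = 30.0
SOUTH_BOUNDARY = -30.0

# Depth threshold from the text: < 1 km is shallow, >= 1 km is deep.
DEEP_THRESHOLD_KM = 1.0


def mars_region(latitude: float) -> int:
    """Collapse crater latitude into MARS_REGION.

    1 = North Pole, 2 = Near Equatorial Region, 3 = South Pole.
    """
    if latitude > NORTH_BOUNDARY:
        return 1
    if latitude < SOUTH_BOUNDARY:
        return 3
    return 2


def depth_category(depth_km: float) -> int:
    """Collapse crater depth into DEPTH_CATEGORY: 0 = shallow, 1 = deep."""
    return 1 if depth_km >= DEEP_THRESHOLD_KM else 0


print(mars_region(45.2), mars_region(-3.1), mars_region(-71.0))  # 1 2 3
print(depth_category(0.35), depth_category(1.0))                  # 0 1
```

In a pandas workflow, the same functions could be applied to the latitude and depth columns with `Series.map` (or `pd.cut` for the latitude bins).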

The aim of this week's hypothesis test is to investigate whether the proportions of shallow and deep craters are the same across the three regions of Mars: the North Pole, Equatorial and South Pole regions.

Null Hypothesis (H0):

  • The proportion of deep craters is the same in all regions of Mars (crater depth and crater latitude are independent).

Alternative Hypothesis (H1):

  • The proportion of deep craters differs between at least two regions of Mars (crater depth and crater latitude are not independent).

Methodology:

To determine whether crater depth and crater latitude are independent, I performed a 3 x 2 Chi-square test of independence using the categorical explanatory variable (3 levels) and the categorical response variable (2 levels).
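For reference, the test compares each observed count O_ij with the count E_ij expected under independence, where R_i and C_j are the row and column totals, N is the total number of craters, and r and c are the numbers of regions and depth categories:

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\mathrm{df} = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2
```

Each 2 x 2 post hoc comparison has df = (2 - 1)(2 - 1) = 1.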

Python Code: Chi-Square Test

[Image: Python code for the chi-square test]
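A minimal, dependency-free sketch of a 3 x 2 test of independence, assuming the data have already been cross-tabulated into one row per MARS_REGION with columns [shallow, deep]. The function name `chi_square_independence` is hypothetical; in practice `scipy.stats.chi2_contingency` does this in one call. The closed-form p values here only cover df = 1 and df = 2, which is all this post needs.

```python
import math


def chi_square_independence(table, yates=False):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (chi2, df, p). The p value uses closed-form chi-square tails,
    which exist for df = 1 and df = 2 (the 2 x 2 and 3 x 2 cases in this post).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    df = (len(table) - 1) * (len(table[0]) - 1)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if region and depth were independent.
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates and df == 1:
                # Yates continuity correction, which scipy applies by
                # default to tables with df = 1.
                diff = max(diff - 0.5, 0.0)
            chi2 += diff ** 2 / expected

    if df == 1:
        p = math.erfc(math.sqrt(chi2 / 2))  # upper tail, df = 1
    elif df == 2:
        p = math.exp(-chi2 / 2)             # upper tail, df = 2
    else:
        raise ValueError("closed-form p value only implemented for df = 1 or 2")
    return chi2, df, p


# Toy counts for illustration only (not the Mars crater data):
# rows = MARS_REGION 1, 2, 3; columns = shallow, deep.
toy_table = [[90, 10], [80, 20], [95, 5]]
chi2, df, p = chi_square_independence(toy_table)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p:.4f}")  # chi2 = 11.32, df = 2, p = 0.0035
```

For the 3 x 2 table the continuity correction never applies, so this should agree with `chi2_contingency` on the same counts.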

Python Code: Post Hoc Chi-Square Test

[Image: Python code for the post hoc chi-square tests]
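A minimal sketch of the pairwise post hoc step, assuming counts are stored as [shallow, deep] per MARS_REGION. The helper names `chi2_2x2` and `post_hoc` are hypothetical, not the author's code. Yates' correction is switched on to mirror the default of `scipy.stats.chi2_contingency` for 2 x 2 tables; whether the original analysis used it is an assumption.

```python
import math
from itertools import combinations


def chi2_2x2(table, yates=True):
    """Pearson chi-square for a 2 x 2 table (df = 1).

    yates=True applies the continuity correction that
    scipy.stats.chi2_contingency uses by default for 2 x 2 tables.
    Returns (chi2, p).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(table[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            chi2 += diff ** 2 / expected
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper-tail p value, df = 1


def post_hoc(counts_by_region, alpha=0.05):
    """Run a 2 x 2 test for every pair of regions.

    counts_by_region maps region -> [shallow, deep]. A pair is flagged
    significant only if p < alpha / number_of_pairs (Bonferroni).
    """
    pairs = list(combinations(sorted(counts_by_region), 2))
    threshold = alpha / len(pairs)
    results = {}
    for a, b in pairs:
        chi2, p = chi2_2x2([counts_by_region[a], counts_by_region[b]])
        results[(a, b)] = (chi2, p, p < threshold)
    return threshold, results


# Toy counts (shallow, deep) per MARS_REGION, for illustration only.
toy_counts = {1: [90, 10], 2: [80, 20], 3: [95, 5]}
threshold, results = post_hoc(toy_counts)
print(f"Bonferroni threshold = {threshold:.4f}")  # Bonferroni threshold = 0.0167
for (a, b), (chi2, p, significant) in results.items():
    print(f"MARS_REGION {a} vs {b}: chi2 = {chi2:.2f}, p = {p:.4f}, significant = {significant}")
```

Using `itertools.combinations` keeps the number of comparisons, and therefore the Bonferroni divisor, tied to the number of regions rather than hard-coded.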

Model Interpretation for Chi-Square Tests

[Images: chi-square test output and Excel summary tables]

In determining whether there is an association between crater depth and crater location, a chi-square test of independence revealed that, among uneroded fresh craters (my sample), craters around the Equator were more likely to be deep (12.7%) than craters at the North Pole (2.7%) or the South Pole (3.6%). In other words, the proportion of deep craters rises from the Polar Regions (North and South) towards the Equatorial region. The table of observed and expected counts also shows that both shallow and deep craters are more abundant at the Equator than at the Poles. Chi-square value (X2) = 342.70, df = 2, p < 0.0001.
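The regional percentages above appear to be column percentages: within each region, the share of craters that fall in the deep category. A minimal sketch of that calculation, assuming a [shallow, deep] count pair per region (toy numbers below, not the actual Mars data; the function name is hypothetical):

```python
def percent_deep_by_region(counts_by_region):
    """Percentage of craters in each region that are deep (DEPTH_CATEGORY 1).

    counts_by_region maps region -> [shallow, deep].
    """
    return {
        region: 100.0 * deep / (shallow + deep)
        for region, (shallow, deep) in counts_by_region.items()
    }


# Toy counts (shallow, deep) per MARS_REGION, for illustration only.
toy_counts = {1: [90, 10], 2: [80, 20], 3: [95, 5]}
for region, pct in percent_deep_by_region(toy_counts).items():
    print(f"MARS_REGION {region}: {pct:.1f}% deep")
# MARS_REGION 1: 10.0% deep
# MARS_REGION 2: 20.0% deep
# MARS_REGION 3: 5.0% deep
```

With pandas, the same column percentages come from `pd.crosstab(depth, region, normalize="columns")` when region is the column variable.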

The test statistic (X2) is large, and the p value (effectively zero) is far below 0.05, so the null hypothesis is rejected: crater depth category and crater location are not independent. However, because the explanatory variable has more than 2 categories, a significant overall test does not show which regions differ from one another. A post hoc test is therefore required, comparing each pair of regions against a Bonferroni-adjusted significance level (0.05/3 ≈ 0.017) to avoid inflating the Type I error rate across the multiple comparisons.
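The Bonferroni-adjusted level divides the overall significance level α by the number of pairwise comparisons m; with three regions there are three pairs:

```latex
m = \binom{3}{2} = 3, \qquad
\alpha_{\text{adj}} = \frac{\alpha}{m} = \frac{0.05}{3} \approx 0.0167
```

A pairwise difference is declared significant only if its p value falls below this adjusted level.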

Model Interpretation for Post Hoc Chi-Square Test Results

[Images: post hoc chi-square test output and Excel table of paired-comparison p values]

The explanatory variable (MARS_REGION) has 3 categories: 1, 2 and 3. To determine which groups differ from the others, I ran a 2 x 2 post hoc chi-square test for each of the three paired comparisons: MARS_REGION 1 & 2, MARS_REGION 1 & 3, and MARS_REGION 2 & 3.

The table above shows the p values for each of the paired comparisons from the chi-square test output. Focusing on deep craters (DEPTH_CATEGORY 1), crater depth in MARS_REGION 2 (Equator) is significantly different from both MARS_REGION 1 (North Pole) and MARS_REGION 3 (South Pole), with p < 0.0001 in each case. These p values are below 0.05 and also below the Bonferroni-adjusted threshold of 0.017, and the respective X2 values of 194.30 and 164.83 are very large. So, I reject the null hypothesis for these two comparisons. This implies that crater depth is associated with crater location for craters 1 km deep or more: deep craters make up a larger share of craters in the Equatorial region than at either the North Pole or the South Pole.

However, the paired comparison between MARS_REGION 1 (North Pole) and MARS_REGION 3 (South Pole) shows that their crater depths are not significantly different. Since the p value of 0.0925 is above both 0.05 and the Bonferroni-adjusted threshold of 0.017, I fail to reject the null hypothesis for this particular comparison: there is no significant evidence that the North Pole and South Pole groups differ from one another.

Conclusion: The crater depth distribution is similar in the two Polar Regions (North and South Poles) but differs significantly in the Equatorial region. These post hoc results, which indicate an association between crater depth and crater location, are visualized in the plot below.

[Image: plot of crater depth category by Mars region]

 

Posted on January 10, 2016 by Okechukwu Ossai

 

Posted in Data Analysis Tools Course.
