Week 3 Python Data Analysis Interpretation:
Creating and Visualizing New Working Dataset
The plot below shows a 2D map of the entire original Mars crater database. All 384,343 craters are plotted with latitude on the Y axis and longitude on the X axis. The red histogram on top of the plot shows the distribution of crater longitude, while the blue histogram to the right of the plot shows the distribution of crater latitude. Notice that there are too many craters for the plot to be readable. A smaller working dataset is therefore needed for the analysis.
The second figure below is a scatterplot of crater diameter (size) vs crater depth. I used this plot to define a new working dataset with all invalid points and outliers removed. The red rectangle marks the boundaries of the working dataset, which includes only craters with diameter greater than 0 and less than or equal to 100 km, and depth greater than 0 and less than or equal to 3 km.
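The bounds described above can be expressed as a boolean filter. This is only a minimal sketch, assuming the data is held in a pandas DataFrame; the sample values are made up for illustration, and only the column names and the bounds come from this post.

```python
import pandas as pd

# Made-up sample standing in for the crater database (illustrative values only).
craters = pd.DataFrame({
    "DIAM_CIRCLE_IMAGE":    [1.2, 4.8, 100.0, 250.0, 35.0, 0.0],
    "DEPTH_RIMFLOOR_TOPOG": [0.1, 0.0, 3.0,   2.5,   3.4,  0.2],
})

# Keep craters with 0 < diameter <= 100 km and 0 < depth <= 3 km.
in_bounds = (
    craters["DIAM_CIRCLE_IMAGE"].gt(0)
    & craters["DIAM_CIRCLE_IMAGE"].le(100)
    & craters["DEPTH_RIMFLOOR_TOPOG"].gt(0)
    & craters["DEPTH_RIMFLOOR_TOPOG"].le(3)
)

# .copy() makes the subset independent, so new columns can be added to it later.
working = craters[in_bounds].copy()
print(len(working), "of", len(craters), "craters kept")
```

Rows with a zero depth or diameter are dropped as invalid, and rows beyond the upper bounds are dropped as outliers.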
The plot below is another 2D crater location map, this time displaying only the working dataset, which contains 76,512 craters. Notice that the visualization has greatly improved compared to the location map of the entire database shown above. Most importantly, the crater latitude and longitude histograms have similar shapes in both the working dataset and the entire Mars database. This suggests that the working dataset is a good representative sample of the entire Mars crater population, at least in terms of crater location.
The plot reveals an overall trend in crater occurrence with some well-defined crater clusters. Overall, there are more craters in the southern hemisphere (0 to -90 degrees latitude) than in the northern hemisphere (0 to 90 degrees latitude). Moreover, crater population density is highest around the equatorial region (-45 to 45 degrees latitude) and decreases towards both the north and south poles.
Frequency Distribution of First Variable: DEPTH_GROUP
I created a new variable called DEPTH_GROUP by collapsing the crater depth variable (DEPTH_RIMFLOOR_TOPOG) into 10 groups. Craters in the 0.0 to 0.3 km depth group are the most abundant, with 41,242 craters making up about 53.90% of the working dataset crater population.
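One way to collapse a continuous depth variable into groups is `pandas.cut`. The sketch below is an illustration, not the original assignment code. It assumes ten equal-width bins of 0.3 km over 0 to 3 km (consistent with the 0.0 to 0.3 km group mentioned above) and uses made-up depth values.

```python
import numpy as np
import pandas as pd

# Made-up depths in km (illustrative values only).
depths = pd.Series([0.05, 0.25, 0.3, 0.31, 1.0, 2.95, 3.0],
                   name="DEPTH_RIMFLOOR_TOPOG")

# Eleven edges give ten bins; the 0.3 km bin width is an assumption.
edges = np.linspace(0, 3, 11)

# Intervals are right-closed: (0.0, 0.3], (0.3, 0.6], ..., (2.7, 3.0].
depth_group = pd.cut(depths, bins=edges)

# Frequency table in bin order, including empty bins.
counts = depth_group.value_counts().sort_index()
percent = (counts / counts.sum() * 100).round(2)
print(pd.DataFrame({"count": counts, "percent": percent}))
```

Because the intervals are closed on the right, a crater exactly 0.3 km deep falls in the first group, which matches the "greater than 0 and less than or equal to" bounds of the working dataset.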
Frequency Distribution of Second Variable: DIAMETER_GROUP
A second new variable, DIAMETER_GROUP, was created by collapsing the crater diameter variable (DIAM_CIRCLE_IMAGE) into 20 groups. Craters with diameter between 0 and 5 km are the most abundant, with 32,845 craters making up about 42.93% of the working dataset crater population.
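Grouped variables are easier to read in a frequency table when each bin has a plain-text label. The sketch below shows one possible way to do that with the `labels` argument of `pandas.cut`. The 5 km bin width is an assumption (20 groups over 0 to 100 km), the label format is hypothetical, and the diameters are made up.

```python
import pandas as pd

# Made-up diameters in km (illustrative values only).
diam = pd.Series([0.5, 5.0, 5.1, 12.0, 99.0, 100.0], name="DIAM_CIRCLE_IMAGE")

# Edges 0, 5, ..., 100 give 20 bins; the 5 km width is an assumption.
edges = list(range(0, 101, 5))

# Hypothetical label format, e.g. "0-5 km" for the interval (0, 5].
labels = [f"{lo}-{hi} km" for lo, hi in zip(edges[:-1], edges[1:])]

diameter_group = pd.cut(diam, bins=edges, labels=labels)
print(diameter_group.value_counts().sort_index())
```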
Frequency Distribution of Third Variable: LATITUDE_GROUP
A third and final new variable, LATITUDE_GROUP, was created by collapsing the crater latitude variable (LATITUDE_CIRCLE_IMAGE) into 12 groups or regions. The highest crater occurrence is within the -30 to -15 degrees LATITUDE_GROUP. This group contains 12,886 craters, which account for about 16.84% of the working dataset crater population. The distribution also shows that most craters occur in the middle of Mars (across 6 groups), between -45 and 45 degrees latitude. There are also more craters in the southern hemisphere (47,974 craters) than in the northern hemisphere (76,512 - 47,974 = 28,538 craters).
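The latitude grouping and the hemisphere comparison can be sketched as follows. This is an illustration under stated assumptions, not the original code: the 15-degree band width is inferred from having 12 groups over -90 to 90 degrees, a crater at exactly 0 degrees is counted as northern, and the latitudes are made up.

```python
import numpy as np
import pandas as pd

# Made-up latitudes in degrees (illustrative values only).
lat = pd.Series([-80.0, -29.9, -20.0, -15.0, -5.0, 10.0, 44.0, 75.0],
                name="LATITUDE_CIRCLE_IMAGE")

# Thirteen edges from -90 to 90 give twelve 15-degree bands (assumed width).
edges = np.arange(-90, 91, 15)

# include_lowest=True keeps a crater at exactly -90 degrees in the first band.
lat_group = pd.cut(lat, bins=edges, include_lowest=True)

band_counts = lat_group.value_counts().sort_index()
most_populated = band_counts.idxmax()  # the band with the most craters

# Hemisphere tallies; a crater at exactly 0 is treated as northern (assumption).
south = int((lat < 0).sum())
north = int((lat >= 0).sum())

# Craters in the equatorial region, -45 to 45 degrees inclusive.
equatorial = int(lat.between(-45, 45).sum())

print(most_populated, south, north, equatorial)
```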
Frequency Distribution of Fourth Variable: NUMBER_LAYERS
Having defined a new working dataset, I generated a new frequency distribution for the NUMBER_LAYERS variable. The most common value of NUMBER_LAYERS is 0: there are 58,447 craters (about 76.39%) that do not have ejecta layers.
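For a variable that is already discrete, such as NUMBER_LAYERS, no binning is needed. A minimal sketch of a count and percentage table, assuming pandas and using made-up layer values:

```python
import pandas as pd

# Made-up ejecta layer counts (illustrative values only).
layers = pd.Series([0, 0, 0, 1, 0, 2, 1, 0], name="NUMBER_LAYERS")

# Counts per value, ordered by the number of layers.
counts = layers.value_counts().sort_index()

# normalize=True returns proportions; multiply by 100 for percentages.
percent = (layers.value_counts(normalize=True).sort_index() * 100).round(2)

freq = pd.DataFrame({"count": counts, "percent": percent})
print(freq)
```

The same arithmetic applies to the figure reported above: 58,447 / 76,512 is about 0.7639, or 76.39%.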
First part of week 3 assignment.
1. Python Data Analysis Code (Week 3): click here to view
Author:
Posted on December 13, 2015 by Okechukwu Ossai