Importing Libraries
```python
import pandas as pd
import numpy as np
```
This part of the code imports the necessary Python libraries. `pandas` is used for data manipulation and analysis, and `numpy` for working with numerical arrays. The single-column examples below use only pandas; numpy is used later, when two datasets are analyzed.
Creating a DataFrame
```python
data = {'students': [65, 82, 72, 92, 83, 74, 54, 84, 65, 66]}
df = pd.DataFrame(data)
```
- `data`: A dictionary with one key (`'students'`) whose value is a list of integers representing scores.
- `df`: A DataFrame created from the `data` dictionary using `pandas.DataFrame()`. This structure is particularly useful for tabular data whose columns may hold different types.
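To illustrate the point about mixed column types, here is a small sketch with a hypothetical `name` column of strings alongside integer scores. The names and the `scores` variable are illustrative only and are not part of the original data. Each column keeps its own dtype, which `dtypes` reports.

```python
import pandas as pd

# Hypothetical data: a string column next to an integer column
scores = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],  # illustrative names, not from the original dataset
    "students": [65, 82, 72],
})

# Each column has its own dtype: text for 'name', integer for 'students'
print(scores.dtypes)
```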
Calculating the Mean
```python
mean_value = df['students'].mean()
print(f"Mean value is {mean_value}")
print("Mean Value is...{}".format(mean_value))
```
- The `mean()` method calculates the average of the numbers in the `'students'` column of the DataFrame.
- The mean is printed twice, once with a formatted string literal (f-string) and once with the `format()` method. Both lines output the mean, 73.7.
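As a quick check on what `mean()` computes, the sketch below recomputes it by hand as the sum of the scores divided by their count. The names `manual_mean` and `pandas_mean` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"students": [65, 82, 72, 92, 83, 74, 54, 84, 65, 66]})

# Arithmetic mean: sum of the values divided by the number of values
manual_mean = df["students"].sum() / len(df)
pandas_mean = df["students"].mean()

print(manual_mean, pandas_mean)  # both 73.7, i.e. 737 / 10
```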
Calculating the Median
```python
median_value = df['students'].median()
print(f'Median using Pandas: {median_value}')
```
- The `median()` method computes the median of the data in the `'students'` column. The median is the value separating the higher half of a data sample from the lower half.
- The median value is 73.0 and is printed using a formatted string literal.
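With an even number of values, the median is the average of the two middle values after sorting; here the sorted scores have 72 and 74 in the middle. A minimal hand-rolled version, assuming a plain list as input (`manual_median` is a hypothetical helper, not a pandas function), reproduces the result:

```python
import pandas as pd

scores = [65, 82, 72, 92, 83, 74, 54, 84, 65, 66]

def manual_median(values):
    """Middle value of the sorted data, or the mean of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return float(ordered[mid])
    return (ordered[mid - 1] + ordered[mid]) / 2

print(manual_median(scores))       # 73.0, i.e. (72 + 74) / 2
print(pd.Series(scores).median())  # 73.0
```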
Calculating the Mode
```python
mode_value = df['students'].mode()
print(f'Mode using Pandas: {mode_value}')
```
- The `mode()` method identifies the most frequently occurring value(s) in the `'students'` column.
- Because several values can tie for most frequent, `mode()` always returns a Series, even when there is only one mode. Here 65 is the only value that appears twice, so the Series has a single entry, at index 0, with the value 65. The printed output therefore shows the index label 0 next to the value.
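To see why `mode()` returns a Series, consider a hypothetical dataset in which two values tie for most frequent. The sketch also shows one common way to extract a single number when the data is known to have exactly one mode.

```python
import pandas as pd

# Hypothetical data with a tie: 65 and 82 each appear twice
tied = pd.Series([65, 82, 72, 65, 82])
modes = tied.mode()
print(modes.tolist())  # [65, 82]: both tied values are returned

# The original scores have a single mode, so take the first element
scores = pd.Series([65, 82, 72, 92, 83, 74, 54, 84, 65, 66])
single_mode = scores.mode().iloc[0]
print(single_mode)  # 65
```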
Calculating Quartiles
```python
data = {'Values': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]}
df = pd.DataFrame(data)

q1 = df['Values'].quantile(0.25)  # first quartile, Q1
q2 = df['Values'].quantile(0.50)  # second quartile, Q2 (the median)
q3 = df['Values'].quantile(0.75)  # third quartile, Q3
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
```

- `data`: A dictionary containing a list of the values 1 to 11.
- `df`: A DataFrame created from the `data` dictionary and used for the rest of this analysis. It replaces the earlier `'students'` DataFrame.
- `quantile()`: A method that returns the value at a given fraction of the sorted data. The arguments `0.25` (Q1), `0.50` (Q2, which is also the median) and `0.75` (Q3) give the three quartiles.
- Each quartile is printed under its own label. Q2 is the median, 6.0, so printing `q2` under a "Q1" label would be a mistake; with pandas' default linear interpolation, Q1 is 3.5 and Q3 is 8.5.
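pandas' default `interpolation='linear'` places quantile q at position (n - 1) * q in the sorted data and interpolates between the two neighboring values. The sketch below reimplements that rule by hand (the `linear_quantile` helper is hypothetical, for illustration only) and compares it with `Series.quantile()`. Textbooks and other tools sometimes use different quartile conventions, so hand-computed quartiles may not always match pandas.

```python
import pandas as pd

values = list(range(1, 12))  # 1 to 11, as in the quartile example

def linear_quantile(data, q):
    """Sketch of the 'linear' rule: position = (n - 1) * q, then interpolate."""
    ordered = sorted(data)
    pos = (len(ordered) - 1) * q
    lower = int(pos)                          # index at or just below the position
    upper = min(lower + 1, len(ordered) - 1)  # next index, clamped at the end
    frac = pos - lower                        # fraction of the way to the next value
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac

for q in (0.25, 0.50, 0.75):
    print(q, linear_quantile(values, q), pd.Series(values).quantile(q))
# Both columns: 3.5 for 0.25, 6.0 for 0.5, 8.5 for 0.75
```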
Calculating Percentiles
```python
p20 = df['Values'].quantile(0.20)
p60 = df['Values'].quantile(0.60)
print(f'20th percentile: {p20}, 60th percentile: {p60}')
```
- This part calculates the 20th and 60th percentiles of the dataset using the `quantile()` method with the arguments `0.20` and `0.60`.
- The printed output gives the values of these percentiles. For instance, the 20th percentile is the value below which about 20% of the observations fall, and similarly for the 60th percentile.
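`quantile()` also accepts a list of fractions and then returns all requested percentiles at once, as a Series indexed by those fractions. NumPy's `np.percentile()` does the same job but expects percentages from 0 to 100. A short sketch on the same 1 to 11 data, where both calls should give 3.0 and 7.0:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Values": list(range(1, 12))})

# A list of fractions returns a Series indexed by those fractions
pcts = df["Values"].quantile([0.20, 0.60])
print(pcts)

# np.percentile takes percentages (0 to 100) rather than fractions
np_pcts = np.percentile(df["Values"], [20, 60])
print(np_pcts)
```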
Calculating the Range
```python
# df now holds the quartile data, so the column is 'Values', not 'students'
range_value = df['Values'].max() - df['Values'].min()
print(f'Range: {range_value}')
```

- This part calculates the range of the dataset, the difference between its maximum and minimum values.
- At this point `df` is the DataFrame built in the quartile section, which has only a `'Values'` column. Referencing `df['students']` from the first example would raise a `KeyError`, so the column name must be `'Values'`.
- The range is 11 - 1 = 10.
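A short sketch of the pitfall and the fix: it shows the `KeyError` raised for the missing `'students'` column, then computes the range both with `max() - min()` and with NumPy's `np.ptp()` ("peak to peak"), which the later sections also use.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Values": list(range(1, 12))})

# The first example's column does not exist in this DataFrame
try:
    df["students"].max()
except KeyError as err:
    print(f"KeyError: {err}")

# Range via pandas and via NumPy's peak-to-peak function
range_value = df["Values"].max() - df["Values"].min()
ptp_value = np.ptp(df["Values"].to_numpy())
print(range_value, ptp_value)  # 10 10
```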
Define Data and Create DataFrame
```python
data1 = np.array([2, 4, 4, 4, 5, 5, 7, 9])
data2 = np.array([10, 13, 15, 14, 10, 16, 18, 21])

df = pd.DataFrame({
    'Data1': data1,
    'Data2': data2
})
```
- `data1` and `data2` are NumPy arrays containing numerical data.
- A pandas DataFrame `df` is created with two columns, `Data1` and `Data2`, holding the respective datasets.
Calculate Basic Statistical Measures
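```python
# Means
mean_data1 = np.mean(data1)
mean_data2 = np.mean(data2)

# Population standard deviations (ddof=0 divides by N)
std_data1 = np.std(data1, ddof=0)
std_data2 = np.std(data2, ddof=0)

# Mean deviations: average absolute distance from the mean
mean_dev_data1 = np.mean(np.abs(data1 - mean_data1))
mean_dev_data2 = np.mean(np.abs(data2 - mean_data2))
```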
- The means of `data1` and `data2` are calculated with `np.mean()`.
- The standard deviations are calculated with `np.std()` using `ddof=0`, which means the divisor is N (the number of elements). This gives the population standard deviation; with `ddof=1` the divisor would be N - 1, giving the sample standard deviation.
- The mean deviations (the average absolute deviation from the mean) are calculated for both datasets.
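The same per-column measures can be computed directly on the DataFrame `df`. One detail worth knowing: pandas' `std()` defaults to `ddof=1` (sample standard deviation), whereas `np.std()` defaults to `ddof=0`, so `ddof=0` has to be passed explicitly to match the population values above. A sketch:

```python
import numpy as np
import pandas as pd

data1 = np.array([2, 4, 4, 4, 5, 5, 7, 9])
data2 = np.array([10, 13, 15, 14, 10, 16, 18, 21])
df = pd.DataFrame({"Data1": data1, "Data2": data2})

means = df.mean()             # one mean per column
pop_std = df.std(ddof=0)      # population std, matches np.std(..., ddof=0)
sample_std = df.std()         # pandas default ddof=1: divides by N - 1

# Mean absolute deviation: subtract each column's mean, take |x|, average per column
mean_dev = (df - means).abs().mean()

print(means, pop_std, sample_std, mean_dev, sep="\n\n")
```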
Calculate Combined Metrics
```python
combined_mean = np.mean(np.concatenate([data1, data2]))
combined_std = np.std(np.concatenate([data1, data2]), ddof=0)
```
- These lines compute the mean and population standard deviation of the pooled data, obtained by joining `data1` and `data2` into a single array with `np.concatenate()`.
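The combined statistics can also be derived from each dataset's size, mean and population variance without concatenating. The combined mean is the size-weighted average of the group means, and the combined variance adds each group's variance to the squared distance of its mean from the combined mean, again weighted by size. The sketch below checks this decomposition against the concatenated array.

```python
import numpy as np

data1 = np.array([2, 4, 4, 4, 5, 5, 7, 9])
data2 = np.array([10, 13, 15, 14, 10, 16, 18, 21])

n1, n2 = len(data1), len(data2)
m1, m2 = data1.mean(), data2.mean()
v1, v2 = data1.var(), data2.var()  # population variances (ddof=0)

# Combined mean: size-weighted average of the group means
combined_mean = (n1 * m1 + n2 * m2) / (n1 + n2)

# Combined variance: within-group variance plus spread of the group means
combined_var = (n1 * (v1 + (m1 - combined_mean) ** 2)
                + n2 * (v2 + (m2 - combined_mean) ** 2)) / (n1 + n2)
combined_std = np.sqrt(combined_var)

pooled = np.concatenate([data1, data2])
print(combined_mean, pooled.mean())
print(combined_std, pooled.std())
```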
Calculate Range and Coefficients
```python
range_data1 = np.ptp(data1)
range_data2 = np.ptp(data2)
coeff_of_range1 = range_data1 / (np.max(data1) + np.min(data1))
coeff_of_range2 = range_data2 / (np.max(data2) + np.min(data2))
```
- The range (difference between the maximum and minimum values) is calculated with `np.ptp()`.
- The coefficient of range, the range divided by the sum of the maximum and minimum values, is then computed for both datasets.
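A worked example for `data1`, using a hypothetical helper `coefficient_of_range`. The measure is unit-free and is mainly used for positive-valued data; if max + min is zero, the division is undefined.

```python
import numpy as np

data1 = np.array([2, 4, 4, 4, 5, 5, 7, 9])

def coefficient_of_range(values):
    """(max - min) / (max + min): a unit-free measure of spread."""
    largest, smallest = np.max(values), np.min(values)
    return (largest - smallest) / (largest + smallest)

print(coefficient_of_range(data1))  # (9 - 2) / (9 + 2) = 7 / 11, about 0.636
```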
Calculate Quartiles and Coefficients of Quartile Deviation
```python
quartiles_data1 = np.percentile(data1, [25, 75])
quartiles_data2 = np.percentile(data2, [25, 75])

# (Q3 - Q1) / (Q3 + Q1) for each dataset
coeff_of_quartile_dev1 = (quartiles_data1[1] - quartiles_data1[0]) / (quartiles_data1[1] + quartiles_data1[0])
coeff_of_quartile_dev2 = (quartiles_data2[1] - quartiles_data2[0]) / (quartiles_data2[1] + quartiles_data2[0])
```
- `np.percentile()` with `[25, 75]` returns the first and third quartiles (Q1 and Q3) of each dataset.
- The coefficient of quartile deviation, the difference between the upper and lower quartiles divided by their sum, is then computed for both datasets.
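For `data1`, NumPy's default linear interpolation gives Q1 = 4.0 and Q3 = 5.5. The sketch below reproduces those values and also computes the quartile deviation (semi-interquartile range), (Q3 - Q1) / 2, which the coefficient is named after.

```python
import numpy as np

data1 = np.array([2, 4, 4, 4, 5, 5, 7, 9])

q1, q3 = np.percentile(data1, [25, 75])  # default linear interpolation
quartile_deviation = (q3 - q1) / 2       # semi-interquartile range
coeff_qd = (q3 - q1) / (q3 + q1)

print(q1, q3)    # 4.0 5.5
print(coeff_qd)  # 1.5 / 9.5, about 0.158
```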
Calculate Coefficient of Variation
```python
coeff_of_variation1 = (std_data1 / mean_data1) * 100
coeff_of_variation2 = (std_data2 / mean_data2) * 100
```
- The coefficient of variation, the standard deviation divided by the mean and expressed as a percentage, is computed for both datasets.
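Because the coefficient of variation is relative to the mean, it can rank datasets differently from the standard deviation. In this sketch (the `coefficient_of_variation` helper is hypothetical), `data2` has the larger standard deviation but the smaller coefficient of variation. The helper raises an error for a zero mean, where the ratio is undefined.

```python
import numpy as np

data1 = np.array([2, 4, 4, 4, 5, 5, 7, 9])
data2 = np.array([10, 13, 15, 14, 10, 16, 18, 21])

def coefficient_of_variation(values, ddof=0):
    """Standard deviation as a percentage of the mean (requires a nonzero mean)."""
    mean = np.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return np.std(values, ddof=ddof) / mean * 100

cv1 = coefficient_of_variation(data1)
cv2 = coefficient_of_variation(data2)
print(cv1)  # 40.0, i.e. 2.0 / 5.0 * 100
print(cv2)
```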
Print Results
```python
print(f"Mean of Data1: {mean_data1}, Data2: {mean_data2}")
print(f"Standard Deviation of Data1: {std_data1}, Data2: {std_data2}")
print(f"Mean Deviation of Data1: {mean_dev_data1}, Data2: {mean_dev_data2}")
print(f"Combined Mean: {combined_mean}")
print(f"Combined Standard Deviation: {combined_std}")
print(f"Coefficient of Range Data1: {coeff_of_range1}, Data2: {coeff_of_range2}")
print(f"Coefficient of Quartile Deviation Data1: {coeff_of_quartile_dev1}, Data2: {coeff_of_quartile_dev2}")
print(f"Coefficient of Variation Data1: {coeff_of_variation1}, Data2: {coeff_of_variation2}")
```
All calculated values are printed out, providing a comprehensive statistical analysis of the two datasets.