### Importing Libraries

```python
import pandas as pd
import numpy as np
```

This part of the code imports the necessary Python libraries. `pandas` is used for data manipulation and analysis, and `numpy` is used for working with arrays. The pandas examples that follow do not call `numpy` directly, but the later sections use it for arrays and statistical functions.

### Creating a DataFrame

```python
data = {'students': [65, 82, 72, 92, 83, 74, 54, 84, 65, 66]}
df = pd.DataFrame(data)
```

- `data`: A dictionary with one key (`'students'`) whose value is a list of integers representing scores.
- `df`: A DataFrame created from the `data` dictionary using `pandas.DataFrame()`. This structure is particularly useful for handling tabular data with potentially heterogeneously typed columns.

### Calculating the Mean

```python
mean_value = df['students'].mean()
print(f"Mean value is {mean_value}")
print("Mean Value is...{}".format(mean_value))
```

- The `mean()` method calculates the average of the numbers in the `'students'` column of the DataFrame.
- The mean value is printed in two formats, using Python’s formatted string literals and the `format()` method. Both lines output the mean, which is `73.7`.
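As a quick cross-check, the same average can be reproduced without pandas by dividing the sum of the scores by their count. This is only a sketch; the names `scores` and `manual_mean` are illustrative and do not appear in the original code.

```python
# Same scores as the 'students' column above.
scores = [65, 82, 72, 92, 83, 74, 54, 84, 65, 66]

# Arithmetic mean: sum of all values divided by how many there are.
manual_mean = sum(scores) / len(scores)
print(f"Manual mean: {manual_mean}")
```

The scores sum to 737 over 10 values, which agrees with the `73.7` reported by `mean()`.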

### Calculating the Median

```python
median_value = df['students'].median()
print(f'Median using Pandas: {median_value}')
```

- The `median()` method computes the median value of the data in the `'students'` column. The median is the value separating the higher half from the lower half of a data sample.
- The median value is `73.0` and is printed using formatted string literals.
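To see where `73.0` comes from, the median can be worked out by hand: sort the values and, because the count is even, average the two middle ones. The sketch below uses illustrative names (`ordered`, `manual_median`) that are not part of the original code.

```python
scores = [65, 82, 72, 92, 83, 74, 54, 84, 65, 66]

# Sort first: the median is defined on the ordered values.
ordered = sorted(scores)
n = len(ordered)
mid = n // 2

if n % 2 == 0:
    # Even count: average the two middle values (72 and 74 here).
    manual_median = (ordered[mid - 1] + ordered[mid]) / 2
else:
    # Odd count: the single middle value is the median.
    manual_median = float(ordered[mid])

print(f"Manual median: {manual_median}")
```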

### Calculating the Mode

```python
mode_value = df['students'].mode()
print(f'Mode using Pandas: {mode_value}')
```

- The `mode()` method identifies the most frequently occurring value(s) in the `'students'` column.
- The output is a pandas Series showing that `65` appears most frequently. Since `mode()` can return multiple values if there’s a tie, it always returns a Series. The mode of this dataset is displayed as the first entry (`0`) in the Series with a value of `65`.
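Because `mode()` returns a Series, a scalar has to be extracted explicitly when only one value is wanted. The sketch below shows one way to do that with `.iloc[0]`, and uses a small made-up Series (`tied`, not from the original data) to illustrate what a tie looks like.

```python
import pandas as pd

scores = pd.Series([65, 82, 72, 92, 83, 74, 54, 84, 65, 66])

# mode() returns a Series; .iloc[0] takes its first entry as a scalar.
single_mode = scores.mode().iloc[0]
print(f"Single mode: {single_mode}")

# Hypothetical data with a tie: 1 and 2 both appear twice.
tied = pd.Series([1, 1, 2, 2, 3])
tied_modes = tied.mode().tolist()
print(f"Tied modes: {tied_modes}")
```

When several values tie, taking only the first entry silently discards the others, so it is worth checking the length of the result before doing so.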

### Calculating Quartiles

```python
data = {'Values': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]}
df = pd.DataFrame(data)
q1 = df['Values'].quantile(0.25)
q2 = df['Values'].quantile(0.50)
q3 = df['Values'].quantile(0.75)
print(f"Print the value of Quartile Q1 ={q2}")
```

- `data`: A dictionary containing a list of values from 1 to 11.
- `df`: A DataFrame created from the `data` dictionary, which is used for further analysis.
- `quantile()`: A method used to calculate the quartile values of the dataset. `0.25` (Q1), `0.50` (Q2, which is also the median), and `0.75` (Q3) are the respective quartiles.

- The print statement incorrectly labels Q2 as “Quartile Q1”: it interpolates `q2` where `q1` was intended. The output “Print the value of Quartile Q1 =6.0” therefore shows the median of the dataset (Q2), not Q1, which is `3.5` for this data.
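One way to address the labelling problem noted above is to print each quartile under its own name. The sketch below rebuilds the values 1 to 11 as a standalone Series (an assumption made only to keep the example self-contained) rather than reusing `df`.

```python
import pandas as pd

# Same values as the 'Values' column: the integers 1 through 11.
values = pd.Series(range(1, 12), name="Values")

q1 = values.quantile(0.25)
q2 = values.quantile(0.50)
q3 = values.quantile(0.75)

# Each variable is printed next to the label that matches it.
print(f"Q1 = {q1}, Q2 (median) = {q2}, Q3 = {q3}")
```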

### Calculating Percentiles

```python
p20 = df['Values'].quantile(0.20)
p60 = df['Values'].quantile(0.60)
print(f'20th percentile: {p20}, 60th percentile: {p60}')
```

- This part calculates the 20th and 60th percentiles of the dataset using the `quantile()` method with parameters `0.20` and `0.60`.
- The printed output gives the values of these percentiles. For instance, the 20th percentile is the value below which 20% of the observations may be found, and similarly for the 60th percentile.
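By default, `quantile()` uses linear interpolation between neighbouring sorted values. The sketch below reimplements that rule in plain Python to show how the numbers arise; `linear_quantile` is a hypothetical helper written for this illustration, not a pandas function.

```python
import math

def linear_quantile(values, q):
    """Quantile with linear interpolation between sorted neighbours."""
    ordered = sorted(values)
    # Fractional position of the quantile within the sorted list.
    pos = q * (len(ordered) - 1)
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    # Move 'frac' of the way from the lower to the upper neighbour.
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

values = list(range(1, 12))  # 1 through 11
p20 = linear_quantile(values, 0.20)
p60 = linear_quantile(values, 0.60)
print(f"20th percentile: {p20}, 60th percentile: {p60}")
```

With 11 values, the 20th percentile sits at position 2 of the sorted list (value 3) and the 60th at position 6 (value 7), so no interpolation is actually needed here; the 25th percentile, at position 2.5, falls halfway between 3 and 4.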

### Calculating the Range

```python
range_value = df['students'].max() - df['students'].min()
print(f'Range: {range_value}')
```

- This part of the code is intended to calculate the range of the dataset, which is the difference between the maximum and minimum values.
- However, there is an error in the code. The DataFrame `df` was redefined in the quartile section and no longer has a column named `'students'`, so this line raises a `KeyError`; it should reference `'Values'` instead.
- The correct calculation should use `df['Values'].max() - df['Values'].min()` to find the range.
- Once corrected, `range_value` evaluates to `11 - 1 = 10`.
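A corrected version of the range calculation, following the fix described above, might look like the sketch below. The DataFrame is rebuilt here so that the example runs on its own.

```python
import pandas as pd

# Same DataFrame as in the quartile section.
df = pd.DataFrame({'Values': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]})

# Range = largest value minus smallest value, using the existing column.
range_value = df['Values'].max() - df['Values'].min()
print(f'Range: {range_value}')
```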

### Define Data and Create DataFrame

```python
data1 = np.array([2, 4, 4, 4, 5, 5, 7, 9])
data2 = np.array([10, 13, 15, 14, 10, 16, 18, 21])
df = pd.DataFrame({
    'Data1': data1,
    'Data2': data2
})
```

- `data1` and `data2` are numpy arrays containing numerical data.
- A pandas DataFrame `df` is created with two columns named `Data1` and `Data2`, holding the respective data sets.

### Calculate Basic Statistical Measures

```python
mean_data1 = np.mean(data1)
mean_data2 = np.mean(data2)
std_data1 = np.std(data1, ddof=0)
std_data2 = np.std(data2, ddof=0)
mean_dev_data1 = np.mean(np.abs(data1 - mean_data1))
mean_dev_data2 = np.mean(np.abs(data2 - mean_data2))
```

- Means of `data1` and `data2` are calculated using `np.mean()`.
- Standard deviations are calculated with `np.std()` using `ddof=0`, which means the divisor is `N` (the number of elements), giving the population standard deviation.
- Mean deviations (the average of the absolute deviations from the mean) are calculated for both datasets.

### Calculate Combined Metrics

```python
combined_mean = np.mean(np.concatenate([data1, data2]))
combined_std = np.std(np.concatenate([data1, data2]), ddof=0)
```

- The mean and standard deviation of the combined data are calculated by joining `data1` and `data2` with `np.concatenate()` and treating the result as a single dataset.
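The combined values can also be derived from the per-dataset statistics, which is a useful sanity check on the concatenation approach. The sketch below uses the standard pooling formulas for a population mean and variance; the variable names are illustrative and not part of the original code.

```python
import numpy as np

data1 = np.array([2, 4, 4, 4, 5, 5, 7, 9])
data2 = np.array([10, 13, 15, 14, 10, 16, 18, 21])
n1, n2 = len(data1), len(data2)

# Direct approach: concatenate and compute.
combined = np.concatenate([data1, data2])
combined_mean = np.mean(combined)
combined_std = np.std(combined, ddof=0)

# Pooled mean: size-weighted average of the two means.
pooled_mean = (n1 * data1.mean() + n2 * data2.mean()) / (n1 + n2)

# Pooled variance: each group's variance plus the squared distance
# of its mean from the pooled mean, weighted by group size.
d1 = data1.mean() - pooled_mean
d2 = data2.mean() - pooled_mean
pooled_var = (n1 * (data1.var() + d1**2) + n2 * (data2.var() + d2**2)) / (n1 + n2)
pooled_std = np.sqrt(pooled_var)

print(combined_mean, pooled_mean, combined_std, pooled_std)
```

Both routes should agree up to floating-point rounding. Note that the combined standard deviation is not the average of the two individual standard deviations, because it also reflects how far apart the two means are.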

### Calculate Range and Coefficients

```python
range_data1 = np.ptp(data1)
range_data2 = np.ptp(data2)
coeff_of_range1 = range_data1 / (np.max(data1) + np.min(data1))
coeff_of_range2 = range_data2 / (np.max(data2) + np.min(data2))
```

- Range (the difference between the maximum and minimum values) is calculated using `np.ptp()`.
- Coefficient of range (the range divided by the sum of the maximum and minimum values) is calculated for both datasets.
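The name `ptp` stands for "peak to peak". As a small illustration, the sketch below confirms that `np.ptp()` matches the explicit maximum-minus-minimum calculation for `data1` and works through the coefficient of range; `same_range` is an illustrative name.

```python
import numpy as np

data1 = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# np.ptp returns max - min in a single call.
range_data1 = np.ptp(data1)
same_range = np.max(data1) - np.min(data1)

# Coefficient of range: (max - min) / (max + min).
coeff_of_range1 = range_data1 / (np.max(data1) + np.min(data1))

print(range_data1, same_range, coeff_of_range1)
```

For `data1` this is (9 - 2) / (9 + 2) = 7 / 11, or about 0.64. Dividing by the sum makes the measure unit-free, so datasets on different scales can be compared.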

### Calculate Quartiles and Coefficients of Quartile Deviation

```python
quartiles_data1 = np.percentile(data1, [25, 75])
quartiles_data2 = np.percentile(data2, [25, 75])
coeff_of_quartile_dev1 = (quartiles_data1[1] - quartiles_data1[0]) / (quartiles_data1[1] + quartiles_data1[0])
coeff_of_quartile_dev2 = (quartiles_data2[1] - quartiles_data2[0]) / (quartiles_data2[1] + quartiles_data2[0])
```

- Quartiles are calculated using `np.percentile()`, which returns Q1 and Q3 as a two-element array.
- Coefficient of quartile deviation (the difference between the upper and lower quartiles divided by their sum) is calculated for both datasets.
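Note that `np.percentile()` takes percentages on a 0 to 100 scale, whereas `np.quantile()` (like the pandas `quantile()` method used earlier) takes fractions between 0 and 1. The sketch below shows that both give the same quartiles for `data1` and works through the coefficient; the unpacked names `q1` and `q3` are illustrative.

```python
import numpy as np

data1 = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# Percent scale (0-100) versus fraction scale (0-1): same result.
q1, q3 = np.percentile(data1, [25, 75])
q1_alt, q3_alt = np.quantile(data1, [0.25, 0.75])

# Coefficient of quartile deviation: (Q3 - Q1) / (Q3 + Q1).
coeff_of_quartile_dev1 = (q3 - q1) / (q3 + q1)

print(q1, q3, coeff_of_quartile_dev1)
```

With the default linear interpolation, `data1` has Q1 = 4.0 and Q3 = 5.5, so the coefficient is 1.5 / 9.5, or about 0.158.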

### Calculate Coefficient of Variation

```python
coeff_of_variation1 = (std_data1 / mean_data1) * 100
coeff_of_variation2 = (std_data2 / mean_data2) * 100
```

- Coefficient of variation (the standard deviation divided by the mean, expressed as a percentage) is calculated for both datasets.
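The coefficient of variation is mainly useful for comparing spread across datasets with different means. The sketch below recomputes it for both arrays in a self-contained form; `coefficient_of_variation` is a hypothetical helper introduced only for this example.

```python
import numpy as np

def coefficient_of_variation(values):
    """Population standard deviation as a percentage of the mean."""
    return np.std(values, ddof=0) / np.mean(values) * 100

data1 = np.array([2, 4, 4, 4, 5, 5, 7, 9])
data2 = np.array([10, 13, 15, 14, 10, 16, 18, 21])

cv1 = coefficient_of_variation(data1)
cv2 = coefficient_of_variation(data2)

print(f"CV Data1: {cv1:.2f}%, CV Data2: {cv2:.2f}%")
```

Here `data1` comes out at 40% and `data2` at roughly 24%. Although `data2` has the larger standard deviation in absolute terms, `data1` is more variable relative to its mean.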

### Print Results

```python
print(f"Mean of Data1: {mean_data1}, Data2: {mean_data2}")
print(f"Standard Deviation of Data1: {std_data1}, Data2: {std_data2}")
print(f"Mean Deviation of Data1: {mean_dev_data1}, Data2: {mean_dev_data2}")
print(f"Combined Mean: {combined_mean}")
print(f"Combined Standard Deviation: {combined_std}")
print(f"Coefficient of Range Data1: {coeff_of_range1}, Data2: {coeff_of_range2}")
print(f"Coefficient of Quartile Deviation Data1: {coeff_of_quartile_dev1}, Data2: {coeff_of_quartile_dev2}")
print(f"Coefficient of Variation Data1: {coeff_of_variation1}, Data2: {coeff_of_variation2}")
```

All calculated values are printed out, providing a comprehensive statistical analysis of the two datasets.
