Cumulative Relative Frequency Table with Python
Jul 8, 2021
I came across cumulative relative frequency tables while refreshing my knowledge of introductory statistics, and decided to implement the concept in Python using the NumPy and pandas libraries. For simplicity, the data set is a convenience sample of thirty people and the average number of hours each slept on a weeknight. It can be found at:
https://www.kaggle.com/mlomuscio/sleepstudypilot
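For reference, the quantities in the table can be written out as below. The symbols are notation chosen here for illustration, not taken from the data set: f_i is the frequency of the i-th distinct value (sorted ascending), n is the sample size (30 here), and m is the number of distinct values.

```latex
\[
r_i = \frac{f_i}{n}, \qquad
c_k = \sum_{i=1}^{k} r_i = \frac{1}{n} \sum_{i=1}^{k} f_i, \qquad
c_m = 1
\]
```

In this sample, one person averaged 2 hours and two people averaged 4 hours, so the cumulative relative frequency at 4 hours is (1 + 2) / 30 = 0.1.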
Here is code to construct a cumulative relative frequency table with Python:
import numpy as np
import pandas as pd
# Initialize a numpy array with sample data
sample_data = np.array([8, 6, 6, 7, 7,
                        7, 7, 7, 4, 6,
                        10, 7, 7, 8, 7,
                        8, 6, 8, 9, 8,
                        2, 4, 5, 7, 5,
                        7, 6, 6, 7, 9])
# Construct a pandas DataFrame with the sample data
sample_df = pd.DataFrame(sample_data, index=range(1, 31), columns=["Avg. Hours Slept"])
# Construct a frequency table of sample data
series = pd.Series(sample_data, dtype=int)
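# Count how often each value occurs, sorted by hours slept (ascending)
frequencies = series.value_counts().sort_index()
# to_frame sets the column name explicitly, whatever name value_counts gives the Series
frequency_dataframe = frequencies.to_frame(name="Frequency")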
frequency_dataframe.index.name = "Avg. Hours Slept"
# Calculate the relative frequency of each value of average hours slept
relative_frequencies = [frequency / sample_data.size for frequency in frequencies]
# Add the relative frequencies to frequency_dataframe
frequency_dataframe.insert(1, "Relative Frequency", relative_frequencies, True)
# Construct a cumulative relative frequency table of sample_data
cumulative_relative_frequencies = np.empty(len(relative_frequencies), dtype=float)
for index in range(cumulative_relative_frequencies.size):
    if index == 0:
        cumulative_relative_frequencies[index] = relative_frequencies[index]
    else:
        cumulative_relative_frequencies[index] = cumulative_relative_frequencies[index - 1] + relative_frequencies[index]
# Add the cumulative relative frequencies to frequency_dataframe
frequency_dataframe.insert(2, "Cumulative Relative Frequency", cumulative_relative_frequencies, True)
# Display the cumulative relative frequency table
print(frequency_dataframe)
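The same table can probably be built more compactly by letting pandas do the running sum. The following is a minimal sketch rather than a replacement for the code above; the function name `cumulative_relative_frequency_table` and its `label` parameter are hypothetical names introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as above
sample_data = np.array([8, 6, 6, 7, 7,
                        7, 7, 7, 4, 6,
                        10, 7, 7, 8, 7,
                        8, 6, 8, 9, 8,
                        2, 4, 5, 7, 5,
                        7, 6, 6, 7, 9])


def cumulative_relative_frequency_table(values, label="Avg. Hours Slept"):
    """Hypothetical helper: build the table with vectorized pandas operations."""
    # Count each distinct value and sort by the value itself (ascending)
    counts = pd.Series(values).value_counts().sort_index()
    table = counts.to_frame(name="Frequency")
    table.index.name = label
    # Relative frequency: each count divided by the total number of observations
    table["Relative Frequency"] = table["Frequency"] / table["Frequency"].sum()
    # Cumulative relative frequency: running sum of the relative frequencies
    table["Cumulative Relative Frequency"] = table["Relative Frequency"].cumsum()
    return table


table = cumulative_relative_frequency_table(sample_data)
print(table)
```

Here `cumsum()` takes the place of the explicit loop, and the last value in the cumulative column should come out as 1 (up to floating-point rounding), which is a quick sanity check on the table.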