Cumulative Relative Frequency Table with Python

Kevin Barreiro
Jul 8, 2021

I came across the concept of cumulative relative frequency tables while reviewing introductory statistics material, and I decided to implement it in Python with the NumPy and pandas libraries. For the sake of simplicity, the data set is a convenience sample of thirty people and the average hours of sleep each of them gets on a weeknight. It can be found at:

https://www.kaggle.com/mlomuscio/sleepstudypilot
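As a quick refresher: the relative frequency of a value is its count divided by the sample size, and the cumulative relative frequency is the running total of the relative frequencies up to and including that value. Here is a minimal sketch of that idea using only the standard library. The four-element `toy_sample` is made up for illustration and is not part of the sleep data.

```python
from collections import Counter
from itertools import accumulate

# Made-up values for illustration only (not the sleep study data)
toy_sample = [1, 1, 2, 3]

# Count each distinct value, then walk the values in ascending order
counts = Counter(toy_sample)
values = sorted(counts)

# Relative frequency = count / sample size
relative = [counts[v] / len(toy_sample) for v in values]

# Cumulative relative frequency = running total of the relative frequencies
cumulative = list(accumulate(relative))

for value, rel, cum in zip(values, relative, cumulative):
    print(value, rel, cum)
# 1 0.5 0.5
# 2 0.25 0.75
# 3 0.25 1.0
```

The last cumulative value is always 1, since every observation has been counted by then.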

Here is code to construct a cumulative relative frequency table with Python:

import numpy as np
import pandas as pd


# Initialize a numpy array with sample data
sample_data = np.array([8, 6, 6, 7, 7,
7, 7, 7, 4, 6,
10, 7, 7, 8, 7,
8, 6, 8, 9, 8,
2, 4, 5, 7, 5,
7, 6, 6, 7, 9])

# Construct a pandas DataFrame with the sample data
sample_df = pd.DataFrame(sample_data, index=[s for s in range(1, 31)], columns=["Avg. Hours Slept"])

# Wrap the sample data in a pandas Series
series = pd.Series(sample_data, dtype=int)

# Count each value and sort by hours slept (ascending)
frequencies = series.value_counts().sort_index()

# Build the frequency table; to_frame() names the column explicitly, since
# value_counts() returns a Series named "count" in pandas 2.x
frequency_dataframe = frequencies.to_frame(name="Frequency")
frequency_dataframe.index.name = "Avg. Hours Slept"

# Calculate the relative frequencies for each frequency of average hours of sleep
relative_frequencies = [frequency / sample_data.size for frequency in frequencies]

# Add the relative frequencies to frequency_dataframe
frequency_dataframe.insert(1, "Relative Frequency", relative_frequencies, True)

# Construct the cumulative relative frequencies of sample_data
cumulative_relative_frequencies = np.empty(len(relative_frequencies), dtype=float)
for index in range(cumulative_relative_frequencies.size):
    if index == 0:
        cumulative_relative_frequencies[index] = relative_frequencies[index]
    else:
        cumulative_relative_frequencies[index] = cumulative_relative_frequencies[index - 1] + relative_frequencies[index]

# Add the cumulative relative frequencies to frequency_dataframe
frequency_dataframe.insert(2, "Cumulative Relative Frequency", cumulative_relative_frequencies, True)

# Display the cumulative relative frequency table
print(frequency_dataframe)
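The explicit loop can also be written with pandas' built-in `cumsum()`, which computes the same running total. The following is a shorter sketch of the same table, not a replacement for the walkthrough above. The variable name `table` is my own choice, and the sample data is repeated so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Same sample data as above, repeated so this snippet is self-contained
sample_data = np.array([8, 6, 6, 7, 7,
                        7, 7, 7, 4, 6,
                        10, 7, 7, 8, 7,
                        8, 6, 8, 9, 8,
                        2, 4, 5, 7, 5,
                        7, 6, 6, 7, 9])

# Count each value, sorted by hours slept (ascending)
frequencies = pd.Series(sample_data).value_counts().sort_index()

# Build the table column by column
table = frequencies.to_frame(name="Frequency")
table.index.name = "Avg. Hours Slept"
table["Relative Frequency"] = table["Frequency"] / table["Frequency"].sum()

# cumsum() is the running total, equivalent to the manual loop
table["Cumulative Relative Frequency"] = table["Relative Frequency"].cumsum()

print(table)
```

The final entry in the cumulative column should come out to 1, possibly off by a tiny floating-point rounding error.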
