Pivot Table with Python

Kevin Barreiro
3 min readDec 9, 2021

Abstract

PivotTables facilitate the exploration of data to uncover insights that would have otherwise remained hidden without the construction thereof or the implementation of some arduous mathematical processes. In this notebook, we will contruct a PivotTable using the Pandas library for the Python programming language. The data we will utilize to perform the data analysis will be a sample of fictitious sales transactions that occurred on December 23rd, 2021.

Python Packages

The packages we will use for the anaylisys of the sales transaction dataset will be NumPy and Pandas. Begin by importing the packages:

import numpy as np
import pandas as pd

Data

The sales transaction dataset can be downloaded at the following link: https://www.kaggle.com/kevinbarreiro/sales-transactions. Once downloaded, we need to provide an appropriate file path to the CSV file to import the dataset. Upon the CSV file being located and read by a function within the Pandas package we imported in the previous step, a DataFrame will be constructed from the CSV file and referenced by a variable we will call stdf, which means sales transaction dataframe.

stdf = pd.read_csv('sales-transaction.csv')print(stdf)
A screenshot of the stdf dataframe.

Data Cleaning

Notice that the data in the Amount column of the stdf dataframe is of the str data type because of the dollar symbol attached to each amount. We need to remove the dollar symbol from each amount and cast the column to the float data type. No other columns will be modified for the sake of this notebook.

stdf['Amount'] = stdf['Amount'].str\
.replace(r'$', '', regex=True)\
.astype(float)
print(stdf)
A screenshot of the stdf dataframe after cleansing the data.

Pivot Table

Given that the data has been imported and cleaned up, we will construct two PivotTables from the stdf dataframe containing sales transaction data for December 23rd, 2021.

Revenue by Region

table = pd.pivot_table(
stdf,
index=['Region'],
margins=True,
margins_name='Grand Total',
aggfunc=np.sum)
print(table.filter(['Amount']))
A screenshot of the Revenue by Region PivotTable.

The Revenue by Region PivotTable displays revenue accumulated from sales transactions on December 23rd, 2021 for each region, with the last row displaying the total revenue collected on that date from sales transactions.

Product Revenue by Region

table = pd.pivot_table(
stdf,
index=['Region'],
columns=['Product'],
margins=True,
margins_name='Grand Total',
aggfunc=np.sum)
print(table['Amount'])
A screenshot of the Product Revenue by Region PivotTable.

The Product Revenue by Region PivotTable displays the revenue accumulated from sales transactions on December 23rd, 2021 for each product, by region. The Grand Total column displays the revenue accumulated for each region for all products in the region, while the Grand Total row displays the total revenue accumulated for each product in all regions. The value in the last row and last column displays the total revenue accumulated in all regions and products.

Region Revenue by Source and Time of Day

table = pd.pivot_table(
stdf,
index=['Source of Payment', 'Occurred At'],
columns=['Region'],
margins=True,
margins_name='Grand Total',
aggfunc=np.sum)
print(table['Amount'])
A screenshot of the Region Revenue by Source and Occurred At PivotTable.

The Region Revenue by Source and Ocurred At PivotTable displays the revenue accumulated from sales transactions on December 23rd, 2021 organized by the source of the payment, namely either through Email, In-Store, or the Web, then by the actual time the transaction took place in each source. The PivotTable then segments the sales transactions further into its corresponding region. The Grand Total column and row is the same concept as the previous PivotTable we constructed with the value in the last row and the last column displaying the total revenue collected from all sales transactions that occurred.

Conclusion

Thanks for reading; I hope it was useful.

Best regards,

Kevin Barreiro [themastersdev]

--

--

Kevin Barreiro

Writing about topics related to Information Systems.