Chi-Square Goodness-of-Fit Test

11.3. Chi-Square Goodness-of-Fit Test

Here, I show a more elegant way to count the number of births per month by using the groupby() method of a Pandas dataframe. Let’s start by loading the births data:

import pandas as pd
births=pd.read_csv('https://www.fdsp.net/data/births.csv')

Here is how we can count the births in each month using a loop:

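import numpy as np
# Month labels in calendar order (assumed to match the abbreviations in the data)
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
births_by_month = np.zeros(12)
for i, month in enumerate(months):
  births_by_month[i] = \
    births.query('month=="' + month + '"')['count'].sum()
print(births_by_month)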
[6906798. 6448725. 7080880. 6788266. 7112239. 7059986. 7461489. 7552007.
 7365904. 7220646. 6813037. 7079453.]

A more elegant approach to doing this uses the dataframe’s groupby() method:

births.groupby('month').sum(numeric_only=True)['count']
month
Apr    6788266
Aug    7552007
Dec    7079453
Feb    6448725
Jan    6906798
Jul    7461489
Jun    7059986
Mar    7080880
May    7112239
Nov    6813037
Oct    7220646
Sep    7365904
Name: count, dtype: int64

Note that groupby() sorts the result alphabetically by month name (Apr, Aug, Dec, ...) rather than in calendar order. We will not use the output of this statement, but if we did, we would need to reindex it so that it aligns with the player birth months, which are in numerical order.
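As a minimal sketch of that realignment, the example below rebuilds the alphabetically ordered totals from the groupby() output shown above and puts them in calendar order with the Series reindex() method. The names by_month and calendar_months are introduced here for illustration only, and the month abbreviations are assumed to match those in the births data.

```python
import pandas as pd

# Monthly totals copied from the groupby() output above (alphabetical order)
by_month = pd.Series(
    {"Apr": 6788266, "Aug": 7552007, "Dec": 7079453, "Feb": 6448725,
     "Jan": 6906798, "Jul": 7461489, "Jun": 7059986, "Mar": 7080880,
     "May": 7112239, "Nov": 6813037, "Oct": 7220646, "Sep": 7365904},
    name="count",
)

# Calendar-order labels (assumed to match the abbreviations in the data)
calendar_months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                   "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# reindex() reorders the rows so they follow the given list of labels
births_by_month = by_month.reindex(calendar_months)
print(births_by_month.to_numpy())
```

After reindexing, the values appear in the same January-to-December order as the array produced by the loop, so they can be compared position by position with data that is indexed by month number.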

11.3.1. Terminology Review

Use the flashcards below to help you review the terminology introduced in this chapter.