Mastering Data Analysis with Pandas #4


Task 1. DataSet

In [1]:
import warnings
warnings.filterwarnings("ignore")

import pandas as pd

Discovering the DataSet

In [2]:
sales_df=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 4/00-ecommerce_sales.csv')
sales_df
Out[2]:
InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice CustomerID Country
0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 12/1/2010 08:26 2.55 17850.0 United Kingdom
1 536365 71053 WHITE METAL LANTERN 6 12/1/2010 08:26 3.39 17850.0 United Kingdom
2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 12/1/2010 08:26 2.75 17850.0 United Kingdom
3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 12/1/2010 08:26 3.39 17850.0 United Kingdom
4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 12/1/2010 08:26 3.39 17850.0 United Kingdom
... ... ... ... ... ... ... ... ...
541904 581587 22613 PACK OF 20 SPACEBOY NAPKINS 12 12/9/2011 12:50 0.85 12680.0 France
541905 581587 22899 CHILDREN'S APRON DOLLY GIRL 6 12/9/2011 12:50 2.10 12680.0 France
541906 581587 23254 CHILDRENS CUTLERY DOLLY GIRL 4 12/9/2011 12:50 4.15 12680.0 France
541907 581587 23255 CHILDRENS CUTLERY CIRCUS PARADE 4 12/9/2011 12:50 4.15 12680.0 France
541908 581587 22138 BAKING SET 9 PIECE RETROSPOT 3 12/9/2011 12:50 4.95 12680.0 France

541909 rows × 8 columns

Data Types

In [3]:
sales_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 541909 entries, 0 to 541908
Data columns (total 8 columns):
 #   Column       Non-Null Count   Dtype  
---  ------       --------------   -----  
 0   InvoiceNo    541909 non-null  object 
 1   StockCode    541909 non-null  object 
 2   Description  540455 non-null  object 
 3   Quantity     541909 non-null  int64  
 4   InvoiceDate  541909 non-null  object 
 5   UnitPrice    541909 non-null  float64
 6   CustomerID   406829 non-null  float64
 7   Country      541909 non-null  object 
dtypes: float64(2), int64(1), object(5)
memory usage: 33.1+ MB

Confirming changes

In [4]:
sales_df['InvoiceDate']=pd.to_datetime(sales_df['InvoiceDate'])
sales_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 541909 entries, 0 to 541908
Data columns (total 8 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   InvoiceNo    541909 non-null  object        
 1   StockCode    541909 non-null  object        
 2   Description  540455 non-null  object        
 3   Quantity     541909 non-null  int64         
 4   InvoiceDate  541909 non-null  datetime64[ns]
 5   UnitPrice    541909 non-null  float64       
 6   CustomerID   406829 non-null  float64       
 7   Country      541909 non-null  object        
dtypes: datetime64[ns](1), float64(2), int64(1), object(4)
memory usage: 33.1+ MB

How many Null Values in each Column

In [5]:
sales_df.isnull().sum()
Out[5]:
InvoiceNo           0
StockCode           0
Description      1454
Quantity            0
InvoiceDate         0
UnitPrice           0
CustomerID     135080
Country             0
dtype: int64

Unique Countries

In [6]:
sales_df['Country'].unique()
Out[6]:
array(['United Kingdom', 'France', 'Australia', 'Netherlands', 'Germany',
       'Norway', 'EIRE', 'Switzerland', 'Spain', 'Poland', 'Portugal',
       'Italy', 'Belgium', 'Lithuania', 'Japan', 'Iceland',
       'Channel Islands', 'Denmark', 'Cyprus', 'Sweden', 'Austria',
       'Israel', 'Finland', 'Bahrain', 'Greece', 'Hong Kong', 'Singapore',
       'Lebanon', 'United Arab Emirates', 'Saudi Arabia',
       'Czech Republic', 'Canada', 'Unspecified', 'Brazil', 'USA',
       'European Community', 'Malta', 'RSA'], dtype=object)

How many Unique Values in each Column

In [7]:
sales_df.nunique()
Out[7]:
InvoiceNo      25900
StockCode       4070
Description     4223
Quantity         722
InvoiceDate    23260
UnitPrice       1630
CustomerID      4372
Country           38
dtype: int64

Task 2. Group By

In [8]:
sales_df
Out[8]:
InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice CustomerID Country
0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2010-12-01 08:26:00 2.55 17850.0 United Kingdom
1 536365 71053 WHITE METAL LANTERN 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom
2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 2010-12-01 08:26:00 2.75 17850.0 United Kingdom
3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom
4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 2010-12-01 08:26:00 3.39 17850.0 United Kingdom
... ... ... ... ... ... ... ... ...
541904 581587 22613 PACK OF 20 SPACEBOY NAPKINS 12 2011-12-09 12:50:00 0.85 12680.0 France
541905 581587 22899 CHILDREN'S APRON DOLLY GIRL 6 2011-12-09 12:50:00 2.10 12680.0 France
541906 581587 23254 CHILDRENS CUTLERY DOLLY GIRL 4 2011-12-09 12:50:00 4.15 12680.0 France
541907 581587 23255 CHILDRENS CUTLERY CIRCUS PARADE 4 2011-12-09 12:50:00 4.15 12680.0 France
541908 581587 22138 BAKING SET 9 PIECE RETROSPOT 3 2011-12-09 12:50:00 4.95 12680.0 France

541909 rows × 8 columns

Retrieve the mean, max and min of all Unit Prices by Country

In [9]:
sales_df.groupby('Country')['UnitPrice'].agg(['mean', 'max', 'min'])
Out[9]:
mean max min
Country
Australia 3.220612 350.00 0.00
Austria 4.243192 40.00 0.12
Bahrain 4.556316 12.75 1.25
Belgium 3.644335 39.95 0.12
Brazil 4.456250 10.95 0.85
Canada 6.030331 550.94 0.10
Channel Islands 4.932124 293.00 0.19
Cyprus 6.302363 320.69 0.12
Czech Republic 2.938333 40.00 0.29
Denmark 3.256941 18.00 0.21
EIRE 5.911077 1917.00 0.00
European Community 4.820492 18.00 0.55
Finland 5.448705 275.60 0.12
France 5.028864 4161.06 0.00
Germany 3.966930 599.50 0.00
Greece 4.885548 50.00 0.14
Hong Kong 42.505208 2653.95 0.21
Iceland 2.644011 12.75 0.25
Israel 3.633131 125.00 0.06
Italy 4.831121 300.00 0.12
Japan 2.276145 45.57 0.21
Lebanon 5.387556 14.95 0.55
Lithuania 2.841143 5.95 1.25
Malta 5.244173 65.00 0.19
Netherlands 2.738317 206.40 0.00
Norway 6.012026 700.00 0.00
Poland 4.170880 40.00 0.19
Portugal 8.582976 1241.98 0.12
RSA 4.277586 14.95 0.00
Saudi Arabia 2.411000 5.49 0.42
Singapore 109.645808 3949.32 0.19
Spain 4.987544 1715.85 0.00
Sweden 3.910887 40.00 0.19
Switzerland 3.403442 40.00 0.00
USA 2.216426 16.95 0.42
United Arab Emirates 3.380735 37.50 0.29
United Kingdom 4.532422 38970.00 -11062.06
Unspecified 2.699574 16.95 0.19

Task 3. Multi-Indexed DataFrames

In [10]:
sales_df=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 4/00-ecommerce_sales.csv')
sales_df
Out[10]:
InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice CustomerID Country
0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 12/1/2010 08:26 2.55 17850.0 United Kingdom
1 536365 71053 WHITE METAL LANTERN 6 12/1/2010 08:26 3.39 17850.0 United Kingdom
2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 12/1/2010 08:26 2.75 17850.0 United Kingdom
3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 12/1/2010 08:26 3.39 17850.0 United Kingdom
4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 12/1/2010 08:26 3.39 17850.0 United Kingdom
... ... ... ... ... ... ... ... ...
541904 581587 22613 PACK OF 20 SPACEBOY NAPKINS 12 12/9/2011 12:50 0.85 12680.0 France
541905 581587 22899 CHILDREN'S APRON DOLLY GIRL 6 12/9/2011 12:50 2.10 12680.0 France
541906 581587 23254 CHILDRENS CUTLERY DOLLY GIRL 4 12/9/2011 12:50 4.15 12680.0 France
541907 581587 23255 CHILDRENS CUTLERY CIRCUS PARADE 4 12/9/2011 12:50 4.15 12680.0 France
541908 581587 22138 BAKING SET 9 PIECE RETROSPOT 3 12/9/2011 12:50 4.95 12680.0 France

541909 rows × 8 columns

Multi-Indexing using Country and InvoiceDate

In [11]:
sales_df.set_index(keys=['Country','InvoiceDate'], inplace=True)
sales_df
Out[11]:
InvoiceNo StockCode Description Quantity UnitPrice CustomerID
Country InvoiceDate
United Kingdom 12/1/2010 08:26 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2.55 17850.0
12/1/2010 08:26 536365 71053 WHITE METAL LANTERN 6 3.39 17850.0
12/1/2010 08:26 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 2.75 17850.0
12/1/2010 08:26 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 3.39 17850.0
12/1/2010 08:26 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 3.39 17850.0
... ... ... ... ... ... ... ...
France 12/9/2011 12:50 581587 22613 PACK OF 20 SPACEBOY NAPKINS 12 0.85 12680.0
12/9/2011 12:50 581587 22899 CHILDREN'S APRON DOLLY GIRL 6 2.10 12680.0
12/9/2011 12:50 581587 23254 CHILDRENS CUTLERY DOLLY GIRL 4 4.15 12680.0
12/9/2011 12:50 581587 23255 CHILDRENS CUTLERY CIRCUS PARADE 4 4.15 12680.0
12/9/2011 12:50 581587 22138 BAKING SET 9 PIECE RETROSPOT 3 4.95 12680.0

541909 rows × 6 columns

In [12]:
sales_df.sort_index(inplace=True)
sales_df
Out[12]:
InvoiceNo StockCode Description Quantity UnitPrice CustomerID
Country InvoiceDate
Australia 1/10/2011 09:58 540557 22523 CHILDS GARDEN FORK PINK 96 0.85 12415.0
1/11/2011 09:47 540700 21581 SKULLS DESIGN COTTON TOTE BAG 6 2.25 12393.0
1/11/2011 09:47 540700 22619 SET OF 6 SOLDIER SKITTLES 8 3.75 12393.0
1/11/2011 09:47 540700 84997B RED 3 PIECE RETROSPOT CUTLERY SET 6 3.75 12393.0
1/11/2011 09:47 540700 20727 LUNCH BAG BLACK SKULL. 20 1.65 12393.0
... ... ... ... ... ... ... ...
Unspecified 9/2/2011 12:17 565303 85227 SET OF 6 3D KIT CARDS FOR KIDS 4 0.85 NaN
9/2/2011 12:17 565303 22138 BAKING SET 9 PIECE RETROSPOT 2 4.95 NaN
9/2/2011 12:17 565303 21889 WOODEN BOX OF DOMINOES 5 1.25 NaN
9/2/2011 12:17 565303 22550 HOLIDAY FUN LUDO 2 3.75 NaN
9/2/2011 12:17 565303 23127 FELTCRAFT GIRL NICOLE KIT 1 4.95 NaN

541909 rows × 6 columns

Retrieve Indexes

In [13]:
sales_df.index.names
Out[13]:
FrozenList(['Country', 'InvoiceDate'])

Retrieve the first Index

In [14]:
#Multi-Index Objects
sales_df.index[0]
Out[14]:
('Australia', '1/10/2011 09:58')

Task 4. Multi-Indexing Operations #1

In [15]:
sales_df=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 4/00-ecommerce_sales.csv')
sales_df.set_index(keys=['Country','InvoiceDate'], inplace=True)
sales_df
Out[15]:
InvoiceNo StockCode Description Quantity UnitPrice CustomerID
Country InvoiceDate
United Kingdom 12/1/2010 08:26 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2.55 17850.0
12/1/2010 08:26 536365 71053 WHITE METAL LANTERN 6 3.39 17850.0
12/1/2010 08:26 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 2.75 17850.0
12/1/2010 08:26 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 3.39 17850.0
12/1/2010 08:26 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 3.39 17850.0
... ... ... ... ... ... ... ...
France 12/9/2011 12:50 581587 22613 PACK OF 20 SPACEBOY NAPKINS 12 0.85 12680.0
12/9/2011 12:50 581587 22899 CHILDREN'S APRON DOLLY GIRL 6 2.10 12680.0
12/9/2011 12:50 581587 23254 CHILDRENS CUTLERY DOLLY GIRL 4 4.15 12680.0
12/9/2011 12:50 581587 23255 CHILDRENS CUTLERY CIRCUS PARADE 4 4.15 12680.0
12/9/2011 12:50 581587 22138 BAKING SET 9 PIECE RETROSPOT 3 4.95 12680.0

541909 rows × 6 columns

Retrieve the Values of the first Index

In [16]:
sales_df.index.get_level_values(0)
Out[16]:
Index(['United Kingdom', 'United Kingdom', 'United Kingdom', 'United Kingdom',
       'United Kingdom', 'United Kingdom', 'United Kingdom', 'United Kingdom',
       'United Kingdom', 'United Kingdom',
       ...
       'France', 'France', 'France', 'France', 'France', 'France', 'France',
       'France', 'France', 'France'],
      dtype='object', name='Country', length=541909)

Retrieve the Values of the second Index

In [17]:
sales_df.index.get_level_values(1)
Out[17]:
Index(['12/1/2010 08:26', '12/1/2010 08:26', '12/1/2010 08:26',
       '12/1/2010 08:26', '12/1/2010 08:26', '12/1/2010 08:26',
       '12/1/2010 08:26', '12/1/2010 08:28', '12/1/2010 08:28',
       '12/1/2010 08:34',
       ...
       '12/9/2011 12:50', '12/9/2011 12:50', '12/9/2011 12:50',
       '12/9/2011 12:50', '12/9/2011 12:50', '12/9/2011 12:50',
       '12/9/2011 12:50', '12/9/2011 12:50', '12/9/2011 12:50',
       '12/9/2011 12:50'],
      dtype='object', name='InvoiceDate', length=541909)

Change the Name of the Indexes

In [18]:
sales_df.index.set_names(names=['Transaction Location','Transaction Date'], inplace=True)
sales_df
Out[18]:
InvoiceNo StockCode Description Quantity UnitPrice CustomerID
Transaction Location Transaction Date
United Kingdom 12/1/2010 08:26 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 2.55 17850.0
12/1/2010 08:26 536365 71053 WHITE METAL LANTERN 6 3.39 17850.0
12/1/2010 08:26 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 2.75 17850.0
12/1/2010 08:26 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 3.39 17850.0
12/1/2010 08:26 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 3.39 17850.0
... ... ... ... ... ... ... ...
France 12/9/2011 12:50 581587 22613 PACK OF 20 SPACEBOY NAPKINS 12 0.85 12680.0
12/9/2011 12:50 581587 22899 CHILDREN'S APRON DOLLY GIRL 6 2.10 12680.0
12/9/2011 12:50 581587 23254 CHILDRENS CUTLERY DOLLY GIRL 4 4.15 12680.0
12/9/2011 12:50 581587 23255 CHILDRENS CUTLERY CIRCUS PARADE 4 4.15 12680.0
12/9/2011 12:50 581587 22138 BAKING SET 9 PIECE RETROSPOT 3 4.95 12680.0

541909 rows × 6 columns


Task 5. Multi-Indexing Operations #2

In [19]:
sales_df=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 4/00-ecommerce_sales.csv')
sales_df.set_index(keys=['Country','InvoiceDate'], inplace=True)
sales_df.sort_index(inplace=True)
sales_df
Out[19]:
InvoiceNo StockCode Description Quantity UnitPrice CustomerID
Country InvoiceDate
Australia 1/10/2011 09:58 540557 22523 CHILDS GARDEN FORK PINK 96 0.85 12415.0
1/11/2011 09:47 540700 21581 SKULLS DESIGN COTTON TOTE BAG 6 2.25 12393.0
1/11/2011 09:47 540700 22619 SET OF 6 SOLDIER SKITTLES 8 3.75 12393.0
1/11/2011 09:47 540700 84997B RED 3 PIECE RETROSPOT CUTLERY SET 6 3.75 12393.0
1/11/2011 09:47 540700 20727 LUNCH BAG BLACK SKULL. 20 1.65 12393.0
... ... ... ... ... ... ... ...
Unspecified 9/2/2011 12:17 565303 85227 SET OF 6 3D KIT CARDS FOR KIDS 4 0.85 NaN
9/2/2011 12:17 565303 22138 BAKING SET 9 PIECE RETROSPOT 2 4.95 NaN
9/2/2011 12:17 565303 21889 WOODEN BOX OF DOMINOES 5 1.25 NaN
9/2/2011 12:17 565303 22550 HOLIDAY FUN LUDO 2 3.75 NaN
9/2/2011 12:17 565303 23127 FELTCRAFT GIRL NICOLE KIT 1 4.95 NaN

541909 rows × 6 columns

Retrieve specific elements using Multi-Indexing

In [20]:
sales_df.loc['Australia','1/11/2011 09:47']
Out[20]:
InvoiceNo StockCode Description Quantity UnitPrice CustomerID
Country InvoiceDate
Australia 1/11/2011 09:47 540700 21581 SKULLS DESIGN COTTON TOTE BAG 6 2.25 12393.0
1/11/2011 09:47 540700 22619 SET OF 6 SOLDIER SKITTLES 8 3.75 12393.0
1/11/2011 09:47 540700 84997B RED 3 PIECE RETROSPOT CUTLERY SET 6 3.75 12393.0
1/11/2011 09:47 540700 20727 LUNCH BAG BLACK SKULL. 20 1.65 12393.0
1/11/2011 09:47 540700 20726 LUNCH BAG WOODLAND 20 1.65 12393.0
1/11/2011 09:47 540700 22383 LUNCH BAG SUKI DESIGN 10 1.65 12393.0
1/11/2011 09:47 540700 21249 WOODLAND HEIGHT CHART STICKERS 6 2.95 12393.0
1/11/2011 09:47 540700 22378 WALL TIDY RETROSPOT 20 0.85 12393.0
1/11/2011 09:47 540700 22175 PINK OWL SOFT TOY 12 2.95 12393.0
1/11/2011 09:47 540700 22176 BLUE OWL SOFT TOY 12 2.95 12393.0
1/11/2011 09:47 540700 84997C BLUE 3 PIECE POLKADOT CUTLERY SET 6 3.75 12393.0
1/11/2011 09:47 540700 20728 LUNCH BAG CARS BLUE 20 1.65 12393.0
1/11/2011 09:47 540700 22382 LUNCH BAG SPACEBOY DESIGN 20 1.65 12393.0
1/11/2011 09:47 540700 21915 RED HARMONICA IN BOX 12 1.25 12393.0
1/11/2011 09:47 540700 22549 PICTURE DOMINOES 12 1.45 12393.0
1/11/2011 09:47 540700 21578 WOODLAND DESIGN COTTON TOTE BAG 12 2.25 12393.0
1/11/2011 09:47 540700 21577 SAVE THE PLANET COTTON TOTE BAG 12 2.25 12393.0
1/11/2011 09:47 540700 22245 HOOK, 1 HANGER ,MAGIC GARDEN 12 0.85 12393.0
1/11/2011 09:47 540700 22244 3 HOOK HANGER MAGIC GARDEN 12 1.95 12393.0

Retrieve Unit Prices using Multi-Indexing

In [21]:
sales_df.loc[('Australia','1/11/2011 09:47'), 'UnitPrice']
Out[21]:
Country    InvoiceDate    
Australia  1/11/2011 09:47    2.25
           1/11/2011 09:47    3.75
           1/11/2011 09:47    3.75
           1/11/2011 09:47    1.65
           1/11/2011 09:47    1.65
           1/11/2011 09:47    1.65
           1/11/2011 09:47    2.95
           1/11/2011 09:47    0.85
           1/11/2011 09:47    2.95
           1/11/2011 09:47    2.95
           1/11/2011 09:47    3.75
           1/11/2011 09:47    1.65
           1/11/2011 09:47    1.65
           1/11/2011 09:47    1.25
           1/11/2011 09:47    1.45
           1/11/2011 09:47    2.25
           1/11/2011 09:47    2.25
           1/11/2011 09:47    0.85
           1/11/2011 09:47    1.95
Name: UnitPrice, dtype: float64

Retrieve specific elements using iloc

In [22]:
sales_df.iloc[0]
Out[22]:
InvoiceNo                       540557
StockCode                        22523
Description    CHILDS GARDEN FORK PINK
Quantity                            96
UnitPrice                         0.85
CustomerID                     12415.0
Name: (Australia, 1/10/2011 09:58), dtype: object
In [23]:
sales_df.iloc[0,0]
Out[23]:
'540557'

Transpose the DataFrame

In [24]:
sales_df=sales_df.transpose()
sales_df.head(10)
Out[24]:
Country Australia ... Unspecified
InvoiceDate 1/10/2011 09:58 1/11/2011 09:47 1/11/2011 09:47 1/11/2011 09:47 1/11/2011 09:47 1/11/2011 09:47 1/11/2011 09:47 1/11/2011 09:47 1/11/2011 09:47 1/11/2011 09:47 ... 9/2/2011 12:17 9/2/2011 12:17 9/2/2011 12:17 9/2/2011 12:17 9/2/2011 12:17 9/2/2011 12:17 9/2/2011 12:17 9/2/2011 12:17 9/2/2011 12:17 9/2/2011 12:17
InvoiceNo 540557 540700 540700 540700 540700 540700 540700 540700 540700 540700 ... 565303 565303 565303 565303 565303 565303 565303 565303 565303 565303
StockCode 22523 21581 22619 84997B 20727 20726 22383 21249 22378 22175 ... 20983 20977 20982 23084 22549 85227 22138 21889 22550 23127
Description CHILDS GARDEN FORK PINK SKULLS DESIGN COTTON TOTE BAG SET OF 6 SOLDIER SKITTLES RED 3 PIECE RETROSPOT CUTLERY SET LUNCH BAG BLACK SKULL. LUNCH BAG WOODLAND LUNCH BAG SUKI DESIGN WOODLAND HEIGHT CHART STICKERS WALL TIDY RETROSPOT PINK OWL SOFT TOY ... 12 PENCILS TALL TUBE RED RETROSPOT 36 PENCILS TUBE WOODLAND 12 PENCILS TALL TUBE SKULLS RABBIT NIGHT LIGHT PICTURE DOMINOES SET OF 6 3D KIT CARDS FOR KIDS BAKING SET 9 PIECE RETROSPOT WOODEN BOX OF DOMINOES HOLIDAY FUN LUDO FELTCRAFT GIRL NICOLE KIT
Quantity 96 6 8 6 20 20 10 6 20 12 ... 4 4 5 1 1 4 2 5 2 1
UnitPrice 0.85 2.25 3.75 3.75 1.65 1.65 1.65 2.95 0.85 2.95 ... 0.85 1.25 0.85 2.08 1.45 0.85 4.95 1.25 3.75 4.95
CustomerID 12415.0 12393.0 12393.0 12393.0 12393.0 12393.0 12393.0 12393.0 12393.0 12393.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

6 rows × 541909 columns

Locate specific element in the DataFrame

In [25]:
sales_df.loc['UnitPrice',('Australia','1/10/2011 09:58')]
Out[25]:
Country    InvoiceDate    
Australia  1/10/2011 09:58    0.85
Name: UnitPrice, dtype: object

Locate range of element in the DataFrame

In [26]:
sales_df.loc['UnitPrice',('Australia','1/10/2011 09:58'):('Belgium','1/10/2011 09:58')]
Out[26]:
Country    InvoiceDate    
Australia  1/10/2011 09:58    0.85
           1/11/2011 09:47    2.25
           1/11/2011 09:47    3.75
           1/11/2011 09:47    3.75
           1/11/2011 09:47    1.65
                              ... 
Bahrain    5/9/2011 13:49     2.95
           5/9/2011 13:49     2.95
           5/9/2011 13:49     4.25
           5/9/2011 13:49     4.25
           5/9/2011 13:49     4.25
Name: UnitPrice, Length: 1679, dtype: object

Retrieve the average Unit Price in the UK at the date of 12/1/2010 08:26 using Mutli-Indexing

In [27]:
sales_df=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 4/00-ecommerce_sales.csv')
sales_df.set_index(keys=['Country','InvoiceDate'], inplace=True)

#Retrieving the mean
sales_df.loc[('United Kingdom','12/1/2010 08:26'), 'UnitPrice'].mean()
Out[27]:
3.91

Task 6. Dealing with Date Time — Python DateTime Module

In [28]:
import datetime as dt

Picking a Date

In [29]:
my_date=dt.date(2020,3,22)
my_date
Out[29]:
datetime.date(2020, 3, 22)

Confirming datatype

In [30]:
type(my_date)
Out[30]:
datetime.date

Convert to string

In [31]:
str(my_date)
Out[31]:
'2020-03-22'

Retrieve the Day

In [32]:
my_date.day
Out[32]:
22

Retrieve the Month

In [33]:
my_date.month
Out[33]:
3

Retrieve the Year

In [34]:
my_date.year
Out[34]:
2020

Picking a Datetime

In [35]:
my_datetime=dt.datetime(2020,3,22,8,20,50)
In [36]:
str(my_datetime)
Out[36]:
'2020-03-22 08:20:50'

Retrieve the Hour

In [37]:
my_datetime.hour
Out[37]:
8

Retrieve the Minute

In [38]:
my_datetime.minute
Out[38]:
20

Retrieve the Second

In [39]:
my_datetime.second
Out[39]:
50

Print out the Calendar for March 2021

In [40]:
import calendar
print(calendar.month(2021,3))
     March 2021
Mo Tu We Th Fr Sa Su
 1  2  3  4  5  6  7
 8  9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31

Use Pandas to Convert irregular Pandas Series into a Datetime Format

In [41]:
dates=pd.Series(['2020/03/22','2020/08/25','March 22nd, 2020'])
dates
Out[41]:
0          2020/03/22
1          2020/08/25
2    March 22nd, 2020
dtype: object

Task 7. Dealing with Date Time — Handling Date and Time using Pandas

Use Timestamp to display date and time

In [42]:
pd.Timestamp(2020, 3, 22, 15, 10, 00)
Out[42]:
Timestamp('2020-03-22 15:10:00')

Use Timestamp to display date and time using Python datetime object

In [43]:
pd.Timestamp(dt.datetime(2020,3,22,15,10,00))
Out[43]:
Timestamp('2020-03-22 15:10:00')

Calculate the difference between 2 dates

In [44]:
day_1=pd.Timestamp(1998,3,22,10)
day_2=pd.Timestamp(2021,3,22,10)
delta=day_2-day_1
delta
Out[44]:
Timedelta('8401 days 00:00:00')

Define 3 dates for 3 separate transactions

In [45]:
date_1=dt.date(2020,3,22)
date_2=dt.date(2020,4,22)
date_3=dt.date(2020,5,22)
dates_list=[date_1,date_2,date_3]
dates_list
Out[45]:
[datetime.date(2020, 3, 22),
 datetime.date(2020, 4, 22),
 datetime.date(2020, 5, 22)]

Convert the list into datetime datatype

In [46]:
dates_index=pd.DatetimeIndex(dates_list)
dates_index
Out[46]:
DatetimeIndex(['2020-03-22', '2020-04-22', '2020-05-22'], dtype='datetime64[ns]', freq=None)

List that carries 3 values corresponding to store sales

In [47]:
sales=[50,55,60]

Define a Pandas Series using datetime and values as follows

In [48]:
sales=pd.Series(data=sales, index=dates_index)
sales
Out[48]:
2020-03-22    50
2020-04-22    55
2020-05-22    60
dtype: int64

Define a range of Dates as follows

In [49]:
my_days=pd.date_range(start='2020-01-01', end='2020-04-01', freq='D')
my_days
Out[49]:
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08',
               '2020-01-09', '2020-01-10', '2020-01-11', '2020-01-12',
               '2020-01-13', '2020-01-14', '2020-01-15', '2020-01-16',
               '2020-01-17', '2020-01-18', '2020-01-19', '2020-01-20',
               '2020-01-21', '2020-01-22', '2020-01-23', '2020-01-24',
               '2020-01-25', '2020-01-26', '2020-01-27', '2020-01-28',
               '2020-01-29', '2020-01-30', '2020-01-31', '2020-02-01',
               '2020-02-02', '2020-02-03', '2020-02-04', '2020-02-05',
               '2020-02-06', '2020-02-07', '2020-02-08', '2020-02-09',
               '2020-02-10', '2020-02-11', '2020-02-12', '2020-02-13',
               '2020-02-14', '2020-02-15', '2020-02-16', '2020-02-17',
               '2020-02-18', '2020-02-19', '2020-02-20', '2020-02-21',
               '2020-02-22', '2020-02-23', '2020-02-24', '2020-02-25',
               '2020-02-26', '2020-02-27', '2020-02-28', '2020-02-29',
               '2020-03-01', '2020-03-02', '2020-03-03', '2020-03-04',
               '2020-03-05', '2020-03-06', '2020-03-07', '2020-03-08',
               '2020-03-09', '2020-03-10', '2020-03-11', '2020-03-12',
               '2020-03-13', '2020-03-14', '2020-03-15', '2020-03-16',
               '2020-03-17', '2020-03-18', '2020-03-19', '2020-03-20',
               '2020-03-21', '2020-03-22', '2020-03-23', '2020-03-24',
               '2020-03-25', '2020-03-26', '2020-03-27', '2020-03-28',
               '2020-03-29', '2020-03-30', '2020-03-31', '2020-04-01'],
              dtype='datetime64[ns]', freq='D')
In [50]:
type(my_days)
Out[50]:
pandas.core.indexes.datetimes.DatetimeIndex
In [51]:
my_days=pd.date_range(start='2020-01-01', end='2020-08-01', freq='ME')
my_days
Out[51]:
DatetimeIndex(['2020-01-31', '2020-02-29', '2020-03-31', '2020-04-30',
               '2020-05-31', '2020-06-30', '2020-07-31'],
              dtype='datetime64[ns]', freq='ME')

Retrieve Business Days between 2020-01-01 and 2020-04-01

In [52]:
my_days=pd.date_range(start='2020-01-01', end='2020-04-01', freq='B')
my_days
Out[52]:
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-06',
               '2020-01-07', '2020-01-08', '2020-01-09', '2020-01-10',
               '2020-01-13', '2020-01-14', '2020-01-15', '2020-01-16',
               '2020-01-17', '2020-01-20', '2020-01-21', '2020-01-22',
               '2020-01-23', '2020-01-24', '2020-01-27', '2020-01-28',
               '2020-01-29', '2020-01-30', '2020-01-31', '2020-02-03',
               '2020-02-04', '2020-02-05', '2020-02-06', '2020-02-07',
               '2020-02-10', '2020-02-11', '2020-02-12', '2020-02-13',
               '2020-02-14', '2020-02-17', '2020-02-18', '2020-02-19',
               '2020-02-20', '2020-02-21', '2020-02-24', '2020-02-25',
               '2020-02-26', '2020-02-27', '2020-02-28', '2020-03-02',
               '2020-03-03', '2020-03-04', '2020-03-05', '2020-03-06',
               '2020-03-09', '2020-03-10', '2020-03-11', '2020-03-12',
               '2020-03-13', '2020-03-16', '2020-03-17', '2020-03-18',
               '2020-03-19', '2020-03-20', '2020-03-23', '2020-03-24',
               '2020-03-25', '2020-03-26', '2020-03-27', '2020-03-30',
               '2020-03-31', '2020-04-01'],
              dtype='datetime64[ns]', freq='B')