Mastering Data Analysis with Pandas #4
Task 1. DataSet
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
Discovering the DataSet
sales_df=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 4/00-ecommerce_sales.csv')
sales_df
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | |
---|---|---|---|---|---|---|---|---|
0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 12/1/2010 08:26 | 2.55 | 17850.0 | United Kingdom |
1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 12/1/2010 08:26 | 3.39 | 17850.0 | United Kingdom |
2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 12/1/2010 08:26 | 2.75 | 17850.0 | United Kingdom |
3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 12/1/2010 08:26 | 3.39 | 17850.0 | United Kingdom |
4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 12/1/2010 08:26 | 3.39 | 17850.0 | United Kingdom |
... | ... | ... | ... | ... | ... | ... | ... | ... |
541904 | 581587 | 22613 | PACK OF 20 SPACEBOY NAPKINS | 12 | 12/9/2011 12:50 | 0.85 | 12680.0 | France |
541905 | 581587 | 22899 | CHILDREN'S APRON DOLLY GIRL | 6 | 12/9/2011 12:50 | 2.10 | 12680.0 | France |
541906 | 581587 | 23254 | CHILDRENS CUTLERY DOLLY GIRL | 4 | 12/9/2011 12:50 | 4.15 | 12680.0 | France |
541907 | 581587 | 23255 | CHILDRENS CUTLERY CIRCUS PARADE | 4 | 12/9/2011 12:50 | 4.15 | 12680.0 | France |
541908 | 581587 | 22138 | BAKING SET 9 PIECE RETROSPOT | 3 | 12/9/2011 12:50 | 4.95 | 12680.0 | France |
541909 rows × 8 columns
Data Types
sales_df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 541909 entries, 0 to 541908 Data columns (total 8 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 InvoiceNo 541909 non-null object 1 StockCode 541909 non-null object 2 Description 540455 non-null object 3 Quantity 541909 non-null int64 4 InvoiceDate 541909 non-null object 5 UnitPrice 541909 non-null float64 6 CustomerID 406829 non-null float64 7 Country 541909 non-null object dtypes: float64(2), int64(1), object(5) memory usage: 33.1+ MB
Confirming changes
sales_df['InvoiceDate']=pd.to_datetime(sales_df['InvoiceDate'])
sales_df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 541909 entries, 0 to 541908 Data columns (total 8 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 InvoiceNo 541909 non-null object 1 StockCode 541909 non-null object 2 Description 540455 non-null object 3 Quantity 541909 non-null int64 4 InvoiceDate 541909 non-null datetime64[ns] 5 UnitPrice 541909 non-null float64 6 CustomerID 406829 non-null float64 7 Country 541909 non-null object dtypes: datetime64[ns](1), float64(2), int64(1), object(4) memory usage: 33.1+ MB
How many Null Values in each Column
sales_df.isnull().sum()
InvoiceNo 0 StockCode 0 Description 1454 Quantity 0 InvoiceDate 0 UnitPrice 0 CustomerID 135080 Country 0 dtype: int64
Unique Countries
sales_df['Country'].unique()
array(['United Kingdom', 'France', 'Australia', 'Netherlands', 'Germany', 'Norway', 'EIRE', 'Switzerland', 'Spain', 'Poland', 'Portugal', 'Italy', 'Belgium', 'Lithuania', 'Japan', 'Iceland', 'Channel Islands', 'Denmark', 'Cyprus', 'Sweden', 'Austria', 'Israel', 'Finland', 'Bahrain', 'Greece', 'Hong Kong', 'Singapore', 'Lebanon', 'United Arab Emirates', 'Saudi Arabia', 'Czech Republic', 'Canada', 'Unspecified', 'Brazil', 'USA', 'European Community', 'Malta', 'RSA'], dtype=object)
How many Unique Values in each Column
sales_df.nunique()
InvoiceNo 25900 StockCode 4070 Description 4223 Quantity 722 InvoiceDate 23260 UnitPrice 1630 CustomerID 4372 Country 38 dtype: int64
Task 2. Group By
sales_df
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | |
---|---|---|---|---|---|---|---|---|
0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00 | 2.55 | 17850.0 | United Kingdom |
1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00 | 2.75 | 17850.0 | United Kingdom |
3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
... | ... | ... | ... | ... | ... | ... | ... | ... |
541904 | 581587 | 22613 | PACK OF 20 SPACEBOY NAPKINS | 12 | 2011-12-09 12:50:00 | 0.85 | 12680.0 | France |
541905 | 581587 | 22899 | CHILDREN'S APRON DOLLY GIRL | 6 | 2011-12-09 12:50:00 | 2.10 | 12680.0 | France |
541906 | 581587 | 23254 | CHILDRENS CUTLERY DOLLY GIRL | 4 | 2011-12-09 12:50:00 | 4.15 | 12680.0 | France |
541907 | 581587 | 23255 | CHILDRENS CUTLERY CIRCUS PARADE | 4 | 2011-12-09 12:50:00 | 4.15 | 12680.0 | France |
541908 | 581587 | 22138 | BAKING SET 9 PIECE RETROSPOT | 3 | 2011-12-09 12:50:00 | 4.95 | 12680.0 | France |
541909 rows × 8 columns
Retrieve the mean, max and min of all Unit Prices by Country
sales_df.groupby('Country')['UnitPrice'].agg(['mean', 'max', 'min'])
mean | max | min | |
---|---|---|---|
Country | |||
Australia | 3.220612 | 350.00 | 0.00 |
Austria | 4.243192 | 40.00 | 0.12 |
Bahrain | 4.556316 | 12.75 | 1.25 |
Belgium | 3.644335 | 39.95 | 0.12 |
Brazil | 4.456250 | 10.95 | 0.85 |
Canada | 6.030331 | 550.94 | 0.10 |
Channel Islands | 4.932124 | 293.00 | 0.19 |
Cyprus | 6.302363 | 320.69 | 0.12 |
Czech Republic | 2.938333 | 40.00 | 0.29 |
Denmark | 3.256941 | 18.00 | 0.21 |
EIRE | 5.911077 | 1917.00 | 0.00 |
European Community | 4.820492 | 18.00 | 0.55 |
Finland | 5.448705 | 275.60 | 0.12 |
France | 5.028864 | 4161.06 | 0.00 |
Germany | 3.966930 | 599.50 | 0.00 |
Greece | 4.885548 | 50.00 | 0.14 |
Hong Kong | 42.505208 | 2653.95 | 0.21 |
Iceland | 2.644011 | 12.75 | 0.25 |
Israel | 3.633131 | 125.00 | 0.06 |
Italy | 4.831121 | 300.00 | 0.12 |
Japan | 2.276145 | 45.57 | 0.21 |
Lebanon | 5.387556 | 14.95 | 0.55 |
Lithuania | 2.841143 | 5.95 | 1.25 |
Malta | 5.244173 | 65.00 | 0.19 |
Netherlands | 2.738317 | 206.40 | 0.00 |
Norway | 6.012026 | 700.00 | 0.00 |
Poland | 4.170880 | 40.00 | 0.19 |
Portugal | 8.582976 | 1241.98 | 0.12 |
RSA | 4.277586 | 14.95 | 0.00 |
Saudi Arabia | 2.411000 | 5.49 | 0.42 |
Singapore | 109.645808 | 3949.32 | 0.19 |
Spain | 4.987544 | 1715.85 | 0.00 |
Sweden | 3.910887 | 40.00 | 0.19 |
Switzerland | 3.403442 | 40.00 | 0.00 |
USA | 2.216426 | 16.95 | 0.42 |
United Arab Emirates | 3.380735 | 37.50 | 0.29 |
United Kingdom | 4.532422 | 38970.00 | -11062.06 |
Unspecified | 2.699574 | 16.95 | 0.19 |
Task 3. Multi-Indexed DataFrames
sales_df=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 4/00-ecommerce_sales.csv')
sales_df
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country | |
---|---|---|---|---|---|---|---|---|
0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 12/1/2010 08:26 | 2.55 | 17850.0 | United Kingdom |
1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 12/1/2010 08:26 | 3.39 | 17850.0 | United Kingdom |
2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 12/1/2010 08:26 | 2.75 | 17850.0 | United Kingdom |
3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 12/1/2010 08:26 | 3.39 | 17850.0 | United Kingdom |
4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 12/1/2010 08:26 | 3.39 | 17850.0 | United Kingdom |
... | ... | ... | ... | ... | ... | ... | ... | ... |
541904 | 581587 | 22613 | PACK OF 20 SPACEBOY NAPKINS | 12 | 12/9/2011 12:50 | 0.85 | 12680.0 | France |
541905 | 581587 | 22899 | CHILDREN'S APRON DOLLY GIRL | 6 | 12/9/2011 12:50 | 2.10 | 12680.0 | France |
541906 | 581587 | 23254 | CHILDRENS CUTLERY DOLLY GIRL | 4 | 12/9/2011 12:50 | 4.15 | 12680.0 | France |
541907 | 581587 | 23255 | CHILDRENS CUTLERY CIRCUS PARADE | 4 | 12/9/2011 12:50 | 4.15 | 12680.0 | France |
541908 | 581587 | 22138 | BAKING SET 9 PIECE RETROSPOT | 3 | 12/9/2011 12:50 | 4.95 | 12680.0 | France |
541909 rows × 8 columns
Multi-Indexing using Country and InvoiceDate
sales_df.set_index(keys=['Country','InvoiceDate'], inplace=True)
sales_df
InvoiceNo | StockCode | Description | Quantity | UnitPrice | CustomerID | ||
---|---|---|---|---|---|---|---|
Country | InvoiceDate | ||||||
United Kingdom | 12/1/2010 08:26 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2.55 | 17850.0 |
12/1/2010 08:26 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 3.39 | 17850.0 | |
12/1/2010 08:26 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2.75 | 17850.0 | |
12/1/2010 08:26 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 3.39 | 17850.0 | |
12/1/2010 08:26 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 3.39 | 17850.0 | |
... | ... | ... | ... | ... | ... | ... | ... |
France | 12/9/2011 12:50 | 581587 | 22613 | PACK OF 20 SPACEBOY NAPKINS | 12 | 0.85 | 12680.0 |
12/9/2011 12:50 | 581587 | 22899 | CHILDREN'S APRON DOLLY GIRL | 6 | 2.10 | 12680.0 | |
12/9/2011 12:50 | 581587 | 23254 | CHILDRENS CUTLERY DOLLY GIRL | 4 | 4.15 | 12680.0 | |
12/9/2011 12:50 | 581587 | 23255 | CHILDRENS CUTLERY CIRCUS PARADE | 4 | 4.15 | 12680.0 | |
12/9/2011 12:50 | 581587 | 22138 | BAKING SET 9 PIECE RETROSPOT | 3 | 4.95 | 12680.0 |
541909 rows × 6 columns
sales_df.sort_index(inplace=True)
sales_df
InvoiceNo | StockCode | Description | Quantity | UnitPrice | CustomerID | ||
---|---|---|---|---|---|---|---|
Country | InvoiceDate | ||||||
Australia | 1/10/2011 09:58 | 540557 | 22523 | CHILDS GARDEN FORK PINK | 96 | 0.85 | 12415.0 |
1/11/2011 09:47 | 540700 | 21581 | SKULLS DESIGN COTTON TOTE BAG | 6 | 2.25 | 12393.0 | |
1/11/2011 09:47 | 540700 | 22619 | SET OF 6 SOLDIER SKITTLES | 8 | 3.75 | 12393.0 | |
1/11/2011 09:47 | 540700 | 84997B | RED 3 PIECE RETROSPOT CUTLERY SET | 6 | 3.75 | 12393.0 | |
1/11/2011 09:47 | 540700 | 20727 | LUNCH BAG BLACK SKULL. | 20 | 1.65 | 12393.0 | |
... | ... | ... | ... | ... | ... | ... | ... |
Unspecified | 9/2/2011 12:17 | 565303 | 85227 | SET OF 6 3D KIT CARDS FOR KIDS | 4 | 0.85 | NaN |
9/2/2011 12:17 | 565303 | 22138 | BAKING SET 9 PIECE RETROSPOT | 2 | 4.95 | NaN | |
9/2/2011 12:17 | 565303 | 21889 | WOODEN BOX OF DOMINOES | 5 | 1.25 | NaN | |
9/2/2011 12:17 | 565303 | 22550 | HOLIDAY FUN LUDO | 2 | 3.75 | NaN | |
9/2/2011 12:17 | 565303 | 23127 | FELTCRAFT GIRL NICOLE KIT | 1 | 4.95 | NaN |
541909 rows × 6 columns
Retrieve Indexes
sales_df.index.names
FrozenList(['Country', 'InvoiceDate'])
Retrieve the first Index
#Multi-Index Objects
sales_df.index[0]
('Australia', '1/10/2011 09:58')
Task 4. Multi-Indexing Operations #1
sales_df=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 4/00-ecommerce_sales.csv')
sales_df.set_index(keys=['Country','InvoiceDate'], inplace=True)
sales_df
InvoiceNo | StockCode | Description | Quantity | UnitPrice | CustomerID | ||
---|---|---|---|---|---|---|---|
Country | InvoiceDate | ||||||
United Kingdom | 12/1/2010 08:26 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2.55 | 17850.0 |
12/1/2010 08:26 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 3.39 | 17850.0 | |
12/1/2010 08:26 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2.75 | 17850.0 | |
12/1/2010 08:26 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 3.39 | 17850.0 | |
12/1/2010 08:26 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 3.39 | 17850.0 | |
... | ... | ... | ... | ... | ... | ... | ... |
France | 12/9/2011 12:50 | 581587 | 22613 | PACK OF 20 SPACEBOY NAPKINS | 12 | 0.85 | 12680.0 |
12/9/2011 12:50 | 581587 | 22899 | CHILDREN'S APRON DOLLY GIRL | 6 | 2.10 | 12680.0 | |
12/9/2011 12:50 | 581587 | 23254 | CHILDRENS CUTLERY DOLLY GIRL | 4 | 4.15 | 12680.0 | |
12/9/2011 12:50 | 581587 | 23255 | CHILDRENS CUTLERY CIRCUS PARADE | 4 | 4.15 | 12680.0 | |
12/9/2011 12:50 | 581587 | 22138 | BAKING SET 9 PIECE RETROSPOT | 3 | 4.95 | 12680.0 |
541909 rows × 6 columns
Retrieve the Values of the first Index
sales_df.index.get_level_values(0)
Index(['United Kingdom', 'United Kingdom', 'United Kingdom', 'United Kingdom', 'United Kingdom', 'United Kingdom', 'United Kingdom', 'United Kingdom', 'United Kingdom', 'United Kingdom', ... 'France', 'France', 'France', 'France', 'France', 'France', 'France', 'France', 'France', 'France'], dtype='object', name='Country', length=541909)
Retrieve the Values of the second Index
sales_df.index.get_level_values(1)
Index(['12/1/2010 08:26', '12/1/2010 08:26', '12/1/2010 08:26', '12/1/2010 08:26', '12/1/2010 08:26', '12/1/2010 08:26', '12/1/2010 08:26', '12/1/2010 08:28', '12/1/2010 08:28', '12/1/2010 08:34', ... '12/9/2011 12:50', '12/9/2011 12:50', '12/9/2011 12:50', '12/9/2011 12:50', '12/9/2011 12:50', '12/9/2011 12:50', '12/9/2011 12:50', '12/9/2011 12:50', '12/9/2011 12:50', '12/9/2011 12:50'], dtype='object', name='InvoiceDate', length=541909)
Change the Name of the Indexes
sales_df.index.set_names(names=['Transaction Location','Transaction Date'], inplace=True)
sales_df
InvoiceNo | StockCode | Description | Quantity | UnitPrice | CustomerID | ||
---|---|---|---|---|---|---|---|
Transaction Location | Transaction Date | ||||||
United Kingdom | 12/1/2010 08:26 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2.55 | 17850.0 |
12/1/2010 08:26 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 3.39 | 17850.0 | |
12/1/2010 08:26 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2.75 | 17850.0 | |
12/1/2010 08:26 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 3.39 | 17850.0 | |
12/1/2010 08:26 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 3.39 | 17850.0 | |
... | ... | ... | ... | ... | ... | ... | ... |
France | 12/9/2011 12:50 | 581587 | 22613 | PACK OF 20 SPACEBOY NAPKINS | 12 | 0.85 | 12680.0 |
12/9/2011 12:50 | 581587 | 22899 | CHILDREN'S APRON DOLLY GIRL | 6 | 2.10 | 12680.0 | |
12/9/2011 12:50 | 581587 | 23254 | CHILDRENS CUTLERY DOLLY GIRL | 4 | 4.15 | 12680.0 | |
12/9/2011 12:50 | 581587 | 23255 | CHILDRENS CUTLERY CIRCUS PARADE | 4 | 4.15 | 12680.0 | |
12/9/2011 12:50 | 581587 | 22138 | BAKING SET 9 PIECE RETROSPOT | 3 | 4.95 | 12680.0 |
541909 rows × 6 columns
Task 5. Multi-Indexing Operations #2
sales_df=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 4/00-ecommerce_sales.csv')
sales_df.set_index(keys=['Country','InvoiceDate'], inplace=True)
sales_df.sort_index(inplace=True)
sales_df
InvoiceNo | StockCode | Description | Quantity | UnitPrice | CustomerID | ||
---|---|---|---|---|---|---|---|
Country | InvoiceDate | ||||||
Australia | 1/10/2011 09:58 | 540557 | 22523 | CHILDS GARDEN FORK PINK | 96 | 0.85 | 12415.0 |
1/11/2011 09:47 | 540700 | 21581 | SKULLS DESIGN COTTON TOTE BAG | 6 | 2.25 | 12393.0 | |
1/11/2011 09:47 | 540700 | 22619 | SET OF 6 SOLDIER SKITTLES | 8 | 3.75 | 12393.0 | |
1/11/2011 09:47 | 540700 | 84997B | RED 3 PIECE RETROSPOT CUTLERY SET | 6 | 3.75 | 12393.0 | |
1/11/2011 09:47 | 540700 | 20727 | LUNCH BAG BLACK SKULL. | 20 | 1.65 | 12393.0 | |
... | ... | ... | ... | ... | ... | ... | ... |
Unspecified | 9/2/2011 12:17 | 565303 | 85227 | SET OF 6 3D KIT CARDS FOR KIDS | 4 | 0.85 | NaN |
9/2/2011 12:17 | 565303 | 22138 | BAKING SET 9 PIECE RETROSPOT | 2 | 4.95 | NaN | |
9/2/2011 12:17 | 565303 | 21889 | WOODEN BOX OF DOMINOES | 5 | 1.25 | NaN | |
9/2/2011 12:17 | 565303 | 22550 | HOLIDAY FUN LUDO | 2 | 3.75 | NaN | |
9/2/2011 12:17 | 565303 | 23127 | FELTCRAFT GIRL NICOLE KIT | 1 | 4.95 | NaN |
541909 rows × 6 columns
Retrieve specific elements using Multi-Indexing
sales_df.loc['Australia','1/11/2011 09:47']
InvoiceNo | StockCode | Description | Quantity | UnitPrice | CustomerID | ||
---|---|---|---|---|---|---|---|
Country | InvoiceDate | ||||||
Australia | 1/11/2011 09:47 | 540700 | 21581 | SKULLS DESIGN COTTON TOTE BAG | 6 | 2.25 | 12393.0 |
1/11/2011 09:47 | 540700 | 22619 | SET OF 6 SOLDIER SKITTLES | 8 | 3.75 | 12393.0 | |
1/11/2011 09:47 | 540700 | 84997B | RED 3 PIECE RETROSPOT CUTLERY SET | 6 | 3.75 | 12393.0 | |
1/11/2011 09:47 | 540700 | 20727 | LUNCH BAG BLACK SKULL. | 20 | 1.65 | 12393.0 | |
1/11/2011 09:47 | 540700 | 20726 | LUNCH BAG WOODLAND | 20 | 1.65 | 12393.0 | |
1/11/2011 09:47 | 540700 | 22383 | LUNCH BAG SUKI DESIGN | 10 | 1.65 | 12393.0 | |
1/11/2011 09:47 | 540700 | 21249 | WOODLAND HEIGHT CHART STICKERS | 6 | 2.95 | 12393.0 | |
1/11/2011 09:47 | 540700 | 22378 | WALL TIDY RETROSPOT | 20 | 0.85 | 12393.0 | |
1/11/2011 09:47 | 540700 | 22175 | PINK OWL SOFT TOY | 12 | 2.95 | 12393.0 | |
1/11/2011 09:47 | 540700 | 22176 | BLUE OWL SOFT TOY | 12 | 2.95 | 12393.0 | |
1/11/2011 09:47 | 540700 | 84997C | BLUE 3 PIECE POLKADOT CUTLERY SET | 6 | 3.75 | 12393.0 | |
1/11/2011 09:47 | 540700 | 20728 | LUNCH BAG CARS BLUE | 20 | 1.65 | 12393.0 | |
1/11/2011 09:47 | 540700 | 22382 | LUNCH BAG SPACEBOY DESIGN | 20 | 1.65 | 12393.0 | |
1/11/2011 09:47 | 540700 | 21915 | RED HARMONICA IN BOX | 12 | 1.25 | 12393.0 | |
1/11/2011 09:47 | 540700 | 22549 | PICTURE DOMINOES | 12 | 1.45 | 12393.0 | |
1/11/2011 09:47 | 540700 | 21578 | WOODLAND DESIGN COTTON TOTE BAG | 12 | 2.25 | 12393.0 | |
1/11/2011 09:47 | 540700 | 21577 | SAVE THE PLANET COTTON TOTE BAG | 12 | 2.25 | 12393.0 | |
1/11/2011 09:47 | 540700 | 22245 | HOOK, 1 HANGER ,MAGIC GARDEN | 12 | 0.85 | 12393.0 | |
1/11/2011 09:47 | 540700 | 22244 | 3 HOOK HANGER MAGIC GARDEN | 12 | 1.95 | 12393.0 |
Retrieve Unit Prices using Multi-Indexing
sales_df.loc[('Australia','1/11/2011 09:47'), 'UnitPrice']
Country InvoiceDate Australia 1/11/2011 09:47 2.25 1/11/2011 09:47 3.75 1/11/2011 09:47 3.75 1/11/2011 09:47 1.65 1/11/2011 09:47 1.65 1/11/2011 09:47 1.65 1/11/2011 09:47 2.95 1/11/2011 09:47 0.85 1/11/2011 09:47 2.95 1/11/2011 09:47 2.95 1/11/2011 09:47 3.75 1/11/2011 09:47 1.65 1/11/2011 09:47 1.65 1/11/2011 09:47 1.25 1/11/2011 09:47 1.45 1/11/2011 09:47 2.25 1/11/2011 09:47 2.25 1/11/2011 09:47 0.85 1/11/2011 09:47 1.95 Name: UnitPrice, dtype: float64
Retrieve specific elements using iloc
sales_df.iloc[0]
InvoiceNo 540557 StockCode 22523 Description CHILDS GARDEN FORK PINK Quantity 96 UnitPrice 0.85 CustomerID 12415.0 Name: (Australia, 1/10/2011 09:58), dtype: object
sales_df.iloc[0,0]
'540557'
Transpose the DataFrame
sales_df=sales_df.transpose()
sales_df.head(10)
Country | Australia | ... | Unspecified | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
InvoiceDate | 1/10/2011 09:58 | 1/11/2011 09:47 | 1/11/2011 09:47 | 1/11/2011 09:47 | 1/11/2011 09:47 | 1/11/2011 09:47 | 1/11/2011 09:47 | 1/11/2011 09:47 | 1/11/2011 09:47 | 1/11/2011 09:47 | ... | 9/2/2011 12:17 | 9/2/2011 12:17 | 9/2/2011 12:17 | 9/2/2011 12:17 | 9/2/2011 12:17 | 9/2/2011 12:17 | 9/2/2011 12:17 | 9/2/2011 12:17 | 9/2/2011 12:17 | 9/2/2011 12:17 |
InvoiceNo | 540557 | 540700 | 540700 | 540700 | 540700 | 540700 | 540700 | 540700 | 540700 | 540700 | ... | 565303 | 565303 | 565303 | 565303 | 565303 | 565303 | 565303 | 565303 | 565303 | 565303 |
StockCode | 22523 | 21581 | 22619 | 84997B | 20727 | 20726 | 22383 | 21249 | 22378 | 22175 | ... | 20983 | 20977 | 20982 | 23084 | 22549 | 85227 | 22138 | 21889 | 22550 | 23127 |
Description | CHILDS GARDEN FORK PINK | SKULLS DESIGN COTTON TOTE BAG | SET OF 6 SOLDIER SKITTLES | RED 3 PIECE RETROSPOT CUTLERY SET | LUNCH BAG BLACK SKULL. | LUNCH BAG WOODLAND | LUNCH BAG SUKI DESIGN | WOODLAND HEIGHT CHART STICKERS | WALL TIDY RETROSPOT | PINK OWL SOFT TOY | ... | 12 PENCILS TALL TUBE RED RETROSPOT | 36 PENCILS TUBE WOODLAND | 12 PENCILS TALL TUBE SKULLS | RABBIT NIGHT LIGHT | PICTURE DOMINOES | SET OF 6 3D KIT CARDS FOR KIDS | BAKING SET 9 PIECE RETROSPOT | WOODEN BOX OF DOMINOES | HOLIDAY FUN LUDO | FELTCRAFT GIRL NICOLE KIT |
Quantity | 96 | 6 | 8 | 6 | 20 | 20 | 10 | 6 | 20 | 12 | ... | 4 | 4 | 5 | 1 | 1 | 4 | 2 | 5 | 2 | 1 |
UnitPrice | 0.85 | 2.25 | 3.75 | 3.75 | 1.65 | 1.65 | 1.65 | 2.95 | 0.85 | 2.95 | ... | 0.85 | 1.25 | 0.85 | 2.08 | 1.45 | 0.85 | 4.95 | 1.25 | 3.75 | 4.95 |
CustomerID | 12415.0 | 12393.0 | 12393.0 | 12393.0 | 12393.0 | 12393.0 | 12393.0 | 12393.0 | 12393.0 | 12393.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
6 rows × 541909 columns
Locate specific element in the DataFrame
sales_df.loc['UnitPrice',('Australia','1/10/2011 09:58')]
Country InvoiceDate Australia 1/10/2011 09:58 0.85 Name: UnitPrice, dtype: object
Locate range of element in the DataFrame
sales_df.loc['UnitPrice',('Australia','1/10/2011 09:58'):('Belgium','1/10/2011 09:58')]
Country InvoiceDate Australia 1/10/2011 09:58 0.85 1/11/2011 09:47 2.25 1/11/2011 09:47 3.75 1/11/2011 09:47 3.75 1/11/2011 09:47 1.65 ... Bahrain 5/9/2011 13:49 2.95 5/9/2011 13:49 2.95 5/9/2011 13:49 4.25 5/9/2011 13:49 4.25 5/9/2011 13:49 4.25 Name: UnitPrice, Length: 1679, dtype: object
Retrieve the average Unit Price in the UK at the date of 12/1/2010 08:26 using Mutli-Indexing
sales_df=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 4/00-ecommerce_sales.csv')
sales_df.set_index(keys=['Country','InvoiceDate'], inplace=True)
#Retrieving the mean
sales_df.loc[('United Kingdom','12/1/2010 08:26'), 'UnitPrice'].mean()
3.91
Task 6. Dealing with Date Time — Python DateTime Module
import datetime as dt
Picking a Date
my_date=dt.date(2020,3,22)
my_date
datetime.date(2020, 3, 22)
Confirming datatype
type(my_date)
datetime.date
Convert to string
str(my_date)
'2020-03-22'
Retrieve the Day
my_date.day
22
Retrieve the Month
my_date.month
3
Retrieve the Year
my_date.year
2020
Picking a Datetime
my_datetime=dt.datetime(2020,3,22,8,20,50)
str(my_datetime)
'2020-03-22 08:20:50'
Retrieve the Hour
my_datetime.hour
8
Retrieve the Minute
my_datetime.minute
20
Retrieve the Second
my_datetime.second
50
Print out the Calendar for March 2021
import calendar
print(calendar.month(2021,3))
March 2021 Mo Tu We Th Fr Sa Su 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Use Pandas to Convert irregular Pandas Series into a Datetime Format
dates=pd.Series(['2020/03/22','2020/08/25','March 22nd, 2020'])
dates
0 2020/03/22 1 2020/08/25 2 March 22nd, 2020 dtype: object
Task 7. Dealing with Date Time — Handling Date and Time using Pandas
Use Timestamp to display date and time
pd.Timestamp(2020, 3, 22, 15, 10, 00)
Timestamp('2020-03-22 15:10:00')
Use Timestamp to display date and time using Python datetime object
pd.Timestamp(dt.datetime(2020,3,22,15,10,00))
Timestamp('2020-03-22 15:10:00')
Calculate the difference between 2 dates
day_1=pd.Timestamp(1998,3,22,10)
day_2=pd.Timestamp(2021,3,22,10)
delta=day_2-day_1
delta
Timedelta('8401 days 00:00:00')
Define 3 dates for 3 separate transactions
date_1=dt.date(2020,3,22)
date_2=dt.date(2020,4,22)
date_3=dt.date(2020,5,22)
dates_list=[date_1,date_2,date_3]
dates_list
[datetime.date(2020, 3, 22), datetime.date(2020, 4, 22), datetime.date(2020, 5, 22)]
Convert the list into datetime datatype
dates_index=pd.DatetimeIndex(dates_list)
dates_index
DatetimeIndex(['2020-03-22', '2020-04-22', '2020-05-22'], dtype='datetime64[ns]', freq=None)
List that carries 3 values corresponding to store sales
sales=[50,55,60]
Define a Pandas Series using datetime and values as follows
sales=pd.Series(data=sales, index=dates_index)
sales
2020-03-22 50 2020-04-22 55 2020-05-22 60 dtype: int64
Define a range of Dates as follows
my_days=pd.date_range(start='2020-01-01', end='2020-04-01', freq='D')
my_days
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04', '2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08', '2020-01-09', '2020-01-10', '2020-01-11', '2020-01-12', '2020-01-13', '2020-01-14', '2020-01-15', '2020-01-16', '2020-01-17', '2020-01-18', '2020-01-19', '2020-01-20', '2020-01-21', '2020-01-22', '2020-01-23', '2020-01-24', '2020-01-25', '2020-01-26', '2020-01-27', '2020-01-28', '2020-01-29', '2020-01-30', '2020-01-31', '2020-02-01', '2020-02-02', '2020-02-03', '2020-02-04', '2020-02-05', '2020-02-06', '2020-02-07', '2020-02-08', '2020-02-09', '2020-02-10', '2020-02-11', '2020-02-12', '2020-02-13', '2020-02-14', '2020-02-15', '2020-02-16', '2020-02-17', '2020-02-18', '2020-02-19', '2020-02-20', '2020-02-21', '2020-02-22', '2020-02-23', '2020-02-24', '2020-02-25', '2020-02-26', '2020-02-27', '2020-02-28', '2020-02-29', '2020-03-01', '2020-03-02', '2020-03-03', '2020-03-04', '2020-03-05', '2020-03-06', '2020-03-07', '2020-03-08', '2020-03-09', '2020-03-10', '2020-03-11', '2020-03-12', '2020-03-13', '2020-03-14', '2020-03-15', '2020-03-16', '2020-03-17', '2020-03-18', '2020-03-19', '2020-03-20', '2020-03-21', '2020-03-22', '2020-03-23', '2020-03-24', '2020-03-25', '2020-03-26', '2020-03-27', '2020-03-28', '2020-03-29', '2020-03-30', '2020-03-31', '2020-04-01'], dtype='datetime64[ns]', freq='D')
type(my_days)
pandas.core.indexes.datetimes.DatetimeIndex
my_days=pd.date_range(start='2020-01-01', end='2020-08-01', freq='ME')
my_days
DatetimeIndex(['2020-01-31', '2020-02-29', '2020-03-31', '2020-04-30', '2020-05-31', '2020-06-30', '2020-07-31'], dtype='datetime64[ns]', freq='ME')
Retrieve Business Days between 2020-01-01 and 2020-04-01
my_days=pd.date_range(start='2020-01-01', end='2020-04-01', freq='B')
my_days
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-06', '2020-01-07', '2020-01-08', '2020-01-09', '2020-01-10', '2020-01-13', '2020-01-14', '2020-01-15', '2020-01-16', '2020-01-17', '2020-01-20', '2020-01-21', '2020-01-22', '2020-01-23', '2020-01-24', '2020-01-27', '2020-01-28', '2020-01-29', '2020-01-30', '2020-01-31', '2020-02-03', '2020-02-04', '2020-02-05', '2020-02-06', '2020-02-07', '2020-02-10', '2020-02-11', '2020-02-12', '2020-02-13', '2020-02-14', '2020-02-17', '2020-02-18', '2020-02-19', '2020-02-20', '2020-02-21', '2020-02-24', '2020-02-25', '2020-02-26', '2020-02-27', '2020-02-28', '2020-03-02', '2020-03-03', '2020-03-04', '2020-03-05', '2020-03-06', '2020-03-09', '2020-03-10', '2020-03-11', '2020-03-12', '2020-03-13', '2020-03-16', '2020-03-17', '2020-03-18', '2020-03-19', '2020-03-20', '2020-03-23', '2020-03-24', '2020-03-25', '2020-03-26', '2020-03-27', '2020-03-30', '2020-03-31', '2020-04-01'], dtype='datetime64[ns]', freq='B')