Mastering Data Analysis with Pandas #2
Task 1. Define a Pandas Dataframe
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
Create the DataFrame
employee_df=pd.DataFrame({'Employee ID':[1,2,3,4,],
'Employee Name':['Fatima Zahra El Amrani','Ahmed Benali','Amina Oulad Chaib','Youssef Bouhaddou'],
'Annual Salary [MAD]':[80000,105000,143000,112000],
'Years with Company':[5,7,10,1]})
employee_df
Employee ID | Employee Name | Annual Salary [MAD] | Years with Company | |
---|---|---|---|---|
0 | 1 | Fatima Zahra El Amrani | 80000 | 5 |
1 | 2 | Ahmed Benali | 105000 | 7 |
2 | 3 | Amina Oulad Chaib | 143000 | 10 |
3 | 4 | Youssef Bouhaddou | 112000 | 1 |
Confirm it's a DataFrame
type(employee_df)
pandas.core.frame.DataFrame
Obtain the Shape of the DataFrame
employee_df.shape
(4, 4)
Obtain the Info of the DataFrame
employee_df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 4 entries, 0 to 3 Data columns (total 4 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Employee ID 4 non-null int64 1 Employee Name 4 non-null object 2 Annual Salary [MAD] 4 non-null int64 3 Years with Company 4 non-null int64 dtypes: int64(3), object(1) memory usage: 260.0+ bytes
Calculate the Total Value of the Portfolio including all stocks of the Portfolio DataSet below
#Create the DataSet
portfolio_df=pd.DataFrame({
'stock ticker symbol':['FB','TSLA','T'],
'price per share [$]': [320,360,400],
'number of stocks':[4,7,10]
})
portfolio_df
stock ticker symbol | price per share [$] | number of stocks | |
---|---|---|---|
0 | FB | 320 | 4 |
1 | TSLA | 360 | 7 |
2 | T | 400 | 10 |
#Calculate the stocks dollar value
stocks_dollar_value=portfolio_df['price per share [$]']*portfolio_df['number of stocks']
stocks_dollar_value
0 1280 1 2520 2 4000 dtype: int64
#Calculate the total value of the portfolio
stocks_dollar_value.sum()
7800
Task 2. Read CSV and HTML Data
Import csv file
emp_info=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 2/00-employee_information.csv')
emp_info
First Name | Last Name | Salary | Years with Company | Postal Code | ||
---|---|---|---|---|---|---|
0 | Mike | Moe | 5000.00 | 3 | N94 3M0 | bird@gmail.com |
1 | Noah | Ryan | 10000.00 | 8 | N8S 14K | nsmall@hotmail.com |
2 | Nina | Keller | 9072.02 | 17 | S1T 4E6 | azikez@gahew.mr |
3 | Chanel | Steve | 11072.02 | 12 | N7T 3E6 | chanel@gmail.com |
4 | Kate | Noor | 5000.00 | 23 | K8N 5H6 | kate@hotmail.com |
5 | Samer | Mo | 100000.00 | 13 | J7H 3HY | samer@gmail.com |
6 | Heba | Ismail | 50000.00 | 7 | K8Y 3M8 | heba.ismail@hotmail.com |
7 | Laila | Aly | 20000.00 | 5 | J8Y 3M0 | Laila.a@hotmail.com |
8 | Joseph | Patton | 2629.13 | 2 | M6U 5U7 | daafeja@boh.jm |
9 | Noah | Moran | 8626.96 | 11 | K2D 4M9 | guutodi@bigwoc.kw |
Import from html
can_houses=pd.read_html('https://www.livingin-canada.com/house-prices-canada.html')
can_houses[0]
City | Average House Price | 12 Month Change | |
---|---|---|---|
0 | Vancouver, BC | $1,036,000 | + 2.63 % |
1 | Toronto, Ont | $870,000 | +10.2 % |
2 | Ottawa, Ont | $479,000 | + 15.4 % |
3 | Calgary, Alb | $410,000 | – 1.5 % |
4 | Montreal, Que | $435,000 | + 9.3 % |
5 | Halifax, NS | $331,000 | + 3.6 % |
6 | Regina, Sask | $254,000 | – 3.9 % |
7 | Fredericton, NB | $198,000 | – 4.3 % |
8 | (adsbygoogle = window.adsbygoogle || []).push(... | (adsbygoogle = window.adsbygoogle || []).push(... | (adsbygoogle = window.adsbygoogle || []).push(... |
Export to csv
can_houses[0].to_csv('00-can_houses.csv', index=True)
Task 3. Write DataFrame into CSV
Create the DataFrame
employee_df=pd.DataFrame({
'Employee ID':[1,2,3,4],
'Employee Name':['Laila Aly','Kate Steve','Nicole Mitch','Francis Morris'],
'Annual Salary [$]':[35000,40000,100000,25000],
'Years with Company':[5,7,10,1]
})
employee_df
Employee ID | Employee Name | Annual Salary [$] | Years with Company | |
---|---|---|---|---|
0 | 1 | Laila Aly | 35000 | 5 |
1 | 2 | Kate Steve | 40000 | 7 |
2 | 3 | Nicole Mitch | 100000 | 10 |
3 | 4 | Francis Morris | 25000 | 1 |
Export to csv
employee_df.to_csv('/Users/mekki/Desktop/employeedf.csv', index=True)
Task 4. Setting and Resetting Index
Read csv
emp_info=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 2/00-employee_information.csv')
emp_info
First Name | Last Name | Salary | Years with Company | Postal Code | ||
---|---|---|---|---|---|---|
0 | Mike | Moe | 5000.00 | 3 | N94 3M0 | bird@gmail.com |
1 | Noah | Ryan | 10000.00 | 8 | N8S 14K | nsmall@hotmail.com |
2 | Nina | Keller | 9072.02 | 17 | S1T 4E6 | azikez@gahew.mr |
3 | Chanel | Steve | 11072.02 | 12 | N7T 3E6 | chanel@gmail.com |
4 | Kate | Noor | 5000.00 | 23 | K8N 5H6 | kate@hotmail.com |
5 | Samer | Mo | 100000.00 | 13 | J7H 3HY | samer@gmail.com |
6 | Heba | Ismail | 50000.00 | 7 | K8Y 3M8 | heba.ismail@hotmail.com |
7 | Laila | Aly | 20000.00 | 5 | J8Y 3M0 | Laila.a@hotmail.com |
8 | Joseph | Patton | 2629.13 | 2 | M6U 5U7 | daafeja@boh.jm |
9 | Noah | Moran | 8626.96 | 11 | K2D 4M9 | guutodi@bigwoc.kw |
Assign a specific column to as Index
emp_info.set_index('First Name',inplace=True)
emp_info
Last Name | Salary | Years with Company | Postal Code | ||
---|---|---|---|---|---|
First Name | |||||
Mike | Moe | 5000.00 | 3 | N94 3M0 | bird@gmail.com |
Noah | Ryan | 10000.00 | 8 | N8S 14K | nsmall@hotmail.com |
Nina | Keller | 9072.02 | 17 | S1T 4E6 | azikez@gahew.mr |
Chanel | Steve | 11072.02 | 12 | N7T 3E6 | chanel@gmail.com |
Kate | Noor | 5000.00 | 23 | K8N 5H6 | kate@hotmail.com |
Samer | Mo | 100000.00 | 13 | J7H 3HY | samer@gmail.com |
Heba | Ismail | 50000.00 | 7 | K8Y 3M8 | heba.ismail@hotmail.com |
Laila | Aly | 20000.00 | 5 | J8Y 3M0 | Laila.a@hotmail.com |
Joseph | Patton | 2629.13 | 2 | M6U 5U7 | daafeja@boh.jm |
Noah | Moran | 8626.96 | 11 | K2D 4M9 | guutodi@bigwoc.kw |
Reset Index
emp_info.reset_index(inplace=True)
emp_info
First Name | Last Name | Salary | Years with Company | Postal Code | ||
---|---|---|---|---|---|---|
0 | Mike | Moe | 5000.00 | 3 | N94 3M0 | bird@gmail.com |
1 | Noah | Ryan | 10000.00 | 8 | N8S 14K | nsmall@hotmail.com |
2 | Nina | Keller | 9072.02 | 17 | S1T 4E6 | azikez@gahew.mr |
3 | Chanel | Steve | 11072.02 | 12 | N7T 3E6 | chanel@gmail.com |
4 | Kate | Noor | 5000.00 | 23 | K8N 5H6 | kate@hotmail.com |
5 | Samer | Mo | 100000.00 | 13 | J7H 3HY | samer@gmail.com |
6 | Heba | Ismail | 50000.00 | 7 | K8Y 3M8 | heba.ismail@hotmail.com |
7 | Laila | Aly | 20000.00 | 5 | J8Y 3M0 | Laila.a@hotmail.com |
8 | Joseph | Patton | 2629.13 | 2 | M6U 5U7 | daafeja@boh.jm |
9 | Noah | Moran | 8626.96 | 11 | K2D 4M9 | guutodi@bigwoc.kw |
Set Index directly when reading the CSV
emp_info=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 2/00-employee_information.csv',index_col='First Name')
emp_info
Last Name | Salary | Years with Company | Postal Code | ||
---|---|---|---|---|---|
First Name | |||||
Mike | Moe | 5000.00 | 3 | N94 3M0 | bird@gmail.com |
Noah | Ryan | 10000.00 | 8 | N8S 14K | nsmall@hotmail.com |
Nina | Keller | 9072.02 | 17 | S1T 4E6 | azikez@gahew.mr |
Chanel | Steve | 11072.02 | 12 | N7T 3E6 | chanel@gmail.com |
Kate | Noor | 5000.00 | 23 | K8N 5H6 | kate@hotmail.com |
Samer | Mo | 100000.00 | 13 | J7H 3HY | samer@gmail.com |
Heba | Ismail | 50000.00 | 7 | K8Y 3M8 | heba.ismail@hotmail.com |
Laila | Aly | 20000.00 | 5 | J8Y 3M0 | Laila.a@hotmail.com |
Joseph | Patton | 2629.13 | 2 | M6U 5U7 | daafeja@boh.jm |
Noah | Moran | 8626.96 | 11 | K2D 4M9 | guutodi@bigwoc.kw |
Task 5. Selecting Columns from a DataFrame
Return column in the DataFrame
email=emp_info['Email']
email
First Name Mike bird@gmail.com Noah nsmall@hotmail.com Nina azikez@gahew.mr Chanel chanel@gmail.com Kate kate@hotmail.com Samer samer@gmail.com Heba heba.ismail@hotmail.com Laila Laila.a@hotmail.com Joseph daafeja@boh.jm Noah guutodi@bigwoc.kw Name: Email, dtype: object
Identify datatype
type(email)
pandas.core.series.Series
Select Multiple columns from DataFrame
emp_info=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 2/00-employee_information.csv')
name_salary=emp_info[['First Name','Salary']]
name_salary
First Name | Salary | |
---|---|---|
0 | Mike | 5000.00 |
1 | Noah | 10000.00 |
2 | Nina | 9072.02 |
3 | Chanel | 11072.02 |
4 | Kate | 5000.00 |
5 | Samer | 100000.00 |
6 | Heba | 50000.00 |
7 | Laila | 20000.00 |
8 | Joseph | 2629.13 |
9 | Noah | 8626.96 |
Select Multiple columns from DataFrame using a List
ns=['First Name','Salary']
name_salary_1=emp_info[ns]
name_salary_1
First Name | Salary | |
---|---|---|
0 | Mike | 5000.00 |
1 | Noah | 10000.00 |
2 | Nina | 9072.02 |
3 | Chanel | 11072.02 |
4 | Kate | 5000.00 |
5 | Samer | 100000.00 |
6 | Heba | 50000.00 |
7 | Laila | 20000.00 |
8 | Joseph | 2629.13 |
9 | Noah | 8626.96 |
Access a given Row in the DataFrame
#First 3 Rows of Data
emp_info[0:3]
First Name | Last Name | Salary | Years with Company | Postal Code | ||
---|---|---|---|---|---|---|
0 | Mike | Moe | 5000.00 | 3 | N94 3M0 | bird@gmail.com |
1 | Noah | Ryan | 10000.00 | 8 | N8S 14K | nsmall@hotmail.com |
2 | Nina | Keller | 9072.02 | 17 | S1T 4E6 | azikez@gahew.mr |
Task 6. Add and Delete DataFrame Columns
Add the 'Age' Column
emp_info['Age']=[25,26,28,30,36,22,48,55,70,69]
emp_info
First Name | Last Name | Salary | Years with Company | Postal Code | Age | ||
---|---|---|---|---|---|---|---|
0 | Mike | Moe | 5000.00 | 3 | N94 3M0 | bird@gmail.com | 25 |
1 | Noah | Ryan | 10000.00 | 8 | N8S 14K | nsmall@hotmail.com | 26 |
2 | Nina | Keller | 9072.02 | 17 | S1T 4E6 | azikez@gahew.mr | 28 |
3 | Chanel | Steve | 11072.02 | 12 | N7T 3E6 | chanel@gmail.com | 30 |
4 | Kate | Noor | 5000.00 | 23 | K8N 5H6 | kate@hotmail.com | 36 |
5 | Samer | Mo | 100000.00 | 13 | J7H 3HY | samer@gmail.com | 22 |
6 | Heba | Ismail | 50000.00 | 7 | K8Y 3M8 | heba.ismail@hotmail.com | 48 |
7 | Laila | Aly | 20000.00 | 5 | J8Y 3M0 | Laila.a@hotmail.com | 55 |
8 | Joseph | Patton | 2629.13 | 2 | M6U 5U7 | daafeja@boh.jm | 70 |
9 | Noah | Moran | 8626.96 | 11 | K2D 4M9 | guutodi@bigwoc.kw | 69 |
Add 'Credit Score' Column as the first column
emp_info.insert(0,column='Credit Score',value=[680,700,750,699,550,600,750,500,520,510])
emp_info
Credit Score | First Name | Last Name | Salary | Years with Company | Postal Code | Age | ||
---|---|---|---|---|---|---|---|---|
0 | 680 | Mike | Moe | 5000.00 | 3 | N94 3M0 | bird@gmail.com | 25 |
1 | 700 | Noah | Ryan | 10000.00 | 8 | N8S 14K | nsmall@hotmail.com | 26 |
2 | 750 | Nina | Keller | 9072.02 | 17 | S1T 4E6 | azikez@gahew.mr | 28 |
3 | 699 | Chanel | Steve | 11072.02 | 12 | N7T 3E6 | chanel@gmail.com | 30 |
4 | 550 | Kate | Noor | 5000.00 | 23 | K8N 5H6 | kate@hotmail.com | 36 |
5 | 600 | Samer | Mo | 100000.00 | 13 | J7H 3HY | samer@gmail.com | 22 |
6 | 750 | Heba | Ismail | 50000.00 | 7 | K8Y 3M8 | heba.ismail@hotmail.com | 48 |
7 | 500 | Laila | Aly | 20000.00 | 5 | J8Y 3M0 | Laila.a@hotmail.com | 55 |
8 | 520 | Joseph | Patton | 2629.13 | 2 | M6U 5U7 | daafeja@boh.jm | 70 |
9 | 510 | Noah | Moran | 8626.96 | 11 | K2D 4M9 | guutodi@bigwoc.kw | 69 |
Delete 'Email' column using del
del emp_info['Email']
emp_info
Credit Score | First Name | Last Name | Salary | Years with Company | Postal Code | Age | |
---|---|---|---|---|---|---|---|
0 | 680 | Mike | Moe | 5000.00 | 3 | N94 3M0 | 25 |
1 | 700 | Noah | Ryan | 10000.00 | 8 | N8S 14K | 26 |
2 | 750 | Nina | Keller | 9072.02 | 17 | S1T 4E6 | 28 |
3 | 699 | Chanel | Steve | 11072.02 | 12 | N7T 3E6 | 30 |
4 | 550 | Kate | Noor | 5000.00 | 23 | K8N 5H6 | 36 |
5 | 600 | Samer | Mo | 100000.00 | 13 | J7H 3HY | 22 |
6 | 750 | Heba | Ismail | 50000.00 | 7 | K8Y 3M8 | 48 |
7 | 500 | Laila | Aly | 20000.00 | 5 | J8Y 3M0 | 55 |
8 | 520 | Joseph | Patton | 2629.13 | 2 | M6U 5U7 | 70 |
9 | 510 | Noah | Moran | 8626.96 | 11 | K2D 4M9 | 69 |
Delete 'Last Name' and 'Salary' columns using drop
emp_info.drop(labels=['Last Name','Salary'], axis=1, inplace = True)
emp_info
Credit Score | First Name | Years with Company | Postal Code | Age | |
---|---|---|---|---|---|
0 | 680 | Mike | 3 | N94 3M0 | 25 |
1 | 700 | Noah | 8 | N8S 14K | 26 |
2 | 750 | Nina | 17 | S1T 4E6 | 28 |
3 | 699 | Chanel | 12 | N7T 3E6 | 30 |
4 | 550 | Kate | 23 | K8N 5H6 | 36 |
5 | 600 | Samer | 13 | J7H 3HY | 22 |
6 | 750 | Heba | 7 | K8Y 3M8 | 48 |
7 | 500 | Laila | 5 | J8Y 3M0 | 55 |
8 | 520 | Joseph | 2 | M6U 5U7 | 70 |
9 | 510 | Noah | 11 | K2D 4M9 | 69 |
Remove a column from a DataFrame and Store it somewhere else using pop
Years_with_Company=emp_info.pop('Years with Company')
Years_with_Company
0 3 1 8 2 17 3 12 4 23 5 13 6 7 7 5 8 2 9 11 Name: Years with Company, dtype: int64
Confirm changes
emp_info
Credit Score | First Name | Postal Code | Age | |
---|---|---|---|---|
0 | 680 | Mike | N94 3M0 | 25 |
1 | 700 | Noah | N8S 14K | 26 |
2 | 750 | Nina | S1T 4E6 | 28 |
3 | 699 | Chanel | N7T 3E6 | 30 |
4 | 550 | Kate | K8N 5H6 | 36 |
5 | 600 | Samer | J7H 3HY | 22 |
6 | 750 | Heba | K8Y 3M8 | 48 |
7 | 500 | Laila | J8Y 3M0 | 55 |
8 | 520 | Joseph | M6U 5U7 | 70 |
9 | 510 | Noah | K2D 4M9 | 69 |
Task 7. Label-Based Elements Selection from a DataFrame(.loc)
emp_info=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 2/00-employee_information.csv',index_col='Last Name')
emp_info
First Name | Salary | Years with Company | Postal Code | ||
---|---|---|---|---|---|
Last Name | |||||
Moe | Mike | 5000.00 | 3 | N94 3M0 | bird@gmail.com |
Ryan | Noah | 10000.00 | 8 | N8S 14K | nsmall@hotmail.com |
Keller | Nina | 9072.02 | 17 | S1T 4E6 | azikez@gahew.mr |
Steve | Chanel | 11072.02 | 12 | N7T 3E6 | chanel@gmail.com |
Noor | Kate | 5000.00 | 23 | K8N 5H6 | kate@hotmail.com |
Mo | Samer | 100000.00 | 13 | J7H 3HY | samer@gmail.com |
Ismail | Heba | 50000.00 | 7 | K8Y 3M8 | heba.ismail@hotmail.com |
Aly | Laila | 20000.00 | 5 | J8Y 3M0 | Laila.a@hotmail.com |
Patton | Joseph | 2629.13 | 2 | M6U 5U7 | daafeja@boh.jm |
Moran | Noah | 8626.96 | 11 | K2D 4M9 | guutodi@bigwoc.kw |
Sort the DataFrame in alphabetical order
emp_info.sort_index(inplace=True)
emp_info
First Name | Salary | Years with Company | Postal Code | ||
---|---|---|---|---|---|
Last Name | |||||
Aly | Laila | 20000.00 | 5 | J8Y 3M0 | Laila.a@hotmail.com |
Ismail | Heba | 50000.00 | 7 | K8Y 3M8 | heba.ismail@hotmail.com |
Keller | Nina | 9072.02 | 17 | S1T 4E6 | azikez@gahew.mr |
Mo | Samer | 100000.00 | 13 | J7H 3HY | samer@gmail.com |
Moe | Mike | 5000.00 | 3 | N94 3M0 | bird@gmail.com |
Moran | Noah | 8626.96 | 11 | K2D 4M9 | guutodi@bigwoc.kw |
Noor | Kate | 5000.00 | 23 | K8N 5H6 | kate@hotmail.com |
Patton | Joseph | 2629.13 | 2 | M6U 5U7 | daafeja@boh.jm |
Ryan | Noah | 10000.00 | 8 | N8S 14K | nsmall@hotmail.com |
Steve | Chanel | 11072.02 | 12 | N7T 3E6 | chanel@gmail.com |
Retrieve the rows where 'Last Name' is Aly
emp_info.loc['Aly']
First Name Laila Salary 20000.0 Years with Company 5 Postal Code J8Y 3M0 Email Laila.a@hotmail.com Name: Aly, dtype: object
Retrieve the rows where 'Last Name' are Aly, Steve, Mo
emp_info.loc[['Aly','Steve','Mo']]
First Name | Salary | Years with Company | Postal Code | ||
---|---|---|---|---|---|
Last Name | |||||
Aly | Laila | 20000.00 | 5 | J8Y 3M0 | Laila.a@hotmail.com |
Steve | Chanel | 11072.02 | 12 | N7T 3E6 | chanel@gmail.com |
Mo | Samer | 100000.00 | 13 | J7H 3HY | samer@gmail.com |
Retrieve the rows from 'Last Name' Aly to Mo
emp_info.loc['Aly':'Mo']
First Name | Salary | Years with Company | Postal Code | ||
---|---|---|---|---|---|
Last Name | |||||
Aly | Laila | 20000.00 | 5 | J8Y 3M0 | Laila.a@hotmail.com |
Ismail | Heba | 50000.00 | 7 | K8Y 3M8 | heba.ismail@hotmail.com |
Keller | Nina | 9072.02 | 17 | S1T 4E6 | azikez@gahew.mr |
Mo | Samer | 100000.00 | 13 | J7H 3HY | samer@gmail.com |
Retrieve all the rows up until Noor
emp_info.loc[:'Noor']
First Name | Salary | Years with Company | Postal Code | ||
---|---|---|---|---|---|
Last Name | |||||
Aly | Laila | 20000.00 | 5 | J8Y 3M0 | Laila.a@hotmail.com |
Ismail | Heba | 50000.00 | 7 | K8Y 3M8 | heba.ismail@hotmail.com |
Keller | Nina | 9072.02 | 17 | S1T 4E6 | azikez@gahew.mr |
Mo | Samer | 100000.00 | 13 | J7H 3HY | samer@gmail.com |
Moe | Mike | 5000.00 | 3 | N94 3M0 | bird@gmail.com |
Moran | Noah | 8626.96 | 11 | K2D 4M9 | guutodi@bigwoc.kw |
Noor | Kate | 5000.00 | 23 | K8N 5H6 | kate@hotmail.com |
Randomly select a sample from the DataFrame
emp_info.sample(n=5,axis=0)
First Name | Salary | Years with Company | Postal Code | ||
---|---|---|---|---|---|
Last Name | |||||
Steve | Chanel | 11072.02 | 12 | N7T 3E6 | chanel@gmail.com |
Aly | Laila | 20000.00 | 5 | J8Y 3M0 | Laila.a@hotmail.com |
Keller | Nina | 9072.02 | 17 | S1T 4E6 | azikez@gahew.mr |
Noor | Kate | 5000.00 | 23 | K8N 5H6 | kate@hotmail.com |
Moran | Noah | 8626.96 | 11 | K2D 4M9 | guutodi@bigwoc.kw |
Randomly return 30% of the rows in the DataFrame
emp_info.sample(frac=0.3,axis=0)
First Name | Salary | Years with Company | Postal Code | ||
---|---|---|---|---|---|
Last Name | |||||
Keller | Nina | 9072.02 | 17 | S1T 4E6 | azikez@gahew.mr |
Noor | Kate | 5000.00 | 23 | K8N 5H6 | kate@hotmail.com |
Mo | Samer | 100000.00 | 13 | J7H 3HY | samer@gmail.com |
Task 8. Integer Index-Based Elements Selection from a DataFrame (.iloc)
emp_info=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 2/00-employee_information.csv')
emp_info.iloc[9]
First Name Noah Last Name Moran Salary 8626.96 Years with Company 11 Postal Code K2D 4M9 Email guutodi@bigwoc.kw Name: 9, dtype: object
Access multiple rows, from index 2 to 4
emp_info.iloc[2:5]
First Name | Last Name | Salary | Years with Company | Postal Code | ||
---|---|---|---|---|---|---|
2 | Nina | Keller | 9072.02 | 17 | S1T 4E6 | azikez@gahew.mr |
3 | Chanel | Steve | 11072.02 | 12 | N7T 3E6 | chanel@gmail.com |
4 | Kate | Noor | 5000.00 | 23 | K8N 5H6 | kate@hotmail.com |
Access all rows up until index 4
emp_info.iloc[:4]
First Name | Last Name | Salary | Years with Company | Postal Code | ||
---|---|---|---|---|---|---|
0 | Mike | Moe | 5000.00 | 3 | N94 3M0 | bird@gmail.com |
1 | Noah | Ryan | 10000.00 | 8 | N8S 14K | nsmall@hotmail.com |
2 | Nina | Keller | 9072.02 | 17 | S1T 4E6 | azikez@gahew.mr |
3 | Chanel | Steve | 11072.02 | 12 | N7T 3E6 | chanel@gmail.com |
Access rows with index 2, 4, 9
emp_info.iloc[[2,4,9]]
First Name | Last Name | Salary | Years with Company | Postal Code | ||
---|---|---|---|---|---|---|
2 | Nina | Keller | 9072.02 | 17 | S1T 4E6 | azikez@gahew.mr |
4 | Kate | Noor | 5000.00 | 23 | K8N 5H6 | kate@hotmail.com |
9 | Noah | Moran | 8626.96 | 11 | K2D 4M9 | guutodi@bigwoc.kw |
Slicing a piece of the DataFrame by selecting which columns to display
emp_info.iloc[4,0:3]
First Name Kate Last Name Noor Salary 5000.0 Name: 4, dtype: object
Select the last 2 rows of the DataFrame
emp_info.iloc[8:]
First Name | Last Name | Salary | Years with Company | Postal Code | ||
---|---|---|---|---|---|---|
8 | Joseph | Patton | 2629.13 | 2 | M6U 5U7 | daafeja@boh.jm |
9 | Noah | Moran | 8626.96 | 11 | K2D 4M9 | guutodi@bigwoc.kw |
#OR
emp_info.iloc[-2:]
First Name | Last Name | Salary | Years with Company | Postal Code | ||
---|---|---|---|---|---|---|
8 | Joseph | Patton | 2629.13 | 2 | M6U 5U7 | daafeja@boh.jm |
9 | Noah | Moran | 8626.96 | 11 | K2D 4M9 | guutodi@bigwoc.kw |
Task 9. Broadcasting Operations and Setting New DataFrame Values
emp_info=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 2/00-employee_information.csv')
emp_info
First Name | Last Name | Salary | Years with Company | Postal Code | ||
---|---|---|---|---|---|---|
0 | Mike | Moe | 5000.00 | 3 | N94 3M0 | bird@gmail.com |
1 | Noah | Ryan | 10000.00 | 8 | N8S 14K | nsmall@hotmail.com |
2 | Nina | Keller | 9072.02 | 17 | S1T 4E6 | azikez@gahew.mr |
3 | Chanel | Steve | 11072.02 | 12 | N7T 3E6 | chanel@gmail.com |
4 | Kate | Noor | 5000.00 | 23 | K8N 5H6 | kate@hotmail.com |
5 | Samer | Mo | 100000.00 | 13 | J7H 3HY | samer@gmail.com |
6 | Heba | Ismail | 50000.00 | 7 | K8Y 3M8 | heba.ismail@hotmail.com |
7 | Laila | Aly | 20000.00 | 5 | J8Y 3M0 | Laila.a@hotmail.com |
8 | Joseph | Patton | 2629.13 | 2 | M6U 5U7 | daafeja@boh.jm |
9 | Noah | Moran | 8626.96 | 11 | K2D 4M9 | guutodi@bigwoc.kw |
Increase all Salaries by 1000$
emp_info['Salary']=emp_info['Salary']+1000
emp_info
First Name | Last Name | Salary | Years with Company | Postal Code | ||
---|---|---|---|---|---|---|
0 | Mike | Moe | 6000.00 | 3 | N94 3M0 | bird@gmail.com |
1 | Noah | Ryan | 11000.00 | 8 | N8S 14K | nsmall@hotmail.com |
2 | Nina | Keller | 10072.02 | 17 | S1T 4E6 | azikez@gahew.mr |
3 | Chanel | Steve | 12072.02 | 12 | N7T 3E6 | chanel@gmail.com |
4 | Kate | Noor | 6000.00 | 23 | K8N 5H6 | kate@hotmail.com |
5 | Samer | Mo | 101000.00 | 13 | J7H 3HY | samer@gmail.com |
6 | Heba | Ismail | 51000.00 | 7 | K8Y 3M8 | heba.ismail@hotmail.com |
7 | Laila | Aly | 21000.00 | 5 | J8Y 3M0 | Laila.a@hotmail.com |
8 | Joseph | Patton | 3629.13 | 2 | M6U 5U7 | daafeja@boh.jm |
9 | Noah | Moran | 9626.96 | 11 | K2D 4M9 | guutodi@bigwoc.kw |
Convert the Salaries from $ to MAD
emp_info['Salary [MAD]'] = emp_info['Salary']*10.06
#to show numbers without scientific notation ->
pd.set_option('display.float_format', lambda x: '{:.2f}'.format(x))
emp_info
First Name | Last Name | Salary | Years with Company | Postal Code | Salary [MAD] | ||
---|---|---|---|---|---|---|---|
0 | Mike | Moe | 6000.00 | 3 | N94 3M0 | bird@gmail.com | 60360.00 |
1 | Noah | Ryan | 11000.00 | 8 | N8S 14K | nsmall@hotmail.com | 110660.00 |
2 | Nina | Keller | 10072.02 | 17 | S1T 4E6 | azikez@gahew.mr | 101324.52 |
3 | Chanel | Steve | 12072.02 | 12 | N7T 3E6 | chanel@gmail.com | 121444.52 |
4 | Kate | Noor | 6000.00 | 23 | K8N 5H6 | kate@hotmail.com | 60360.00 |
5 | Samer | Mo | 101000.00 | 13 | J7H 3HY | samer@gmail.com | 1016060.00 |
6 | Heba | Ismail | 51000.00 | 7 | K8Y 3M8 | heba.ismail@hotmail.com | 513060.00 |
7 | Laila | Aly | 21000.00 | 5 | J8Y 3M0 | Laila.a@hotmail.com | 211260.00 |
8 | Joseph | Patton | 3629.13 | 2 | M6U 5U7 | daafeja@boh.jm | 36509.05 |
9 | Noah | Moran | 9626.96 | 11 | K2D 4M9 | guutodi@bigwoc.kw | 96847.22 |
Update the email of Kate Noor
emp_info.iloc[4, 5] = 'kate.noor@gmail.com'
emp_info.loc[4,'Email']
'kate.noor@gmail.com'
Increase the salaries by 12% and return the Sum of all employees salaries
emp_info['Salary']=emp_info['Salary']*1.12
emp_info['Salary'].sum().round(2)
259168.15
Task 10. Sorting and Ordering
emp_info=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 2/00-employee_information.csv')
emp_info
First Name | Last Name | Salary | Years with Company | Postal Code | ||
---|---|---|---|---|---|---|
0 | Mike | Moe | 5000.00 | 3 | N94 3M0 | bird@gmail.com |
1 | Noah | Ryan | 10000.00 | 8 | N8S 14K | nsmall@hotmail.com |
2 | Nina | Keller | 9072.02 | 17 | S1T 4E6 | azikez@gahew.mr |
3 | Chanel | Steve | 11072.02 | 12 | N7T 3E6 | chanel@gmail.com |
4 | Kate | Noor | 5000.00 | 23 | K8N 5H6 | kate@hotmail.com |
5 | Samer | Mo | 100000.00 | 13 | J7H 3HY | samer@gmail.com |
6 | Heba | Ismail | 50000.00 | 7 | K8Y 3M8 | heba.ismail@hotmail.com |
7 | Laila | Aly | 20000.00 | 5 | J8Y 3M0 | Laila.a@hotmail.com |
8 | Joseph | Patton | 2629.13 | 2 | M6U 5U7 | daafeja@boh.jm |
9 | Noah | Moran | 8626.96 | 11 | K2D 4M9 | guutodi@bigwoc.kw |
Sort the DataFrame by seniority
emp_info.sort_values(by='Years with Company', ascending=False, inplace=True)
emp_info
First Name | Last Name | Salary | Years with Company | Postal Code | ||
---|---|---|---|---|---|---|
4 | Kate | Noor | 5000.00 | 23 | K8N 5H6 | kate@hotmail.com |
2 | Nina | Keller | 9072.02 | 17 | S1T 4E6 | azikez@gahew.mr |
5 | Samer | Mo | 100000.00 | 13 | J7H 3HY | samer@gmail.com |
3 | Chanel | Steve | 11072.02 | 12 | N7T 3E6 | chanel@gmail.com |
9 | Noah | Moran | 8626.96 | 11 | K2D 4M9 | guutodi@bigwoc.kw |
1 | Noah | Ryan | 10000.00 | 8 | N8S 14K | nsmall@hotmail.com |
6 | Heba | Ismail | 50000.00 | 7 | K8Y 3M8 | heba.ismail@hotmail.com |
7 | Laila | Aly | 20000.00 | 5 | J8Y 3M0 | Laila.a@hotmail.com |
0 | Mike | Moe | 5000.00 | 3 | N94 3M0 | bird@gmail.com |
8 | Joseph | Patton | 2629.13 | 2 | M6U 5U7 | daafeja@boh.jm |
Sort the DataFrame using the index
emp_info.sort_index(inplace=True)
emp_info
First Name | Last Name | Salary | Years with Company | Postal Code | ||
---|---|---|---|---|---|---|
0 | Mike | Moe | 5000.00 | 3 | N94 3M0 | bird@gmail.com |
1 | Noah | Ryan | 10000.00 | 8 | N8S 14K | nsmall@hotmail.com |
2 | Nina | Keller | 9072.02 | 17 | S1T 4E6 | azikez@gahew.mr |
3 | Chanel | Steve | 11072.02 | 12 | N7T 3E6 | chanel@gmail.com |
4 | Kate | Noor | 5000.00 | 23 | K8N 5H6 | kate@hotmail.com |
5 | Samer | Mo | 100000.00 | 13 | J7H 3HY | samer@gmail.com |
6 | Heba | Ismail | 50000.00 | 7 | K8Y 3M8 | heba.ismail@hotmail.com |
7 | Laila | Aly | 20000.00 | 5 | J8Y 3M0 | Laila.a@hotmail.com |
8 | Joseph | Patton | 2629.13 | 2 | M6U 5U7 | daafeja@boh.jm |
9 | Noah | Moran | 8626.96 | 11 | K2D 4M9 | guutodi@bigwoc.kw |