Mastering Data Analysis with Pandas #2


Task 1. Define a Pandas Dataframe

In [1]:
import warnings
warnings.filterwarnings("ignore")

import pandas as pd

Create the DataFrame

In [2]:
employee_df=pd.DataFrame({'Employee ID':[1,2,3,4,],
                         'Employee Name':['Fatima Zahra El Amrani','Ahmed Benali','Amina Oulad Chaib','Youssef Bouhaddou'],
                         'Annual Salary [MAD]':[80000,105000,143000,112000],
                         'Years with Company':[5,7,10,1]})
employee_df
Out[2]:
Employee ID Employee Name Annual Salary [MAD] Years with Company
0 1 Fatima Zahra El Amrani 80000 5
1 2 Ahmed Benali 105000 7
2 3 Amina Oulad Chaib 143000 10
3 4 Youssef Bouhaddou 112000 1

Confirm it's a DataFrame

In [3]:
type(employee_df)
Out[3]:
pandas.core.frame.DataFrame

Obtain the Shape of the DataFrame

In [4]:
employee_df.shape
Out[4]:
(4, 4)

Obtain the Info of the DataFrame

In [5]:
employee_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Employee ID          4 non-null      int64 
 1   Employee Name        4 non-null      object
 2   Annual Salary [MAD]  4 non-null      int64 
 3   Years with Company   4 non-null      int64 
dtypes: int64(3), object(1)
memory usage: 260.0+ bytes

Calculate the Total Value of the Portfolio including all stocks of the Portfolio DataSet below

In [6]:
#Create the DataSet
portfolio_df=pd.DataFrame({
    'stock ticker symbol':['FB','TSLA','T'],
    'price per share [$]': [320,360,400],
    'number of stocks':[4,7,10]
})
portfolio_df
Out[6]:
stock ticker symbol price per share [$] number of stocks
0 FB 320 4
1 TSLA 360 7
2 T 400 10
In [7]:
#Calculate the stocks dollar value
stocks_dollar_value=portfolio_df['price per share [$]']*portfolio_df['number of stocks']
stocks_dollar_value
Out[7]:
0    1280
1    2520
2    4000
dtype: int64
In [8]:
#Calculate the total value of the portfolio
stocks_dollar_value.sum()
Out[8]:
7800

Task 2. Read CSV and HTML Data

Import csv file

In [9]:
emp_info=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 2/00-employee_information.csv')
emp_info
Out[9]:
First Name Last Name Salary Years with Company Postal Code Email
0 Mike Moe 5000.00 3 N94 3M0 bird@gmail.com
1 Noah Ryan 10000.00 8 N8S 14K nsmall@hotmail.com
2 Nina Keller 9072.02 17 S1T 4E6 azikez@gahew.mr
3 Chanel Steve 11072.02 12 N7T 3E6 chanel@gmail.com
4 Kate Noor 5000.00 23 K8N 5H6 kate@hotmail.com
5 Samer Mo 100000.00 13 J7H 3HY samer@gmail.com
6 Heba Ismail 50000.00 7 K8Y 3M8 heba.ismail@hotmail.com
7 Laila Aly 20000.00 5 J8Y 3M0 Laila.a@hotmail.com
8 Joseph Patton 2629.13 2 M6U 5U7 daafeja@boh.jm
9 Noah Moran 8626.96 11 K2D 4M9 guutodi@bigwoc.kw

Import from html

In [10]:
can_houses=pd.read_html('https://www.livingin-canada.com/house-prices-canada.html')
can_houses[0]
Out[10]:
City Average House Price 12 Month Change
0 Vancouver, BC $1,036,000 + 2.63 %
1 Toronto, Ont $870,000 +10.2 %
2 Ottawa, Ont $479,000 + 15.4 %
3 Calgary, Alb $410,000 – 1.5 %
4 Montreal, Que $435,000 + 9.3 %
5 Halifax, NS $331,000 + 3.6 %
6 Regina, Sask $254,000 – 3.9 %
7 Fredericton, NB $198,000 – 4.3 %
8 (adsbygoogle = window.adsbygoogle || []).push(... (adsbygoogle = window.adsbygoogle || []).push(... (adsbygoogle = window.adsbygoogle || []).push(...

Export to csv

In [11]:
can_houses[0].to_csv('00-can_houses.csv', index=True)

Task 3. Write DataFrame into CSV

Create the DataFrame

In [12]:
employee_df=pd.DataFrame({
    'Employee ID':[1,2,3,4],
    'Employee Name':['Laila Aly','Kate Steve','Nicole Mitch','Francis Morris'],
    'Annual Salary [$]':[35000,40000,100000,25000],
    'Years with Company':[5,7,10,1]
})
employee_df
Out[12]:
Employee ID Employee Name Annual Salary [$] Years with Company
0 1 Laila Aly 35000 5
1 2 Kate Steve 40000 7
2 3 Nicole Mitch 100000 10
3 4 Francis Morris 25000 1

Export to csv

In [13]:
employee_df.to_csv('/Users/mekki/Desktop/employeedf.csv', index=True)

Task 4. Setting and Resetting Index

Read csv

In [14]:
emp_info=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 2/00-employee_information.csv')
emp_info
Out[14]:
First Name Last Name Salary Years with Company Postal Code Email
0 Mike Moe 5000.00 3 N94 3M0 bird@gmail.com
1 Noah Ryan 10000.00 8 N8S 14K nsmall@hotmail.com
2 Nina Keller 9072.02 17 S1T 4E6 azikez@gahew.mr
3 Chanel Steve 11072.02 12 N7T 3E6 chanel@gmail.com
4 Kate Noor 5000.00 23 K8N 5H6 kate@hotmail.com
5 Samer Mo 100000.00 13 J7H 3HY samer@gmail.com
6 Heba Ismail 50000.00 7 K8Y 3M8 heba.ismail@hotmail.com
7 Laila Aly 20000.00 5 J8Y 3M0 Laila.a@hotmail.com
8 Joseph Patton 2629.13 2 M6U 5U7 daafeja@boh.jm
9 Noah Moran 8626.96 11 K2D 4M9 guutodi@bigwoc.kw

Assign a specific column to as Index

In [15]:
emp_info.set_index('First Name',inplace=True)
emp_info
Out[15]:
Last Name Salary Years with Company Postal Code Email
First Name
Mike Moe 5000.00 3 N94 3M0 bird@gmail.com
Noah Ryan 10000.00 8 N8S 14K nsmall@hotmail.com
Nina Keller 9072.02 17 S1T 4E6 azikez@gahew.mr
Chanel Steve 11072.02 12 N7T 3E6 chanel@gmail.com
Kate Noor 5000.00 23 K8N 5H6 kate@hotmail.com
Samer Mo 100000.00 13 J7H 3HY samer@gmail.com
Heba Ismail 50000.00 7 K8Y 3M8 heba.ismail@hotmail.com
Laila Aly 20000.00 5 J8Y 3M0 Laila.a@hotmail.com
Joseph Patton 2629.13 2 M6U 5U7 daafeja@boh.jm
Noah Moran 8626.96 11 K2D 4M9 guutodi@bigwoc.kw

Reset Index

In [16]:
emp_info.reset_index(inplace=True)
emp_info
Out[16]:
First Name Last Name Salary Years with Company Postal Code Email
0 Mike Moe 5000.00 3 N94 3M0 bird@gmail.com
1 Noah Ryan 10000.00 8 N8S 14K nsmall@hotmail.com
2 Nina Keller 9072.02 17 S1T 4E6 azikez@gahew.mr
3 Chanel Steve 11072.02 12 N7T 3E6 chanel@gmail.com
4 Kate Noor 5000.00 23 K8N 5H6 kate@hotmail.com
5 Samer Mo 100000.00 13 J7H 3HY samer@gmail.com
6 Heba Ismail 50000.00 7 K8Y 3M8 heba.ismail@hotmail.com
7 Laila Aly 20000.00 5 J8Y 3M0 Laila.a@hotmail.com
8 Joseph Patton 2629.13 2 M6U 5U7 daafeja@boh.jm
9 Noah Moran 8626.96 11 K2D 4M9 guutodi@bigwoc.kw

Set Index directly when reading the CSV

In [17]:
emp_info=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 2/00-employee_information.csv',index_col='First Name')
emp_info
Out[17]:
Last Name Salary Years with Company Postal Code Email
First Name
Mike Moe 5000.00 3 N94 3M0 bird@gmail.com
Noah Ryan 10000.00 8 N8S 14K nsmall@hotmail.com
Nina Keller 9072.02 17 S1T 4E6 azikez@gahew.mr
Chanel Steve 11072.02 12 N7T 3E6 chanel@gmail.com
Kate Noor 5000.00 23 K8N 5H6 kate@hotmail.com
Samer Mo 100000.00 13 J7H 3HY samer@gmail.com
Heba Ismail 50000.00 7 K8Y 3M8 heba.ismail@hotmail.com
Laila Aly 20000.00 5 J8Y 3M0 Laila.a@hotmail.com
Joseph Patton 2629.13 2 M6U 5U7 daafeja@boh.jm
Noah Moran 8626.96 11 K2D 4M9 guutodi@bigwoc.kw

Task 5. Selecting Columns from a DataFrame

Return column in the DataFrame

In [18]:
email=emp_info['Email']
email
Out[18]:
First Name
Mike                bird@gmail.com
Noah            nsmall@hotmail.com
Nina               azikez@gahew.mr
Chanel            chanel@gmail.com
Kate              kate@hotmail.com
Samer              samer@gmail.com
Heba       heba.ismail@hotmail.com
Laila          Laila.a@hotmail.com
Joseph              daafeja@boh.jm
Noah             guutodi@bigwoc.kw
Name: Email, dtype: object

Identify datatype

In [19]:
type(email)
Out[19]:
pandas.core.series.Series

Select Multiple columns from DataFrame

In [20]:
emp_info=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 2/00-employee_information.csv')
name_salary=emp_info[['First Name','Salary']]
name_salary
Out[20]:
First Name Salary
0 Mike 5000.00
1 Noah 10000.00
2 Nina 9072.02
3 Chanel 11072.02
4 Kate 5000.00
5 Samer 100000.00
6 Heba 50000.00
7 Laila 20000.00
8 Joseph 2629.13
9 Noah 8626.96

Select Multiple columns from DataFrame using a List

In [21]:
ns=['First Name','Salary']
name_salary_1=emp_info[ns]
name_salary_1
Out[21]:
First Name Salary
0 Mike 5000.00
1 Noah 10000.00
2 Nina 9072.02
3 Chanel 11072.02
4 Kate 5000.00
5 Samer 100000.00
6 Heba 50000.00
7 Laila 20000.00
8 Joseph 2629.13
9 Noah 8626.96

Access a given Row in the DataFrame

In [22]:
#First 3 Rows of Data
emp_info[0:3]
Out[22]:
First Name Last Name Salary Years with Company Postal Code Email
0 Mike Moe 5000.00 3 N94 3M0 bird@gmail.com
1 Noah Ryan 10000.00 8 N8S 14K nsmall@hotmail.com
2 Nina Keller 9072.02 17 S1T 4E6 azikez@gahew.mr

Task 6. Add and Delete DataFrame Columns

Add the 'Age' Column

In [23]:
emp_info['Age']=[25,26,28,30,36,22,48,55,70,69]
emp_info
Out[23]:
First Name Last Name Salary Years with Company Postal Code Email Age
0 Mike Moe 5000.00 3 N94 3M0 bird@gmail.com 25
1 Noah Ryan 10000.00 8 N8S 14K nsmall@hotmail.com 26
2 Nina Keller 9072.02 17 S1T 4E6 azikez@gahew.mr 28
3 Chanel Steve 11072.02 12 N7T 3E6 chanel@gmail.com 30
4 Kate Noor 5000.00 23 K8N 5H6 kate@hotmail.com 36
5 Samer Mo 100000.00 13 J7H 3HY samer@gmail.com 22
6 Heba Ismail 50000.00 7 K8Y 3M8 heba.ismail@hotmail.com 48
7 Laila Aly 20000.00 5 J8Y 3M0 Laila.a@hotmail.com 55
8 Joseph Patton 2629.13 2 M6U 5U7 daafeja@boh.jm 70
9 Noah Moran 8626.96 11 K2D 4M9 guutodi@bigwoc.kw 69

Add 'Credit Score' Column as the first column

In [24]:
emp_info.insert(0,column='Credit Score',value=[680,700,750,699,550,600,750,500,520,510])
emp_info
Out[24]:
Credit Score First Name Last Name Salary Years with Company Postal Code Email Age
0 680 Mike Moe 5000.00 3 N94 3M0 bird@gmail.com 25
1 700 Noah Ryan 10000.00 8 N8S 14K nsmall@hotmail.com 26
2 750 Nina Keller 9072.02 17 S1T 4E6 azikez@gahew.mr 28
3 699 Chanel Steve 11072.02 12 N7T 3E6 chanel@gmail.com 30
4 550 Kate Noor 5000.00 23 K8N 5H6 kate@hotmail.com 36
5 600 Samer Mo 100000.00 13 J7H 3HY samer@gmail.com 22
6 750 Heba Ismail 50000.00 7 K8Y 3M8 heba.ismail@hotmail.com 48
7 500 Laila Aly 20000.00 5 J8Y 3M0 Laila.a@hotmail.com 55
8 520 Joseph Patton 2629.13 2 M6U 5U7 daafeja@boh.jm 70
9 510 Noah Moran 8626.96 11 K2D 4M9 guutodi@bigwoc.kw 69

Delete 'Email' column using del

In [25]:
del emp_info['Email']
emp_info
Out[25]:
Credit Score First Name Last Name Salary Years with Company Postal Code Age
0 680 Mike Moe 5000.00 3 N94 3M0 25
1 700 Noah Ryan 10000.00 8 N8S 14K 26
2 750 Nina Keller 9072.02 17 S1T 4E6 28
3 699 Chanel Steve 11072.02 12 N7T 3E6 30
4 550 Kate Noor 5000.00 23 K8N 5H6 36
5 600 Samer Mo 100000.00 13 J7H 3HY 22
6 750 Heba Ismail 50000.00 7 K8Y 3M8 48
7 500 Laila Aly 20000.00 5 J8Y 3M0 55
8 520 Joseph Patton 2629.13 2 M6U 5U7 70
9 510 Noah Moran 8626.96 11 K2D 4M9 69

Delete 'Last Name' and 'Salary' columns using drop

In [26]:
emp_info.drop(labels=['Last Name','Salary'], axis=1, inplace = True)
emp_info
Out[26]:
Credit Score First Name Years with Company Postal Code Age
0 680 Mike 3 N94 3M0 25
1 700 Noah 8 N8S 14K 26
2 750 Nina 17 S1T 4E6 28
3 699 Chanel 12 N7T 3E6 30
4 550 Kate 23 K8N 5H6 36
5 600 Samer 13 J7H 3HY 22
6 750 Heba 7 K8Y 3M8 48
7 500 Laila 5 J8Y 3M0 55
8 520 Joseph 2 M6U 5U7 70
9 510 Noah 11 K2D 4M9 69

Remove a column from a DataFrame and Store it somewhere else using pop

In [27]:
Years_with_Company=emp_info.pop('Years with Company')
Years_with_Company
Out[27]:
0     3
1     8
2    17
3    12
4    23
5    13
6     7
7     5
8     2
9    11
Name: Years with Company, dtype: int64

Confirm changes

In [28]:
emp_info
Out[28]:
Credit Score First Name Postal Code Age
0 680 Mike N94 3M0 25
1 700 Noah N8S 14K 26
2 750 Nina S1T 4E6 28
3 699 Chanel N7T 3E6 30
4 550 Kate K8N 5H6 36
5 600 Samer J7H 3HY 22
6 750 Heba K8Y 3M8 48
7 500 Laila J8Y 3M0 55
8 520 Joseph M6U 5U7 70
9 510 Noah K2D 4M9 69

Task 7. Label-Based Elements Selection from a DataFrame(.loc)

In [29]:
emp_info=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 2/00-employee_information.csv',index_col='Last Name')
emp_info
Out[29]:
First Name Salary Years with Company Postal Code Email
Last Name
Moe Mike 5000.00 3 N94 3M0 bird@gmail.com
Ryan Noah 10000.00 8 N8S 14K nsmall@hotmail.com
Keller Nina 9072.02 17 S1T 4E6 azikez@gahew.mr
Steve Chanel 11072.02 12 N7T 3E6 chanel@gmail.com
Noor Kate 5000.00 23 K8N 5H6 kate@hotmail.com
Mo Samer 100000.00 13 J7H 3HY samer@gmail.com
Ismail Heba 50000.00 7 K8Y 3M8 heba.ismail@hotmail.com
Aly Laila 20000.00 5 J8Y 3M0 Laila.a@hotmail.com
Patton Joseph 2629.13 2 M6U 5U7 daafeja@boh.jm
Moran Noah 8626.96 11 K2D 4M9 guutodi@bigwoc.kw

Sort the DataFrame in alphabetical order

In [30]:
emp_info.sort_index(inplace=True)
emp_info
Out[30]:
First Name Salary Years with Company Postal Code Email
Last Name
Aly Laila 20000.00 5 J8Y 3M0 Laila.a@hotmail.com
Ismail Heba 50000.00 7 K8Y 3M8 heba.ismail@hotmail.com
Keller Nina 9072.02 17 S1T 4E6 azikez@gahew.mr
Mo Samer 100000.00 13 J7H 3HY samer@gmail.com
Moe Mike 5000.00 3 N94 3M0 bird@gmail.com
Moran Noah 8626.96 11 K2D 4M9 guutodi@bigwoc.kw
Noor Kate 5000.00 23 K8N 5H6 kate@hotmail.com
Patton Joseph 2629.13 2 M6U 5U7 daafeja@boh.jm
Ryan Noah 10000.00 8 N8S 14K nsmall@hotmail.com
Steve Chanel 11072.02 12 N7T 3E6 chanel@gmail.com

Retrieve the rows where 'Last Name' is Aly

In [31]:
emp_info.loc['Aly']
Out[31]:
First Name                          Laila
Salary                            20000.0
Years with Company                      5
Postal Code                       J8Y 3M0
Email                 Laila.a@hotmail.com
Name: Aly, dtype: object

Retrieve the rows where 'Last Name' are Aly, Steve, Mo

In [32]:
emp_info.loc[['Aly','Steve','Mo']]
Out[32]:
First Name Salary Years with Company Postal Code Email
Last Name
Aly Laila 20000.00 5 J8Y 3M0 Laila.a@hotmail.com
Steve Chanel 11072.02 12 N7T 3E6 chanel@gmail.com
Mo Samer 100000.00 13 J7H 3HY samer@gmail.com

Retrieve the rows from 'Last Name' Aly to Mo

In [33]:
emp_info.loc['Aly':'Mo']
Out[33]:
First Name Salary Years with Company Postal Code Email
Last Name
Aly Laila 20000.00 5 J8Y 3M0 Laila.a@hotmail.com
Ismail Heba 50000.00 7 K8Y 3M8 heba.ismail@hotmail.com
Keller Nina 9072.02 17 S1T 4E6 azikez@gahew.mr
Mo Samer 100000.00 13 J7H 3HY samer@gmail.com

Retrieve all the rows up until Noor

In [34]:
emp_info.loc[:'Noor']
Out[34]:
First Name Salary Years with Company Postal Code Email
Last Name
Aly Laila 20000.00 5 J8Y 3M0 Laila.a@hotmail.com
Ismail Heba 50000.00 7 K8Y 3M8 heba.ismail@hotmail.com
Keller Nina 9072.02 17 S1T 4E6 azikez@gahew.mr
Mo Samer 100000.00 13 J7H 3HY samer@gmail.com
Moe Mike 5000.00 3 N94 3M0 bird@gmail.com
Moran Noah 8626.96 11 K2D 4M9 guutodi@bigwoc.kw
Noor Kate 5000.00 23 K8N 5H6 kate@hotmail.com

Randomly select a sample from the DataFrame

In [35]:
emp_info.sample(n=5,axis=0)
Out[35]:
First Name Salary Years with Company Postal Code Email
Last Name
Steve Chanel 11072.02 12 N7T 3E6 chanel@gmail.com
Aly Laila 20000.00 5 J8Y 3M0 Laila.a@hotmail.com
Keller Nina 9072.02 17 S1T 4E6 azikez@gahew.mr
Noor Kate 5000.00 23 K8N 5H6 kate@hotmail.com
Moran Noah 8626.96 11 K2D 4M9 guutodi@bigwoc.kw

Randomly return 30% of the rows in the DataFrame

In [36]:
emp_info.sample(frac=0.3,axis=0)
Out[36]:
First Name Salary Years with Company Postal Code Email
Last Name
Keller Nina 9072.02 17 S1T 4E6 azikez@gahew.mr
Noor Kate 5000.00 23 K8N 5H6 kate@hotmail.com
Mo Samer 100000.00 13 J7H 3HY samer@gmail.com

Task 8. Integer Index-Based Elements Selection from a DataFrame (.iloc)

In [37]:
emp_info=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 2/00-employee_information.csv')
In [38]:
emp_info.iloc[9]
Out[38]:
First Name                         Noah
Last Name                         Moran
Salary                          8626.96
Years with Company                   11
Postal Code                     K2D 4M9
Email                 guutodi@bigwoc.kw
Name: 9, dtype: object

Access multiple rows, from index 2 to 4

In [39]:
emp_info.iloc[2:5]
Out[39]:
First Name Last Name Salary Years with Company Postal Code Email
2 Nina Keller 9072.02 17 S1T 4E6 azikez@gahew.mr
3 Chanel Steve 11072.02 12 N7T 3E6 chanel@gmail.com
4 Kate Noor 5000.00 23 K8N 5H6 kate@hotmail.com

Access all rows up until index 4

In [40]:
emp_info.iloc[:4]
Out[40]:
First Name Last Name Salary Years with Company Postal Code Email
0 Mike Moe 5000.00 3 N94 3M0 bird@gmail.com
1 Noah Ryan 10000.00 8 N8S 14K nsmall@hotmail.com
2 Nina Keller 9072.02 17 S1T 4E6 azikez@gahew.mr
3 Chanel Steve 11072.02 12 N7T 3E6 chanel@gmail.com

Access rows with index 2, 4, 9

In [41]:
emp_info.iloc[[2,4,9]]
Out[41]:
First Name Last Name Salary Years with Company Postal Code Email
2 Nina Keller 9072.02 17 S1T 4E6 azikez@gahew.mr
4 Kate Noor 5000.00 23 K8N 5H6 kate@hotmail.com
9 Noah Moran 8626.96 11 K2D 4M9 guutodi@bigwoc.kw

Slicing a piece of the DataFrame by selecting which columns to display

In [42]:
emp_info.iloc[4,0:3]
Out[42]:
First Name      Kate
Last Name       Noor
Salary        5000.0
Name: 4, dtype: object

Select the last 2 rows of the DataFrame

In [43]:
emp_info.iloc[8:]
Out[43]:
First Name Last Name Salary Years with Company Postal Code Email
8 Joseph Patton 2629.13 2 M6U 5U7 daafeja@boh.jm
9 Noah Moran 8626.96 11 K2D 4M9 guutodi@bigwoc.kw
In [44]:
#OR
emp_info.iloc[-2:]
Out[44]:
First Name Last Name Salary Years with Company Postal Code Email
8 Joseph Patton 2629.13 2 M6U 5U7 daafeja@boh.jm
9 Noah Moran 8626.96 11 K2D 4M9 guutodi@bigwoc.kw

Task 9. Broadcasting Operations and Setting New DataFrame Values

In [45]:
emp_info=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 2/00-employee_information.csv')
emp_info
Out[45]:
First Name Last Name Salary Years with Company Postal Code Email
0 Mike Moe 5000.00 3 N94 3M0 bird@gmail.com
1 Noah Ryan 10000.00 8 N8S 14K nsmall@hotmail.com
2 Nina Keller 9072.02 17 S1T 4E6 azikez@gahew.mr
3 Chanel Steve 11072.02 12 N7T 3E6 chanel@gmail.com
4 Kate Noor 5000.00 23 K8N 5H6 kate@hotmail.com
5 Samer Mo 100000.00 13 J7H 3HY samer@gmail.com
6 Heba Ismail 50000.00 7 K8Y 3M8 heba.ismail@hotmail.com
7 Laila Aly 20000.00 5 J8Y 3M0 Laila.a@hotmail.com
8 Joseph Patton 2629.13 2 M6U 5U7 daafeja@boh.jm
9 Noah Moran 8626.96 11 K2D 4M9 guutodi@bigwoc.kw

Increase all Salaries by 1000$

In [46]:
emp_info['Salary']=emp_info['Salary']+1000
emp_info
Out[46]:
First Name Last Name Salary Years with Company Postal Code Email
0 Mike Moe 6000.00 3 N94 3M0 bird@gmail.com
1 Noah Ryan 11000.00 8 N8S 14K nsmall@hotmail.com
2 Nina Keller 10072.02 17 S1T 4E6 azikez@gahew.mr
3 Chanel Steve 12072.02 12 N7T 3E6 chanel@gmail.com
4 Kate Noor 6000.00 23 K8N 5H6 kate@hotmail.com
5 Samer Mo 101000.00 13 J7H 3HY samer@gmail.com
6 Heba Ismail 51000.00 7 K8Y 3M8 heba.ismail@hotmail.com
7 Laila Aly 21000.00 5 J8Y 3M0 Laila.a@hotmail.com
8 Joseph Patton 3629.13 2 M6U 5U7 daafeja@boh.jm
9 Noah Moran 9626.96 11 K2D 4M9 guutodi@bigwoc.kw

Convert the Salaries from $ to MAD

In [47]:
emp_info['Salary [MAD]'] = emp_info['Salary']*10.06
#to show numbers without scientific notation ->
pd.set_option('display.float_format', lambda x: '{:.2f}'.format(x))
emp_info
Out[47]:
First Name Last Name Salary Years with Company Postal Code Email Salary [MAD]
0 Mike Moe 6000.00 3 N94 3M0 bird@gmail.com 60360.00
1 Noah Ryan 11000.00 8 N8S 14K nsmall@hotmail.com 110660.00
2 Nina Keller 10072.02 17 S1T 4E6 azikez@gahew.mr 101324.52
3 Chanel Steve 12072.02 12 N7T 3E6 chanel@gmail.com 121444.52
4 Kate Noor 6000.00 23 K8N 5H6 kate@hotmail.com 60360.00
5 Samer Mo 101000.00 13 J7H 3HY samer@gmail.com 1016060.00
6 Heba Ismail 51000.00 7 K8Y 3M8 heba.ismail@hotmail.com 513060.00
7 Laila Aly 21000.00 5 J8Y 3M0 Laila.a@hotmail.com 211260.00
8 Joseph Patton 3629.13 2 M6U 5U7 daafeja@boh.jm 36509.05
9 Noah Moran 9626.96 11 K2D 4M9 guutodi@bigwoc.kw 96847.22

Update the email of Kate Noor

In [48]:
emp_info.iloc[4, 5] = 'kate.noor@gmail.com'
emp_info.loc[4,'Email']
Out[48]:
'kate.noor@gmail.com'

Increase the salaries by 12% and return the Sum of all employees salaries

In [49]:
emp_info['Salary']=emp_info['Salary']*1.12
emp_info['Salary'].sum().round(2)
Out[49]:
259168.15

Task 10. Sorting and Ordering

In [50]:
emp_info=pd.read_csv('/Users/mekki/Python_Projects_Datasets/Mastering Data Analysis with Pandas Part 2/00-employee_information.csv')
emp_info
Out[50]:
First Name Last Name Salary Years with Company Postal Code Email
0 Mike Moe 5000.00 3 N94 3M0 bird@gmail.com
1 Noah Ryan 10000.00 8 N8S 14K nsmall@hotmail.com
2 Nina Keller 9072.02 17 S1T 4E6 azikez@gahew.mr
3 Chanel Steve 11072.02 12 N7T 3E6 chanel@gmail.com
4 Kate Noor 5000.00 23 K8N 5H6 kate@hotmail.com
5 Samer Mo 100000.00 13 J7H 3HY samer@gmail.com
6 Heba Ismail 50000.00 7 K8Y 3M8 heba.ismail@hotmail.com
7 Laila Aly 20000.00 5 J8Y 3M0 Laila.a@hotmail.com
8 Joseph Patton 2629.13 2 M6U 5U7 daafeja@boh.jm
9 Noah Moran 8626.96 11 K2D 4M9 guutodi@bigwoc.kw

Sort the DataFrame by seniority

In [51]:
emp_info.sort_values(by='Years with Company', ascending=False, inplace=True)
emp_info
Out[51]:
First Name Last Name Salary Years with Company Postal Code Email
4 Kate Noor 5000.00 23 K8N 5H6 kate@hotmail.com
2 Nina Keller 9072.02 17 S1T 4E6 azikez@gahew.mr
5 Samer Mo 100000.00 13 J7H 3HY samer@gmail.com
3 Chanel Steve 11072.02 12 N7T 3E6 chanel@gmail.com
9 Noah Moran 8626.96 11 K2D 4M9 guutodi@bigwoc.kw
1 Noah Ryan 10000.00 8 N8S 14K nsmall@hotmail.com
6 Heba Ismail 50000.00 7 K8Y 3M8 heba.ismail@hotmail.com
7 Laila Aly 20000.00 5 J8Y 3M0 Laila.a@hotmail.com
0 Mike Moe 5000.00 3 N94 3M0 bird@gmail.com
8 Joseph Patton 2629.13 2 M6U 5U7 daafeja@boh.jm

Sort the DataFrame using the index

In [52]:
emp_info.sort_index(inplace=True)
emp_info
Out[52]:
First Name Last Name Salary Years with Company Postal Code Email
0 Mike Moe 5000.00 3 N94 3M0 bird@gmail.com
1 Noah Ryan 10000.00 8 N8S 14K nsmall@hotmail.com
2 Nina Keller 9072.02 17 S1T 4E6 azikez@gahew.mr
3 Chanel Steve 11072.02 12 N7T 3E6 chanel@gmail.com
4 Kate Noor 5000.00 23 K8N 5H6 kate@hotmail.com
5 Samer Mo 100000.00 13 J7H 3HY samer@gmail.com
6 Heba Ismail 50000.00 7 K8Y 3M8 heba.ismail@hotmail.com
7 Laila Aly 20000.00 5 J8Y 3M0 Laila.a@hotmail.com
8 Joseph Patton 2629.13 2 M6U 5U7 daafeja@boh.jm
9 Noah Moran 8626.96 11 K2D 4M9 guutodi@bigwoc.kw