Mekki MOURADI

Data Analyst

Rabat, Maroc

À Propos

Data Analyst Junior avec une solide expérience en marketing numérique et des compétences approfondies dans l’utilisation de Microsoft Excel, SQL, Tableau Software et Python.

Using SQL String Functions to Clean Data

Liens

Compétences & Outils

Description du projet

Apprendre à manipuler les données via les Fonctions de Chaînes de Caractères :

  • LENGTH, LEFT, RIGHT
  • UPPER, LOWER
  • REPLACE
  • TRIM, LTRIM, RTRIM
  • CONCATENATION
  • SUBSTRING
  • STRING AGGREGATION
  • COALESCE

Objectifs de l'analyse

  • Utiliser les Fonctions de Chaînes de Caractères pour modifier et nettoyer les données.
  1. LENGTH, LEFT and RIGHT
  1. Find the Length of the first name of Male Employees
				
					SELECT emp_no, first_name, LENGTH(first_name) AS characters_number
FROM employees
WHERE gender='M'
LIMIT 10;
				
			
emp_nofirst_namecharacters_number
10003Parto5
10004Chirstian9
10005Kyoichi7
10008Saniya6
10012Patricio8
10013Eberhardt9
10014Berni5
10015Guoxiang8
10016Kazuhito8
10019Lillian7
  1. Find the length of the first name of male employees where the length of the first name is greater than 5
				
					SELECT emp_no, LENGTH(first_name) AS characters_number
FROM employees
WHERE gender='M' AND LENGTH(first_name) > 5
LIMIT 10;
				
			
emp_nocharacters_number
100049
100057
100086
100128
100139
100158
100168
100197
100206
100226
  1. Retrieve a list of the customer group of all customers
				
					SELECT customer_id, LEFT(customer_id, 2) AS customer_group
FROM customers
LIMIT 10;
				
			
customer_idcustomer_group
CG-12520CG
DV-13045DV
SO-20335SO
BH-11710BH
AA-10480AA
IM-15070IM
HP-14815HP
PK-19075PK
AG-10270AG
ZD-21925ZD
  1. Retrieve a lsit of the customer number of all customers
				
					SELECT customer_id, RIGHT(customer_id, 5) AS customer_number
FROM customers
LIMIT 10;
				
			
customer_idcustomer_number
CG-1252012520
DV-1304513045
SO-2033520335
BH-1171011710
AA-1048010480
IM-1507015070
HP-1481514815
PK-1907519075
AG-1027010270
ZD-2192521925
  1. Retrieve a lsit of the customer number of all customers using length
				
					SELECT customer_id, RIGHT(customer_id, LENGTH(customer_id)-3) AS customer_number
FROM customers
LIMIT 10;
				
			
customer_idcustomer_number
CG-1252012520
DV-1304513045
SO-2033520335
BH-1171011710
AA-1048010480
IM-1507015070
HP-1481514815
PK-1907519075
AG-1027010270
ZD-2192521925
  1. UPPER and LOWER
  1. Change the first name of the first employee to uppercase letters
				
					SELECT emp_no, first_name
FROM employees
LIMIT 1;
				
			
emp_nofirst_name
10001Georgi
				
					-- start a transaction
BEGIN
-- updating to upper case
UPDATE employees
SET first_name = UPPER(first_name)
WHERE emp_no='10001'
-- confirming edits
SELECT emp_no, first_name
FROM employees
WHERE emp_no='10001'
-- ROLLBACK to the previous step to not alter the dataset
ROLLBACK;
				
			
emp_nofirst_name
10001GEORGI
  1. REPLACE
  1. Replace M to Male and F to Female
				
					SELECT  first_name,
        last_name,
        gender,
        REPLACE(REPLACE(gender, 'M', 'Male'),'F','Female') AS new_gender
FROM employees
LIMIT 10;
				
			
first_namelast_namegendernew_gender
BezalelSimmelFFemale
PartoBamfordMMale
ChirstianKoblickMMale
KyoichiMaliniakMMale
AnnekePreusigFFemale
TzvetanZielinskiFFemale
SaniyaKalloufiMMale
SumantPeacFFemale
DuangkaewPiveteauFFemale
MarySluisFFemale
  1. SQL is case sensitive so to make it easier, we can use String Functions like LOWER to replace data
				
					SELECT  customer_name,
        region,
        country,
        REPLACE(LOWER(country),'united states','US') AS new_country
FROM customers
LIMIT 10;
				
			
customer_nameregioncountrynew_country
Claire GuteSouthUnited StatesUS
Darrin Van HuffWestUnited StatesUS
Sean O'DonnellSouthUnited StatesUS
Brosina HoffmanWestUnited StatesUS
Andrew AllenSouthUnited StatesUS
Irene MaddoxWestUnited StatesUS
Harold PawlanCentralUnited StatesUS
Pete KrizCentralUnited StatesUS
Alejandro GroveWestNULLNULL
Zuschuss DonatelliWestUnited StatesUS
  1. TRIM, LTRIM and RTRIM

TRIM

				
					-- trim all spaces
SELECT TRIM(' ' FROM '   trim this     ')
				
			

LEFT TRIM

				
					-- trim all spaces in the left
SELECT LTRIM ('   trim this     ')
--OR
SELECT TRIM(leading ' ' FROM '   trim this     ')
				
			

RIGHT TRIM

				
					-- trim all spaces in the left
SELECT RTRIM ('   trim this     ')
--OR
SELECT TRIM(trailing ' ' FROM '   trim this     ')
				
			
  1. Trim the () from the bracket_cust_id column
				
					SELECT bracket_cust_id, TRIM(bracket_cust_id,'()') AS cleaned_cust_id
FROM customers
LIMIT 10;
				
			
bracket_cust_idcleaned_cust_id
(CG-12520)CG-12520
(DV-13045)DV-13045
(SO-20335)SO-20335
(BH-11710)BH-11710
(AA-10480)AA-10480
(IM-15070)IM-15070
(HP-14815)HP-14815
(PK-19075)PK-19075
(AG-10270)AG-10270
(ZD-21925)ZD-21925
  1. CONCATENATION
  1. Create a new column called full_name from the first and last name of employees
				
					SELECT  first_name,
        last_name,
        (first_name || ' ' || last_name) AS full_name
FROM employees
LIMIT 10;
				
			
first_namelast_namefull_name
BezalelSimmelBezalel Simmel
PartoBamfordParto Bamford
ChirstianKoblickChirstian Koblick
KyoichiMaliniakKyoichi Maliniak
AnnekePreusigAnneke Preusig
TzvetanZielinskiTzvetan Zielinski
SaniyaKalloufiSaniya Kalloufi
SumantPeacSumant Peac
DuangkaewPiveteauDuangkaew Piveteau
MarySluisMary Sluis
  1. Create a new column called address from the city, state and country of customers
				
					SELECT  city||', '||state||', '||country) AS address,
        city,
        state,
        country
FROM customers
LIMIT 10;
				
			
addresscitystatecountry
Henderson, Kentucky, United StatesHendersonKentuckyUnited States
Los Angeles, California, United StatesLos AngelesCaliforniaUnited States
Fort Lauderdale, Florida, United StatesFort LauderdaleFloridaUnited States
Los Angeles, California, United StatesLos AngelesCaliforniaUnited States
Concord, North Carolina, United StatesConcordNorth CarolinaUnited States
Seattle, Washington, United StatesSeattleWashingtonUnited States
Fort Worth, Texas, United StatesFort WorthTexasUnited States
Madison, Wisconsin, United StatesMadisonWisconsinUnited States
NULLWest JordanUtahNULL
San Francisco, California, United StatesSan FranciscoCaliforniaUnited States
  1. Create a column called desc_age from the customers name and age
				
					SELECT  (customer_name||', '||age||' years old.') AS desc_age,
        customer_name,
        age
FROM customers
LIMIT 10;
				
			
desc_agecustomer_nameage
Claire Gute, 67 years old.Claire Gute67
Darrin Van Huff, 31 years old.Darrin Van Huff31
Sean O'Donnell, 65 years old.Sean O'Donnell65
Brosina Hoffman, 20 years old.Brosina Hoffman20
Andrew Allen, 50 years old.Andrew Allen50
Irene Maddox, 66 years old.Irene Maddox66
Harold Pawlan, 20 years old.Harold Pawlan20
Pete Kriz, 46 years old.Pete Kriz46
Alejandro Grove, 18 years old.Alejandro Grove18
Zuschuss Donatelli, 66 years old.Zuschuss Donatelli66
  1. SUBSTRING
  1. Retrieve the ids, names, and groups of customers
				
					SELECT  customer_id,
		customer_name,
		SUBSTRING(customer_id FOR 2) AS customer_group
FROM customers
LIMIT 10;
				
			
customer_idcustomer_namecustomer_group
CG-12520Claire GuteCG
DV-13045Darrin Van HuffDV
SO-20335Sean O'DonnellSO
BH-11710Brosina HoffmanBH
AA-10480Andrew AllenAA
IM-15070Irene MaddoxIM
HP-14815Harold PawlanHP
PK-19075Pete KrizPK
AG-10270Alejandro GroveAG
ZD-21925Zuschuss DonatelliZD
  1. Retrieve the ids and names of the customers in the customer group 'AB'
				
					SELECT  customer_id,
		customer_name,
		SUBSTRING(customer_id FOR 2) AS customer_group
FROM customers
WHERE SUBSTRING(customer_id FOR 2) = 'AB';
				
			
customer_idcustomer_namecustomer_group
AB-10060Adam BellavanceAB
AB-10165Alan BarnesAB
AB-10255Alejandro BallentineAB
AB-10600Ann BlumeAB
AB-10105Adrian BartonAB
AB-10150Aimee BixbyAB
AB-10015Aaron BergmanAB
  1. Retrieve the ids, names, and customer number of customers in the custom group 'AB'
				
					SELECT  customer_id,
		customer_name,
		SUBSTRING(customer_id FROM 4 FOR 5) AS customer_number
FROM customers
WHERE SUBSTRING(customer_id FOR 2) = 'AB';
				
			
customer_idcustomer_namecustomer_number
AB-10060Adam Bellavance10060
AB-10165Alan Barnes10165
AB-10255Alejandro Ballentine10255
AB-10600Ann Blume10600
AB-10105Adrian Barton10105
AB-10150Aimee Bixby10150
AB-10015Aaron Bergman10015
  1. Retrieve the year of birth for all employees
				
					SELECT  emp_no,
        birth_date,
        SUBSTRING(CAST(birth_date AS VARCHAR) FOR 4) AS year
FROM employees
LIMIT 10;

-- OR

SELECT  emp_no,
        birth_date,
        EXTRACT(YEAR FROM birth_date) AS year
FROM employees
LIMIT 10;
				
			
emp_nobirth_dateyear
100021964-06-021964
100031959-12-031959
100041954-05-011954
100051955-01-211955
100061953-04-201953
100071957-05-231957
100081958-02-191958
100091952-04-191952
100101963-06-011963
100111953-11-071953
  1. STRING AGGREGATION
  1. Retrieve a list of all department numbers for different employees
				
					SELECT  emp_no,
        STRING_AGG(dept_no,', ') AS departments
FROM dept_emp
GROUP BY emp_no
LIMIT 10;
				
			
emp_nodepartments
10001d005
10002d007
10003d004
10004d004
10005d003
10006d005
10007d008
10008d005
10009d006
10010d004, d006
  1. Retrieve a list of all products that were ordered by a customer from the sales table
				
					SELECT  order_id,
        STRING_AGG(product_id,' /// ') AS products_ordered
FROM sales
GROUP BY order_id
				
			
order_idproducts_ordered
CA-2014-100006TEC-PH-10002075
CA-2014-100090OFF-BI-10001597 /// FUR-TA-10003715
CA-2014-100293OFF-PA-10000176
CA-2014-100328OFF-BI-10000343
CA-2014-100363OFF-FA-10000611 /// OFF-PA-10004733
CA-2014-100391OFF-PA-10001471
CA-2014-100678OFF-AR-10001868 /// TEC-AC-10000474 /// OFF-EN-10000056 /// FUR-CH-10002602
CA-2014-100706TEC-AC-10001314 /// FUR-FU-10002268
CA-2014-100762OFF-PA-10001815 /// OFF-AR-10000380 /// OFF-LA-10003930 /// OFF-PA-10004082
CA-2014-100860OFF-LA-10001982
  1. COALESCE
  1. Replace a missing country with the city, state, or no address depending on the data availabe
				
					SELECT  customer_name,
        country,
        city,
        state,
		COALESCE(country,city,state,'No Address') AS customer_address
FROM customers
LIMIT 10;
				
			
customer_namecountrycitystatecustomer_address
Claire GuteUnited StatesHendersonKentuckyUnited States
Darrin Van HuffUnited StatesLos AngelesCaliforniaUnited States
Sean O'DonnellUnited StatesFort LauderdaleFloridaUnited States
Brosina HoffmanUnited StatesLos AngelesCaliforniaUnited States
Andrew AllenUnited StatesConcordNorth CarolinaUnited States
Irene MaddoxUnited StatesSeattleWashingtonUnited States
Harold PawlanUnited StatesFort WorthTexasUnited States
Pete KrizUnited StatesMadisonWisconsinUnited States
Alejandro GroveNULLWest JordanUtahWest Jordan
Zuschuss DonatelliUnited StatesSan FranciscoCaliforniaUnited States