1.3 - Exploratory Data Analysis (EDA)
This step analyzes the data to uncover patterns, relationships, and insights.
It's time to visualize the data using the matplotlib and seaborn libraries in Python.
Techniques such as visualizations (histograms, scatter plots), summary statistics, and correlation analysis help reveal patterns and insights in the dataset.
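As a rough illustration of the techniques listed above, the sketch below computes summary statistics and a correlation matrix with pandas, and wraps a histogram in a small helper. The `sample` frame and the `plot_income_histogram` helper are made up for illustration (only the column names come from the dataset); in practice the same calls would run on `employee_data`.

```python
import pandas as pd

# Hypothetical mini-sample: values are invented, only the column names match the dataset.
sample = pd.DataFrame({
    "Age": [25, 32, 41, 50, 29, 38],
    "Years at Company": [1, 5, 12, 20, 3, 9],
    "Monthly Income": [4000, 5200, 7100, 9800, 4500, 6400],
})

# Summary statistics: count, mean, std, min, quartiles, and max for each numeric column.
summary = sample.describe()
print(summary)

# Pairwise Pearson correlation between the numeric columns.
correlation = sample.corr()
print(correlation)


def plot_income_histogram(df, bins=10):
    """Draw a histogram of 'Monthly Income' and return the matplotlib Figure."""
    # Imported here so the statistics above do not require a plotting backend.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.hist(df["Monthly Income"], bins=bins)
    ax.set_xlabel("Monthly Income")
    ax.set_ylabel("Number of employees")
    ax.set_title("Distribution of Monthly Income")
    return fig
```

Calling `plot_income_histogram(employee_data)` returns a figure that can be saved with `fig.savefig(...)`. seaborn's `histplot` and `heatmap` are common alternatives for the histogram and for displaying the correlation matrix.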
To list the unique values in each column of our dataset:
for col in employee_data.columns:
    print(f'{col}: ', employee_data[col].unique())
Output:
# Employee ID: [ 8410 64756 30257 ... 12409 9554 73042]
# Age: [31 59 24 ... 22 32]
# Gender: ['Male' 'Female']
# Years at Company: [19 15 ... 50 51]
# Job Role: ['Education' 'Media' 'Healthcare' 'Technology' 'Finance']
# Monthly Income: [ 5390 5534 8159 ... 11854 11558 12651]
# Work-Life Balance: ['Excellent' 'Poor' 'Good' 'Fair']
# Job Satisfaction: ['Medium' 'High' 'Very High' 'Low']
# Performance Rating: ['Average' 'Low' 'High' 'Below Average']
# Number of Promotions: [2 3 0 1 4]
# Overtime: ['No' 'Yes']
# Distance from Home: [22 21 ... 66]
# Education Level: ['Associate Degree' 'Master’s Degree' 'Bachelor’s Degree' 'High School' 'PhD']
# Marital Status: ['Married' 'Divorced' 'Single']
# Number of Dependents: [0 3 2 4 1 5 6]
# Job Level: ['Mid' 'Senior' 'Entry']
# Company Size: ['Medium' 'Small' 'Large']
# Company Tenure: [ 89 21 ... 126 128]
# Remote Work: ['No' 'Yes']
# Leadership Opportunities: ['No' 'Yes']
# Innovation Opportunities: ['No' 'Yes']
# Company Reputation: ['Excellent' 'Fair' 'Poor' 'Good']
# Employee Recognition: ['Medium' 'Low' 'High' 'Very High']
# Attrition: ['Stayed' 'Left']
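The listing above suggests that most columns are categorical with only a handful of levels. A possible follow-up, sketched here on an invented mini-sample rather than the real data, is to count the distinct values per column with `nunique()` and to check category frequencies with `value_counts()`, for example to see how balanced the `Attrition` target is.

```python
import pandas as pd

# Hypothetical mini-sample: rows are invented, column names and labels follow the dataset.
sample = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female"],
    "Overtime": ["No", "Yes", "No", "No", "Yes"],
    "Attrition": ["Stayed", "Left", "Stayed", "Stayed", "Left"],
})

# Number of distinct values in each column.
distinct_counts = sample.nunique()
print(distinct_counts)

# Share of each class in the target column (the fractions sum to 1).
attrition_share = sample["Attrition"].value_counts(normalize=True)
print(attrition_share)
```

On the full dataset the same two calls would be applied to `employee_data` instead of `sample`.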
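Define the input features (X) and the target variable (y):
# Drop the identifier and the target column; columns= is needed because drop() removes rows by default
X = employee_data.drop(columns=['Employee ID', 'Attrition'])
y = employee_data['Attrition']
X.shape # (74498, 22)
y.shape # (74498,)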
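To make the split concrete, here is a minimal sketch on an invented four-row frame (the `toy` data is hypothetical). It also shows why the column names are passed through `columns=`: by default `DataFrame.drop` looks for the labels in the row index, so dropping column names without `columns=` or `axis=1` raises a `KeyError`.

```python
import pandas as pd

# Hypothetical toy frame: values are invented for illustration.
toy = pd.DataFrame({
    "Employee ID": [1, 2, 3, 4],
    "Age": [31, 59, 24, 36],
    "Overtime": ["No", "Yes", "No", "Yes"],
    "Attrition": ["Stayed", "Left", "Stayed", "Left"],
})

# Features: every column except the identifier and the target.
X = toy.drop(columns=["Employee ID", "Attrition"])
# Target: the column we want to predict.
y = toy["Attrition"]

print(X.shape)  # (4, 2)
print(y.shape)  # (4,)

# Without columns= (or axis=1), drop() searches the row index and fails.
try:
    toy.drop(["Employee ID", "Attrition"])
    drop_failed = False
except KeyError:
    drop_failed = True
print(drop_failed)  # True
```

Dropping `Employee ID` along with the target keeps a pure identifier out of the feature set, since it carries no information about attrition.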