In my work as a Data Analyst at InDataWeTrust, Exploratory Data Analysis (EDA) is a routine part of my role. I firmly believe this phase is crucial because it lets us make informed decisions about which type of machine learning model to deploy.
On one project, we were tasked with building a predictive model for a marketing campaign. We began with EDA, using Python libraries such as Pandas, NumPy, and Matplotlib for data cleaning, manipulation, and visualization. This showed that our target variable was categorical, which made the task a classification problem. It also showed that several features strongly influenced the target, with non-linear relationships and complex interactions among them.
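To make the EDA step concrete, here is a minimal sketch of the kind of checks described above. It is not the project's actual code: the data is synthetic, and the column names (`age`, `past_purchases`, `channel`, `converted`) and the helpers `summarize_target` and `rate_by_bin` are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data standing in for a campaign dataset.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=n),
    "past_purchases": rng.poisson(2.0, size=n),
    "channel": rng.choice(["email", "social", "search"], size=n),
})

# Assumption for illustration: a middle age band converts more often,
# so the relationship between age and the target is non-linear.
mid_age = df["age"].between(30, 45)
prob = np.where(mid_age, 0.6, 0.2)
df["converted"] = np.where(rng.random(n) < prob, "yes", "no")


def summarize_target(frame, target):
    """Return the share of each class in a categorical target column."""
    return frame[target].value_counts(normalize=True)


def rate_by_bin(frame, feature, target, positive, bins=5):
    """Return the positive-class rate within equal-width bins of a numeric feature."""
    binned = pd.cut(frame[feature], bins=bins)
    return (frame[target] == positive).groupby(binned, observed=True).mean()


class_shares = summarize_target(df, "converted")
age_rates = rate_by_bin(df, "age", "converted", "yes")

print(class_shares)  # a few discrete labels: a classification problem
print(age_rates)     # rates that rise then fall suggest non-linearity
```

If the per-bin rates move in one direction only, a linear model may be enough. A rise followed by a fall, as this synthetic data is built to produce, is the kind of pattern that points toward a non-linear model.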
These insights from our EDA stage led us to choose a Random Forest classifier, which can capture non-linear relationships and feature interactions without our having to specify them by hand. As a result, our model demonstrated high accuracy and helped our client prioritize their marketing efforts, leading to improved conversion rates.
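As a rough illustration of why a Random Forest suits this kind of data, the sketch below fits one with scikit-learn on synthetic data. The library choice is an assumption, since it is not named above. The label depends purely on an interaction between two features, which is a hypothetical setup and not the client's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: two numeric features on [-1, 1].
rng = np.random.default_rng(0)
n = 1000
X = rng.uniform(-1, 1, size=(n, 2))

# XOR-like categorical target: the class depends on the sign of the product,
# an interaction that no single linear boundary can separate.
y = np.where(X[:, 0] * X[:, 1] > 0, "responder", "non_responder")

# Hold out a stratified test set so accuracy is measured on unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {accuracy:.3f}")
```

Accuracy alone can mislead when classes are imbalanced, so in a real campaign setting it would be worth checking precision and recall for the class the client cares about as well.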