Hey Outliers,
A group of newcomers joined our meeting yesterday, a mix of experienced practitioners and people who are just starting their data science journey. EDA was a hot topic during the meeting. Exploratory Data Analysis (EDA) is the backbone of every data science task. It is a way to communicate with the data to see what insights can be extracted.
Metaphor time! If someone feeling under the weather goes to see the doctor, what does the doctor do first? Do they immediately start running tests (in this case, ML models)? No. They first try to figure out what is wrong, and they do that by asking questions: “How are you doing today?” “Why are you here today?” This is analogous to asking the data: What is your distribution like? Do you have any outliers? Are there any correlated variables? These questions can be answered using plots such as histograms, scatter plots, bar graphs, and box plots. Based on these plots, we can get an idea of what the next steps are. If a doctor examines the patient and finds that they have a high fever and a runny nose, they might run some COVID tests. If the distribution of a variable is highly right skewed, then a log transformation might be necessary before modeling. This newfound information guides our next steps.
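To make the skewness point concrete, here is a minimal sketch using only the Python standard library. The data is synthetic (a made-up log-normal sample, not the Titanic dataset), and the names `skewness`, `raw`, and `logged` are ours for illustration. It measures skewness before and after a log transformation so you can see the effect in numbers as well as in a histogram.

```python
import math
import random
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


# Synthetic, made-up data: a log-normal sample is right skewed by construction.
rng = random.Random(0)
raw = [rng.lognormvariate(3.0, 1.0) for _ in range(5000)]

# log1p(x) = log(1 + x), which stays defined even when a value is exactly 0.
logged = [math.log1p(v) for v in raw]

raw_skew = skewness(raw)
log_skew = skewness(logged)

print(f"skewness before log transform: {raw_skew:.2f}")
print(f"skewness after log transform:  {log_skew:.2f}")
```

A skewness near 0 suggests a roughly symmetric distribution, while a large positive value points to a long right tail. In a real project you would likely compute this on a DataFrame column and still look at the histogram, since a single number can hide things like multiple peaks.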
To illustrate the importance of EDA, Tyler (@_meeni_batch_) went over his Kaggle notebook for Titanic. He clearly explained the plots he used, what insights he was able to extract just from looking at the visualizations, and what feature engineering he implemented. Shout out to Tyler for taking the time to present his project!
COMMUNITY UPDATES
BOOK STUDY
Check the Events in Discord to see when the study group is being held!
Mathematics for Machine Learning
Hands on Machine Learning
Interpretable Machine Learning
KAGGLE
HuBMAP + HPA - Hacking the Human Body
PROJECTS
Sabi Sands
A Django app that lets a user upload a content image and a style image. The app returns the content image rendered in the style of the style image.
Julia Programming
COURSES
https://www.coursera.org/specializations/tensorflow-advanced-techniques
https://www.coursera.org/professional-certificates/tensorflow-in-practice
ML NEWS
https://openai.com/blog/dall-e-now-available-in-beta/
PODCASTS
Thank you for participating in our town hall. We also have a Discord community where we discuss things in more detail. Let us know in the comments if you are interested in joining the Biased Outliers. Until next time!
Feature Photo by Adeolu Eletu on Unsplash