Simon Tan
Musings of a curious developer ت
Introduction to Data Science
Lesson 1 of CS109 - Understanding what Data Science is all about.

As expected, the first lesson covers why one should study data science, a high-level overview of what is known as The Data Science Process, and some CS109-specific details (like when the labs are held, homework assignments, etc.) that are not applicable to me.

Why data science?

Basically for crazy good job prospects! But jokes aside, you should have other motivations for wanting to learn data science and not just do it for the money. It is going to be a long and arduous journey and honestly, there are easier ways to make money elsewhere! I, too, have a few reasons why I chose to learn data science.

The Data Science Process

[Image: The Data Science Process]

The 5 steps in the image above give us a rough idea of what data science is all about: asking interesting questions, getting the data that you need, exploring it, modelling it to see if your hypothesis makes sense, and communicating the results. It should be stressed that you do not need to follow the order shown above strictly. For instance, sometimes you might be given some data and be asked to find interesting patterns, in which case the question comes after the data.

Step | Simon’s TL;DR
Ask an interesting question | What do you want to achieve from analysing this data?
Get the data | If you were provided with the data, you want to ask questions like: How was the data sampled? What data is missing? What are the relevant attributes? Are there any privacy issues? If you were not provided with any data, you are more interested in how to acquire the data you need to answer your question and how to avoid biases. You will also be cleaning the data here.
Explore the data | Exploring the data is all about plotting it out and looking for patterns, anomalies, outliers, etc.
Model the data | Build, fit and validate the model (whatever that means).
Communicate / visualise the data | Share your findings with others effectively through a visual medium such as graphs, diagrams or plots.
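
To make the steps a little more concrete, here is a minimal Python sketch of the whole process on a tiny dataset. Everything in it is an assumption for illustration and not from the CS109 material: the data (hours studied vs. exam score) is made up, the "model" is a plain straight-line fit, and the validation simply holds out the last two points.

```python
import statistics

# Ask an interesting question (hypothetical): do more study hours go with higher scores?

# Get the data: made-up (hours studied, exam score) pairs; None marks a missing score.
raw = [(1, 52), (2, 55), (3, None), (4, 65), (5, 70), (6, 74), (7, None), (8, 85)]

# Cleaning is part of getting the data: drop rows with a missing score.
clean = [(x, y) for x, y in raw if y is not None]

# Explore the data: a few summary statistics stand in for plotting here.
scores = [y for _, y in clean]
summary = {
    "n": len(clean),
    "mean": statistics.mean(scores),
    "min": min(scores),
    "max": max(scores),
}


def fit_line(points):
    """Fit y = a + b * x by ordinary least squares and return (a, b)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    b = sum((x - mx) * (y - my) for x, y in points) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b


# Model the data: build and fit on the first rows, validate on the held-out rest.
train, test = clean[:-2], clean[-2:]
a, b = fit_line(train)
mae = statistics.mean(abs((a + b * x) - y) for x, y in test)

# Communicate: report the findings in plain language.
print(f"Explored {summary['n']} rows, mean score {summary['mean']:.1f}")
print(f"Each extra hour is associated with about {b:.1f} more points")
print(f"Mean absolute error on held-out rows: {mae:.1f}")
```

In a real project the exploration and communication steps would usually involve actual plots, and the held-out rows would be chosen at random rather than taken from the end.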

Aaaand that is about it for the first lesson. If you notice any mistakes, please let me know! I am also still trying to figure out a better way of communicating my thoughts and learning, so if you have any tips, do send them my way!

Last modified on 11 October 2020.
Attributions, if any, can be found here.