The Statistics of Data Science

Raine Cat
2 min read · Sep 30, 2020

Being a data scientist means having a collection of skills. Programming is the first skill that comes to mind when we hear about data science. Domain knowledge is just as important, since it is where we apply our findings when making data-driven decisions.

What about STATISTICS?

Statistics is one of the foundations of data science. Understanding its underlying principles is key to gaining expertise in the field. When analyzing data, we use statistical methods to test hypotheses derived from the problem at hand.

To understand statistics, it is important to get comfortable with probability, since it is the foundation of statistics. Bear in mind that in hypothesis testing we are usually calculating how probable our observed data would be if a given hypothesis were true, and using that probability to decide whether the hypothesis is acceptable.

Bayesian inference is heavily used in mathematics and especially in medical research, as it gives mathematical intuition about how events or variables relate by updating the probability of a hypothesis as new evidence comes in.
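To make this concrete, here is a minimal sketch of Bayes' rule applied to a medical screening test. The function name and all of the numbers (a 1% prevalence, 95% sensitivity, and 5% false positive rate) are made up for illustration and do not come from any real study.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """Probability of having the condition given a positive test (Bayes' rule)."""
    # Total probability of a positive result: true positives plus false positives.
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

# Hypothetical inputs, chosen only to illustrate the calculation.
posterior = posterior_given_positive(
    prior=0.01,                # assumed share of people with the condition
    sensitivity=0.95,          # assumed P(positive | condition)
    false_positive_rate=0.05,  # assumed P(positive | no condition)
)
print(f"P(condition | positive test) = {posterior:.3f}")
```

With these assumed inputs the posterior is only about 16%, because the condition is rare to begin with. The prior matters as much as the accuracy of the test.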

Distributions, meanwhile, are widely used in data science. If data behave in a way that follows a certain pattern, then the chance of obtaining particular values can be described by a specific distribution.
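As a small sketch of this idea, suppose we are willing to assume that some measurement follows a normal distribution. The mean of 170 and standard deviation of 10 below are hypothetical values, not taken from real data. Python's standard `statistics.NormalDist` (available from Python 3.8) can then tell us how likely a value is to fall in a given range.

```python
from statistics import NormalDist

# Assumed model: values are normally distributed (hypothetical parameters).
heights = NormalDist(mu=170, sigma=10)

# Probability that a value falls within one standard deviation of the mean.
p_within_one_sd = heights.cdf(180) - heights.cdf(160)
print(f"P(160 <= x <= 180) = {p_within_one_sd:.4f}")
```

Under this assumption, roughly 68% of values land within one standard deviation of the mean. If the real data do not follow the assumed distribution, the number is not meaningful.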

The importance of DATA PREPARATION

It is important to clean the data before applying statistical techniques or feeding it into a model. Unclean data will affect the overall result of our test. For instance, if we fail to remove records with null input, imagine the effect on our calculated mean and other measures of central tendency. If there are a number of spurious zero values in our data set, our mean would be lower and it would be biased. Remember that a biased model is as good as a wrong model. The same goes for records with unrealistically high values. Take a survey as an example: a single mistyped age (say, ‘200’ instead of ‘20’) would distort our mean and, of course, our measures of data dispersion.
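The survey example above can be sketched in a few lines of Python using only the standard `statistics` module. The ages below are invented purely for illustration.

```python
from statistics import mean, median, stdev

# Hypothetical survey ages, entered correctly.
clean_ages = [20, 22, 19, 21, 20]

# Same survey, but the last respondent's age was mistyped as 200.
typo_ages = [20, 22, 19, 21, 200]

# Same survey with one missing answer (None).
raw_ages = [20, 22, None, 21, 20]
nulls_as_zero = [age if age is not None else 0 for age in raw_ages]
nulls_dropped = [age for age in raw_ages if age is not None]

print("clean:        ", mean(clean_ages), median(clean_ages), stdev(clean_ages))
print("with typo:    ", mean(typo_ages), median(typo_ages), stdev(typo_ages))
print("nulls as zero:", mean(nulls_as_zero))
print("nulls dropped:", mean(nulls_dropped))
```

In this sketch a single typo pulls the mean from 20.4 up to 56.4 and inflates the standard deviation, while the median barely moves. Treating the missing answer as zero drags the mean down to 16.6, compared with 20.75 when the record is dropped.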


Raine Cat

Licensed Electronics Engineer | Aspiring Data Scientist | a little bit stuck in the MiddlE | the name is an oxymoron