Beginners Guide to Machine Learning | Linear Regression

Rohan Kumawat
5 min read · Jun 14, 2021


Linear Regression

When we enter the world of Machine Learning, most of us start with what it is and then move on to Linear Regression. This blog will cover almost everything you need to know about Linear Regression. If you’d like to read about Machine Learning first, I’ve written a blog that explains Machine Learning and its types like you’re a 5-year-old. So if you want to read about it, have a look at it here:

Introduction

Linear Regression is a linear model that assumes a linear relationship between the input variable (x) and the output variable (y). It attempts to model the relationship between the dependent and independent features by fitting a linear equation to the observed data. Before fitting a linear model to observed data, a modeller should first determine whether or not there is a relationship between the variables of interest.

In layman’s terms, it is used to understand the correlation between independent features and a dependent feature.
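As a quick, minimal sketch (the data values here are invented for illustration, not from this post), fitting such a line with scikit-learn looks like this:

```python
# Minimal sketch: fitting a line to made-up data with scikit-learn.
import numpy as np
from sklearn.linear_model import LinearRegression

# Independent feature: hours studied; dependent feature: exam percentage.
hours = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
percentage = np.array([22.0, 41.0, 57.0, 79.0, 94.0])

model = LinearRegression().fit(hours, percentage)
print("slope (m):", model.coef_[0])
print("intercept (c):", model.intercept_)
print("prediction for 6 hours:", model.predict([[6.0]])[0])
```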

Linear Regression is of two types: Simple Linear Regression (a single independent feature) and Multiple Linear Regression (two or more independent features).

The core idea is to obtain a line that best fits the data.

Intuition

Let’s understand the mathematical approach behind Linear Regression. I’ll start with Simple Linear Regression as it’s easier to explain with diagrams. As I said, the core idea is to obtain the best-fit line, and our best-fit line equation looks like this:

y = mx + c

where y is the dependent variable we predict, x is the independent variable, m is the slope of the line, and c is the intercept.

Let’s consider any two variables, one dependent and one independent. We plot both of these on a 2-D scatter plot and build the best-fit line, the line that passes as close as possible to the points on the chart. This best-fit line will help us predict a value when we input new data. The best-fit line will probably look like this:

Best-fit Line
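Here is a small sketch of that picture (again with invented points), using NumPy’s polyfit to compute the line and Matplotlib to draw it:

```python
# Sketch: scatter plot with a best-fit line over made-up data.
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([18, 35, 60, 70, 92, 110], dtype=float)

m, c = np.polyfit(x, y, deg=1)  # degree-1 polynomial = straight line y = mx + c

plt.scatter(x, y, label="observed data")
plt.plot(x, m * x + c, color="red", label=f"best fit: y = {m:.2f}x + {c:.2f}")
plt.xlabel("x (independent)")
plt.ylabel("y (dependent)")
plt.legend()
plt.show()
```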

How do we create the best-fit line?

One option is to draw multiple candidate lines, measure the distance from each data point to each line, and pick the line for which the sum of all these errors is smallest.

Multiple Best-fit Lines

Is there any formula to do so?

Yes, we usually call it “The Cost Function”.

J(m, c) = \frac{1}{2n} \sum_{i=1}^{n} \big( y_i - (m x_i + c) \big)^2

where n is the number of data points, (x_i, y_i) is the i-th observation, and m x_i + c is the line’s prediction for it.
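Written as code, the cost function above is just a few lines (the helper name cost_function and the toy data are my own, not from the post):

```python
import numpy as np

def cost_function(m, c, x, y):
    """Cost J(m, c) = (1/2n) * sum((y_i - (m*x_i + c))^2)."""
    predictions = m * x + c
    return np.mean((y - predictions) ** 2) / 2.0

# Toy data: hours studied vs percentage scored.
x = np.array([1.0, 2.0, 3.0])
y = np.array([20.0, 45.0, 60.0])
print(cost_function(20.0, 0.0, x, y))  # cost for a candidate slope of 20, intercept 0
```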

So, there can be multiple candidate lines. How, then, can we use this equation to find the one best-fit line?

One option would be to draw millions of candidate lines, compute the cost function for each, and pick the line with the lowest cost. But that would demand unnecessarily high computational power. There is a better, more efficient way. Let’s understand it with an example. Here, I’ll pick only three data points so that they are easy to follow. We have two columns, hours studied and percentage, where percentage is the dependent feature. We plot the data using a scatter plot. Here I’m assuming our intercept to be zero.

Example (1)

Next, we pick a slope and, one data point at a time, compute the line’s prediction and its error: first for the first data point, then the second, and so on. From these errors we calculate the cost function. After that, we change our slope slightly, calculate the cost again, and record it in a list. We keep doing this until the cost function is minimised. Plotting the recorded slopes against their costs gives a parabola-shaped curve, and descending this curve to its lowest point is exactly what the Gradient Descent algorithm does.

Example (2)
Gradient Descent
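In code, that slope-sweeping procedure might look like the sketch below (three invented data points, intercept fixed at zero as in the example):

```python
# Sketch: sweep candidate slopes, record each cost, and plot the parabola.
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0])     # hours studied
y = np.array([20.0, 45.0, 60.0])  # percentage scored

slopes = np.linspace(0.0, 40.0, 81)
costs = [np.mean((y - m * x) ** 2) / 2.0 for m in slopes]  # intercept assumed zero

plt.plot(slopes, costs)
plt.xlabel("slope m")
plt.ylabel("cost J(m)")
plt.title("Cost vs slope: a parabola with a single minimum")
plt.show()
```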

Gradient Descent is an optimisation algorithm used to find the parameter values that minimise a cost function. It is the backbone of Deep Learning. Once we have this cost curve, when should we stop searching for the slope (“m”) value that gives our best-fit line?

Gradient Descent (2)

We need to move towards the global minimum of the curve. To capture this idea, statisticians have introduced the convergence theorem. The update rule looks like this:

m := m - \alpha \, \frac{\partial J}{\partial m}

repeated until convergence, where \alpha (the learning rate) controls the size of each step down the curve.
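A minimal gradient descent sketch for the slope-only case ties the update rule and the cost function together (the learning rate and iteration count here are arbitrary choices of mine):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])     # hours studied
y = np.array([20.0, 45.0, 60.0])  # percentage scored

m = 0.0       # initial guess for the slope
alpha = 0.01  # learning rate (the alpha in the update rule)

for _ in range(1000):
    gradient = -np.mean((y - m * x) * x)  # dJ/dm for J(m) = (1/2n) * sum((y - m*x)^2)
    m -= alpha * gradient                 # m := m - alpha * dJ/dm

print("learned slope:", m)  # settles near the minimum of the parabola above
```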

These steps show how a Simple Linear Regression model works. Even if the number of features increases (Multiple Linear Regression), the procedure is the same, but the Gradient Descent graph changes in dimensionality: it can be 3-D, 4-D, or 5-D, depending on the number of features.

We generally deal with multiple features in real life. Why so? If we take the example of how much a person works out per day and their weight, we can’t neglect diet, sleep metrics, and other factors affecting their health. Similarly, we can’t determine how much a student will score in an exam based on hours studied alone.
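For the multiple-feature case, the same scikit-learn API extends directly; here is a sketch with two invented features:

```python
# Sketch: Multiple Linear Regression with two made-up features.
import numpy as np
from sklearn.linear_model import LinearRegression

# Features: [hours studied, hours slept]; target: exam percentage.
X = np.array([[2.0, 8.0],
              [4.0, 6.0],
              [6.0, 7.0],
              [8.0, 5.0]])
y = np.array([45.0, 60.0, 80.0, 85.0])

model = LinearRegression().fit(X, y)
print("coefficients:", model.coef_)  # one slope per feature
print("intercept:", model.intercept_)
print("prediction:", model.predict([[5.0, 7.0]])[0])
```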

Now that we’ve understood the intuition behind it, let’s look at some essential things asked in interviews.

Applications of Linear Regression

  1. Analysing the effectiveness of marketing, pricing, and promotions on the sales of a product.
  2. Generating insights into consumer behaviour and the factors influencing business profitability.
  3. Evaluating trends in business and making estimates or forecasts.

Assumptions

  1. There is a linear relationship between the dependent and independent variables (the Linearity assumption).
  2. The errors have constant variance across all values of the independent variables (Homoscedasticity).
  3. The independent variables in the dataset should not exhibit multicollinearity (see the sketch below for one way to check this).
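One common way to check assumption 3 is the variance inflation factor (VIF); here is a sketch using statsmodels on invented data:

```python
# Sketch: checking multicollinearity with variance inflation factors (VIF).
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Two made-up features; the second is nearly a multiple of the first,
# so we expect a high VIF (values above roughly 5-10 usually signal trouble).
X = np.array([[1.0, 2.1],
              [2.0, 4.0],
              [3.0, 6.2],
              [4.0, 7.9],
              [5.0, 10.1]])

for i in range(X.shape[1]):
    print(f"feature {i}: VIF = {variance_inflation_factor(X, i):.1f}")
```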

Disadvantages of Linear Regression

  1. Prone to under-fitting: a straight line cannot capture non-linear relationships.
  2. Prone to over-fitting, especially with many features and little data.
  3. The algorithm is susceptible to outliers, as the sketch below illustrates.
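To see point 3 concretely, here is a sketch (invented data) comparing the fitted slope with and without a single extreme point:

```python
# Sketch: one outlier can noticeably shift the best-fit line.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # perfectly linear: slope 10

m_clean, c_clean = np.polyfit(x, y, 1)

x_out = np.append(x, 6.0)
y_out = np.append(y, 200.0)  # a single extreme outlier
m_out, c_out = np.polyfit(x_out, y_out, 1)

print("slope without outlier:", round(m_clean, 2))  # 10.0
print("slope with outlier:   ", round(m_out, 2))    # pulled far above 10
```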

This blog covered the intuition behind Linear Regression. To summarise, we need to understand the best-fit line, the cost function, gradient descent, and the convergence theorem. The next blog will show how to code and make predictions using Linear Regression.


Rohan Kumawat

Technology Enthusiast | Data Science | Artificial Intelligence | Books | Productivity | Blockchain