Linear Regression (Supervised Learning/Regression)

Linear regression models the relationship between an input variable and an outcome as a straight line. The resulting equation, assuming there is enough quality data, can help predict outcomes based on inputs.

Example: Suppose we have data on the number of hours spent studying for an exam and the grade. See Table 3-6.

Table 3-6. Hours of study and grades

Hours of Study    Grade (fraction of 100%)
1                 0.75
1                 0.69
1                 0.71
3                 0.82
3                 0.83
4                 0.86
5                 0.85
5                 0.89
5                 0.84
6                 0.91
6                 0.92
7                 0.95

As you can see, the general relationship is positive: more hours of study tend to go with a higher grade. With the regression algorithm, we can plot the line of best fit. This is done with a calculation called “least squares,” which minimizes the sum of the squared errors (the vertical distances between the data points and the line). See Figure 3-4.

Figure 3-4. A plot of the linear regression model based on hours of study

From this, we get the following equation:

Grade = Number of hours of study × 0.03731 + 0.6889
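
To make the least-squares step concrete, here is a minimal sketch in plain Python that fits a line to the data in Table 3-6 using the standard closed-form formulas. The function name fit_line and the variable names are illustrative choices, not something from the text, and the sketch assumes the table values are read as hours 1 through 7 and grades as fractions.

```python
# Data from Table 3-6: hours of study and grade (as a fraction of 100%).
hours = [1, 1, 1, 3, 3, 4, 5, 5, 5, 6, 6, 7]
grades = [0.75, 0.69, 0.71, 0.82, 0.83, 0.86, 0.85, 0.89, 0.84, 0.91, 0.92, 0.95]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of x and y divided by the variation of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    # The least-squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(hours, grades)
print(f"Grade = hours x {slope:.5f} + {intercept:.4f}")
```

Run on the table's data, the slope and intercept should agree with the equation above to the precision shown there.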

Then, let’s suppose you study 4 hours for the exam. What will your estimated grade be? The equation gives the answer:

Grade = 4 × 0.03731 + 0.6889 ≈ 0.838
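
The same arithmetic can be wrapped in a small helper so that other study times are easy to try. This is only a sketch: the name predict_grade is hypothetical, and the two constants are the coefficients quoted in the text.

```python
# Coefficients quoted in the text for the fitted line.
SLOPE = 0.03731
INTERCEPT = 0.6889


def predict_grade(hours_of_study):
    """Estimate the grade (as a fraction of 100%) from the hours studied."""
    return hours_of_study * SLOPE + INTERCEPT


estimate = predict_grade(4)
print(f"Estimated grade for 4 hours: {estimate:.3f}")
```

Keep in mind that the estimate is only meaningful for study times near the range covered by the data (1 to 7 hours).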

How accurate is this? To help answer this question, we can use a calculation called R-squared, which ranges from 0 to 1. The closer the value is to 1, the better the fit. In our case, it is 0.9180, which is quite high. It means that hours of study explain 91.8% of the variation in the exam grades.
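
As a rough illustration of how R-squared is computed, the sketch below compares the squared errors left over after fitting the line with the total variation of the grades around their mean. The function name r_squared is an illustrative choice, and the data are the values from Table 3-6.

```python
# Data from Table 3-6: hours of study and grade (as a fraction of 100%).
hours = [1, 1, 1, 3, 3, 4, 5, 5, 5, 6, 6, 7]
grades = [0.75, 0.69, 0.71, 0.82, 0.83, 0.86, 0.85, 0.89, 0.84, 0.91, 0.92, 0.95]


def r_squared(xs, ys):
    """Return R-squared for the least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Fit the line with the closed-form least-squares formulas.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares: variation the line fails to explain.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares: variation of the grades around their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


r2 = r_squared(hours, grades)
print(f"R-squared: {r2:.4f}")
```

On the table's data the result should come out close to the value quoted above.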

Now it’s true that this model is simplistic. To better reflect reality, you can add more variables to explain the grade on the exam, such as the student’s attendance. When doing this, you will use something called multiple regression (often loosely called multivariate regression).
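
To show what adding a second variable looks like, here is a small sketch that fits a model with two inputs by solving the least-squares equations directly. Everything in it is hypothetical: the attendance values are made up, and the grades are generated from a known formula (0.5 + 0.03 × hours + 0.2 × attendance) purely so that we can check that the fit recovers the coefficients. None of these numbers come from the text.

```python
# Synthetic, made-up data for illustration only (not from the text).
hours = [1, 2, 3, 4, 5, 6]
attendance = [0.9, 0.5, 0.8, 0.6, 1.0, 0.7]  # hypothetical fraction of classes attended
grades = [0.5 + 0.03 * h + 0.2 * a for h, a in zip(hours, attendance)]


def fit_two_inputs(x1, x2, y):
    """Return (intercept, b1, b2) for the least-squares fit y = intercept + b1*x1 + b2*x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centered sums of squares and cross-products.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (v - my) for a, v in zip(x1, y))
    s2y = sum((b - m2) * (v - my) for b, v in zip(x2, y))
    # Solve the 2x2 system for the two slopes with Cramer's rule.
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    intercept = my - b1 * m1 - b2 * m2
    return intercept, b1, b2


intercept, b_hours, b_attendance = fit_two_inputs(hours, attendance, grades)
print(f"Grade = {intercept:.3f} + {b_hours:.3f} x hours + {b_attendance:.3f} x attendance")
```

With real data the fit would not be exact, and the approach breaks down if the two inputs move in lockstep (the determinant becomes zero). Statistical libraries handle any number of inputs more robustly.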

Note

If the coefficient for a variable is quite small relative to the scale of that variable, then it might be a good idea to leave the variable out of the model.

Sometimes the data may not follow a straight line, in which case linear regression will not fit it well. But you can use a more complex version, called polynomial regression.
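
The sketch below illustrates the idea on synthetic points that follow a curve exactly (y = x²). These points are made up for illustration and do not come from the text. The helper names solve and fit_polynomial are hypothetical; the fit solves the least-squares "normal equations" with basic Gaussian elimination, which is one simple way to do it, not necessarily what a statistics package does internally.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution, from the last row upward.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_polynomial(xs, ys, degree):
    """Return least-squares coefficients [c0, c1, ...] for c0 + c1*x + c2*x**2 + ..."""
    size = degree + 1
    # Normal equations: for each i, sum over j of sum(x**(i+j)) * c_j = sum(y * x**i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve(a, b)


# Synthetic, made-up points that follow y = x**2 exactly (not from the text).
xs = [0, 1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

line = fit_polynomial(xs, ys, degree=1)   # straight line: misses the curvature
curve = fit_polynomial(xs, ys, degree=2)  # quadratic: can follow the curve
print("straight line:", [round(c, 3) for c in line])
print("quadratic:    ", [round(c, 3) for c in curve])
```

A degree of 1 reduces to ordinary linear regression. Raising the degree lets the curve bend, but a very high degree tends to chase noise in real data.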

