Linear regression is one of the most widely used techniques in statistics. It measures the relationship between one or more predictor variables and a response variable. Simple linear regression in R models the relationship between a single predictor variable and a response variable; if you have several predictor variables, use multiple linear regression instead. Before diving into the specifics, let’s look at a real-world example of linear regression.

A real-world example of linear regression

Businesses use linear regression to understand the link between advertising and revenue. For example, a business can fit a linear regression model that relates its advertising expenditure to its revenue. Such a model looks like this:


$$\text{revenue} = \beta_0 + \beta_1 \times (\text{advertising spend})$$

where:

$\beta_0$ = expected revenue when advertising spend is zero

$\beta_1$ = average change in revenue for each one-unit increase in advertising spend


There are three cases for the coefficient $\beta_1$:


  1. If $\beta_1$ is negative, higher advertising spend is associated with lower revenue.


  1. If $\beta_1$ is close to zero, advertising spend has little effect on revenue.


  1. If $\beta_1$ is positive, higher advertising spend is associated with higher revenue.


Based on the model’s $\beta_1$ value, a company can therefore decide whether to increase or decrease its advertising spend.
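The advertising example can be sketched in R. This is a minimal illustration on synthetic data, not real business figures: the data frame `ads`, its columns, the formula used to generate revenue, and the helper `interpret_slope()` (with its `tol` threshold for "close to zero") are all hypothetical. The model is fitted with `lm()` and $\beta_1$ is read from `coef()`.

```r
# Synthetic data for illustration only: revenue is generated as
# 100 + 3 * spend plus random noise, so the true beta_1 is 3.
set.seed(42)
ads <- data.frame(spend = 1:20)
ads$revenue <- 100 + 3 * ads$spend + rnorm(20, mean = 0, sd = 2)

# Fit revenue = beta_0 + beta_1 * spend
fit <- lm(revenue ~ spend, data = ads)
beta_1 <- coef(fit)[["spend"]]

# Hypothetical helper: map the slope to one of the three cases.
# The tolerance for "close to zero" is an arbitrary assumption.
interpret_slope <- function(b, tol = 0.1) {
  if (abs(b) < tol) {
    "little effect: advertising spend barely changes revenue"
  } else if (b > 0) {
    "positive: more advertising spend is associated with more revenue"
  } else {
    "negative: more advertising spend is associated with less revenue"
  }
}

print(coef(fit))
cat(interpret_slope(beta_1), "\n")
```

In practice the threshold for "close to zero" is better judged from the coefficient's standard error (reported by `summary(fit)`) than from a fixed tolerance.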


Simple linear regression in R


Simple linear regression lets you examine the relationship between two quantitative variables. The independent variable is denoted by x and the dependent variable by y, and the two are assumed to be linearly related. The model predicts the response value y as a function of the independent variable x.


The table below lists the dependent variable (salary) for each value of the independent variable (years of experience).


Salary data

Experience (years)   Salary
1.3                  46000.00
1.5                  37000.00
2.0                  43000.00
2.2                  39000.00
2.9                  56000.00
3.0                  60000.00
3.2                  54000.00
3.2                  64000.00
3.7                  57000.00


For n observations (one per row of the table above), the variables are:

$$x = [x_1, x_2, \ldots, x_n]$$

$$y = [y_1, y_2, \ldots, y_n]$$
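In R, the table can be entered as two numeric vectors with one value per observation. The names `x`, `y`, and `salary_data` below are illustrative choices for this sketch, not taken from the original post.

```r
# Years of experience (x) and salary (y), copied from the table above
x <- c(1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7)
y <- c(46000, 37000, 43000, 39000, 56000, 60000, 54000, 64000, 57000)

n <- length(x)            # number of observations (rows in the table)
stopifnot(length(y) == n) # x and y must pair up one-to-one

# The same data as a data frame, which is what lm() usually receives
salary_data <- data.frame(experience = x, salary = y)
print(salary_data)

# Pearson correlation between experience and salary
print(cor(x, y))
```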


The scatter plot below shows salary against years of experience:

[Figure: scatter plot of salary against years of experience]
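A scatter plot like this can be drawn with base R's `plot()`. The sketch assumes base graphics rather than any particular plotting package; the labels and point style are arbitrary choices.

```r
# Years of experience and salary from the table
x <- c(1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7)
y <- c(46000, 37000, 43000, 39000, 56000, 60000, 54000, 64000, 57000)

# One filled point per observation
plot(x, y,
     xlab = "Years of experience",
     ylab = "Salary",
     main = "Salary vs. years of experience",
     pch = 19)
```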


The next step is to find the line that best fits the points in the scatter plot. This best-fit line is called the regression line, and its equation is:


$$\hat{y} = a + bx$$

where:

x is the feature (predictor variable),

$\hat{y}$ is the predicted response,

a is the y-intercept, and

b is the slope.
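Once a and b are known, a prediction is simply $\hat{y} = a + bx$. The sketch below uses the salary data from the table and an arbitrary new value of 3.5 years (both assumptions for illustration). It estimates a and b with `lm()`, computes a prediction by hand, checks it against `predict()`, and draws the fitted line with `abline()`.

```r
x <- c(1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7)
y <- c(46000, 37000, 43000, 39000, 56000, 60000, 54000, 64000, 57000)

fit <- lm(y ~ x)
a <- coef(fit)[["(Intercept)"]]  # y-intercept
b <- coef(fit)[["x"]]            # slope

# Prediction for 3.5 years of experience (an arbitrary example value)
new_x <- 3.5
y_hat_manual  <- a + b * new_x
y_hat_predict <- predict(fit, newdata = data.frame(x = new_x))
cat("by hand:", y_hat_manual, " predict():", unname(y_hat_predict), "\n")

# Scatter plot with the fitted line y-hat = a + b * x
plot(x, y, xlab = "Years of experience", ylab = "Salary", pch = 19)
abline(a = a, b = b, col = "blue")
```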


To build a predictive model, you must first estimate both a and b. Once you know the values of both coefficients, you can quickly predict the response for any value of x. The coefficients are estimated with the method of least squares, which identifies the curve that best fits the data. Suppose the fitted curve is:


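$$y = f(x) \qquad (1)$$


At $x = x_1$ the observed value is $y_1$, while the value predicted by the curve is $f(x_1)$. The difference between the two, called the residual, is:


$$e_1 = y_1 - f(x_1) \qquad (2)$$


Similarly, the other residuals are:


$$e_2 = y_2 - f(x_2) \qquad (3)$$

$$\vdots$$

$$e_n = y_n - f(x_n) \qquad (4)$$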


Some of these residuals are positive and some are negative, so simply adding them together would let them cancel out. To account for both signs, the least squares method squares each residual and chooses the curve that minimizes their sum:

$$E = \sum_{i=1}^{n} \left[y_i - f(x_i)\right]^2$$
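To see why the residuals are squared, the sketch below computes them for the least-squares straight line on the salary data. `sse()` is a hypothetical helper that evaluates E for any candidate line y = a + bx (a straight line is assumed here as the special case of f); the shifts of 1000 used for the comparison are arbitrary.

```r
x <- c(1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7)
y <- c(46000, 37000, 43000, 39000, 56000, 60000, 54000, 64000, 57000)

# Hypothetical helper: E = sum of squared residuals for the line y = a + b * x
sse <- function(a, b, x, y) {
  e <- y - (a + b * x)
  sum(e^2)
}

fit <- lm(y ~ x)
a_ls <- coef(fit)[[1]]  # least-squares intercept
b_ls <- coef(fit)[[2]]  # least-squares slope

# Residuals of the least-squares line: some positive, some negative
e <- y - (a_ls + b_ls * x)
print(round(e))

# With an intercept in the model, the raw residuals sum to (numerically) zero,
# so a plain sum lets positive and negative errors cancel completely
print(sum(e))

# Moving the line away from the least-squares solution increases E
E_best    <- sse(a_ls, b_ls, x, y)
E_shifted <- sse(a_ls + 1000, b_ls, x, y)
E_tilted  <- sse(a_ls, b_ls + 1000, x, y)
print(c(best = E_best, shifted = E_shifted, tilted = E_tilted))
```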




Least squares for a straight line


Suppose you have n observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ and want to fit the straight line:


$$y = a + bx$$


The residual for the i-th observation is:


$$e_i = y_i - (a + b x_i)$$


The sum of the squared residuals is:

$$E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left[y_i - (a + b x_i)\right]^2$$




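E is a function of a and b. To find the values of a and b that minimize E, set both partial derivatives to zero:

$$\frac{\partial E}{\partial a} = -2 \sum_{i=1}^{n} \left[y_i - (a + b x_i)\right] = 0, \qquad \frac{\partial E}{\partial b} = -2 \sum_{i=1}^{n} x_i \left[y_i - (a + b x_i)\right] = 0$$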




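These conditions give:

$$\sum_{i=1}^{n} y_i = n a + b \sum_{i=1}^{n} x_i$$

$$\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2$$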




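These two equations, which are solved to obtain a and b, are called the normal equations. Solving them gives:

$$b = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}, \qquad a = \bar{y} - b\,\bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the means of x and y.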
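The normal equations form a 2-by-2 linear system in a and b, so they can be solved directly in R with `solve()`. The sketch below, using the salary data from the table, solves the system in matrix form and with the closed-form expressions, then compares both results with the coefficients reported by `lm()`. The object names are illustrative.

```r
x <- c(1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7)
y <- c(46000, 37000, 43000, 39000, 56000, 60000, 54000, 64000, 57000)
n <- length(x)

# Normal equations in matrix form:
#   [ n        sum(x)   ] [a]   [ sum(y)   ]
#   [ sum(x)   sum(x^2) ] [b] = [ sum(x*y) ]
A   <- matrix(c(n,      sum(x),
                sum(x), sum(x^2)), nrow = 2, byrow = TRUE)
rhs <- c(sum(y), sum(x * y))
ab  <- solve(A, rhs)  # ab[1] = a (intercept), ab[2] = b (slope)

# Closed-form solution of the same two equations
b_closed <- (n * sum(x * y) - sum(x) * sum(y)) / (n * sum(x^2) - sum(x)^2)
a_closed <- mean(y) - b_closed * mean(x)

print(c(a = ab[1], b = ab[2]))
print(c(a = a_closed, b = b_closed))
print(coef(lm(y ~ x)))  # lm() reaches the same least-squares estimates
```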




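The basic syntax for linear regression in R is:


lm(y ~ model)


Here y is the dependent (response) variable and model is the right-hand side of the formula, that is, the predictor variable or variables. lm() fits the model; printing the returned object shows the fitted coefficients without further statistical information, while summary() adds standard errors, p-values, and R-squared.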
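Applied to the salary data, the syntax looks like this. The data frame and column names (`salary_data`, `experience`, `salary`) are assumptions for this sketch; `print()` shows only the coefficients, while `summary()` adds the fuller statistical output.

```r
salary_data <- data.frame(
  experience = c(1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7),
  salary     = c(46000, 37000, 43000, 39000, 56000, 60000, 54000, 64000, 57000)
)

# salary is the response (y); experience is the model term on the right of ~
model <- lm(salary ~ experience, data = salary_data)

print(model)           # the call and the coefficients only
print(summary(model))  # residuals, standard errors, t values, p-values, R-squared
```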


R simple linear regression example


Visualizing the training set results:
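A minimal sketch of this step in base R follows. The details are assumptions rather than the original post's code: a random split with `sample()` that keeps 6 of the 9 rows for training, an arbitrary seed, illustrative object names (`training_set`, `test_set`, `regressor`), and base graphics instead of any plotting package. The model is fitted on the training rows only, and its regression line is drawn over them.

```r
salary_data <- data.frame(
  experience = c(1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7),
  salary     = c(46000, 37000, 43000, 39000, 56000, 60000, 54000, 64000, 57000)
)

# Random split: 6 rows for training, the remaining 3 for testing (assumed ratio)
set.seed(123)
train_idx    <- sample(seq_len(nrow(salary_data)), size = 6)
training_set <- salary_data[train_idx, ]
test_set     <- salary_data[-train_idx, ]

# Fit the model on the training set only
regressor <- lm(salary ~ experience, data = training_set)

# Training points with the fitted regression line
plot(training_set$experience, training_set$salary,
     xlab = "Years of experience", ylab = "Salary",
     main = "Salary vs. experience (training set)",
     pch = 19, col = "red")
abline(regressor, col = "blue")
```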




[Figure: training set results]


Visualizing the test set results:
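The test-set view can be sketched in base R under these assumptions: a random split with `sample()` that holds out 3 of the 9 rows, an arbitrary seed, illustrative object names, and base graphics. The regression line is the one learned from the training rows; `predict()` gives the model's salary estimates for the held-out rows, which are plotted as crosses on that line next to the actual salaries.

```r
salary_data <- data.frame(
  experience = c(1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7),
  salary     = c(46000, 37000, 43000, 39000, 56000, 60000, 54000, 64000, 57000)
)

# Random split: 6 rows for training, 3 held out for testing (assumed ratio)
set.seed(123)
train_idx    <- sample(seq_len(nrow(salary_data)), size = 6)
training_set <- salary_data[train_idx, ]
test_set     <- salary_data[-train_idx, ]

# The model never sees the test rows during fitting
regressor <- lm(salary ~ experience, data = training_set)

# Predicted salaries for the held-out rows
test_pred <- predict(regressor, newdata = test_set)

# Actual test salaries (red dots), the training-set line, and predictions (crosses)
plot(test_set$experience, test_set$salary,
     xlab = "Years of experience", ylab = "Salary",
     main = "Salary vs. experience (test set)",
     pch = 19, col = "red")
abline(regressor, col = "blue")
points(test_set$experience, test_pred, pch = 4, col = "blue")
```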






Linear regression is one of the most common forms of predictive analysis. It uses statistics to estimate the relationship between independent and dependent variables, and it is used in business, medical research, agricultural research, data analysis, and other fields. This blog post described how to apply linear regression in R. If you still have questions, let us know in the comments below, and our professionals will respond to your query.