CHAPTER 06.03: LINEAR REGRESSION: Derivation: Part 1 of 2

In this segment, we're going to talk about linear regression, and we're going to derive the expression for the linear regression model. Now, what regression is all about is that you are trying to best fit some given data. So let me write this down first, and then I will show you graphically.

You are given certain data points, let's suppose (x1, y1), (x2, y2), all the way up to (xn, yn). So you're given n data points, and what you want to do is best fit a straight line to them. Let's call that straight line y = a0 + a1 x, which is the general form of a straight line, where a0 is the intercept and a1 is the slope.

So on the graph, you are showing all those n data points, for example (x1, y1) and some ith data point (xi, yi), and then you are trying to draw the straight line y = a0 + a1 x that best fits those data points.

Eventually, what you want to do is minimize the difference between the observed values, which are the ones given to you, and the values the straight line will predict. But you have such a difference at every one of the data points, not just at one.
You have a difference here, a difference here, a difference here, so everywhere there is a difference between what you are observing and what you are predicting. The cross is the value you are predicting, and the dot is the value you are observing.

The predicted value at the ith data point is simply a0 + a1 xi, because that is what the straight line gives when you put x equal to xi, and the observed value is yi. The difference between the two is called the residual; some people call it the error. So there is a residual at each and every data point, which is the observed value minus the predicted value from the straight line, that is, yi - a0 - a1 xi.

Now, you want to somehow make these residuals small everywhere, and one of the criteria used is to square the residual at each data point and then add all those squares up. That sum is called Sr, which stands for the sum of the squares of the residuals:

Sr = sum from i = 1 to n of (yi - a0 - a1 xi)^2

That's what you want to make as small as possible. This criterion is called the least squares method of finding the regression model: you are squaring the errors, adding them all up, and trying to minimize the total, and that's why it's called least squares regression.
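As a minimal sketch of the residual and Sr calculation just described, here is a short Python example. The function names, the four data points, and the two candidate lines are all made up for illustration; none of them come from the lecture.

```python
def predict(a0, a1, x):
    """Predicted value from the straight line y = a0 + a1*x."""
    return a0 + a1 * x


def residuals(a0, a1, xs, ys):
    """Residual at each data point: observed value minus predicted value."""
    return [y - predict(a0, a1, x) for x, y in zip(xs, ys)]


def sum_of_squared_residuals(a0, a1, xs, ys):
    """Sr: square the residual at each point, then add the squares."""
    return sum(r ** 2 for r in residuals(a0, a1, xs, ys))


# Hypothetical data points (x_i, y_i), chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]

# Two candidate lines: y = 0 + 2x and y = 1 + 1x.
sr_a = sum_of_squared_residuals(0.0, 2.0, xs, ys)
sr_b = sum_of_squared_residuals(1.0, 1.0, xs, ys)

print(sr_a)  # 1.0
print(sr_b)  # 11.0
```

With this made-up data, the line y = 2x gives a smaller Sr than the line y = 1 + x, so by the least squares criterion it is the better fit of the two. Neither is claimed to be the best possible line.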
So you want to minimize the summation Sr. Now, when somebody's telling you to do regression, the observed values are given, and the only control you have is over the constants of the regression model. In this case we are doing linear regression, so a0 and a1 are the two constants of the regression model. You want to find those. They are in your control, and you want to choose them in such a fashion that this whole summation becomes as small as possible.

You generally cannot make it exactly equal to 0. If it were exactly 0, the straight line would have to go through all the data points, which is not going to be the case when you are doing regression. So you want to make it as small as possible.

What that means is that we have to take derivatives with respect to a0 and a1 and put those equal to 0, going back to your differential calculus class, to minimize the summation. So I'll take the derivative of the sum of the squares of the residuals, Sr, with respect to a0 and put that equal to 0, and take the derivative of Sr with respect to a1 and put that equal to 0. Once I do that, I'll get two equations and two unknowns, and from those I'll be able to find what a0 and a1 are.
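Written out with the chain rule, the two conditions take the following form. This is a sketch of where the derivation is heading, using the same symbols as the segment; it stops at the two equations and does not solve them.

```latex
S_r = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2

\frac{\partial S_r}{\partial a_0}
  = -2 \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right) = 0

\frac{\partial S_r}{\partial a_1}
  = -2 \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right) x_i = 0
```

The data points (xi, yi) are known numbers, so these are two linear equations in the two unknowns a0 and a1.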