CHAPTER 06.03: LINEAR REGRESSION: Background: Part 2 of 2

 

So what that means is that the summation of the errors itself, this here, is not a good criterion. So somebody might say, hey, the reason why you think it's not a good criterion is because you had these negative errors and positive errors, and the negative and positive errors simply cancelled out, and that's why you don't consider it to be a good criterion.

So maybe somebody might suggest, hey, is the sum of the absolute values of the residuals a better criterion? If you go ahead and add the absolute values of all the residuals, then you don't have to worry about the negatives cancelling with the positives. Is that a good criterion?

Now, if you take the same example, where we had y is equal to 4 x, minus 4, and you go ahead and find the summation of the absolute residuals, for i equal to 1 to 4, you will get 4, because rather than -2 cancelling with +2, you get 2 plus 2 is 4. Now, if you take the line y is equal to 6, and you sum the absolute values of all the residuals, you will get 4 again. Again, you have the same problem of not getting a unique line.

Now, somebody might ask me, hey, who are you to tell us that 4 is the minimum you can get for the sum of the absolute values of the residuals? Go ahead and try it. You will find that no matter how hard you try, no matter what straight line you choose, the sum of the absolute values of the residuals is never going to be less than 4, and that can be a good exercise for you to do. I'm leaving it to you as homework: if you can find some other straight line for which the sum of the absolute values of the residuals is less than 4, then what I've shown you here is moot, it is not correct. But I can guarantee you that won't happen, that you won't find any other line for which the sum of the absolute values of the errors, or the residuals, is less than 4.

So again, we have the problem of not getting a unique line. So this is not a good criterion to use. So I'm going to cross this off, it's not a good criterion to use.

So what is a good criterion to use? A good criterion is not to sum the errors, and not to sum the absolute values of the errors, but to square the errors, which of course takes care of them being negative and positive, because the square of the error is never negative, and then you add those squares, and you minimize that sum. So you don't minimize the sum of the errors, and you don't minimize the sum of the absolute values of the residuals, but you minimize the sum of the squares of the residuals, and that's what's going to be the criterion for your least squares regression. And that's the end of this segment.
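
The comparison described above can be checked numerically. What follows is a minimal sketch, not part of the lecture itself. It assumes the four data points from Part 1 are (2, 4), (3, 6), (2, 6) and (3, 8); they are not restated in this segment, so treat them as an assumption. The function names are made up for illustration, and the closed-form least squares formulas are the standard ones, which this segment does not derive.

```python
# Assumed data points from Part 1 (not restated in this segment).
points = [(2.0, 4.0), (3.0, 6.0), (2.0, 6.0), (3.0, 8.0)]


def residuals(a0, a1, data):
    """Residuals y_i - (a0 + a1 * x_i) for the line y = a0 + a1 * x."""
    return [y - (a0 + a1 * x) for x, y in data]


def sum_abs_residuals(a0, a1, data):
    """Sum of the absolute values of the residuals."""
    return sum(abs(r) for r in residuals(a0, a1, data))


def sum_sq_residuals(a0, a1, data):
    """Sum of the squares of the residuals."""
    return sum(r * r for r in residuals(a0, a1, data))


def least_squares_line(data):
    """Standard closed-form intercept and slope minimizing the sum of squares."""
    n = len(data)
    sx = sum(x for x, _ in data)
    sy = sum(y for _, y in data)
    sxx = sum(x * x for x, _ in data)
    sxy = sum(x * y for x, y in data)
    a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a0 = (sy - a1 * sx) / n
    return a0, a1


# The two lines discussed in the lecture, written as (intercept, slope).
candidates = {"y = 4x - 4": (-4.0, 4.0), "y = 6": (6.0, 0.0)}
for name, (a0, a1) in candidates.items():
    print(name,
          "sum |r| =", sum_abs_residuals(a0, a1, points),
          "sum r^2 =", sum_sq_residuals(a0, a1, points))

a0_ls, a1_ls = least_squares_line(points)
print("least squares line: intercept", a0_ls, "slope", a1_ls,
      "sum r^2 =", sum_sq_residuals(a0_ls, a1_ls, points))
```

With the assumed data, both lines from the lecture give a sum of absolute residuals of 4, so that criterion cannot tell them apart. Their sums of squared residuals are both 8, while the line y = 2x + 1 returned by the closed-form formulas gives 4, which is the kind of single best line the squared criterion is meant to produce.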