CHAPTER 06.05: ADEQUACY OF REGRESSION MODELS: Introduction

 

In this segment, we're going to talk about the adequacy of regression models. Suppose somebody gives you data like this, says it is experimental data of y versus x, and wants you to fit some kind of regression model to it, whether that is a straight line, a polynomial model, an exponential model, and so on. The question then arises: once we have developed that model, how do we know that it's adequate? With the data you are seeing here, I could regress it to a straight line, or I could regress it to a second-order polynomial; there are many options available to me.
On the next slide, here is the straight-line regression of the data. It looks like a reasonable fit. Some people might argue that the data looks more like a parabolic curve than a straight line, but you can see that the straight-line regression model is very close to the data points themselves.
Now, we could go one step further and use a second-order polynomial rather than a first-order polynomial to do the regression. The second-order polynomial regression you are seeing here is y = a0 + a1 x + a2 x^2 (the variable here is x, not T), so that's what we get as the fit. If we look at the data again with the first-order polynomial, shown by the red line, and the second-order polynomial, shown by the blue line, what we are trying to figure out is: which model should we be choosing?
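To make the choice between the two models concrete, here is a minimal sketch in Python with NumPy. It fits both a first-order and a second-order polynomial to the same points. The x and y values are made up for illustration and are not the data on the slide, and the names `sum_sq_residuals`, `sr_lin`, and `sr_quad` are hypothetical.

```python
import numpy as np

# Hypothetical (x, y) data, invented for illustration; NOT the data on the slide.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 4.8, 7.1, 9.8])

# np.polyfit returns coefficients from the highest power down.
a1_lin, a0_lin = np.polyfit(x, y, 1)               # y = a0 + a1 x
a2_quad, a1_quad, a0_quad = np.polyfit(x, y, 2)    # y = a0 + a1 x + a2 x^2

# Model values at the observed x for each fit.
y_lin = a0_lin + a1_lin * x
y_quad = a0_quad + a1_quad * x + a2_quad * x**2


def sum_sq_residuals(y_obs, y_model):
    """Sum of the squared differences between observed and model values."""
    return float(np.sum((y_obs - y_model) ** 2))


sr_lin = sum_sq_residuals(y, y_lin)
sr_quad = sum_sq_residuals(y, y_quad)

print("straight line: a0 =", a0_lin, "a1 =", a1_lin, "Sr =", sr_lin)
print("second order:  a0 =", a0_quad, "a1 =", a1_quad, "a2 =", a2_quad, "Sr =", sr_quad)
```

On the same data, the second-order least-squares fit can never have a larger sum of squared residuals than the straight line, because the straight line is the special case a2 = 0. So a smaller residual alone does not settle which model to choose, which is why the question of adequacy needs more than one check.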
On what basis should we choose the model, and once we have chosen it, based on the physics of the problem or otherwise, how do we know that the model is adequate? There are two questions we have to answer. The first is: does the model we have chosen describe the data adequately? That is, is there an adequate fit? The second is: how well does the model predict the response variable? What that means is that if we are given y versus x data and we have already done the regression modeling, how well does y get predicted once we choose a certain value of x? That's what we need to understand about the quality of the fitted model.
We're going to limit our discussion in these segments to the adequacy of straight-line regression models only. The reason is that there is an opinion that, for nonlinear regression models, the things you should be looking at are different from what we look at for straight-line regression models. But if you understand our discussion of straight-line regression models, then when you later have to decide whether a nonlinear model is adequate, you'll be able to use some of the same principles; at the same time, you'll have to figure out whether those principles are also valid for nonlinear models.
There are four checks we're going to talk about. Keep in mind that checking the adequacy of a regression model is not limited to these four checks, but if you make at least these four, you are in much better shape than most other people who look at regression models and try to understand whether they're adequate.
Most people simply calculate the coefficient of determination to figure out whether a particular model is adequate, but that is far from the whole story. So here is what we're going to do: first, plot the data and the model; second, find the standard error of estimate; third, calculate the coefficient of determination; and fourth, make several checks, all under one umbrella, of whether the model meets the assumption of random errors, that is, that the errors in the data are random as opposed to having some kind of form to them. When we do these four checks, we will be able to figure out whether a particular model is adequate or not.
The example we're going to take in the next segment uses this data, where you are given alpha, the expansion coefficient, as a function of temperature for six data points. We are choosing only six data points out of the 22 that are given, to keep the example simple. For the first three checks, we're going to use only the six data points, but when we get to check number 4, which is about random errors, we're going to use all 22 data points. This is being done only to keep things simple. It would be nice to take all 22 data points and show you the calculations, but those would be lengthy. So bear with me: we're going to take these six data points right here and regress them to a straight line, which is something you learned how to do in the previous segments, and what we're going to concentrate on is whether the model we draw through these six data points is adequate or not.
And that's the end of this segment.
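As a preview of the quantities named in the four checks, here is a minimal sketch in Python with NumPy for a straight-line fit. The temperature and alpha values below are invented for illustration and are not the six data points from the slide. The formulas for the standard error of estimate and the coefficient of determination are the usual textbook definitions, which this segment does not state, so treat them as an assumption about what the later segments use. The names `T_hyp`, `alpha_hyp`, `s_yx`, and `r2` are hypothetical.

```python
import numpy as np

# Hypothetical data, invented for illustration only; NOT the data from the slide.
T_hyp = np.array([-300.0, -220.0, -140.0, -60.0, 20.0, 100.0])  # temperature
alpha_hyp = np.array([2.5, 3.6, 4.5, 5.3, 6.0, 6.5])            # expansion coefficient (scaled units)

n = len(T_hyp)

# Straight-line regression: alpha = a0 + a1 * T
a1, a0 = np.polyfit(T_hyp, alpha_hyp, 1)
alpha_fit = a0 + a1 * T_hyp

# Check 1 would be a plot of the data and the line; here we only tabulate
# observed versus fitted values as a numeric stand-in for that plot.
for t, obs, fit in zip(T_hyp, alpha_hyp, alpha_fit):
    print(f"T = {t:7.1f}  observed = {obs:.3f}  fitted = {fit:.3f}")

# Residuals: observed minus predicted.
residuals = alpha_hyp - alpha_fit

# Sr: sum of squared residuals about the line.
# St: sum of squared deviations about the mean of the observed values.
Sr = float(np.sum(residuals ** 2))
St = float(np.sum((alpha_hyp - alpha_hyp.mean()) ** 2))

# Check 2: standard error of estimate for a straight line (two fitted constants).
s_yx = np.sqrt(Sr / (n - 2))

# Check 3: coefficient of determination.
r2 = (St - Sr) / St

# Check 4 (a first look only): the signs of the residuals in order of T.
# Long runs of the same sign suggest the errors have a form rather than being random.
signs = np.sign(residuals)

print("standard error of estimate:", s_yx)
print("coefficient of determination:", r2)
print("residual signs:", signs)
```

With only six points, the residual signs are a rough hint at best, which is consistent with using all 22 points for the check on random errors.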