**Prediction**

**INTERVAL DATA -- Bivariate Distribution**

A. A review of predictive techniques:

| Education (X) | Income (Y): Hi | Income (Y): Lo |
|---|---|---|
| Hi | 14 | 2 |
| Lo | 6 | 11 |

Here one can easily see a positive relationship:

hi education--hi income

low education--low income

It is easy to predict using this table: for a person with a low
income, one would predict a low level of education.

Data in different form:

| Person | Education X | Income Y |
|---|---|---|
| A | 10 | 9 |
| B | 8 | 8 |
| C | 7 | 6 |
| PREDICT | 6 | 5 |
| D | 5 | 4 |
| E | 2 | 2 |

Using this information: If a person has education 8, predict an
income of 8. If a person has income 5, no one in the table has exactly
that income, but 5 falls between the incomes of C (6) and D (4), so one
can still predict that the person's education would be about 6.

A scatter diagram can be made to represent the relationship between
the two variables: X = Education, Y = Income.

A straight line approximately describes the relationship between
X and Y. One can use it to predict: for a person with an income
of 5, one would predict an education of about 6.

B. Equation of a Line: **Y = a + bX**

**Where: X and Y are the variables used for prediction**

**a = "Y-intercept"**

**b = slope of the line**

To get the slope, take any point not on the line and measure its vertical and horizontal distance from the line. The vertical distance is "P" and the horizontal distance is "Q." The slope equals P / Q. In this example, **a = 2** and **b = 2 / 2 = 1**, so the equation of this line is:

**Y = 2 + 1X**
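The slope-and-intercept reading can be sketched in Python. This is only an illustration, not part of the original notes: the function name `line_through` and the two sample points (0, 2) and (2, 4) are assumptions, chosen so that P = 2 and Q = 2 as in the example.

```python
def line_through(p1, p2):
    """Return (a, b) for the line Y = a + bX through two points."""
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2:
        raise ValueError("a vertical line has no finite slope")
    p = y2 - y1      # vertical distance P
    q = x2 - x1      # horizontal distance Q
    b = p / q        # slope = P / Q
    a = y1 - b * x1  # Y-intercept: the value of Y where X = 0
    return a, b


# Hypothetical points, chosen so that P = 2 and Q = 2.
a, b = line_through((0, 2), (2, 4))
print(f"Y = {a:g} + {b:g}X")
```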

C. Predicting Y from a knowledge of X:

$$Y = a_{yx} + b_{yx}X$$

Where:

$$b_{yx} = \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^{2} - (\sum X)^{2}}$$

$$a_{yx} = \frac{\sum Y - b_{yx}\sum X}{N}$$
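As a rough sketch, the two formulas for predicting Y from X translate directly into Python. The function name `fit_y_on_x` is an assumption made for illustration; the data are the education and income values from the table in part A.

```python
def fit_y_on_x(xs, ys):
    """Return (a_yx, b_yx) for the prediction line Y = a_yx + b_yx * X."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all X values are identical; slope is undefined")
    b_yx = (n * sum_xy - sum_x * sum_y) / denom
    a_yx = (sum_y - b_yx * sum_x) / n
    return a_yx, b_yx


education = [10, 8, 7, 5, 2]  # X, persons A to E
income = [9, 8, 6, 4, 2]      # Y, persons A to E

a_yx, b_yx = fit_y_on_x(education, income)
predicted_income = a_yx + b_yx * 8  # predict income for education 8
print(round(a_yx, 2), round(b_yx, 2), round(predicted_income, 2))
```

For education 8 the fitted line gives an income of about 7.3, a little below the 8 read directly off the table, because the line takes all five people into account rather than person B alone.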

D. Predicting X from a knowledge of Y:

$$X = a_{xy} + b_{xy}Y$$

Where:

$$b_{xy} = \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum Y^{2} - (\sum Y)^{2}}$$

$$a_{xy} = \frac{\sum X - b_{xy}\sum Y}{N}$$
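Predicting X from Y uses the same sums with the roles of X and Y swapped, so the denominator is built from the Y values. A minimal sketch follows; the name `fit_x_on_y` is assumed for illustration, and the data are again the education and income values from part A.

```python
def fit_x_on_y(xs, ys):
    """Return (a_xy, b_xy) for the prediction line X = a_xy + b_xy * Y."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_y2 = sum(y * y for y in ys)
    denom = n * sum_y2 - sum_y ** 2
    if denom == 0:
        raise ValueError("all Y values are identical; slope is undefined")
    b_xy = (n * sum_xy - sum_x * sum_y) / denom
    a_xy = (sum_x - b_xy * sum_y) / n
    return a_xy, b_xy


education = [10, 8, 7, 5, 2]  # X, persons A to E
income = [9, 8, 6, 4, 2]      # Y, persons A to E

a_xy, b_xy = fit_x_on_y(education, income)
predicted_education = a_xy + b_xy * 5  # predict education for income 5
print(round(a_xy, 2), round(b_xy, 2), round(predicted_education, 2))
```

For an income of 5 this gives an education of about 5.6, in line with the estimate of "about 6" made by eye in part A.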

**Assumptions:**

**interval data**

**linear relationship**

**homoscedasticity**

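There is a linear relationship as long as the data scatter has a
roughly oblong shape. A circle, an S, etc. is __not__ linear.
Homoscedasticity: similar variance in columns and rows, that is, the
spread of Y is about the same at every value of X, and vice versa.
If the data fit a linear relationship well, they will often also be
roughly homoscedastic, but this is worth checking on the scatter diagram.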

**Example: Q:** Predict the income of a person who is 9 feet
tall.

Before using the formulas, try predicting by observation; one
would expect an answer of about $9.50.

| Person | Height X | Income Y |
|---|---|---|
| A | 10 | 10 |
| B | 8 | 9 |
| C | 6 | 7 |
| D | 3 | 2 |

**A:** X = height, Y = income

| Person | Height X | X^{2} | Income Y | XY |
|---|---|---|---|---|
| A | 10 | 100 | 10 | 100 |
| B | 8 | 64 | 9 | 72 |
| C | 6 | 36 | 7 | 42 |
| D | 3 | 9 | 2 | 6 |
| Sum (N = 4) | 27 | 209 | 28 | 220 |

$$b_{yx} = \frac{4(220) - 27(28)}{4(209) - (27)^{2}} = 1.16$$

$$a_{yx} = \frac{28 - 1.16(27)}{4} = -.83$$

$$Y = -.83 + 1.16X$$

This is the straight line which describes the data. Now use the
equation to predict Y for X = 9:

Y = -.83 + 1.16(9)

Y = -.83 + 10.44

Y = $9.61
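The hand calculation can be checked with a few lines of Python. This is a sketch for verification only; the variable names are assumptions, and the data are the four height and income pairs from the table above.

```python
heights = [10, 8, 6, 3]  # X, persons A to D
incomes = [10, 9, 7, 2]  # Y, persons A to D

n = len(heights)
sum_x = sum(heights)                                 # 27
sum_y = sum(incomes)                                 # 28
sum_x2 = sum(x * x for x in heights)                 # 209
sum_xy = sum(x * y for x, y in zip(heights, incomes))  # 220

# Slope and intercept for predicting Y from X.
b_yx = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # 124 / 107
a_yx = (sum_y - b_yx * sum_x) / n

prediction = a_yx + b_yx * 9  # predicted income for height 9
print(f"b_yx = {b_yx:.2f}, a_yx = {a_yx:.2f}, Y(9) = {prediction:.2f}")
```

Carrying the unrounded slope through gives an intercept of about -0.82 rather than -.83, since the hand calculation uses the slope already rounded to 1.16. The prediction for X = 9 still rounds to 9.61, close to the estimate of about $9.50 made by observation.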