Pearson's r

INTERVAL DATA: Association & Inference

Pearson's r:

$$ r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}} $$

Where: X = one variable

Y = other variable

N = total in sample

Assumptions: linear relationship; homoscedasticity

Example:
Person   Height (X)   Income (Y)
A            10           10
B             8            9
C             6            7
D             3            2

Q: What is the association between height and income?

A: First check assumptions by making a scatter diagram.


The points fall close to a straight line, with roughly even scatter along it, so the relationship is treated as linear and the assumption of homoscedasticity is treated as met.
Person   Height (X)   X^2   Income (Y)   Y^2    XY
A            10       100       10       100   100
B             8        64        9        81    72
C             6        36        7        49    42
D             3         9        2         4     6
N = 4        27       209       28       234   220

$$ r = \frac{4(220) - 27(28)}{\sqrt{[4(209) - (27)^2][4(234) - (28)^2]}} = \frac{124}{\sqrt{[107][152]}} = .97 $$
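The hand computation can be checked with a short script. This is a minimal sketch in Python using only the standard library; the function name `pearson_r` and the variable names are illustrative and not part of the lesson.

```python
import math

def pearson_r(x, y):
    """Pearson's r from the computational formula used in this lesson."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

height = [10, 8, 6, 3]   # X, persons A-D
income = [10, 9, 7, 2]   # Y, persons A-D

r = pearson_r(height, income)
print(round(r, 2))  # 0.97
```

The intermediate sums match the worksheet: the numerator is 124 and the two bracketed terms are 107 and 152.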

Interpretation: Use the scale: "There is a ______ association between (variable 1) and (variable 2)."

For r^2: Convert to a percent and include in the statement, "_____% of the variance in (variable 1) can be explained by (variable 2)," or vice versa. For 1 - r^2: Convert to a percent and include in the statement, "_____% of the variance in (variable 1) cannot be explained by (variable 2)," or vice versa.

Interpretation of the example above:

Large positive relationship between height and income.

94% of variance in height can be explained by income.

6% of variance in height cannot be explained by income.

94% reduction in error when predicting height from income.
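The percentages in this interpretation come from squaring r. A small sketch of that arithmetic, assuming r is first rounded to .97 as in the lesson (the variable names are illustrative):

```python
r = 0.97  # Pearson's r for height and income, rounded as in the lesson

r_squared = r ** 2
explained = round(r_squared * 100)          # percent of variance explained
unexplained = round((1 - r_squared) * 100)  # percent of variance not explained

print(f"{explained}% of variance explained, {unexplained}% unexplained")
```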

B. Test of significance for r and partial r

Test of significance for r:

1. Compute r and |r|.

2. d.f. = N - 2

3. Look up p-value of |r| in table in appendix.

Assumptions: 1) linear relationship

2) homoscedasticity

3) normal distribution of both variables in the whole population. Assume a normal distribution in this class, because checking for it is beyond the scope of this course.

Example:
Person   Height (X)   Income (Y)
A            10           10
B             8            9
C             6            7
D             3            2

Q: Can this association be generalized to the whole population?

A: r = .97; d.f. = N - 2 = 4 - 2 = 2; |r| = .97

4. Check to see if this is a one-tailed or a two-tailed test. In this question no direction is hypothesized for the association, so do the two-tailed test. The table looks something like this:

Two-tailed test

d.f.    p > .05    p = .05    p = .01
 1                    :          :
 2                  .950       .990
 3                    :          :

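For d.f. = 2, |r| = .97 is larger than the .950 listed under p = .05 but smaller than the .990 listed under p = .01, so the association is significant at the .05 level. So yes, one can generalize this association to the whole population.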
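The table lookup can be sketched in code. This is only an illustration: the dictionary holds the single d.f. = 2 row shown in the excerpt, the function name `significance_level` is hypothetical, and treating a value equal to the tabled entry as significant is an assumption about how the appendix table is read.

```python
# Critical values of |r| for the two-tailed test, copied from the table excerpt.
# Only the d.f. = 2 row appears in the lesson, so only that row is included.
CRITICAL_R_TWO_TAILED = {2: {0.05: 0.950, 0.01: 0.990}}

def significance_level(r, n):
    """Return the smallest tabled p level that |r| reaches, or None if p > .05."""
    df = n - 2
    row = CRITICAL_R_TWO_TAILED[df]  # raises KeyError for d.f. not in the excerpt
    reached = [p for p, critical in row.items() if abs(r) >= critical]
    return min(reached) if reached else None

level = significance_level(0.97, 4)
print(level)  # 0.05
```

Because the test uses |r|, a negative correlation of the same size would give the same result.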

C. Multivariate Association : Partial r

Partial r (r12.3):

$$ r_{12.3} = \frac{r_{12} - (r_{13})(r_{23})}{\sqrt{[1 - (r_{13})^2][1 - (r_{23})^2]}} $$

Where:
r12 = Pearson's r for variables 1 & 2.

r13 = Pearson's r for variables 1 & 3.

r23 = Pearson's r for variables 2 & 3.

Assumptions: linear relationship; homoscedasticity

Example:
Person   Education (#1)   Income (#2)   Age (#3)
A             10               1            5
B              5               3            4
C              4               4            3
D              2               5            2
E              1              10            1

Q: What is the association between education and income with age held constant?

A: Variable #3 will always be the one held constant. Call education variable #1 and income variable #2. Make scatter diagrams to check if assumptions are met. (In this example it is a stretch to imagine linear relationships, but proceed as if they were linear.)

Recall:

$$ r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}} $$

Education (#1) Income (#2)

Person   Education #1 (X)   X^2   Income #2 (Y)   Y^2    XY
A              10           100         1           1    10
B               5            25         3           9    15
C               4            16         4          16    16
D               2             4         5          25    10
E               1             1        10         100    10
N = 5          22           146        23         151    61

$$ r_{12} = \frac{5(61) - 22(23)}{\sqrt{[5(146) - (22)^2][5(151) - (23)^2]}} = -.85 $$

Education (#1) Age (#3)

Person   Education #1 (X)   X^2   Age #3 (Y)   Y^2    XY
A              10           100        5        25    50
B               5            25        4        16    20
C               4            16        3         9    12
D               2             4        2         4     4
E               1             1        1         1     1
N = 5          22           146       15        55    87

$$ r_{13} = \frac{5(87) - 22(15)}{\sqrt{[5(146) - (22)^2][5(55) - (15)^2]}} = .95 $$

Income (#2) Age (#3)

Person   Income #2 (X)   X^2   Age #3 (Y)   Y^2    XY
A              1           1        5        25     5
B              3           9        4        16    12
C              4          16        3         9    12
D              5          25        2         4    10
E             10         100        1         1    10
N = 5         23         151       15        55    49

$$ r_{23} = \frac{5(49) - 23(15)}{\sqrt{[5(151) - (23)^2][5(55) - (15)^2]}} = -.94 $$
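The three pairwise correlations can be reproduced from the worksheet values. A minimal sketch, assuming the education, income, and age scores listed in the three worksheets; `pearson_r` is an illustrative helper name, not something defined in the lesson.

```python
import math

def pearson_r(x, y):
    """Pearson's r from the computational formula recalled above."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

education = [10, 5, 4, 2, 1]  # variable 1, persons A-E
income = [1, 3, 4, 5, 10]     # variable 2
age = [5, 4, 3, 2, 1]         # variable 3 (held constant later)

r12 = pearson_r(education, income)
r13 = pearson_r(education, age)
r23 = pearson_r(income, age)
print(round(r12, 2), round(r13, 2), round(r23, 2))  # -0.85 0.95 -0.94
```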

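$$
\begin{aligned}
r_{12.3} &= \frac{(-.85) - (+.95)(-.94)}{\sqrt{[1 - (+.95)^2][1 - (-.94)^2]}} \\
&= \frac{(-.85) - (-.893)}{\sqrt{[1 - (+.9025)][1 - (+.8836)]}} \\
&= \frac{+.043}{\sqrt{[.0975][.1164]}} \\
&= \frac{.043}{\sqrt{.011349}} \\
&= \frac{.043}{.1065} = +.40
\end{aligned}
$$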
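The same substitution can be written as a small function. This is a sketch only; `partial_r` is an illustrative name, and the inputs are the two-decimal correlations used in the hand computation.

```python
import math

def partial_r(r12, r13, r23):
    """Partial correlation of variables 1 and 2 with variable 3 held constant."""
    numerator = r12 - r13 * r23
    denominator = math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
    return numerator / denominator

# Two-decimal correlations, as used in the hand computation.
r12_3 = partial_r(-0.85, 0.95, -0.94)
print(round(r12_3, 2))  # 0.4
```

The numerator is a small difference between two similar numbers, so the result is sensitive to rounding. Carrying the unrounded correlations through the same formula gives about .35 rather than .40.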

Interpretation: same as for r but add 3rd variable which is held constant.

Example: r12.3 = .40, (r12.3)^2 = (.40)^2 = .16, 1 - (r12.3)^2 = 1 - .16 = .84

Moderate association between education and income with age held constant.

16% of variance in education can be explained by income (or vice versa) with age held constant.

84% of variance in education cannot be explained by income (or vice versa) with age held constant.

16% reduction in error when predicting education from income (or vice versa) with age held constant.
