
Introduction to Statistics

TERMS / NOTATION

Frequency Distribution Table: AIDS cases by exposure category and sex reported July 1993 through June 1994 in the United States

EXPOSURE CATEGORY                                           MALE f    RELATIVE        CUM.    FEMALE f    RELATIVE        CUM.
                                                                    PROPORTION  PROPORTION              PROPORTION  PROPORTION
GAY OR LESBIAN SEXUAL RELATIONS                             42,146        .602        .603           0        .000        .000
INJECTING DRUG USE                                          17,441        .249        .852       6,138        .429        .429
HEMOPHILIA/COAGULATION DISORDER                                586        .008        .860          17        .001        .430
HETEROSEXUAL CONTACT                                         2,838        .041        .901       5,457        .381        .811
RECEIPT OF BLOOD TRANSFUSION, BLOOD COMP., OR TISSUE           498        .007        .908         375        .026        .837
RISK NOT REPORTED OR IDENTIFIED                              6,438        .092       1.000       2,322        .162        .999
TOTAL                                                       69,955                              14,309

SOURCE: HIV/AIDS Surveillance Report , Vol. 6:1.

Frequency distribution table: a summary of a univariate sample (a sample described by one variable).

frequency (f): the number of cases that fall into a given category.

variable: something that varies (e.g. belief in god, age, gender).

categories: the values a variable can take (e.g. the categories of gender are masculine and feminine).

total (N): the total number of cases in the sample.

relative proportion: the proportion of the sample that falls into a given category. Relative proportion equals f/N.

cumulative proportion: the sum of the relative proportions of the current category and all preceding categories.
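The definitions of f, N, relative proportion, and cumulative proportion can be sketched in a few lines of Python. The counts below are the FEMALE f column of the AIDS table; the variable names are illustrative assumptions, not part of the lesson. Because this sketch accumulates unrounded proportions, its last digit can differ from a printed table that adds up already-rounded values.

```python
from itertools import accumulate

# FEMALE f column from the AIDS frequency distribution table.
female_counts = {
    "Gay or lesbian sexual relations": 0,
    "Injecting drug use": 6138,
    "Hemophilia/coagulation disorder": 17,
    "Heterosexual contact": 5457,
    "Receipt of blood transfusion, blood comp., or tissue": 375,
    "Risk not reported or identified": 2322,
}

# N: total number of cases in the sample.
n_total = sum(female_counts.values())

# Relative proportion of each category: f / N.
relative = [f / n_total for f in female_counts.values()]

# Cumulative proportion: running sum of the relative proportions.
cumulative = list(accumulate(relative))

for category, rel, cum in zip(female_counts, relative, cumulative):
    print(f"{category:<55} {rel:.3f} {cum:.3f}")
```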

Crosstabulation Table: SEX BY BELIEF IN GOD

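                  BELIEF IN GOD
SEX               DON'T BELIEVE   BELIEVE IN HIGHER POWER   DO BELIEVE   ROW SUBTOTAL
MALE              79              91                        463          633
FEMALE            28              83                        753          864
COLUMN SUBTOTAL   107             174                       1216         1497

SOURCE: GSS91 SURVEY SUBSAMPLE

Reading the table: SEX and BELIEF IN GOD are the variables. MALE and FEMALE are the categories of SEX, and the three column headings are the categories of BELIEF IN GOD. 633 and 864 are the row subtotals; 107, 174, and 1216 are the column (category) subtotals; 1497 is the sample total.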

Crosstabulation table (also contingency table): A summary of the relationship of 2 or more variables.

data: detailed information of any kind.

cell: Each cell contains the number of cases described both by the row category to its left and by the column category above it. In the example (the shaded cell in the original table), 79 cases are both male and do not believe in a god.

subtotals (n): the total number of cases in a particular row or column.

                  BELIEF IN GOD
SEX               DON'T BELIEVE   BELIEVE IN HIGHER POWER   DO BELIEVE   TOTAL
MALE      f       79              91                        463          633
          ROW%    12.5            14.4                      73.1
          COL%    73.8            52.3                      38           42.3
FEMALE    f       28              83                        753          864
          ROW%    3.2             9.6                       87.2
          COL%    26.1            47.7                      61.9         57.7
TOTAL     f       107             174                       1216         1497
          ROW%    7.2             11.6                      81.2

row percentages: Frequency divided by row total. This shows the proportion of the cases in the row category that are the column category. In the example above 12.5% of the males and 3.2% of the females do not believe in a god.

column percentages: Frequency divided by column total. This shows the proportion of the cases in the column category that are the row category. In the example above 73.8% of those that do not believe in a god are male.
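The two definitions above can be checked against the table with a short Python sketch. The cell counts come from the SEX BY BELIEF IN GOD table; the names `counts`, `row_percent`, and `col_percent` are illustrative assumptions.

```python
# Cell counts from the SEX BY BELIEF IN GOD table.
columns = ["Don't believe", "Believe in higher power", "Do believe"]
counts = {
    "Male": [79, 91, 463],
    "Female": [28, 83, 753],
}

# Row subtotals (one per sex) and column subtotals (one per belief category).
row_totals = {sex: sum(cells) for sex, cells in counts.items()}
col_totals = [sum(cells[j] for cells in counts.values()) for j in range(len(columns))]


def row_percent(sex, j):
    """Cell frequency divided by its row total, as a percentage."""
    return 100 * counts[sex][j] / row_totals[sex]


def col_percent(sex, j):
    """Cell frequency divided by its column total, as a percentage."""
    return 100 * counts[sex][j] / col_totals[j]


print(round(row_percent("Male", 0), 1))    # 12.5: share of males who do not believe
print(round(row_percent("Female", 0), 1))  # 3.2: share of females who do not believe
print(round(col_percent("Male", 0), 1))    # 73.8: share of non-believers who are male
```

Row percentages answer "of the people in this row, how many fall in each column?", while column percentages answer the reverse question, which is why the same cell yields two different figures.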

raw data: "Raw" means nothing has been done to the data yet, as in the case listing in the example below.

PROCESSING RAW DATA

Example:

Person   Age   Sex   Marital Status
Joe      21    M     M (married)
Ann      13    F     S (single)
Sue      72    F     M
Bill     54    M     D (divorced)
Sam      18    M     M
Kay      12    F     S

1. Raw data comes from questions on questionnaires.

a) Open-ended questions -- allow the respondent to write an answer to the question.

b) Closed-ended questions -- give the respondent a set of choices for the answer.

Example: sex (circle one) M F.

2. Processing raw data

a) Computers are extremely helpful when processing a great deal of data.

b) Processing by hand -- the individual must make frequency distributions and/or tables.

Constructing a frequency distribution table (using raw data above)

Sex   f   %      cum %
M     3   50.0   50.0
F     3   50.0   100.0
      6 = N
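The hand tally above can also be done by computer. The following is a minimal Python sketch that rebuilds the same frequency distribution from the six-person case listing; the tuple layout and variable names are assumptions made for illustration.

```python
from collections import Counter

# Case listing from the raw data example: (person, age, sex, marital status).
cases = [
    ("Joe", 21, "M", "M"),
    ("Ann", 13, "F", "S"),
    ("Sue", 72, "F", "M"),
    ("Bill", 54, "M", "D"),
    ("Sam", 18, "M", "M"),
    ("Kay", 12, "F", "S"),
]

# Frequency (f) of each category of the variable "sex".
sex_counts = Counter(sex for _, _, sex, _ in cases)
n = len(cases)  # N, the total number of cases

# Build the rows: category, f, %, cumulative %.
table = []
cumulative = 0.0
for sex in ("M", "F"):
    pct = 100 * sex_counts[sex] / n
    cumulative += pct
    table.append((sex, sex_counts[sex], pct, cumulative))

for sex, f, pct, cum in table:
    print(f"{sex}  {f}  {pct:.1f}  {cum:.1f}")
print(f"{n} = N")
```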

Constructing a crosstabulation table

Examine the data: each variable has a number of categories. Count the categories and construct a grid large enough to hold all of them. If the data are nominal, placing a variable on the top (columns) or on the side (rows) is arbitrary, although it makes sense to put the variable with the larger number of categories across the top. For the other levels of measurement, the independent variable usually goes on top; if no prediction is being made, the variable with the higher level of measurement goes on top.

Joe: male, married. Tally table after the first case:

         Marital Status
Gender   M    S    D
M        |
F


Completed table:

               Marital Status
Gender         M    S    D    row total
M              2    0    1    3
F              1    2    0    3
column total   3    2    1    6 = N
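The tallying procedure can be sketched in Python: each case adds one to the cell for its (sex, marital status) pair, and the subtotals are sums over rows and columns. The case listing is the one from the raw data example; the names `crosstab`, `row_totals`, and `col_totals` are illustrative assumptions.

```python
from collections import Counter

# Case listing from the raw data example: (person, age, sex, marital status).
cases = [
    ("Joe", 21, "M", "M"),
    ("Ann", 13, "F", "S"),
    ("Sue", 72, "F", "M"),
    ("Bill", 54, "M", "D"),
    ("Sam", 18, "M", "M"),
    ("Kay", 12, "F", "S"),
]

sexes = ["M", "F"]          # row categories
statuses = ["M", "S", "D"]  # column categories: married, single, divorced

# One tally mark per case in the cell for its (sex, status) pair.
cell_counts = Counter((sex, status) for _, _, sex, status in cases)

# Counter returns 0 for pairs that never occur, so empty cells are filled in.
crosstab = {sex: [cell_counts[(sex, s)] for s in statuses] for sex in sexes}
row_totals = {sex: sum(crosstab[sex]) for sex in sexes}
col_totals = [sum(crosstab[sex][j] for sex in sexes) for j in range(len(statuses))]

for sex in sexes:
    print(sex, crosstab[sex], row_totals[sex])
print("column total", col_totals, sum(col_totals), "= N")
```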


LEVEL OF MEASUREMENT (Types of data)

Nominal

Ordinal

Interval/ratio

*The requirement of a true zero point is the difference between interval and ratio. This distinction is not necessary in basic statistics.

Examples:

(nominal)               (ordinal)        (interval)
"Region of residence"   "Social Class"   "Height"
N                       upper upper      6'2"
S                       lower upper      6'1"
N                       upper middle     6'1"
E                       lower middle     6'0"
W                       upper lower      5'11"

*For both ordinal and interval data, the categories must have an inherent order, but they do not have to be listed in that order.
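A small Python sketch of what it means for categories to have an order without being listed in order. The ranking in `class_order` follows the "Social Class" column of the examples, read from lowest to highest; the shuffled `observed` list is made up for illustration.

```python
# Ordinal categories from the "Social Class" example, lowest to highest.
class_order = [
    "upper lower",
    "lower middle",
    "upper middle",
    "lower upper",
    "upper upper",
]

# Map each category to its rank so observations can be compared.
rank = {name: position for position, name in enumerate(class_order)}

# Hypothetical observations, deliberately not in ordered form.
observed = ["upper middle", "upper upper", "upper lower", "lower upper", "lower middle"]

# Because the categories have an inherent order, the data can always be sorted.
ordered = sorted(observed, key=lambda name: rank[name])
print(ordered)
```

Nominal categories such as region of residence have no such ranking, so any order chosen for them (alphabetical, for instance) is arbitrary.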

PURPOSE OF STATISTICS

A. Description ("summarizing")

1. univariate distribution ("one variable")

2. multivariate distribution ("several variables": "relationship," "association," "correlation")

Example: (bivariate = 2 variables) relationship between education and income.

B. Inference ("generalize from sample to population")

1. univariate ("setting confidence intervals")

2. one bivariate sample (2 variables) ("testing the significance of association")

3. two or more univariate samples ("testing the significance of differences")

USING THE CHART

Example: "Ever had sex with someone other than the person you were married to?"

Sex     Yes   No    Total
M       106   369   475
F       89    613   702
Total   195   982   1,177

GSS91 survey subsample