Introduction to Statistics
TERMS / NOTATION
Frequency Distribution Table: AIDS
cases by exposure category and sex reported July 1993 through
June 1994 in the United States
EXPOSURE CATEGORY | MALE f | RELATIVE PROPORTION | CUM. PROPORTION | FEMALE f | RELATIVE PROPORTION | CUM. PROPORTION |
GAY OR LESBIAN SEXUAL RELATIONS | 42,146 | .602 | .603 | 0 | .000 | .000 |
INJECTING DRUG USE | 17,441 | .249 | .852 | 6,138 | .429 | .429 |
HEMOPHILIA/COAGULATION DISORDER | 586 | .008 | .860 | 17 | .001 | .430 |
HETEROSEXUAL CONTACT | 2,838 | .041 | .901 | 5,457 | .381 | .811 |
RECEIPT OF BLOOD TRANSFUSION, BLOOD COMP., OR TISSUE | 498 | .007 | .908 | 375 | .026 | .837 |
RISK NOT REPORTED OR IDENTIFIED | 6,438 | .092 | 1.000 | 2,322 | .162 | .999 |
TOTAL | 69,955 | | | 14,309 | | |
SOURCE: HIV/AIDS Surveillance Report, Vol. 6:1.
Frequency distribution table: a
summary of univariate samples
frequency (f): Number of cases
that fall into a certain delineated category.
variable: something which varies
(e.g. belief in god, age, sex, etc.)
categories: the subsets the variable
varies between (e.g. the categories of sex are male and
female).
total (N): Total number of cases
in sample.
relative proportion: the proportion
of the total number of cases in the sample that falls into a given
variable category. Relative proportion equals f/N.
cumulative proportion: the sum
of the relative proportions of the current variable category
and all preceding categories.
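The definitions of f, N, relative proportion, and cumulative proportion can be checked with a short script. The following is a minimal sketch in Python using the FEMALE f column of the AIDS table above; the variable names (female_f, relative, cumulative) are illustrative and not part of the notes.

```python
from itertools import accumulate

# Female frequencies (f) copied from the AIDS table above, in table order.
female_f = {
    "Gay or lesbian sexual relations": 0,
    "Injecting drug use": 6138,
    "Hemophilia/coagulation disorder": 17,
    "Heterosexual contact": 5457,
    "Receipt of blood transfusion, blood comp., or tissue": 375,
    "Risk not reported or identified": 2322,
}

# N = total number of cases in the sample.
n_total = sum(female_f.values())

# Relative proportion = f / N for each category.
relative = {cat: f / n_total for cat, f in female_f.items()}

# Cumulative proportion = running sum of the relative proportions,
# taken in the order the categories are listed.
cumulative = dict(zip(female_f, accumulate(relative.values())))

for cat, f in female_f.items():
    print(f"{cat:55s} {f:6d} {relative[cat]:.3f} {cumulative[cat]:.3f}")
```

This sketch accumulates unrounded proportions, so its cumulative column can differ from the printed table in the last digit. The table's values appear to come from adding the already rounded proportions, which is why its final female entry is .999 rather than 1.000.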
Crosstabulation Table: SEX BY BELIEF
IN GOD
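[Table not reproduced. The original is a crosstabulation with SEX in the rows and BELIEF IN GOD in the columns, annotated with the labels "VARIABLE", "VARIABLE CATEGORIES", "ROW SUBTOTALS", "CATEGORIES SUBTOTALS", and "SAMPLE TOTAL".]
SOURCE: GSS91 SURVEY SUBSAMPLE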
Crosstabulation table (also contingency
table): A summary of the relationship of 2 or more variables.
data: detailed information of any
kind.
cell: Indicated by the shaded section.
Each cell contains the number of cases that are both described
by the category delineated to its left and the category delineated
above. In our shaded example, 79 cases are both male and do not
believe in a god.
subtotals (n): Total number of
cases in a particular row or column.
row percentages: Frequency divided
by row total, expressed as a percentage. This shows the proportion of the cases in a row
category that fall in each column category. In the example above, 12.5%
of the males and 3.2% of the females do not believe in a god.
column percentages: Frequency divided
by column total, expressed as a percentage. This shows the proportion of the cases in a
column category that fall in each row category. In the example above,
73.8% of those who do not believe in a god are male.
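Row and column percentages can be computed directly from the cell frequencies and the subtotals. The sketch below is in Python. Because the full set of belief-in-god counts is not given in these notes, it uses the sex by "ever had sex with someone other than the person you were married to" counts from the USING THE CHART section instead. The names counts, row_pct, and col_pct are illustrative.

```python
# Cell frequencies copied from the GSS91 table in the USING THE CHART
# section of these notes: rows are sex, columns are the answer given.
counts = {
    "M": {"Yes": 106, "No": 369},
    "F": {"Yes": 89, "No": 613},
}
columns = ["Yes", "No"]

# Subtotals (n): total cases in each row and in each column.
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}
col_totals = {col: sum(counts[row][col] for row in counts) for col in columns}

# Row percentage = cell frequency / row total * 100.
row_pct = {
    row: {col: 100 * counts[row][col] / row_totals[row] for col in columns}
    for row in counts
}

# Column percentage = cell frequency / column total * 100.
col_pct = {
    row: {col: 100 * counts[row][col] / col_totals[col] for col in columns}
    for row in counts
}

print(f"{row_pct['M']['Yes']:.1f}% of males answered Yes")
print(f"{col_pct['M']['Yes']:.1f}% of those answering Yes are male")
```

The two percentages answer different questions: the first describes males, the second describes the people who answered Yes.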
raw data: "Raw" means
nothing has been done to the data yet, as in the case listing in the example below.
PROCESSING RAW DATA
Example:
Person | Age | Sex | Marital Status |
Joe | 21 | M | M (married) |
Ann | 13 | F | S (single) |
Sue | 72 | F | M |
Bill | 54 | M | D (divorced) |
Sam | 18 | M | M |
Kay | 12 | F | S |
1. Raw data comes from questions on questionnaires.
a) Open-ended questions -- allow the respondent
to write an answer to the question.
b) Closed-ended questions -- give the respondent
a set of choices from which to indicate an answer.
Example: sex (circle one) M F.
2. Processing raw data
a) Computers are extremely helpful when
processing a great deal of data.
b) Processing by hand -- the individual must
make frequency distributions and/or
tables.
Constructing a frequency distribution
table (using raw data above)
Sex | f | % | cum % |
M | 3 | 50.0 | 50.0 |
F | 3 | 50.0 | 100.0 |
| 6 = N | | |
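The same frequency distribution can be produced from the raw case listing by counting. The following is a minimal Python sketch; the function name frequency_table and the tuple layout of cases are assumptions made for illustration.

```python
from collections import Counter

# Raw case listing from the example above:
# (person, age, sex, marital status).
cases = [
    ("Joe", 21, "M", "M"),
    ("Ann", 13, "F", "S"),
    ("Sue", 72, "F", "M"),
    ("Bill", 54, "M", "D"),
    ("Sam", 18, "M", "M"),
    ("Kay", 12, "F", "S"),
]


def frequency_table(values):
    """Return (rows, N), where each row is (category, f, %, cum %)."""
    counts = Counter(values)
    n = sum(counts.values())
    rows = []
    running = 0
    # Categories appear in the order they are first seen in the data.
    for category, f in counts.items():
        running += f
        rows.append((category, f, 100 * f / n, 100 * running / n))
    return rows, n


sex_rows, n = frequency_table(case[2] for case in cases)
for category, f, pct, cum_pct in sex_rows:
    print(f"{category} | {f} | {pct:.1f} | {cum_pct:.1f} |")
print(f"  | {n} = N |")
```

Passing case[3] instead of case[2] gives the frequency distribution for marital status.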
Constructing a crosstabulation table
Examine the data: each variable will have
a number of categories. Count the categories and construct a table
using a grid large enough for all of them. If the data
are nominal, the placement of a variable on the top (columns) or
side (rows) is arbitrary (although it makes sense to put the variable with the larger
number of categories across the top). For the other levels of
measurement, the independent variable usually goes on top; if there is
no prediction, the variable with the higher level of measurement goes on top.
Table after tallying the first case (Joe: male, married):
| Marital Status | | |
Gender | M | S | D |
M | 1 | | |
F | | | |
Completed table:
| Marital Status | | | |
Gender | M | S | D | row total |
M | 2 | 0 | 1 | 3 |
F | 1 | 2 | 0 | 3 |
column total | 3 | 2 | 1 | 6 = N |
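The tallying procedure can also be sketched in Python: count each (gender, marital status) pair, then lay the counts out on the grid and add the subtotals. The names cell_counts, row_cats, and col_cats are illustrative.

```python
from collections import Counter

# Raw case listing from the example above:
# (person, age, sex, marital status).
cases = [
    ("Joe", 21, "M", "M"),
    ("Ann", 13, "F", "S"),
    ("Sue", 72, "F", "M"),
    ("Bill", 54, "M", "D"),
    ("Sam", 18, "M", "M"),
    ("Kay", 12, "F", "S"),
]

row_cats = ["M", "F"]       # gender categories (rows)
col_cats = ["M", "S", "D"]  # marital status categories (columns)

# Tally one mark per case in the cell for its (gender, marital status) pair.
cell_counts = Counter((sex, marital) for _, _, sex, marital in cases)

# Lay the tallies out on the grid; a Counter returns 0 for empty cells.
table = [[cell_counts[(r, c)] for c in col_cats] for r in row_cats]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

print("Gender | " + " | ".join(col_cats) + " | row total |")
for r, row, total in zip(row_cats, table, row_totals):
    print(f"{r} | " + " | ".join(str(f) for f in row) + f" | {total} |")
print("column total | " + " | ".join(str(t) for t in col_totals) + f" | {n} = N |")
```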
LEVEL OF MEASUREMENT
(Types of data)
Nominal
Ordinal
Interval/ratio
*The requirement of
a true zero point is the difference between interval and ratio.
This distinction is not necessary in basic statistics.
Examples:
(nominal) | (ordinal) | (interval) |
"Region of residence" | "Social Class" | "Height" |
N | upper upper | 6'2" |
S | lower upper | 6'1" |
N | upper middle | 6'1" |
E | lower middle | 6'0" |
W | upper lower | 5'11" |
*For both ordinal and
interval data, the categories must have an inherent order, but they do not
have to be listed in that order.
PURPOSE OF STATISTICS
A. Description ("summarizing")
1. univariate distribution ("one variable")
B. Multivariate distribution (several variables) ("relationship," "association," "correlation")
Example: (bivariate = 2 variables) relationship between education and income.
C. Inference ("generalize from sample to population")
1. univariate ("setting confidence intervals")
2. one bivariate sample (2 variables) ("testing the significance of association")
3. two or more univariate samples ("testing the significance of differences")
USING THE CHART
Example: "Ever had sex with someone other than the
person you were married to?"
Sex | Yes | No | row total |
M | 106 | 369 | 475 |
F | 89 | 613 | 702 |
column total | 195 | 982 | 1,177 = N |
SOURCE: GSS91 survey subsample
- Look at the data and determine its level of measurement.
- Look at the question asked, then see the chart
for the type of statistic to use.
- Different statistics have different purposes and assumptions.
It is important to choose the correct statistic for the type of
data.
- For these data, the statistic Phi would be ideal for measuring the
relationship.
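These notes do not give a formula for Phi. As a sketch, the standard definition for a 2 x 2 table with cells a, b (first row) and c, d (second row) is (ad - bc) divided by the square root of the product of the four subtotals. The Python below applies it to the table above; the function name phi_coefficient is an assumption made for illustration.

```python
import math

# Cell frequencies from the sex by extramarital-sex table above.
a, b = 106, 369  # M: Yes, No
c, d = 89, 613   # F: Yes, No


def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table [[a, b], [c, d]]."""
    numerator = a * d - b * c
    # Product of both row subtotals and both column subtotals.
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


phi = phi_coefficient(a, b, c, d)
print(f"phi = {phi:.3f}")
```

Phi ranges from -1 to 1, with 0 indicating no association. Its sign depends only on the order in which the categories are listed, so for nominal variables such as these the magnitude is what matters.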