Psychology 2113, Research
Methods I: Statistics
Welcome to Psychology 2113
Statistics in Research
Welcome to Psychology 2113
About the course
Where does it fit?
2113 is a prerequisite
for most upper division courses
Take 3114, Research
Methods II: Applications and Experimental Design, the next semester after 2113
3114 (Experimental) is a
prerequisite for capstone and most 4000 level courses
Take 3003, Advanced
Undergraduate Statistics, if you are planning to go to graduate school
Outline
Grades
Academic misconduct
Welcome..., cont.
Questions about the
course?
Assistants
Graduate assistant
Undergraduate volunteers
Labs/Times/Colors
T 10:30-11:20 (Section
12): Pink
T 11:30-12:20 (Section
11): Yellow
W 12:30-1:20 (Section
14): Blue
W 1:30-2:20 (Section
13): Green
Textbook errors
Statistics in Research
Hamstring stretch
example
four groups of runners,
randomly assigned to groups
groups differed by
amount of time holding the stretch
interested in
flexibility after six weeks of stretching
What do we want to know?
The effect on
flexibility due to time holding the stretch
The best time to hold
the stretch
Results:
control=15<30=60
Statistics in Research, cont.
Another example:
hamstring stretch, but we simply measure a group of runners on the length of
time they hold their stretch and on their flexibility.
Statistics in Research, cont.
Other examples
Trait
aggressiveness and hockey penalties
Piano
lessons and math performance
Age and
music purchases
Teenagers
and gambling
Violent
video games and risk of heart trouble
Statistics in Research, cont.
Another example
Delay of gratification
and SAT scores
4.5 year old children
rewards visibly exposed
(E) vs. obscured (O)
coping ideas (I) vs.
none (N)
four groups: EI, EN, OI,
ON
delay time in seconds
Statistics in Research, Cont.
Delay of gratification
and SAT scores
Results:
EI 517s, EN 365s, OI 585s, ON 590s
Psychology 2113
Overview of Statistics
Preview of Inferential Statistics
Research and Research Design
Overview of Statistics
Population
Target group for inference
Examples
Parameter: numerical
characteristic of population. Example: population mean is μ (mu).
Sample
Subgroup of the
population
Examples
Statistic: numerical
characteristic of sample. Example: sample mean is X̄ (X-bar).
Overview..., cont.
Sampling
Process of selecting
sample from population
Random sampling
Independent selection
As contrasted to
evenness
Descriptive vs.
Inferential Statistics
Descriptive: primary
purpose is to describe some aspect of the data
Inferential: primary
purpose is to infer (to estimate or to make a decision, test a hypothesis)
Preview of Inferential Statistics
All
inferential statistics have the following in common:
Use of Some Descriptive Statistic
(Remember
)
Use of Probability
Coin toss example
Fair coin? p(Head)=.5
10 tosses, ten Heads,
p(10 H|fair coin)=1/1024
=.00098,
Reject idea (hypothesis)
of fair coin
10 tosses, six heads,
p(6 or more H|fair coin)
=386/1024=.377,
Retain hypothesis of
fair coin
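The two coin-toss probabilities above can be checked with a short Python sketch. It assumes independent tosses with p(Head)=.5; the function name `prob_at_least` is illustrative only and does not come from the course materials.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(at least k heads in n independent tosses) when P(Head) = p."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

p_ten_heads = prob_at_least(10, 10)    # ten heads in ten tosses: 1/1024
p_six_or_more = prob_at_least(6, 10)   # six or more heads: 386/1024

print(round(p_ten_heads, 5), round(p_six_or_more, 3))
```

A very small probability (ten heads) leads to rejecting the fair-coin hypothesis; a large one (six or more heads) leads to retaining it.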
Potential for Estimation
(Remember
)
Sampling Variability and Sampling
Distributions
If
we do the delay of gratification study again, we likely won't get 365 for the
mean of the EN group. The value of the new mean might be close, like 355. A
third study might yield 375, etc.
Use of a Theoretical Distribution
Two Hypotheses, Two Decisions, Two Types of
Error
Two hypotheses
Coin is fair, or coin is
not fair
Kids in the EN group
score lower, or they don't
30s hamstring stretch is
better than 15s, or it isn't
Two decisions
Reject or retain a
hypothesis
Two types of error
Reject a true
hypothesis, retain a false hypothesis
Preview of Inferential Statistics
All inferential
statistics have the following in common:
use of some descriptive
statistic
use of probability
potential for estimation
sampling variability
sampling distributions
use of a theoretical
distribution
two hypotheses, two
decisions, two types of error
Research and Research Design
Research defined
Structured Problem
Solving
Comparative vs. Absolute
Scientific methods:
steps (cyclic)
1. encounter and
identify problem
2. formulate hypotheses,
define variables
3. think through
consequences of hypotheses
4. design & run
study, collect data, compute statistics, test hypotheses
5. draw conclusions
Research: Variables
Variable: entity that is
free to take on different values
Independent variable (IV): its values are manipulated by the
researcher, comes first in time
Dependent variable (DV): measured by researcher, follows the IV
in time
Extraneous variable (EV): controlled by researcher
randomization of
subjects to groups
keep all subjects
constant on EV
include EV in the design
of the experiment
Research: Variables
Variable, continued:
Predictor variable (PV): comes first in time but there is no
manipulation, analogous to IV.
Criterion variable (CV): follows PV in time, analogous to DV.
Research: Examples
Example: first hamstring
stretch study.
IV was time holding
stretch: control, 15s, 30s, 60s.
DV was flexibility.
EVs were age, gender,
etc.
Example: delay of
gratification study.
IVs were rewards
(exposed, obscured) and coping ideas (ideas, none).
DV was delay time.
EV was age (controlled
by keeping all subjects constant at 4.5 years).
Research: Examples
Example: second
hamstring stretch study.
PV was time holding
stretch: 12s, 30s, 25s, etc.
CV was flexibility: 47,
34, etc.
EVs were age, gender,
etc.
Operational definition.
The type that is
assigned to a variable depends on how it is defined and used in any given
research project.
Some variables can be
any one of the five types, depending on how they were used in the research.
Anxiety example from
book.
Research: Relationships
Types of relationships
Causal relationship: IV
causes the DV
e.g. different times
holding a hamstring stretch causes differences in flexibility
key is manipulation of
IV
Predictive relationship:
PV predicts the CV
e.g. different times
holding a hamstring stretch merely predicts differences in flexibility
key is no
manipulation
Types of Research
Types of research
True experiment
manipulation of IV
randomization of
subjects to groups
causal relationship between
IV and DV
Observational research
no manipulation
minimal control of EV
predictive relationship
between PV and CV
Data: quantitative and qualitative (end 2)
Psychology
2113
Pictorial Descriptions
Stem and Leaf Display
Pictorial Descriptions
Frequency distribution
Stem and leaf display
Bar graph and histogram
Frequency polygon
Pie chart
Scatterplot
Skewness and kurtosis
Stem and Leaf Display
The first digit(s) of a
score form the stem, the last digit(s) form the leaf
An age of 38 could be
shown as
Stem and Leaf Display, cont.
We want 10-20 total
number of stems.
Number of stems per
digit depends on total number of stems: can do 1, 2, or 5 stems per digit.
Description With Statistics
Aspects or
characteristics of data that we can describe are
Middle
Spread
Skewness
Kurtosis
Statistics that
measure/describe
middle are mean, median,
mode
spread are range,
variance, standard deviation, midrange
Description With Statistics
Middle = central
tendency, location, center
Measures of middle are mean,
median, mode (keywords)
Spread = variability,
dispersion
Measures of spread are range,
variance, standard deviation, midrange (keywords)
Skewness = departure
from symmetry
Positive skewness = tail
(extreme scores) in positive direction
Negative skewness = tail
(extreme scores) in negative direction
Kurtosis = peakedness
relative to normal curve
Skewness
Skewness
Kurtosis
Description With Statistics
Another name for middle
is
Psychology 2113
Describing Middle
Describing Spread
Describing the Middle of Data
Another name for
middle is
_________.
Middle is the aspect
of data
we want to describe.
We describe/measure the
middle of data in a sample with the statistics:
Mean.
Median.
Mode.
We describe/measure the
middle of data in a population with the parameter μ (mu); we usually don't know μ, so we estimate it with X-bar.
Sample Mean
The sample mean is the
sum of the scores divided by the number of scores, and is symbolized by X-bar,
X̄ = ΣX/N
For example, for X1=4,
X2=1, X3=7, N=3, ΣX=12 and X̄ = ΣX/N = 12/3 = 4
Characteristics:
X-bar is the balance
point
Sample Median
The median is the middle
of the ordered scores, and is
symbolized as X50.
Median position
(as distinct from the median itself) is (N+1)/2 and is used to find the median.
Example: X1=4,
X2=1, X3=7, then N=3.
Median position is
(3+1)/2 = 4/2 = 2.
Place the scores in
order, 1 4 7.
X50 is the
score in position/rank 2.
So X50 = 4.
Sample Median, cont.
Another example: X1=4,
X2=1, X3=7, X4=6, and N=4.
Median position is
(N+1)/2 = (4+1)/2 = 5/2 = 2.5.
Place the scores in
order, 1 4 6 7.
X50 is the
score in position/rank 2.5.
So X50 =
(4+6)/2 = 10/2 = 5.
Characteristics:
Depends on only one or
two middle values.
For quantitative data
when distribution is skewed.
Minimizes Σ|X-X50|.
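The mean and the median-position rule can be sketched in Python as follows. This is a minimal illustration using the slide examples; the function names are hypothetical.

```python
def sample_mean(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

def sample_median(scores):
    """Median found from the median position (N + 1) / 2."""
    ordered = sorted(scores)
    position = (len(ordered) + 1) / 2        # 1-based median position
    lower = ordered[int(position) - 1]
    if position == int(position):            # whole-number position
        return lower
    upper = ordered[int(position)]           # position ends in .5
    return (lower + upper) / 2

print(sample_mean([4, 1, 7]))        # N=3 example
print(sample_median([4, 1, 7]))      # median position 2
print(sample_median([4, 1, 7, 6]))   # median position 2.5
```

When the median position ends in .5, the median is the average of the two scores on either side of that position.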
Sample Mode
The mode is the most
frequent score.
Examples:
1 1 4 7, the mode is 1.
1 1 4 7 7, there are two
modes, 1 and 7.
1 4 7, there is no mode.
Characteristics:
Has problems: more than
one, or none; maybe not in the middle; little info re data.
Best for qualitative
data, e.g. gender.
If it exists, it is
always one of the scores.
Is rarely if ever used.
Describing the Spread of Data
Another name for
spread is _________.
Spread is the aspect
of data we want to describe.
Any statistic that
describes/measures spread should have these characteristics: it should
Equal zero when the
spread is zero.
Increase as spread
increases.
Measure just spread, not
middle.
Describing the Spread of Data, cont.
We describe/measure the
spread of data in a sample with the statistics:
Range = high score-low
score.
Midrange, MR.
Sample variance, s*².
Sample standard
deviation, s*.
Unbiased variance
estimate, s².
s.
We describe/measure the
spread of data in a population with the parameter σ (sigma) or σ²; we usually don't know σ or σ², so we estimate them with one of the statistics.
Midrange (MR)
Formula is MR=UH-LH.
UH=upper hinge
LH=lower hinge
Hinges cut off 25% of
the data in each tail
Hinge position is
([median position]+1)/2.
[median position] is the
whole number part of the median position (remember, median pos.=(N+1)/2)
Use hinge position to
count in from the tails to find the hinges.
Midrange (MR), cont.
Example: 4 1 5 3 3 6 1 2
6 4 5 3 4 1, N=14
Arrange data in order: 1
1 1 2 3 3 3 4 4 4 5 5 6 6
Compute median position
= (N+1)/2=(14+1)/2=15/2=7.5
Compute hinge position =
([median position]+1)/2=(7+1)/2=8/2=4
Count in to the 4th
score from each tail to find UH and LH
UH=5 and LH=2
MR=UH-LH=5-2=3
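The hinge-counting steps above can be sketched in Python. The helper names `value_at` and `hinges` are illustrative; the midrange here follows these slides' definition, MR=UH-LH.

```python
def value_at(ordered, position):
    """Score at a 1-based position; a .5 position averages the two neighbours."""
    lower = ordered[int(position) - 1]
    if position == int(position):
        return lower
    return (lower + ordered[int(position)]) / 2

def hinges(scores):
    """Return (LH, UH) by counting in from each tail to the hinge position."""
    ordered = sorted(scores)
    median_position = (len(ordered) + 1) / 2
    hinge_position = (int(median_position) + 1) / 2   # uses whole-number part
    lower_hinge = value_at(ordered, hinge_position)
    upper_hinge = value_at(ordered[::-1], hinge_position)  # count from the top
    return lower_hinge, upper_hinge

data = [4, 1, 5, 3, 3, 6, 1, 2, 6, 4, 5, 3, 4, 1]
lh, uh = hinges(data)
midrange = uh - lh
print(lh, uh, midrange)
```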
Sample Variance, s*²
Definitional formula:
s*²=Σ(X-X̄)²/N, the average squared deviation from X-bar.
Example: 1 2 3
N=3, X̄ = ΣX/N=6/3=2
Σ(X-X̄)² =
(1-2)²+(2-2)²+(3-2)²=1+0+1=2
s*²=2/3=.6667
Computational formula:
s*²=[NΣX²-(ΣX)²]/N²
ΣX² = 1²+2²+3²=1+4+9=14, ΣX=6, N=3
s*²=[3(14)-(6)²]/3²=[42-36]/9=6/9=2/3=.6667
s*² is in squared units
of measure.
Sample Standard Deviation, s*
Formula: s*= √s*²
Example: 1 2 3
N=3, X̄ = ΣX/N=6/3=2
Σ(X-X̄)² = (1-2)²+(2-2)²+(3-2)²=1+0+1=2
s*²=2/3=.6667
s*= √.6667=.8165
s* is in original units
of measure.
Unbiased Variance Estimate, s²
Definitional formula:
s²=Σ(X-X̄)²/(N-1)
Example: 1 2 3
N=3, X̄ = ΣX/N=6/3=2
Σ(X-X̄)² =
(1-2)²+(2-2)²+(3-2)²=1+0+1=2
s²=2/2=1.0
Computational formula:
s²=[NΣX²-(ΣX)²]/[N(N-1)]
ΣX² = 1²+2²+3²=1+4+9=14, ΣX=6, N=3
s²=[3(14)-(6)²]/[3(2)]=[42-36]/6=6/6=1.0
s² is in squared units
of measure
s
Formula: s= √s²
Example: 1 2 3
N=3, X̄ = ΣX/N=6/3=2
Σ(X-X̄)² =
(1-2)²+(2-2)²+(3-2)²=1+0+1=2
s²=1.0
s= √1=1.0
s is in original units
of measure.
s has no official name,
it is the square root of the unbiased variance estimate, s².
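The four variance-based statistics for the 1 2 3 example can be computed side by side in Python. This is a sketch with hypothetical function names; it also checks that the computational formula for s*² agrees with the definitional one.

```python
from math import sqrt

def spread_statistics(scores):
    """Definitional formulas for s*², s*, s², and s."""
    n = len(scores)
    mean = sum(scores) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in scores)   # sum of squared deviations
    sample_variance = sum_sq_dev / n                     # s*² divides by N
    unbiased_variance = sum_sq_dev / (n - 1)             # s² divides by N-1
    return {
        "sample_variance": sample_variance,
        "sample_sd": sqrt(sample_variance),
        "unbiased_variance": unbiased_variance,
        "s": sqrt(unbiased_variance),
    }

def sample_variance_computational(scores):
    """Computational formula: [N*sum(X²) - (sum X)²] / N²."""
    n = len(scores)
    return (n * sum(x * x for x in scores) - sum(scores) ** 2) / n ** 2

stats = spread_statistics([1, 2, 3])
print({name: round(value, 4) for name, value in stats.items()})
print(round(sample_variance_computational([1, 2, 3]), 4))
```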
SAS
Box-plots
A pictorial description
that uses a box to show the middle of the data and lines called whiskers to
show the tails of a distribution.
Box
Upper end is at the UH,
lower end is at the LH
Line across the
middle is X50
Example: 4 1 5 3 3 6 1 2
6 4 5 3 4 1, N=14
Arrange data in order: 1
1 1 2 3 3 3 4 4 4 5 5 6 6
Compute median position
= (N+1)/2=(14+1)/2=15/2=7.5
Median, X50,
is (3+4)/2=7/2=3.5
Box-plots, cont.
Compute hinge position =
([median position]+1)/2=(7+1)/2=8/2=4
Count in to the 4th
score from each tail to find UH and LH, 1 1 1 2 3 3 3 4 4 4 5 5 6 6
UH=5 and LH=2
Draw the box.
Box-plots, cont.
Whiskers are lines drawn
from the ends of the box (the hinges) to adjacent values, UAV & LAV.
Adjacent values are the
first real data values inside the inner fences.
Inner fences, upper and
lower
Upper, UIF=UH+1.5MR
Lower, LIF= LH-1.5MR
Example: MR=UH-LH=5-2=3
UIF=UH+1.5MR=5+1.5(3)=9.5
LIF=LH-1.5MR=2-1.5(3)=-2.5
1 1 1 2 3 3 3 4 4 4 5 5
6 6
UAV=6
LAV=1
SAS
Box-plots, cont.
Another example: 1 1 1 2
3 3 3 4 4 4 4 6 9 10 11
N=15, median
position=(N+1)/2=(15+1)/2=16/2=8.
Hinge position=([median
position]+1)/2=(8+1)/2=4.5.
Hinges, X50,
and midrange:
UH=(4+6)/2=10/2=5
LH=(2+3)/2=5/2=2.5
X50=4
MR=UH-LH=5-2.5=2.5
Inner fences:
UIF=UH+1.5MR=5+1.5(2.5)=5+3.75=8.75
LIF=
LH-1.5MR=2.5-1.5(2.5)=2.5-3.75=-1.25
1 1 1 2 3 3 3 4 4 4 4 6
9 10 11
UAV=6
LAV=1
Draw box-plot
Outliers: outside whiskers, marked with * (end 4)
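The box-plot quantities for the second example can be reproduced with a short Python sketch. The function names are illustrative, and treating a score exactly on a fence as inside the fence is an assumption of this sketch.

```python
def value_at(ordered, position):
    """Score at a 1-based position; a .5 position averages the two neighbours."""
    lower = ordered[int(position) - 1]
    if position == int(position):
        return lower
    return (lower + ordered[int(position)]) / 2

def boxplot_summary(scores):
    ordered = sorted(scores)
    median_position = (len(ordered) + 1) / 2
    hinge_position = (int(median_position) + 1) / 2
    lh = value_at(ordered, hinge_position)
    uh = value_at(ordered[::-1], hinge_position)
    mr = uh - lh                                   # midrange as defined in these slides
    lif, uif = lh - 1.5 * mr, uh + 1.5 * mr        # inner fences
    inside = [x for x in ordered if lif <= x <= uif]
    return {
        "median": value_at(ordered, median_position),
        "LH": lh, "UH": uh, "MR": mr,
        "LIF": lif, "UIF": uif,
        "LAV": inside[0], "UAV": inside[-1],       # adjacent values
        "outliers": [x for x in ordered if x < lif or x > uif],
    }

summary = boxplot_summary([1, 1, 1, 2, 3, 3, 3, 4, 4, 4, 4, 6, 9, 10, 11])
print(summary)
```

The scores 9, 10, and 11 fall beyond the upper inner fence, so they are the ones marked as outliers.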
Psychology 2113
z Scores
Normal Distributions
Standard Normal Distribution
z Scores
The aspect of the data we want to describe/measure is
relative position.
z scores are
statistics that describe the relative position of something in its
distribution.
Verbal formula: z is something minus its mean divided
by its standard deviation.
Formulas:
For X in sample, z=(X-X̄)/s*
For X in population, z=(X-μ)/σ
z Scores, cont.
Characteristics:
The mean of a distribution of z scores is zero.
The variance of a distribution of z scores is one.
The shape of a distribution of z scores is reflective,
the shape is the same as the shape of the distribution of the Xs.
Example: Compute z
Sample, if X=34, X̄=40, and s*²=9, then
z=(X-X̄)/s*=(34-40)/3=-6/3=-2.
Population, if X=10, μ=8, and σ²=7,
then
z=(X-μ)/σ=(10-8)/2.6457…=2/2.6457…=.7559
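Both z computations follow the same verbal formula: something minus its mean, divided by its standard deviation. A minimal Python sketch (the function name `z_score` is hypothetical; it takes the variance and square-roots it, since the examples give variances):

```python
from math import sqrt

def z_score(value, mean, variance):
    """Relative position: (value - mean) / standard deviation."""
    return (value - mean) / sqrt(variance)

z_sample = z_score(34, 40, 9)       # sample example, standard deviation 3
z_population = z_score(10, 8, 7)    # population example, standard deviation sqrt(7)

print(z_sample, round(z_population, 4))
```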
Normal Distributions
Family of theoretical distributions; there are many
different normal distributions.
Characteristics:
Symmetric, continuous, unimodal.
Bell-shaped.
Scores range from -∞ to +∞.
Mean, median, and mode are all the same value.
Each distribution has two parameters, μ and σ².
Normal Distributions, cont.
Examples:
IQ is normally distributed with μ=100 and σ²=225.
Height of American males is normally distributed with μ=69
and σ²=9.
The standard normal (or unit normal) distribution has μ=0
and σ²=1.
We can transform any normal distribution to the
standard normal distribution by computing z scores: the resulting distribution
of z scores will have a shape that is normal.
Standard Normal Distribution
We use this distribution to get probabilities
associated with a z score (probability, proportion, and area under the curve
are synonymous).
Example:
If Joe is 72 inches tall, what is the probability that
any randomly selected man is his height or taller?
For height, μ=69
and σ²=9,
so
z =(X-μ)/σ=(72-69)/3=3/3=1
From Table A.2, p(z>1)=.1587
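As a cross-check on the table lookup, the same upper-tail area can be computed in Python. This sketch uses `statistics.NormalDist` from the standard library (Python 3.8+) in place of Table A.2.

```python
from statistics import NormalDist

mean_height, sd_height = 69, 3          # sd is the square root of the variance, 9
z = (72 - mean_height) / sd_height      # Joe's z score

# Area to the right of z under the standard normal curve.
p_his_height_or_taller = 1 - NormalDist().cdf(z)

print(z, round(p_his_height_or_taller, 4))
```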
Standard Normal Distribution, cont.
Using Table A.2, there
are two key facts:
Total area equals one
Symmetry
Standard Normal Distribution, cont.
More examples: if z is normal and p(z>1.645)=.05
Compute p(z<1.645)
Want all the area to the left of 1.645, a large area.
The area is in the middle and left tail.
Use total area=1 to get
p(z<1.645) = 1-.05 = .95
Standard Normal Distribution, cont.
More examples: if z is normal and p(z>1.645)=.05
Compute p(z<-1.645)
Want all the area to the left of -1.645, a small area.
The area is in the left tail.
Use symmetry to get
p(z<-1.645) = .05
Standard Normal Distribution, cont.
More examples: if z is normal and p(z>1.645)=.05
Compute p(z>-1.645)
Want all the area to the right of -1.645, a large area.
The area is in the middle and right tail.
Use symmetry and total area=1 to get
p(z>-1.645) = .95
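The three areas above follow from the two key facts, total area equals one and symmetry. A short Python sketch confirms them numerically, again using `statistics.NormalDist` instead of the printed table:

```python
from statistics import NormalDist

std_normal = NormalDist()               # mean 0, standard deviation 1

upper_tail = 1 - std_normal.cdf(1.645)  # p(z > 1.645)
below_pos = std_normal.cdf(1.645)       # p(z < 1.645), by total area = 1
below_neg = std_normal.cdf(-1.645)      # p(z < -1.645), by symmetry
above_neg = 1 - std_normal.cdf(-1.645)  # p(z > -1.645), by both facts

print(round(upper_tail, 2), round(below_pos, 2),
      round(below_neg, 2), round(above_neg, 2))
```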
Psychology 2113
Correlation
Regression
Correlation and Regression
Both examine linear
(straight line) relationships.
Both work with a pair of
scores, one on each of two variables, X and Y.
Correlation:
Defined as the degree of
linear relationship between X and Y.
Is measured/described by
the statistic r.
Regression:
Is concerned with the
prediction of Y from X.
Forms a prediction
equation to predict Y from X.
Correlation
The aspect of the data
that we
want to describe/measure is
the degree of linear relationship
between X and Y.
The statistic r
describes/measures the degree of linear relationship between X and Y.
r=Σ(zX·zY)/N, the average product of z
scores for X and Y
Works with two
variables, X and Y
-1≤r≤1,
r measures positive or negative relationships
Measures only the degree
of linear relationship
r²=proportion
of variability in Y that is explained by X
r is undefined if X or Y
has zero spread
r is dimensionless
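The definitional formula, r as the average product of z scores, can be sketched in Python. The data below are hypothetical values chosen for illustration, not scores from the course, and the function name is an assumption. The z scores use s* (dividing by N), matching the sample z-score formula.

```python
from math import sqrt

def correlation(xs, ys):
    """r = average product of the z scores for X and Y."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)   # s* for X
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)   # s* for Y
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when X or Y has zero spread")
    products = [((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
                for x, y in zip(xs, ys)]
    return sum(products) / n

xs = [1, 2, 3, 4, 5]          # hypothetical X scores
ys = [2, 4, 5, 4, 5]          # hypothetical Y scores
r = correlation(xs, ys)
print(round(r, 4), round(r ** 2, 4))   # r and proportion of variability explained
```

Because z scores have no units, r is dimensionless; a perfectly linear relationship gives r of 1 or -1.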
Correlation: -1≤r≤1.
The sign of r shows the
type of linear relationship between X and Y. We can use the definitional
Correlation: Linear
If there is a curvilinear
relationship between X and Y,
then r will not detect it. The value of r will be zero if there is no linear
relationship between X and Y.
Correlation: r²
r²=proportion
of variability in Y
that is explained by X.
If r=.5, r²=.25, so the proportion of
variability in Y that is explained by X is
.25 (as a percentage, this shows 25% explained by X, 75% unexplained).
Scatterplots:
r=.5, r²=.25; r=.7, r²=.49; r=.9, r²=.81
Venn
Diagrams: r² is represented by the proportion of overlap between X and Y.
Correlation: Undefined
If there is no spread in
X or Y, then r is undefined. Note that any z is undefined if the standard
deviation is zero, and r=Σ(zX·zY)/N.
Correlation, cont.
r
Computational formula
(p. 176)
Example, p. 177-178
(Excel)
(SAS)
Population correlation
coefficient, ρ (rho).
Impact on r:
Restriction of range.
Combined data.
Extreme scores
(outliers).
Correlation does not
imply causation.
Regression
Regression is concerned
with
forming a prediction equation to
predict Y from X.
Uses the formula for a
straight line, Ŷ=bX+a.
Ŷ is the predicted Y
score on the criterion variable.
b is the slope, b=ΔY/ΔX=rise/run.
X is a score on the
predictor variable.
a is the Y-intercept,
where the line crosses the Y axis, the value of Ŷ when X=0, a=Ȳ-bX̄.
Example: if b=2, a=8,
and X=23,
then Ŷ=2(23)+8=54.
Regression, cont.
Linear only.
Generalize only for X
values in
your sample.
Actual observed Y is
different from Ŷ by an amount called error, e, that is, Y=Ŷ+e.
Error in regression is
e=Y-Ŷ.
Many different potential
regression lines.
Regression: Best Line
The statistics b and a
are computed so as to minimize the sum of squared errors,
Σe²=Σ(Y-Ŷ)² is a minimum
This is called the Least
Squares Criterion.
Regression: b and a
Computing b and a:
b=[NΣXY-(ΣX)(ΣY)]/[NΣX²-(ΣX)²]
a=Ȳ-bX̄
Example, p.209 (SAS)
For males,
Anxiety=.04Age+6.65
For females,
Anxiety=-.16Age+15.71
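The computational formulas for b and a can be sketched in Python. The X and Y scores below are hypothetical and are not the textbook's anxiety data; the function names are illustrative.

```python
def least_squares(xs, ys):
    """Slope b and intercept a from the computational formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * (sum_x / n)        # a = mean of Y minus b times mean of X
    return b, a

def predict(x, b, a):
    """Predicted Y for a given X from the straight-line equation."""
    return b * x + a

xs = [1, 2, 3, 4, 5]                       # hypothetical predictor scores
ys = [2, 4, 5, 4, 5]                       # hypothetical criterion scores
b, a = least_squares(xs, ys)
print(round(b, 2), round(a, 2))
print(predict(23, 2, 8))                   # slide example with b=2, a=8, X=23
```

Predictions should be made only for X values within the range covered by the sample.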
Regression: sy·x
Standard error of
estimate is a
statistic that measures/describes
spread of errors or Y scores
in regression.
sy·x is the
standard deviation of errors in regression
sy·x = √[Σe²/(N-2)] = √[Σ(Y-Ŷ)²/(N-2)].
sy·x = √[(N-1)/(N-2)](sy)√(1-r²)
As r²
increases, sy·x decreases. For example, if N=100 and sy=4
r sy·x
.2 3.94
.4 3.68
.6 3.22
.8 2.41
.9 1.75
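The tabled values can be regenerated from the second formula. In this Python sketch the tabulated values are matched when the first column is read as r (so r² is .04, .16, .36, .64, .81); the function name is hypothetical.

```python
from math import sqrt

def standard_error_of_estimate(r, s_y, n):
    """s_y.x = sqrt((N-1)/(N-2)) * s_y * sqrt(1 - r**2)."""
    return sqrt((n - 1) / (n - 2)) * s_y * sqrt(1 - r ** 2)

table = {r: round(standard_error_of_estimate(r, s_y=4, n=100), 2)
         for r in (0.2, 0.4, 0.6, 0.8, 0.9)}

for r, value in table.items():
    print(r, value)
```

The stronger the linear relationship, the smaller the spread of the errors around the regression line.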
Regression: Partitioning
Partition total spread
Total = Explained + Not
Explained
This is true for
proportion of spread and amount of spread.
Proportion: 1 = r²
+ (1-r²)
Amount: s²y
= s²y r² + s²y(1-r²)
Example: r=.5, s²y=200
            Total   Expl.   Not Expl.
Proportion
Amount
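The partition for the r=.5, s²y=200 example can be worked out directly from the two formulas. A minimal Python sketch with a hypothetical function name:

```python
def partition_spread(r, total_variance):
    """Split total spread into explained and not-explained parts."""
    explained = r ** 2
    not_explained = 1 - r ** 2
    return {
        "proportion": (1.0, explained, not_explained),
        "amount": (total_variance,
                   total_variance * explained,
                   total_variance * not_explained),
    }

parts = partition_spread(r=0.5, total_variance=200)
print("Proportion (total, expl., not expl.):", parts["proportion"])
print("Amount     (total, expl., not expl.):", parts["amount"])
```

In both rows the explained and not-explained parts add back up to the total.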
Psychology
2113
Probability
Probability
Defined as relative
frequency of occurrence.
Basic definitions:
Sample space: all
possible outcomes of an experiment.
Elementary event: a
single member of the sample space.
Event: any collection of
elementary events.
Probability:
p(elementary
event)=1/(total number)
p(event)=(number in the
event)/(total number)
Conditional probability:
p(A|B)=(number in [A and
B])/(number in B)
The probability of A in
the redefined (reduced) sample space of B.
Probability: The Juror Example
The sample space is all
48 jurors.
An elementary event is
any one of the 48 jurors.
An event is any subgroup
of the 48: e.g. the 31 who gave an award.
Probabilities:
p(elementary event)=1/48
p(award)=31/48=.65
Conditional probability:
p(Award|Auth.)=18/20
The 20 Auth. are the
reduced sample space.
Always do the denominator first.
Out of the 20 Auth., the 18 who gave an award go in
the numerator.
Probability, cont.
(The Big Three)
Independence (1):
events A
and B are independent if
p(A|B)=p(A)
The A probability is not
changed by
reducing the sample space to B.
Multiplication (And)
Rule (2):
p(A and
B)=p(A)p(B|A)=p(A|B)p(B)
Mutually exclusive:
Events A and B do not
have any elementary events in common.
Events A and B cannot
occur simultaneously.
p(A and B)=0
Addition (Or) Rule (3):
p(A or B)=p(A)+p(B)-p(A
and B)
Probability: The Juror Example: Independence
The first of the Big 3:
Independence.
Is Award independent of Auth.? It is if
p(Award)=p(Award|Auth.).
p(Award)=31/48=.65
p(Award|Auth.)=18/20=.90
No, Award is not independent of Auth. because .65≠.90.
Another example.
Is No Award independent of Egal.? It is if
p(No Award)=p(No Award|Egal.).
p(No Award)=17/48=.35
p(No Award|Egal.)=15/28=.54
No, No Award is not independent of Egal. because .35≠.54.
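The independence check compares a marginal probability with a conditional one. This Python sketch uses exact fractions built from the juror counts quoted above; the function name `is_independent` is illustrative.

```python
from fractions import Fraction

def is_independent(p_a, p_a_given_b):
    """A is independent of B when p(A) equals p(A|B)."""
    return p_a == p_a_given_b

# p(Award) = 31/48 versus p(Award|Auth.) = 18/20
award_vs_auth = is_independent(Fraction(31, 48), Fraction(18, 20))

# p(No Award) = 17/48 versus p(No Award|Egal.) = 15/28
no_award_vs_egal = is_independent(Fraction(17, 48), Fraction(15, 28))

print(award_vs_auth, no_award_vs_egal)
```

Using fractions avoids rounding: two probabilities that merely round to the same two decimals are still unequal.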
Probability: More About Independence
Independence:
A is independent of B if
p(A)=p(A|B) or if p(B)=p(B|A).
Examples:
If p(A)=.3 and
p(A|B)=.4, A and B are not independent because .3≠.4.
Given: p(A)=.1, p(B)=.2,
p(A|B)=.3, and p(B|A)=.4. Is A independent of B? Explain. No, A and B are not
independent because .1≠.3 (or .2≠.4).
Given: p(A)=.1, p(B)=.2,
p(A|B)=.1, and p(B|A)=.2. Is A independent of B? Explain. Yes, A and B are
independent because .1=.1 (or .2=.2).
Given: p(A)=.47,
p(B)=.34, and p(B|A)=.49. Which of these is the reason that A is not
independent of B? .47≠.34, .47≠.49, .34≠.49.
Probability: More About Ind., cont.
People in a restaurant
were asked before and after their meal if they thought they would like a
dessert. Is Dessert independent of Before?
p(Dessert)=55/104=.53
and p(Dessert|Before)=34/52=.65
No, Dessert is not
independent of Before because .53≠.65, or
p(Before)=52/104=.50 and
p(Before|Dessert)=34/55=.62
No, Dessert is not
independent of Before because .50≠.62.
Which two probabilities
would you have to examine to determine if No Dessert is independent of After?
One way: see if p(No
Dessert) equals p(No Dessert|After).
Or, see if p(After)
equals p(After|No Dessert).
Now, what are the two
probabilities for each of these?
Probability: The Juror Example:
Multiplication Rule
The second of the Big 3:
Multiplication (And) Rule: p(A and
B)=p(A)p(B|A)=p(A|B)p(B).
This is the product of
two probabilities: one about A, one about B, one a marginal probability, one a
conditional probability.
Compute p(Award and
Auth.). We know the answer to this is 18/48=.375.
p(Award and
Auth.)=p(Award)p(Auth.|Award)=(31/48)(18/31)=18/48
p(Award and
Auth.)=p(Award|Auth.)p(Auth.)=(18/20)(20/48)=18/48
Compute p(Award and
Egal.). We know the answer to this is 13/48=.27.
p(Award and
Egal.)=p(Award)p(Egal.|Award)=(31/48)(13/31)=13/48
p(Award and
Egal.)=p(Award|Egal.)p(Egal.)=(13/28)(28/48)=13/48
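Both routes through the Multiplication Rule give the same joint probability. A short Python sketch with exact fractions shows this for the juror counts; the function name `p_and` is hypothetical.

```python
from fractions import Fraction

def p_and(p_first, p_second_given_first):
    """Multiplication (And) Rule: marginal times conditional."""
    return p_first * p_second_given_first

# p(Award and Auth.) computed both ways
auth_via_award = p_and(Fraction(31, 48), Fraction(18, 31))
auth_via_auth = p_and(Fraction(20, 48), Fraction(18, 20))

# p(Award and Egal.) computed both ways
egal_via_award = p_and(Fraction(31, 48), Fraction(13, 31))
egal_via_egal = p_and(Fraction(28, 48), Fraction(13, 28))

print(auth_via_award == auth_via_auth == Fraction(18, 48))
print(egal_via_award == egal_via_egal == Fraction(13, 48))
```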
Probability: More About Multiplication Rule
Multiplication (And)
Rule:
p(A and B)=p(A)p(B|A)=p(A|B)p(B).
This is the product of
two probabilities: one about A, one about B, one a marginal probability, one a
conditional probability.
Other examples:
p(A)=.2, p(B)=.6,
p(A|B)=.3, p(A and B)=p(A|B)p(B)=(.3)(.6)=.18
p(A)=.5, p(B)=.1,
p(B|A)=.2, p(A and B)=p(A)p(B|A)=(.5)(.2)=.10
If all four
probabilities are given, you can do the problem two ways and you should get the
same answer: p(A)=.2, p(B)=.4, p(A|B)=.3, p(B|A)=.6,
p(A and
B)=p(A)p(B|A)=(.2)(.6)=.12
p(A and
B)=p(A|B)p(B)=(.3)(.4)=.12
Probability: The Juror Example: Mutually
Exclusive
Not one of the Big 3 and
not the same as independence.
Are No Award and Auth.
mutually exclusive? No, because p(No Award and Auth.)=2/48≠0. If none of the 48 jurors had responded in this
cell, then No Award and Auth. would have been mutually exclusive.
Are Award and No Award
mutually exclusive? Yes, because all 48 jurors have been classified as one of
the two, and p(Award and No Award)=0.
Probability: The Juror Example: Addition
Rule
The third of the Big 3:
Addition (Or) Rule: p(A
or B)=p(A) + p(B) - p(A and B). This is the most complicated of the Big 3
because it uses the Multiplication Rule to get its answer.
Compute p(Award or
Auth.).
p(Award or
Auth.)=p(Award)+p(Auth.)-p(Award and Auth.) = (31/48)+(20/48)-(18/48)=33/48=.69
Compute p(Award or
Egal.).
p(Award or
Egal.)=p(Award)+p(Egal.)-p(Award and Egal.) = (31/48)+(28/48)-(13/48)=46/48=.96
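The Addition Rule subtracts the joint (And) probability so that jurors counted in both events are not counted twice. A minimal Python sketch with the juror fractions (the function name `p_or` is illustrative):

```python
from fractions import Fraction

def p_or(p_a, p_b, p_a_and_b):
    """Addition (Or) Rule: p(A) + p(B) - p(A and B)."""
    return p_a + p_b - p_a_and_b

award_or_auth = p_or(Fraction(31, 48), Fraction(20, 48), Fraction(18, 48))
award_or_egal = p_or(Fraction(31, 48), Fraction(28, 48), Fraction(13, 48))

print(award_or_auth, round(float(award_or_auth), 2))
print(award_or_egal, round(float(award_or_egal), 2))
```

`Fraction` prints each answer in lowest terms, so 33/48 appears as 11/16 and 46/48 as 23/24.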
Probability: More About the Addition Rule
Addition (Or) Rule: p(A
or B)=p(A)+p(B)-p(A and B).
Note that you add
together two marginal probabilities and subtract off a joint (And) probability.
Other
examples:
p(A)=.2, p(B)=.6,
p(A|B)=.3.
First, compute p(A and
B): p(A and B) = p(A|B)p(B)=(.3)(.6)=.18.
Now, compute p(A or B):
p(A or B) = p(A)+p(B)-p(A and B) =
.2+.6-.18=.62.
p(A)=.5, p(B)=.1, p(B|A)=.2.
First, compute p(A and
B): p(A and B) = p(A)p(B|A)=(.5)(.2)=.10.
Now, compute p(A or B):
p(A or B) = p(A)+p(B)-p(A and B) =
.5+.1-.10=.50.
Probability: Review
Defined as relative
frequency of occurrence.
Basic definitions:
Sample space: all
possible outcomes of an experiment.
Elementary event: a
single member of the sample space.
Event: any collection of
elementary events.
Probability:
p(elementary
event)=1/(total number)
p(event)=(number in the
event)/(total number)
Conditional probability:
p(A|B)=(number in A and
B)/(number in B)
The probability of A in
the redefined (reduced) sample space of B.
The only probability
that does not have total number in the denominator.
Probability, cont.
(The Big Three)
Independence (1):
events A
and B are independent if
p(A|B)=p(A)
The A probability is not
changed by
reducing the sample space to B.
Multiplication (And)
Rule (2):
p(A and
B)=p(A)p(B|A)=p(A|B)p(B)
Mutually exclusive:
Events A and B do not
have any elementary events in common.
Events A and B cannot
occur simultaneously.
p(A and B)=0
Addition (Or) Rule (3):
p(A or B)=p(A)+p(B)-p(A
and B) (end 8)
Psychology 2113
Sampling Distributions
Introduction
Types of Distributions
Sampling Distribution of X
Other Sampling Distributions
Estimation
Sampling Distributions: Introduction
Pivotal subject:
distributions of statistics. Foundation
linchpin
important
crucial.
You need sampling
distributions to make inferences:
To get probabilities of
statistics for decision making about parameters.
To get information
necessary to estimate parameters.
A distribution that
could be formed by drawing all possible samples of a given size N from some
population, computing the statistic for each sample, and arranging these
statistics in a distribution.
Every statistic has a
sampling distribution.
Types of Distributions
Population:
Distribution of all
possible scores, Xs;
Usually large,
unobtainable, and hypothetical;
Has parameters μ and σ², the values of which are usually unknown;
Unknown shape;
We want to infer to one
of the parameters or to the distribution itself.
Types of Distributions, cont.
Sample:
Distribution of the N
scores that we actually have, Xs;
Usually a manageable
size, already obtained, and real;
Contained in what we
will call our real world;
Has known statistics
like X̄ and s²;
Known shape;
We want to infer from
one of the statistics to a parameter.
Types of Distributions, cont.
Sampling distribution:
Distribution of a
statistic over all possible samples, for example, X̄s;
Shows the variability of
the statistic;
Theoretical;
Has parameters and
usually a known shape;
The bridge for the
inference from the sample to the population, from the statistic to the
parameter;
Where we get the
probabilities of the statistic so we can make decisions about the parameter.
Types of Distributions, cont.
Sampling Distribution of X-bar
The sampling
distribution of X-bar
Has the purpose of any
sampling distribution: to obtain probabilities
Has the definition of
any sampling distribution: the distribution of a statistic.
Has specific
characteristics:
Mean: μX̄ = μ
Variance: σ²X̄ = σ²/N
Shape is normal if
Population is normal, or
N is large (Central Limit
Theorem)
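The mean and variance of the sampling distribution of X-bar can be verified by brute force on a tiny population. This Python sketch assumes a hypothetical population of three scores and independent selection (sampling with replacement); it enumerates every possible sample of size N=2.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]       # hypothetical population, not from the course
N = 2                        # sample size

mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

# X-bar for every possible sample of size N (9 samples here).
sample_means = [Fraction(sum(s), N) for s in product(population, repeat=N)]

mean_of_means = sum(sample_means) / len(sample_means)
variance_of_means = (sum((m - mean_of_means) ** 2 for m in sample_means)
                     / len(sample_means))

print(mean_of_means == mu)                  # mean of X-bar equals the population mean
print(variance_of_means == sigma2 / N)      # variance of X-bar equals variance / N
```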
Sampling Distribution of X-bar (Review)
The sampling
distribution of X-bar
Purpose is_______________________
_______________________________.
Definition
is______________________ _______________________________.
Has specific
characteristics:
Mean: μX̄ = ___
Variance: σ²X̄ =_______
Shape is __________ if
Population is
____________
N is _______ (_______________
Theorem)
Sampling Distribution of X-bar : Use Of zX-bar
IQ of deaf children:
What is the mean of this
population distribution? Is it 100, like for the population of all IQ scores (μ=100 and σ²=225)?
What is the probability
of getting X̄=88.07 or less if μ=100
(and σ²=225)?
To get this probability,
we need a new statistic, zX̄=(X̄-μ)/√(σ²/N).
zX̄=(88.07-100)/√(225/59) = -6.11
p(X̄<88.07)=p(z<-6.11)<.00003
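The z statistic for a sample mean divides by the standard deviation of the sampling distribution rather than of the raw scores. A Python sketch of the deaf-children computation, with `statistics.NormalDist` standing in for the table and a hypothetical function name:

```python
from math import sqrt
from statistics import NormalDist

def z_for_mean(sample_mean, mu, sigma2, n):
    """z for X-bar: (X-bar - mu) / sqrt(sigma² / N)."""
    return (sample_mean - mu) / sqrt(sigma2 / n)

z = z_for_mean(88.07, mu=100, sigma2=225, n=59)
p_this_low_or_lower = NormalDist().cdf(z)    # area to the left of z

print(round(z, 2), p_this_low_or_lower < 0.00003)
```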
Use Of zX-bar
IQ of deaf children:
So what does this look
like and how does it help us decide about μ=100? Is the mean of the IQ of deaf children 100?
Because the probability
of getting X̄=88.07 or less if μ=100
is so small, less than .00003, we reject the idea that μ=100.
It is very unlikely to
get the data that led to X̄=88.07 from a population with μ=100.
Other Sampling Distributions
The sampling
distribution of X-bar is the first sampling distribution we learn, but it is
not the only one (all statistics have sampling distributions).
All sampling
distributions have in common:
Purpose: to obtain probabilities
Definition: the
distribution of a statistic.
But each sampling
distribution has specific characteristics like mean, variance, and shape.
Other Sampling Distributions, cont.
Sampling distributions
of
s*² and s²:
Both have shapes that are
positively skewed.
The mean of s*²
is [(N-1)/N]σ², always
smaller than σ².
The mean of s²
is σ².
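The two means can be confirmed by enumerating every possible sample from a tiny population. This Python sketch assumes a hypothetical three-score population and sampling with replacement, with N=2.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]       # hypothetical population, not from the course
N = 2                        # sample size

mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample's own mean."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample)

samples = list(product(population, repeat=N))

# Mean of each statistic over all possible samples.
mean_of_sample_variance = sum(sum_sq_dev(s) / N for s in samples) / len(samples)
mean_of_unbiased = sum(sum_sq_dev(s) / (N - 1) for s in samples) / len(samples)

print(mean_of_sample_variance == Fraction(N - 1, N) * sigma2)   # biased, too small
print(mean_of_unbiased == sigma2)                               # unbiased
```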
Other Sampling Distributions, cont.
Sampling distributions
of
r, s*, and s.
r: the mean is ρ (rho) if ρ=0, and the shape is symmetric but not normal.
s* and
s: neither has a mean equal to σ.
Estimation
You need sampling
distributions to make
inferences:
To get probabilities of
statistics for decision making about parameters.
To get information
necessary to estimate parameters.
Estimation is the
calculation of an approximate value of a parameter.
Point estimation is the
use of a statistic as a single value (point) to estimate a parameter.
Any statistic can be
used to estimate any parameter.
Some statistics are
good, and logical, estimates of particular parameters, such as X-bar as an
estimate of μ.
Unbiased estimate is
one definition of good estimate.
Estimation: Unbiased
Unbiased estimate: A
statistic
is an unbiased estimate of a
parameter if the mean of its sampling distribution is
equal to the parameter: μstatistic=desired parameter.
The following statistics
are unbiased estimates of their corresponding parameters:
X-bar is an unbiased
estimate of μ because μX̄=μ.
s² is an
unbiased estimate of σ² because μs²=σ².
r is an unbiased
estimate of ρ because μr=ρ if ρ=0.
Note that the statistic
and parameter can change, but the definition of unbiased is μstatistic=parameter.
Estimation: Unbiased, cont.
The following statistics
are
not unbiased estimates of
their corresponding parameters (each is a biased
estimate):
s*² is a
biased estimate of σ² because μs*²≠σ².
s* is a
biased estimate of σ because μs*≠σ.
s is a biased estimate
of σ because μs≠σ.
Note that the statistic
and parameter can change, but the definition of unbiased is μstatistic=parameter,
always μ of the statistic, and this μ equals the desired parameter.
Estimation: Confidence Intervals
Sampling distributions
give information necessary to estimate parameters.
Estimation is the
calculation of an approximate value of a parameter.
Interval estimation
allows you to obtain an interval of potential values for a parameter.
For the problem with IQ
of deaf children, we found that X-bar = 88.07 for our sample mean. We know that X-bar
is a good (unbiased) estimate of μ, but we also know that X-bar has variability, so it is unlikely that μ=88.07. However, 88.07 should be close to μ.
Estimation: Confidence Intervals, cont.
A confidence interval
for μ gives an interval of values around X-bar that are
likely to include the true value of μ.
A 95% confidence
interval for μ is given by
X-bar - 1.96√(σ²/N) to X-bar + 1.96√(σ²/N).
For the problem with IQ
of deaf children, X-bar = 88.07, N=59, and σ²=225. So
the 95% confidence interval for μ is
88.07 - 1.96√(225/59) to 88.07 + 1.96√(225/59),
or 84.24 to 91.90.
We can say that we are
95% confident that the μ of the IQ of deaf
children is between 84.24 and 91.90. Or, we can say that 95% of intervals like
this one will include the true value of the μ of the IQ of deaf children.
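The interval arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch; the function name is made up for illustration, and 1.96 is the two-tailed critical z for 95% confidence.

```python
from math import sqrt

def z_confidence_interval(xbar, sigma2, n, z_crit=1.96):
    """Return (lower, upper) for X-bar +/- z_crit * sqrt(sigma2 / n)."""
    half_width = z_crit * sqrt(sigma2 / n)
    return xbar - half_width, xbar + half_width

# IQ of deaf children: X-bar = 88.07, sigma squared = 225, N = 59
lower, upper = z_confidence_interval(88.07, 225, 59)
print(f"{lower:.2f} to {upper:.2f}")  # 84.24 to 91.90
```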
Estimation: Confidence Intervals, cont.
We
can say that 95% of intervals like 84.24 to 91.90 will include the true value
of the μ of the IQ of deaf children. The true value of this μ is unknown, but many intervals, each from a different
sample, would cluster around the true mean.
Psychology 2113
Hypothesis Testing
Introduction
Examples
Key Terms
Hypothesis Testing: Introduction
This is the last of the
seven topics common to all inferential statistics, and so it integrates all of
the other six. Principally, hypothesis testing uses probability and the sampling
distribution of a statistic to make decisions about a parameter.
Hypothesis testing is
the process of testing tentative guesses about relationships between variables
in populations. These relationships between variables are expressed in a
statement, a hypothesis, about a population parameter.
Hypothesis Testing: Examples
Rat-shipment example:
are the rats defective? Or are they OK? If μ=33 and σ²=361, is the X-bar=44.4 from the sample of N=25 rats
significantly different from 33?
Compute zX=(X-bar - μ)/√(σ²/N)
=(44.4-33)/√(361/25)=11.4/3.8=3.00.
Find p(X-bar>44.4)=p(z>3)=.0013.
It is unlikely the rats came from a population with μ=33 for the mean run time. So we decide that μ≠33 and that the rats are defective.
Hypothesis Testing: Examples, cont.
IQ of deaf children
example: are the deaf children lower in IQ? Or are they average? If μ=100 and σ²=225, is
the X-bar=88.07 from the sample of N=59 deaf children significantly lower than 100?
Compute zX=(X-bar - μ)/√(σ²/N)
=(88.07-100)/√(225/59)=
-11.93/1.95=-6.11.
Find p(X-bar<88.07)=p(z<-6.11)<.00003.
It is unlikely the deaf children came from a population with μ=100 for the mean IQ. So we decide that μ<100 and that the children have lower IQ scores (remember, this
is due to the fact that their language, ASL, is not English, so they score
lower on the verbal part of the total IQ test).
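Both examples use the same two steps: compute zX, then find the tail probability. A short Python sketch of those steps follows; the helper name is made up, and the standard library's NormalDist stands in for the z table.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def z_for_mean(xbar, mu0, sigma2, n):
    """z statistic for a sample mean when the population variance is known."""
    return (xbar - mu0) / sqrt(sigma2 / n)

# Rat shipment: p(z > 3.00), the upper tail
z_rats = z_for_mean(44.4, 33, 361, 25)
p_rats = 1 - std_normal.cdf(z_rats)

# IQ of deaf children: p(z < -6.11), the lower tail
z_iq = z_for_mean(88.07, 100, 225, 59)
p_iq = std_normal.cdf(z_iq)

print(round(z_rats, 2), round(p_rats, 4))  # 3.0 0.0013
print(round(z_iq, 2), p_iq < 0.00003)      # -6.11 True
```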
Hypothesis Testing: Key Terms
Test statistic: a
statistic used only for the purpose of testing hypotheses; e.g. zX.
Assumptions: conditions
placed on a test statistic necessary for its valid use in hypothesis testing;
for zX, the
assumptions are that the population is normal in shape and that the
observations are independent.
Null hypothesis: the
hypothesis that we test; Ho.
Alternative hypothesis:
where we put what we believe; H1. Both Ho and H1
are stated in terms of a parameter.
Hypothesis Testing: Key Terms, cont.
Significance level: the
standard for what we mean by a small probability in hypothesis testing; α.
Directional and
non-directional hypotheses.
One- and two-tailed
tests, critical values, and rejection values.
Decision rules:
Critical value decision
rules.
p-value decision rules.
Ho and H1
Rat-shipment example:
We start with H1.
We believe that there is something wrong with the rats, or that μ≠33. So we have H1: μ≠33.
Next, we state Ho.
The null is always the opposite of the alternative. Taken together, Ho and H1
usually cover all possible values of the parameter being tested. The null hypothesis usually has the equals sign in it. So we
have Ho: μ=33.
IQ of deaf children
example:
Again, we start with H1.
We believe that the deaf children will score lower on the IQ test because
English is not their native language, or that μ<100. So we have H1: μ<100.
Next, we state Ho.
So we have Ho: μ≥100.
Significance Level
The significance level
is the small probability used in hypothesis testing to determine an unusual
event that leads you to reject Ho.
The significance level
is symbolized by α (alpha).
The value of α is almost always set at α=.05.
The value of α is chosen before data are collected.
If Ho is
rejected when α=.05, here are examples
of what you say:
The mean of the IQ of
deaf children, X-bar=88.07, is significantly lower than 100, z=-6.11, p<.00003.
The mean of the run
times, X-bar=44.4, is significantly different from 33, z=3.00, p=.0013.
Directional and Non-directional Hypotheses
Directional hypotheses
specify a particular direction for values of the parameter.
IQ of deaf children
example: Ho: μ≥100, H1: μ<100.
Non-directional
hypotheses do not specify a particular direction for values of the parameter.
Rat shipment example: Ho:
μ=33, H1: μ≠33.
Another example:
Suppose you believe that
dancers are more introverted than other people. You have N=26 dancers and know
that for this age group with your male/female ratio that μ=19.15.
So you have Ho:
μ≤19.15
and H1: μ>19.15.
One- and Two-Tailed Tests, Critical Values,
and Rejection Values
One- and two-tailed
tests:
A one-tailed test is a
statistical test that uses only one tail of the sampling distribution of the
test statistic.
A two-tailed test is a
statistical test that uses two tails of the sampling distribution of the test
statistic.
Critical values are
values of the test statistic that cut off α or α/2 in the tail(s) of the theoretical reference
distribution.
Rejection values are the
values of the test statistic that lead to rejection of Ho.
One- and Two-Tailed Tests, Critical Values,
and Rejection Values, cont.
Rat shipment example: Ho:
μ=33, H1: μ≠33
Two-tailed test
Critical values are zcrit=-1.96 and zcrit=1.96
Rejection values are <-1.96
and >1.96
One- and Two-Tailed Tests, Critical Values,
and Rejection Values, cont.
IQ of deaf children
example: Ho: μ≥100, H1: μ<100.
One-tailed test
Critical value is zcrit=-1.645
Rejection values are <-1.645
One- and Two-Tailed Tests, Critical Values,
and Rejection Values, cont.
Introversion of dancers
example: Ho: μ≤19.15, H1:
μ>19.15.
One-tailed test
Critical value is zcrit=1.645
Rejection values are >1.645
Critical Value Decision Rules
Rat shipment example: Ho:
μ=33, H1: μ≠33.
Reject Ho if
the observed zX<-1.96 or if zX>1.96.
The observed zX
was zX=3.00, and 3.00>1.96, so we reject Ho.
Critical Value Decision Rules, cont.
IQ of deaf children
example: Ho: μ≥100, H1: μ<100.
Reject Ho if
the observed zX<-1.645.
The observed zX
was zX=-6.11, and -6.11<-1.645, so we reject Ho.
Critical Value Decision Rules, cont.
Introversion of dancers
example: Ho: μ≤19.15, H1:
μ>19.15.
Reject Ho if
the observed zX>1.645.
The observed zX
was zX=.76, and .76<1.645, so we retain Ho.
Compute zX for Introversion of
Dancers
Remember, you believe
that dancers are more introverted than other people. You have N=26 dancers and
know that for this age group with your male/female ratio that μ=19.15. So you have Ho: μ≤19.15
and H1: μ>19.15.
Introversion of dancers
example: are the dancers higher in introversion? Or are they average? If μ=19.15 and σ=4.32, is the X-bar=19.79 from the sample of N=26 dancers significantly
higher than 19.15?
Compute zX=(X-bar - μ)/√(σ²/N)
=(19.79-19.15)/√(18.6624/26)=.64/.85=.76.
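The critical value decision rule for the dancers can be written out in Python. This is only a sketch: the variable names are made up, and the critical values come from the standard library's inverse normal function rather than from the z table.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

# Critical values that cut off alpha (one tail) or alpha/2 (each of two tails)
z_crit_one_tailed = std_normal.inv_cdf(1 - alpha)
z_crit_two_tailed = std_normal.inv_cdf(1 - alpha / 2)

# Dancers: Ho: mu <= 19.15, H1: mu > 19.15, so the rejection values are in the upper tail
z_obs = (19.79 - 19.15) / sqrt(4.32 ** 2 / 26)
reject_ho = z_obs > z_crit_one_tailed

print(round(z_crit_one_tailed, 3), round(z_crit_two_tailed, 2))  # 1.645 1.96
print(round(z_obs, 2), reject_ho)                                # 0.76 False
```

The upper-tail probability for this z, 1 - std_normal.cdf(z_obs), comes out near .22, far above α=.05, so a p-value rule agrees with the critical value rule.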
p-Value Decision Rules
Rat shipment example: Ho:
μ=33, H1: μ≠33.
Reject Ho if
the SAS (2-tailed) p-value is <α=.05.
The SAS p-value is
.0026.
Reject Ho: μ=33 because .0026<.05 (p-value is < α).
p-Value Decision Rules, cont.
IQ of deaf children
example: Ho: μ≥100, H1: μ<100.
Reject Ho if
½ the SAS p-value <α, and
the observed zX
is in the tail specified by H1.
½ the SAS p-value is
.00003 and the observed zX was in the left tail (as in H1).
So, reject Ho: μ≥100.
p-Value Decision Rules, cont.
Introversion of dancers
example: Ho: μ≤19.15, H1: μ>19.15.
Reject Ho if
½ the SAS p-value <α, and
the observed zX
is in the tail specified by H1.
½ the SAS p-value is
.2236 and the observed zX was in the right tail (as in H1).
Because .2236>.05, retain Ho:
μ≤19.15.
Psychology 2113
Types of Error
Power
Factors That Influence Power
Types of Errors
Two hypotheses, two
decisions, two types of error: this was one of the seven topics common to all
inferential methods.
The two hypotheses are Ho
and H1, and the two decisions are to Reject Ho and to
Retain Ho.
Now we come to the
errors that you can make in hypothesis testing:
A Type I error: to
reject Ho when Ho is true.
A Type II error: to
retain Ho when Ho is false (H1 is true).
Types of Errors, cont.
Each of these types of
errors has a probability of occurring:
p(Type I error)=p(reject
Ho | Ho true)=α.
p(Type II
error)=p(retain Ho | H1 is true)=β.
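We summarize these in a
2x2 box:
                 Ho true                   Ho false (H1 true)
Reject Ho        Type I error, p=α         Correct decision, p=1-β
Retain Ho        Correct decision, p=1-α   Type II error, p=β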
Types of Errors, cont.
Now let's see what this
looks like with a picture of the two distributions.
p(Type I error)=p(reject
Ho | Ho true)=α.
p(Type II
error)=p(retain Ho | H1 is true)=β.
Types of Errors, cont.
If you have already
rejected Ho, the only error you can make is a Type I error, and
because you have not retained Ho, then β=0 (after the fact).
If you have already
retained Ho, the only error you can make is a Type II error, and
because you have not rejected Ho, then α=0 (after the fact).
Types of Errors, cont.
It is extremely
important to keep the probabilities of both types of errors small.
We keep α small by definition, α=.05. We have direct control over α.
However, we do not have
direct control over β. To keep β small, we keep 1-β=power large by using the influence of several factors, thus indirectly
controlling power (and β).
Power
Power = p(rejecting Ho when Ho is false)
= 1-β
We keep β small and 1-β large indirectly by using the influence of several factors: effect
size, N, σ², α, and type of hypotheses.
Power: Effect Size
Effect size, for zX
is γ=(μ-μo)/σ, the difference between the true mean and the mean
given in the Ho divided by the population standard deviation.
As effect size
increases, power increases.
Power: Sample Size, N
N, sample size, is the
factor that gives you the greatest control over power. You usually can choose
N, and N has a great influence on power.
As N increases, power
increases.
Power: σ²
σ², population variance, offers you little control over
power. You usually have already controlled σ² through
good research methods.
As σ² decreases, power increases.
Power: α
α, p(Type I error),
usually set at .05, also offers you little control over power. You can choose
to use .01 or smaller, but will rarely use α larger than .05.
As α increases, power increases.
Power: Type of Hypotheses
Directional hypotheses
have greater power if you are correct in predicting direction, but virtually
zero power if you are wrong.
Power: Type of Hypotheses, cont.
Non-directional
hypotheses have good power in either direction, but lower power than that for a
directional hypothesis in the correct direction.
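The influence of each factor can be seen numerically for zX. The slides do not give a power formula, so the sketch below assumes the standard result for a z test with known σ: the sampling distribution of zX is centered at γ√N when the true effect size is γ. The function names and the values γ=.5 and N=25 are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

def power_directional(gamma, n, alpha=0.05):
    """Power of an upper-tailed z test; gamma = (mu - mu_o) / sigma."""
    z_crit = std_normal.inv_cdf(1 - alpha)
    return 1 - std_normal.cdf(z_crit - gamma * sqrt(n))

def power_non_directional(gamma, n, alpha=0.05):
    """Power of a two-tailed z test: rejection area in both tails."""
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = gamma * sqrt(n)
    upper = 1 - std_normal.cdf(z_crit - shift)
    lower = std_normal.cdf(-z_crit - shift)
    return upper + lower

base = power_directional(0.5, 25)
print("baseline (gamma=.5, N=25, alpha=.05):", round(base, 3))
print("larger effect size (gamma=.8):       ", round(power_directional(0.8, 25), 3))
print("larger N (N=50):                     ", round(power_directional(0.5, 50), 3))
print("smaller alpha (alpha=.01):           ", round(power_directional(0.5, 25, 0.01), 3))
print("non-directional, same gamma and N:   ", round(power_non_directional(0.5, 25), 3))
print("directional, wrong direction:        ", round(power_directional(-0.5, 25), 5))
```

A smaller σ² enters through γ: with the same difference in means, a smaller σ gives a larger γ and so more power.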
Power: Review
Power =
p(______________) = ____
We keep β small and 1-β large indirectly by using the influence of several factors:
effect
size=_________________: as effect size increases, power_____________.
N: as N increases,
power_____________
σ²: as σ²________________, power increases.
α: as α increases, power_____________.
A directional hypothesis
gives greater power if______________________________________.
A non-directional
hypothesis gives good power in__________________________.
Psychology 2113
New Test Statistics
One-Sample t-test
Test of Correlation
New Test Statistics
All test statistics
(inferential methods) have some things in common: use of descriptive
statistics, use of probability, and
all of the basics of hypothesis testing. For
example, all have a null hypothesis, all use α, and for all, increasing N increases power.
But some things are
different. For every new test statistic, we will cover four topics:
Situation, including the
hypotheses.
Test statistic.
Theoretical reference
distribution, critical values, and decision rules.
Assumptions.
New Test Statistics, cont.
I encourage you to start
a chart. Put the four topics on the left side (rows) and the test statistics on
the top (columns). Start with zX.
One-Sample t-Test
1. Situation/hypotheses
2. Test statistic
3. Distribution
4. Assumptions
t Distributions
t distributions have the
following characteristics:
Theoretical distribution
that is symmetric, smooth, unimodal,
and has μ=0.
Looks like the standard
normal distribution, but has longer tails and more variability.
The greater variability
is due to t statistics having not only a mean, X-bar, that varies from sample to
sample, but also a variance, s².
t Distributions, df
t distributions have
only one parameter, df (degrees of freedom). The formula for df can change from
one t statistic to the next.
The working definition
for df is: in a sample variance, df = number of independent components - number of parameters estimated.
The one-sample t has the
unbiased sample variance, s², in its formula. In s²=Σ(X - X-bar)²/(N-1), there are N values of X, the
independent components, and 1 statistic, X-bar, that estimates the 1 parameter,
μ. So df=N-1 for the one-sample t.
df, One-Sample t
The
df for the one-sample t is N-1. Note that the whole concept of df came with the
t-test. There was no concept of df associated with zX. So whatever
changed from zX to t is what brought with it the concept of df. So
how does t differ from zX? t has s² where zX has σ².
t Table
Now
we can use df and α=.05
to find a critical value for t, tcrit. The t table is organized by
df for the rows and α for
one- and two-tailed tests for the columns. If N=10, then for a one-sample t,
df=N-1=10-1=9. For a two-tailed test with α=.05 and df=9, the critical values are ±2.262.
One-Sample t-Test: Example
Are people who are
interrupted in a task accurate in estimating how long they have spent on the
task? People who were given 20 3-letter anagrams to solve (e.g. "arn" is "ran")
were interrupted after doing 10 of them and asked to estimate how long they had
worked on the task. The researchers formed a ratio of estimated to actual time,
and μ_ratio
should be 1 if the people are accurate in estimating time.
The ratios for the N=10
people are .911, 1.011, 1.807,
2.010, 1.911, 2.156,
1.251, 1.516, 2.730,
1.160
Get the sum and the sum
of squares of the ratios:
ΣX=16.463 and ΣX²=30.119405
One-Sample t-Test: Example
Now compute the mean, X-bar,
and the unbiased variance, s², and s.
X-bar=1.646, s²=.3352,
s=.579.
Ho: μ=1, H1: μ≠1. So we are now ready to compute t=(X-bar - μ)/√(s²/N)
= (1.646-1)/√(.3352/10)=3.53
Using a critical value
decision rule, the upper tcrit is 2.262 and 3.53>2.262. Using a
p-value decision rule, the SAS p-value was .0064<α=.05.
So both decision rules
lead us to reject Ho: μ=1. What does this look like in the sampling distribution of t?
One-Sample t-Test: Example, cont.
Find the observed t=3.53
and the upper tcrit=2.262 in the distribution below. Because
3.53>2.262, or because .0064<.05, we reject Ho: μ=1. People interrupted in a task significantly
overestimate the time spent in the task.
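The whole one-sample t computation for the anagram ratios fits in a few lines of Python. This is a sketch using only the standard library; because that library has no t distribution, the critical value 2.262 is typed in from the t table (df=9, two-tailed, α=.05).

```python
from math import sqrt
from statistics import mean, variance

ratios = [0.911, 1.011, 1.807, 2.010, 1.911, 2.156, 1.251, 1.516, 2.730, 1.160]
mu_o = 1.0      # value of mu given in Ho
t_crit = 2.262  # from the t table: df = 9, two-tailed, alpha = .05

n = len(ratios)
xbar = mean(ratios)
s2 = variance(ratios)            # unbiased variance: divides by N - 1
t = (xbar - mu_o) / sqrt(s2 / n)
df = n - 1

print(round(sum(ratios), 3), round(sum(x * x for x in ratios), 6))  # 16.463 30.119405
print(round(xbar, 3), round(s2, 4), round(sqrt(s2), 3))             # 1.646 0.3352 0.579
print(round(t, 2), df, abs(t) > t_crit)                             # 3.53 9 True
```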
Test of Correlation: r
Continuing with your
chart, we will add a new test statistic to zX and the one-sample t.
You already know it as a descriptive statistic, but here it will be used to
test hypotheses.
Test of Correlation: df
The df for r is N-2. It
can be shown why df=N-2 from the standard error of estimate. The standard error
of estimate is a statistic that describes spread of errors or Y scores in
correlation and regression. So, in s_Y.X
we look for independent components and statistics (that estimate parameters): there are N independent components and two statistics, the slope and the intercept of the regression line, so df=N-2.
r Critical Values
Now
we can use df and α=.05
to find a critical value for r, rcrit. The table of rcrit
is organized by df for the rows and α
for one- and two-tailed tests for the columns. If N=10, then for r,
df=N-2=10-2=8. For a two-tailed test with α=.05 and df=8, the critical values are ±.632.
Test of Correlation: Example
Researchers
believed stress for police officers increased as number of hours spent
moonlighting on a second job increased. For 28 officers, r was .45. Is r
significantly larger than zero?
Confidence Intervals for μ
Remember,
interval estimation allows you to obtain an interval of potential values for a
parameter.
For
the problem about the ratio of estimated time to actual time for interrupted
anagram solvers, we found X-bar=1.646 for
our sample mean. We know that X-bar is a good (unbiased) estimate of μ, but
we also know that X-bar has variability, so it is unlikely that μ=1.646.
However, 1.646 should be close to μ.
Now we will see how to get an interval for μ when we don't know σ².
Confidence Intervals for μ, cont.
A
confidence interval for μ
gives an interval of values around X-bar that are likely to include the true value
of μ.
A 95% confidence
interval for μ is
given by
X-bar - tcrit√(s²/N)
to X-bar + tcrit√(s²/N).
For the problem
about the ratio of estimated time, X-bar=1.646, N=10, s²=.3352, df=N-1=
9, and tcrit=±2.262.
So the 95% confidence interval for μ
is
1.646 - 2.262√(.3352/10)
to 1.646 + 2.262√(.3352/10),
or 1.23 to 2.06.
Confidence Intervals for μ, cont.
So
for the 95% confidence interval for μ
of
1.23 to
2.06
we can say that we are 95% confident that the μ of
the ratio of estimated to actual time is between 1.23
and 2.06. Or, we can say that 95% of intervals like
this one will include the true value of the μ of the ratio
of estimated time to actual time for people interrupted
after solving 10 of 20 anagrams. Note that 1 is not in
the interval, so we reject Ho: μ=1.
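Here is the same interval computed in Python, with a check on whether the value of μ in Ho falls inside it. The function name is made up for illustration, and tcrit=2.262 is taken from the t table as on the slide.

```python
from math import sqrt

def t_confidence_interval(xbar, s2, n, t_crit):
    """Return (lower, upper) for X-bar +/- t_crit * sqrt(s2 / n)."""
    half_width = t_crit * sqrt(s2 / n)
    return xbar - half_width, xbar + half_width

# Anagram example: X-bar = 1.646, s squared = .3352, N = 10, df = 9
lower, upper = t_confidence_interval(1.646, 0.3352, 10, t_crit=2.262)
mu_o = 1.0
ho_value_in_interval = lower <= mu_o <= upper

print(f"{lower:.2f} to {upper:.2f}")  # 1.23 to 2.06
print(ho_value_in_interval)           # False, so reject Ho: mu = 1
```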
Confidence Intervals for μ, cont.
We
can say that 95% of intervals like 1.23 to 2.06 will include the true value of
the μ of the ratio of estimated to actual time. The true
value of this μ is unknown, but many
intervals, each from a different sample, would cluster around the true mean.
Psychology 2113
Two-Sample Tests
Two-Independent-Sample t-test
Two-Dependent-Sample t-test
Two Samples
The one-sample t-test
and test of correlation are realistic, useful statistical tests. The tests that
we will learn next are even more so: they don't need a known value of μ. They both use two samples.
You can evaluate
research on two groups of people who saw a brief film of a car wreck. Is there
any difference in estimates of speed between those who were asked, "How fast
were the cars going when they hit into each other?" vs. "How fast were the cars
going when they smashed into each other?"
Two-Independent- Sample t-test
Situation/hypotheses
Test statistic
Distribution
Assumptions
Two-Independent- Sample t-test
We have independent
samples whenever there is not any obvious dependency present. When we cover the
two-dependent-sample t, we will see some of these obvious ways samples can be
dependent.
Why is df=n1+n2-2?
The denominator for the two-independent-sample t has both s₁²
and s₂². In s₁²=Σ(X1 - X-bar1)²/(n1-1),
there are n1 independent X1 scores and one statistic, X-bar1.
So, for s₁², the df equals n1-1. Similarly for
s₂², with df=n2-1. Adding the two df together
gives df=n1-1+n2-1=n1+n2-2.
Two-Independent-Sample t-test: Example
One group (PC=perceived
control) of 30 students thought items they submitted might be selected for the
test. The other group (NC=no control) of 30 was told writing the items was a
study aid. Students were randomly assigned to groups. Exam stress was measured
by number of symptoms.
Results: for Group PC, X-bar1=10,
s₁²=8.8276.
For Group NC, X-bar2=12.5, s₂²=8.6034. df=n1+n2-2=30+30-2=58. Using the nearest smaller df in the table, the critical
values for df=55 are ±2.004. The computed value of
t is 3.28 in absolute value, so we reject Ho: μPC=μNC because
3.28>2.004.
Two-Independent-Sample t-test: Example,
cont.
The sampling
distribution for t for the exam stress example is shown below. We reject Ho: μPC=μNC because
t=3.28>2.004 or because p=.0018<α=.05. The two groups differ significantly in number of symptoms of
stress.
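The t for the exam stress example can be recomputed from the summary statistics. The formula slide did not survive in these notes, so the sketch below assumes the usual pooled-variance form of the two-independent-sample t; the function name is made up for illustration.

```python
from math import sqrt

def independent_t(xbar1, s2_1, n1, xbar2, s2_2, n2):
    """Pooled-variance t for two independent samples; returns (t, df)."""
    df = n1 + n2 - 2
    s2_pooled = ((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / df
    standard_error = sqrt(s2_pooled * (1 / n1 + 1 / n2))
    return (xbar1 - xbar2) / standard_error, df

# Group PC (perceived control) vs. Group NC (no control)
t, df = independent_t(10, 8.8276, 30, 12.5, 8.6034, 30)
t_crit = 2.004  # from the t table, df = 55, two-tailed, alpha = .05

print(round(t, 2), df)  # -3.28 58
print(abs(t) > t_crit)  # True
```

The sign is negative only because Group PC, with the lower mean, was entered first; the size of t matches the 3.28 above.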
Two-Independent-Sample t-test: Robustness
What happens to t when
its assumptions are not met? Is t a good statistic? Is α still equal to .05? The topic of robustness of test
statistics examines their quality or validity when an assumption is not met
(when the assumption is violated). A statistic is robust to violation of
an assumption if
Its sampling
distribution is well-fit by its theoretical distribution.
α_true ≈ α_set. Note
that α_true is from the sampling distribution, and α_set is
from the theoretical distribution.
When α_set=.05,
"approximately equals" is defined as .04 to .06.
We get this
information from research on statistics.
Two-Independent-Sample t-test: Robustness,
cont.
Two-Dependent- Sample t-test
Situation/hypotheses
Test statistic
Distribution
Assumptions
Two-Dependent- Sample t-test: X,X Pairs
We have dependent
samples whenever we have X,X pairs of scores. Such pairs can happen in at least
three different ways:
Researcher-produced
pairs. If students in the exam stress study had been matched on GPA, the
researcher would have produced the pairs. The X scores on number of symptoms in
the PC group would be dependent on the X scores in the NC group.
Naturally occurring
pairs. For example, husband-wife, siblings, etc.
Repeated measures. This
could be the pre- and post-test scores
when people are measured before and after a treatment.
Two-Dependent-Sample t-test: Test Statistic
The t for this test is
based on first getting difference scores, d=X1-X2. Then
the statistics in t can be
computed:
d-bar=Σd/N
s²d=Σ(d - d-bar)²/(N-1)=[NΣd²-(Σd)²]/[N(N-1)]
Why is df=N-1? The
denominator for the two-dependent-sample t has s²d. In s²d,
there are N independent d's and one statistic, d-bar. So, for s²d
and the two-dependent-sample t, the df equals
N-1.
Two-Dependent- Sample t-test: Example
Does intravenous
injection of butyrate, a flavor
enhancer, give
increased fetal hemoglobin in 6 sickle-cell anemia patients?
Results: for
pre-injection scores, X-bar=14.3, and for post-injection scores, X-bar=38.6.
Computations on d gave
d-bar=24.3, s²d=347.8667,
and t=3.20. With N=6 patients, df=N-1=6-1=5. The one-tailed critical value for df=5 is
2.015. So we reject Ho: μPost≤μPre
because 3.20>2.015.
SAS
Two-Dependent-Sample t-test: Example, cont.
The sampling distribution
for t for the butyrate injection example is shown below. We reject Ho: μPost≤μPre
because t=3.20>2.015, or, because ½p=.012<α=.05 and t is positive.
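The steps for the two-dependent-sample t can be sketched in Python. The raw pre and post scores are not given in these notes, so the first part uses made-up difference scores only to show that the definitional and computing formulas for s²d agree; the second part uses the summary statistics from the butyrate example. It assumes the usual form t = d-bar/√(s²d/N), and the function names are made up.

```python
from math import sqrt
from statistics import mean, variance

def variance_of_d(d):
    """Computing formula: [N*sum(d^2) - (sum d)^2] / [N(N-1)]."""
    n = len(d)
    return (n * sum(x * x for x in d) - sum(d) ** 2) / (n * (n - 1))

def dependent_t(d_bar, s2_d, n):
    """t for difference scores; returns (t, df)."""
    return d_bar / sqrt(s2_d / n), n - 1

# Hypothetical difference scores, d = X1 - X2, for illustration only
d_example = [3.0, -1.0, 4.0, 2.0, 0.0, 5.0]
print(mean(d_example), variance_of_d(d_example), variance(d_example))

# Butyrate example, from the summary statistics on the slide
t, df = dependent_t(24.3, 347.8667, 6)
t_crit = 2.015  # from the t table: df = 5, one-tailed, alpha = .05
print(round(t, 2), df, t > t_crit)
```

With d-bar rounded to 24.3, the computed t is about 3.19; the 3.20 on the slide presumably came from an unrounded d-bar. The decision is the same either way.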
Psychology 2113
One-Way ANOVA
Introduction
Logic
F-test
One-Way ANOVA: Introduction
Now we examine a test
statistic that will let us test hypotheses about two or more means, so we can
use two or more groups. The two-sample t-tests could work with only two groups;
the one-way ANOVA uses two or more.
Does smoking impact your
thinking? Non-smokers (NS), active smokers (AS, had just smoked), and deprived
smokers (DS, not smoked for 3 hours) did several tasks, ranging from simple to
complex. In complex tasks, there were significant differences between the
groups, with the AS group doing the worst.
One-Way ANOVA: Logic
The ANOVA F-test uses a
different logic than zX, r, or any of the t-tests. They were all
based on a logic that looked for how far the test statistic was from a middle
value of zero. If the statistic was far enough away from zero, and in agreement
with H1, then you rejected Ho.
The ANOVA's logic forms
an F-ratio of two sample variances, one based on the group means (Between) and
the other based on scores within groups (Within). If Ho of equal
population means is true, both variances should be equal and the average F
will be about 1. If the population means aren't equal, we expect
Between>Within, average F>1, and we reject Ho if F>Fcrit
(note: do demo).
One-Way ANOVA: Logic, cont.
H1: any differences in μj's
One-Way ANOVA: Logic, cont.
Notation: n=# obs. per
group, J=# groups, N=nJ.
Two sample variances:
One based on group means
(Between). Compute the variance of the J group means, s²_X-bar, and multiply it by n. This also is
called MSB, so n·s²_X-bar=MSB.
One based on scores
within groups (Within). Compute s² of observations within each of
the J groups, and the average of these J values of s² is s²_pooled,
also called MSW.
Form the test statistic,
F=(n·s²_X-bar)/(s²_pooled)=MSB/MSW.
Hypotheses:
Ho: μ1=μ2=…=μJ
H1: any
differences in μj's
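The two sample variances and their ratio can be computed directly from scores. The sketch below follows the recipe above for equal n's; the data are made up for illustration (J=3 groups, n=4 per group), and the function name is not from the course materials.

```python
from statistics import mean, variance

def one_way_f(groups):
    """Return (F, MSB, MSW) for J groups with the same n per group."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal n's")
    group_means = [mean(g) for g in groups]
    msb = n * variance(group_means)          # Between: n times s^2 of the means
    msw = mean(variance(g) for g in groups)  # Within: average of the J values of s^2
    return msb / msw, msb, msw

# Hypothetical scores for three groups
groups = [
    [3.0, 5.0, 4.0, 6.0],
    [7.0, 8.0, 6.0, 9.0],
    [4.0, 6.0, 5.0, 5.0],
]
f, msb, msw = one_way_f(groups)
print(round(msb, 4), round(msw, 4), round(f, 2))  # 10.3333 1.3333 7.75
```

Here the group means differ by much more than the scores vary within groups, so MSB is well above MSW and F is well above 1.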
One-Way ANOVA: Logic, cont.
We reject Ho if F>Fcrit.
One-Way ANOVA: F-test
1. Situation/hypotheses
2. Test statistic
3. Distribution
4. Assumptions
One-Way ANOVA: Factors vs. Levels
The ANOVA is a general
statistical tool, including the one-way ANOVA, the two-way ANOVA and beyond.
The "one" in "one-way" refers to the number of factors (variables that classify
the subjects into groups). A one-way layout looks like this:
One-Way ANOVA: Test Statistic
Hypotheses: if J=4
Ho: μ1=μ2=μ3=μ4
H1: any
differences in μj's
The test statistic is
the F-ratio, F=MSB/MSW=(SSB/dfB)/(SSW/dfW),
where dfB=J-1 and dfW=N-J
(or if you have equal n's, J(n-1)).
Example: if SSB=410,
SSW=630, n=30, and J=4, then dfB=J-1=4-1=3, and dfW=J(n-1)=4(29)=116,
so F=(410/3)/(630/116)=136.67/5.43=25.16.
One-Way ANOVA: F Distribution
The F distribution is a
positively skewed distribution with a minimum of zero.
It has two parameters,
the df for the numerator variance and the df for the denominator variance. For
the one-way ANOVA, the df for the numerator is dfB and the df for
the denominator is dfW.
The F table of critical
values is organized by dfB, dfW, and α (.05 and .01). Only upper-tail critical values are
given because we expect the F only to get large if H1 is true.
One-Way ANOVA: F Distribution, cont.
Here is a picture of the
F distribution with dfB=3 and dfW=60, with the critical
value that cuts off α=.05 in the upper tail.
One-Way ANOVA: Assumptions
The one-way ANOVA F
statistic is distributed as F with J-1 and N-J df
only if all of the assumptions are met. If any of the assumptions are not met,
then F only approximately has this distribution and we need to ask questions
about robustness for each assumption.
Normality: like the
two-independent-sample t, F is reasonably robust to non-normality, except for
mixed distributions.
Equal variances: unlike
the t, F is not robust to very unequal variances, even with large and equal
n's.
Independence: like the
t, F is not robust to dependence in the data, but we typically meet this
assumption.
One-Way ANOVA: Unequal Variances
Unlike the t, F is not
robust to very unequal variances, even with large and equal n's, if J>2.
For example, for J=4,
n=50, if the population variances are in the ratio of 16:1:1:1, then the true α is .088 when α is set at .05. Note that .088 is larger than the .06 that we set as an
upper boundary for α=.05. Also note that the
n=50 per group is much larger than the n=15 that it took to make the t
robust to any ratio in variances.
The F is robust to
slightly unequal variances, but you don't know the population variances.
This problem of the F's
lack of robustness to very unequal variances is resolved when we get to the
next statistical procedures, MCPs.
One-Way ANOVA: Example, Liar Data
Which occupation should
best be able to detect liars? Secret Service agents, judges, and psychiatrists
were compared on percent correct in detecting which of ten people were lying
(Liar data, p.471). Here X-bar_SS=64, X-bar_judges=56.57, X-bar_psy=57.71.
Hypotheses:
Ho: μSS=μJudges=μPsy
H1: any
differences in μj's
F=MSB/MSW=(SSB/dfB)/(SSW/dfW),
where dfB=J-1 and dfW=J(n-1).
With SSB=1120, SSW=16,845.7143,
n=35, and J=3, dfB=J-1=3-1=2 and dfW=J(n-1)=3(34)=102,
so F=(1120/2)/(16845.7143/102)=560/165.1541=3.39.
One-Way ANOVA: Example, Liar Data
Next, get an Fcrit
for dfB=2 and dfW=102. Using 2 and 100, with α=.05 we have Fcrit=3.09.
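When SSB and SSW are already available, the F-ratio and its df take only a few lines. This sketch assumes equal n's, so dfW=J(n-1); the function name is made up, and Fcrit=3.09 is typed in from the F table as above.

```python
def f_from_ss(ssb, ssw, n, j):
    """Return (F, dfB, dfW) from sums of squares, with n per group and J groups."""
    df_b = j - 1
    df_w = j * (n - 1)
    msb = ssb / df_b
    msw = ssw / df_w
    return msb / msw, df_b, df_w

# Liar data: SSB = 1120, SSW = 16,845.7143, n = 35, J = 3
f_liar, df_b, df_w = f_from_ss(1120, 16845.7143, n=35, j=3)
f_crit = 3.09  # from the F table for 2 and 100 df, alpha = .05
print(round(f_liar, 2), df_b, df_w, f_liar > f_crit)  # 3.39 2 102 True

# Earlier example: SSB = 410, SSW = 630, n = 30, J = 4
f_first, df_b_first, df_w_first = f_from_ss(410, 630, n=30, j=4)
print(round(f_first, 2), df_b_first, df_w_first)      # 25.16 3 116
```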
One-Way ANOVA: Example, Liar Data
The results of an ANOVA
are often reported in an ANOVA summary table. Below is the summary table for
the Liar Data. (SAS)
One-Way ANOVA: t²=F
The final topic for the
ANOVA is to show the connection between the two-independent-sample t and the
one-way ANOVA F when J=2.
The relationship is that
t²=F. Let's do an example: J=2, n=31.
Note the df for both
tests: for t, df=n1+n2-2=31+31-2=60. For F, dfB=J-1=2-1=1,
dfW=J(n-1)=2(30)=60.
Now, get tcrit
and Fcrit: tcrit=±2.00 and Fcrit=4.00.
If you square -2, you
get 4, and if you square +2, you get 4, so squaring the values of tcrit
gives the value of Fcrit.
Note that the tail
area cut off by the t of -2 is .025, and the tail area cut off by the t of +2 is
also .025. Add them and you get .05, the tail area cut off by the F of 4.
One-Way ANOVA: t²=F
Here is a picture of
what happens with t² (df=60) = F (df=1, 60).
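The relationship also holds for the statistics computed from data, not just for the critical values. The sketch below uses two made-up groups of n=5 (the J=2, n=31 example above has no raw scores) and assumes the pooled-variance t; it computes t and F on the same scores and compares t² with F.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical scores for J = 2 groups with equal n
group1 = [12.0, 15.0, 11.0, 14.0, 13.0]
group2 = [16.0, 18.0, 14.0, 17.0, 15.0]
n = len(group1)

# Two-independent-sample t with pooled variance (equal n's)
s2_pooled = (variance(group1) + variance(group2)) / 2
t = (mean(group1) - mean(group2)) / sqrt(s2_pooled * 2 / n)

# One-way ANOVA F on the same two groups
msb = n * variance([mean(group1), mean(group2)])
msw = s2_pooled
f = msb / msw

print(round(t, 4), round(t ** 2, 4), round(f, 4))  # -3.0 9.0 9.0
```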
Psychology 2113
Multiple Comparison Procedures
(MCPs)
Introduction
Tukey's HSD
Fisher-Hayter Test
Multiple Comparison Procedures: Introduction
If the ANOVA F rejects Ho,
it is favoring H1; but H1 merely says "any difference in the μj's." So
the F doesn't tell you which groups have different means, it says "some
difference, somewhere. This fact, along with the lack of robustness to unequal
variances, makes us not rely on the F as the most important statistic for a
one-way design.
We need tests for the
multiple differences that exist between the J means. For example, which of the
groups is best at detecting liars: Secret Service agents, judges, or
psychiatrists? The significant F merely says there is a difference.
Multiple Comparison Procedures: Introduction
Pairwise comparisons are
differences in means taken two at a time. On J means, there are C=J(J-1)/2 pairwise
comparisons.
The hypotheses for
pairwise comparisons are Ho: μj=μj' and H1: μj≠μj', where j and j' are two different groups.
Multiple Comparison Procedures: Introduction
Error rates:
p(at least one Type I error) ≤ 1-(1-α)^C ≤ Cα, where C is the number of pairwise comparisons you
are doing and α is the alpha set for
each comparison.
Error rate per
comparison sets α=.05 for each
comparison, so the maximum, 1-(1-.05)^C ≤ C(.05), could be
large. For C=3, 1-(1-.05)³ ≤ 3(.05) gives
p(at least one Type I error) ≤ .143 < .15.
Error rate familywise
controls p(at least one Type I error) at a maximum of .05 for a group of C comparisons by keeping α small.
Multiple Comparison Procedures: Introduction
For the liar data, J=3,
so C=J(J-1)/2=3(2)/2=3. There are three pairwise comparisons: SS vs. judges, SS
vs. psychiatrists, and judges vs. psychiatrists.
Error rate per
comparison sets α=.05 for each
comparison, so p(at least one Type I error) ≤ .143 < .15.
Error rate familywise
sets α=.016952427 so that p(at least one Type I error) ≤ .05.
Multiple Comparison Procedures: Introduction
If J=4,
C=J(J-1)/2=4(3)/2=6. There are six pairwise comparisons: 1 vs. 2, 1 vs. 3, 1
vs. 4, 2 vs. 3, 2 vs. 4, and 3 vs. 4.
Error rate per
comparison sets α=.05 for each
comparison, so p(at least one Type I error) ≤ .265 < .30.
Error rate familywise
sets α=.008512445 so that p(at least one Type I error) ≤ .05.
Tukey's MCP
automatically controls p(at least one Type I error) familywise.
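The numbers for J=3 and J=4 can be reproduced in Python. This is a sketch with made-up function names: it counts the pairwise comparisons, finds the maximum of p(at least one Type I error) when α=.05 per comparison, and solves 1-(1-α)^C=.05 for the per-comparison α that holds the familywise rate at .05.

```python
def n_pairwise(j):
    """Number of pairwise comparisons on J means: C = J(J-1)/2."""
    return j * (j - 1) // 2

def max_familywise_error(alpha, c):
    """Upper limit on p(at least one Type I error) over C comparisons."""
    return 1 - (1 - alpha) ** c

def per_comparison_alpha(familywise, c):
    """Alpha per comparison that makes the upper limit equal to familywise."""
    return 1 - (1 - familywise) ** (1 / c)

for j in (3, 4):
    c = n_pairwise(j)
    limit = max_familywise_error(0.05, c)
    alpha_each = per_comparison_alpha(0.05, c)
    print(j, c, round(limit, 3), round(c * 0.05, 2), round(alpha_each, 6))
# 3 3 0.143 0.15 0.016952
# 4 6 0.265 0.3 0.008512
```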
Multiple Comparison Procedures: Tukey
We will use a t
statistic for multiple comparisons, and the Studentized Range distribution that
has a critical value of q/√2.
Multiple Comparison Procedures: Example
Tukey's MCP on the liar
data gives three t statistics:
SS vs. judges, t=(64-56.57)/√[2(165.1541)/35]=7.43/3.072=2.42
SS vs.
psychiatrists,
t=(64-57.71)/√[2(165.1541)/35]=6.29/3.072=2.05
Psychiatrists vs.
judges, t=(57.71-56.57)/√[2(165.1541)/35]=1.14/3.072=.37
Critical value: q/√2
For J=3, dfW=102
(we'll use 60), and α=.05, the value of
q=3.40. So the t critical value is q/√2=3.40/√2=2.40.
Only the SS vs. judges t
is significant (reject Ho because 2.42>2.40).
SS agents out-do judges,
but not psychiatrists. Psychiatrists are not better than judges. (SAS)
Multiple Comparison Procedures:
Fisher-Hayter
Like the Tukey except
that the overall F must be significant and it uses the Studentized Range
distribution with a critical value of q/√2 for J-1.
Multiple Comparison Procedures: Example
For Fisher-Hayter's MCP
on the liar data, F was significant, 3.39>3.09; the same three t statistics:
SS vs. judges, t=2.42
SS vs. psychiatrists,
t=2.05
Psychiatrists vs.
judges, t=.37
Critical value: q/√2 but uses J-1
For J-1=2, dfW=102
(we'll use 60), and α=.05, the value of
q=2.83. So the t critical value is q/√2=2.83/√2=2.00.
Now both SS vs. judges
and SS vs. psychiatrists t's are significant (reject Ho's because
2.42>2.00 and 2.05>2.00). The Fisher-Hayter MCP has better power than
Tukey's MCP.
SS agents out-do judges
and psychiatrists. Psychiatrists are not better than judges.
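Both MCPs on the liar data can be run side by side in Python. This is a sketch: the q values are typed in from the Studentized Range table at dfW=60 and α=.05, as on the slides, and the variable names are made up. The Fisher-Hayter part also assumes the overall F has already been found significant (3.39>3.09).

```python
from itertools import combinations
from math import sqrt

means = {"SS": 64.0, "judges": 56.57, "psychiatrists": 57.71}
msw = 165.1541  # MSW from the one-way ANOVA
n = 35          # observations per group

q_values = {
    "Tukey": 3.40,          # q for J = 3
    "Fisher-Hayter": 2.83,  # q for J - 1 = 2
}

# Same t statistic for every pair: difference in means over sqrt(2*MSW/n)
standard_error = sqrt(2 * msw / n)
t_stats = {
    (a, b): abs(means[a] - means[b]) / standard_error
    for a, b in combinations(means, 2)
}
for pair, t in t_stats.items():
    print(pair, round(t, 2))

significant = {}
for mcp, q in q_values.items():
    t_crit = q / sqrt(2)
    significant[mcp] = [pair for pair, t in t_stats.items() if t > t_crit]
    print(mcp, round(t_crit, 2), significant[mcp])
```

The t statistics are identical for the two procedures; only the critical value changes, which is why Fisher-Hayter finds one more significant pair.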
Psychology 2113
Two-Way ANOVA
Introduction
Logic
Interaction
F-tests
Two-Way ANOVA: Introduction
The two-way ANOVA uses
two factors, variables that combine to form the groups. The factors may or may
not be independent variables.
The groups formed by
combining levels/values of the factors are called cells, and the means of the
observations in these cells are called cell means.
We have three F-tests in
a two-way ANOVA, one for each of the two factors by themselves, and one for the
interaction of the two factors.
Two-Way ANOVA: Introduction
Example: runners and
cyclists randomly assigned to one of three amounts of time to hold a hamstring
stretch, tested for flexibility after six weeks. Sport and time are the
factors, we call this a 2X3 ANOVA, and there are 6 cells.
So we will have an
F-ratio for sport, an F-ratio for time, and an F-ratio for the interaction of
sport and time.
Two-Way ANOVA: Logic
The logic of the two-way
ANOVA is the same as that for the one-way: for each of the three F-tests, form
an F-ratio of two sample variances. For each F, if Ho is true, both
variances should be equal and the average F will be about 1.
For each F, if H1
is true,
We expect
numerator>denominator,
We expect average
F>1,
And we reject Ho
if F>Fcrit.
The difference is that
the two-way ANOVA is more complex: there are three Fs. The effects of the
factors are called main effects.
Two-Way ANOVA: Logic, cont.
Notation: n=# obs. per
cell, J=# levels of A, K=# levels of B, N=nJK.
Each of the three Fs is
formed as a ratio of two sample variances: the numerator will be the MS for the
effect tested, the denominator will be MSW.
Hypotheses:
For A (e.g. Sport)
Ho: μ1 = μ2 = ... = μJ
H1: any differences in the μj's
For B (e.g. Time)
Ho: μ1 = μ2 = ... = μK
H1: any differences in the μk's
For interaction (not easily expressed in terms of μ's),
Ho: no interaction effect
H1: some interaction effect
Two-Way ANOVA: Interaction
Interaction is a unique
combination of the factors, a combined effect separate from the main effects;
interaction can't be explained by the factors alone.
Maybe for runners, 30s
gives the best flexibility, but for cyclists, they need 60s. This is an example
of interaction.
When cell means are
plotted, interaction shows up as line segments that are not parallel.
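One way to see "not parallel" without drawing the plot: for two groups, the lines are parallel exactly when the gap between the groups is the same at every level of the other factor. The sketch below uses invented cell means (they are not data from the course) and only describes the pattern in the means; it is not the F-test for interaction.

```python
# Hypothetical cell means (flexibility scores), invented for illustration only.
interaction_means = {
    "runners": {"15s": 10.0, "30s": 16.0, "60s": 12.0},
    "cyclists": {"15s": 9.0, "30s": 11.0, "60s": 15.0},
}
# A second invented set where the two sports differ by a constant amount.
additive_means = {
    "runners": {"15s": 12.0, "30s": 14.0, "60s": 16.0},
    "cyclists": {"15s": 10.0, "30s": 12.0, "60s": 14.0},
}


def row_gaps(means, row_a, row_b):
    """Difference between two rows of cell means at each level of the other factor."""
    return [means[row_a][level] - means[row_b][level] for level in means[row_a]]


def lines_parallel(gaps, tol=1e-9):
    """Plotted line segments are parallel when every gap is the same."""
    return max(gaps) - min(gaps) <= tol


for label, means in [("interaction", interaction_means), ("additive", additive_means)]:
    gaps = row_gaps(means, "runners", "cyclists")
    print(label, gaps, "parallel" if lines_parallel(gaps) else "not parallel")
```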
Two-Way ANOVA: Interaction, cont.
Another example:
Irritable bowel syndrome (IBS) involves abdominal pain, a sudden/urgent need to
go to the bathroom, and frequent diarrhea or constipation. A new drug helps
only women who are prone to diarrhea, not women with constipation nor men with
either condition.
Using a rating of the
drug that increases as effectiveness of the drug increases, these results would
look like this:
Two-Way ANOVA: Interaction, cont.
Plots of cell means
showing the three F-tests (assumes that MSW is small so any observed
difference is significant).
Two-Way ANOVA: F-tests
1. Situation/hypotheses
2. Test statistic
3. Distribution
4. Assumptions
Two-Way ANOVA: Factors vs. Levels
Each two-way ANOVA
always has two factors, but each of these factors can have different numbers of
levels (levels are the values of the factors). Here are several different
two-way layouts:
Two-Way ANOVA: Test Statistics
The test statistics are
F-ratios,
FA=MSA/MSW=(SSA/dfA)/(SSW/dfW),
where dfA=J-1 and
dfW=JK(n-1)
FB=MSB/MSW=(SSB/dfB)/(SSW/dfW),
where dfB=K-1 and
dfW=JK(n-1)
FAB=MSAB/MSW=(SSAB/dfAB)/(SSW/dfW),
where dfAB=(J-1)(K-1)
and dfW=JK(n-1)
If n=20, J=3, and K=4,
compute the df:
dfA=J-1=3-1=2
dfB=K-1=4-1=3
dfAB=(J-1)(K-1)=(3-1)(4-1)=(2)(3)=6
dfW=JK(n-1)=(3)(4)(20-1)=12(19)=228
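The df formulas can be wrapped in one small function. This is a sketch assuming equal n per cell, as in the notation above; the function name and the dictionary keys are illustrative.

```python
def two_way_df(n, J, K):
    """Degrees of freedom for a two-way ANOVA with n observations per cell."""
    return {
        "A": J - 1,
        "B": K - 1,
        "AB": (J - 1) * (K - 1),
        "W": J * K * (n - 1),
        "Total": n * J * K - 1,  # N - 1, which equals the sum of the other four
    }


df = two_way_df(n=20, J=3, K=4)
print(df)
```

As a check, the A, B, AB, and W df always add up to N-1.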
Two-Way ANOVA: F Distributions
Each F-statistic in a
two-way ANOVA has its own, possibly different, F distribution and F critical
value.
FA is distributed as F with J-1 and JK(n-1) df
FB is distributed as F with K-1 and JK(n-1) df
FAB is distributed as F with (J-1)(K-1) and JK(n-1) df
Find the three Fcrit
values for dfA=2, dfB=3, dfAB=6, and dfW=228
(use dfW=200, see Table A.6).
For A, Fcrit=3.04.
For B, Fcrit=2.65.
For AB, Fcrit=2.14.
Two-Way ANOVA Example: Intro
Do study technique (four
groups: no notes, student notes, outline framework, and complete outline) and
cognitive style (FI=field independent=self-sufficient and provide own
structure, FD=field dependent = need outside structure) impact scores on a
20-item multiple choice test? Within cognitive style, students were randomly
assigned to study technique, 13 per cell, and all listened to the same taped
lecture over the material on the quiz. Here's the 2X4 layout.
Two-Way ANOVA Example: Results
ANOVA Summary Table
Source            SS        df    MS        F       p (SAS)
A=Cog. Styles     25.009    1     25.009    7.78    .0064
B=Study Tech.     320.182   3     106.727   33.22   .0001
AB=interaction    27.260    3     9.086     2.83    .0426
Within            308.462   96    3.213
Total             680.913   103
For
example, FA=MSA/MSW=(SSA/dfA)/(SSW/dfW),
where dfA=J-1 and dfW=JK(n-1). So FA=(25.009/1)/(308.462/96)=25.009/3.213=7.78.
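The same arithmetic gives all three F-ratios in the summary table. The sketch below starts from the SS and df values in the table; the dictionary names are illustrative, and the p-values are not recomputed here (they come from SAS).

```python
sums_of_squares = {"A": 25.009, "B": 320.182, "AB": 27.260, "W": 308.462}
degrees_of_freedom = {"A": 1, "B": 3, "AB": 3, "W": 96}

# Each mean square is SS divided by its df.
mean_squares = {
    source: sums_of_squares[source] / degrees_of_freedom[source]
    for source in sums_of_squares
}

# Each F-ratio uses MSW as the denominator.
f_ratios = {
    effect: mean_squares[effect] / mean_squares["W"]
    for effect in ("A", "B", "AB")
}

ss_total = sum(sums_of_squares.values())
df_total = sum(degrees_of_freedom.values())

for effect, f in f_ratios.items():
    print(effect, round(mean_squares[effect], 3), round(f, 2))
print("Total", round(ss_total, 3), df_total)
```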
Two-Way ANOVA Example: FAB
For the
study-techniques/cognitive-style study, how would the results for FAB
be reported? Many journals would say, "For the test scores, the interaction of
study technique and cognitive style was significant, F(3, 96)=2.83,
p=.0426."
Let's see why we came to
this conclusion, using both types of decision rules.
Critical value decision
rule: for df of 3 and 96 (use 80), Fcrit is 2.72, so we reject Ho
because 2.83>2.72.
p-value decision rule:
the p-value from the ANOVA table was .0426, so we reject Ho because
.0426<.05.
Now, how do we interpret
these results?
Two-Way ANOVA Example: FAB
A significant
interaction says the effects of one factor on the dependent variable depend on
the level of the other factor. That is, the main effect results may not be
consistent across levels of the other main effect.
Here, the effect (on
test scores) of study technique depends on level of cognitive style: for FI
students, any note-taking is better than no notes. For FD students, only the
two outline methods are better than no notes.
Two-Way ANOVA Example: FAB
Or, interaction says
that FI differs from FD (significantly) only at Student Notes. Both FI and FD
do well with outlines and poorly with No Notes, but only FI students do well
with their own notes.
Note that the
significant main effect results are modified: FI is better than FD only for
Student Notes, and, Student Notes is better than No Notes only for FI students.
Also, you need MCPs to test these means.
Two-Way ANOVA Example: FAB
The significant
interaction is a red flag, warning that the main effect results may not be
consistent across levels of the other main effect.
Psychology 2113
Nonparametric Methods
Introduction
Chi-Square
Test
Rank
Tests
Nonparametric Methods: Introduction
All of the inferential
statistics you have learned so far have tested hypotheses about parameters and
made a normality assumption. These test statistics are called parametric
methods. Now we will learn some new statistics called nonparametric methods.
Nonparametric Methods: Introduction
NP methods that are good
for the nonparametric hypotheses detect any difference in the populations, such
as middle, spread, skewness, kurtosis, or any combination. F-tests, t-tests,
and the test of correlation all were able to detect only one specific parameter
or difference in parameters.
Some NP methods are
actually sensitive to parameter differences, even though they were designed to
test the more general nonparametric hypotheses.
NP methods do not assume
normality, but do make some assumption about independence. Also, they assume
the underlying distribution of the data is continuous.
Nonparametric Methods: Introduction
Statisticians use
qualitative and quantitative to describe data. Psychologists often use scales
of measurement.
Scales of measurement:
Nominal: name only, e.g.
gender. M and F can be coded with any numbers that are different: M=0 and F=1, or M=3 and F=2.
Ordinal: name and rank,
e.g. football rankings, 1 is higher ranked than 2, 2 than 3, etc.
Interval: name, rank,
and equal intervals, e.g. temperature in °C: the difference between 20°C and
10°C is 10°C, as is the difference between 40°C and 30°C.
Ratio: name, rank, equal
intervals, and a true zero point, e.g. height, because zero means absence of
height.
Nonparametric Methods: Introduction
When do you use NP
methods? Four issues have been raised in the literature on NP tests.
Hypotheses: Use NP
methods (like the Chi-Square test) if you want to test hypotheses that are
truly nonparametric, e.g. Ho:distribution1=distribution2.
Assumptions: Use NP
methods if the normality assumption is known to be violated by having a mixed
distribution, with 5-10% of the scores as outliers in one tail of the distribution
of the population.
Scale of measurement:
Use NP methods (like the Chi-Square test) if you have a nominal scale of
measurement, or, perhaps one of the rank tests if you have ordinal data.
Sample size (N): Often
given as a consideration in deciding between NP and parametric tests, but N is not
an issue. Unequal-n problems plague the NP tests as well, and small N is no
better with NP than with parametric tests.
Nonparametric Methods: χ² Test, Introduction
The χ² test for
contingency tables is a NP method that is good for nonparametric hypotheses and
can detect any difference in the populations, such as middle, spread, skewness,
kurtosis, or any combination.
The χ² test for
contingency tables does not assume normality.
It is used if the data
are qualitative, if participants are placed into categories, if we have
categorical variables, or if frequencies are involved. These typically go
together.
Nonparametric Methods: χ² Test, Example
Who initiates touch in a
public setting, M or F? Does this depend on whether the relationship is young
or old? This is a problem for the χ² test for
contingency tables.
Nonparametric Methods: χ² Test
1. Situation/hypotheses
2. Test statistic
3. Distribution
4. Assumptions
Nonparametric Methods: χ² Test, Hypotheses
The null hypothesis for
the χ² test for
contingency tables may be stated in one of two equivalent ways:
Independence of the two
categorical variables, for example, Ho: who initiates touch is
independent of age of the relationship.
Equality of
distributions for levels of one of the categorical variables, e.g. Ho: distribution(young) = distribution(old).
Nonparametric Methods: χ² Test, E's
E=(row frequency)(column
frequency)/N.
For the <1 yr. and F
cell, compute E. E=(146)(101)/219=67.33. Do the same for all other cells.
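Doing "the same for all other cells" looks like this in code. The observed counts are the four O values used in the χ² computation for this example; the sketch assumes the first row is the <1 yr. relationships and the first column is F, which matches the (146)(101)/219 cell above. Variable names are illustrative.

```python
# Observed frequencies: rows are age of relationship, columns are who initiates.
observed = [
    [60, 86],  # <1 yr. row (first column is F)
    [41, 32],  # the other age-of-relationship row
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n_total = sum(row_totals)

# E = (row frequency)(column frequency) / N for every cell.
expected = [
    [row_total * col_total / n_total for col_total in col_totals]
    for row_total in row_totals
]

for row in expected:
    print([round(e, 2) for e in row])
```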
Nonparametric Methods: χ² Test Statistic
The statistic is χ² = Σ[(O-E)²/E], where O is the observed
frequency for a cell and E is the expected frequency for a cell. Note that
there is a separate (O-E)²/E for each cell, and then these are added
together.
For the question of independence
of age of relationship and who initiates touch, we have
χ² = Σ[(O-E)²/E]
= (60-67.33)²/67.33 + (86-78.67)²/78.67 + (41-33.67)²/33.67 + (32-39.33)²/39.33
= .7987 + .6836 + 1.5974 + 1.3672
= 4.45.
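The cell-by-cell sum can be written as a short function. This is a minimal sketch using only the observed counts from the example; the function name is made up, and the expected frequencies are recomputed inside it from the row and column totals.

```python
def chi_square(observed):
    """Sum of (O - E)^2 / E over all cells of a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n_total  # expected frequency
            statistic += (o - e) ** 2 / e                # one term per cell
    return statistic


touch_table = [[60, 86], [41, 32]]
chi2_value = chi_square(touch_table)
print(round(chi2_value, 2))
```

If the observed counts matched the expected counts exactly, every term would be zero and so would the statistic.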
Nonparametric Methods: χ² Test, Distribution
The χ²
distribution is positively skewed, has a minimum of zero, and one parameter,
df.
Nonparametric Methods: χ² Test, Distribution
Does who initiates touch
in a public setting, M or F, depend on whether the relationship is young or
old?
Nonparametric Methods: χ² Test, Distribution
χ²=4.45, so reject Ho: independence because
4.45>3.84. We believe that who initiates touch depends on the length of the
relationship.
Note that we have a
one-tailed test even though the hypothesis is non-directional.
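Both decision rules can be sketched for this 2X2 table. The df formula (rows-1)(columns-1) is the standard one for contingency tables, and the p-value shortcut below works only for df=1, where a χ² variable is the square of a standard normal z; for larger tables a χ² table or statistical software is needed. The names are illustrative.

```python
import math

chi2_value = 4.45
chi2_crit = 3.84  # table value for df=1, alpha=.05
n_rows, n_cols = 2, 2

df = (n_rows - 1) * (n_cols - 1)

# For df=1 only: P(chi-square > x) = P(|z| > sqrt(x)) = erfc(sqrt(x / 2)).
# Only the upper tail is used, which is why the test is one-tailed.
p_value = math.erfc(math.sqrt(chi2_value / 2))

reject_by_critical_value = chi2_value > chi2_crit
reject_by_p_value = p_value < 0.05

print(df, round(p_value, 3), reject_by_critical_value, reject_by_p_value)
```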
Nonparametric Methods: Rank Tests
Nonparametric statistics
based on ranks have the following common characteristics:
They use the sum of the
ranks
They have common
assumptions of independence and a continuous underlying distribution, and the
rank tests for two and J independent samples have an implicit assumption of
equal variances.
Simplicity
Power: if the normality
assumption is met, rank tests are up to 96% as powerful as their parametric
counterpart, but potentially much more powerful if normality is not met.
They are sensitive to
difference in middle (specifically, medians) or to monotonic relationships.
Nonparametric Methods: Rank Tests
The following table
shows analogous parametric and nonparametric statistics based on ranks for
several situations.