Go to:
CoHort Software |
CoStat |
CoStat Statistics
ANOVA in CoStat
(Including Experimental Designs, Unbalanced Designs,
Missing Values, Multiple Comparisons of Means,
Planned Contrasts, and Orthogonal Contrasts)
ANOVA is an acronym for ANalysis Of VAriance. An ANOVA
segregates different sources of variation seen in experimental
results. Some of the sources are "explained" (usually due to the
treatments the experimenter applied), while the remainder
are lumped together as "unexplained" variation (also called the
"Error term"). An ANOVA then tests if the variation associated with
an explained source is large relative to the unexplained
variation. If that ratio (the F statistic)
is so large that the probability of obtaining it
by chance alone is low (for example, P<=0.05), we can conclude (at that
level of probability) that the source of variation had a
significant effect.
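As a minimal sketch of the ratio described above, the following Python computes an F statistic for a one-way layout. The three treatments and their values are hypothetical (they are not from the CoStat sample data), and the code only illustrates the arithmetic; it is not CoStat's implementation.

```python
from statistics import mean

# Hypothetical observations for three treatments (illustration only).
groups = {
    "A": [1.0, 2.0, 3.0],
    "B": [2.0, 3.0, 4.0],
    "C": [5.0, 6.0, 7.0],
}

all_values = [y for ys in groups.values() for y in ys]
grand_mean = mean(all_values)

# "Explained" variation: spread of the treatment means around the grand mean.
ss_treatment = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
df_treatment = len(groups) - 1

# "Unexplained" variation (Error): spread of each value around its own treatment mean.
ss_error = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
df_error = len(all_values) - len(groups)

# Mean squares are sums of squares divided by their degrees of freedom.
ms_treatment = ss_treatment / df_treatment
ms_error = ss_error / df_error

# The F statistic is the ratio of explained to unexplained mean squares.
f_statistic = ms_treatment / ms_error
print(f"F = {f_statistic:.3f} with {df_treatment} and {df_error} df")
```

The P value is then the upper-tail probability of that F value for the given degrees of freedom, which a statistics package or an F table supplies.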
For example, consider an experiment where
three varieties of wheat were grown at four locations.
At each of the locations, there were four blocks, within
each of which were small plots for each of the varieties.
The yield of each plot was measured.
We wish to know if there is a significant difference in yield associated with
the different varieties (one source of variation). We also wish to
know if one location was superior to another. Finally, we wish to
know if some varieties are superior at one location but inferior at
another (that is, if there is an interaction of variety and location).
The ANOVA procedure will answer these questions.
The layout of the various
test plots and the method of assigning treatments to those plots
constitutes the "experimental design." The wheat experiment,
for example, is a "randomized complete blocks" experiment; all of the
treatments occur once, randomly arranged in each block. Experimental
designs can vary greatly. Each design requires a slightly
different mathematical model and a slightly different procedure for
analysis. Extensive discussions of different experimental
designs and different ANOVA procedures can be found in statistics
texts such as Gomez and Gomez (1984),
Little and Hills (1978),
Snedecor and Cochran (1980), and
Sokal and Rohlf (1995).
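The randomization behind a randomized complete blocks layout can be sketched in a few lines of Python. This is only an illustration of the idea that every treatment occurs once per block in random order; the variety names and block numbers are borrowed from the wheat example, and the fixed seed is an assumption made so the sketch is repeatable.

```python
import random

varieties = ["Dwarf", "Semi-dwarf", "Normal"]
blocks = [1, 2, 3, 4]

rng = random.Random(2024)  # fixed seed only so the sketch gives the same layout each run

# Each block receives every treatment exactly once, in its own random order.
layout = {}
for block in blocks:
    order = varieties[:]   # copy, so the master list is left untouched
    rng.shuffle(order)
    layout[block] = order

for block, order in layout.items():
    print(f"Block {block}: " + " | ".join(order))
```

Shuffling within each block, rather than across the whole field, is the single restriction on randomization that distinguishes this design from a completely randomized one.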
CoStat can handle virtually any type of experimental design.
It has a large number of pre-defined models that you can pick
from a list (including: 1, 2, 3 and 4 way completely randomized,
1 and 2 way randomized blocks,
latin square, nested, split plot, split-split plot, split
block, some covariance designs, etc.).
Or, you can use a special language to describe
different models.
Because the ANOVA procedure uses a Generalized
Linear Models (GLM) approach, it can analyze
unbalanced designs and experiments with missing values.
It can calculate the Type I, II, or III Sums of Squares.
Before performing the ANOVA, CoStat performs Bartlett's
test for homogeneity of variances, one of the assumptions of ANOVA.
After performing the ANOVA, the procedure can automatically run a
means comparisons test (also called multiple comparisons of means)
(for example, Duncan's, Student-Newman-Keuls
(SNK), Tukey-Kramer, Tukey's HSD, or
Least Significant Difference (LSD)).
Contrasts are
related to multiple comparisons of means, but the tests are done
during the ANOVA procedure. Contrasts are comparisons
of different subsets of means and are planned before the experiment is
conducted. You can specify any contrasts that you want.
For example, you might test the control against all other
treatments. Contrasts are also called a priori comparisons or planned
comparisons. A set of contrasts is called orthogonal when there is
no overlap between the statistical questions asked by the
contrasts.
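To make the idea of contrasts concrete, here is a small Python sketch. The variety means and the replication (n = 16) are taken from the sample run further down this page; the two contrasts themselves are hypothetical choices made for illustration, and the formulas are the usual equal-replication textbook ones, not a description of CoStat's internals.

```python
# Variety means and replication from the sample run's COMPARE MEANS output.
means = {"Dwarf": 37.446875, "Semi-dwarf": 31.34375, "Normal": 23.20625}
n_per_mean = 16

# Hypothetical planned contrasts (chosen for illustration only).
contrasts = {
    "Dwarf vs. others": {"Dwarf": 2, "Semi-dwarf": -1, "Normal": -1},
    "Semi-dwarf vs. Normal": {"Dwarf": 0, "Semi-dwarf": 1, "Normal": -1},
}

def contrast_ss(coefs):
    """Sum of squares (1 df) for one contrast, assuming equal n per mean."""
    estimate = sum(coefs[name] * means[name] for name in means)
    return n_per_mean * estimate ** 2 / sum(c ** 2 for c in coefs.values())

# A valid contrast has coefficients that sum to zero.
for label, coefs in contrasts.items():
    assert sum(coefs.values()) == 0, label

# With equal n, two contrasts are orthogonal when the products of their
# coefficients sum to zero.
first, second = contrasts.values()
dot_product = sum(first[name] * second[name] for name in means)

ss = {label: contrast_ss(coefs) for label, coefs in contrasts.items()}
for label, value in ss.items():
    print(f"{label:<22} SS = {value:.4f}")
print(f"orthogonal: {dot_product == 0}, total SS = {sum(ss.values()):.4f}")
```

Because these two contrasts are orthogonal and use up both degrees of freedom for Variety, their sums of squares add up to the Variety sum of squares reported in the sample run (1633.399687).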
ANOVA in the CoStat Manual
CoStat's manual has:
- An introduction to Analysis of Variance.
- A description of the calculation methods that are used by the program.
- A complete description of the little language that
can be used to describe models
(if the model you need has not been pre-defined).
- 11 complete sample runs.
The sample runs show how to do 11 different types of ANOVAs
and some of the
related tests (for example, orthogonal contrasts and
multiple comparisons of means). Here is sample run #4:
Sample Run 4 - 2 Way Randomized Blocks ANOVA
In a randomized
blocks design, the experimental units are in groups called
blocks. Usually, each block contains 1 replicate of each
combination of treatments in
random order. Thus, there is 1 restriction on
randomization. Such experiments are useful in fields with
naturally high variability along one axis (for example, due to irrigation).
The ANOVA segregates this variability so that differences between
treatments are not hidden by differences among the blocks
(presumably, the variability is much less within blocks). This is a
randomized "complete" blocks design because each block contains one
replicate of each of the treatment combinations. In CoStat, the
experiments need not be complete; there can be missing data points (by
design or by accident). Also, CoStat allows for more than one
replicate per treatment combination per block.
The sample run demonstrates a 2 way (also known as "2 factor") randomized
blocks design.
Here is the ANOVA model for a 2 Way Randomized Blocks ANOVA
(2WRB.aov):
\\\CoStat.AOV 1.00
\\\2 Way Randomized Blocks
\\\"1st Factor" "2nd Factor" "Blocks"
\\\Type III
Blocks \M 3
Main Effects
@1 \M 1
@2 \M 2
Interaction
@1 x @2 \I 1 2
Error \E
Total \T
In the wheat experiment
(modified from Allen, 1981),
three varieties of wheat were grown at four locations.
At each of the locations, there were four blocks, within
each of which were small plots for each of the varieties.
The Height and Yield of each plot were measured.
This data set is also important because it demonstrates
the use of string indices (Butte, Shelby, ...)
instead of numeric indices (1, 2, 3, ...)
(which older versions of CoStat required).
PRINT DATA
2000-08-03 09:43:16
Using: C:\cohort6\wheat.dt
First Column: 1) Location
Last Column: 5) Yield
First Row: 1
Last Row: 48
Location Variety Block Height Yield
--------- ---------- --------- --------- ---------
Butte Dwarf 1 91.75 58.77
Butte Dwarf 2 93 58.98
Butte Dwarf 3 91.75 53.73
Butte Dwarf 4 92.75 62.08
Butte Semi-dwarf 1 127.5 39.8
Butte Semi-dwarf 2 132.5 41.4
Butte Semi-dwarf 3 127.75 53.35
Butte Semi-dwarf 4 131.75 39.08
Butte Normal 1 146.5 24.33
Butte Normal 2 154.75 20.66
Butte Normal 3 150.75 24.22
Butte Normal 4 157.75 20.68
Shelby Dwarf 1 63.25 25.22
Shelby Dwarf 2 61.5 26.3
Shelby Dwarf 3 62.75 21.92
Shelby Dwarf 4 63.5 27.54
Shelby Semi-dwarf 1 80 25.97
Shelby Semi-dwarf 2 80 22.73
Shelby Semi-dwarf 3 82.5 28.44
Shelby Semi-dwarf 4 83.75 25.09
Shelby Normal 1 95 23.77
Shelby Normal 2 94 18.7
Shelby Normal 3 96.25 24.9
Shelby Normal 4 91.5 11.29
Dillon Dwarf 1 74 39.44
Dillon Dwarf 2 80 39.37
Dillon Dwarf 3 78.25 37.99
Dillon Dwarf 4 78.25 40.69
Dillon Semi-dwarf 1 106.5 28.42
Dillon Semi-dwarf 2 110.75 35.13
Dillon Semi-dwarf 3 110 36.14
Dillon Semi-dwarf 4 110.75 32.93
Dillon Normal 1 116.5 24.98
Dillon Normal 2 116.75 28.62
Dillon Normal 3 120.25 28.69
Dillon Normal 4 120.25 26.37
Havre Dwarf 1 67.5 26.47
Havre Dwarf 2 72.5 26.22
Havre Dwarf 3 68.75 26.15
Havre Dwarf 4 73.75 28.28
Havre Semi-dwarf 1 90.5 21.13
Havre Semi-dwarf 2 90.5 24.25
Havre Semi-dwarf 3 90.5 25.06
Havre Semi-dwarf 4 96 22.58
Havre Normal 1 97.75 24.16
Havre Normal 2 96.5 21.98
Havre Normal 3 103 25.86
Havre Normal 4 98.5 22.09
For the sample run, use File : Open to open the file called
wheat.dt
in the cohort directory. Then:
- From the menu bar, choose: Statistics : ANOVA
- Type: 2WRB - 2 Way Randomized Blocks
- Y Column: 5) Yield
- 1st Factor: 2) Variety
- 2nd Factor: 1) Location
- Blocks: 3) Block
- SS Type: (automatic)
- Keep If:
- Means Test: Student-Newman-Keuls
- Significance Level: 0.05
- OK
HOMOGENEITY OF VARIANCES - RAW DATA
2000-07-25 10:16:29
Using: c:\cohort6\wheat.dt
Data Column: 5) Yield
Broken Down By:
2) Variety
1) Location
3) Block
Keep If:
Bartlett's Test tests the homogeneity of variances, an assumption of
ANOVA. Bartlett's Test is known to be overly sensitive to non-normal data.
A resulting probability of P<=0.05 indicates the variances may be not
homogeneous and you may wish to transform the data before doing an ANOVA.
For ANOVA designs without replicates (notably most Randomized Blocks
and Latin Square designs), there is not enough data to do this test.
There is not enough data to do the test.
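For readers who want to see what the test needs, the following Python sketch computes the standard textbook form of Bartlett's statistic. It is an assumption that this matches CoStat's calculation in detail; the two small data sets are hypothetical. The last call shows why a design with one observation per cell, such as this sample run, cannot be tested: a single value has no within-group variance.

```python
import math
from statistics import variance

def bartlett_statistic(groups):
    """Textbook Bartlett statistic; compared to chi-square with k - 1 df."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    if min(sizes) < 2:
        raise ValueError("every group needs at least 2 observations")
    variances = [variance(g) for g in groups]  # sample variances (n - 1 divisor)
    n_total = sum(sizes)
    # Pooled variance: within-group variances weighted by their df.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction

# Hypothetical groups: identical variances, then clearly different variances.
equal = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
unequal = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0], [7.0, 10.0, 13.0]]
print(bartlett_statistic(equal))
print(bartlett_statistic(unequal))

# One observation per cell: there is no within-cell variance to compare.
try:
    bartlett_statistic([[58.77], [39.8], [24.33]])
except ValueError as err:
    print("cannot test:", err)
```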
ANOVA
2000-07-25 10:16:29
Using: c:\cohort6\wheat.dt
.AOV Filename: 2WRB.AOV - 2 Way Randomized Blocks
Y Column: 5) Yield
1st Factor: 2) Variety
2nd Factor: 1) Location
Blocks: 3) Block
Keep If:
Rows of data with missing values removed: 0
Rows which remain: 48
Source                          df Type III SS        MS         F     P
------------------------- -------- ----------- --------- --------- ----- ---
Blocks                           3 39.24825625 13.082752 1.1827612 .3313 ns
Main Effects
Variety                          2 1633.399687 816.69984 73.834688 .0000 ***
Location                         3  2539.06904 846.35635 76.515818 .0000 ***
Interaction
Variety x Location               6 1387.188179 231.19803 20.901724 .0000 ***
Error                           33 365.0194188 11.061195<-
------------------------- -------- ----------- --------- --------- ----- ---
Total                           47 5963.924581
Model                           14 5598.905163  399.9218  36.15539 .0000 ***
R^2 = SSmodel/SStotal = 0.93879543348
Root MSerror = sqrt(MSerror) = 3.32583741448
Mean Y = 30.665625
Coefficient of Variation = (Root MSerror) / abs(Mean Y) * 100% = 10.84549%
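Because this data set is balanced and has no missing values, the sums of squares in the table can be reproduced with the classical formulas. The Python sketch below does that from the yields in the PRINT DATA listing. It is a cross-check under that balanced-data assumption, not CoStat's GLM algorithm, and it would not give Type III sums of squares for unbalanced data.

```python
from math import sqrt
from statistics import mean

# Yields from the PRINT DATA listing; each list holds blocks 1-4 in order.
yields = {
    ("Butte", "Dwarf"): [58.77, 58.98, 53.73, 62.08],
    ("Butte", "Semi-dwarf"): [39.8, 41.4, 53.35, 39.08],
    ("Butte", "Normal"): [24.33, 20.66, 24.22, 20.68],
    ("Shelby", "Dwarf"): [25.22, 26.3, 21.92, 27.54],
    ("Shelby", "Semi-dwarf"): [25.97, 22.73, 28.44, 25.09],
    ("Shelby", "Normal"): [23.77, 18.7, 24.9, 11.29],
    ("Dillon", "Dwarf"): [39.44, 39.37, 37.99, 40.69],
    ("Dillon", "Semi-dwarf"): [28.42, 35.13, 36.14, 32.93],
    ("Dillon", "Normal"): [24.98, 28.62, 28.69, 26.37],
    ("Havre", "Dwarf"): [26.47, 26.22, 26.15, 28.28],
    ("Havre", "Semi-dwarf"): [21.13, 24.25, 25.06, 22.58],
    ("Havre", "Normal"): [24.16, 21.98, 25.86, 22.09],
}

# One record per plot: (location, variety, block, yield).
plots = [
    (location, variety, block, y)
    for (location, variety), ys in yields.items()
    for block, y in enumerate(ys, start=1)
]
grand_mean = mean(p[3] for p in plots)

def group_ss(key):
    """Balanced-design SS for the groups defined by key(plot)."""
    groups = {}
    for p in plots:
        groups.setdefault(key(p), []).append(p[3])
    return sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())

ss = {
    "Blocks": group_ss(lambda p: p[2]),
    "Variety": group_ss(lambda p: p[1]),
    "Location": group_ss(lambda p: p[0]),
}
# Interaction = variation among cell means not explained by the main effects.
ss["Variety x Location"] = (
    group_ss(lambda p: (p[0], p[1])) - ss["Variety"] - ss["Location"]
)
ss_total = sum((p[3] - grand_mean) ** 2 for p in plots)
ss_error = ss_total - sum(ss.values())

# Degrees of freedom: levels minus one; interaction is the product (2 * 3).
df = {"Blocks": 3, "Variety": 2, "Location": 3, "Variety x Location": 6}
df_error = len(plots) - 1 - sum(df.values())
ms_error = ss_error / df_error

for source, value in ss.items():
    ms = value / df[source]
    print(f"{source:<20}{df[source]:>3}{value:>14.4f}{ms:>12.4f}{ms / ms_error:>10.4f}")
print(f"{'Error':<20}{df_error:>3}{ss_error:>14.4f}{ms_error:>12.4f}")

r_squared = (ss_total - ss_error) / ss_total
cv_percent = sqrt(ms_error) / abs(grand_mean) * 100
print(f"R^2 = {r_squared:.5f}, CV = {cv_percent:.3f}%")
```

Each F value is the source's mean square divided by the Error mean square, which is the row marked with "<-" in the table.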
COMPARE MEANS
Factor: 2) Variety
Test: Student-Newman-Keuls
Variance: 11.0611945076
Degrees of Freedom: 33
Significance Level: 0.05
Keep If:
n Means = 3
LSD 0.05 = 2.39230738434
Rank  Mean Name           Mean       n Non-significant ranges
----- ---------- ------------- ------- ----------------------------------------
    1 Dwarf          37.446875      16 a
    2 Semi-dwarf      31.34375      16 b
    3 Normal          23.20625      16 c
COMPARE MEANS
Factor: 1) Location
Test: Student-Newman-Keuls
Variance: 11.0611945076
Degrees of Freedom: 33
Significance Level: 0.05
Keep If:
n Means = 4
LSD 0.05 = 2.76239862466
Rank  Mean Name          Mean       n Non-significant ranges
----- --------- ------------- ------- ----------------------------------------
    1 Butte     41.4233333333      12 a
    2 Dillon    33.2308333333      12 b
    3 Havre     24.5191666667      12 c
    4 Shelby    23.4891666667      12 c
COMPARE MEANS
Factor: 3) Block
Test: Student-Newman-Keuls
Variance: 11.0611945076
Degrees of Freedom: 33
Significance Level: 0.05
Keep If:
n Means = 4
LSD 0.05 = 2.76239862466
Rank  Mean Name          Mean       n Non-significant ranges
----- --------- ------------- ------- ----------------------------------------
    1 3         32.2041666667      12 a
    2 2         30.3616666667      12 a
    3 1                30.205      12 a
    4 4         29.8916666667      12 a
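The "LSD 0.05" lines in the output above appear consistent with the usual Least Significant Difference formula, t * sqrt(2 * MSerror / n). The Python sketch below checks that under two assumptions: that CoStat uses this standard formula, and that the two-sided 5% critical value of Student's t for 33 degrees of freedom is about 2.0345 (a value from standard t tables, not printed on this page).

```python
import math

ms_error = 11.0611945076  # "Variance" line of the COMPARE MEANS output
df_error = 33             # "Degrees of Freedom" line

# Assumption: two-sided 5% critical value of Student's t for 33 df,
# looked up in a standard t table (rounded to four decimals).
t_critical = 2.0345

def lsd(n_per_mean):
    """Least Significant Difference for means that each average n_per_mean values."""
    return t_critical * math.sqrt(2 * ms_error / n_per_mean)

print(f"Variety  (n = 16): LSD = {lsd(16):.4f}")
print(f"Location (n = 12): LSD = {lsd(12):.4f}")
print(f"Block    (n = 12): LSD = {lsd(12):.4f}")
```

The results agree with the printed values (2.3923 and 2.7624) to about four decimals; the small remainder comes from rounding the t value. Note that the letters in the "Non-significant ranges" column come from the Student-Newman-Keuls test selected in the sample run, which uses studentized-range critical values rather than the LSD.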