The methods by which sums of squares are calculated within a linear model differ depending on the function used to fit the model (lm(), aov()) and the function used to calculate the ANOVA table (anova(), car::Anova()). This results in different test statistic values and corresponding P-values.

Methods of estimating Sums of Squares (SSQ)

Type I (Sequential)

Steps for partitioning variance among multiple independent variables:

  1. SS(A) - Main effect of independent variable A.
  2. SS(B | A) - Main effect of B after the main effect of A.
  3. SS(A*B | A, B) - Interaction A*B after both main effects.

Because the main effects are tested sequentially, in the order they appear in the model specification, the order of the factors matters whenever the design is unbalanced (i.e. there are different numbers of observations for each level of the factors).

There is often a degree of multicollinearity among independent variables. When this is the case, a portion of the variance in the dependent variable could potentially be attributed to more than one independent variable. Under this method of partitioning, any variance that could be attributed to both independent variables is given to whichever one is entered into the model specification first.
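As a concrete sketch of the sequential decomposition (borrowing the mtcars data used later in this post, with cyl and gear standing in for factors A and B), each term's SSQ is just the drop in residual sum of squares when that term is added to the model, in order:

# Type I SSQ by hand: SS(A) and SS(B | A) from nested fits
m0  <- lm(wt ~ 1, data = mtcars)                           # intercept only
mA  <- lm(wt ~ factor(cyl), data = mtcars)                 # + A
mAB <- lm(wt ~ factor(cyl) + factor(gear), data = mtcars)  # + A + B

deviance(m0) - deviance(mA)   # SS(A)
deviance(mA) - deviance(mAB)  # SS(B | A)

anova(mAB)  # the first two rows report exactly these values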

Type II (Partial with separate interactions)

Steps for partitioning variance among multiple independent variables:

  1. SS(A | B) - Main effect of A after the main effect of B.
  2. SS(B | A) - Main effect of B after the main effect of A.
  3. SS(A*B | A, B) - Interaction A*B after both main effects.

In this method, the variance is partitioned simultaneously: A is adjusted for B and B is adjusted for A. Any variance that could be attributed to both independent variables is ignored until the interaction term is calculated. Unlike Type III, this method doesn't adjust SSQ(A) and SSQ(B) for the interaction term.
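The same nested-fit arithmetic recovers the Type II main-effect values; a minimal sketch, again with cyl and gear standing in for A and B:

# Type II SSQ by hand: each main effect adjusted for the other
mA  <- lm(wt ~ factor(cyl), data = mtcars)
mB  <- lm(wt ~ factor(gear), data = mtcars)
mAB <- lm(wt ~ factor(cyl) + factor(gear), data = mtcars)

deviance(mB) - deviance(mAB)  # SS(A | B)
deviance(mA) - deviance(mAB)  # SS(B | A)

# These match car::Anova(mAB, type = "II")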

Type III (Partial with combined interactions)

Steps for partitioning variance among multiple independent variables:

  1. SS(A | B, A*B) - Main effect of A after the main effect of B and the interaction.
  2. SS(B | A, A*B) - Main effect of B after the main effect of A and the interaction.
  3. SS(A*B | A, B) - Interaction A*B after both main effects.

In this method, as in Type II, the variance is partitioned simultaneously between A and B with respect to each other, but this time the interaction term is also accounted for during that partitioning. Then, as in Type II, the variance attributed to the interaction term is calculated with respect to both A and B.

This method disregards any multi-attributable variance completely, meaning that the term SSQs no longer add up to the total (SSQ(A) + SSQ(B) + SSQ(A*B) + SSQ(residuals) != SSQ(total)). For this reason, many statisticians think this method is pretty rubbish except in very specific hypothesis-testing circumstances.
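One practical caveat worth a sketch before the R examples: Type III tests in R are only meaningful under sum-to-zero contrasts; with R's default treatment contrasts, the main-effect rows for factors involved in an interaction test baseline-dependent hypotheses. A minimal illustration (assuming the {car} package, loaded below, is available):

# Type III SSQ are contrast-dependent; set sum-to-zero contrasts first
old <- options(contrasts = c("contr.sum", "contr.poly"))
m_int <- lm(wt ~ factor(cyl) * factor(gear), data = mtcars)
car::Anova(m_int, type = "III")
options(old)  # restore the default treatment contrasts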

Testing methods in R

All the code tested in this section is done so using data adapted from the mtcars dataset, in the {datasets} package.

First, get set up: load packages and data, and create some new grouping variables:

# Packages
library(dplyr)
library(car)

# Remove old crap
rm(list = ls())

# Clear console
cat("\014")  

# Load some example data
data(mtcars)

# Add factors for the anova analyses
mtcars$group <- mtcars$cyl %>%
    gsub(4, "A", .) %>%
    gsub(6, "B", .) %>%
    gsub(8, "C", .) %>%
    as.factor(.)

mtcars$group2 <- as.factor(rep(c("D", "E", "F", "G"), times = 8))

mtcars$group3 <- as.factor(rep(c("H", "I"), times = 16))

mtcars$group4 <- as.factor(rep(c("J", "J", "K", "K", "L", "L", "M", "M", "J", "K", "L", "M", "M", "L", "K", "J"), times = 2))

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb group
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4     B
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4     B
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1     A
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1     B
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2     C
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1     B
##                   group2 group3 group4
## Mazda RX4              D      H      J
## Mazda RX4 Wag          E      I      J
## Datsun 710             F      H      K
## Hornet 4 Drive         G      I      K
## Hornet Sportabout      D      H      L
## Valiant                E      I      L

Basic linear model - testing reporting methods

Let's see what happens when I run a basic linear model and then create summary tables using different methods:

mod1 <- lm(wt ~ hp, data = mtcars)

summary(mod1)
## 
## Call:
## lm(formula = wt ~ hp, data = mtcars)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.41757 -0.53122 -0.02038  0.42536  1.56455 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.838247   0.316520   5.808 2.39e-06 ***
## hp          0.009401   0.001960   4.796 4.15e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7483 on 30 degrees of freedom
## Multiple R-squared:  0.4339, Adjusted R-squared:  0.4151 
## F-statistic:    23 on 1 and 30 DF,  p-value: 4.146e-05
anova(mod1)
## Analysis of Variance Table
## 
## Response: wt
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## hp         1 12.879  12.879  22.999 4.146e-05 ***
## Residuals 30 16.800   0.560                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(mod1)
## Anova Table (Type II tests)
## 
## Response: wt
##           Sum Sq Df F value    Pr(>F)    
## hp        12.879  1  22.999 4.146e-05 ***
## Residuals 16.800 30                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

All of these methods give the same P-values for the effect of hp on wt. The only difference is that summary() gives the coefficient estimates and standard errors for the independent variable, while anova() and Anova() give the sums of squares. In other words, summary() reports a linear regression output, while anova() and Anova() report analysis of variance outputs; no surprise there.
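One way to see that these are the same test underneath: for a single-degree-of-freedom term like hp, the coefficient t statistic squared equals the ANOVA F statistic:

summary(mod1)$coefficients["hp", "t value"]^2  # 4.796^2 = 23.0
anova(mod1)[["F value"]][1]                    # the same F value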

Multi-predictor models - testing ordering of variables using summary(lm())

First, create a bunch of models:

mod2 <- lm(wt ~ group + hp, data = mtcars)  # 1 group, group first
mod3 <- lm(wt ~ hp + group, data = mtcars)  # 1 group, group second
mod4 <- lm(wt ~ group + group2, data = mtcars)  # 2 groups of unequal cell size
mod5 <- lm(wt ~ group2 + group, data = mtcars)  # 2 groups of unequal cell size, order reversed
mod6 <- lm(wt ~ group2 + group4, data = mtcars)  # 2 groups of equal cell size
mod7 <- lm(wt ~ group4 + group2, data = mtcars)  # 2 groups of equal cell size, order reversed
mod8 <- lm(wt ~ hp + drat, data = mtcars)  # No groups, just continuous predictors
mod9 <- lm(wt ~ drat + hp, data = mtcars)  # No groups, just continuous predictors reversed

Then test the output of summary() on all these models:

summary(mod2)  # 1 group, group first
## 
## Call:
## lm(formula = wt ~ group + hp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8450 -0.4497 -0.1588  0.3226  1.4231 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.2618558  0.3234149   6.994 1.32e-07 ***
## groupB      0.8199619  0.3339030   2.456  0.02053 *  
## groupC      1.6769220  0.4737962   3.539  0.00142 ** 
## hp          0.0002889  0.0031384   0.092  0.92732    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6409 on 28 degrees of freedom
## Multiple R-squared:  0.6125, Adjusted R-squared:  0.571 
## F-statistic: 14.75 on 3 and 28 DF,  p-value: 5.951e-06
summary(mod3)  # 1 group, group second
## 
## Call:
## lm(formula = wt ~ hp + group, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8450 -0.4497 -0.1588  0.3226  1.4231 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.2618558  0.3234149   6.994 1.32e-07 ***
## hp          0.0002889  0.0031384   0.092  0.92732    
## groupB      0.8199619  0.3339030   2.456  0.02053 *  
## groupC      1.6769220  0.4737962   3.539  0.00142 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6409 on 28 degrees of freedom
## Multiple R-squared:  0.6125, Adjusted R-squared:  0.571 
## F-statistic: 14.75 on 3 and 28 DF,  p-value: 5.951e-06
summary(mod4)  # 2 groups of unequal cell size
## 
## Call:
## lm(formula = wt ~ group + group2, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.9526 -0.3532 -0.1571  0.3176  1.3391 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.26848    0.29719   7.633 4.22e-08 ***
## groupB       0.93016    0.33456   2.780  0.00996 ** 
## groupC       1.73740    0.26722   6.502 6.83e-07 ***
## group2E     -0.17041    0.34674  -0.491  0.62722    
## group2F     -0.08595    0.32621  -0.263  0.79426    
## group2G      0.19710    0.33130   0.595  0.55704    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.649 on 26 degrees of freedom
## Multiple R-squared:  0.631,  Adjusted R-squared:  0.5601 
## F-statistic: 8.893 on 5 and 26 DF,  p-value: 5.07e-05
summary(mod5)  # 2 groups of unequal cell size, order reversed
## 
## Call:
## lm(formula = wt ~ group2 + group, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.9526 -0.3532 -0.1571  0.3176  1.3391 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.26848    0.29719   7.633 4.22e-08 ***
## group2E     -0.17041    0.34674  -0.491  0.62722    
## group2F     -0.08595    0.32621  -0.263  0.79426    
## group2G      0.19710    0.33130   0.595  0.55704    
## groupB       0.93016    0.33456   2.780  0.00996 ** 
## groupC       1.73740    0.26722   6.502 6.83e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.649 on 26 degrees of freedom
## Multiple R-squared:  0.631,  Adjusted R-squared:  0.5601 
## F-statistic: 8.893 on 5 and 26 DF,  p-value: 5.07e-05
summary(mod6)  # 2 groups of equal cell size
## 
## Call:
## lm(formula = wt ~ group2 + group4, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7472 -0.7507  0.1062  0.5483  2.2463 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.62733    0.44133   8.219 1.43e-08 ***
## group2E     -0.28954    0.56640  -0.511    0.614    
## group2F      0.01742    0.60550   0.029    0.977    
## group2G     -0.10029    0.56640  -0.177    0.861    
## group4K     -0.64108    0.60550  -1.059    0.300    
## group4L     -0.36004    0.56640  -0.636    0.531    
## group4M     -0.26679    0.56640  -0.471    0.642    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.049 on 25 degrees of freedom
## Multiple R-squared:  0.0735, Adjusted R-squared:  -0.1489 
## F-statistic: 0.3305 on 6 and 25 DF,  p-value: 0.9144
summary(mod7)  # 2 groups of equal cell size, order reversed
## 
## Call:
## lm(formula = wt ~ group4 + group2, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7472 -0.7507  0.1062  0.5483  2.2463 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.62733    0.44133   8.219 1.43e-08 ***
## group4K     -0.64108    0.60550  -1.059    0.300    
## group4L     -0.36004    0.56640  -0.636    0.531    
## group4M     -0.26679    0.56640  -0.471    0.642    
## group2E     -0.28954    0.56640  -0.511    0.614    
## group2F      0.01742    0.60550   0.029    0.977    
## group2G     -0.10029    0.56640  -0.177    0.861    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.049 on 25 degrees of freedom
## Multiple R-squared:  0.0735, Adjusted R-squared:  -0.1489 
## F-statistic: 0.3305 on 6 and 25 DF,  p-value: 0.9144
summary(mod8)  # No groups, just continuous predictors
## 
## Call:
## lm(formula = wt ~ hp + drat, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.3345 -0.3304 -0.1136  0.2541  1.2729 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.763735   0.956557   6.025 1.49e-06 ***
## hp           0.006058   0.001751   3.461  0.00169 ** 
## drat        -0.955128   0.224482  -4.255  0.00020 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5972 on 29 degrees of freedom
## Multiple R-squared:  0.6515, Adjusted R-squared:  0.6275 
## F-statistic: 27.11 on 2 and 29 DF,  p-value: 2.301e-07
summary(mod9)  # No groups, just continuous predictors reversed
## 
## Call:
## lm(formula = wt ~ drat + hp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.3345 -0.3304 -0.1136  0.2541  1.2729 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.763735   0.956557   6.025 1.49e-06 ***
## drat        -0.955128   0.224482  -4.255  0.00020 ***
## hp           0.006058   0.001751   3.461  0.00169 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5972 on 29 degrees of freedom
## Multiple R-squared:  0.6515, Adjusted R-squared:  0.6275 
## F-statistic: 27.11 on 2 and 29 DF,  p-value: 2.301e-07

In all the above cases, summary(lm()) gives the same output regardless of the order or type of variables. summary(lm()) isn't reporting an ANOVA table at all: its coefficient t-tests are marginal tests, each adjusted for every other term in the model, equivalent to Type III partial sums of squares rather than sequential (Type I) sums of squares.
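A quick sanity check that the fits really are identical and only the printed order differs:

all.equal(sort(coef(mod8)), sort(coef(mod9)))  # TRUE: same coefficients
all.equal(fitted(mod4), fitted(mod5))          # TRUE: same fitted values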

And just to confirm the cell sizes (i.e. the number of data points in each level of each factor): group is unbalanced, while group2 and group4 are balanced:

replications(wt ~ group + group2, data = mtcars)
## $group
## group
##  A  B  C 
## 11  7 14 
## 
## $group2
## [1] 8
replications(wt ~ group2 + group4, data = mtcars)
## group2 group4 
##      8      8

Comparing methods of fitting ANOVA-type linear models, e.g. aov(), lm()

Create some more models, using lm() and aov():

mod10 <- lm(wt ~ group + hp, data = mtcars)  # lm() method, one grouping factor, one continuous
mod10a <- lm(wt ~ hp + group, data = mtcars)  # lm(), reversed factors
mod11 <- aov(wt ~ group + hp, data = mtcars)  # aov() method, one grouping factor, one continuous
mod11a <- aov(wt ~ hp + group, data = mtcars)  # aov(), reversed factors

and compare the model outputs using summary():

summary(mod10)  # lm()
## 
## Call:
## lm(formula = wt ~ group + hp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8450 -0.4497 -0.1588  0.3226  1.4231 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.2618558  0.3234149   6.994 1.32e-07 ***
## groupB      0.8199619  0.3339030   2.456  0.02053 *  
## groupC      1.6769220  0.4737962   3.539  0.00142 ** 
## hp          0.0002889  0.0031384   0.092  0.92732    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6409 on 28 degrees of freedom
## Multiple R-squared:  0.6125, Adjusted R-squared:  0.571 
## F-statistic: 14.75 on 3 and 28 DF,  p-value: 5.951e-06
summary(mod10a)  # lm() reversed
## 
## Call:
## lm(formula = wt ~ hp + group, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8450 -0.4497 -0.1588  0.3226  1.4231 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.2618558  0.3234149   6.994 1.32e-07 ***
## hp          0.0002889  0.0031384   0.092  0.92732    
## groupB      0.8199619  0.3339030   2.456  0.02053 *  
## groupC      1.6769220  0.4737962   3.539  0.00142 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6409 on 28 degrees of freedom
## Multiple R-squared:  0.6125, Adjusted R-squared:  0.571 
## F-statistic: 14.75 on 3 and 28 DF,  p-value: 5.951e-06
summary(mod11)  # aov()
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## group        2 18.176   9.088  22.128 1.72e-06 ***
## hp           1  0.003   0.003   0.008    0.927    
## Residuals   28 11.499   0.411                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(mod11a)  # aov() reversed
##             Df Sum Sq Mean Sq F value  Pr(>F)    
## hp           1  12.88  12.879  31.359 5.4e-06 ***
## group        2   5.30   2.650   6.453 0.00496 ** 
## Residuals   28  11.50   0.411                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary() on an lm() fit decomposes the estimates by group level, whereas summary() on an aov() fit gives a single F value per term. The aov() tables are affected by the order in which factors are entered into the model, giving different sums of squares estimates: summary(aov()) uses Type I (sequential) SSQ, while the coefficient t-tests from summary(lm()) are marginal (Type III-like) tests and so are unaffected by order.
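To confirm that summary(aov()) is just the sequential table, compare it against anova() on the equivalent lm() fit:

anova(mod10)    # Type I table from the lm() fit...
summary(mod11)  # ...matches the summary of the aov() fit of the same formula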

Comparing lm() models using different ANOVA table methods, anova(), Anova(), summary()

Now that I’ve tested the model fitting functions, I need to test the difference in reporting methods.

First, create some models of unequal cell size (unbalanced design):

mod12 <- lm(wt ~ group2 + group, data = mtcars)  
mod12a <- lm(wt ~ group + group2, data = mtcars)

anova() uses Type I sums of squares by default, giving all possible SSQ to the first factor and then the leftover to the second:

anova(mod12)  # lm()
## Analysis of Variance Table
## 
## Response: wt
##           Df  Sum Sq Mean Sq F value    Pr(>F)    
## group2     3  0.9218  0.3073  0.7295    0.5437    
## group      2 17.8058  8.9029 21.1371 3.543e-06 ***
## Residuals 26 10.9511  0.4212                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(mod12a)  # lm() reversed
## Analysis of Variance Table
## 
## Response: wt
##           Df  Sum Sq Mean Sq F value Pr(>F)    
## group      2 18.1758  9.0879 21.5763  3e-06 ***
## group2     3  0.5518  0.1839  0.4367 0.7286    
## Residuals 26 10.9511  0.4212                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The sums of squares estimates change when the order of the factors is changed!
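The order dependence is exactly the shared, multi-attributable SSQ, and the amount handed to whichever factor comes first is the same from either direction:

18.1758 - 17.8058  # SS(group)  - SS(group | group2) = 0.37
0.9218 - 0.5518    # SS(group2) - SS(group2 | group) = 0.37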

Next, fit using Type II sums of squares via Anova(), which assigns SSQ to each factor adjusted for the other, excluding any shared SSQ:

Anova(mod12, type = "II")  # lm()
## Anova Table (Type II tests)
## 
## Response: wt
##            Sum Sq Df F value    Pr(>F)    
## group2     0.5518  3  0.4367    0.7286    
## group     17.8058  2 21.1371 3.543e-06 ***
## Residuals 10.9511 26                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(mod12a, type = "II")  # lm() reversed
## Anova Table (Type II tests)
## 
## Response: wt
##            Sum Sq Df F value    Pr(>F)    
## group     17.8058  2 21.1371 3.543e-06 ***
## group2     0.5518  3  0.4367    0.7286    
## Residuals 10.9511 26                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note that the sums of squares estimates don't change when the order of the factors is changed.
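In fact, for a two-factor model without an interaction, each Type II value can be read straight off the Type I tables above: it is the factor's sequential SSQ when it is entered last:

anova(mod12)["group", "Sum Sq"]    # 17.8058, the Type II SSQ for group
anova(mod12a)["group2", "Sum Sq"]  # 0.5518, the Type II SSQ for group2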

Comparing Type III and Type II SSQ methods from Anova()

Anova() from the {car} package allows the sums of squares estimation method to be specified, so let's test how the different types change the results.

Create models:

mod14 <- lm(wt ~ group2 + group4, data = mtcars)
mod14a <- lm(wt ~ group4 + group2, data = mtcars)
Anova(mod14, type = "II")
## Anova Table (Type II tests)
## 
## Response: wt
##            Sum Sq Df F value Pr(>F)
## group2     0.4401  3  0.1334 0.9393
## group4     1.2595  3  0.3817 0.7671
## Residuals 27.4975 25
Anova(mod14a, type = "II")
## Anova Table (Type II tests)
## 
## Response: wt
##            Sum Sq Df F value Pr(>F)
## group4     1.2595  3  0.3817 0.7671
## group2     0.4401  3  0.1334 0.9393
## Residuals 27.4975 25
Anova(mod14, type = "III")
## Anova Table (Type III tests)
## 
## Response: wt
##             Sum Sq Df F value    Pr(>F)    
## (Intercept) 74.301  1 67.5529 1.431e-08 ***
## group2       0.440  3  0.1334    0.9393    
## group4       1.259  3  0.3817    0.7671    
## Residuals   27.497 25                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(mod14a, type = "III")
## Anova Table (Type III tests)
## 
## Response: wt
##             Sum Sq Df F value    Pr(>F)    
## (Intercept) 74.301  1 67.5529 1.431e-08 ***
## group4       1.259  3  0.3817    0.7671    
## group2       0.440  3  0.1334    0.9393    
## Residuals   27.497 25                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

All the above methods give identical results. Phew. That's expected here: with no interaction term in the model, the Type II and Type III partitions coincide.

Conclusions

If the design of the analysis is balanced, the method of estimating SSQ doesn't matter; if it is unbalanced, Type I SSQ estimation will give different estimates depending on the order of the factors in the model specification.

Type I SSQ estimation is appropriate for some types of analysis, e.g. an ANCOVA where you only want to attribute to the grouping factor the variance that cannot be attributed to the continuous covariate.

Under Type I SSQ estimation, a second factor may appear less significant than it actually is; beware of this if your hypothesis doesn't treat that factor as subsidiary.

If running an ANCOVA, generally specify the model as follows, unless you have a very specific reason to preferentially load variance onto the grouping variable, or want to disregard multi-attributable variance:

library(car)

# 'y', 'continuous', 'grouping' and 'mydata' are placeholders for your own data
mod <- lm(y ~ continuous + grouping, data = mydata)

Anova(mod, type = "II")
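Concretely, with the mtcars variables used above:

mod <- lm(wt ~ hp + group, data = mtcars)

Anova(mod, type = "II")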