Mixed-design ANOVA : 2 between-subject factors and 1 within-subject factor

Standard

Suppose you want to examine the impact of diet and exercise on pulse rate. To investigate these issues, you collect a sample of 18 individuals and group them according to their dietary preferences: meat eaters and vegetarians. You then divide each diet category into three groups, randomly assigning each group to one of three types of exercise: exercise1, exercise2, exercise 3.  In addition to these between-subjects factors, you want to include a single within-subjects factor in the analysis. Each subject’s pulse rate will be measured at three levels of exertion: intensity1, intensity2, intensity3.

So we have 3 factors to work with:

  • Two between-subjects (grouping) factors: dietary preference and exercise type.
  • One within-subjects factor : intensity (of exertion)

This is what our data looks like. Onwards, then!

1 112 166 215 1
1 111 166 225 1
1 89 132 189 1
1 95 134 186 2
1 66 109 150 2
1 69 119 177 2
2 125 177 241 1
2 85 117 186 1
2 97 137 185 1
2 93 151 217 2
2 77 122 178 2
2 78 119 173 2
3 81 134 205 1
3 88 133 180 1
3 88 157 224 1
3 58 99 131 2
3 85 132 186 2
3 78 110 164 2

After reading in the file,  we give the columns appropriate names.

diet <- read_excel(path,col_names=F)
names(diet) <- c("subject","exercise","intensity1","intensity2","intensity3",
"diet")


Then we convert ‘exercise’,’subject’ and ‘diet’ into factors .

diet$exercise<- factor(diet$exercise)
diet$diet<- factor(diet$diet)
diet$subject <- factor(diet$subject)

For repeated measures ANOVA, the data must be in the long form . We will use the melt() form the reshape2 package to achieve this. We are now at one row per participant per condition.

diet.long <- melt(diet, id = c("subject","diet","exercise"), 
 measure = c("intensity1","intensity2","intensity3"), 
 variable.name="intensity")

At this point we’re ready to actually construct our ANOVA.

Our anova looks like this –

mod <- aov(value ~ diet*exercise*intensity + Error(subject/intensity) , 
data=diet.long)

The asterisk specifies that we want to look at the interaction between the three factors. But since this is a repeated measures design as well, we need to specify an error term that accounts for natural variation from participant to participant.

Running a summary() on our anova above  yields the following results –

2016-09-19_16-21-54

The main conclusions we can arrive at are as follows:

  • There is a significant main effect of ‘diet’ on the pulse rate. We can conclude that a statistically significant difference exists between vegetarians and meat eaters on their overall pulse rates.
  • There is a statistically significant within-subjects main effect for intensity.
  • There is a marginally statistically significant interaction between diet and intensity. We’ll look at this later.
  • The type of exercise has no statistically significant effect on overall pulse rates.

 

Let’s plot the average pulse rate as explained by diet, exercise, and the intensity.

mean_pulse1<-with(diet.long,tapply(value,list(diet,intensity,exercise),mean))
mean_pulse1
mp1 <- stack(as.data.frame(mean_pulse1))
mp1<- separate(mp1,ind,c("Intensity","Exercise"))
mp1$Diet<- rep(seq_len(nrow(mean_pulse1)),ncol(mean_pulse1))
mp1$Diet <- factor(mp1$Diet,labels = c("Meat","Veg"))
mp1$Intensity<-factor(mp1$Intensity)
mp1$Exercise<-factor(mp1$Exercise)
ggplot(mp1,aes(Intensity,values,group=Diet,color=Diet)) + geom_line(lwd=1) + xlab("Intensity of the exercise") +
 ylab("Mean Pulse Rate") + ggtitle("Mean Pulse rate - \n Exercise Intensity vs Diet") + theme_grey()+
 facet_grid(Exercise ~.)

 

 

Rplot06.png

 

The plot agrees with our observations from earlier.

 

 

UPDATE: Understanding the Results

Earlier we had rejected a null hypothesis and concluded that change in mean pulse rate across intensity levels marginally depends upon dietary preference. Now ,we will turn our attention to the study of this interaction.

We begin by plotting an interaction plot as follows:

interaction.plot(mp1$Intensity, mp1$Diet, mp1$values , type="b", col=c("red","blue"), legend=F,
 lwd=2, pch=c(18,24),
 xlab="Exertion intensity", 
 ylab="Mean pulse rate ", 
 main="Interaction Plot")

legend("bottomright",c("Meat","Veg"),bty="n",lty=c(1,2),lwd=2,pch=c(18,24),col=c("red","blue"),title="Diet")

Rplot.png

We see that the mean pulse rate increases across exertion intensity(‘trials’) : this is the within-subject effect.

Further, it’s clear that vegetarians have a lower average pulse rate than do meat eaters at every trial: this is the diet main effect.

The difference between the mean pulse rate of meat-eaters vs vegetarians is different at each exertion level. This is the result of the diet by intensity interaction.

The main effect for diet is reflected in the fact that meat-eaters had a mean pulse rate roughly 10 to 20 points higher than that for vegetarians.

The main effect of intensity is reflected in the fact  for both diet groups, the mean pulse rate after jogging increased about 50 points beyond the rate after warmup exercises, and increased another  55 points (approx.) after running.

The interaction effect of diet and intensity is reflected in the fact that the gap between the two dietary groups changes across the three intensities. But this change is not as significant as the main effects of diet and intensity.

 

 

 

That’s all,folks.

Did you find this article helpful?  Can it be improved upon ? Let me know!

Also, you can find the code here.

Until next time!

 

Advertisements

An Investigation for Determining the Optimum Length of Chopsticks : A case for Single-Factor Repeated Measures ANOVA.

Standard

In the pursuit to determine the optimum length of chopsticks, two laboratory studies were conducted, using a randomised complete block design, to evaluate the effects of the length of the chopsticks on the food-serving performance of adults and children. Thirty-one male junior college students and 21 primary school pupils served as subjects for the experiment.The response was recorded in terms of food-pinching efficiency (number of peanuts picked, and placed in a cup).

Data source https://www.udacity.com/api/nodes/4576183932/supplemental_media/chopstick-effectivenesscsv/download

The data was read in using read.csv() and allotted appropriate column names.

stick<-read.csv(file.choose(),header = TRUE,stringsAsFactors = FALSE)
chop<-stick
names(chop)<-c("Efficiency","Individual","Length")

 

Next, we’ll convert  the ‘Length’ and ‘Individual’ to factors and the dependent variable ‘Efficiency’ to numeric to carry out our repeated measures ANOVA analysis.

chop$Efficiency<-as.numeric(chop$Efficiency)
chop$Length<- factor(chop$Length)
chop$Individual<-factor(chop$Individual)

Let’s use the tapply() to find out the mean pinching efficiency grouped by the chopstick length and plot the result.

group_mean<-tapply(chop$Efficiency,chop$Length,mean)

plot(group_mean,type="p",xlab="Chopstick Length",ylab="Mean Efficiency",
 main="Average Food Pinching Efficiency \n by Chopstick Length",col="green",pch=16)

Rplot05.png

 

Visually, we can see that the efficiency grows steadily from 180 mm through to the 240  and then falling sharply at 270 mm and despite a slight increase at 300,it continues to fall at 330.

Before we can perform the repeated measures ANOVA, we will need to check for the Sphericity which assumes the homogeneity of variance among differences between all possible pairs of groups. This is done by the Mauchly’s test.

chop1<- cbind(chop$Efficiency[chop$Length==180],
 chop$Efficiency[chop$Length==210],
 chop$Efficiency[chop$Length==240],
 chop$Efficiency[chop$Length==270],
 chop$Efficiency[chop$Length==300],
 chop$Efficiency[chop$Length==330])

mauchly.test (lm (chop1 ~ 1), X = ~1)


> mauchly.test (lm (chop1 ~ 1), X = ~1)

           Mauchly's test of sphericity
           Contrasts orthogonal to
           ~1


data: SSD matrix from lm(formula = chop1 ~ 1)
W = 0.43975, p-value = 0.05969

Since, (thankfully!) the p-value is >0.05, the homogeneity of variance assumption holds and no further adjustment is necessary.

We can now get on with the repeated measures ANOVA.

aov.chop = aov(Efficiency~Length + Error(Individual/Length),data=chop)
summary(aov.chop)

Error: Individual
 Df Sum Sq Mean Sq F value Pr(>F)
Residuals 30 2278 75.92 

Error: Individual:Length
         Df Sum Sq Mean Sq F value Pr(>F) 
Length    5   106.9 21.372   5.051 0.000262 ***
Residuals 150 634.6 4.231 
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The F-ratio is 5.051 and p-value is 0.000262 which means there is significant effect of the length of the chopstick on the eating efficiency.

Since, the F-ratio is significant , we can now process the post-hoc analysis to perform pair-wise comparisons.

1. Holm’s adjustment

with(chop,pairwise.t.test(Efficiency,Length,paired=T))

> with(chop,pairwise.t.test(Efficiency,Length,paired=T))

 Pairwise comparisons using paired t tests 

data: Efficiency and Length 

      180 210   240   270    300 
210 1.000 - - - - 
240 0.323 1.000 - - - 
270 1.000 0.034 0.035 - - 
300 1.000 1.000 0.198 1.000 - 
330 0.630 0.048 8.1e-05 1.000 0.435

P value adjustment method: holm

2. Bonferroni adjustment

with(chop,pairwise.t.test(Efficiency,Length,paired=T,p.adjust.method="bonferroni"))

> with(chop,pairwise.t.test(Efficiency,Length,paired=T,p.adjust.method="bonferroni"))

 Pairwise comparisons using paired t tests 

data: Efficiency and Length 

       180 210 240 270 300 
210 1.000 - - - - 
240 0.485 1.000 - - - 
270 1.000 0.036 0.040 - - 
300 1.000 1.000 0.269 1.000 - 
330 1.000 0.060 8.1e-05 1.000 0.725

P value adjustment method: bonferroni

As we can see, the Bonferroni adjustment inflates the p-values as compared to Holm’s adjustment.

3. Treatment-by-Subjects

Another method which can be employed is the Treatment-by-Subjects i.e. as a 2 factor design without interaction , without replication. The advantage of this approach is that it allows us to employ the TukeyHSD() post-hoc .

 

aov.chop1 = aov(Efficiency ~ Length + Individual, data=chop)
summary(aov.chop1)

> summary(aov.chop1)
         Df   Sum Sq Mean Sq     F value Pr(>F) 
Length     5   106.9       21.37 5.051    0.000262 ***
Individual 30 2277.5       75.92 17.944   < 2e-16 ***
Residuals 150 634.6 4.23 
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 

TukeyHSD(aov.chop1,which="Length")

> tukey
 Tukey multiple comparisons of means
 95% family-wise confidence level

Fit: aov(formula = Efficiency ~ Length + Individual, data = chop)

$Length
           diff        lwr       upr        p adj
210-180 0.54870968 -0.9595748 2.05699418 0.8999148
240-180 1.38774194 -0.1205426 2.89602644 0.0904885
270-180 -0.61129032 -2.1195748 0.89699418 0.8503866
300-180 0.03290323 -1.4753813 1.54118773 0.9999999
330-180 -0.93548387 -2.4437684 0.57280063 0.4749602
240-210 0.83903226 -0.6692522 2.34731676 0.5959492
270-210 -1.16000000 -2.6682845 0.34828450 0.2346843
300-210 -0.51580645 -2.0240910 0.99247805 0.9213891
330-210 -1.48419355 -2.9924781 0.02409096 0.0565555
270-240 -1.99903226 -3.5073168 -0.49074775 0.0025803
300-240 -1.35483871 -2.8631232 0.15344579 0.1053005
330-240 -2.32322581 -3.8315103 -0.81494130 0.0002412
300-270 0.64419355 -0.8640910 2.15247805 0.8199855
330-270 -0.32419355 -1.8324781 1.18409096 0.9893780
330-300 -0.96838710 -2.4766716 0.53989741 0.4349561

By studying the difference between the means and the Tukey adjusted p-value, the most significant diff is between the groups 240-330 , with a mean difference of 2.32.

 

Conclusion:

The results show that the food-pinching performance is significantly affected by the length of the chopsticks, and that chopsticks of about 240 mm long performed the best.