Question 1: Identify variables in the experiment
A1: The independent variable is the type of word list given to the subjects: congruent or incongruent. The dependent variable is the time it took the subjects to name the ink colors of the words in each list. The time is presumed to be measured in seconds. The unit of observation is the individual subject participating in the test.
Question 2: Establish a hypothesis and statistical test
A2a:
Null hypothesis (H0): µ_congruent = µ_incongruent
Alternative hypothesis (H1): µ_congruent < µ_incongruent
Where “µ” is the true, unknown average time it takes people in general (not just in our sample but in the whole world) to name the ink colors from each list. This “µ” is an unobserved parameter, not to be confused with X_bar, which is the observed sample mean statistic.
My null hypothesis (H0) is that there is no difference in the time it takes people (the population in general, not our sample) to name the ink colors for the two lists. My alternative hypothesis (H1) is one-sided: it takes people longer to name the ink colors from the incongruent list, because the mismatch between the two messages (the meaning of the word and the color of the ink it was written in) would lead to more mental activity and, therefore, longer processing time.
A2b:
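Given that the observations (subjects) are independent of each other and the response is a continuous time measurement, a parametric test of means is appropriate, provided the per-subject differences are roughly normally distributed.
Since each subject was given both treatments, this calls for a paired test of means. Because my alternative is in one direction, the test will be one-sided (“greater”, applied to the difference incongruent minus congruent).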
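For reference, a paired test of means reduces to a one-sample test on the per-subject differences. The statistic can be sketched as follows, assuming d_i denotes the incongruent time minus the congruent time for subject i (the symbols d_i, \bar{d}, s_d and n are notation introduced here for illustration):

```latex
\[
t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(d_i - \bar{d}\right)^2}
\]
```

Under H0, and assuming the differences are approximately normal, this statistic follows a Student's t distribution with n - 1 degrees of freedom. For the one-sided “greater” alternative, H0 is rejected when t is large and positive.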
Why did I choose a t-test instead of a z-test?
Because a t-test is more appropriate when (a) the sample size is small (only 24 observations in our case) and (b) we have to use the sample standard deviation as an estimate of the unknown (and impossible to obtain) population standard deviation that is needed to calculate the test statistic.
#R-code
setwd('C:\\Users\\rf\\Google Drive\\Education\\R\\Udacity\\stroop')
#Load all libraries on top
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(reshape))
#Manual input required - the name of the file
tinput = 'stroopdata.csv'
t <- read.csv(tinput)
# Data Preparation
# Add a row id, rename columns, calculate averages, variances, differences.
colnames(t) = c('con','inc')
t$id = 1:nrow(t)
t$d = t$inc - t$con
attach(t)
#print (summary(t))
conmu = round(mean(con),4)
incmu = round(mean(inc),4)
convar = round(var(con),4)
incvar = round(var(inc),4)
dmu = round(mean(d),4)
dvar = round(var(d),4)
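The mean and variance of the differences prepared above are the ingredients of the paired test. The following is a minimal sketch, not the original analysis: it uses made-up timings (`con_demo` and `inc_demo` are hypothetical and are not the study data) to show that the hand-computed statistic agrees with R's built-in `t.test`.

```r
# Hypothetical toy timings in seconds (NOT the study data), one pair per subject
con_demo <- c(12.1, 16.8, 9.6, 14.2, 11.0, 15.3)
inc_demo <- c(19.3, 18.7, 21.2, 17.9, 22.9, 20.4)

# Per-subject differences: incongruent minus congruent
d_demo <- inc_demo - con_demo
n <- length(d_demo)

# Paired t statistic computed by hand: mean difference over its standard error
t_manual <- mean(d_demo) / (sd(d_demo) / sqrt(n))

# One-sided ("greater") p-value from the t distribution with n - 1 degrees of freedom
p_manual <- pt(t_manual, df = n - 1, lower.tail = FALSE)

# The same test using the built-in function
res <- t.test(inc_demo, con_demo, paired = TRUE, alternative = "greater")

print(c(t_manual = t_manual, t_builtin = unname(res$statistic)))
print(c(p_manual = p_manual, p_builtin = res$p.value))
```

With the real data, the same call would take the `inc` and `con` columns in place of the toy vectors. Passing `inc` first keeps the sign of the difference consistent with the “greater” alternative.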
Question 4: Plot the data
A4: Please see the output from R below.
#melt first for the histogram
t2 = melt(t, id=(c("id")))
t3 = subset(t2,(variable !='d'))
hist1 = ggplot(data = t3, aes(x = value, fill = variable)) +
  geom_histogram(binwidth = 1, position = "dodge") +
  ylab('Count of subjects') +
  xlab('Time to name the ink colors (seconds)') +
  ggtitle('Histograms of the completion time for each list')
print(hist1)

Question 6: What does the Stroop test really prove?
A6: The human brain is a very complex organic computer. Like any other computer, it is not immune to interference, distraction, or exhaustion. What the Stroop effect shows is that conflicting information affects the brain’s ability to respond and complete tasks. The more complex and contradictory the information, the more mental effort and time it takes to process and reconcile that information in order to complete the tasks.
One situation I have sometimes found myself in is mislabeled “hot” and “cold” faucets. It takes considerable effort to re-program the brain to turn on the “cold” faucet when what you really want is hot water. More generally, multi-tasking across different projects would likely lead to similar delays in response.
Comments: The histograms show that the completion times for the incongruent list are clearly “right-shifted” (longer) in comparison to the congruent list.