Fitting impulse~response models in R

Introduction
In this blog’s very first post/tutorial I’ll show how to model the performance of several subjects using the Banister impulse~response (IR) model. Before starting I highly recommend checking out this fantastic resource by Clarke & Skiba (2013), which I’ll be referencing a lot throughout this post. I also urge you to check out these posts (here and here) by Mladen Jovanovic, where he shares some very creative ideas on how to use Banister modelling for both performance and injury prediction. Lastly, the data we’ll be using can be found in this excellent article by Thierry Busso (2017), who has done a lot of research into the modelling of the dose-response relationship between training and performance. A massive thanks to him and his contribution to open science!
The Banister impulse~response model
\(p(t) = p(0) + k_{1}\sum\limits_{s=0}^{t-1}e^{-(t-s)/\tau_1}w(s) - k_{2}\sum\limits_{s=0}^{t-1}e^{-(t-s)/\tau_2}w(s)\)
In simple terms, the Banister IR model quantitatively relates performance ability at a specific time to the cumulative effects of prior training loads (Taha & Thomas, 2003). Though the equation above looks rather intimidating at first, it’s actually quite easy to grasp once put in layman’s terms. First and foremost, the model posits that training has both a positive training effect (fitness) and a negative one (fatigue). Performance at a given time is thus simply the baseline performance level plus the difference between the accumulated fitness and the accumulated fatigue. Other than being conceptually easy to understand, the model is also able to capture several phases of the training process, including overreaching, plateau, taper, and detraining (Clarke & Skiba, 2013). Using athlete-specific data, the model is easily fitted to individual athletes, making it a useful tool in the planning of training.
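To build some intuition before fitting anything, here is a small sketch (with made-up parameter values and a made-up training block, chosen purely for illustration, so none of these numbers come from the data used later) that evaluates the summation form of the model day by day. With these values, performance initially dips as fatigue outweighs fitness, then rises as the slower-decaying fitness term accumulates.
p0 <- 260; k1 <- 0.10; k2 <- 0.12; tau1 <- 40; tau2 <- 10 #Made-up parameter values for illustration only
w <- c(rep(100, 10), rep(0, 10)) #Ten days of training followed by ten days of rest
p <- sapply(seq_along(w), function(t) {
  if (t == 1) return(p0) #No prior training on day one
  s <- 1:(t - 1) #All days preceding day t
  p0 + k1 * sum(exp(-(t - s)/tau1) * w[s]) - k2 * sum(exp(-(t - s)/tau2) * w[s])
})
round(p, 1) #Daily modelled performance for the toy training block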
Fitting the model
In contrast to the black-box structure of neural networks, which I’ll cover in a separate post, the Banister IR model has five adjustable parameters. These include the initial performance level \(p(0)\), two time constants \(\tau_1\) and \(\tau_2\) that describe the decay of the fitness and fatigue components, and two gain parameters \(k_1\) and \(k_2\) that relate to how the training load determines the amplitude of fitness and fatigue. As shown by Clarke & Skiba (2013), these parameters can be easily fitted to data using nonlinear regression, which involves iteratively changing the parameters until the error between the model and the data is minimized. In the example below, we’re going to do this using R’s optim function.
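If optim is new to you, the toy example below (unrelated to the Banister data, with made-up numbers) shows the calling convention we’ll rely on: the function to be minimized takes the parameter vector as its first argument, any extra arguments are passed through by name, and the result stores the fitted parameters in $par and the minimized objective in $value.
toy.sse <- function(v, x, y) { #v[1] is the intercept, v[2] the slope
  sum((y - (v[1] + v[2]*x))^2) #Sum of squared errors for a straight line
}
x <- 1:10
y <- 2 + 3*x + rnorm(10, sd = 0.5) #Noisy data from a known line
toy.fit <- optim(par = c(0, 0), fn = toy.sse, x = x, y = y)
toy.fit$par #Should land close to the true intercept (2) and slope (3)
toy.fit$value #The minimized sum of squared errors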
Preparing the data
To start off, let’s review the data:
str(banister.data)
## 'data.frame': 109 obs. of 13 variables:
## $ Day : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Training.dose.Subject.1..tu.: num 0 100 100 0 100 ...
## $ Performance.Subject.1..Watt.: int NA 253 252 NA 245 NA NA 254 NA 258 ...
## $ Training.dose.Subject.2..tu.: num 0 100 0 100 0 ...
## $ Performance.Subject.2..Watt.: int NA 240 NA 236 NA 250 NA NA 244 NA ...
## $ Training.dose.Subject.3..tu.: num 0 0 100 100 0 ...
## $ Performance.Subject.3..Watt.: int NA NA 296 293 NA 284 NA NA 286 NA ...
## $ Training.dose.Subject.4..tu.: num 0 0 100 100 0 ...
## $ Performance.Subject.4..Watt.: int NA NA 300 291 NA 288 NA NA 296 299 ...
## $ Training.dose.Subject.5..tu.: num 0 0 100 100 0 ...
## $ Performance.Subject.5..Watt.: int NA NA 332 324 NA 332 NA NA 326 NA ...
## $ Training.dose.Subject.6..tu.: num 0 0 100 100 0 ...
## $ Performance.Subject.6..Watt.: int NA NA 218 231 NA 233 NA NA 232 NA ...
As can be seen above, the data consists of the training and performance-testing records of six healthy male subjects over 15 weeks. The performance variable in this instance is the average power output over a 5-minute maximal cycling effort, while the training dose has been recorded in arbitrary units. The data has been read into a data frame with 13 columns and 109 observations: one column lists the day number, and each subsequent pair of columns holds a subject’s training dose and performance. To make this data a little easier to work with, we’re going to restructure it into a list of six data frames, one per subject. We’ll first subset the Training dose and Performance columns into two separate objects, and then use a for loop to build a list in which each element is a data frame with the columns Day, Training.Load, and Performance. This can be done as follows:
training <- banister.data[seq(2, 13, by = 2)] #Subsets all the 'training dose' columns
performance <- banister.data[seq(3, 13, by = 2)] #Subsets all the 'performance' columns
banister.list <- vector("list", 6) #Creates an empty list with six elements
for (i in 1:6) {
  banister.list[[i]] <- cbind(banister.data[1], training[i], performance[i]) #One data frame per subject
}
names(banister.list) <- paste("Subject", 1:6, sep = "") #Names each data frame in the list 'Subject1', 'Subject2', etc.
banister.list <- lapply(banister.list, function(x) { #Sets the name of each column within the data frames
  colnames(x) <- c("Day", "Training.Load", "Performance")
  return(x)
})
str(banister.list)
## List of 6
## $ Subject1:'data.frame': 109 obs. of 3 variables:
## ..$ Day : int [1:109] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ Training.Load: num [1:109] 0 100 100 0 100 ...
## ..$ Performance : int [1:109] NA 253 252 NA 245 NA NA 254 NA 258 ...
## $ Subject2:'data.frame': 109 obs. of 3 variables:
## ..$ Day : int [1:109] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ Training.Load: num [1:109] 0 100 0 100 0 ...
## ..$ Performance : int [1:109] NA 240 NA 236 NA 250 NA NA 244 NA ...
## $ Subject3:'data.frame': 109 obs. of 3 variables:
## ..$ Day : int [1:109] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ Training.Load: num [1:109] 0 0 100 100 0 ...
## ..$ Performance : int [1:109] NA NA 296 293 NA 284 NA NA 286 NA ...
## $ Subject4:'data.frame': 109 obs. of 3 variables:
## ..$ Day : int [1:109] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ Training.Load: num [1:109] 0 0 100 100 0 ...
## ..$ Performance : int [1:109] NA NA 300 291 NA 288 NA NA 296 299 ...
## $ Subject5:'data.frame': 109 obs. of 3 variables:
## ..$ Day : int [1:109] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ Training.Load: num [1:109] 0 0 100 100 0 ...
## ..$ Performance : int [1:109] NA NA 332 324 NA 332 NA NA 326 NA ...
## $ Subject6:'data.frame': 109 obs. of 3 variables:
## ..$ Day : int [1:109] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ Training.Load: num [1:109] 0 0 100 100 0 ...
## ..$ Performance : int [1:109] NA NA 218 231 NA 233 NA NA 232 NA ...
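As a side note, if you prefer to avoid the explicit for loop, the same list can also be built with lapply; the version below should produce an equivalent structure to the one shown above.
banister.list <- lapply(1:6, function(i) {
  subject.df <- cbind(banister.data[1], training[i], performance[i]) #Day plus subject i's training and performance
  colnames(subject.df) <- c("Day", "Training.Load", "Performance")
  subject.df
})
names(banister.list) <- paste("Subject", 1:6, sep = "")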
Writing the IR function
Now that the data is neatly structured, we can start thinking about how to write the IR function. If we look at the R documentation for the optim function (which can be done by typing ?optim in the console), we can see that the first argument of the function to be minimized needs to be a vector with the parameters we want to optimize. In our case these are the five adjustable parameters of the IR model: the initial performance level \(p(0)\), the two time constants (\(\tau_1\), \(\tau_2\)), and the two gain parameters (\(k_1\), \(k_2\)). Also, although I referenced the summation equation of the IR model previously, we’re going to follow the example of Clarke & Skiba (2013) and use a set of recursion equations instead; these are written out below and implemented in the code chunk that follows. Lastly, to fit a model for each subject we need to minimize the error between predicted performance and actual performance. To do this we’ll use the sum of squared errors as the metric to be minimized.
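With both components starting at zero, the recursive form is:
\(\mathrm{Fitness}(t) = \mathrm{Fitness}(t-1)\,e^{-1/\tau_1} + w(t)\)
\(\mathrm{Fatigue}(t) = \mathrm{Fatigue}(t-1)\,e^{-1/\tau_2} + w(t)\)
\(\hat{p}(t) = p(0) + k_1\,\mathrm{Fitness}(t) - k_2\,\mathrm{Fatigue}(t)\)
In other words, each day the previous day’s fitness and fatigue are decayed according to their respective time constants, and that day’s training load \(w(t)\) is added on top.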
banister.function <- function(v, Training.Load, Performance) { #The function takes three inputs...
  p0 <- v[1]; k1 <- v[2]; k2 <- v[3]; tau1 <- v[4]; tau2 <- v[5]
  Fitness <- 0 #Both components start at zero before any training
  Fatigue <- 0
  for (i in 1:length(Training.Load)) {
    Fitness[i+1] <- Fitness[i] * exp(-1/tau1) + Training.Load[i+1] #Recursion equations
    Fatigue[i+1] <- Fatigue[i] * exp(-1/tau2) + Training.Load[i+1]
  }
  Predicted.Performance <- p0 + k1*Fitness - k2*Fatigue #Modelled performance on each day
  errors <- Performance[!is.na(Performance)] - Predicted.Performance[which(!is.na(Performance))] #Only days with a measured performance count towards the error
  SSE <- sum(errors^2)
  return(SSE) #...and returns the sum of squared errors
}
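Before handing the function to optim, it’s worth checking that it runs and returns a single number. Calling it with Subject 1’s data and the starting values we’ll use in the next step should simply return the sum of squared errors at those values:
banister.function(c(256, 0.10, 0.10, 15, 11), #Starting values for p0, k1, k2, tau1 and tau2
                  Training.Load = banister.list[["Subject1"]]$Training.Load,
                  Performance = banister.list[["Subject1"]]$Performance)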
Optimizing parameters for each subject
Now that we have created our IR function, we can loop over each subject’s data and fit it with optim. This will give us six optimized sets of parameters, one for each subject. However, before running the optimization we have to set our initial parameter values. In this case, I’ve used \(p(0) = 256\), \(k_1 = 0.10\), \(k_2 = 0.10\), \(\tau_1 = 15\), and \(\tau_2 = 11\).
banister.optimised <- vector("list", 6)
for (j in 1:6) {
  banister.optimised[[j]] <- optim(par = c(256, 0.10, 0.10, 15, 11), fn = banister.function,
                                   Training.Load = banister.list[[j]][,2],
                                   Performance = banister.list[[j]][,3])
}
names(banister.optimised) <- paste("Subject", 1:6, sep = "")
str(banister.optimised[1])
## List of 1
## $ Subject1:List of 5
## ..$ par : num [1:5] 256.95 0.137 0.135 14.373 11.382
## ..$ value : num 3145
## ..$ counts : Named int [1:2] 501 NA
## .. ..- attr(*, "names")= chr [1:2] "function" "gradient"
## ..$ convergence: int 1
## ..$ message : NULL
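Before digging into the individual components, one detail is worth flagging: convergence is 1, which in optim’s output means the iteration limit was reached (maxit defaults to 500 for the default Nelder-Mead method) rather than the usual convergence criterion being met. The fits are still perfectly usable for this tutorial, but if you want to be thorough you can raise the limit through the control argument and check that the parameters barely change. For Subject 1 that might look like this:
optim(par = c(256, 0.10, 0.10, 15, 11), fn = banister.function,
      Training.Load = banister.list[[1]][,2], Performance = banister.list[[1]][,3],
      control = list(maxit = 5000)) #Same call as before, but allowing many more iterations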
Returning to the components themselves, two of them are of special interest to us: par and value. The par component shows the optimized parameter values for the given subject, while the value component shows the corresponding sum of squared errors. Let’s create a new list with these components alongside Day, Training.Load, and Performance.
banister.parameters <- vector("list", 6)
for (i in 1:6) {
  banister.parameters[[i]] <- list("Day" = banister.list[[i]][,1],
                                   "Training.Load" = banister.list[[i]][,2],
                                   "Performance" = banister.list[[i]][,3],
                                   "Parameters" = banister.optimised[[i]]$par,
                                   "SSE" = banister.optimised[[i]]$value)
}
names(banister.parameters) <- paste("Subject", 1:6, sep = "")
str(banister.parameters[1])
## List of 1
## $ Subject1:List of 5
## ..$ Day : int [1:109] 1 2 3 4 5 6 7 8 9 10 ...
## ..$ Training.Load: num [1:109] 0 100 100 0 100 ...
## ..$ Performance : int [1:109] NA 253 252 NA 245 NA NA 254 NA 258 ...
## ..$ Parameters : num [1:5] 256.95 0.137 0.135 14.373 11.382
## ..$ SSE : num 3145
Creating models
Now that we have optimized each subject’s parameters and organised the results into tidy lists, we can use the magic that is lapply to create six individual models.
banister.models <- lapply(banister.parameters, function(x) {
  p0 <- x$Parameters[1]; k1 <- x$Parameters[2]; k2 <- x$Parameters[3]; tau1 <- x$Parameters[4]; tau2 <- x$Parameters[5]
  Fitness <- 0
  Fatigue <- 0
  for (i in 1:length(x$Training.Load)) {
    Fitness[i+1] <- Fitness[i] * exp(-1/tau1) + x$Training.Load[i+1]
    Fatigue[i+1] <- Fatigue[i] * exp(-1/tau2) + x$Training.Load[i+1]
  }
  Predicted.Performance <- p0 + k1*Fitness - k2*Fatigue #Daily predicted performance from the optimized parameters
  Errors <- x$Performance[!is.na(x$Performance)] - Predicted.Performance[which(!is.na(x$Performance))]
  SSE <- sum(Errors^2)
  R.2 <- (cor(x$Performance[!is.na(x$Performance)], Predicted.Performance[which(!is.na(x$Performance))]))^2 #Squared correlation between measured and predicted performance
  return(list("Day" = x$Day, "Training.Load" = x$Training.Load, "Performance" = x$Performance,
              "Predicted.Performance" = Predicted.Performance[!is.na(Predicted.Performance)],
              "SSE" = SSE, "R.2" = R.2))
})
As shown below, the individual \(R^2\) values range from 0.86 to 0.95, indicating that the models fit the data nicely.
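For reference, the values shown below can be pulled straight out of the model list with something like:
lapply(banister.models, function(x) x$R.2) #Extracts the R.2 element from each subject's model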
## $Subject1
## [1] 0.9305551
##
## $Subject2
## [1] 0.8805885
##
## $Subject3
## [1] 0.9472296
##
## $Subject4
## [1] 0.8595214
##
## $Subject5
## [1] 0.9485933
##
## $Subject6
## [1] 0.9060418
Visualization
Now that we have six nicely fitted models, we can start thinking about ways to visualize them. A great package for this is ggplot2; however, ggplot2 only takes data frames as input, so we first have to transform our list of models into a single data frame. We’re also going to create a new variable called ‘Week’ so that we can visualize the weekly progression in performance rather than going day by day. To do this we’ll use lapply together with plyr’s ldply function:
banister.models <- lapply(banister.models, function(x) {
  x$Week <- x$Day/7 #Creates a 'Week' variable from the day number
  return(x)
})
banister.df <- plyr::ldply(banister.models, data.frame) #Collapses the list into one data frame with an .id column
colnames(banister.df)[colnames(banister.df) == ".id"] <- "Subject"
library(ggplot2) #Attach ggplot2 for the plotting code below
banister.plots <- ggplot(banister.df, aes(x = Week, y = Predicted.Performance)) +
  geom_line(aes(y = Predicted.Performance, colour = "black"), size = 1) +
  geom_point(aes(y = Performance, colour = "red"), shape = 1) +
  scale_color_manual("", values = c("black", "red"), labels = c("Predicted Performance", "Actual Performance")) +
  guides(colour = guide_legend(override.aes = list(linetype = c("solid", "blank"), shape = c(NA, 1)))) +
  ylab("Performance") +
  xlab("Week") +
  scale_x_continuous(breaks = seq(0, 16, 2)) +
  geom_label(data = banister.df, aes(x = 2, y = 400, label = paste("italic(R) ^ 2 == ", round(R.2, 2))), parse = TRUE, size = 3) +
  theme_minimal() +
  facet_wrap(~ Subject, ncol = 2)
banister.plots
Final notes
And there you have it. The plots above show how each subject’s performance capacity changes over time as a function of training. This underlines the usefulness of the original Banister IR model as an excellent gateway into performance modelling and as a useful pedagogical tool within sports and exercise science (Clarke & Skiba, 2013). In later blog posts I’ll show a couple of extensions of the IR model, and maybe compare it to some modern neural networks. Stay tuned.
References
Busso, T. (2017). From an indirect response pharmacodynamic model towards a secondary signal model of dose-response relationship between exercise training and physical performance. Scientific Reports, 7, 40422.
Clarke, D. C., & Skiba, P. F. (2013). Rationale and resources for teaching the mathematical modeling of athletic training and performance. Advances in Physiology Education, 37(2), 134–152.
Taha, T., & Thomas, S. G. (2003). Systems modelling of the relationship between training and performance. Sports Medicine, 33(14), 1061–1073.