---
title: "Apply any R function on rolling windows"
author: "Dawid Kałędkowski"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Apply any R function on rolling windows}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Using runner

The `runner` package provides functions that are applied on running windows. The most universal function is `runner::runner`, which lets the user apply any R function `f` on running windows. Running windows are defined by the window size `k` and the `lag`, with respect to the indexes of the data. Unlike other available R packages, `runner` supports any input and output type and also gives full control over window size and lag/lead.

Several kinds of running windows exist, and `runner` implements the ones described below.

### Cumulative windows

The simplest window type, similar to `base::cumsum`. At each element, the window consists of the current element and all elements before it.

![](images/cumulative_windows.png)

In `runner` this can be achieved simply by:

```{r eval=FALSE}
library(runner)

# full windows
runner(1:15)

# summarizing - sum
runner(
  1:15,
  f = sum
)

# summarizing - concatenating
runner(
  letters[1:15],
  f = paste,
  collapse = " > "
)
```

### Constant sliding windows

The second type of window is commonly known as a running/rolling/moving/sliding window. Windows of this type move along the index instead of accumulating like the previous type. The following diagram illustrates running windows of length `k = 4`. Each of the 15 windows contains 4 elements (except the first three).

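
The window membership described above can be sketched in base R. This is only an illustration under the assumption that incomplete windows at the start are truncated, as the text states; it is not how `runner` is implemented, and the helper name `window_at` is hypothetical.

```{r eval=FALSE}
# Hypothetical helper (not part of runner): elements of x that fall into
# the window of length k ending at position i
window_at <- function(x, i, k) {
  start <- max(1L, i - k + 1L) # windows near the start are truncated
  x[start:i]
}

x <- 1:15
windows <- lapply(seq_along(x), window_at, x = x, k = 4)

# number of elements per window: 1, 2, 3, then 4 for the remaining twelve
lengths(windows)

# summarizing each window by hand, analogous to passing f = sum
sapply(windows, sum)
```

The first three windows hold fewer than `k` elements because there are not enough preceding observations to fill them.
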
![](images/running_windows_explain.png)

To obtain constant sliding windows, one just needs to specify the `k` argument.

```{r eval=FALSE}
# summarizing - sum of 4-elements
runner(
  1:15,
  k = 4,
  f = sum
)

# summarizing - slope from lm
df <- data.frame(
  a = 1:15,
  b = 3 * 1:15 + rnorm(15)
)

runner(
  x = df,
  k = 5,
  f = function(x) {
    model <- lm(b ~ a, data = x)
    coefficients(model)["a"]
  }
)
```

### Windows depending on date

By default, `runner` assumes that the index increments by one, but sometimes the data points in a dataset are not equally spaced (missing weekends, holidays, other gaps), so the window size should vary to keep the expected time frame. If one specifies the `idx` argument, then running functions are applied on windows depending on date rather than on the sequence 1-n. `idx` should be the same length as `x` and should be of type `Date`, `POSIXt` or `integer`. The example below illustrates a window of size `k = 5` lagged by `lag = 1`. Note that one can also specify `k = "5 days"` and `lag = "day"`, as in `seq.POSIXt`. In the illustration below, the range of each window is shown in square brackets.

![](images/running_date_windows.png)

```{r eval=FALSE}
idx <- c(4, 6, 7, 13, 17, 18, 18, 21, 27, 31, 37, 42, 44, 47, 48)

# summarize - mean
runner::runner(
  x = idx,
  k = 5, # 5-days window
  lag = 1,
  idx = idx,
  f = function(x) mean(x)
)

# use Date or datetime sequences
runner::runner(
  x = idx,
  k = "5 days", # 5-days window
  lag = 1,
  idx = Sys.Date() + idx,
  f = function(x) mean(x)
)

# obtain window from above illustration
runner::runner(
  x = idx,
  k = "5 days",
  lag = 1,
  idx = Sys.Date() + idx
)
```

### Running at

By default, `runner` returns a vector of the same length as `x`, unless one passes a vector of any length to the `at` argument. Each element of `at` is an index at which `runner` evaluates the function. The example below illustrates the output of `runner` for `at = c(13, 27, 45, 31)`, which gives windows in the ranges enclosed in square brackets. The range for `at = 27` is `[22, 26]`, which is not available in the current indices.

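
The range quoted for `at = 27` can be reproduced with simple arithmetic. The base R sketch below assumes that a window evaluated at index `a` covers `[a - lag - k + 1, a - lag]`, which is inferred from the `[22, 26]` example rather than taken from the package source; the helper name `window_range` is hypothetical.

```{r eval=FALSE}
idx <- c(4, 6, 7, 13, 17, 18, 18, 21, 27, 31, 37, 42, 44, 47, 48)
k <- 5
lag <- 1
at <- c(13, 27, 45, 31)

# Hypothetical helper (not part of runner): index range covered by the
# window evaluated at `a`, assuming [a - lag - k + 1, a - lag]
window_range <- function(a, k, lag) {
  c(from = a - lag - k + 1, to = a - lag)
}

# one row per element of `at`, columns "from" and "to"
ranges <- t(sapply(at, window_range, k = k, lag = lag))

# idx values that fall into each range; an empty result means the window
# contains no observations (as for at = 27, whose range is [22, 26])
in_window <- lapply(seq_len(nrow(ranges)), function(i) {
  idx[idx >= ranges[i, "from"] & idx <= ranges[i, "to"]]
})
```

Under this assumption the window for `at = 27` is empty, because no value of `idx` lies between 22 and 26.
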
![](images/runner_at_date.png)

```{r eval=FALSE}
idx <- c(4, 6, 7, 13, 17, 18, 18, 21, 27, 31, 37, 42, 44, 47, 48)

# summary
runner::runner(
  x = 1:15,
  k = 5,
  lag = 1,
  idx = idx,
  at = c(18, 27, 48, 31),
  f = mean
)

# full window
runner::runner(
  x = idx,
  k = 5,
  lag = 1,
  idx = idx,
  at = c(18, 27, 48, 31)
)
```

`at` can also be specified as an interval of the output, defined by a time interval, which yields results at the following indices: `seq(min(idx), max(idx), by = "