--- title: "'reslife' Package Demo" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{reslife-demo} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( error = TRUE, collapse = TRUE, comment = "#>" ) ``` ## Load in the Package ```{r setup} library(reslife) ``` ## Demonstration of the 'reslifefsr' function This function exclusively uses flexsurvreg() objects as the source of the data. We will show how to use the function below. First, we'll start off with a simple example and show how the mean produced is equal to the integration. We run a flexsurvreg() with a single covariate for the 'gengamma.orig' distribution. We then run the reslifefsr() function, using the flexsurvreg() object. We have a life of 1, which is a scalar value at which we calculate the residual life, and specify that we only want the mean. ```{r} library(flexsurv) fs1 <- flexsurvreg(Surv(recyrs, censrec) ~ 1, data = bc,dist = "gengamma.orig") r <- reslifefsr(obj = fs1, life = 1, type = 'mean') r ``` We can verify the accuracy of our derivation using the integrate function. ```{r} b <- exp(as.numeric(fs1$coefficients[1])) a <- exp(as.numeric(fs1$coefficients[2])) k <- exp(as.numeric(fs1$coefficients[3])) # numerical integration f <- function(x) { return(pgengamma.orig(x, shape = b, scale = a, k = k, lower.tail = FALSE)) } mx_interal <- integrate(f, 1, Inf)$value/pgengamma.orig(1, shape = b, scale = a, k = k, lower.tail = FALSE) mx_interal ``` As you can see, the mean values are the same. ### Demonstration with multiple covariates What if the flexsurvreg() object has multiple covariates? We will see this below. We will use the 'bc' dataset contained within 'flexsurv' with an added covariate 'age'. ```{r} data(bc) newbc <- bc newbc$age <- rnorm(dim(bc)[1], mean = 65-scale(newbc$recyrs, scale=FALSE),sd = 5) ``` Generating a flexsurvreg object: ```{r} fsr2 = flexsurvreg(Surv(recyrs, censrec) ~ group+age, data=newbc, dist = 'weibull') ``` We can now use the reslifefsr() function to generate residual life values. There are 686 values, since the length of the input is 686. We are only going to show 10, in order to save space. ```{r} head(reslifefsr(obj = fsr2, life= 4), 10) ``` If we want to look at the median, we can specify that. ```{r} head(reslifefsr(obj = fsr2,life= 4, type = 'median'),10) ``` We can look at all 3 types now (mean, median, and percentile). ```{r} head(reslifefsr(obj = fsr2, life= 4, p = .8, type = 'all'),10) ``` Notice how when 'all' is used, the output is a dataframe with the name of each value as the column names. But what if we want to supply our own data and make predictions? Since the 'flexsurv' regression object generates parameters, we can add our own data points and get the residual life values for each. Let's start by generating some data. This data has the same two levels as 'fsr2' above (group, age). Note that the data is placed in a data frame. The new data must be inputted as a data frame for the function to work. ```{r} group = c("Medium", 'Good', "Poor") age = c(43, 35, 39) newdata = data.frame(age, group) ``` We run that through the reslifefsr() function and get a 3 row data frame ```{r} reslifefsr(obj = fsr2,life= 4,p = .6, type = 'all', newdata= newdata) ``` Now we can show off some features of the function, specifically how it handles different input data. Lets change the order of the variables in the input data and rerun. It will show the exact same results as above. ```{r} newdata = data.frame(group, age) reslifefsr(obj = fsr2,life= 4,p = .6, type = 'all', newdata= newdata) ``` This is a handy feature of the function that can enhance the user experience. Another feature of this is the ability to handle data with extra columns. Let's show that feature below. ```{r} extra = c(100, 100, 100) newdata2 = data.frame(age, group, extra) reslifefsr(obj = fsr2,life= 4,p = .6, type = 'all', newdata= newdata2) ``` The extra column 'extra' is ignored by the function. Are there scenarios where the package would return an error? Yes, in fact, there are a few. The first is if not enough data is provided, which is shown below. ```{r} newdata3 = data.frame(group) reslifefsr(obj = fsr2, life = 4,p = .6, type = 'all', newdata= newdata3) ``` Since only 'group' is provided, predictions cannot be generated. The function needs values for 'age' in order to make predictions. Another scenario could happen with factor data. The levels for 'group' are 'Good', 'Medium', and 'Poor'. But what if a level was inputted that isn't one of those three? Let's see. ```{r} group = c("Medium", 'Good', "Terrible") newdata = data.frame(age, group) reslifefsr(obj = fsr2,life= 4,p = .6, type = 'all', newdata= newdata) ``` Since 'Terrible' is not a valid level, the function errors and returns a message notifying the user of this error. It is also possible to plot the mean residual life for a set of values from a 'flexsurv' object. We take the first line from newbc and create a blank vector of length 10. We then loop the function to create the MRLs for the first 10 integers, which gives us the expected survival time given the patient has survived y years. We can then plot these values. ```{r} newdata4 = newbc[1,] mrl = rep(0,10) for (i in 1:10){ mrl[i] = reslifefsr(obj = fsr2,life= i,p = .6, type = 'mean', newdata= newdata4) } plot(x=c(1:10), y=mrl,type="l", xlab = 'Years Survived', ylab = 'Mean Residual Life', main = 'MRL over Time') ``` That is an overview of how to use the flexsurvreg() dependent function. Next, we will show how to get residual life values by inputting your own parameters. ## Demonstration of the 'residlife' function This function works similarly to the reslifefsr() function, except it doesn't rely on a flexsurvreg() input. Instead, the user inputs the name of the distribution and the parameters. The distribution must be one contained within 'flexsurv' ('gamma', 'gengamma.orig', 'weibull', 'gompertz', 'exponential', 'lnorm', 'llogis', 'gengamma', 'genf', 'genf.orig'), and the names of the parameters must match those specified for each distribution. We will show a demonstration below. The inputs for the function are simple. Values is the first argument, which are the values for which the residual life will be calculated. Values is usually a vector or a seq function. The next argument is the distribution name, 'weibull'. The last argument are the parameters, inputted as a vector. The names, shape and scale, are specified in the vector. For the examples in this vignette, we will use the output from a flexsurvreg() using the bc dataset. ```{r} fsr3 = flexsurvreg(Surv(recyrs, censrec) ~1, data=newbc, dist = 'weibull') fsr3$res ``` We use a sequence of values from 1 to 10, counting up by .5, for this example. ```{r} residlife(values = seq(1,10,.5), distribution = 'weibull', parameters= c(shape= 1.272, scale = 6.191)) ``` What if the parameter names are incorrect? The function will return an error. ```{r} residlife(values = seq(1,10,.5), distribution = 'weibull', parameters= c(shape= 1.272, not_scale = 6.191)) ``` For distributions where there are different parameterizations, the function can be flexible. For example, 'gamma' has both shape/rate and shape/scale parameterizations. The function works for both, as long as the parameters are correctly specified in the input. ```{r} residlife(values = seq(1,10,.5), distribution = 'gamma', parameters= c(shape= 1.272, scale = 6.191)) ``` ```{r} residlife(values = seq(1,10,.5), distribution = 'gamma', parameters= c(shape= 1.272, rate = 1/6.191)) ``` Much like the reslifefsr() function, the reslife() function has the ability to show mean, median, percentile, or all three. ```{r} residlife(values = seq(1,10,.5), distribution = 'weibull', parameters= c(shape= 1.272, scale = 6.191), p = .7, type = 'all') ``` The outputs of this function can then easily be taken and used in a plot or any other type of analysis tool. Let's look at an example of such a plot. Say we want to plot the mean residual life over time. First, we would create a vector for time, run it through the residlife() function with type = 'mean', and plot the output against time. After extracting the parameters, we can run residlife() and generate an output vector. Note that since residlife() can take a vector input, we don't have to use a loop. ```{r, fig.width=7, fig.height=5} time= seq(1,10,.5) life = residlife(values = time, distribution = 'weibull',parameters= c(shape= 1.272, scale = 6.191)) plot(time, life, xlab = 'Years Survived', ylab = 'Mean Residual Life', main = 'MRL over Time', type="l") ```