---
title: "Using provenance to view the lineage of a data item"
date: "June 11, 2020"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using provenance to view the lineage of a data item}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

A common question when debugging is how a variable got the value that it has. This question can be answered with the `debug.lineage` function. When asked for the lineage of a variable, a plot, or a file output, `debug.lineage` shows just the lines of code that contributed, either directly or indirectly, to the value being queried.

The `debug.lineage` function can also be used to find every variable, plot, or file that was derived from a particular variable, showing the lines of code involved in that derivation.

For each data node queried, `debug.lineage` returns a data frame representing its forwards lineage (how the data is used) or its backwards lineage (how the data was generated). Each data frame contains the following columns:

* `scriptNum` The script number of the data item
* `scriptName` The name of the script containing the data item
* `startLine` The line number for that part of the lineage
* `code` The line of code for that part of the lineage

## Usage

The function signature for `debug.lineage` is:

```
debug.lineage(..., start.line = NA, script.num = 1, all = FALSE, forward = FALSE)
```

The parameters of this function are:

* `...` The names of data nodes to be queried.
* `start.line` The line number of the queried data nodes.
* `script.num` The script number of the queried data nodes. Defaults to script number 1 (main script).
* `all` If TRUE, this function returns the lineages of all data items.
* `forward` If TRUE, this function returns the forwards lineage (how the data is used) instead of the backwards lineage (how the data was generated).

This function may be called only after initialising the debugger using either `prov.debug`, `prov.debug.run`, or `prov.debug.file`.
For example:

```
prov.debug.run("myScript.R")
debug.lineage(x)
debug.lineage("x", start.line = 5, script.num = 2)
debug.lineage("a", b, forward = TRUE)
debug.lineage(all = TRUE)
```

## Example

Let `myScript.R` be the following:

```
a <- 1
b <- 2
cc <- 4

v1 <- c(a:10)
v1 <- rep(v1, b)

m1 <- matrix(v1, cc)
print(m1)
```

### Backwards lineage of a variable

By default, the function returns the backwards lineage of the queried object, showing how it was derived. If no start line is specified, the backwards lineage of the last occurrence of that object is returned.

For example, the result for `debug.lineage(v1)` is:

```
$v1
  scriptNum scriptName startLine             code
1         1 myScript.R         1           a <- 1
2         1 myScript.R         2           b <- 2
3         1 myScript.R         5    v1 <- c(a:10)
4         1 myScript.R         6 v1 <- rep(v1, b)
```

The backwards lineage for the `v1` variable at line 6 is given because that is the last occurrence of the variable.

The `start.line` parameter specifies which occurrence of the queried variable the lineage should be returned for. For example, `debug.lineage("v1", start.line = 5)` returns the backwards lineage of `v1` at line 5, resulting in:

```
$v1
  scriptNum scriptName startLine          code
1         1 myScript.R         1        a <- 1
2         1 myScript.R         5 v1 <- c(a:10)
```

### Forwards lineage of a variable

If the `forward` parameter is set to `TRUE`, the forwards lineage is returned instead of the default backwards lineage, showing how the object was used. If no start line is specified, the forwards lineage of the first occurrence of that object is returned.

For example, the result for `debug.lineage("v1", forward = TRUE)` is:

```
$v1
  scriptNum scriptName startLine                 code
1         1 myScript.R         5        v1 <- c(a:10)
2         1 myScript.R         6     v1 <- rep(v1, b)
3         1 myScript.R         8 m1 <- matrix(v1, cc)
4         1 myScript.R         9            print(m1)
```

The forwards lineage for the `v1` variable at line 5 is given because that is the first occurrence of the variable.
The `start.line` parameter may also be used to specify which occurrence of the queried variable the forwards lineage should be returned for.

Multiple objects may be queried per function call for both backwards and forwards lineages, but only one start line may be specified in this case.
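
Because each lineage is an ordinary data frame, it can be inspected with base R once it has been returned. The sketch below is illustrative only: it assumes the result is a named list with one data frame per queried name, as the `$v1` printouts suggest, and it uses a hand-built stand-in (copied from the `debug.lineage(v1)` output above) so that it runs without the debugger being initialised. The variable names `lineage`, `v1_lineage`, `contributing_lines`, and `late_steps` are hypothetical.

```r
# Hand-built stand-in for the documented result of debug.lineage(v1).
# In a real session this object would come from debug.lineage itself.
lineage <- list(
  v1 = data.frame(
    scriptNum = c(1, 1, 1, 1),
    scriptName = rep("myScript.R", 4),
    startLine = c(1, 2, 5, 6),
    code = c("a <- 1", "b <- 2", "v1 <- c(a:10)", "v1 <- rep(v1, b)"),
    stringsAsFactors = FALSE
  )
)

# Assumption: each queried name is one element of the returned list.
v1_lineage <- lineage$v1

# Line numbers of the statements that contributed to v1.
contributing_lines <- v1_lineage$startLine

# Keep only the steps at or after line 5 of the script.
late_steps <- v1_lineage[v1_lineage$startLine >= 5, ]

# Print the contributing code, one statement per line.
cat(v1_lineage$code, sep = "\n")
```

Subsetting on `startLine` in this way is one option for narrowing a long lineage to the part of the script under investigation.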