--- title: "Calculating the proportion of code classified in an R file" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{calculating-proportion-of-code} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 7 ) ``` ## Introduction tidycode can be used to easily classify the lines of code in an R file (e.g., as data cleaning, setup, etc.). This vignette shows how tidycode can easily be used to calculate the proportion of a total R file classified to different categories? ## Loading and setting up We will frist load the tidyverse and tidycode packages and then use the tidycode function `read_rfiles()` to read the two example files (built-in to tidycode): ```{r, message = FALSE} library(tidycode) library(dplyr) library(ggplot2) two_rfiles <- read_rfiles( tidycode_example("example_plot.R"), tidycode_example("example_analysis.R") ) ``` ## Classify the lines of code in the R files Next, we can classify the lines of code in the two rfiles saved to the object `two_rfiles`, using the `unnest_calls()` and subsequent functions as described in the [tidycode vignette](tidycode.html): ```{r} unnested_expressions <- unnest_calls(two_rfiles, expr) classified_code <- unnested_expressions %>% inner_join( get_classifications("crowdsource", include_duplicates = FALSE) ) %>% anti_join(get_stopfuncs()) %>% select(file, func, classification) ``` ## Creating a function Then, we will create a simple function that a) takes the classified code and then b) calculates the proportion of the lines of code in each file that is classified into different categories: ```{r} calc_proportion_file <- function(d) { d %>% count(file, classification) %>% group_by(file) %>% mutate(prop = n / sum(n)) } ``` ## Using the function It is easy to use the function on our classified code; just pass the classified code to it.: ```{r} proportion_of_file <- calc_proportion_file(classified_code) proportion_of_file ``` ## Visualizing classified code on a per-file basis We can also easily visualize the results: ```{r} proportion_of_file %>% ggplot(aes(x = 0, y = prop, fill = reorder(classification, prop))) + geom_bar(stat = "identity", size = 1) + scale_y_continuous(labels = scales::percent_format()) + coord_flip() + facet_wrap(~file, ncol = 1)+ labs( title = "Proportion of Code by File", y = "Proportion of Code", fill = "Classification" ) + theme( axis.text.y = element_blank(), axis.ticks.y = element_blank(), axis.title.y = element_blank(), strip.text = element_text(hjust=0), panel.background = element_blank(), strip.background = element_blank(), panel.grid.major.x = element_line(color="grey80") ) ``` This can become quite a large visualization if there are very many files. Thus, this approach may be more useful when trying to visualize code on on a per-file basis for a relatively small (perhaps 10-15 or fewer) files. Another approach can be scaled up to a larger number of files, as is described next. ## Visualizing classified code across files First, we'll create a function that is an analog to `calc_proportion_file()`, but for calculating the mean proportion across many files: ```{r} calc_proportion_overall <- function(d) { d %>% group_by(classification) %>% count() %>% ungroup() %>% mutate( prop = prop.table(n) ) } ``` We can use this in the same way as `calc_proportion_file()`, passing `classified_code` as the sole argument: ```{r} proportion_overall <- calc_proportion_overall(classified_code) proportion_overall ``` These results can be visualized as follows: ```{r} proportion_overall %>% ggplot() + geom_bar(aes(x = reorder(classification, prop), y = 1), stat = "identity", fill = "grey80") + geom_bar(aes(x = reorder(classification, prop), y = prop, fill = prop), stat = "identity")+ geom_text(aes(x = reorder(classification, prop), y = prop, label = paste0(round(prop * 100, digits = 0), "%"), hjust = -.5)) + scale_y_continuous(labels = scales::percent_format()) + coord_flip() + labs( title = "Overall Proportion of Code", y = "Proportion of Code", x = "Classification" ) + theme( panel.background = element_blank(), legend.position = "none" ) ```