tree-tracker-report/report.Rmd

---
title: "Assessment of Public Engagement with the 25 Million Trees Initiative"
author:
  - name: Nicholas Hepler <nicholas.hepler@its.ny.gov>
    affiliation: Office of Information Technology Services
  - name: Annabel Gregg <annabel.gregg@dec.ny.gov>
    affiliation: Department of Environmental Conservation
date: "`r format(Sys.Date(), '%B %d, %Y')`"
abstract: >
  This report summarizes survey findings related to the 25 Million Trees Initiative,
  analyzing public awareness, engagement, and regional participation across New York State.
keywords: "urban forestry, public survey, tree planting, New York State, environmental engagement"
output:
  html_document:
    toc: true
    toc_depth: 1
    toc_float: true
    number_sections: false
    css: custom.css
    code_folding: hide
lang: en
geometry: margin=1in
---

```{r setup, include=FALSE}
# Global setup
knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE
)

set.seed(123)  # For reproducibility

# Load tidyverse and additional necessary libraries
library(tidyverse)
library(sf)
library(tigris)
library(scales)
library(RColorBrewer)
library(viridis)
library(lubridate)

# Define file paths
survey_path <- "data/_25_Million_Trees_Initiative_Survey_0.csv"
locations_pt_path <- "data/location_points_1.csv"
locations_poly_path <- "data/location_polygons_2.csv"
participants_path <- "data/participant_organizations_3.csv"
species_path <- "data/species_planted_4.csv"
vendors_path <- "data/vendors_5.csv"

# Check for expected files
stopifnot(file.exists(survey_path))
stopifnot(file.exists(locations_pt_path))
stopifnot(file.exists(locations_poly_path))
stopifnot(file.exists(participants_path))
stopifnot(file.exists(species_path))
stopifnot(file.exists(vendors_path))

# Load survey and related datasets
survey_data <- read_csv(survey_path)
location_points <- read_csv(locations_pt_path)
location_polygons <- read_csv(locations_poly_path)
participant_organizations <- read_csv(participants_path)
species_planted <- read_csv(species_path)
vendors <- read_csv(vendors_path)

# Convert relevant date columns to datetime format and recode planting agency responses to standardized labels
survey_data <- survey_data %>%
  mutate(CreationDate = mdy_hms(CreationDate)) %>%
  mutate(`Start Date of Planting (Required)` = mdy_hms(`Start Date of Planting (Required)`)) %>%
  mutate(`End Date of Planting (Required)` = mdy_hms(`End Date of Planting (Required)`)) %>%
  mutate(`Who Planted The Tree(s)? (Required)` = recode(`Who Planted The Tree(s)? (Required)`,
    "agency" = "State Agency",
    "community" = "Community Organization",
    "landowner" = "Private Landowner",
    "municipality" = "Municipal Government",
    "professional" = "Paid Professional"))

# Derive and apply submission date from 'Survey ID' to replace 'CreationDate' when available
survey_data <- survey_data %>%
  mutate(
    Submitted_Date_Str = str_extract(`Survey ID`, "Submitted: \\d{2}-\\d{2}-\\d{2}") %>%
                         str_remove("Submitted: "),

    Submitted_Date_Str = if_else(
      !is.na(Submitted_Date_Str),
      paste0("20", Submitted_Date_Str),  # add "20" prefix to "24-11-07"
      NA_character_
    ),

    Submitted_Date = as.Date(Submitted_Date_Str, format = "%Y-%m-%d"),

    CreationDate = if_else(
      !is.na(`Survey ID`) & str_trim(`Survey ID`) != "",
      Submitted_Date,
      CreationDate
    )
  ) %>%
  select(-Submitted_Date_Str, -Submitted_Date)

# Count and exclude flagged records, then summarize total and used counts, and determine submission date range
excluded_count <- survey_data %>% filter(`Exclude Result` == 1) %>% nrow()
survey_data <- survey_data %>%
  filter(`Exclude Result` == 0)
used_count <- survey_data %>% filter(`Exclude Result` == 0) %>% nrow()
total_records <- excluded_count + used_count
start_date_submission <- format(min(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")
end_date_submission <- format(max(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")
```

---
subtitle: "`r format(min(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")` to  `r format(max(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")`."
---

# Report Overview {.tabset}

[Back to Top](#)

## Key Findings
```{r key-findings-summary}
kf_date_planting_start <- format(min(survey_data$`Start Date of Planting (Required)`, na.rm = TRUE), "%B %d, %Y")
kf_date_planting_end   <- format(max(survey_data$`End Date of Planting (Required)`, na.rm = TRUE), "%B %d, %Y")
kf_total_trees <- format(sum(survey_data$`Number of Trees Planted (Required)`), big.mark = ",")

kf_region_total_trees_ranked <- survey_data %>%
  group_by(Region) %>%
  summarise(Total_Trees = sum(`Number of Trees Planted (Required)`, na.rm = TRUE)) %>%
  arrange(desc(Total_Trees))

kf_participant_total_trees_ranked <- survey_data %>%
  group_by(`Who Planted The Tree(s)? (Required)`) %>%
  summarise(Total_Trees = sum(`Number of Trees Planted (Required)`, na.rm = TRUE)) %>%
  arrange(desc(Total_Trees))

kf_dac_total_trees <- sum(survey_data$`Number of Trees Planted (Required)`[!is.na(survey_data$`Disadvantaged Communities Indicator`)], na.rm = TRUE)
kf_dac_percent <- (kf_dac_total_trees / sum(survey_data$`Number of Trees Planted (Required)`, na.rm = TRUE)) * 100
kf_dac_percent_display <- round(kf_dac_percent, 1)

kf_generic_tree_type_ranked <- species_planted %>%
  filter(!is.na(`Generic Type of Tree (Optional)`)) %>%
  count(`Generic Type of Tree (Optional)`, name = "Survey_Count") %>%
  arrange(desc(Survey_Count))

kf_most_common_generic_tree_type <- kf_generic_tree_type_ranked$`Generic Type of Tree (Optional)`[1]
kf_most_common_generic_tree_type_count <- kf_generic_tree_type_ranked$Survey_Count[1]
kf_most_common_generic_tree_type_count_formatted <- format(kf_most_common_generic_tree_type_count, big.mark = ",")

kf_total_surveys <- nrow(survey_data)
kf_total_surveys_formatted <- format(kf_total_surveys, big.mark = ",")

kf_unique_counties <- n_distinct(survey_data$County)
kf_unique_counties_formatted <- format(kf_unique_counties, big.mark = ",")

kf_unique_municipalities <- n_distinct(survey_data$Municipality)
kf_unique_municipalities_formatted <- format(kf_unique_municipalities, big.mark = ",")
```

Between **`r kf_date_planting_start` and `r kf_date_planting_end`**, a total of **`r kf_total_trees` trees** were reported planted across **New York State** through the 25 Million Trees by 2033 Initiative.

These efforts reflect broad collaboration between **municipal governments**, **community organizations**, **private landowners**, and other stakeholders.

- **Most Trees Planted**: The highest number of trees were reported in **`r kf_region_total_trees_ranked$Region[1]`**, followed by **`r kf_region_total_trees_ranked$Region[2]`**.
- **Top Planting Groups**: The most trees, approximately **`r scales::comma(kf_participant_total_trees_ranked$Total_Trees[1])`**, were planted by **`r kf_participant_total_trees_ranked$"Who Planted The Tree(s)? (Required)"[1]`**, followed by **`r kf_participant_total_trees_ranked$"Who Planted The Tree(s)? (Required)"[2]`**, which contributed **`r scales::comma(kf_participant_total_trees_ranked$Total_Trees[2])`** trees.
- **Disadvantaged Communities**: Approximately **`r kf_dac_percent_display`%** of all trees were planted in **Disadvantaged Communities**, as defined by New York State’s Climate Act.
- **Most Reported Tree Genus**: **`r kf_most_common_generic_tree_type`** appeared most frequently, reported in **`r kf_most_common_generic_tree_type_count_formatted`** surveys.
- The project received data from **`r kf_total_surveys_formatted` unique surveys**, representing **`r kf_unique_counties_formatted` counties** and **`r kf_unique_municipalities_formatted` municipalities**.

These findings help track progress toward equity-centered climate goals, highlight areas of strong participation, and support data-driven planning for future tree planting across the state.

## Background

To track progress, the **New York State Department of Environmental Conservation (DEC)** launched the **Tree Tracker**, a public-facing survey tool built on the  ***ArcGIS Survey123***. It allows individuals and organizations to submit information about tree planting efforts, including species, quantity, and location. These submissions feed into a real-time dashboard that maps tree planting activities across the state.

This report analyzes data submitted through the Tree Tracker, offering insights into participation patterns, planting trends, and geographic distribution. The findings are intended to inform DEC staff and leadership in strategic decision-making and program outreach.

## Purpose & Objectives

This report aims to:

-   Summarize overall participation and progress toward the 25 million tree goal.
-   Analyze tree planting data by location, timeframe, and participant type.
-   Identify gaps or trends to inform outreach and support efforts.

By understanding planting behavior statewide, DEC can better allocate resources and strengthen engagement with communities and partners.

## Survey Period & Exclusions

This analysis covers submissions from **`r start_date_submission`** to **`r end_date_submission`**, totaling **`r total_records`** records. Of these, **`r used_count`** records were deemed valid and included in the analysis.

### Excluded Records

**`r excluded_count`** records were removed based on the `Exclude Result` field (`1` = exclude). Common reasons for exclusion:

-   **Duplicate Submissions**: Entries identified as double-counts.
-   **Test Entries**: Data flagged as internal testing or mock submissions.

Only submissions marked with `0` in the `Exclude Result` field were included in the analysis.

## Validation & Data Consistency

To ensure data reliability, multiple validation checks were applied:

### Required Fields
-   **Who Planted the Tree(s)?**
-   **Number of Trees**
-   **Start and End Dates of Planting**
-   **Geographic Location** (latitude/longitude)

### Data Validation
-   **Geographic Validation**: Coordinates were cross-checked with administrative boundaries to assign locality, county, and region. Records with unresolved locations were reviewed manually before inclusion.
-   **Temporal Logic**: Planting dates were required to occur within the initiative timeline and in chronological order.
-   **Optional Field Checks**: Formats (e.g., email addresses) were validated for consistency even if fields were not mandatory.

These validation processes enhance the accuracy and interpretability of the data, ensuring the results reflect genuine community contributions to the initiative.

# Submission Analysis {.tabset}

[Back to Top](#)

## Day of Week

The chart below shows the distribution of survey submissions by day of the week. It reveals which days respondents were most likely to submit entries, offering insight into user behavior that could inform outreach timing.

```{r func-create_histogram, echo=TRUE, message=FALSE}
create_histogram <- function(data, field, x_labels = NULL, color_palette = c("#154973", "#457aa5", "#eff6fb", "#face00"),
                             title = NULL, x_title = NULL, y_title = "Count",
                             max_label_count = 10, label_angle = 45, show_labels = TRUE) {
  # Default color palette
  primary_color <- color_palette[1]
  secondary_color <- color_palette[2]
  tertiary_color <- color_palette[3]
  accent_color <- color_palette[4]

  # Set default labels if not provided
  if (is.null(title)) title <- paste("Distribution of", field,)
  if (is.null(x_title)) x_title <- field

  # Plot
  ggplot(data, aes(x = .data[[field]])) +
    geom_bar(fill = primary_color, color = secondary_color, stat = "count") +
    # Add text labels, but conditionally hide small bars if specified
    (if (show_labels) geom_label(stat = "count", aes(label = scales::comma(after_stat(count))),
      position = position_stack(vjust = 0.5),
      color = accent_color, size = 5,
      fill = primary_color, label.padding = unit(0.25, "lines")) else NULL) +
    labs(
      title = title,
      x = x_title,
      y = y_title
    ) +
    # Customize x-axis labels
    (if (!is.null(x_labels)) scale_x_discrete(labels = x_labels) else NULL) +
    theme_minimal(base_size = 14) +
    theme(
      plot.title = element_text(size = 16, face = "bold", color = primary_color),
      axis.title = element_text(size = 12, color = primary_color),
      axis.text = element_text(size = 10, color = primary_color, angle = label_angle, hjust = 1),
      plot.margin = margin(10, 10, 10, 10),
      panel.grid.major = element_line(color = tertiary_color, linewidth = 0.2),
      panel.grid.minor = element_line(color = tertiary_color, linewidth = 0.1),
      panel.background = element_rect(fill = tertiary_color),
      axis.text.x = element_text(angle = label_angle, hjust = 1),
      axis.text.y = element_text(size = 10, color = primary_color)
    )
}
```

```{r create-histogram-day-of-week, echo=TRUE, message=FALSE, fig.height=6, fig.width=8}
survey_data %>%
  mutate(DayOfWeek = factor(weekdays(CreationDate),
    levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))) %>%
create_histogram(
  survey_data,
  field = "DayOfWeek",
  title = "Submissions by Day of the Week",
  x_title = "Day",
  color_palette = c("#233f2b", "#7e9084", "#d9e1dd", "#face00"))
```

## 30 Day Trend

The following plot displays the number of survey submissions recorded each day over the past 30 days. It highlights short-term trends in participation and identifies periods of high or low activity. A smoothed trend line (dashed) has been added to help visualize patterns.

```{r func-plot_submission_trends, echo=TRUE}
plot_submission_trends <- function(data, days_ago = 30, color_palette = c("#154973", "#457aa5", "#eff6fb", "#face00"),
                                   title = NULL, subtitle = NULL, x_title = "Submission Date", y_title = "Number of Submissions") {

  # Default title and subtitle
  if (is.null(title)) title <- "Survey Submission Trends by Date"
  if (is.null(subtitle)) subtitle <- paste("Tracking submissions for the last", days_ago, "days")

  # Calculate the start date (days_ago days before today)
  start_date <- Sys.Date() - days_ago

  # Filter the data based on the calculated start date (up to today)
  submission_trends <- data %>%
    filter(CreationDate >= start_date) %>%
    group_by(CreationDate) %>%
    summarize(submissions = n())

  # Create the plot with the new theming
  ggplot(submission_trends, aes(x = CreationDate, y = submissions)) +
    geom_line(color = color_palette[1], linewidth = 1) +  # Line color from palette
    geom_point(color = color_palette[1], size = 3, shape = 16) +  # Points for visibility
    labs(
      title = title,
      subtitle = subtitle,
      x = x_title,
      y = y_title
    ) +
    theme_minimal(base_size = 14) +
    theme(
      plot.title = element_text(hjust = 0.5, face = "bold", size = 16, color = color_palette[1]),
      plot.subtitle = element_text(hjust = 0.5, size = 12, color = "grey40"),
      axis.title.x = element_text(color = color_palette[1], size = 12),
      axis.title.y = element_text(color = color_palette[1], size = 12),
      axis.text = element_text(color = color_palette[1], size = 10),
      panel.grid.major = element_line(color = color_palette[3], linewidth = 0.2),
      panel.grid.minor = element_line(color = color_palette[3], linewidth = 0.1),
      panel.background = element_rect(fill = color_palette[3]),
      axis.text.x = element_text(angle = 45, hjust = 1),
      axis.text.y = element_text(size = 10, color = color_palette[1])
    ) +
    # Add a smoothed trend line (loess)
    geom_smooth(method = "loess", color = color_palette[4], linewidth = 1, linetype = "dashed")
}
```

```{r plot-submission-trends-30d, echo=TRUE, message=FALSE, fig.height=6, fig.width=8}
survey_data$CreationDate <- as.Date(survey_data$CreationDate)
plot_submission_trends(survey_data,
                       days_ago = 30,
                       color_palette = c(
                         "#233f2b", # primary
                         "#7e9084", # secondary
                         "#d9e1dd", # tertiary
                         "#face00"  # accent
                       ))
```

## 90 Day Trend

This chart presents submission trends over the last 90 days. It provides a broader view of participation patterns, helping to identify sustained surges, lulls, or seasonal effects. The dashed line indicates a smoothed average trend over time.

```{r plot-submission-trends-90d, echo=TRUE, message=FALSE, fig.height=6, fig.width=8}
survey_data$CreationDate <- as.Date(survey_data$CreationDate)
plot_submission_trends(survey_data,
                       days_ago = 90,
                       color_palette = c(
                         "#233f2b", # primary
                         "#7e9084", # secondary
                         "#d9e1dd", # tertiary
                         "#face00"  # accent
                       ))
```

## Optional Question Response Rates

The table below summarizes response rates for selected optional questions from the survey. For each field, the response rate represents the percentage of participants who provided a valid response. For most fields, any non-missing value is considered a response. However, for **"Total Number of Species Planted"**, only values greater than zero are treated as valid responses.

```{r func-calculate_response_rates, echo=TRUE, message=FALSE}
# Function to calculate response rates for selected fields
calculate_response_rates <- function(survey_data, fields, caption) {
  # Calculate the response rate for each field
  response_rates <- sapply(fields, function(field) {
    if (field == "Total Number of Species Planted") {
      # For "Total Number of Species Planted", consider answered if value is greater than 0
      sum(survey_data[[field]] > 0, na.rm = TRUE) / nrow(survey_data) * 100
    } else {
      # For other fields, check for non-NA values
      sum(!is.na(survey_data[[field]])) / nrow(survey_data) * 100
    }
  })

  # Round the response rates to 2 decimal places
  response_rates_rounded <- round(response_rates, 2)

  # Sort the response rates in descending order (highest to lowest)
  sorted_response_rates <- sort(response_rates_rounded, decreasing = TRUE)

  # Create a clean data frame with the field names and their response rates
  response_rate_table <- data.frame(
    "Field" = names(sorted_response_rates),
    "Response Rate (%)" = sorted_response_rates,
    stringsAsFactors = FALSE  # Ensure the "Field" column is treated as character, not factor
  )

  # Remove the row names (the extra column that appears as a result of conversion)
  rownames(response_rate_table) <- NULL

  # Fix column names to ensure proper headers
  colnames(response_rate_table) <- c("Field", "Response Rate (%)")

  # Display the table using kable for better formatting
  library(knitr)
  kable(response_rate_table, caption = caption, align = "l")
}
```

```{r response-rate-table-optional, echo=TRUE, message=FALSE, fig.height=6, fig.width=8}
fields <- c("Planter Contact Email (Optional)", "Funding Source (Optional)", "Land Ownership (Optional)",
            "Tree Size Planted (Optional)", "Source of Trees (Optional)", "Total Number of Species Planted")
calculate_response_rates(survey_data, fields, "Response Rates for Key Survey Questions")
```

The following provides additional context for each survey question/field, detailing what the percentage represents.

* **Planter Contact Email**: % of respondents who provided an email address.
* **Funding Source**: % who reported how their tree planting was funded.
* **Land Ownership**: % who identified whether the land is publicly or privately owned.
* **Tree Size Planted**: % who specified the size category of planted trees.
* **Source of Trees**: % who indicated where the trees were sourced.
* **Total Number of Species Planted**: % who listed at least one species (excluding blanks or 0s).

# Participant Analysis {.tabset}

[Back to Top](#)

## Participant Type

The following section contains an analysis of tree planting by participant type.

### Submissions

This chart displays the number of survey submissions by participant type, such as community organizations, municipalities, private landowners, and professionals. It provides insight into who is most actively reporting tree planting activities. Identifying the most engaged participant groups helps DEC tailor outreach and support efforts accordingly.

```{r create-histogram-participant-type, echo=TRUE, message=FALSE, fig.height=6, fig.width=8}
create_histogram <- function(data, field, x_labels = NULL, color_palette = c("#154973", "#457aa5", "#eff6fb", "#face00"),
                             title = NULL, x_title = NULL, y_title = "Count",
                             max_label_count = 10, label_angle = 45, show_labels = TRUE) {
  primary_color <- color_palette[1]
  secondary_color <- color_palette[2]
  tertiary_color <- color_palette[3]
  accent_color <- color_palette[4]

  if (is.null(title)) title <- paste("Distribution of", field)
  if (is.null(x_title)) x_title <- field

  ggplot(data, aes(x = .data[[field]])) +
    geom_bar(fill = primary_color, color = secondary_color, stat = "count") +
    (if (show_labels) geom_label(stat = "count", aes(label = scales::comma(after_stat(count))),
                                 position = position_stack(vjust = 0.5),
                                 color = accent_color, size = 5,
                                 fill = primary_color, label.padding = unit(0.25, "lines")) else NULL) +
    labs(title = title, x = x_title, y = y_title) +
    (if (!is.null(x_labels)) scale_x_discrete(labels = x_labels) else NULL) +
    theme_minimal(base_size = 14) +
    theme(
      plot.title = element_text(size = 16, face = "bold", color = primary_color),
      axis.title = element_text(size = 12, color = primary_color),
      axis.text = element_text(size = 10, color = primary_color, angle = label_angle, hjust = 1),
      plot.margin = margin(10, 10, 10, 10),
      panel.grid.major = element_line(color = tertiary_color, linewidth = 0.2),
      panel.grid.minor = element_line(color = tertiary_color, linewidth = 0.1),
      panel.background = element_rect(fill = tertiary_color),
      axis.text.x = element_text(angle = label_angle, hjust = 1),
      axis.text.y = element_text(size = 10, color = primary_color)
    )
}

create_histogram(
  survey_data,
  field = "Who Planted The Tree(s)? (Required)",
  x_labels = c(
    "agency" = "State Agency",
    "community" = "Community Organization",
    "landowner" = "Private Landowner",
    "municipality" = "Municipal Government",
    "professional" = "Paid Professional"
  ),
  title = "Tree Planting Submissions by Participant Type",
  x_title = "Participant Type",
  color_palette = c("#233f2b", "#7e9084", "#d9e1dd", "#face00"))
```

### Trees Planted

This chart summarizes the total number of trees planted by each participant type. While some groups may submit more frequently, this view helps highlight the actual planting impact. For example, a smaller number of submissions from professional landscapers might correspond to a large number of trees planted. This supports prioritizing high-impact contributors.

```{r create-bar-chart-participant-trees, echo=TRUE, message=FALSE}
create_bar_chart <- function(data, field, sum_field = NULL, x_labels = NULL, color_palette = c("#154973", "#457aa5", "#eff6fb", "#face00"),
                             title = NULL, x_title = NULL, y_title = "Sum",
                             max_label_count = 10, label_angle = 45, show_labels = TRUE) {
  primary_color <- color_palette[1]
  secondary_color <- color_palette[2]
  tertiary_color <- color_palette[3]
  accent_color <- color_palette[4]

  if (is.null(title)) title <- paste("Sum of", field)
  if (is.null(x_title)) x_title <- field

  if (is.null(sum_field)) {
    sum_field <- field
    data <- data.frame(!!field := data[[field]], Count = 1)
  } else {
    data <- data %>%
      group_by(.data[[field]]) %>%
      summarize(Sum = sum(.data[[sum_field]], na.rm = TRUE)) %>%
      ungroup()
  }

  ggplot(data, aes(x = .data[[field]], y = .data$Sum)) +
    geom_bar(stat = "identity", fill = primary_color, color = secondary_color) +
    (if (show_labels) geom_label(aes(label = scales::comma(Sum)),
                                 position = position_stack(vjust = 0.5),
                                 color = accent_color, size = 5,
                                 fill = primary_color, label.padding = unit(0.25, "lines")) else NULL) +
    labs(title = title, x = x_title, y = y_title) +
    (if (!is.null(x_labels)) scale_x_discrete(labels = x_labels) else NULL) +
    theme_minimal(base_size = 14) +
    theme(
      plot.title = element_text(size = 16, face = "bold", color = primary_color),
      axis.title = element_text(size = 12, color = primary_color),
      axis.text = element_text(size = 10, color = primary_color, angle = label_angle, hjust = 1),
      plot.margin = margin(10, 10, 10, 10),
      panel.grid.major = element_line(color = tertiary_color, linewidth = 0.2),
      panel.grid.minor = element_line(color = tertiary_color, linewidth = 0.1),
      panel.background = element_rect(fill = tertiary_color),
      axis.text.x = element_text(angle = label_angle, hjust = 1),
      axis.text.y = element_text(size = 10, color = primary_color)
    )
}

create_bar_chart(
  survey_data,
  field = "Who Planted The Tree(s)? (Required)",
  sum_field = "Number of Trees Planted (Required)",
  x_labels = c(
    "agency" = "State Agency",
    "community" = "Community Organization",
    "landowner" = "Private Landowner",
    "municipality" = "Municipal Government",
    "professional" = "Paid Professional"
  ),
  x_title = "Participant Type",
  y_title = "Total Trees Planted",
  title = "Total Trees Planted by Participant Type",
  color_palette = c("#233f2b", "#7e9084", "#d9e1dd", "#face00"))
```

```{r func-create-summary-table, echo=TRUE}
create_summary_table <- function(data, field, sum_field, remove_na = TRUE, table_font_size = 14) {
  if (!field %in% colnames(data)) stop(paste("Error: Field", field, "does not exist in the data"))
  if (!sum_field %in% colnames(data)) stop(paste("Error: Sum field", sum_field, "does not exist in the data"))
  if (!is.numeric(data[[sum_field]]) && !is.integer(data[[sum_field]])) stop(paste("Error: Sum field", sum_field, "is not numeric"))

  summary_data <- data %>%
    group_by(.data[[field]]) %>%
    summarise(
      submissions = n(),
      total_value = sum(.data[[sum_field]], na.rm = remove_na)
    ) %>%
    mutate(
      submissions_percentage = submissions / sum(submissions) * 100,
      value_percentage = total_value / sum(total_value) * 100
    ) %>%
    mutate(
      submissions = scales::comma(submissions),
      total_value = scales::comma(total_value),
      submissions_percentage = paste0(round(submissions_percentage, 1), "%"),
      value_percentage = paste0(round(value_percentage, 1), "%")
    )

  summary_data %>%
    knitr::kable(
      col.names = c(field, "Number of Submissions", paste("Total", sum_field), "Proportion of Submissions (%)", "Proportion of Sum Field (%)"),
      caption = paste("Summary of Submissions and", sum_field, "by", field),
      align = c("l", "c", "c", "c", "c")
    ) %>%
    kableExtra::kable_styling(
      full_width = F,
      position = "center",
      bootstrap_options = c("striped", "hover"),
      font_size = table_font_size
    ) %>%
    kableExtra::column_spec(1, width = "20em", bold = TRUE) %>%
    kableExtra::column_spec(2:3, width = "12em") %>%
    kableExtra::add_footnote("The proportions represent the percentage of submissions and sum of the field for each category relative to the overall dataset.")
}
```

### Participant Type Table

This table presents a detailed summary of tree planting activity by participant group, showing both the number of submissions and the total trees planted. The percentage columns offer a clear view of each group’s relative contribution to both participation and total planting, which helps evaluate equity and engagement across the initiative.

```{r participant-type-table, echo=TRUE}
survey_data %>%
  create_summary_table("Who Planted The Tree(s)? (Required)", "Number of Trees Planted (Required)", remove_na = FALSE, table_font_size = 16)
```

## Named User Activity

This table breaks down the number of submissions and trees planted by named users, typically representing staff accounts or recurring contributors. Public (anonymous) users are grouped separately. This view is useful for recognizing heavy contributors and assessing platform usage patterns.

```{r named-user-activity-table}
survey_data %>%
  mutate(Creator = ifelse(is.na(Creator), "Public User", Creator)) %>%
  create_summary_table("Creator", "Number of Trees Planted (Required)", remove_na = FALSE, table_font_size = 16)
```

## Unique E-mail Activity

This table summarizes the planting activity associated with unique email addresses submitted via the optional contact field. It helps assess how many individuals are participating and the extent of their contributions. Since providing an email is optional, this data may also reflect comfort levels with contact sharing.

```{r unique-email-activity-table}
survey_data %>%
  mutate(`Planter Contact Email (Optional)` = ifelse(is.na(`Planter Contact Email (Optional)`), "Not Provided", `Planter Contact Email (Optional)`)) %>%
  create_summary_table("Planter Contact Email (Optional)", "Number of Trees Planted (Required)", remove_na = FALSE, table_font_size = 16)
```

### Municipal Activity

This table presents the number of trees planted by self-reported municipality. It accounts for formatting variations such as town/city/village prefixes. These insights are valuable for understanding geographic participation and for identifying municipalities that may need additional support or engagement.

```{r municipal-activity-table}
survey_data %>%
  mutate(`Participant Municipality (Optional)` = case_when(
    str_starts(`Participant Municipality (Optional)`, "c_") ~ str_replace(`Participant Municipality (Optional)`, "^c_", "") %>% paste0(" (city)"),
    str_starts(`Participant Municipality (Optional)`, "v_") ~ str_replace(`Participant Municipality (Optional)`, "^v_", "") %>% paste0(" (village)"),
    str_starts(`Participant Municipality (Optional)`, "t_") ~ str_replace(`Participant Municipality (Optional)`, "^t_", "") %>% paste0(" (town)"),
    TRUE ~ `Participant Municipality (Optional)`
  )) %>%
  create_summary_table("Participant Municipality (Optional)", "Number of Trees Planted (Required)", remove_na = FALSE, table_font_size = 16)

```

### Organization Activity

This table highlights planting contributions by named organizations, either selected from a predefined list or entered manually by participants. It’s useful for identifying high-performing organizations, recognizing partners, and exploring collaboration opportunities. It also helps validate the effectiveness of predefined organization lists in the survey tool.

```{r organization-activity-table}
survey_data %>%
  inner_join(participant_organizations, by = c("GlobalID" = "ParentGlobalID")) %>%
  filter(!(is.na(`Participant Organization (Optional)`) & is.na(`Other (Optional)`))) %>%
  filter(!(tolower(`Participant Organization (Optional)`) == "other" & is.na(`Other (Optional)`))) %>%
  mutate(`Participant Organization (Optional)` = ifelse(
    tolower(`Participant Organization (Optional)`) == "other" & !is.na(`Other (Optional)`),
    `Other (Optional)`,
    `Participant Organization (Optional)`
  )) %>%
  mutate(`Participant Organization (Optional)` = str_replace_all(`Participant Organization (Optional)`, "_", " ")) %>%
  create_summary_table("Participant Organization (Optional)", "Number of Trees Planted (Required)", remove_na = FALSE, table_font_size = 16)
```

# Location Analysis {.tabset}

[Back to Top](#)

```{r func-plot_geographic_data, echo=TRUE}
plot_geographic_data <- function(joined_data,
                                 title,
                                 legend,
                                 fill_option = "plasma",
                                 subtitle = NULL,
                                 theme_options = theme_minimal(),
                                 legend_position = "right",
                                 color_scale = "viridis",
                                 save_path = NULL,
                                 na_fill_color = "lightgrey") {

  current_date <- format(Sys.Date(), "%B %d, %Y")
  subtitle_text <- ifelse(is.null(subtitle), paste("Date:", current_date), subtitle)

  joined_data[is.na(joined_data$total_trees), "total_trees"] <- NA

  if (color_scale == "viridis") {
    fill_color <- scale_fill_viridis_c(option = fill_option, na.value = na_fill_color)
  } else if (color_scale == "RColorBrewer") {
    fill_color <- scale_fill_brewer(palette = "Set3")
  } else {
    fill_color <- scale_fill_manual(values = color_scale)
  }

  plot <- ggplot(data = joined_data) +
    geom_sf(aes(fill = total_trees), color = "white") +
    fill_color +
    theme_options +
    labs(title = title,
         subtitle = subtitle_text,
         fill = legend) +
    theme(axis.text = element_blank(),
          axis.title = element_blank(),
          legend.position = legend_position)

  if (!is.null(save_path)) {
    ggsave(save_path, plot = plot, width = 10, height = 6)
  }

  return(plot)
}
```

## By Region

This map shows the **total number of trees planted** in each of New York’s economic development regions. The shading reflects the volume of planting activity, with darker areas representing higher totals.

Use this map to identify which regions are leading in planting activity, and where more outreach or support might be beneficial.

```{r create-region-choropleth-map, echo=TRUE, message=FALSE, fig.height=6, fig.width=8}
survey_data_aggregated <- survey_data %>%
  group_by(Region) %>%
  summarise(total_trees = sum(`Number of Trees Planted (Required)`, na.rm = TRUE))

shapefile_path <- "/home/nick/gitea/tree-tracker-report/data/redc/redc.shp"

geographic_data <- st_read(shapefile_path) %>%
  mutate(
    REDC = str_replace(REDC, "Western NY", "Western New York"),
    REDC = str_replace(REDC, "Central NY", "Central New York"),
    REDC = str_replace(REDC, "Mid-Hudson", "Hudson Valley"),
    REDC = str_replace(REDC, "Capital Region", "Capital District"),
  ) %>%
  st_as_sf()

survey_data_joined <- geographic_data %>%
  left_join(survey_data_aggregated, by = c("REDC" = "Region"))

plot_geographic_data(joined_data = survey_data_joined,
                     title = "Number of Trees Planted by Region in New York",
                     legend = "Total Trees Planted",
                     fill_option = "plasma",
                     subtitle = "Generated: March 13, 2025",
                     theme_options = theme_minimal(),
                     legend_position = "right",
                     color_scale = "viridis",
                     na_fill_color = "lightgrey")
```

### Regional Planting Summary

The table below breaks down the total number of trees planted by region. It also shows each region’s percentage contribution to overall planting activity across New York State.

```{r create-summary-table-region, echo=TRUE, message=FALSE, fig.height=6, fig.width=8}
create_summary_table(survey_data, "Region", "Number of Trees Planted (Required)", remove_na = FALSE, table_font_size = 16)
```

## By County

This map provides a county-level view of total trees planted. Darker counties indicate higher planting activity.

This visual helps uncover local patterns within regions, and may guide localized support, outreach, or reporting strategies.

```{r create-county-choropleth-map, echo=TRUE, message=FALSE, fig.height=6, fig.width=8}
survey_data_aggregated <- survey_data %>%
  group_by(County) %>%
  summarise(total_trees = sum(`Number of Trees Planted (Required)`, na.rm = TRUE))

geographic_data <- counties(state = "NY", cb = TRUE, progress = FALSE) %>%
  st_as_sf() %>%
  mutate(NAME = str_replace(NAME, "\\.", ""))

survey_data_joined <- geographic_data %>%
  left_join(survey_data_aggregated, by = c("NAME" = "County"))

plot_geographic_data(joined_data = survey_data_joined,
                     title = "Number of Trees Planted by County in New York",
                     legend = "Total Trees Planted",
                     fill_option = "plasma",
                     subtitle = "Generated: March 13, 2025",
                     theme_options = theme_minimal(),
                     legend_position = "right",
                     color_scale = "viridis",
                     na_fill_color = "lightgrey")
```

### County-Level Planting Summary

This table provides a detailed breakdown of trees planted by county. Use it alongside the map to validate trends or investigate specific areas.

```{r create-summary-table-county, echo=TRUE, message=FALSE, fig.height=6, fig.width=8}
create_summary_table(survey_data, "County", "Number of Trees Planted (Required)", remove_na = FALSE, table_font_size = 16)
```

# Tree Analysis {.tabset}

[Back to Top](#)

This section analyzes the tree data submitted in surveys, broken down by both genus and species. It provides insight into which types of trees are most commonly planted and the completeness of species-level reporting.

---

```{r func-create_species_summary_table, echo=TRUE}
create_species_summary_table <- function(data, field, field_label = NULL) {
  # Replace empty strings and NA values with "Not Provided"
  data <- data %>%
    mutate(
      !!sym(field) := ifelse(!!sym(field) == "" | is.na(!!sym(field)), "Not Provided", !!sym(field))
    )

  # Format values: replace underscores, convert to title case
  data <- data %>%
    mutate(
      !!sym(field) := gsub("_", " ", !!sym(field)),
      !!sym(field) := tools::toTitleCase(!!sym(field))
    )

  # Summarize
  summary_data <- data %>%
    group_by(!!sym(field)) %>%
    summarise(
      submissions = n(),
      .groups = "drop"
    ) %>%
    mutate(
      submissions_percentage = submissions / sum(submissions) * 100
    )

  # Format for table
  summary_data_formatted <- summary_data %>%
    mutate(
      submissions = scales::comma(submissions),
      submissions_percentage = paste0(round(submissions_percentage, 1), "%")
    )

  # Field label for header
  label <- ifelse(is.null(field_label), field, field_label)

  # Return table
  summary_data_formatted %>%
    knitr::kable(
      col.names = c(label, "Number of Surveys", "Proportion of Surveys (%)"),
      caption = paste("Summary of Surveys by", label),
      align = c("l", "c", "c")
    ) %>%
    kableExtra::kable_styling(
      full_width = F,
      position = "center",
      bootstrap_options = c("striped", "hover"),
      font_size = 14
    ) %>%
    kableExtra::column_spec(1, width = "20em", bold = TRUE) %>%
    kableExtra::column_spec(2, width = "12em") %>%
    kableExtra::column_spec(3, width = "12em") %>%
    kableExtra::add_footnote("The proportions represent the percentage of surveys for each group relative to the total number of surveys.")
}
```

---

## By Genus

This table summarizes the number and percentage of surveys by **tree genus**. It helps identify which genera were most frequently planted or reported across all submissions.

* **"Number of Surveys"**: Total surveys that mention each genus.
* **"Proportion of Surveys (%)"**: Share of each genus relative to the entire dataset.
* **"Not Provided"**: Includes submissions where the genus was not specified.

```{r create-summary-table-genus, echo=TRUE, message=FALSE, fig.height=6, fig.width=8}
create_species_summary_table(species_planted, "Generic Type of Tree (Optional)", "Tree Genus")
```

---

## By Species

This table provides a breakdown of survey submissions by **tree species**. It offers a more detailed view of which species were planted or reported most often.

* **"Number of Surveys"**: Total submissions for each species.
* **"Proportion of Surveys (%)"**: Percentage of all surveys reporting the species.
* **"Not Provided"**: Surveys that omitted species details.

```{r create-summary-table-species, echo=TRUE, message=FALSE, fig.height=6, fig.width=8}
create_species_summary_table(species_planted, "Tree Species (Optional)", "Tree Species")
```

# Disadvantaged Communities {.tabset}

[Back to Top](#)

New York State's **Climate Leadership and Community Protection Act** (Climate Act) acknowledges that climate change disproportionately impacts certain populations. To address these inequities, the **Climate Justice Working Group (CJWG)** developed and finalized criteria on **March 27, 2023**, to identify **Disadvantaged Communities (DACs)** across the state.

These communities are prioritized for support in the transition to clean energy, air quality improvement, and access to green economic opportunities. The following analysis highlights tree planting activity specifically within these disadvantaged areas.

Only survey entries associated with DAC-designated areas (as indicated by the **Disadvantaged Communities Indicator**) are included in the summaries below.

---

## By Region

This table presents the total number of trees planted within DACs, grouped by New York's **economic development regions**. It helps illustrate where tree planting efforts are concentrated in underserved areas at the regional scale.

```{r create-summary-table-region-dac, echo=TRUE, message=FALSE, fig.height=6, fig.width=8}
survey_data %>%
  filter(!is.na(`Disadvantaged Communities Indicator`)) %>%
  create_summary_table("Region", "Number of Trees Planted (Required)", remove_na = FALSE, table_font_size = 16)
```

---

## By County

This table summarizes tree planting within DACs by **county**. It provides a more localized view of which counties are seeing tree planting investments in disadvantaged communities.

```{r create-summary-table-county-dac, echo=TRUE, message=FALSE, fig.height=6, fig.width=8}
survey_data %>%
  filter(!is.na(`Disadvantaged Communities Indicator`)) %>%
  create_summary_table("County", "Number of Trees Planted (Required)", remove_na = FALSE, table_font_size = 16)
```

---

## By Municipality

This table breaks down the number of trees planted within DACs by **municipality**. It enables more granular insight into local-level engagement and delivery of benefits to disadvantaged areas.

```{r create-summary-table-county-municipality, echo=TRUE, message=FALSE, fig.height=6, fig.width=8}
survey_data %>%
  filter(!is.na(`Disadvantaged Communities Indicator`)) %>%
  create_summary_table("Municipality", "Number of Trees Planted (Required)", remove_na = FALSE, table_font_size = 16)
```