tree-tracker-report/report.Rmd

---
title: "Assessment of Public Engagement with the 25 Million Trees Initiative"
author:
  - name: Nicholas Hepler <nicholas.hepler@its.ny.gov>
    affiliation: Office of Information Technology Services
  - name: Annabel Gregg <annabel.gregg@dec.ny.gov>
    affiliation: Department of Environmental Conservation
date: "`r format(Sys.Date(), '%B %d, %Y')`"
abstract: >
  This report summarizes survey findings related to the 25 Million Trees Initiative,
  analyzing public awareness, engagement, and regional participation across New York State.
keywords: "urban forestry, public survey, tree planting, New York State, environmental engagement"
output:
  html_document:
    toc: true
    toc_depth: 1
    toc_float: true
    number_sections: false
    css: custom.css
    code_folding: hide
lang: en
geometry: margin=1in
---

```{r setup, include=FALSE}
# Global setup
knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE
)

set.seed(123)  # For reproducibility

# Load tidyverse and additional necessary libraries
library(tidyverse)
library(sf)
library(tigris)
library(scales)
library(RColorBrewer)
library(viridis)

# Define file paths
survey_path <- "data/_25_Million_Trees_Initiative_Survey_0.csv"
locations_pt_path <- "data/location_points_1.csv"
locations_poly_path <- "data/location_polygons_2.csv"
participants_path <- "data/participant_organizations_3.csv"
species_path <- "data/species_planted_4.csv"
vendors_path <- "data/vendors_5.csv"

# Check for expected files
stopifnot(file.exists(survey_path))
stopifnot(file.exists(locations_pt_path))
stopifnot(file.exists(locations_poly_path))
stopifnot(file.exists(participants_path))
stopifnot(file.exists(species_path))
stopifnot(file.exists(vendors_path))

# Load survey and related datasets
survey_data <- read_csv(survey_path)
location_points <- read_csv(locations_pt_path)
location_polygons <- read_csv(locations_poly_path)
participant_organizations <- read_csv(participants_path)
species_planted <- read_csv(species_path)
vendors <- read_csv(vendors_path)

# Convert character dates to POSIXct
survey_data <- survey_data %>%
  mutate(CreationDate = mdy_hms(CreationDate))

# Count and filter records based on exclusion flag
start_date <- format(min(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")
end_date <- format(max(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")
excluded_count <- survey_data %>% filter(`Exclude Result` == 1) %>% nrow()
used_count <- survey_data %>% filter(`Exclude Result` == 0) %>% nrow()
total_records <- excluded_count + used_count


survey_data <- survey_data %>%
  filter(`Exclude Result` == 0)

# Join related datasets by GlobalID
combined_data <- survey_data %>%
  left_join(location_points, by = c("GlobalID" = "ParentGlobalID")) %>%
  left_join(location_polygons, by = c("GlobalID" = "ParentGlobalID")) %>%
  left_join(participant_organizations, by = c("GlobalID" = "ParentGlobalID")) %>%
  left_join(species_planted, by = c("GlobalID" = "ParentGlobalID")) %>%
  left_join(vendors, by = c("GlobalID" = "ParentGlobalID"))
```

---
subtitle: "`r format(min(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")` to  `r format(max(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")`."
---

# Report Overview {.tabset}

[Back to Top](#)

## Background

The **25 Million Trees Initiative** is a major environmental commitment announced by **Governor Kathy Hochul** in the 2024 State of the State Address. Its goal is to plant 25 million trees across New York State by 2033 to support climate resilience, improve community well-being, and enhance biodiversity.

To track progress, the **New York State Department of Environmental Conservation (DEC)** launched the **Tree Tracker**, a public-facing survey tool built on the  ***ArcGIS Survey123***. It allows individuals and organizations to submit information about tree planting efforts, including species, quantity, and location. These submissions feed into a real-time dashboard that maps tree planting activities across the state.

This report analyzes data submitted through the Tree Tracker, offering insights into participation patterns, planting trends, and geographic distribution. The findings are intended to inform DEC staff and leadership in strategic decision-making and program outreach.

## Purpose & Objectives

This report aims to:

-   Summarize overall participation and progress toward the 25 million tree goal.
-   Analyze tree planting data by location, timeframe, and participant type.
-   Identify gaps or trends to inform outreach and support efforts.

By understanding planting behavior statewide, DEC can better allocate resources and strengthen engagement with communities and partners.

## Survey Period & Exclusions

This analysis covers submissions from **`r start_date`** to **`r end_date`**, totaling **`r total_records`** records. Of these, **`r used_count`** records were deemed valid and included in the analysis.

### Excluded Records

**`r excluded_count`** records were removed based on the `Exclude Result` field (`1` = exclude). Common reasons for exclusion:

-   **Duplicate Submissions**: Entries identified as double-counts.
-   **Test Entries**: Data flagged as internal testing or mock submissions.

Only submissions marked with `0` in the `Exclude Result` field were included in the analysis.

## Validation & Data Consistency

To ensure data reliability, multiple validation checks were applied:

### Required Fields
-   **Who Planted the Tree(s)?**
-   **Number of Trees**
-   **Start and End Dates of Planting**
-   **Geographic Location** (latitude/longitude)

### Data Validation
-   **Geographic Validation**: Coordinates were cross-checked with administrative boundaries to assign locality, county, and region. Records with unresolved locations were reviewed manually before inclusion.
-   **Temporal Logic**: Planting dates were required to occur within the initiative timeline and in chronological order.
-   **Optional Field Checks**: Formats (e.g., email addresses) were validated for consistency even if fields were not mandatory.

These validation processes enhance the accuracy and interpretability of the data, ensuring the results reflect genuine community contributions to the initiative.

# Submission Analysis {.tabset}

[Back to Top](#)

## Day of Week

The chart below shows the distribution of survey submissions by day of the week. It reveals which days respondents were most likely to submit entries, offering insight into user behavior that could inform outreach timing.

```{r func-create_histogram, echo=TRUE, message=FALSE}
create_histogram <- function(data, field, x_labels = NULL, color_palette = c("#154973", "#457aa5", "#eff6fb", "#face00"),
                             title = NULL, x_title = NULL, y_title = "Count",
                             max_label_count = 10, label_angle = 45, show_labels = TRUE) {
  # Default color palette
  primary_color <- color_palette[1]
  secondary_color <- color_palette[2]
  tertiary_color <- color_palette[3]
  accent_color <- color_palette[4]

  # Set default labels if not provided
  if (is.null(title)) title <- paste("Distribution of", field,)
  if (is.null(x_title)) x_title <- field

  # Plot
  ggplot(data, aes(x = .data[[field]])) +
    geom_bar(fill = primary_color, color = secondary_color, stat = "count") +
    # Add text labels, but conditionally hide small bars if specified
    (if (show_labels) geom_label(stat = "count", aes(label = scales::comma(after_stat(count))),
      position = position_stack(vjust = 0.5),
      color = accent_color, size = 5,
      fill = primary_color, label.padding = unit(0.25, "lines")) else NULL) +
    labs(
      title = title,
      x = x_title,
      y = y_title
    ) +
    # Customize x-axis labels
    (if (!is.null(x_labels)) scale_x_discrete(labels = x_labels) else NULL) +
    theme_minimal(base_size = 14) +
    theme(
      plot.title = element_text(size = 16, face = "bold", color = primary_color),
      axis.title = element_text(size = 12, color = primary_color),
      axis.text = element_text(size = 10, color = primary_color, angle = label_angle, hjust = 1),
      plot.margin = margin(10, 10, 10, 10),
      panel.grid.major = element_line(color = tertiary_color, linewidth = 0.2),
      panel.grid.minor = element_line(color = tertiary_color, linewidth = 0.1),
      panel.background = element_rect(fill = tertiary_color),
      axis.text.x = element_text(angle = label_angle, hjust = 1),
      axis.text.y = element_text(size = 10, color = primary_color)
    )
}
```

```{r create-histogram-day-of-week, echo=TRUE, message=FALSE, fig.height=6, fig.width=8}
survey_data %>%
  mutate(DayOfWeek = factor(weekdays(CreationDate),
    levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))) %>%
create_histogram(
  survey_data,
  field = "DayOfWeek",
  title = "Submissions by Day of the Week",
  x_title = "Day",
  color_palette = c("#233f2b", "#7e9084", "#d9e1dd", "#face00"))
```

## 30 Day Trend

The following plot displays the number of survey submissions recorded each day over the past 30 days. It highlights short-term trends in participation and identifies periods of high or low activity. A smoothed trend line (dashed) has been added to help visualize patterns.

```{r func-plot_submission_trends, echo=TRUE}
plot_submission_trends <- function(data, days_ago = 30, color_palette = c("#154973", "#457aa5", "#eff6fb", "#face00"),
                                   title = NULL, subtitle = NULL, x_title = "Submission Date", y_title = "Number of Submissions") {

  # Default title and subtitle
  if (is.null(title)) title <- "Survey Submission Trends by Date"
  if (is.null(subtitle)) subtitle <- paste("Tracking submissions for the last", days_ago, "days")

  # Calculate the start date (days_ago days before today)
  start_date <- Sys.Date() - days_ago

  # Filter the data based on the calculated start date (up to today)
  submission_trends <- data %>%
    filter(CreationDate >= start_date) %>%
    group_by(CreationDate) %>%
    summarize(submissions = n())

  # Create the plot with the new theming
  ggplot(submission_trends, aes(x = CreationDate, y = submissions)) +
    geom_line(color = color_palette[1], linewidth = 1) +  # Line color from palette
    geom_point(color = color_palette[1], size = 3, shape = 16) +  # Points for visibility
    labs(
      title = title,
      subtitle = subtitle,
      x = x_title,
      y = y_title
    ) +
    theme_minimal(base_size = 14) +
    theme(
      plot.title = element_text(hjust = 0.5, face = "bold", size = 16, color = color_palette[1]),
      plot.subtitle = element_text(hjust = 0.5, size = 12, color = "grey40"),
      axis.title.x = element_text(color = color_palette[1], size = 12),
      axis.title.y = element_text(color = color_palette[1], size = 12),
      axis.text = element_text(color = color_palette[1], size = 10),
      panel.grid.major = element_line(color = color_palette[3], linewidth = 0.2),
      panel.grid.minor = element_line(color = color_palette[3], linewidth = 0.1),
      panel.background = element_rect(fill = color_palette[3]),
      axis.text.x = element_text(angle = 45, hjust = 1),
      axis.text.y = element_text(size = 10, color = color_palette[1])
    ) +
    # Add a smoothed trend line (loess)
    geom_smooth(method = "loess", color = color_palette[4], linewidth = 1, linetype = "dashed")
}
```

```{r plot-submission-trends-30d, echo=TRUE, message=FALSE, fig.height=6, fig.width=8}
survey_data$CreationDate <- as.Date(survey_data$CreationDate)
plot_submission_trends(survey_data,
                       days_ago = 30,
                       color_palette = c(
                         "#233f2b", # primary
                         "#7e9084", # secondary
                         "#d9e1dd", # tertiary
                         "#face00"  # accent
                       ))
```

## 90 Day Trend

This chart presents submission trends over the last 90 days. It provides a broader view of participation patterns, helping to identify sustained surges, lulls, or seasonal effects. The dashed line indicates a smoothed average trend over time.

```{r plot-submission-trends-90d, echo=TRUE, message=FALSE, fig.height=6, fig.width=8}
survey_data$CreationDate <- as.Date(survey_data$CreationDate)
plot_submission_trends(survey_data,
                       days_ago = 90,
                       color_palette = c(
                         "#233f2b", # primary
                         "#7e9084", # secondary
                         "#d9e1dd", # tertiary
                         "#face00"  # accent
                       ))
```

## Optional Question Response Rates

The table below summarizes response rates for selected optional questions from the survey. For each field, the response rate represents the percentage of participants who provided a valid response. For most fields, any non-missing value is considered a response. However, for **"Total Number of Species Planted"**, only values greater than zero are treated as valid responses.

```{r func-calculate_response_rates, echo=TRUE, message=FALSE}
# Function to calculate response rates for selected fields
calculate_response_rates <- function(survey_data, fields, caption) {
  # Calculate the response rate for each field
  response_rates <- sapply(fields, function(field) {
    if (field == "Total Number of Species Planted") {
      # For "Total Number of Species Planted", consider answered if value is greater than 0
      sum(survey_data[[field]] > 0, na.rm = TRUE) / nrow(survey_data) * 100
    } else {
      # For other fields, check for non-NA values
      sum(!is.na(survey_data[[field]])) / nrow(survey_data) * 100
    }
  })

  # Round the response rates to 2 decimal places
  response_rates_rounded <- round(response_rates, 2)

  # Sort the response rates in descending order (highest to lowest)
  sorted_response_rates <- sort(response_rates_rounded, decreasing = TRUE)

  # Create a clean data frame with the field names and their response rates
  response_rate_table <- data.frame(
    "Field" = names(sorted_response_rates),
    "Response Rate (%)" = sorted_response_rates,
    stringsAsFactors = FALSE  # Ensure the "Field" column is treated as character, not factor
  )

  # Remove the row names (the extra column that appears as a result of conversion)
  rownames(response_rate_table) <- NULL

  # Fix column names to ensure proper headers
  colnames(response_rate_table) <- c("Field", "Response Rate (%)")

  # Display the table using kable for better formatting
  library(knitr)
  kable(response_rate_table, caption = caption, align = "l")
}
```

```{r response-rate-table-optional, echo=TRUE, message=FALSE, fig.height=6, fig.width=8}
fields <- c("Planter Contact Email (Optional)", "Funding Source (Optional)", "Land Ownership (Optional)",
            "Tree Size Planted (Optional)", "Source of Trees (Optional)", "Total Number of Species Planted")
calculate_response_rates(survey_data, fields, "Response Rates for Key Survey Questions")
```

The following provides additional context for each survey question/field, detailing what the percentage represents.

* **Planter Contact Email**: % of respondents who provided an email address.
* **Funding Source**: % who reported how their tree planting was funded.
* **Land Ownership**: % who identified whether the land is publicly or privately owned.
* **Tree Size Planted**: % who specified the size category of planted trees.
* **Source of Trees**: % who indicated where the trees were sourced.
* **Total Number of Species Planted**: % who listed at least one species (excluding blanks or 0s).

# Participant Analysis {.tabset}

[Back to Top](#)

## Participant Type

The following section contains an analysis of tree planting by participant type.

### Submissions

This chart displays the number of survey submissions by participant type, such as community organizations, municipalities, private landowners, and professionals. It provides insight into who is most actively reporting tree planting activities. Identifying the most engaged participant groups helps DEC tailor outreach and support efforts accordingly.

```{r create-histogram-participant-type, echo=TRUE, message=FALSE, fig.height=6, fig.width=8}
create_histogram <- function(data, field, x_labels = NULL, color_palette = c("#154973", "#457aa5", "#eff6fb", "#face00"),
                             title = NULL, x_title = NULL, y_title = "Count",
                             max_label_count = 10, label_angle = 45, show_labels = TRUE) {
  primary_color <- color_palette[1]
  secondary_color <- color_palette[2]
  tertiary_color <- color_palette[3]
  accent_color <- color_palette[4]

  if (is.null(title)) title <- paste("Distribution of", field)
  if (is.null(x_title)) x_title <- field

  ggplot(data, aes(x = .data[[field]])) +
    geom_bar(fill = primary_color, color = secondary_color, stat = "count") +
    (if (show_labels) geom_label(stat = "count", aes(label = scales::comma(after_stat(count))),
                                 position = position_stack(vjust = 0.5),
                                 color = accent_color, size = 5,
                                 fill = primary_color, label.padding = unit(0.25, "lines")) else NULL) +
    labs(title = title, x = x_title, y = y_title) +
    (if (!is.null(x_labels)) scale_x_discrete(labels = x_labels) else NULL) +
    theme_minimal(base_size = 14) +
    theme(
      plot.title = element_text(size = 16, face = "bold", color = primary_color),
      axis.title = element_text(size = 12, color = primary_color),
      axis.text = element_text(size = 10, color = primary_color, angle = label_angle, hjust = 1),
      plot.margin = margin(10, 10, 10, 10),
      panel.grid.major = element_line(color = tertiary_color, linewidth = 0.2),
      panel.grid.minor = element_line(color = tertiary_color, linewidth = 0.1),
      panel.background = element_rect(fill = tertiary_color),
      axis.text.x = element_text(angle = label_angle, hjust = 1),
      axis.text.y = element_text(size = 10, color = primary_color)
    )
}

create_histogram(
  survey_data,
  field = "Who Planted The Tree(s)? (Required)",
  x_labels = c(
    "agency" = "State Agency",
    "community" = "Community Organization",
    "landowner" = "Private Landowner",
    "municipality" = "Municipal Government",
    "professional" = "Paid Professional"
  ),
  title = "Tree Planting Submissions by Participant Type",
  x_title = "Participant Type",
  color_palette = c("#233f2b", "#7e9084", "#d9e1dd", "#face00"))
```

### Trees Planted

This chart summarizes the total number of trees planted by each participant type. While some groups may submit more frequently, this view helps highlight the actual planting impact. For example, a smaller number of submissions from professional landscapers might correspond to a large number of trees planted. This supports prioritizing high-impact contributors.

```{r create-bar-chart-participant-trees, echo=TRUE, message=FALSE}
create_bar_chart <- function(data, field, sum_field = NULL, x_labels = NULL, color_palette = c("#154973", "#457aa5", "#eff6fb", "#face00"),
                             title = NULL, x_title = NULL, y_title = "Sum",
                             max_label_count = 10, label_angle = 45, show_labels = TRUE) {
  primary_color <- color_palette[1]
  secondary_color <- color_palette[2]
  tertiary_color <- color_palette[3]
  accent_color <- color_palette[4]

  if (is.null(title)) title <- paste("Sum of", field)
  if (is.null(x_title)) x_title <- field

  if (is.null(sum_field)) {
    sum_field <- field
    data <- data.frame(!!field := data[[field]], Count = 1)
  } else {
    data <- data %>%
      group_by(.data[[field]]) %>%
      summarize(Sum = sum(.data[[sum_field]], na.rm = TRUE)) %>%
      ungroup()
  }

  ggplot(data, aes(x = .data[[field]], y = .data$Sum)) +
    geom_bar(stat = "identity", fill = primary_color, color = secondary_color) +
    (if (show_labels) geom_label(aes(label = scales::comma(Sum)),
                                 position = position_stack(vjust = 0.5),
                                 color = accent_color, size = 5,
                                 fill = primary_color, label.padding = unit(0.25, "lines")) else NULL) +
    labs(title = title, x = x_title, y = y_title) +
    (if (!is.null(x_labels)) scale_x_discrete(labels = x_labels) else NULL) +
    theme_minimal(base_size = 14) +
    theme(
      plot.title = element_text(size = 16, face = "bold", color = primary_color),
      axis.title = element_text(size = 12, color = primary_color),
      axis.text = element_text(size = 10, color = primary_color, angle = label_angle, hjust = 1),
      plot.margin = margin(10, 10, 10, 10),
      panel.grid.major = element_line(color = tertiary_color, linewidth = 0.2),
      panel.grid.minor = element_line(color = tertiary_color, linewidth = 0.1),
      panel.background = element_rect(fill = tertiary_color),
      axis.text.x = element_text(angle = label_angle, hjust = 1),
      axis.text.y = element_text(size = 10, color = primary_color)
    )
}

create_bar_chart(
  survey_data,
  field = "Who Planted The Tree(s)? (Required)",
  sum_field = "Number of Trees Planted (Required)",
  x_labels = c(
    "agency" = "State Agency",
    "community" = "Community Organization",
    "landowner" = "Private Landowner",
    "municipality" = "Municipal Government",
    "professional" = "Paid Professional"
  ),
  x_title = "Participant Type",
  y_title = "Total Trees Planted",
  title = "Total Trees Planted by Participant Type",
  color_palette = c("#233f2b", "#7e9084", "#d9e1dd", "#face00"))
```

```{r func-create-summary-table, echo=TRUE}
create_summary_table <- function(data, field, sum_field, remove_na = TRUE, table_font_size = 14) {
  if (!field %in% colnames(data)) stop(paste("Error: Field", field, "does not exist in the data"))
  if (!sum_field %in% colnames(data)) stop(paste("Error: Sum field", sum_field, "does not exist in the data"))
  if (!is.numeric(data[[sum_field]]) && !is.integer(data[[sum_field]])) stop(paste("Error: Sum field", sum_field, "is not numeric"))

  summary_data <- data %>%
    group_by(.data[[field]]) %>%
    summarise(
      submissions = n(),
      total_value = sum(.data[[sum_field]], na.rm = remove_na)
    ) %>%
    mutate(
      submissions_percentage = submissions / sum(submissions) * 100,
      value_percentage = total_value / sum(total_value) * 100
    ) %>%
    mutate(
      submissions = scales::comma(submissions),
      total_value = scales::comma(total_value),
      submissions_percentage = paste0(round(submissions_percentage, 1), "%"),
      value_percentage = paste0(round(value_percentage, 1), "%")
    )

  summary_data %>%
    knitr::kable(
      col.names = c(field, "Number of Submissions", paste("Total", sum_field), "Proportion of Submissions (%)", "Proportion of Sum Field (%)"),
      caption = paste("Summary of Submissions and", sum_field, "by", field),
      align = c("l", "c", "c", "c", "c")
    ) %>%
    kableExtra::kable_styling(
      full_width = F,
      position = "center",
      bootstrap_options = c("striped", "hover"),
      font_size = table_font_size
    ) %>%
    kableExtra::column_spec(1, width = "20em", bold = TRUE) %>%
    kableExtra::column_spec(2:3, width = "12em") %>%
    kableExtra::add_footnote("The proportions represent the percentage of submissions and sum of the field for each category relative to the overall dataset.")
}
```

### Participant Type Table

This table presents a detailed summary of tree planting activity by participant group, showing both the number of submissions and the total trees planted. The percentage columns offer a clear view of each group’s relative contribution to both participation and total planting, which helps evaluate equity and engagement across the initiative.

```{r participant-type-table, echo=TRUE}
survey_data %>%
  mutate(`Who Planted The Tree(s)? (Required)` = recode(`Who Planted The Tree(s)? (Required)`,
    "agency" = "State Agency",
    "community" = "Community Organization",
    "landowner" = "Private Landowner",
    "municipality" = "Municipal Government",
    "professional" = "Paid Professional")) %>%
  create_summary_table("Who Planted The Tree(s)? (Required)", "Number of Trees Planted (Required)", remove_na = FALSE, table_font_size = 16)
```

## Named User Activity

This table breaks down the number of submissions and trees planted by named users, typically representing staff accounts or recurring contributors. Public (anonymous) users are grouped separately. This view is useful for recognizing heavy contributors and assessing platform usage patterns.

```{r named-user-activity-table}
survey_data %>%
  mutate(Creator = ifelse(is.na(Creator), "Public User", Creator)) %>%
  create_summary_table("Creator", "Number of Trees Planted (Required)", remove_na = FALSE, table_font_size = 16)
```

## Unique E-mail Activity

This table summarizes the planting activity associated with unique email addresses submitted via the optional contact field. It helps assess how many individuals are participating and the extent of their contributions. Since providing an email is optional, this data may also reflect comfort levels with contact sharing.

```{r unique-email-activity-table}
survey_data %>%
  mutate(`Planter Contact Email (Optional)` = ifelse(is.na(`Planter Contact Email (Optional)`), "Not Provided", `Planter Contact Email (Optional)`)) %>%
  create_summary_table("Planter Contact Email (Optional)", "Number of Trees Planted (Required)", remove_na = FALSE, table_font_size = 16)
```

### Municipal Activity

This table presents the number of trees planted by self-reported municipality. It accounts for formatting variations such as town/city/village prefixes. These insights are valuable for understanding geographic participation and for identifying municipalities that may need additional support or engagement.

```{r municipal-activity-table}
survey_data %>%
  mutate(`Participant Municipality (Optional)` = case_when(
    str_starts(`Participant Municipality (Optional)`, "c_") ~ str_replace(`Participant Municipality (Optional)`, "^c_", "") %>% paste0(" (city)"),
    str_starts(`Participant Municipality (Optional)`, "v_") ~ str_replace(`Participant Municipality (Optional)`, "^v_", "") %>% paste0(" (village)"),
    str_starts(`Participant Municipality (Optional)`, "t_") ~ str_replace(`Participant Municipality (Optional)`, "^t_", "") %>% paste0(" (town)"),
    TRUE ~ `Participant Municipality (Optional)`
  )) %>%
  create_summary_table("Participant Municipality (Optional)", "Number of Trees Planted (Required)", remove_na = FALSE, table_font_size = 16)

```

### Organization Activity

This table highlights planting contributions by named organizations, either selected from a predefined list or entered manually by participants. It’s useful for identifying high-performing organizations, recognizing partners, and exploring collaboration opportunities. It also helps validate the effectiveness of predefined organization lists in the survey tool.

```{r organization-activity-table}
survey_data %>%
  inner_join(participant_organizations, by = c("GlobalID" = "ParentGlobalID")) %>%
  filter(!(is.na(`Participant Organization (Optional)`) & is.na(`Other (Optional)`))) %>%
  filter(!(tolower(`Participant Organization (Optional)`) == "other" & is.na(`Other (Optional)`))) %>%
  mutate(`Participant Organization (Optional)` = ifelse(
    tolower(`Participant Organization (Optional)`) == "other" & !is.na(`Other (Optional)`),
    `Other (Optional)`,
    `Participant Organization (Optional)`
  )) %>%
  mutate(`Participant Organization (Optional)` = str_replace_all(`Participant Organization (Optional)`, "_", " ")) %>%
  create_summary_table("Participant Organization (Optional)", "Number of Trees Planted (Required)", remove_na = FALSE, table_font_size = 16)
```

# Location Analysis {.tabset}
```{r func-plot_geographic_data, echo=TRUE}
plot_geographic_data <- function(joined_data,
                                 title,
                                 legend,
                                 fill_option = "plasma",
                                 subtitle = NULL,
                                 theme_options = theme_minimal(),
                                 legend_position = "right",
                                 color_scale = "viridis",
                                 save_path = NULL,
                                 na_fill_color = "lightgrey") {

  current_date <- format(Sys.Date(), "%B %d, %Y")

  # If subtitle is not provided, use the current date as subtitle
  subtitle_text <- ifelse(is.null(subtitle), paste("Date:", current_date), subtitle)

  # Handle missing data by filling with a specified color
  joined_data[is.na(joined_data$total_trees), "total_trees"] <- NA

  # Select the color scale based on the user's input
  if (color_scale == "viridis") {
    fill_color <- scale_fill_viridis_c(option = fill_option, na.value = na_fill_color)  # Use na.value to fill NA
  } else if (color_scale == "RColorBrewer") {
    fill_color <- scale_fill_brewer(palette = "Set3")  # Default RColorBrewer palette
  } else {
    fill_color <- scale_fill_manual(values = color_scale)  # Custom color scale
  }

  # Create the plot
  plot <- ggplot(data = joined_data) +
    geom_sf(aes(fill = total_trees), color = "white") +
    fill_color +  # Color scale for the plot
    theme_options +  # Apply custom theme
    labs(title = title,
         subtitle = subtitle_text,  # Subtitle is handled here
         fill = legend) +
    theme(axis.text = element_blank(), axis.title = element_blank(),
          legend.position = legend_position)

  # If save_path is provided, save the plot to file
  if (!is.null(save_path)) {
    ggsave(save_path, plot = plot, width = 10, height = 6)
  }

  # Return the plot
  return(plot)
}
```

[Back to Top](#)

## By Region

This map displays the **total number of trees planted** across each economic region in **New York State**. The counties are color-coded, with darker shades representing areas where more trees have been planted. This allows users to quickly see which counties have had the most extensive tree planting efforts.

-   **What to look for**:
    -   **Dark colors**: Indicate regions with a higher number of trees planted.
    -   **Lighter colors**: Represent regions with fewer trees planted.

The map provides a visual overview of tree planting distribution across New York, making it easier to identify areas with the highest impact or need for further action.

```{r create-region-choropleth-map, echo=TRUE, message=FALSE, fig.height=6, fig.width=8}
survey_data_aggregated <- survey_data %>%
  group_by(Region) %>%
  summarise(total_trees = sum(`Number of Trees Planted (Required)`, na.rm = TRUE))

shapefile_path <- "/home/nick/gitea/tree-tracker-report/data/redc/redc.shp"

geographic_data <- st_read(shapefile_path) %>%
  mutate(
    REDC = str_replace(REDC, "Western NY", "Western New York"),
    REDC = str_replace(REDC, "Central NY", "Central New York"),
    REDC = str_replace(REDC, "Mid-Hudson", "Hudson Valley"),
    REDC = str_replace(REDC, "Capital Region", "Capital District"),
  ) %>%
  st_as_sf()

survey_data_joined <- geographic_data %>%
  left_join(survey_data_aggregated, by = c("REDC" = "Region"))

plot_geographic_data(joined_data = survey_data_joined,
                     title = "Number of Trees Planted by Region in New York",
                     legend = "Total Trees Planted",
                     fill_option = "plasma",
                     subtitle = "Generated: March 13, 2025",
                     theme_options = theme_minimal(),
                     legend_position = "right",
                     color_scale = "viridis",
                     na_fill_color = "lightgrey")
```

```{r create-summary-table-region, echo=TRUE, message=FALSE, fig.height=6, fig.width=8}
create_summary_table(survey_data, "Region", "Number of Trees Planted (Required)", remove_na = FALSE, table_font_size = 16)
```

## By County

This map displays the **total number of trees planted** across each county in **New York State**. The counties are color-coded, with darker shades representing areas where more trees have been planted. This allows users to quickly see which counties have had the most extensive tree planting efforts.

-   **What to look for**:
    -   **Dark colors**: Indicate counties with a higher number of trees planted.
    -   **Lighter colors**: Represent counties with fewer trees planted.

The map provides a visual overview of tree planting distribution across New York, making it easier to identify areas with the highest impact or need for further action.

```{r create-county-choropleth-map, echo=TRUE, message=FALSE, fig.height=6, fig.width=8}
survey_data_aggregated <- survey_data %>%
  group_by(County) %>%
  summarise(total_trees = sum(`Number of Trees Planted (Required)`, na.rm = TRUE))

geographic_data <- counties(state = "NY", cb = TRUE, progress = FALSE) %>%
  st_as_sf() %>%
  mutate(NAME = str_replace(NAME, "\\.", "")) # Remove period from "St. Lawrence"

survey_data_joined <- geographic_data %>%
  left_join(survey_data_aggregated, by = c("NAME" = "County"))

# Example of calling the function with enhancements
plot_geographic_data(joined_data = survey_data_joined,
                     title = "Number of Trees Planted by County in New York",
                     legend = "Total Trees Planted",
                     fill_option = "plasma",
                     subtitle = "Generated: March 13, 2025",
                     theme_options = theme_minimal(),
                     legend_position = "right",
                     color_scale = "viridis",  # Default viridis scale
                     na_fill_color = "lightgrey")  # Color for NA values
```

```{r create-summary-table-county, echo=TRUE, message=FALSE, , fig.height=6, fig.width=8}
create_summary_table(survey_data, "County", "Number of Trees Planted (Required)", remove_na = FALSE, table_font_size = 16)

```

# Tree Analysis {.tabset}

[Back to Top](#)

```{r func-create_species_summary_table, echo=TRUE}
create_species_summary_table <- function(data, field, field_label = NULL) {
  # Replace empty strings and NA values with "Not Provided" before summarization
  data <- data %>%
    mutate(
      !!sym(field) := ifelse(!!sym(field) == "" | is.na(!!sym(field)), "Not Provided", !!sym(field))  # Replace empty strings and NAs
    )

  # Clean up the species names: replace underscores with spaces and convert to title case
  data <- data %>%
    mutate(
      !!sym(field) := gsub("_", " ", !!sym(field)),  # Replace underscores with spaces
      !!sym(field) := tools::toTitleCase(!!sym(field))  # Convert to title case
    )

  # Summarize the data based on the field (e.g., Generic.Species.of.Tree)
  summary_data <- data %>%
    group_by(!!sym(field)) %>%
    summarise(
      submissions = n(),  # Count of surveys for each species (or category)
      .groups = "drop"  # To prevent issues with group structure
    ) %>%
    mutate(
      submissions_percentage = submissions / sum(submissions) * 100  # Proportion of surveys for each category
    )

  # Format the table for display
  summary_data_formatted <- summary_data %>%
    mutate(
      submissions = scales::comma(submissions),  # Format the submission counts with commas
      submissions_percentage = paste0(round(submissions_percentage, 1), "%")  # Round percentage and append '%'
    )

  # Determine the label for the field
  label <- ifelse(is.null(field_label), field, field_label)

  # Create and style the table
  summary_data_formatted %>%
    knitr::kable(col.names = c(label, "Number of Surveys", "Proportion of Surveys (%)"),
                 caption = paste("Summary of Surveys by", label),
                 align = c("l", "c", "c")) %>%  # Align the columns (left for the field, center for others)
    kableExtra::kable_styling(
      full_width = F,
      position = "center",
      bootstrap_options = c("striped", "hover"),
      font_size = 14
    ) %>%
    kableExtra::column_spec(1, width = "20em", bold = TRUE) %>%  # First column (Species) bold and wider
    kableExtra::column_spec(2, width = "12em") %>%  # Number of Surveys column
    kableExtra::column_spec(3, width = "12em") %>%  # Proportion column
    kableExtra::add_footnote("The proportions represent the percentage of surveys for each species relative to the total surveys.")
}
```

## By Genus

The following table shows a breakdown of survey submissions by **Genus**. For each genus, the table provides:

1.  **Number of Surveys**: The total number of surveys where this genus was reported.
2.  **Proportion of Surveys (%)**: The percentage of total surveys that reported this genus, relative to the entire dataset.
3.  **"Not Provided" Category**: Any surveys that did not specify a genus are grouped under the "Not Provided" category.

These figures provide an understanding of which genus are most commonly reported, how prevalent each genus is, and the proportion of surveys where no genus was specified.

```{r create-summary-table-genus, echo=TRUE, message=FALSE, fig.height=6, fig.width=8}
create_species_summary_table(species_planted, "Generic Type of Tree (Optional)", "Tree Genus")
```

## By Species

The following table shows a breakdown of survey submissions by **Species**. For each species, the table provides:

1.  **Number of Surveys**: The total number of surveys where this species was reported.
2.  **Proportion of Surveys (%)**: The percentage of total surveys that reported this species, relative to the entire dataset.
3.  **"Not Provided" Category**: Any surveys that did not specify a species are grouped under the "Not Provided" category.

These figures provide an understanding of which species are most commonly reported, how prevalent each species is, and the proportion of surveys where no genus was specified.

```{r create-summary-table-species, echo=TRUE, message=FALSE, fig.height=6, fig.width=8}
create_species_summary_table(species_planted, "Tree Species (Optional)", "Tree Species")
```

# Disadvantaged Communities {.tabset}

## By Region

```{r create-summary-table-region-dac, echo=TRUE, message=FALSE, , fig.height=6, fig.width=8}
survey_data %>%
  filter(!is.na(`Disadvantaged Communities Indicator`), na.rm = TRUE) %>%
  create_summary_table("Region", "Number of Trees Planted (Required)", remove_na = FALSE, table_font_size = 16)

```

## By County

```{r create-summary-table-county-dac, echo=TRUE, message=FALSE, , fig.height=6, fig.width=8}
survey_data %>%
  filter(!is.na(`Disadvantaged Communities Indicator`), na.rm = TRUE) %>%
  create_summary_table("County", "Number of Trees Planted (Required)", remove_na = FALSE, table_font_size = 16)
```

## By Municipality

```{r create-summary-table-county-municipality, echo=TRUE, message=FALSE, , fig.height=6, fig.width=8}
survey_data %>%
  filter(!is.na(`Disadvantaged Communities Indicator`), na.rm = TRUE) %>%
  create_summary_table("Municipality", "Number of Trees Planted (Required)", remove_na = FALSE, table_font_size = 16)
```