Update Report Overview, Submission Analysis.

2025-02-13 20:18:22 -05:00 · 2025-02-13 20:18:22 -05:00 · cda23941aa
commit cda23941aa
parent deb30c8270
1 changed files with 438 additions and 418 deletions
--- a/report.Rmd
+++ b/report.Rmd
@@ -1,418 +1,438 @@
---
+---
-title: "25 Million Trees Initiative Survey Report"
+title: "25 Million Trees Initiative Survey Report"
-author: 
+author: 
-  - name: Nicholas Hepler <nicholas.hepler@its.ny.gov>
+  - name: Nicholas Hepler <nicholas.hepler@its.ny.gov>
-    affiliation: Office of Information Technology Services
+    affiliation: Office of Information Technology Services
-date: "`r format(Sys.Date(), '%B, %d, %Y')`"
+  - name: Annabel Gregg <annabel.gregg@dec.ny.gov>
-keywords: "keyword1, keyword2"
+    affiliation: Department of Environmental Conservation
-output: html_document
+date: "`r format(Sys.Date(), '%B %d, %Y')`"
---
+keywords: "keyword1, keyword2"
-
+output: 
-```{r setup, include=FALSE}
+  html_document
-# Load necessary libraries
+---
-library(tidyverse)
+
-library(lubridate)
+```{r setup, include=FALSE}
-library(ggplot2)
+# Load necessary libraries
-
+library(tidyverse)
-# Read the CSV files into a dataframe
+library(lubridate)
-survey_path <- "data/_25_Million_Trees_Initiative_Survey_0.csv"
+library(ggplot2)
-survey_data <- read_csv(survey_path)
+
-
+# Read the CSV files into a dataframe
-species_path <- "data/species_planted_4.csv"
+survey_path <- "data/_25_Million_Trees_Initiative_Survey_0.csv"
-species_data <- read.csv(species_path)
+survey_data <- read_csv(survey_path)
-
+
-
+species_path <- "data/species_planted_4.csv"
-# Convert the CreationDate field to a proper datetime object (if applicable)
+species_data <- read.csv(species_path)
-survey_data <- survey_data %>%
+
-  mutate(CreationDate = mdy_hms(CreationDate))
+# Convert the CreationDate field to a proper datetime object (if applicable)
-
+survey_data <- survey_data %>%
-# Count the records to be excluded (Exclude Result == 1)
+  mutate(CreationDate = mdy_hms(CreationDate))
-excluded_count <- survey_data %>%
+
-  filter(`Exclude Result` == 1) %>%
+# Count the records to be excluded (Exclude Result == 1)
-  nrow()
+excluded_count <- survey_data %>%
-
+  filter(`Exclude Result` == 1) %>%
-# Count the records that are used (Exclude Result == 0)
+  nrow()
-used_count <- survey_data %>%
+
-  filter(`Exclude Result` == 0) %>%
+# Count the records that are used (Exclude Result == 0)
-  nrow()
+used_count <- survey_data %>%
-```
+  filter(`Exclude Result` == 0) %>%
-
+  nrow()
-# {.tabset .tabset-fade .tabset-pills}
+```
-
+
-## Summary
+---
-
+abstract: "This report was generated on **`r format(Sys.Date(), '%B %d, %Y')`** for the period beginning **`r format(min(survey_data$CreationDate, na.rm = TRUE), '%B %d, %Y')`** and ending **`r format(max(survey_data$CreationDate, na.rm = TRUE), '%B %d, %Y')`**. **`r used_count`** records were used in this analysis."
-### Background
+---
-
+
-The **25 Million Trees Initiative** is a bold commitment launched by **Governor Kathy Hochul** during the 2024 State of the State Address, aiming to plant 25 million trees by 2033 in New York State. This initiative recognizes the critical importance of trees and forests for climate mitigation, enhancing community health, and supporting biodiversity. The New York State Department of Environmental Conservation (DEC) is at the forefront of tracking the progress of this ambitious goal.
+# {.tabset .tabset-fade .tabset-pills}
-
+
-As part of this effort, DEC has launched the **Tree Tracker**, a tool for the public to record the trees they plant. These submissions contribute valuable data on the number, type, and locations of trees being planted across the state, helping to build a comprehensive, real-time dashboard of tree planting activities. 
+## Report Overview
-
+
-This report compiles the survey data collected via the Tree Tracker and provides detailed insights into the information submitted by New Yorkers. It aims to support DEC staff and executives in understanding the progress of the initiative and identifying areas for improvement in outreach and engagement.
+### Background
-
+
-### Purpose
+The **25 Million Trees Initiative** is a bold commitment launched by **Governor Kathy Hochul** during the 2024 State of the State Address, aiming to plant 25 million trees by 2033 in New York State. This initiative recognizes the critical importance of trees and forests for climate mitigation, enhancing community health, and supporting biodiversity. The New York State Department of Environmental Conservation (DEC) is at the forefront of tracking the progress of this ambitious goal.
-
+
-This report serves to present an overview of the data collected through the 25 Million Trees Initiative, offering insights into submission patterns, geographic distribution, and trends in tree planting activities. The report aims to:
+As part of this effort, DEC has launched the **Tree Tracker**, a tool for the public to record the trees they plant. These submissions contribute valuable data on the number, type, and locations of trees being planted across the state, helping to build a comprehensive, real-time dashboard of tree planting activities. 
-
+
- Summarize the overall progress of the initiative.
+This report compiles the survey data collected via the Tree Tracker and provides detailed insights into the information submitted by New Yorkers. It aims to support DEC staff and executives in understanding the progress of the initiative and identifying areas for improvement in outreach and engagement.
- Provide detailed data analysis on the submitted tree planting information.
+
- Identify areas where more outreach or support may be needed.
+### Purpose & Objectives
-
+
-As more individuals contribute their data to the Tree Tracker, the initiative's success will be better understood, and DEC can better align resources to further promote this critical program.
+This report serves to present an overview of the data collected through the 25 Million Trees Initiative, offering insights into submission patterns, geographic distribution, and trends in tree planting activities. The report aims to:
-
+
-## Submission Overview
+- Summarize the overall progress of the initiative.
-
+- Provide detailed data analysis on the submitted tree planting information.
-This section provides an overview of the surveys submitted to the Tree Tracker Tool, detailing the survey period, exclusions, validation processes, and key data checks to ensure accuracy and consistency.
+- Identify areas where more outreach or support may be needed.
-
+
-### Survey Period and Exclusions
+As more individuals contribute their data to the Tree Tracker, the initiative's success will be better understood, and DEC can better align resources to further promote this critical program.
-
+
-The report covers the survey period from **`r format(min(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")`** to **`r format(max(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")`**, including a total of **`r nrow(survey_data)`** records. Out of these, **`r used_count`** records were deemed valid and included in the analysis. 
+### Survey Period and Data Exclusions
-
+
-Exclusions were applied to **`r excluded_count`** records, which were removed due to various reasons, such as:
+The report covers the survey period from **`r format(min(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")`** to **`r format(max(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")`**, including a total of **`r nrow(survey_data)`** records. Out of these, **`r used_count`** records were deemed valid and included in the analysis. 
-
+
- **Double Count**: Some submissions were identified as duplicates and excluded to prevent data redundancy.
+Exclusions were applied to **`r excluded_count`** records, which were removed for reasons such as:
- **Test Data**: Entries that were intended solely for testing purposes were excluded, as they do not represent actual survey data.
+
-
+- **Double Count**: Some submissions were identified as duplicates and excluded to prevent data redundancy.
-These excluded records are marked with a value of **1** in the `Exclude Result` field. The remaining **`r used_count`** records, marked with a **0**, represent legitimate data points that were included in the analysis.
+- **Test Data**: Entries that were intended solely for testing purposes were excluded, as they do not represent actual survey data.
-
+
---
+These excluded records are marked with a value of **1** in the `Exclude Result` field. The remaining **`r used_count`** records, marked with a **0**, represent legitimate data points that were included in the analysis.
-abstract: "This report was generated on: **`r format(Sys.Date(), '%B, %d, %Y')`** for the following Reporting Period: **`r format(min(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")`** to **`r format(max(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")`**. **`r used_count`** records were used in this analysis."
+
---
+### Survey Validation Process and Data Consistency
-
+
-### Survey Validation and Data Consistency
+To ensure data integrity, several validation steps are applied to survey submissions:
-
+
-To ensure data integrity, several validation steps are applied to survey submissions:
+- **Required Fields**:
-
+  - **Who Planted the Tree(s)?**: Describes the participant's role in the tree planting effort.
- **Required Fields**:
+  - **Number of Trees**: The number of trees planted during the planting period.
-  - **Who Planted The Tree(s)?**: A descriptor of the particpant's role in the tree planting effort.
+  - **Start Date of Planting**: The date when planting began.
-  - **Number of Trees**: The number of trees planted is a required field, and users cannot submit the survey without providing this information.
+  - **End Date of Planting**: The date when planting was completed.
-  - **Start Date of Planting**:
+  - **Location**: Geographic coordinates (latitude and longitude).
-  - **End Date of Planting**:
+
-  - **Location**: Geographic coordinates (latitude and longitude) are also required, and users must provide this data when submitting their survey.
+- **Response Validation**:
-
+  - **Geographic Validation**: Once geographic coordinates are entered, they are checked against official civil boundaries to assign an accurate locality, county, and region. In rare cases, this check may fail because of a service dependency, but such records are corrected before inclusion in the analysis.
- **Response Validation**:
+  - **Date Validation and Logic**: The system rejects planting dates earlier than the start date of the initiative, so records with such dates cannot be submitted. Likewise, users cannot enter a planting end date that precedes the planting start date.
-  - **Geographic Validation**: Once geographic coordinates are entered, they are checked against official civil boundaries to provide an accurate nominal locality, county, and region data. In rare cases, this check may fail due to service dependency, but such records are corrected before inclusion in the analysis.
+  - **Optional Questions**: Even optional questions undergo validation to ensure the entered data meets the expected format or logic, providing further consistency and accuracy.
-
+    - **Email Format**: The email addresses entered in the survey are validated to ensure they follow the correct format.
-  - **Date Validation and Logic**: Users cannot enter planting dates prior to the start date of the initiative. The system enforces this restriction, and any records with such dates are not allowed to be submitted. Additionally, users cannot enter a planting end date that occurs before the planting start date.
+
-
+These validation checks help ensure the integrity and consistency of the data, allowing for meaningful analysis of tree planting surveys.
-  - **Optional Questions**: Even optional questions undergo validation to ensure the entered data meets the expected format or logic, providing further consistency and accuracy.
+
-    - **Email Format**: The email addresses entered in the survey are validated to ensure they follow the correct format.
+
-
+## Submission Analysis {.tabset}
-By applying these validation checks, the integrity and consistency of the data is ensured, allowing for meaningful analysis of tree planting surveys.
+
-
+### Submission Trend Analysis
-### Submission Trend
+
-
+```{r submission-trend-stats, echo=FALSE, message=FALSE}
-The following visualization illustrates the trend in the total number of submissions throughout the survey period, providing insights into any patterns or changes in submission activity.
+## library(dplyr)
-
+
-```{r submission-trend, echo=FALSE, message=FALSE, fig.height=6, fig.width=8}
+# Ensure CreationDate is in Date format
-
+survey_data$CreationDate <- as.Date(survey_data$CreationDate)
-library(ggplot2)
+
-library(dplyr)
+# Summarize the data to calculate the total number of submissions by CreationDate
-
+summary_data <- survey_data %>%
-survey_data <- survey_data %>%
+  filter(`Exclude Result` == 0) %>%
-  filter(`Exclude Result` == 0)
+  group_by(CreationDate) %>%
-
+  summarise(total_submissions = n(), .groups = "drop")
-survey_data$CreationDate <- as.Date(survey_data$CreationDate)
+
-
+# Number of days that have elapsed between the first and last submission date
-# Summarize the data to calculate the total number of submissions by CreationDate
+date_range <- range(summary_data$CreationDate)
-summary_data <- survey_data %>%
+elapsed_days <- as.integer(difftime(date_range[2], date_range[1], units = "days"))
-  group_by(CreationDate) %>%
+
-  summarise(total_submissions = n())
+# Number of days with 0 submissions
-
+all_dates <- data.frame(CreationDate = seq.Date(date_range[1], date_range[2], by = "day"))
-ggplot(summary_data, aes(x = CreationDate, y = total_submissions)) + 
+merged_data <- left_join(all_dates, summary_data, by = "CreationDate")
-  geom_line(color = "#233f28", linewidth = 1) +  # Change 'size' to 'linewidth'
+days_with_0_submissions <- sum(is.na(merged_data$total_submissions))
-  geom_point(color = "#7e9084", size = 3) +
+
-  geom_smooth(method = "loess", color = "#face00", linewidth = 1, linetype = "dashed") +
+# Summary statistics based on the count of submissions
-  labs(
+submission_summary <- summary(merged_data$total_submissions, na.rm = TRUE)
-    title = "Total Number of Submissions by Date",
+
-    x = "Submission Date",
+# Dates where submissions exceeded the 3rd quartile
-    y = "Total Number of Submissions"
+third_quartile <- quantile(merged_data$total_submissions, 0.75, na.rm = TRUE)
-  ) +
+dates_above_3rd_quartile <- merged_data %>%
-  theme_minimal(base_size = 14) +
+  filter(total_submissions > third_quartile) %>%
-  theme(
+  pull(CreationDate)
-    plot.title = element_text(size = 16, face = "bold", color = "#233f28"),
+
-    axis.title = element_text(size = 12, color = "#233f28"),
+```
-    axis.text = element_text(size = 10, color = "#233f28"),  
+
-    plot.margin = margin(10, 10, 10, 10),
+The survey has been active for **`r elapsed_days`** days. During this period, **`r days_with_0_submissions`** days had no submissions.
-    panel.grid.major = element_line(color = "#d9e1dd", linewidth = 0.3),  # Change 'size' to 'linewidth'
+
-    panel.background = element_rect(fill = "#d9e1dd"),  
+The following visualization illustrates the trend in the total number of submissions throughout the survey period, providing insights into any patterns or changes in submission activity.
-    axis.text.x = element_text(angle = 45, hjust = 1)
+
-  ) +
+```{r submission-trend-plot, echo=FALSE, message=FALSE, fig.height=6, fig.width=8}
-  scale_x_date(date_labels = "%b %Y", date_breaks = "1 months")
+#library(ggplot2)
-```
+
-
+# Plot Submission Trend
-### Response Rates
+ggplot(summary_data, aes(x = CreationDate, y = total_submissions)) + 
-The table below shows the response rates for a selection of optional fields within the survey. Each field represents a different aspect of the survey, and the response rate reflects the percentage of respondents who provided valid answers for each field.
+  geom_line(color = "#233f28", linewidth = 1) +
-
+  geom_point(color = "#7e9084", size = 3) +
- **Planter Contact Email**: The percentage of respondents who provided their email address.
+  geom_smooth(method = "loess", color = "#face00", linewidth = 1, linetype = "dashed") +
- **Funding Source**: The percentage of respondents who identified their funding source.
+  labs(
- **Land Ownership**: The percentage of respondents who indicated their land ownership status.
+    title = "Total Number of Submissions by Date",
- **Tree Size Planted**: The percentage of respondents who specified the size of trees they planted.
+    x = "Submission Date",
- **Source of Trees**: The percentage of respondents who reported the source of the trees they planted.
+    y = "Total Number of Submissions"
- **Species Planted**: The percentage of respondents who provided the species of tree(s) they planted.
+  ) +
-
+  theme_minimal(base_size = 14) +
-This breakdown helps identify which survey fields received higher levels of engagement, and which may require further clarification or encouragement to improve response rates.
+  theme(
-
+    plot.title = element_text(size = 16, face = "bold", color = "#233f28"),
-```{r response-rate, echo=FALSE, message=FALSE}
+    axis.title = element_text(size = 12, color = "#233f28"),
-# List of fields to check for response rates, with special handling for 'Total Number of Species Planted'
+    axis.text = element_text(size = 10, color = "#233f28"),  
-fields <- c("Planter Contact Email", "Funding Source", "Land Ownership", 
+    plot.margin = margin(10, 10, 10, 10),
-            "Tree Size Planted", "Source of Trees", "Total Number of Species Planted")
+    panel.grid.major = element_line(color = "#d9e1dd", linewidth = 0.3),
-
+    panel.background = element_rect(fill = "#d9e1dd"),
-# Calculate the response rate for each field
+    axis.text.x = element_text(angle = 45, hjust = 1)
-response_rates <- sapply(fields, function(field) {
+  ) +
-  if (field == "Total Number of Species Planted") {
+  scale_x_date(date_labels = "%b %Y", date_breaks = "1 months")
-    # For "Total Number of Species Planted", consider answered if value is greater than 0
+
-    sum(survey_data[[field]] > 0, na.rm = TRUE) / nrow(survey_data) * 100
+```
-  } else {
+
-    # For other fields, check for non-NA values
+### Survey Response Rates by Field
-    sum(!is.na(survey_data[[field]])) / nrow(survey_data) * 100
+The table below shows the response rates for a selection of optional fields within the survey. Each field represents a different aspect of the survey, and the response rate reflects the percentage of respondents who provided valid answers for each field.
-  }
+
-})
+- **Planter Contact Email**: The percentage of respondents who provided their email address.
-
+- **Funding Source**: The percentage of respondents who identified their funding source.
-# Round the response rates to 2 decimal places
+- **Land Ownership**: The percentage of respondents who indicated their land ownership status.
-response_rates_rounded <- round(response_rates, 2)
+- **Tree Size Planted**: The percentage of respondents who specified the size of trees they planted.
-
+- **Source of Trees**: The percentage of respondents who reported the source of the trees they planted.
-# Sort the response rates in descending order (highest to lowest)
+- **Species Planted**: The percentage of respondents who provided the species of tree(s) they planted.
-sorted_response_rates <- sort(response_rates_rounded, decreasing = TRUE)
+
-
+This breakdown helps identify which survey fields received higher levels of engagement, and which may require further clarification or encouragement to improve response rates.
-# Print the sorted, rounded response rates
+
-sorted_response_rates
+```{r response-rate, echo=FALSE, message=FALSE}
-
+# List of fields to check for response rates, with special handling for 'Total Number of Species Planted'
-```
+fields <- c("Planter Contact Email", "Funding Source", "Land Ownership", 
-
+            "Tree Size Planted", "Source of Trees", "Total Number of Species Planted")
-## Participant Type Overview
+
-
+# Calculate the response rate for each field
-This section provides an overview of the different types of participants involved in tree planting surveys. The data collected includes submissions from various categories of participants, including state agencies, community organizations, private landowners, and municipal governments. By understanding the distribution of these participant types and the scope of their contributions, we can gain insights into the reach and diversity of the program. The following visualizations highlight the number of surveys and total trees planted by each participant type.
+response_rates <- sapply(fields, function(field) {
-
+  if (field == "Total Number of Species Planted") {
-### Participant Type: Number of Submissions
+    # For "Total Number of Species Planted", consider answered if value is greater than 0
-The first visualization shows the distribution of the number of tree planting surveys based on the participant type. This breakdown helps highlight which groups are contributing most to the tree planting initiative.
+    sum(survey_data[[field]] > 0, na.rm = TRUE) / nrow(survey_data) * 100
-
+  } else {
-```{r participant-type-surveys, echo=FALSE, message=FALSE}
+    # For other fields, check for non-NA values
-library(ggplot2)
+    sum(!is.na(survey_data[[field]])) / nrow(survey_data) * 100
-library(dplyr)
+  }
-
+})
-ggplot(survey_data, aes(x = `Who Planted The Tree(s)?`)) + 
+
-  geom_bar(fill = "#233f28", color = "#7e9084") +
+# Round the response rates to 2 decimal places
-  geom_text(stat = "count", aes(label = scales::comma(after_stat(count))), 
+response_rates_rounded <- round(response_rates, 2)
-            position = position_stack(vjust = 0.5),  # Places text in the middle of the bars
+
-            color = "#ffffff", size = 5) +  # Adjust label size
+# Sort the response rates in descending order (highest to lowest)
-  labs(
+sorted_response_rates <- sort(response_rates_rounded, decreasing = TRUE)
-    title = "Number of Tree Planting Submissions by Participant Type",
+
-    x = "Participant Type",
+# Print the sorted, rounded response rates
-    y = "Number of Submissions"
+sorted_response_rates
-  ) +
+
-  scale_x_discrete(labels = c(
+```
-    "agency" = "State Agency",
+
-    "community" = "Community Organization",
+## Participant Type Analysis {.tabset}
-    "landowner" = "Private Landowner",
+
-    "municipality" = "Municipal Government",
+### Number of Submissions
-    "professional" = "Paid Professional"
+The first visualization shows the distribution of the number of tree planting surveys based on the participant type. This breakdown helps highlight which groups are contributing most to the tree planting initiative.
-  )) +
+
-  theme_minimal(base_size = 14) +
+```{r participant-type-surveys, echo=FALSE, message=FALSE}
-  theme(
+#library(ggplot2)
-    plot.title = element_text(size = 16, face = "bold", color = "#233f28"),
+#library(dplyr)
-    axis.title = element_text(size = 12, color = "#233f28"),
+
-    axis.text = element_text(size = 10, color = "#233f28"),
+ggplot(survey_data, aes(x = `Who Planted The Tree(s)?`)) + 
-    plot.margin = margin(10, 10, 10, 10),
+  geom_bar(fill = "#233f28", color = "#7e9084") +
-    panel.grid.major = element_line(color = "#d9e1dd", linewidth = 0.3),
+  geom_text(stat = "count", aes(label = scales::comma(after_stat(count))), 
-    panel.background = element_rect(fill = "#d9e1dd"),
+            position = position_stack(vjust = 0.5),  # Places text in the middle of the bars
-    axis.text.x = element_text(angle = 45, hjust = 1)
+            color = "#ffffff", size = 5) +  # Adjust label size
-  )
+  labs(
-
+    title = "Number of Tree Planting Submissions by Participant Type",
-```
+    x = "Participant Type",
-
+    y = "Number of Submissions"
-### Participant Type: Total Trees Planted
+  ) +
-This second plot provides a breakdown of the total number of trees planted by participant type. This visualization helps to assess the contribution of each participant group to the overall impact of the tree planting program.
+  scale_x_discrete(labels = c(
-
+    "agency" = "State Agency",
-```{r participant-type-planted, echo=FALSE, message=FALSE}
+    "community" = "Community Organization",
-library(ggplot2)
+    "landowner" = "Private Landowner",
-library(dplyr)
+    "municipality" = "Municipal Government",
-
+    "professional" = "Paid Professional"
-summary_data <- survey_data %>%
+  )) +
-  group_by(`Who Planted The Tree(s)?`) %>%
+  theme_minimal(base_size = 14) +
-  summarise(total_trees = sum(`Number of Trees Planted`, na.rm = TRUE))
+  theme(
-
+    plot.title = element_text(size = 16, face = "bold", color = "#233f28"),
-library(ggplot2)
+    axis.title = element_text(size = 12, color = "#233f28"),
-library(dplyr)
+    axis.text = element_text(size = 10, color = "#233f28"),
-
+    plot.margin = margin(10, 10, 10, 10),
-# Assuming 'summary_data' is already defined
+    panel.grid.major = element_line(color = "#d9e1dd", linewidth = 0.3),
-ggplot(summary_data, aes(x = `Who Planted The Tree(s)?`, y = total_trees)) + 
+    panel.background = element_rect(fill = "#d9e1dd"),
-  geom_bar(stat = "identity", fill = "#233f28", color = "#7e9084") +
+    axis.text.x = element_text(angle = 45, hjust = 1)
-  geom_text(aes(label = scales::comma(total_trees)), 
+  )
-            position = position_stack(vjust = 0.5),  # Places text in the middle of the bars
+
-            color = "#ffffff", size = 5) +  # Accent color for text labels
+```
-  labs(
+
-    title = "Total Number of Trees Planted by Participant Type",
+### Total Trees Planted
-    x = "Participant Type",
+This second plot provides a breakdown of the total number of trees planted by participant type. This visualization helps to assess the contribution of each participant group to the overall impact of the tree planting program.
-    y = "Total Number of Trees Planted"
+
-  ) +
+```{r participant-type-planted, echo=FALSE, message=FALSE}
-  scale_x_discrete(labels = c(
+library(ggplot2)
-    "agency" = "State Agency",
+library(dplyr)
-    "community" = "Community Organization",
+
-    "landowner" = "Private Landowner",
+summary_data <- survey_data %>%
-    "municipality" = "Municipal Government",
+  group_by(`Who Planted The Tree(s)?`) %>%
-    "professional" = "Paid Professional"
+  summarise(total_trees = sum(`Number of Trees Planted`, na.rm = TRUE))
-  )) +
+
-  theme_minimal(base_size = 14) +  # Adjusted base font size for clarity
-  theme(
-    plot.title = element_text(size = 16, face = "bold", color = "#233f28"),
-    axis.title = element_text(size = 12, color = "#233f28"),
+# Assuming 'summary_data' is already defined
-    axis.text = element_text(size = 10, color = "#233f28"),
+ggplot(summary_data, aes(x = `Who Planted The Tree(s)?`, y = total_trees)) + 
-    plot.margin = margin(10, 10, 10, 10),
+  geom_bar(stat = "identity", fill = "#233f28", color = "#7e9084") +
-    panel.grid.major = element_line(color = "#d9e1dd", linewidth = 0.3), 
+  geom_text(aes(label = scales::comma(total_trees)), 
-    panel.background = element_rect(fill = "#d9e1dd"),
+            position = position_stack(vjust = 0.5),  # Places text in the middle of the bars
-    axis.text.x = element_text(angle = 45, hjust = 1)
+            color = "#ffffff", size = 5) +  # Accent color for text labels
-  )
+  labs(
-```
+    title = "Total Number of Trees Planted by Participant Type",
-
+    x = "Participant Type",
-```{r participant-type-table, echo=FALSE, message=FALSE}
+    y = "Total Number of Trees Planted"
-# Summarize the data to calculate the total number of trees planted by participant type
+  ) +
-summary_data <- survey_data %>%
+  scale_x_discrete(labels = c(
-  group_by(`Who Planted The Tree(s)?`) %>%
+    "agency" = "State Agency",
-  summarise(total_trees = sum(`Number of Trees Planted`, na.rm = TRUE))
+    "community" = "Community Organization",
-# Replace the participant type values with more readable labels
+    "landowner" = "Private Landowner",
-summary_data <- summary_data %>%
+    "municipality" = "Municipal Government",
-  mutate(
+    "professional" = "Paid Professional"
-    `Who Planted The Tree(s)?` = recode(`Who Planted The Tree(s)?`,
+  )) +
-                                       "agency" = "State Agency",
+  theme_minimal(base_size = 14) +  # Adjusted base font size for clarity
-                                       "community" = "Community Organization",
+  theme(
-                                       "landowner" = "Private Landowner",
+    plot.title = element_text(size = 16, face = "bold", color = "#233f28"),
-                                       "municipality" = "Municipal Government",
+    axis.title = element_text(size = 12, color = "#233f28"),
-                                       "professional" = "Paid Professional")
+    axis.text = element_text(size = 10, color = "#233f28"),
-  )
+    plot.margin = margin(10, 10, 10, 10),
-
+    panel.grid.major = element_line(color = "#d9e1dd", linewidth = 0.3), 
-# Add percentage column
+    panel.background = element_rect(fill = "#d9e1dd"),
-summary_data <- summary_data %>%
+    axis.text.x = element_text(angle = 45, hjust = 1)
-  mutate(percentage = total_trees / sum(total_trees) * 100)
+  )
-
+```
-# Format the table to display the number of trees and percentage
+
-summary_data_formatted <- summary_data %>%
+```{r participant-type-table, echo=FALSE, message=FALSE}
-  mutate(
+# Summarize the data to calculate the total number of trees planted by participant type
-    total_trees = scales::comma(total_trees),  # Add commas to the total number of trees
+summary_data <- survey_data %>%
-    percentage = paste0(round(percentage, 1), "%")  # Round percentage and append '%'
+  group_by(`Who Planted The Tree(s)?`) %>%
-  )
+  summarise(total_trees = sum(`Number of Trees Planted`, na.rm = TRUE))
-
+# Replace the participant type values with more readable labels
-# Print the table
+summary_data <- summary_data %>%
-summary_data_formatted %>%
+  mutate(
-  knitr::kable(col.names = c("Participant Type", "Total Trees Planted", "Percentage of Total Trees"),
+    `Who Planted The Tree(s)?` = recode(`Who Planted The Tree(s)?`,
-               caption = "Total Number of Trees Planted by Participant Type and their Proportional Contribution") %>%
+                                       "agency" = "State Agency",
-  kableExtra::kable_styling(full_width = F, position = "center", bootstrap_options = c("striped", "hover"))
+                                       "community" = "Community Organization",
-```
+                                       "landowner" = "Private Landowner",
-
+                                       "municipality" = "Municipal Government",
-
+                                       "professional" = "Paid Professional")
-## Region Overview
+  )
-This section provides an overview of regional involved and response to the tree planting survey. 
+
-
+# Add percentage column
-In the table below, we aggregate plantings by Region. The results are provided in descending order of Total Trees Planted.
+summary_data <- summary_data %>%
-```{r region-summary, echo=FALSE, warning=FALSE, message=FALSE}
+  mutate(percentage = total_trees / sum(total_trees) * 100)
-# Summarize the data by Region
+
-region_summary_data <- survey_data %>%
+# Format the table to display the number of trees and percentage
-  group_by(Region) %>%
+summary_data_formatted <- summary_data %>%
-  summarise(
+  mutate(
-    total_records = n(),  # Count the number of records in each region
+    total_trees = scales::comma(total_trees),  # Add commas to the total number of trees
-    total_trees_planted = sum(`Number of Trees Planted`, na.rm = TRUE),  # Sum of trees planted in each region
+    percentage = paste0(round(percentage, 1), "%")  # Round percentage and append '%'
-    mean_trees_planted = mean(`Number of Trees Planted`, na.rm = TRUE),  # Mean number of trees planted
+  )
-    median_trees_planted = median(`Number of Trees Planted`, na.rm = TRUE)  # Median number of trees planted
+
-  ) %>%
+# Print the table
-  arrange(desc(total_trees_planted))  # Sort by total trees planted in descending order
+summary_data_formatted %>%
-
+  knitr::kable(col.names = c("Participant Type", "Total Trees Planted", "Percentage of Total Trees"),
-# Format the table to display the total number of records and trees planted
+               caption = "Total Number of Trees Planted by Participant Type and their Proportional Contribution") %>%
-region_summary_data_formatted <- region_summary_data %>%
+  kableExtra::kable_styling(full_width = F, position = "center", bootstrap_options = c("striped", "hover"))
-  mutate(
+```
-    total_trees_planted = scales::comma(total_trees_planted),  # Add commas to the total number of trees
+
-    total_records = scales::comma(total_records),  # Add commas to the total number of records
+
-    mean_trees_planted = round(mean_trees_planted, 1),  # Round mean for readability
+## Region Overview
-    median_trees_planted = round(median_trees_planted, 1)  # Round median for readability
+This section provides an overview of regional involvement in, and response to, the tree planting survey.
-  )
+
-
+In the table below, we aggregate plantings by Region. The results are provided in descending order of Total Trees Planted.
-# Print the summary table
+```{r region-summary, echo=FALSE, warning=FALSE, message=FALSE}
-region_summary_data_formatted %>%
+# Summarize the data by Region
-  knitr::kable(col.names = c("Region", "Total Submissions", "Total Trees Planted", "Mean", "Median"),
+region_summary_data <- survey_data %>%
-               caption = "Total Records, Trees Planted, Mean, and Median by Region (Sorted by Trees Planted)") %>%
+  group_by(Region) %>%
-  kableExtra::kable_styling(full_width = F, position = "center", bootstrap_options = c("striped", "hover"))
+  summarise(
-
+    total_records = n(),  # Count the number of records in each region
-```
+    total_trees_planted = sum(`Number of Trees Planted`, na.rm = TRUE),  # Sum of trees planted in each region
-
+    mean_trees_planted = mean(`Number of Trees Planted`, na.rm = TRUE),  # Mean number of trees planted
-## County Overview
+    median_trees_planted = median(`Number of Trees Planted`, na.rm = TRUE)  # Median number of trees planted
-This section provides an overview of county involvement in, and response to, the tree planting survey.
+  ) %>%
-
+  arrange(desc(total_trees_planted))  # Sort by total trees planted in descending order
-In the table below, we aggregate plantings by County. The results are provided in descending order of Total Trees Planted.
+
-```{r county-summary, echo=FALSE, warning=FALSE, message=FALSE}
+# Format the table to display the total number of records and trees planted
-# Summarize the data by County
+region_summary_data_formatted <- region_summary_data %>%
-county_summary_data <- survey_data %>%
+  mutate(
-  group_by(County) %>%
+    total_trees_planted = scales::comma(total_trees_planted),  # Add commas to the total number of trees
-  summarise(
+    total_records = scales::comma(total_records),  # Add commas to the total number of records
-    total_records = n(),  # Count the number of records in each county
+    mean_trees_planted = round(mean_trees_planted, 1),  # Round mean for readability
-    total_trees_planted = sum(`Number of Trees Planted`, na.rm = TRUE),  # Sum of trees planted in each county
+    median_trees_planted = round(median_trees_planted, 1)  # Round median for readability
-    mean_trees_planted = mean(`Number of Trees Planted`, na.rm = TRUE),  # Mean number of trees planted
+  )
-    median_trees_planted = median(`Number of Trees Planted`, na.rm = TRUE)  # Median number of trees planted
+
-  ) %>%
+# Print the summary table
-  arrange(desc(total_trees_planted))  # Sort by total trees planted in descending order
+region_summary_data_formatted %>%
-
+  knitr::kable(col.names = c("Region", "Total Submissions", "Total Trees Planted", "Mean", "Median"),
-# Format the table to display the total number of records and trees planted
+               caption = "Total Records, Trees Planted, Mean, and Median by Region (Sorted by Trees Planted)") %>%
-county_summary_data_formatted <- county_summary_data %>%
+  kableExtra::kable_styling(full_width = F, position = "center", bootstrap_options = c("striped", "hover"))
-  mutate(
+
-    total_trees_planted = scales::comma(total_trees_planted),  # Add commas to the total number of trees
+```
-    total_records = scales::comma(total_records),  # Add commas to the total number of records
+
-    mean_trees_planted = round(mean_trees_planted, 1),  # Round mean for readability
+## County Overview
-    median_trees_planted = round(median_trees_planted, 1)  # Round median for readability
+This section provides an overview of county involvement in, and response to, the tree planting survey.
-  )
+
-
+In the table below, we aggregate plantings by County. The results are provided in descending order of Total Trees Planted.
-# Print the summary table
+```{r county-summary, echo=FALSE, warning=FALSE, message=FALSE}
-county_summary_data_formatted %>%
+# Summarize the data by County
-  knitr::kable(col.names = c("County", "Total Submissions", "Total Trees Planted", "Mean", "Median"),
+county_summary_data <- survey_data %>%
-               caption = "Total Records, Trees Planted, Mean, and Median by County (Sorted by Trees Planted)") %>%
+  group_by(County) %>%
-  kableExtra::kable_styling(full_width = F, position = "center", bootstrap_options = c("striped", "hover"))
+  summarise(
-
+    total_records = n(),  # Count the number of records in each county
-```
+    total_trees_planted = sum(`Number of Trees Planted`, na.rm = TRUE),  # Sum of trees planted in each county
-
+    mean_trees_planted = mean(`Number of Trees Planted`, na.rm = TRUE),  # Mean number of trees planted
-
+    median_trees_planted = median(`Number of Trees Planted`, na.rm = TRUE)  # Median number of trees planted
-## Species Overview
+  ) %>%
-This section details species plantings. The counts below are the number of occurrences in which each tree species was reported as planted, not the number of individual trees of that species, so they indicate popularity rather than volume.
+  arrange(desc(total_trees_planted))  # Sort by total trees planted in descending order
-
+
-```{r species-detail, echo=FALSE, message=FALSE}
+# Format the table to display the total number of records and trees planted
-# Load the required libraries
+county_summary_data_formatted <- county_summary_data %>%
-library(tidyverse)
+  mutate(
-# Count unique values in 'Generic.Species.of.Tree' and 'Precise.Species.of.Tree', handling NA and sorting
+    total_trees_planted = scales::comma(total_trees_planted),  # Add commas to the total number of trees
-generic_species_count <- species_data %>%
+    total_records = scales::comma(total_records),  # Add commas to the total number of records
-  count(`Generic.Species.of.Tree`) %>%
+    mean_trees_planted = round(mean_trees_planted, 1),  # Round mean for readability
-  mutate(
+    median_trees_planted = round(median_trees_planted, 1)  # Round median for readability
-    `Generic.Species.of.Tree` = if_else(is.na(`Generic.Species.of.Tree`), "Null Response", `Generic.Species.of.Tree`),
+  )
-    `Generic.Species.of.Tree` = str_replace_all(`Generic.Species.of.Tree`, "_", " "), # Replace underscores with spaces
+
-    `Generic.Species.of.Tree` = str_to_title(`Generic.Species.of.Tree`) # Convert to Title Case
+# Print the summary table
-  ) %>%
+county_summary_data_formatted %>%
-  arrange(desc(n)) # Sort by count in descending order
+  knitr::kable(col.names = c("County", "Total Submissions", "Total Trees Planted", "Mean", "Median"),
-
+               caption = "Total Records, Trees Planted, Mean, and Median by County (Sorted by Trees Planted)") %>%
-precise_species_count <- species_data %>%
+  kableExtra::kable_styling(full_width = F, position = "center", bootstrap_options = c("striped", "hover"))
-  count(`Precise.Species.of.Tree`) %>%
+
-  mutate(
+```
-    `Precise.Species.of.Tree` = if_else(is.na(`Precise.Species.of.Tree`), "Null Response", `Precise.Species.of.Tree`),
+
-    `Precise.Species.of.Tree` = str_replace_all(`Precise.Species.of.Tree`, "_", " "), # Replace underscores with spaces
+
-    `Precise.Species.of.Tree` = str_to_title(`Precise.Species.of.Tree`) # Convert to Title Case
+## Species Overview
-  ) %>%
+This section details species plantings. The counts below are the number of occurrences in which each tree species was reported as planted, not the number of individual trees of that species, so they indicate popularity rather than volume.
-  arrange(desc(n)) # Sort by count in descending order
+
-
+```{r species-detail, echo=FALSE, message=FALSE}
-# Print the results
+#library(tidyverse)
-print(generic_species_count)
+# Count unique values in 'Generic.Species.of.Tree' and 'Precise.Species.of.Tree', handling NA and sorting
-print(precise_species_count)
+generic_species_count <- species_data %>%
-```
+  count(`Generic.Species.of.Tree`) %>%
-
+  mutate(
-## Tree Count
+    `Generic.Species.of.Tree` = if_else(is.na(`Generic.Species.of.Tree`), "Null Response", `Generic.Species.of.Tree`),
-In this section, we present summary statistics for the number of trees planted, as reported by all participants in the tree planting survey.
+    `Generic.Species.of.Tree` = str_replace_all(`Generic.Species.of.Tree`, "_", " "), # Replace underscores with spaces
-
+    `Generic.Species.of.Tree` = str_to_title(`Generic.Species.of.Tree`) # Convert to Title Case
-```{r summary-stats, echo=FALSE, warning=FALSE, message=FALSE}
+  ) %>%
-# Calculate summary statistics
+  arrange(desc(n)) # Sort by count in descending order
-summary_stats <- summary(survey_data$`Number of Trees Planted`)  # summary() handles NA values itself
+
-```
+precise_species_count <- species_data %>%
-
+  count(`Precise.Species.of.Tree`) %>%
-Below is a summary of the `Number of Trees Planted` across participants:
+  mutate(
-
+    `Precise.Species.of.Tree` = if_else(is.na(`Precise.Species.of.Tree`), "Null Response", `Precise.Species.of.Tree`),
-| Statistic   | Value       |
+    `Precise.Species.of.Tree` = str_replace_all(`Precise.Species.of.Tree`, "_", " "), # Replace underscores with spaces
-|-------------|-------------|
+    `Precise.Species.of.Tree` = str_to_title(`Precise.Species.of.Tree`) # Convert to Title Case
-| Min         | `r summary_stats["Min"]`  |
+  ) %>%
-| 1st Qu.     | `r summary_stats["1st Qu."]` |
+  arrange(desc(n)) # Sort by count in descending order
-| Median      | `r summary_stats["Median"]` |
+
-| Mean        | `r summary_stats["Mean"]` |
+# Print the results
-| 3rd Qu.     | `r summary_stats["3rd Qu."]` |
+print(generic_species_count)
-| Max         | `r summary_stats["Max"]` |
+print(precise_species_count)
-
+```
-The summary statistics for the number of trees planted provide insight into the distribution of trees planted by all participants in the tree planting surveys. While the median value gives us a sense of the "typical" number of trees planted, the mean might be skewed by a few participants planting a very large number of trees.
+
 ## Tree Count
 In this section, we present summary statistics for the number of trees planted, as reported by all participants in the tree planting survey.
 ```{r summary-stats, echo=FALSE, warning=FALSE, message=FALSE}
 # Calculate summary statistics; summary() counts NA values itself, so na.rm is not needed
 summary_stats <- summary(survey_data$`Number of Trees Planted`)
 ```
 Below is a summary of the `Number of Trees Planted` across participants:
 | Statistic   | Value       |
 |-------------|-------------|
 | Min         | `r summary_stats["Min"]`  |
 | 1st Qu.     | `r summary_stats["1st Qu."]` |
 | Median      | `r summary_stats["Median"]` |
 | Mean        | `r summary_stats["Mean"]` |
 | 3rd Qu.     | `r summary_stats["3rd Qu."]` |
 | Max         | `r summary_stats["Max"]` |
 The summary statistics for the number of trees planted provide insight into the distribution of trees planted by all participants in the tree planting surveys. While the median value gives us a sense of the "typical" number of trees planted, the mean might be skewed by a few participants planting a very large number of trees.
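
The point about the mean being pulled upward by a few very large plantings can be illustrated with a minimal base R sketch. The `trees_planted` vector below is hypothetical and is not taken from the survey data; it only shows how one large submission separates the mean from the median.

```r
# Hypothetical tree counts per submission (illustrative values, NOT survey data)
trees_planted <- c(1, 2, 2, 3, 5, 8, 10, 5000)

# Central tendency: the median resists outliers, the mean does not
mean_trees <- mean(trees_planted)
median_trees <- median(trees_planted)

# Share of all trees contributed by the single largest submission
top_share <- max(trees_planted) / sum(trees_planted) * 100

cat("Median:", median_trees, "\n")
cat("Mean:", round(mean_trees, 1), "\n")
cat("Largest submission share (%):", round(top_share, 1), "\n")
```

When the mean is far above the median, as in this sketch, reporting both (or the share held by the largest submissions) gives a fairer picture than the mean alone.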