Update Report Overview, Submission Analysis.

This commit is contained in:
Nick Heppler 2025-02-13 20:18:22 -05:00
parent deb30c8270
commit cda23941aa


@ -3,9 +3,12 @@ title: "25 Million Trees Initiative Survey Report"
author: author:
- name: Nicholas Hepler <nicholas.hepler@its.ny.gov> - name: Nicholas Hepler <nicholas.hepler@its.ny.gov>
affiliation: Office of Information Technology Services affiliation: Office of Information Technology Services
- name: Annabel Gregg <annabel.gregg@dec.ny.gov>
affiliation: Department of Environmental Conservation
date: "`r format(Sys.Date(), '%B, %d, %Y')`" date: "`r format(Sys.Date(), '%B, %d, %Y')`"
keywords: "keyword1, keyword2" keywords: "keyword1, keyword2"
output: html_document output:
html_document
--- ---
```{r setup, include=FALSE} ```{r setup, include=FALSE}
@ -21,7 +24,6 @@ survey_data <- read_csv(survey_path)
species_path <- "data/species_planted_4.csv" species_path <- "data/species_planted_4.csv"
species_data <- read.csv(species_path) species_data <- read.csv(species_path)
# Convert the CreationDate field to a proper datetime object (if applicable) # Convert the CreationDate field to a proper datetime object (if applicable)
survey_data <- survey_data %>% survey_data <- survey_data %>%
mutate(CreationDate = mdy_hms(CreationDate)) mutate(CreationDate = mdy_hms(CreationDate))
@ -37,9 +39,13 @@ used_count <- survey_data %>%
nrow() nrow()
``` ```
---
abstract: "This report was generated on: **`r format(Sys.Date(), '%B %d, %Y')`**, for the period beginning: **`r format(min(survey_data$CreationDate, na.rm = TRUE), '%B %d, %Y')`** and ending: **`r format(max(survey_data$CreationDate, na.rm = TRUE), '%B %d, %Y')`**. **`r used_count`** records were used in this analysis."
---
# {.tabset .tabset-fade .tabset-pills} # {.tabset .tabset-fade .tabset-pills}
## Summary ## Report Overview
### Background ### Background
@ -49,7 +55,7 @@ As part of this effort, DEC has launched the **Tree Tracker**, a tool for the pu
This report compiles the survey data collected via the Tree Tracker and provides detailed insights into the information submitted by New Yorkers. It aims to support DEC staff and executives in understanding the progress of the initiative and identifying areas for improvement in outreach and engagement. This report compiles the survey data collected via the Tree Tracker and provides detailed insights into the information submitted by New Yorkers. It aims to support DEC staff and executives in understanding the progress of the initiative and identifying areas for improvement in outreach and engagement.
### Purpose ### Purpose & Objectives
This report serves to present an overview of the data collected through the 25 Million Trees Initiative, offering insights into submission patterns, geographic distribution, and trends in tree planting activities. The report aims to: This report serves to present an overview of the data collected through the 25 Million Trees Initiative, offering insights into submission patterns, geographic distribution, and trends in tree planting activities. The report aims to:
@ -59,11 +65,7 @@ This report serves to present an overview of the data collected through the 25 M
As more individuals contribute their data to the Tree Tracker, the initiative's success will be better understood, and DEC can better align resources to further promote this critical program. As more individuals contribute their data to the Tree Tracker, the initiative's success will be better understood, and DEC can better align resources to further promote this critical program.
## Submission Overview ### Survey Period and Data Exclusions
This section provides an overview of the surveys submitted to the Tree Tracker Tool, detailing the survey period, exclusions, validation processes, and key data checks to ensure accuracy and consistency.
### Survey Period and Exclusions
The report covers the survey period from **`r format(min(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")`** to **`r format(max(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")`**, including a total of **`r nrow(survey_data)`** records. Out of these, **`r used_count`** records were deemed valid and included in the analysis. The report covers the survey period from **`r format(min(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")`** to **`r format(max(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")`**, including a total of **`r nrow(survey_data)`** records. Out of these, **`r used_count`** records were deemed valid and included in the analysis.
@ -74,52 +76,72 @@ Exclusions were applied to **`r excluded_count`** records, which were removed du
These excluded records are marked with a value of **1** in the `Exclude Result` field. The remaining **`r used_count`** records, marked with a **0**, represent legitimate data points that were included in the analysis. These excluded records are marked with a value of **1** in the `Exclude Result` field. The remaining **`r used_count`** records, marked with a **0**, represent legitimate data points that were included in the analysis.
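As a rough illustration of how the `Exclude Result` flag splits the data, the sketch below uses a small made-up data frame rather than the real survey export. The `toy_` names and the five sample rows are assumptions introduced only for this example, and base R is used so the snippet runs on its own.

```r
# Toy data standing in for the survey export; values are illustrative only.
toy_survey <- data.frame(
  id = 1:5,
  `Exclude Result` = c(0, 1, 0, 0, 1),
  check.names = FALSE  # keep the space in the column name
)

# Records flagged 1 are excluded; records flagged 0 are used in the analysis.
toy_used <- toy_survey[toy_survey$`Exclude Result` == 0, ]
toy_excluded <- toy_survey[toy_survey$`Exclude Result` == 1, ]

toy_used_count <- nrow(toy_used)
toy_excluded_count <- nrow(toy_excluded)

# Every record should fall into exactly one of the two groups.
stopifnot(toy_used_count + toy_excluded_count == nrow(toy_survey))
```

A check like the final `stopifnot()` call can help catch records whose flag is missing or holds an unexpected value.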
--- ### Survey Validation Process and Data Consistency
abstract: "This report was generated on: **`r format(Sys.Date(), '%B, %d, %Y')`** for the following Reporting Period: **`r format(min(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")`** to **`r format(max(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")`**. **`r used_count`** records were used in this analysis."
---
### Survey Validation and Data Consistency
To ensure data integrity, several validation steps are applied to survey submissions: To ensure data integrity, several validation steps are applied to survey submissions:
- **Required Fields**: - **Required Fields**:
- **Who Planted The Tree(s)?**: A descriptor of the particpant's role in the tree planting effort. - **Who Planted the Tree(s)?**: Describes the participant's role in the tree planting effort.
- **Number of Trees**: The number of trees planted is a required field, and users cannot submit the survey without providing this information. - **Number of Trees**: The number of trees planted during the planting period.
- **Start Date of Planting**: - **Start Date of Planting**: The date when planting began.
- **End Date of Planting**: - **End Date of Planting**: The date when planting was completed.
- **Location**: Geographic coordinates (latitude and longitude) are also required, and users must provide this data when submitting their survey. - **Location**: Geographic coordinates (latitude and longitude).
- **Response Validation**: - **Response Validation**:
- **Geographic Validation**: Once geographic coordinates are entered, they are checked against official civil boundaries to provide an accurate nominal locality, county, and region data. In rare cases, this check may fail due to service dependency, but such records are corrected before inclusion in the analysis. - **Geographic Validation**: Once geographic coordinates are entered, they are checked against official civil boundaries to provide an accurate nominal locality, county, and region data. In rare cases, this check may fail due to service dependency, but such records are corrected before inclusion in the analysis.
- **Date Validation and Logic**: Users cannot enter planting dates prior to the start date of the initiative. The system enforces this restriction, and any records with such dates are not allowed to be submitted. Additionally, users cannot enter a planting end date that occurs before the planting start date. - **Date Validation and Logic**: Users cannot enter planting dates prior to the start date of the initiative. The system enforces this restriction, and any records with such dates are not allowed to be submitted. Additionally, users cannot enter a planting end date that occurs before the planting start date.
- **Optional Questions**: Even optional questions undergo validation to ensure the entered data meets the expected format or logic, providing further consistency and accuracy. - **Optional Questions**: Even optional questions undergo validation to ensure the entered data meets the expected format or logic, providing further consistency and accuracy.
- **Email Format**: The email addresses entered in the survey are validated to ensure they follow the correct format. - **Email Format**: The email addresses entered in the survey are validated to ensure they follow the correct format.
By applying these validation checks, the integrity and consistency of the data is ensured, allowing for meaningful analysis of tree planting surveys. By applying these validation checks, the integrity and consistency of the data is ensured, allowing for meaningful analysis of tree planting surveys.
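The date and email rules above are enforced by the survey tool at submission time, but similar checks could be re-run against an export as a sanity test. The following is a minimal sketch under stated assumptions: the initiative start date, the column names, the sample rows, and the email pattern are all hypothetical and do not reflect the survey's actual configuration.

```r
# Hypothetical initiative start date, used only for illustration.
initiative_start <- as.Date("2024-01-01")

# Made-up submissions; column names are assumptions, not the real schema.
toy_submissions <- data.frame(
  start_date = as.Date(c("2024-04-01", "2023-11-15", "2024-05-10")),
  end_date   = as.Date(c("2024-04-03", "2023-11-20", "2024-05-01")),
  email      = c("planter@example.org", "not-an-email", NA),
  stringsAsFactors = FALSE
)

# Rule 1: planting cannot start before the initiative began.
valid_start <- toy_submissions$start_date >= initiative_start

# Rule 2: planting cannot end before it started.
valid_order <- toy_submissions$end_date >= toy_submissions$start_date

# Rule 3: email is optional, but must look like an address when provided.
# The pattern is a deliberately simple assumption, not the survey's rule.
email_pattern <- "^[^@[:space:]]+@[^@[:space:]]+\\.[^@[:space:]]+$"
valid_email <- is.na(toy_submissions$email) |
  grepl(email_pattern, toy_submissions$email)

# A record passes only if it satisfies all three rules.
toy_submissions$passes_all <- valid_start & valid_order & valid_email
```

Keeping each rule as its own logical vector makes it easy to report which specific check a record failed, rather than only whether it passed overall.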
### Submission Trend
The following visualization illustrates the trend in the total number of submissions throughout the survey period, providing insights into any patterns or changes in submission activity. ## Submission Analysis {.tabset}
```{r submission-trend, echo=FALSE, message=FALSE, fig.height=6, fig.width=8} ### Submission Trend Analysis
library(ggplot2) ```{r submission-trend-stats, echo=FALSE, message=FALSE}
library(dplyr) ## library(dplyr)
survey_data <- survey_data %>%
filter(`Exclude Result` == 0)
# Ensure CreationDate is in Date format
survey_data$CreationDate <- as.Date(survey_data$CreationDate) survey_data$CreationDate <- as.Date(survey_data$CreationDate)
# Summarize the data to calculate the total number of submissions by CreationDate # Summarize the data to calculate the total number of submissions by CreationDate
summary_data <- survey_data %>% summary_data <- survey_data %>%
filter(`Exclude Result` == 0) %>%
group_by(CreationDate) %>% group_by(CreationDate) %>%
summarise(total_submissions = n()) summarise(total_submissions = n(), .groups = "drop")
# Number of days that have elapsed between the first and last submission date
date_range <- range(summary_data$CreationDate)
elapsed_days <- as.integer(difftime(date_range[2], date_range[1], units = "days"))
# Number of days with 0 submissions
all_dates <- data.frame(CreationDate = seq.Date(date_range[1], date_range[2], by = "day"))
merged_data <- left_join(all_dates, summary_data, by = "CreationDate")
days_with_0_submissions <- sum(is.na(merged_data$total_submissions))
# Summary statistics based on the count of submissions
submission_summary <- summary(merged_data$total_submissions, na.rm = TRUE)
# Dates where submissions exceeded the 3rd quartile
third_quartile <- quantile(merged_data$total_submissions, 0.75, na.rm = TRUE)
dates_above_3rd_quartile <- merged_data %>%
filter(total_submissions > third_quartile) %>%
pull(CreationDate)
```
The survey has been active for **`r elapsed_days`** days. During this period, **`r days_with_0_submissions`** days had no submissions.
The following visualization illustrates the trend in the total number of submissions throughout the survey period, providing insights into any patterns or changes in submission activity.
```{r submission-trend-plot, echo=FALSE, message=FALSE, fig.height=6, fig.width=8}
#library(ggplot2)
# Plot Submission Trend
ggplot(summary_data, aes(x = CreationDate, y = total_submissions)) + ggplot(summary_data, aes(x = CreationDate, y = total_submissions)) +
geom_line(color = "#233f28", linewidth = 1) + # Change 'size' to 'linewidth' geom_line(color = "#233f28", linewidth = 1) +
geom_point(color = "#7e9084", size = 3) + geom_point(color = "#7e9084", size = 3) +
geom_smooth(method = "loess", color = "#face00", linewidth = 1, linetype = "dashed") + geom_smooth(method = "loess", color = "#face00", linewidth = 1, linetype = "dashed") +
labs( labs(
@ -133,14 +155,15 @@ ggplot(summary_data, aes(x = CreationDate, y = total_submissions)) +
axis.title = element_text(size = 12, color = "#233f28"), axis.title = element_text(size = 12, color = "#233f28"),
axis.text = element_text(size = 10, color = "#233f28"), axis.text = element_text(size = 10, color = "#233f28"),
plot.margin = margin(10, 10, 10, 10), plot.margin = margin(10, 10, 10, 10),
panel.grid.major = element_line(color = "#d9e1dd", linewidth = 0.3), # Change 'size' to 'linewidth' panel.grid.major = element_line(color = "#d9e1dd", linewidth = 0.3),
panel.background = element_rect(fill = "#d9e1dd"), panel.background = element_rect(fill = "#d9e1dd"),
axis.text.x = element_text(angle = 45, hjust = 1) axis.text.x = element_text(angle = 45, hjust = 1)
) + ) +
scale_x_date(date_labels = "%b %Y", date_breaks = "1 months") scale_x_date(date_labels = "%b %Y", date_breaks = "1 months")
``` ```
### Response Rates ### Survey Response Rates by Field
The table below shows the response rates for a selection of optional fields within the survey. Each field represents a different aspect of the survey, and the response rate reflects the percentage of respondents who provided valid answers for each field. The table below shows the response rates for a selection of optional fields within the survey. Each field represents a different aspect of the survey, and the response rate reflects the percentage of respondents who provided valid answers for each field.
- **Planter Contact Email**: The percentage of respondents who provided their email address. - **Planter Contact Email**: The percentage of respondents who provided their email address.
@ -179,16 +202,14 @@ sorted_response_rates
``` ```
## Participant Type Overview ## Participant Type Analysis {.tabset}
This section provides an overview of the different types of participants involved in tree planting surveys. The data collected includes submissions from various categories of participants, including state agencies, community organizations, private landowners, and municipal governments. By understanding the distribution of these participant types and the scope of their contributions, we can gain insights into the reach and diversity of the program. The following visualizations highlight the number of surveys and total trees planted by each participant type. ### Number of Submissions
### Participant Type: Number of Submissions
The first visualization shows the distribution of the number of tree planting surveys based on the participant type. This breakdown helps highlight which groups are contributing most to the tree planting initiative. The first visualization shows the distribution of the number of tree planting surveys based on the participant type. This breakdown helps highlight which groups are contributing most to the tree planting initiative.
```{r participant-type-surveys, echo=FALSE, message=FALSE} ```{r participant-type-surveys, echo=FALSE, message=FALSE}
library(ggplot2) #library(ggplot2)
library(dplyr) #library(dplyr)
ggplot(survey_data, aes(x = `Who Planted The Tree(s)?`)) + ggplot(survey_data, aes(x = `Who Planted The Tree(s)?`)) +
geom_bar(fill = "#233f28", color = "#7e9084") + geom_bar(fill = "#233f28", color = "#7e9084") +
@ -220,7 +241,7 @@ ggplot(survey_data, aes(x = `Who Planted The Tree(s)?`)) +
``` ```
### Participant Type: Total Trees Planted ### Total Trees Planted
This second plot provides a breakdown of the total number of trees planted by participant type. This visualization helps to assess the contribution of each participant group to the overall impact of the tree planting program. This second plot provides a breakdown of the total number of trees planted by participant type. This visualization helps to assess the contribution of each participant group to the overall impact of the tree planting program.
```{r participant-type-planted, echo=FALSE, message=FALSE} ```{r participant-type-planted, echo=FALSE, message=FALSE}
@ -370,8 +391,7 @@ county_summary_data_formatted %>%
The following section contains details on species plantings. These results indicate the number of occurrences where the tree species was planted. They are not necessarily the number of those trees planted, but can be used to indicate popularity. The following section contains details on species plantings. These results indicate the number of occurrences where the tree species was planted. They are not necessarily the number of those trees planted, but can be used to indicate popularity.
```{r species-detail, echo=FALSE, message=FALSE} ```{r species-detail, echo=FALSE, message=FALSE}
# Load the required libraries #library(tidyverse)
library(tidyverse)
# Count unique values in 'Generic.Species.of.Tree' and 'Precise.Species.of.Tree', handling NA and sorting # Count unique values in 'Generic.Species.of.Tree' and 'Precise.Species.of.Tree', handling NA and sorting
generic_species_count <- species_data %>% generic_species_count <- species_data %>%
count(`Generic.Species.of.Tree`) %>% count(`Generic.Species.of.Tree`) %>%