Update text, add species data, add labels for professional.
parent
b25e1ba2c3
commit
4ae423f023
100
report.Rmd
@@ -3,9 +3,8 @@ title: "25 Million Trees Initiative Survey Report"
author:
  - name: Nicholas Hepler <nicholas.hepler@its.ny.gov>
    affiliation: Office of Information Technology Services
  - name: Annabel Gregg <annabel.gregg@dec.ny.gov>
    affiliation: Department of Environmental Conservation
date: "`r format(Sys.Date(), '%B, %d, %Y')`"
keywords: "keyword1, keyword2"
output: html_document
---

@@ -15,9 +14,13 @@ library(tidyverse)
library(lubridate)
library(ggplot2)

# Read the CSV file into a dataframe
file_path <- "data/_25_Million_Trees_Initiative_Survey_0.csv"
survey_data <- read_csv(file_path)
# Read the CSV files into a dataframe
survey_path <- "data/_25_Million_Trees_Initiative_Survey_0.csv"
survey_data <- read_csv(survey_path)

species_path <- "data/species_planted_4.csv"
species_data <- read.csv(species_path)


# Convert the CreationDate field to a proper datetime object (if applicable)
survey_data <- survey_data %>%
@@ -32,9 +35,6 @@ excluded_count <- survey_data %>%
used_count <- survey_data %>%
  filter(`Exclude Result` == 0) %>%
  nrow()

survey_data <- survey_data %>%
  filter(`Exclude Result` == 0)
```

# {.tabset .tabset-fade .tabset-pills}
@@ -61,50 +61,56 @@ As more individuals contribute their data to the Tree Tracker, the initiative's

## Submission Overview

This section contains information about surveys to the Tree Tracker Tool.
This section provides an overview of the surveys submitted to the Tree Tracker Tool, detailing the survey period, exclusions, validation processes, and key data checks to ensure accuracy and consistency.

### Survey Period and Exclusions

This report covers the period from **`r format(min(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")`** to **`r format(max(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")`**, with a total of **`r nrow(survey_data)`** records. Of these, **`r used_count`** records were included in the analysis.
The report covers the survey period from **`r format(min(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")`** to **`r format(max(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")`**, including a total of **`r nrow(survey_data)`** records. Out of these, **`r used_count`** records were deemed valid and included in the analysis.

- **Exclusions**: **`r excluded_count`** records have been excluded from the analysis. The primary reasons for exclusion include:
  - **Double Count**: Some records were excluded to prevent duplication of data (e.g., surveys or submissions that were entered multiple times).
  - **Test Data**: Some submissions were excluded as they were entered for testing purposes and do not represent actual survey submissions.
Exclusions were applied to **`r excluded_count`** records, which were removed due to various reasons, such as:

These records are identified by the `Exclude Result` field, where a value of **1** indicates the record was excluded due to one of these reasons.
- **Double Count**: Some submissions were identified as duplicates and excluded to prevent data redundancy.
- **Test Data**: Entries that were intended solely for testing purposes were excluded, as they do not represent actual survey data.

- **Included Records**: **`r used_count`** records have been included in the report. These are valid survey submissions, marked with a value of **0** in the `Exclude Result` field, indicating they are legitimate data points.
These excluded records are marked with a value of **1** in the `Exclude Result` field. The remaining **`r used_count`** records, marked with a **0**, represent legitimate data points that were included in the analysis.

---
abstract: "This report was generated on: **`r format(Sys.Date(), '%B, %d, %Y')`** for the following Reporting Period: **`r format(min(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")`** to **`r format(max(survey_data$CreationDate, na.rm = TRUE), "%B %d, %Y")`**. **`r used_count`** records were used in this analysis."
---

### Survey Validation and Data Consistency

To ensure data integrity, several validation steps are applied to survey submissions:

- **Required Fields**:
  - **Who Planted The Tree(s)?**: A descriptor of the participant's role in the tree planting effort.
  - **Number of Trees**: The number of trees planted is a required field, and users cannot submit the survey without providing this information.
  - **Geographic Data**: Geographic coordinates (latitude and longitude) are also required, and users must provide this data when submitting their survey.
  - **Start Date of Planting**:
  - **End Date of Planting**:
  - **Location**: Geographic coordinates (latitude and longitude) are also required, and users must provide this data when submitting their survey.

- **Geographic Validation**: Once geographic coordinates are entered, they are checked against official civil boundaries to ensure the accuracy of locality, county, and region data. In rare cases, this check may fail due to discrepancies in coordinates, but such records are corrected before inclusion in the analysis.
- **Response Validation**:
  - **Geographic Validation**: Once geographic coordinates are entered, they are checked against official civil boundaries to provide accurate nominal locality, county, and region data. In rare cases, this check may fail because of a service dependency, but such records are corrected before inclusion in the analysis.

- **Data Correction for Missing Information**: In cases where certain critical fields (such as geographic location or number of trees planted) are missing due to system issues, records are corrected prior to report generation. This ensures that only complete and accurate records are included in the analysis.
- **Date Validation and Logic**: Users cannot enter planting dates prior to the start date of the initiative. The system enforces this restriction, and any records with such dates are not allowed to be submitted. Additionally, users cannot enter a planting end date that occurs before the planting start date.

- **Date Logic**:
  - **Program Start Date**: Users cannot enter planting dates prior to the official Program Start Date. The system enforces this restriction, and any records with such dates are not allowed to be submitted.

- **Format and Consistency Checks**:
  - **Email Format**: The email addresses entered in the survey are validated to ensure they follow the correct format.
  - **Optional Questions**: Even optional questions undergo validation to ensure the entered data meets the expected format or logic, providing further consistency and accuracy.
  - **Email Format**: The email addresses entered in the survey are validated to ensure they follow the correct format.

By applying these validation checks, the integrity and consistency of the data are ensured, allowing for meaningful analysis of tree planting surveys.
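
The checks described above are enforced by the survey tool itself, not by this report. As a rough illustration only, the same rules could be re-checked on exported data along the following lines. This is a minimal base R sketch: the column names (`planter_type`, `tree_count`, `latitude`, `longitude`, `start_date`, `end_date`, `email`), the `program_start` value, and the email pattern are all assumptions made for the example and are not taken from the survey schema.

```r
# Hypothetical column names and program start date; adjust to the real survey schema.
program_start <- as.Date("2024-01-01")  # assumed placeholder, not the official date

validate_submissions <- function(df, program_start) {
  # Deliberately loose pattern: something@something.something, no whitespace
  email_pattern <- "^[^@[:space:]]+@[^@[:space:]]+\\.[^@[:space:]]+$"
  data.frame(
    # Required fields: planter role, tree count, and both coordinates must be present
    has_required = !is.na(df$planter_type) & !is.na(df$tree_count) &
      !is.na(df$latitude) & !is.na(df$longitude),
    # Planting may not start before the program start date
    start_ok = !is.na(df$start_date) & df$start_date >= program_start,
    # Planting may not end before it starts
    end_ok = !is.na(df$start_date) & !is.na(df$end_date) &
      df$end_date >= df$start_date,
    # Email is optional: a missing value passes, a present value must match the pattern
    email_ok = is.na(df$email) | grepl(email_pattern, df$email)
  )
}

submissions <- data.frame(
  planter_type = c("community", "landowner", NA),
  tree_count = c(12, 3, 5),
  latitude = c(42.65, 43.05, 40.71),
  longitude = c(-73.76, -76.15, -74.01),
  start_date = as.Date(c("2024-05-01", "2023-11-15", "2024-06-10")),
  end_date = as.Date(c("2024-05-02", "2023-11-20", "2024-06-01")),
  email = c("a@example.org", "not-an-email", NA),
  stringsAsFactors = FALSE
)

checks <- validate_submissions(submissions, program_start)
checks$all_ok <- checks$has_required & checks$start_ok &
  checks$end_ok & checks$email_ok
print(checks)
```

Returning one logical column per rule, rather than a single pass/fail flag, makes it easier to see which rule a record fails before it is corrected or excluded.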

### Submission Trend

With this context in mind, the following visualization shows the trend in the total number of submissions over the survey period, highlighting any notable patterns.
The following visualization illustrates the trend in the total number of submissions throughout the survey period, providing insights into any patterns or changes in submission activity.

```{r submission-trend, echo=FALSE, message=FALSE, fig.height=6, fig.width=8}

library(ggplot2)
library(dplyr)

survey_data <- survey_data %>%
  filter(`Exclude Result` == 0)

survey_data$CreationDate <- as.Date(survey_data$CreationDate)

# Summarize the data to calculate the total number of submissions by CreationDate
@@ -135,7 +141,7 @@ ggplot(summary_data, aes(x = CreationDate, y = total_submissions)) +
```

### Response Rates
The following table provides the response rates for a set of optional fields in the survey dataset. Each field represents a different aspect of the survey, and the response rate is calculated as the percentage of respondents who provided a valid answer.
The table below shows the response rates for a selection of optional fields within the survey. Each field represents a different aspect of the survey, and the response rate reflects the percentage of respondents who provided valid answers for each field.

- **Planter Contact Email**: The percentage of respondents who provided their email address.
- **Funding Source**: The percentage of respondents who identified their funding source.
@@ -144,7 +150,7 @@ The following table provides the response rates for a set of optional fields in
- **Source of Trees**: The percentage of respondents who reported the source of the trees they planted.
- **Species Planted**: The percentage of respondents who provided the species of tree(s) they planted.

The data is sorted from the highest to the lowest response rate, allowing for easy identification of fields with higher or lower levels of respondent engagement. This helps to identify areas where respondents may have been more likely to provide answers, as well as fields that could benefit from clarification or further encouragement to respond.
This breakdown helps identify which survey fields received higher levels of engagement, and which may require further clarification or encouragement to improve response rates.
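
For readers who want to see the arithmetic, a response rate of this kind can be sketched as the share of non-empty answers per field. The example below is a minimal base R illustration, not the report's own chunk: the sample values are made up, and treating both `NA` and empty strings as "no response" is an assumption about how blanks are stored in the export.

```r
# Hypothetical optional fields and values; the real column names may differ.
responses <- data.frame(
  `Planter Contact Email` = c("a@example.org", NA, "", "b@example.org"),
  `Funding Source` = c("grant", "private", "grant", NA),
  `Source of Trees` = c(NA, NA, "nursery", NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Treat NA and empty strings as "no response" (an assumption about how blanks are stored)
answered <- function(x) !is.na(x) & trimws(as.character(x)) != ""

# Percentage of rows with an answer, one value per field
rates <- unname(vapply(responses, function(col) mean(answered(col)), numeric(1)))

response_rates <- data.frame(
  field = names(responses),
  response_rate = round(100 * rates, 1),
  stringsAsFactors = FALSE
)

# Sort from highest to lowest response rate
response_rates <- response_rates[order(-response_rates$response_rate), ]
print(response_rates)
```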

```{r response-rate, echo=FALSE, message=FALSE}
# List of fields to check for response rates, with special handling for 'Total Number of Species Planted'
@@ -198,7 +204,8 @@ ggplot(survey_data, aes(x = `Who Planted The Tree(s)?`)) +
    "agency" = "State Agency",
    "community" = "Community Organization",
    "landowner" = "Private Landowner",
    "municipality" = "Municipal Government"
    "municipality" = "Municipal Government",
    "professional" = "Paid Professional"
  )) +
  theme_minimal(base_size = 14) +
  theme(
@@ -242,7 +249,8 @@ ggplot(summary_data, aes(x = `Who Planted The Tree(s)?`, y = total_trees)) +
    "agency" = "State Agency",
    "community" = "Community Organization",
    "landowner" = "Private Landowner",
    "municipality" = "Municipal Government"
    "municipality" = "Municipal Government",
    "professional" = "Paid Professional"
  )) +
  theme_minimal(base_size = 14) + # Adjusted base font size for clarity
  theme(
@@ -268,7 +276,8 @@ summary_data <- summary_data %>%
    "agency" = "State Agency",
    "community" = "Community Organization",
    "landowner" = "Private Landowner",
    "municipality" = "Municipal Government")
    "municipality" = "Municipal Government",
    "professional" = "Paid Professional")
  )

# Add percentage column
@@ -326,7 +335,7 @@ region_summary_data_formatted %>%
## County Overview
This section provides an overview of the counties involved and their responses to the tree planting survey.

In the table below, we aggregate plantings by Region. The results are provided in descending order of Total Trees Planted.
In the table below, we aggregate plantings by County. The results are provided in descending order of Total Trees Planted.
```{r county-summary, echo=FALSE, warning=FALSE, message=FALSE}
# Summarize the data by County
county_summary_data <- survey_data %>%
@@ -356,6 +365,37 @@ county_summary_data_formatted %>%

```


## Species Overview
The following section contains details on species plantings. These results count the survey entries in which each tree species was reported. They do not necessarily equal the number of trees of that species planted, but they can be used to indicate popularity.
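
To make the distinction between occurrences and tree totals concrete, the sketch below contrasts the two on a small made-up table. It uses base R only; the `species` and `trees_planted` columns and all values are hypothetical and chosen purely for illustration, since the species file used in this report is only counted by occurrence.

```r
# Hypothetical rows: one row per species entry in a submission.
species_example <- data.frame(
  species = c("red_maple", "red_maple", "white_oak", "red_maple"),
  trees_planted = c(10, 5, 200, 1),  # assumed column, for illustration only
  stringsAsFactors = FALSE
)

# Occurrences: how many entries mention each species
occurrences <- table(species_example$species)

# Trees: how many trees of each species those entries add up to
tree_totals <- tapply(species_example$trees_planted, species_example$species, sum)

print(occurrences)
print(tree_totals)
```

In this toy data, red maple is reported most often, while white oak accounts for most of the trees, which is why occurrence counts are best read as a popularity signal and not as planting volume.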

```{r species-detail, echo=FALSE, message=FALSE}
# Load the required libraries
library(tidyverse)
# Count unique values in 'Generic.Species.of.Tree' and 'Precise.Species.of.Tree', handling NA and sorting
generic_species_count <- species_data %>%
  count(`Generic.Species.of.Tree`) %>%
  mutate(
    `Generic.Species.of.Tree` = if_else(is.na(`Generic.Species.of.Tree`), "Null Response", `Generic.Species.of.Tree`),
    `Generic.Species.of.Tree` = str_replace_all(`Generic.Species.of.Tree`, "_", " "), # Replace underscores with spaces
    `Generic.Species.of.Tree` = str_to_title(`Generic.Species.of.Tree`) # Convert to Title Case
  ) %>%
  arrange(desc(n)) # Sort by count in descending order

precise_species_count <- species_data %>%
  count(`Precise.Species.of.Tree`) %>%
  mutate(
    `Precise.Species.of.Tree` = if_else(is.na(`Precise.Species.of.Tree`), "Null Response", `Precise.Species.of.Tree`),
    `Precise.Species.of.Tree` = str_replace_all(`Precise.Species.of.Tree`, "_", " "), # Replace underscores with spaces
    `Precise.Species.of.Tree` = str_to_title(`Precise.Species.of.Tree`) # Convert to Title Case
  ) %>%
  arrange(desc(n)) # Sort by count in descending order

# Print the results
print(generic_species_count)
print(precise_species_count)
```

## Tree Count
In this section, we present summary statistics for the number of trees planted by all participants in various tree planting surveys.
