Update README.md
@@ -1,76 +1,139 @@
|
|||||||
|
|
||||||
# Tree Planting Data Validator 🌳
|
# Tree Planting Data Validator 🌳
|
||||||
|
|
||||||
A lightweight Python script that validates tree planting records stored in Excel files. It reads the data, cleans trailing blank rows, counts records, and produces a clear validation report.
|
**A robust Python script to validate Excel-based tree planting records.**
|
||||||
|
|
||||||
Perfect for NGOs, reforestation projects, environmental organizations, or anyone managing large-scale tree planting data.
|
This tool ensures that tree planting data submissions meet quality standards before being imported into databases or mapping systems. It performs comprehensive checks on participant types, planting dates, geographic coordinates, and tree sources.
|
||||||
|
|
||||||
## Features
|
---
|
||||||
|
|
||||||
- Reads Excel files with customizable header row
|
## ✨ Features
|
||||||
- Automatically cleans trailing/empty rows
|
|
||||||
- Reports total records processed
|
|
||||||
- Extensible validation framework (ready for additional checks)
|
|
||||||
- Command-line interface with useful options
|
|
||||||
- Optional CSV error report export
|
|
||||||
|
|
||||||
## Requirements
|
- **Flexible column matching**: Automatically detects common column name variations
|
||||||
|
- **Data cleaning**: Removes blank rows and normalizes input
|
||||||
|
- **Comprehensive validation**:
|
||||||
|
- Required fields check
|
||||||
|
- Participant type validation
|
||||||
|
- Date range and format validation (no future dates, no pre-2024 planting)
|
||||||
|
- Geographic boundary validation (Northeast US focus)
|
||||||
|
- Tree source categorization
|
||||||
|
- **Clear, human-readable error reports** grouped by Excel row
|
||||||
|
- **Command-line interface** with optional CSV error export
|
||||||
|
- **Debug mode** for troubleshooting
|
||||||
|
|
||||||
- Python 3.8+
|
---
|
||||||
- pandas
|
|
||||||
- openpyxl (for `.xlsx` support)
|
|
||||||
|
|
||||||
## Installation
|
## 📋 Validation Rules
|
||||||
|
|
||||||
|
| Field | Requirements |
|
||||||
|
|--------------------|--------------|
|
||||||
|
| **Participant Type** | Must be one of: `landowner`, `community`, `professional`, `municipality`, `agency` |
|
||||||
|
| **Plant Start** | Required, valid date, on or after Jan 1, 2024, not in the future |
|
||||||
|
| **Plant End** | Required, valid date, not in the future |
|
||||||
|
| **Latitude** | Required, numeric, between **40.4774** and **45.01585** |
|
||||||
|
| **Longitude** | Required, numeric, between **-79.5** and **-71.1** |
|
||||||
|
| **Tree Source** | Required, one of: `local`, `regional`, `national`, `plant_sale`, `self`, `other` |
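The rules in this table can be expressed as a small per-record check. The sketch below is illustrative only and is not taken from `validate_tree_planting_data.py`: the function names (`parse_date`, `check_record`), the dictionary keys, the ISO date format, the error messages, and the case-insensitive matching of category values are all assumptions.

```python
from datetime import date, datetime

# Allowed values and bounds copied from the Validation Rules table.
PARTICIPANT_TYPES = {"landowner", "community", "professional", "municipality", "agency"}
TREE_SOURCES = {"local", "regional", "national", "plant_sale", "self", "other"}
LAT_RANGE = (40.4774, 45.01585)
LON_RANGE = (-79.5, -71.1)
EARLIEST_START = date(2024, 1, 1)


def parse_date(value):
    """Accept date/datetime objects or ISO strings (YYYY-MM-DD); return None if unparseable."""
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    try:
        return datetime.strptime(str(value).strip(), "%Y-%m-%d").date()
    except ValueError:
        return None


def check_record(record, today=None):
    """Hypothetical helper: return a list of (field, message) tuples for one record."""
    today = today or date.today()
    errors = []

    def present(key):
        value = record.get(key)
        return value is not None and str(value).strip() != ""

    if str(record.get("participant_type", "")).strip().lower() not in PARTICIPANT_TYPES:
        errors.append(("Participant Type", "Invalid value"))

    # Only Plant Start has a lower bound; both dates must not be in the future.
    for key, label, earliest in (("plant_start", "Plant Start", EARLIEST_START),
                                 ("plant_end", "Plant End", None)):
        if not present(key):
            errors.append((label, "Missing"))
            continue
        parsed = parse_date(record[key])
        if parsed is None:
            errors.append((label, "Invalid date"))
        elif parsed > today:
            errors.append((label, "Future date"))
        elif earliest and parsed < earliest:
            errors.append((label, "Before 2024-01-01"))

    for key, label, (low, high) in (("latitude", "Latitude", LAT_RANGE),
                                    ("longitude", "Longitude", LON_RANGE)):
        if not present(key):
            errors.append((label, "Missing"))
            continue
        try:
            number = float(record[key])
        except (TypeError, ValueError):
            errors.append((label, "Not numeric"))
            continue
        if not low <= number <= high:
            errors.append((label, "Out of range"))

    if str(record.get("tree_source", "")).strip().lower() not in TREE_SOURCES:
        errors.append(("Tree Source", "Invalid value"))

    return errors
```

Passing `today` explicitly keeps the future-date check reproducible in tests.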
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🚀 Installation
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
1. Clone the repository:
|
||||||
```bash
|
```bash
|
||||||
git clone https://your-gitea-instance.com/yourusername/tree-planting-data-validator.git
|
git clone <your-repo-url>
|
||||||
cd tree-planting-data-validator
|
cd tree-planting-validator
|
||||||
|
|
||||||
# Recommended: use a virtual environment
|
```
|
||||||
|
|
||||||
|
2. (Optional but recommended) Create a virtual environment:
|
||||||
|
```bash
|
||||||
python -m venv venv
|
python -m venv venv
|
||||||
source venv/bin/activate # Windows: venv\Scripts\activate
|
source venv/bin/activate # Linux/Mac
|
||||||
|
# venv\Scripts\activate # Windows
|
||||||
|
```
|
||||||
|
|
||||||
|
3. Install dependencies:
|
||||||
|
```bash
|
||||||
pip install pandas openpyxl
|
pip install pandas openpyxl
|
||||||
Usagebash
|
```
|
||||||
|
|
||||||
python validate_tree_planting_data.py "path/to/your_data.xlsx"
|
---
|
||||||
|
|
||||||
OptionsArgument
|
## 📖 Usage
|
||||||
Description
|
|
||||||
Default
|
|
||||||
excel_file_path
|
|
||||||
Path to the Excel file
|
|
||||||
-
|
|
||||||
--header-row
|
|
||||||
Row number containing headers
|
|
||||||
1
|
|
||||||
--debug
|
|
||||||
Show debug information
|
|
||||||
False
|
|
||||||
--output
|
|
||||||
Save errors to CSV file
|
|
||||||
-
|
|
||||||
|
|
||||||
Example:bash
|
### Basic usage
|
||||||
|
```bash
|
||||||
|
python validate_tree_planting_data.py "path/to/your/data.xlsx"
|
||||||
|
```
|
||||||
|
|
||||||
# Basic usage
|
### With options
|
||||||
python validate_tree_planting_data.py planting_data_2025.xlsx
|
```bash
|
||||||
|
python validate_tree_planting_data.py "data.xlsx" \
|
||||||
|
--header-row 2 \
|
||||||
|
--debug \
|
||||||
|
--output errors_report.csv
|
||||||
|
```
|
||||||
|
|
||||||
# With custom header and output report
|
---
|
||||||
python validate_tree_planting_data.py data.xlsx --header-row 2 --output errors.csv --debug
|
|
||||||
|
|
||||||
Project Structure
|
## Command Line Arguments
|
||||||
|
|
||||||
tree-planting-data-validator/
|
| Argument | Description | Default |
|
||||||
├── validate_tree_planting_data.py # Main script
|
|-------------------|--------------------------------------------------|------------|
|
||||||
├── README.md
|
| `excel_file_path` | Path to the Excel file (required) | - |
|
||||||
├── LICENSE
|
| `--header-row` | Row number containing column headers | `1` |
|
||||||
└── examples/ # (optional) sample Excel files
|
| `--debug` | Enable debug output | `False` |
|
||||||
|
| `--output` | Save validation errors to a CSV file | None |
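For readers who want to see how these arguments fit together, here is a minimal `argparse` sketch that mirrors the table. It is an assumption about how such a parser could be written, not the script's actual code; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the Command Line Arguments table."""
    parser = argparse.ArgumentParser(
        description="Validate tree planting records in an Excel file."
    )
    parser.add_argument("excel_file_path", help="Path to the Excel file (required)")
    parser.add_argument("--header-row", type=int, default=1,
                        help="Row number containing column headers")
    parser.add_argument("--debug", action="store_true", help="Enable debug output")
    parser.add_argument("--output", default=None,
                        help="Save validation errors to a CSV file")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["data.xlsx", "--header-row", "2", "--output", "errors_report.csv"]
)
print(args.excel_file_path, args.header_row, args.debug, args.output)
```

If `--header-row` is 1-based, as the default of `1` suggests, a pandas-based reader would need `header=args.header_row - 1`, because `pandas.read_excel` counts header rows from zero.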
|
||||||
|
|
||||||
Roadmap / Future EnhancementsField-level validation (dates, coordinates, species names, etc.)
|
---
|
||||||
Duplicate detection
|
|
||||||
Summary statistics (trees per species, region, etc.)
|
|
||||||
Web interface / CLI dashboard
|
|
||||||
Support for CSV input
|
|
||||||
|
|
||||||
LicenseMIT License — feel free to use and adapt for your tree-planting projects.
|
## Example Output
|
||||||
|
|
||||||
|
**Success:**
|
||||||
|
```
|
||||||
|
==========================================================================================
|
||||||
|
✅ VALIDATION SUCCESSFUL
|
||||||
|
Processed 245 records
|
||||||
|
==========================================================================================
|
||||||
|
```
|
||||||
|
|
||||||
|
**With Errors:**
|
||||||
|
```
|
||||||
|
❌ VALIDATION REPORT
|
||||||
|
Total records processed : 245
|
||||||
|
Total errors found : 12
|
||||||
|
|
||||||
|
📍 Excel Row 47
|
||||||
|
-----------------------------------------------------------------
|
||||||
|
• Plant Start..................... Future date
|
||||||
|
• Longitude....................... Out of range
|
||||||
|
```
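A report grouped by Excel row, together with the optional CSV export, could be produced along these lines. This is a sketch under stated assumptions: the `(row, field, message)` tuple layout, the function names, the CSV column names, and the exact padding are illustrative and may differ from what the script does.

```python
import csv
import io
from collections import defaultdict


def format_report(errors, total_records):
    """Group (excel_row, field, message) tuples by row and render a text report."""
    by_row = defaultdict(list)
    for row, field, message in errors:
        by_row[row].append((field, message))

    lines = [
        f"Total records processed : {total_records}",
        f"Total errors found      : {len(errors)}",
    ]
    for row in sorted(by_row):
        lines.append("")
        lines.append(f"Excel Row {row}")
        lines.append("-" * 65)
        for field, message in by_row[row]:
            # Pad the field name with dots so messages line up in a column.
            lines.append(f"  • {field.ljust(30, '.')} {message}")
    return "\n".join(lines)


def write_errors_csv(errors, handle):
    """Write one CSV line per error; the column names are assumptions."""
    writer = csv.writer(handle)
    writer.writerow(["excel_row", "field", "message"])
    writer.writerows(errors)


errors = [(47, "Plant Start", "Future date"), (47, "Longitude", "Out of range")]
print(format_report(errors, total_records=245))

buffer = io.StringIO()  # stands in for an open file such as errors_report.csv
write_errors_csv(errors, buffer)
```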
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## File Requirements
|
||||||
|
|
||||||
|
- Excel file (`.xlsx` or `.xls`)
|
||||||
|
- Header row with recognizable column names (case-insensitive)
|
||||||
|
- At minimum: `Participant Type`, `Plant Start`, `Plant End`, `Latitude`, `Longitude`, `Tree Source`
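One simple way to make header matching case-insensitive and tolerant of spacing or punctuation differences is to normalize names before comparing them. The sketch below illustrates that idea only; `normalize` and `match_columns` are hypothetical names, and the script's real matching rules may recognize more (or different) variations.

```python
import re

REQUIRED_COLUMNS = [
    "Participant Type", "Plant Start", "Plant End",
    "Latitude", "Longitude", "Tree Source",
]


def normalize(name):
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", str(name).lower())


def match_columns(headers):
    """Map each required column to the matching header; list those not found."""
    by_key = {normalize(header): header for header in headers}
    found, missing = {}, []
    for column in REQUIRED_COLUMNS:
        key = normalize(column)
        if key in by_key:
            found[column] = by_key[key]
        else:
            missing.append(column)
    return found, missing


headers = ["participant_type", "PLANT START", "Plant-End", "latitude", "Longitude ", "Notes"]
found, missing = match_columns(headers)
print(found)
print(missing)
```

With this approach, `participant_type`, `PLANT START`, and `Plant-End` all resolve to their required columns, while a file without any `Tree Source` header is reported as missing that column.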
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Contributing
|
||||||
|
|
||||||
|
Contributions are welcome! Feel free to submit issues or pull requests for:
|
||||||
|
- Additional validation rules
|
||||||
|
- Support for more file formats
|
||||||
|
- Enhanced reporting features
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## License
|
||||||
|
|
||||||
|
This project is open source. Feel free to use and modify as needed.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Made for tree planting initiatives** 🌱
|
||||||
|
|||||||