Improve README file.

Nick Heppler 2025-04-15 18:01:15 -04:00
parent 6f412348fe
commit ed8647df43

README.md

@@ -1,126 +1,170 @@
![753 Data Sync logo](https://git.nickhepler.cloud/nick/753-Data-Sync/raw/branch/master/logo.png)
# 753 Data Sync
*A Python-based data ingestion tool for syncing enforcement data from a public API to ArcGIS Online.*
![Gitea Release](https://img.shields.io/gitea/v/release/nick/753-Data-Sync?gitea_url=https%3A%2F%2Fgit.nickhepler.cloud%2F&style=for-the-badge&logo=Python)
![Gitea Issues](https://img.shields.io/gitea/issues/open/nick/753-Data-Sync?gitea_url=https%3A%2F%2Fgit.nickhepler.cloud%2F&labels=enhancement&style=for-the-badge&logo=Gitea&label=Enhancements)
![Gitea Issues](https://img.shields.io/gitea/issues/open/nick/753-Data-Sync?gitea_url=https%3A%2F%2Fgit.nickhepler.cloud%2F&labels=bug&style=for-the-badge&logo=Gitea&label=Defects)
![Enhancements](https://img.shields.io/gitea/issues/open/nick/753-Data-Sync?gitea_url=https%3A%2F%2Fgit.nickhepler.cloud%2F&labels=enhancement&style=for-the-badge&logo=Gitea&label=Enhancements)
![Defects](https://img.shields.io/gitea/issues/open/nick/753-Data-Sync?gitea_url=https%3A%2F%2Fgit.nickhepler.cloud%2F&labels=bug&style=for-the-badge&logo=Gitea&label=Defects)
This script fetches enforcement data from an external API, truncates a specified feature layer in ArcGIS, and adds the fetched data as features to the layer. The script performs the following tasks:
---
- **Truncate** the specified layer in ArcGIS to clear any previous features before adding new ones.
- **Fetch** data from an API in paginated form.
- **Save** data from each API response to individual JSON files.
- **Aggregate** all data from all pages into one JSON file.
- **Add** the aggregated data as features to an ArcGIS feature service.
## 🚀 Overview
## Requirements
This script fetches enforcement data from an external API, truncates a specified feature layer in ArcGIS, and adds the fetched data as features to the layer. It also logs the operation, saves data to JSON files, and optionally purges old files.
---
## 📦 Requirements
- Python 3.6 or higher
- Required Python packages (see `requirements.txt`)
- ArcGIS Online credentials (username and password)
- `.env` file for configuration (see below for details)
- Required packages in `requirements.txt`
- `.env` file with your configuration
- ArcGIS Online credentials
## Install Dependencies
---
To install the required dependencies, use the following command:
## 🔧 Installation
```bash
pip install -r requirements.txt
```
Alternatively, you can install the necessary packages individually:
Or install packages individually:
```bash
pip install requests
pip install python-dotenv
pip install requests python-dotenv
```
## Configuration
---
Before running the script, you need to configure some environment variables. Create a `.env` file in the root of your project with the following details:
## ⚙️ Configuration
Create a `.env` file in the root of your project:
```env
API_URL=your_api_url
AGOL_USER=your_arcgis_online_username
AGOL_PASSWORD=your_arcgis_online_password
HOSTNAME=your_arcgis_host
INSTANCE=your_arcgis_instance
API_URL=https://example.com/api
AGOL_USER=your_username
AGOL_PASSWORD=your_password
HOSTNAME=www.arcgis.com
INSTANCE=your_instance
FS=your_feature_service
LAYER=your_layer_id
LOG_LEVEL=your_log_level # e.g., DEBUG, INFO, WARNING, ERROR, CRITICAL
LAYER=0
LOG_LEVEL=DEBUG
PURGE_DAYS=5
```
### Environment Variables:
### Required Variables
- **API_URL**: The URL of the API you are fetching data from.
- **AGOL_USER**: Your ArcGIS Online username.
- **AGOL_PASSWORD**: Your ArcGIS Online password.
- **HOSTNAME**: The hostname of your ArcGIS Online instance (e.g., `www.arcgis.com`).
- **INSTANCE**: The instance name of your ArcGIS Online service.
- **FS**: The name of the feature service you are working with.
- **LAYER**: The ID or name of the layer to truncate and add features to.
- **LOG_LEVEL**: The desired logging level (e.g., `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`).
| Variable | Description |
|----------------|--------------------------------------------|
| `API_URL` | The API endpoint to fetch data from |
| `AGOL_USER` | ArcGIS Online username |
| `AGOL_PASSWORD`| ArcGIS Online password |
| `HOSTNAME` | ArcGIS host (e.g., `www.arcgis.com`) |
| `INSTANCE` | ArcGIS REST instance path |
| `FS` | Feature service name |
| `LAYER` | Feature layer ID or name |
## Script Usage
### Optional Variables
You can run the script with the following command:
| Variable | Description |
|----------------|--------------------------------------------|
| `LOG_LEVEL` | Log level (`DEBUG`, `INFO`, etc.) |
| `PURGE_DAYS`   | Number of days to retain log and JSON files |
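
A minimal sketch of how these variables might be read and validated, assuming they are already present in the process environment (for example, after `python-dotenv` has loaded the `.env` file). The `load_config` helper and the fallback values for the optional variables are hypothetical and are not taken from `753DataSync.py`.

```python
import os

# Names taken from the "Required Variables" table in this README.
REQUIRED_VARS = ("API_URL", "AGOL_USER", "AGOL_PASSWORD",
                 "HOSTNAME", "INSTANCE", "FS", "LAYER")


def load_config(env=None):
    """Return a dict of settings, raising ValueError if a required one is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("Missing required variables: " + ", ".join(missing))
    config = {name: env[name] for name in REQUIRED_VARS}
    # Optional variables; these fallback values are assumptions for illustration.
    config["LOG_LEVEL"] = env.get("LOG_LEVEL", "INFO").upper()
    config["PURGE_DAYS"] = int(env.get("PURGE_DAYS", "5"))
    return config
```

Failing early on a missing variable keeps a misconfigured run from truncating the layer before the problem is noticed.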
---
## 🧪 Script Usage
```bash
python 753DataSync.py --results_per_page <number_of_results_per_page>
python 753DataSync.py --results_per_page 100
```
### Arguments:
### CLI Arguments
- `--results_per_page` (optional): The number of results to fetch per page (default: 100).
| Argument             | Description                                               |
|----------------------|-----------------------------------------------------------|
| `--results_per_page` | Optional. Number of results per API call (default: `100`) |
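
For reference, an argument like this can be declared with the standard `argparse` module. This is a sketch only: the `parse_args` wrapper and the description text are assumptions, and the actual parser in `753DataSync.py` may differ.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None means use sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Sync enforcement data to ArcGIS Online."  # assumed wording
    )
    parser.add_argument(
        "--results_per_page",
        type=int,
        default=100,
        help="Number of results per API call (default: 100)",
    )
    return parser.parse_args(argv)
```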
## Functionality
---
### 1. **Truncate Layer**:
Before fetching and adding any new data, the script will call the `truncate` function to clear out any existing features from the specified layer. This ensures that the feature layer is empty and ready for the new data.
## 📋 Functionality
### 2. **Fetch Data**:
The script will then fetch data from the specified API in pages. Each page is fetched sequentially until all data is retrieved.
1. **🔁 Truncate Layer** — Clears existing ArcGIS features.
2. **🌐 Fetch Data** — Retrieves paginated data from the API.
3. **💾 Save Data** — Writes each page to a time-stamped JSON file.
4. **📦 Aggregate Data** — Combines all pages into one file.
5. **📤 Add Features** — Sends data to ArcGIS feature layer.
6. **🧹 File Cleanup** — Deletes `.json`/`.log` files older than `PURGE_DAYS`.
7. **📑 Dynamic Logs** — Logs saved to `753DataSync_YYYY-MM-DD.log`.
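
The file cleanup step could look roughly like the following. It is a sketch under stated assumptions: the `purge_old_files` name is hypothetical, file age is judged by last-modified time, and only the given directory is scanned (no recursion). The script itself may decide age differently.

```python
import time
from pathlib import Path


def purge_old_files(directory, purge_days, now=None):
    """Delete .json and .log files last modified more than purge_days days ago."""
    now = time.time() if now is None else now
    cutoff = now - purge_days * 86400  # 86400 seconds per day
    removed = []
    for path in Path(directory).iterdir():
        is_target = path.is_file() and path.suffix in {".json", ".log"}
        if is_target and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)  # names of the files that were deleted
```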
### 3. **Save Data**:
Data from each page will be saved to an individual JSON file, with the filename including the page number and timestamp. The aggregated data (all pages combined) is saved to a separate file.
---
### 4. **Add Features**:
After all the data has been fetched and saved, the script will send the aggregated data as features to the specified ArcGIS feature layer.
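
The fetch, save, and aggregate steps can be sketched as a single loop. Everything here is illustrative: `fetch_all` and the injected `fetch_page` callable are hypothetical names, and treating an empty page as the end of the data is an assumption. The filenames follow the pattern shown in the example output.

```python
import json
from datetime import datetime
from pathlib import Path


def fetch_all(fetch_page, results_per_page=100, data_dir="data"):
    """Fetch pages until an empty one is returned; save each page and the aggregate."""
    out_dir = Path(data_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    aggregated = []
    page = 1
    while True:
        # fetch_page is a hypothetical callable returning a list of records.
        records = fetch_page(page, results_per_page)
        if not records:
            break  # assumption: an empty page means there is no more data
        stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        name = f"enforcement_page_{page}_results_{results_per_page}_{stamp}.json"
        (out_dir / name).write_text(json.dumps(records), encoding="utf-8")
        aggregated.extend(records)
        page += 1
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    agg_file = out_dir / f"aggregated_enforcement_results_{stamp}.json"
    agg_file.write_text(json.dumps(aggregated), encoding="utf-8")
    return aggregated, agg_file
```

Passing the fetch function in as a parameter keeps the loop testable without network access; a real run would supply a function that calls the API.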
## 📁 Example Output
## Example Output
```text
📁 data/
├── enforcement_page_1_results_100_2025-03-26_14-30-45.json
├── enforcement_page_2_results_100_2025-03-26_14-31-10.json
└── aggregated_enforcement_results_2025-03-26_14-31-15.json
- Individual page files are saved in the `data/` directory with filenames like `enforcement_page_1_results_100_2025-03-26_14-30-45.json`.
- The aggregated file is saved as `aggregated_enforcement_results_2025-03-26_14-30-45.json`.
📄 753DataSync_2025-03-26.log
```
Logs will also be generated in the `753DataSync.log` file and printed to the console.
---
## Example Output (Log)
## 📝 Example Log
```text
2025-03-26 14:30:45 - INFO - Attempting to truncate layer on https://www.arcgis.com/...
2025-03-26 14:30:50 - INFO - Successfully truncated layer: https://www.arcgis.com/...
2025-03-26 14:30:51 - INFO - Making request to: https://api.example.com/1/100
2025-03-26 14:30:55 - INFO - Data saved to data/enforcement_page_1_results_100_2025-03-26_14-30-45.json
2025-03-26 14:30:56 - INFO - No more data to fetch, stopping pagination.
2025-03-26 14:30:57 - INFO - Data saved to data/aggregated_enforcement_results_2025-03-26_14-30-45.json
2025-03-26 14:30:45 - INFO - Attempting to truncate layer...
2025-03-26 14:30:51 - INFO - Fetching page 1 from API...
2025-03-26 14:30:55 - INFO - Saved data to data/enforcement_page_1_results_100_...
2025-03-26 14:30:57 - INFO - Aggregated data saved.
2025-03-26 14:31:00 - INFO - Features added successfully.
2025-03-26 14:31:01 - INFO - Deleted old log: 753DataSync_2025-03-19.log
```
## Error Handling
---
The script handles errors gracefully, including:
## 🛠 Troubleshooting
- If an error occurs while fetching data, the script will log the error and stop execution.
- If the `truncate` or `add_features` operations fail, the script will log the error and stop execution.
- HTTP and network errors are caught and logged with detailed information.
- Set `LOG_LEVEL=DEBUG` in `.env` for detailed logs.
- Ensure `.env` has no syntax errors.
- Make sure your ArcGIS layer has permission for truncation and writes.
- Check for internet/API access and expired ArcGIS tokens.
- Logs are written to both console and daily log files.
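
Logging to both the console and a dated file can be set up with the standard `logging` module. The sketch below is an assumption about how such a setup might look, not the script's actual code: the `setup_logging` helper and the `753DataSync` logger name are hypothetical, and the format string is chosen to match the example log lines in this README.

```python
import logging
from datetime import date
from pathlib import Path


def setup_logging(level_name="INFO", log_dir=".", today=None):
    """Attach a console handler and a daily file handler to one logger."""
    today = today or date.today()
    log_path = Path(log_dir) / f"753DataSync_{today:%Y-%m-%d}.log"
    logger = logging.getLogger("753DataSync")  # hypothetical logger name
    # Unknown level names fall back to INFO.
    logger.setLevel(getattr(logging, level_name.upper(), logging.INFO))
    # Remove and close old handlers so repeated calls do not duplicate output.
    for old_handler in list(logger.handlers):
        logger.removeHandler(old_handler)
        old_handler.close()
    formatter = logging.Formatter(
        "%(asctime)s - %(levelname)s - %(message)s", datefmt="%Y-%m-%d %H:%M:%S"
    )
    handlers = (logging.StreamHandler(), logging.FileHandler(log_path, encoding="utf-8"))
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger, log_path
```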
## Troubleshooting
---
- If the script unexpectedly stops, check the logs (`753DataSync.log`) for detailed error information.
- Ensure the `.env` file is correctly configured with valid credentials and API URL.
- Confirm that your ArcGIS layer has the correct permissions to allow truncation and feature addition.
- If you encounter network issues, make sure your system has proper internet access and that the API endpoint is available.
- For debugging, set `LOG_LEVEL` to `DEBUG` in your `.env` file to get detailed logs.
## 🧪 Testing
## License
Currently, the script is tested manually. Automated testing may be added under a `/tests` folder in the future.
This project is licensed under the [GNU General Public License v3.0](LICENSE), which allows you to freely use, modify, and distribute the code, provided that you include the same license in derivative works.
---
## 📖 Usage Examples
```bash
# Run with default page size
python 753DataSync.py
# Run with custom page size
python 753DataSync.py --results_per_page 50
```
---
## 💬 Support
Found a bug or want to request a feature?
[Open an issue](https://git.nickhepler.cloud/nick/753-Data-Sync/issues) or contact [@nick](https://git.nickhepler.cloud/nick) directly.
---
## 📜 License
This project is licensed under the [GNU General Public License v3.0](LICENSE).
> 💡 *You are free to use, modify, and share this project as long as you preserve the same license in your changes.*