Add Configurable Log and Data Retention #8

Merged
nick merged 2 commits from feature/logfile-purge-support into master 2025-04-15 18:18:16 -04:00
2 changed files with 157 additions and 75 deletions

README.md

@@ -1,126 +1,170 @@
![753 Data Sync logo](https://git.nickhepler.cloud/nick/753-Data-Sync/raw/branch/master/logo.png)

# 753 Data Sync

*A Python-based data ingestion tool for syncing enforcement data from a public API to ArcGIS Online.*

![Gitea Release](https://img.shields.io/gitea/v/release/nick/753-Data-Sync?gitea_url=https%3A%2F%2Fgit.nickhepler.cloud%2F&style=for-the-badge&logo=Python)
![Enhancements](https://img.shields.io/gitea/issues/open/nick/753-Data-Sync?gitea_url=https%3A%2F%2Fgit.nickhepler.cloud%2F&labels=enhancement&style=for-the-badge&logo=Gitea&label=Enhancements)
![Defects](https://img.shields.io/gitea/issues/open/nick/753-Data-Sync?gitea_url=https%3A%2F%2Fgit.nickhepler.cloud%2F&labels=bug&style=for-the-badge&logo=Gitea&label=Defects)

---

## 🚀 Overview

This script fetches enforcement data from an external API, truncates a specified feature layer in ArcGIS, and adds the fetched data as features to the layer. It also logs the operation, saves data to JSON files, and optionally purges old files.
---
## 📦 Requirements
- Python 3.6 or higher
- Required packages in `requirements.txt`
- `.env` file with your configuration
- ArcGIS Online credentials

---

## 🔧 Installation

```bash
pip install -r requirements.txt
```

Or install packages individually:

```bash
pip install requests python-dotenv
```

---

## ⚙️ Configuration
Create a `.env` file in the root of your project:
```env
API_URL=https://example.com/api
AGOL_USER=your_username
AGOL_PASSWORD=your_password
HOSTNAME=www.arcgis.com
INSTANCE=your_instance
FS=your_feature_service
LAYER=0
LOG_LEVEL=DEBUG
PURGE_DAYS=5
```
### Required Variables

| Variable | Description |
|----------------|--------------------------------------------|
| `API_URL` | The API endpoint to fetch data from |
| `AGOL_USER` | ArcGIS Online username |
| `AGOL_PASSWORD`| ArcGIS Online password |
| `HOSTNAME` | ArcGIS host (e.g., `www.arcgis.com`) |
| `INSTANCE` | ArcGIS REST instance path |
| `FS` | Feature service name |
| `LAYER` | Feature layer ID or name |
### Optional Variables

| Variable | Description |
|----------------|--------------------------------------------|
| `LOG_LEVEL` | Log level (`DEBUG`, `INFO`, etc.) |
| `PURGE_DAYS` | Number of days to retain logs and JSONs |
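The optional variables fall back to defaults when unset. A minimal sketch of how they can be read (the fallbacks `INFO` and `30` match the script's own defaults; the `try`/`except` guard around `python-dotenv` is only for the sketch):

```python
import os

# Load the .env file if python-dotenv is installed; the script itself
# calls load_dotenv() unconditionally.
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass

# Optional variables default sensibly when absent from the environment.
log_level = os.getenv("LOG_LEVEL", "INFO").upper()
purge_days = int(os.getenv("PURGE_DAYS", 30))  # retain files 30 days by default
```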
---
## 🧪 Script Usage
```bash
python 753DataSync.py --results_per_page 100
```
### CLI Arguments

| Argument | Description |
|----------------------|---------------------------------------------|
| `--results_per_page` | Optional. Number of results per API call (default: `100`) |
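A sketch of how this argument can be parsed with `argparse` (the script's actual `parse_arguments` helper may differ; the default of `100` comes from the table above):

```python
import argparse

def parse_arguments(argv=None):
    """Parse CLI arguments; only --results_per_page is supported."""
    parser = argparse.ArgumentParser(description="753 Data Sync")
    parser.add_argument(
        "--results_per_page",
        type=int,
        default=100,  # documented default page size
        help="Number of results to fetch per API call (default: 100)",
    )
    return parser.parse_args(argv).results_per_page

print(parse_arguments([]))                            # → 100
print(parse_arguments(["--results_per_page", "50"]))  # → 50
```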
---

## 📋 Functionality

### 2. **Fetch Data**: 1. **🔁 Truncate Layer** — Clears existing ArcGIS features.
The script will then fetch data from the specified API in pages. Each page is fetched sequentially until all data is retrieved. 2. **🌐 Fetch Data** — Retrieves paginated data from the API.
3. **💾 Save Data** — Writes each page to a time-stamped JSON file.
4. **📦 Aggregate Data** — Combines all pages into one file.
5. **📤 Add Features** — Sends data to ArcGIS feature layer.
6. **🧹 File Cleanup** — Deletes `.json`/`.log` files older than `PURGE_DAYS`.
7. **📑 Dynamic Logs** — Logs saved to `753DataSync_YYYY-MM-DD.log`.
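The fetch/save/aggregate loop (steps 2–4) can be sketched as below. `fetch_page` is a stand-in for the real API call, not the script's actual function, and the per-page JSON writes are elided:

```python
def fetch_page(page_number, results_per_page):
    """Stand-in for the real paginated API call (pretends 250 records exist)."""
    total = 250
    start = (page_number - 1) * results_per_page
    return [{"id": i} for i in range(start, min(start + results_per_page, total))]

def sync(results_per_page=100):
    """Fetch pages until one comes back empty, aggregating everything."""
    aggregated, page = [], 1
    while True:
        records = fetch_page(page, results_per_page)
        if not records:
            break  # no more data: stop pagination
        # The real script writes each page to a time-stamped JSON file here.
        aggregated.extend(records)
        page += 1
    return aggregated

print(len(sync()))  # → 250
```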
---

## 📁 Example Output

```bash
📁 data/
├── enforcement_page_1_results_100_2025-03-26_14-30-45.json
├── enforcement_page_2_results_100_2025-03-26_14-31-10.json
└── aggregated_enforcement_results_2025-03-26_14-31-15.json

📄 753DataSync_2025-03-26.log
```
---

## 📝 Example Log

```text
2025-03-26 14:30:45 - INFO - Attempting to truncate layer...
2025-03-26 14:30:51 - INFO - Fetching page 1 from API...
2025-03-26 14:30:55 - INFO - Saved data to data/enforcement_page_1_results_100_...
2025-03-26 14:30:57 - INFO - Aggregated data saved.
2025-03-26 14:31:00 - INFO - Features added successfully.
2025-03-26 14:31:01 - INFO - Deleted old log: 753DataSync_2025-03-19.log
```

---

## 🛠 Troubleshooting

- Set `LOG_LEVEL=DEBUG` in `.env` for detailed logs.
- Ensure `.env` has no syntax errors.
- Make sure your ArcGIS layer has permission for truncation and writes.
- Check for internet/API access and expired ArcGIS tokens.
- Logs are written to both console and daily log files.
---

## 🧪 Testing

Currently, the script is tested manually. Automated testing may be added under a `/tests` folder in the future.

---
## 📖 Usage Examples
```bash
# Run with default page size
python 753DataSync.py
# Run with custom page size
python 753DataSync.py --results_per_page 50
```
---
## 💬 Support
Found a bug or want to request a feature?
[Open an issue](https://git.nickhepler.cloud/nick/753-Data-Sync/issues) or contact [@nick](https://git.nickhepler.cloud/nick) directly.
---
## 📜 License
This project is licensed under the [GNU General Public License v3.0](LICENSE).
> 💡 *You are free to use, modify, and share this project as long as you preserve the same license in your changes.*

app.py

@@ -17,6 +17,10 @@ load_dotenv("753DataSync.env")
BASE_URL = "{}/{}/{}"

log_level = os.getenv('LOG_LEVEL', 'INFO').upper()
# Get the current date for dynamic log file naming
current_date = datetime.now().strftime("%Y-%m-%d")
log_filename = f"753DataSync_{current_date}.log"
# Setup logging
logger = logging.getLogger()
@@ -34,8 +38,8 @@ elif log_level == 'CRITICAL':
else:
    logger.setLevel(logging.INFO)
# File handler for dynamic log file
file_handler = logging.FileHandler(log_filename)
file_handler.setLevel(getattr(logging, log_level))

# Stream handler (console output)
@@ -51,6 +55,35 @@ stream_handler.setFormatter(formatter)
logger.addHandler(file_handler)
logger.addHandler(stream_handler)
def purge_old_files(purge_days):
    """Purge log and data files older than PURGE_DAYS from the 'data' folder."""
    data_folder = 'data'
    log_folder = '.'  # Log files are in the current directory

    if not os.path.exists(data_folder):
        logger.warning(f"The '{data_folder}' folder does not exist.")
        return

    purge_threshold = datetime.now() - timedelta(days=purge_days)

    # Delete old log files
    for filename in os.listdir(log_folder):
        if filename.endswith(".log"):
            file_path = os.path.join(log_folder, filename)
            file_modified_time = datetime.fromtimestamp(os.path.getmtime(file_path))
            if file_modified_time < purge_threshold:
                logger.info(f"Deleting old log file: {file_path}")
                os.remove(file_path)

    # Delete old data files
    for filename in os.listdir(data_folder):
        file_path = os.path.join(data_folder, filename)
        if filename.endswith(".json"):
            file_modified_time = datetime.fromtimestamp(os.path.getmtime(file_path))
            if file_modified_time < purge_threshold:
                logger.info(f"Deleting old data file: {file_path}")
                os.remove(file_path)
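The retention rule above can be exercised end to end with temporary files. `purge_folder` below is a hypothetical standalone helper mirroring the PR's mtime-threshold logic, not part of the merged code:

```python
import os
import tempfile
from datetime import datetime, timedelta

def purge_folder(folder, purge_days, suffixes=(".json", ".log")):
    """Delete files in `folder` whose mtime is older than `purge_days` days."""
    threshold = datetime.now() - timedelta(days=purge_days)
    removed = []
    for name in os.listdir(folder):
        if not name.endswith(suffixes):
            continue
        path = os.path.join(folder, name)
        if datetime.fromtimestamp(os.path.getmtime(path)) < threshold:
            os.remove(path)
            removed.append(name)
    return removed

with tempfile.TemporaryDirectory() as tmp:
    old = os.path.join(tmp, "old.log")
    new = os.path.join(tmp, "new.json")
    for p in (old, new):
        open(p, "w").close()
    # Backdate one file by 10 days so a 5-day policy removes it.
    stale = (datetime.now() - timedelta(days=10)).timestamp()
    os.utime(old, (stale, stale))
    print(purge_folder(tmp, purge_days=5))  # → ['old.log']
```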
def fetch_data(api_url, page_number, results_per_page): def fetch_data(api_url, page_number, results_per_page):
"""Fetches data from the API and returns the response.""" """Fetches data from the API and returns the response."""
url = BASE_URL.format(api_url, page_number, results_per_page) url = BASE_URL.format(api_url, page_number, results_per_page)
@@ -266,6 +299,11 @@ def main():
    try:
        logger.info("Starting script execution.")

        # Check and purge old files before processing
        purge_days = int(os.getenv("PURGE_DAYS", 30))  # Default to 30 days if not set
        logger.info(f"Purging files older than {purge_days} days.")
        purge_old_files(purge_days)

        # Parse command-line arguments
        results_per_page = parse_arguments()
        logger.info(f"Parsed arguments: results_per_page={results_per_page}")