Compare commits: master...enhancement/log-execution-time

No commits in common. "master" and "enhancement/log-execution-time" have entirely different histories.

2 changed files with 80 additions and 211 deletions.

---

**README.md** (217 changed lines)
![753 Data Sync logo](https://git.nickhepler.cloud/nick/753-Data-Sync/raw/branch/master/logo.png)
# 753 Data Sync
*A Python-based data ingestion tool for syncing enforcement data from a public API to ArcGIS Online.*
![Gitea Release](https://img.shields.io/gitea/v/release/nick/753-Data-Sync?gitea_url=https%3A%2F%2Fgit.nickhepler.cloud%2F&style=for-the-badge&logo=Python)
![Enhancements](https://img.shields.io/gitea/issues/open/nick/753-Data-Sync?gitea_url=https%3A%2F%2Fgit.nickhepler.cloud%2F&labels=enhancement&style=for-the-badge&logo=Gitea&label=Enhancements)
![Defects](https://img.shields.io/gitea/issues/open/nick/753-Data-Sync?gitea_url=https%3A%2F%2Fgit.nickhepler.cloud%2F&labels=bug&style=for-the-badge&logo=Gitea&label=Defects)
---
## 🚀 Overview
This script fetches enforcement data from an external API, truncates a specified feature layer in ArcGIS, and adds the fetched data as features to the layer. It also logs the operation, saves data to JSON files, and optionally purges old files. Additionally, it supports reloading data from a JSON file without making API calls. The script performs the following tasks:
- **Truncate** the specified layer in ArcGIS to clear any previous features before adding new ones.
- **Fetch** data from the API in paginated form.
- **Save** data from each API response to an individual JSON file.
- **Aggregate** all data from all pages into one JSON file.
- **Add** the aggregated data as features to an ArcGIS feature service.
---
## 📦 Requirements
- Python 3.6 or higher (if using the Python script)
- Required Python packages (see `requirements.txt`)
- ArcGIS Online credentials (username and password)
- `.env` file for configuration (see below for details)
---
## ⚙️ Installation
### Python Script
To install the required dependencies, use the following command:
```bash
pip install -r requirements.txt
```
Alternatively, you can install the necessary packages individually:
```bash
pip install requests
pip install python-dotenv
```
### Windows Executable
A Windows executable is available for users who prefer not to run the script directly. You can download it from the [releases page](https://git.nickhepler.cloud/nick/753-Data-Sync/releases). This executable is compiled using PyInstaller and can be run without needing to install Python or any dependencies.
---
## ⚙️ Configuration
Before running the script, you need to configure some environment variables. Create a `.env` file in the root of your project with the following details:
```env
API_URL=https://example.com/api
AGOL_USER=your_username
AGOL_PASSWORD=your_password
HOSTNAME=www.arcgis.com
INSTANCE=your_instance
FS=your_feature_service
LAYER=0
LOG_LEVEL=DEBUG
PURGE_DAYS=5
```
### Required Variables

| Variable | Description |
|----------------|--------------------------------------------|
| `API_URL` | The API endpoint to fetch data from |
| `AGOL_USER` | ArcGIS Online username |
| `AGOL_PASSWORD`| ArcGIS Online password |
| `HOSTNAME` | ArcGIS host (e.g., `www.arcgis.com`) |
| `INSTANCE` | ArcGIS REST instance path |
| `FS` | Feature service name |
| `LAYER` | Feature layer ID or name |
### Optional Variables

| Variable | Description |
|----------------|--------------------------------------------|
| `LOG_LEVEL` | Log level (`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`) |
| `PURGE_DAYS` | Number of days to retain log and JSON files |
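The way these variables reach the script can be sketched as follows. This is a minimal, self-contained stand-in: the real script loads its `.env` file via python-dotenv's `load_dotenv()`, and `load_env_text` here is a hypothetical helper that parses the same `KEY=VALUE` format so the example runs on its own.

```python
import os

def load_env_text(text):
    """Parse KEY=VALUE lines from .env-style text into os.environ."""
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and malformed lines.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ[key.strip()] = value.strip()

load_env_text("API_URL=https://example.com/api\nLOG_LEVEL=DEBUG")

log_level = os.getenv("LOG_LEVEL", "INFO").upper()  # same default as the script
purge_days = int(os.getenv("PURGE_DAYS", 30))       # optional, defaults to 30 days
```

Note that the optional variables fall back to defaults when absent, so a minimal `.env` only needs the required ones.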
---
## 🧪 Script Usage
### Python Script
You can run the script with the following command:
```bash
python 753DataSync.py --results_per_page 100
```
### Windows Executable
Simply double-click the executable file to run it. You can also run it from the command line with:
```bash
753DataSync.exe --results_per_page 100
```
### CLI Arguments
| Argument | Description |
|----------------------|---------------------------------------------|
| `--results_per_page` | Optional. Number of results per API call (default: `100`) |
| `--test` | Optional. If set, only fetch the first page of results. |
| `--reload` | Optional. Load data from a specified JSON file instead of fetching from the API. |
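The table above maps directly onto the script's argument parser. A sketch of that parser, with `add_argument` calls mirroring the flags defined in `app.py` (a sample argument list is parsed instead of `sys.argv` so the example is self-contained):

```python
import argparse

parser = argparse.ArgumentParser(description="753 Data Sync")
parser.add_argument('--results_per_page', type=int, default=100,
                    help="Number of results per page (default: 100)")
parser.add_argument('--test', action='store_true',
                    help="If set, only fetch the first page of results.")
parser.add_argument('--reload', type=str,
                    help="If set, load data from the specified file instead of fetching from the API.")

# Parse a sample command line; the real script parses sys.argv.
args = parser.parse_args(['--results_per_page', '50', '--test'])
```

`--test` is a boolean switch (`store_true`), while `--reload` takes a file path and defaults to `None` when omitted.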
---

## 📋 Functionality
1. **🔁 Truncate Layer** — Clears existing ArcGIS features.
2. **🌐 Fetch Data** — Retrieves paginated data from the API.
3. **💾 Save Data** — Writes each page to a time-stamped JSON file.
4. **📦 Aggregate Data** — Combines all pages into one file.
5. **📤 Add Features** — Sends data to ArcGIS feature layer.
6. **🧹 File Cleanup** — Deletes `.json`/`.log` files older than `PURGE_DAYS`.
7. **📑 Dynamic Logs** — Logs saved to `753DataSync_YYYY-MM-DD.log`.
8. **🧪 Test Mode** — Use the `--test` flag to fetch only the first page of results for testing purposes.
9. **🔄 Reload Data** — Use the `--reload` flag to truncate the feature layer and load data from a specified JSON file.
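Steps 2–5 above reduce to a paginate-and-aggregate loop. A simplified sketch, where `fetch_page` is a hypothetical stand-in for the script's real HTTP call:

```python
def fetch_page(page_number, pages):
    # Stand-in for the API request: returns [] once pages are exhausted.
    return pages.get(page_number, [])

def collect_all(pages):
    all_data, page_number = [], 1
    while True:
        batch = fetch_page(page_number, pages)
        if not batch:
            break  # mirrors "No more data to fetch, stopping pagination."
        all_data.extend(batch)
        page_number += 1
    return all_data

aggregated = collect_all({1: ["rec1", "rec2"], 2: ["rec3"]})
```

In the real script each `batch` is also written to its own JSON file before `all_data` is saved as the aggregated file, and `--test` breaks the loop after the first page.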
---

## 📁 Example Output
Individual page files are saved in the `data/` directory, the aggregated file is saved alongside them, and a dated log file is written to the working directory (logs are also printed to the console):
```text
📁 data/
├── enforcement_page_1_results_100_2025-03-26_14-30-45.json
├── enforcement_page_2_results_100_2025-03-26_14-31-10.json
└── aggregated_enforcement_results_2025-03-26_14-31-15.json
📄 753DataSync_2025-03-26.log
```
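The timestamped names above can be produced as follows. The exact format string is inferred from the example output, and `page_filename` is a hypothetical helper, not a function from the script:

```python
from datetime import datetime

def page_filename(page, results_per_page, now):
    """Build a per-page file name like the examples shown above."""
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    return f"enforcement_page_{page}_results_{results_per_page}_{stamp}.json"

name = page_filename(1, 100, datetime(2025, 3, 26, 14, 30, 45))
```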
---
## 📝 Example Log
```text
2025-03-26 14:30:45 - INFO - Attempting to truncate layer on https://www.arcgis.com/...
2025-03-26 14:30:50 - INFO - Successfully truncated layer: https://www.arcgis.com/...
2025-03-26 14:30:51 - INFO - Making request to: https://api.example.com/1/100
2025-03-26 14:30:55 - INFO - Data saved to data/enforcement_page_1_results_100_2025-03-26_14-30-45.json
2025-03-26 14:30:56 - INFO - No more data to fetch, stopping pagination.
2025-03-26 14:30:57 - INFO - Data saved to data/aggregated_enforcement_results_2025-03-26_14-30-45.json
2025-03-26 14:31:00 - INFO - Features added successfully.
2025-03-26 14:31:01 - INFO - Deleted old log: 753DataSync_2025-03-19.log
```
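A minimal version of the logging setup that produces records in this format, adapted from the handler code in `app.py`; the formatter string is inferred from the timestamps above, and a named logger is used here (the script configures the root logger):

```python
import logging
from datetime import datetime

log_level = "INFO"
log_filename = f"753DataSync_{datetime.now().strftime('%Y-%m-%d')}.log"

logger = logging.getLogger("753DataSync")
logger.setLevel(getattr(logging, log_level))

formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
# delay=True postpones opening the file until the first record is written.
file_handler = logging.FileHandler(log_filename, delay=True)
stream_handler = logging.StreamHandler()
for handler in (file_handler, stream_handler):
    handler.setLevel(getattr(logging, log_level))
    handler.setFormatter(formatter)
    logger.addHandler(handler)
```

Every record then goes to both the console and the dated log file, which is what the file-cleanup step later purges by age.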
---
## 🛠 Troubleshooting
The script handles errors gracefully: HTTP and network errors, as well as failures in the `truncate` or `add_features` operations, are logged with detailed information before execution stops.
- Set `LOG_LEVEL=DEBUG` in `.env` for detailed logs.
- Ensure `.env` has no syntax errors and contains valid credentials.
- Make sure your ArcGIS layer has permission for truncation and writes.
- Check for internet/API access and expired ArcGIS tokens.
- Logs are written to both the console and daily log files.
---
## 🧪 Testing
Currently, the script is tested manually. Automated testing may be added under a `/tests` folder in the future.
---
## 📖 Usage Examples
```bash
# Run with default page size
python 753DataSync.py

# Run with custom page size
python 753DataSync.py --results_per_page 50

# Fetch only the first page of results (test mode)
python 753DataSync.py --test

# Reload previously saved data from a JSON file (no API calls)
python 753DataSync.py --reload data/aggregated_enforcement_results_2025-03-26_14-31-15.json

# Run the Windows executable with default page size
753DataSync.exe

# Run the Windows executable with custom page size
753DataSync.exe --results_per_page 50
```
---
## 💬 Support
Found a bug or want to request a feature?
[Open an issue](https://git.nickhepler.cloud/nick/753-Data-Sync/issues) or contact [@nick](https://git.nickhepler.cloud/nick) directly.
---
## 📜 License
This project is licensed under the [GNU General Public License v3.0](LICENSE).
> 💡 *You are free to use, modify, and share this project as long as you preserve the same license in your changes.*

---

**app.py** (74 changed lines)

```diff
@@ -17,10 +17,6 @@ load_dotenv("753DataSync.env")
 BASE_URL = "{}/{}/{}"
 
 log_level = os.getenv('LOG_LEVEL', 'INFO').upper()
 
-# Get the current date for dynamic log file naming
-current_date = datetime.now().strftime("%Y-%m-%d")
-log_filename = f"753DataSync_{current_date}.log"
-
 # Setup logging
 logger = logging.getLogger()
@@ -38,8 +34,8 @@ elif log_level == 'CRITICAL':
 else:
     logger.setLevel(logging.INFO)
 
-# File handler for dynamic log file
-file_handler = logging.FileHandler(log_filename)
+# File handler
+file_handler = logging.FileHandler('753DataSync.log')
 file_handler.setLevel(getattr(logging, log_level))
 
 # Stream handler (console output)
@@ -55,35 +51,6 @@ stream_handler.setFormatter(formatter)
 logger.addHandler(file_handler)
 logger.addHandler(stream_handler)
 
-def purge_old_files(purge_days):
-    """Purge log and data files older than PURGE_DAYS from the 'data' folder."""
-    data_folder = 'data'
-    log_folder = '.'  # Log files are in the current directory
-
-    if not os.path.exists(data_folder):
-        logger.warning(f"The '{data_folder}' folder does not exist.")
-        return
-
-    purge_threshold = datetime.now() - timedelta(days=purge_days)
-
-    # Delete old log files
-    for filename in os.listdir(log_folder):
-        if filename.endswith(".log"):
-            file_path = os.path.join(log_folder, filename)
-            file_modified_time = datetime.fromtimestamp(os.path.getmtime(file_path))
-            if file_modified_time < purge_threshold:
-                logger.info(f"Deleting old log file: {file_path}")
-                os.remove(file_path)
-
-    # Delete old data files
-    for filename in os.listdir(data_folder):
-        file_path = os.path.join(data_folder, filename)
-        if filename.endswith(".json"):
-            file_modified_time = datetime.fromtimestamp(os.path.getmtime(file_path))
-            if file_modified_time < purge_threshold:
-                logger.info(f"Deleting old data file: {file_path}")
-                os.remove(file_path)
-
 def fetch_data(api_url, page_number, results_per_page):
     """Fetches data from the API and returns the response."""
     url = BASE_URL.format(api_url, page_number, results_per_page)
@@ -146,16 +113,10 @@ def parse_arguments():
     # Add arguments for results per page
     parser.add_argument('--results_per_page', type=int, default=100, help="Number of results per page (default: 100)")
 
-    # Add a test flag
-    parser.add_argument('--test', action='store_true', help="If set, only fetch the first page of results.")
-
-    # Add a reload flag
-    parser.add_argument('--reload', type=str, help="If set, load data from the specified file instead of fetching from the API.")
-
     # Parse the arguments
     args = parser.parse_args()
-    return args.results_per_page, args.test, args.reload
+    return args.results_per_page
 
 def generate_token(username, password, url="https://www.arcgis.com/sharing/rest/generateToken"):
     """Generates an authentication token."""
@@ -305,14 +266,9 @@ def main():
     try:
         logger.info("Starting script execution.")
 
-        # Check and purge old files before processing
-        purge_days = int(os.getenv("PURGE_DAYS", 30))  # Default to 30 days if not set
-        logger.info(f"Purging files older than {purge_days} days.")
-        purge_old_files(purge_days)
-
         # Parse command-line arguments
-        results_per_page, test_mode, reload_file = parse_arguments()
-        logger.info(f"Parsed arguments: results_per_page={results_per_page}, test_mode={test_mode}, reload_file={reload_file}")
+        results_per_page = parse_arguments()
+        logger.info(f"Parsed arguments: results_per_page={results_per_page}")
 
         # Load environment variables
         logger.info("Loading environment variables.")
@@ -336,22 +292,9 @@ def main():
         fs = os.getenv('FS')
         layer = os.getenv('LAYER')
 
-        logger.info("Truncating the feature layer.")
         # Truncate the layer before adding new features
         truncate(token, hostname, instance, fs, layer)
 
-        # If --reload flag is set, load data from the specified file
-        if reload_file:
-            logger.info(f"Reloading data from file: {reload_file}")
-            # Load data from the specified file
-            with open(reload_file, 'r', encoding='utf-8') as f:
-                aggregated_data = json.load(f)
-            # Add the features to the feature layer
-            response = add_features(token, hostname, instance, fs, layer, aggregated_data)
-            logger.info("Data reloaded successfully from the specified file.")
-            return
-
         all_data = []
         page_number = 1
@@ -375,11 +318,6 @@ def main():
                logger.info("No more data to fetch, stopping pagination.")
                break
 
-            # Break the loop if in test mode
-            if test_mode:
-                logger.info("Test mode is enabled, stopping after the first page.")
-                break
-
            page_number += 1
    except Exception as e:
        logger.error(f"Error fetching or saving data for page {page_number}: {e}", exc_info=True)
```