Compare commits

ab6a2f0ec88af9de1cfd8df31ad2dbe182a01f43..6bbf29493c89fd6ac72ba486f96ef43f0f1eb5a8

No commits in common. "ab6a2f0ec88af9de1cfd8df31ad2dbe182a01f43" and "6bbf29493c89fd6ac72ba486f96ef43f0f1eb5a8" have entirely different histories.

3 changed files with 90 additions and 215 deletions


@@ -1,14 +1,12 @@
-![753 Data Sync logo](https://git.nickhepler.cloud/nick/gotifyer/raw/branch/master/logo.png)
 # 753 Data Sync
-![Gitea Release](https://img.shields.io/gitea/v/release/nick/753-Data-Sync?gitea_url=https%3A%2F%2Fgit.nickhepler.cloud)
 This script fetches enforcement data from an external API, truncates a specified feature layer in ArcGIS, and adds the fetched data as features to the layer. The script performs the following tasks:
-- **Truncate** the specified layer in ArcGIS to clear any previous features before adding new ones.
-- **Fetch** data from an API in paginated form.
-- **Save** data from each API response to individual JSON files.
-- **Aggregate** all data from all pages into one JSON file.
-- **Add** the aggregated data as features to an ArcGIS feature service.
+1. **Truncate the specified layer** in ArcGIS to clear any previous features before adding new ones.
+2. **Fetch data** from an API in paginated form.
+3. **Save data** from each API response to individual JSON files.
+4. **Aggregate all data** from all pages into one JSON file.
+5. **Add the aggregated data** as features to an ArcGIS feature service.
 ## Requirements
@@ -17,24 +15,17 @@ This script fetches enforcement data from an external API, truncates a specified
 - ArcGIS Online credentials (username and password)
 - `.env` file for configuration (see below for details)
-## Install Dependencies
-To install the required dependencies, use the following command:
+### Install dependencies
+You can install the required dependencies using `pip`:
 ```bash
 pip install -r requirements.txt
 ```
-Alternatively, you can install the necessary packages individually:
-```bash
-pip install requests
-pip install python-dotenv
-```
 ## Configuration
-Before running the script, you need to configure some environment variables. Create a `.env` file in the root of your project with the following details: Before running the script, you'll need to configure some environment variables. Create a `.env` file with the following details:
+Before running the script, you'll need to configure some environment variables. Create a `.env` file with the following details:
 ```env
 API_URL=your_api_url
@@ -44,10 +35,9 @@ HOSTNAME=your_arcgis_host
 INSTANCE=your_arcgis_instance
 FS=your_feature_service
 LAYER=your_layer_id
-LOG_LEVEL=your_log_level # e.g., DEBUG, INFO, WARNING, ERROR, CRITICAL
 ```
-### Environment Variables:
+### Variables
 - **API_URL**: The URL of the API you are fetching data from.
 - **AGOL_USER**: Your ArcGIS Online username.
@@ -56,7 +46,6 @@ LOG_LEVEL=your_log_level # e.g., DEBUG, INFO, WARNING, ERROR, CRITICAL
 - **INSTANCE**: The instance name of your ArcGIS Online service.
 - **FS**: The name of the feature service you are working with.
 - **LAYER**: The ID or name of the layer to truncate and add features to.
-- **LOG_LEVEL**: The desired logging level (e.g., `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`).
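As a quick illustration of how a script might fail fast when the `.env` settings above are incomplete, here is a minimal sketch. The `read_config` and `missing_vars` helpers are illustrative only; they are not part of the repository.

```python
import os

# The variable names the Configuration section above requires.
REQUIRED_VARS = ("API_URL", "AGOL_USER", "AGOL_PASSWORD",
                 "HOSTNAME", "INSTANCE", "FS", "LAYER")

def read_config(environ=os.environ):
    """Collect the required settings from the environment (after load_dotenv)."""
    return {name: environ.get(name) for name in REQUIRED_VARS}

def missing_vars(config):
    """Return the names of any required settings that are unset or empty."""
    return [name for name, value in config.items() if not value]
```

Calling `missing_vars(read_config())` right after `load_dotenv(...)` turns a vague mid-run failure into an immediate, named configuration error.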
 ## Script Usage
@@ -66,34 +55,36 @@ You can run the script with the following command:
 ```bash
 python 753DataSync.py --results_per_page <number_of_results_per_page>
 ```
-### Arguments:
+### Arguments
-- `--results_per_page` (optional): The number of results to fetch per page (default: 100).
+- **--results_per_page** (optional): The number of results to fetch per page (default: 100).
 ## Functionality
-### 1. **Truncate Layer**:
-Before fetching and adding any new data, the script will call the `truncate` function to clear out any existing features from the specified layer. This ensures that the feature layer is empty and ready for the new data.
+1. **Truncate Layer**: Before fetching and adding any new data, the script will call the `truncate` function to clear out any existing features from the specified layer. This ensures that the feature layer is empty and ready for the new data.
-### 2. **Fetch Data**:
-The script will then fetch data from the specified API in pages. Each page is fetched sequentially until all data is retrieved.
+2. **Fetch Data**: The script will then fetch data from the specified API in pages. Each page is fetched sequentially until all data is retrieved.
-### 3. **Save Data**:
-Data from each page will be saved to an individual JSON file, with the filename including the page number and timestamp. The aggregated data (all pages combined) is saved to a separate file.
+3. **Save Data**: Data from each page will be saved to an individual JSON file, with the filename including the page number and timestamp. The aggregated data (all pages combined) is saved to a separate file.
-### 4. **Add Features**:
-After all the data has been fetched and saved, the script will send the aggregated data as features to the specified ArcGIS feature layer.
+4. **Add Features**: After all the data has been fetched and saved, the script will send the aggregated data as features to the specified ArcGIS feature layer.
-## Example Output
+### Example Output
 - Individual page files are saved in the `data/` directory with filenames like `enforcement_page_1_results_100_2025-03-26_14-30-45.json`.
 - The aggregated file is saved as `aggregated_enforcement_results_2025-03-26_14-30-45.json`.
 Logs will also be generated in the `753DataSync.log` file and printed to the console.
+## Error Handling
+- If an error occurs while fetching data, the script will log the error and stop execution.
+- If the `truncate` or `add_features` operations fail, the script will log the error and stop execution.
+- The script handles HTTP errors and network-related errors gracefully.
 ## Example Output (Log)
-```text
+```
 2025-03-26 14:30:45 - INFO - Attempting to truncate layer on https://www.arcgis.com/...
 2025-03-26 14:30:50 - INFO - Successfully truncated layer: https://www.arcgis.com/...
 2025-03-26 14:30:51 - INFO - Making request to: https://api.example.com/1/100
@@ -103,14 +94,6 @@ Logs will also be generated in the `753DataSync.log` file and printed to the console.
 2025-03-26 14:31:00 - INFO - Features added successfully.
 ```
-## Error Handling
-The script handles errors gracefully, including:
-- If an error occurs while fetching data, the script will log the error and stop execution.
-- If the `truncate` or `add_features` operations fail, the script will log the error and stop execution.
-- The script handles HTTP errors and network-related errors gracefully, ensuring that any issues are logged with detailed information.
 ## Troubleshooting
 - If the script stops unexpectedly, check the logs (`753DataSync.log`) for detailed error information.
@@ -119,4 +102,4 @@ The script handles errors gracefully, including:
 ## License
-This project is licensed under the GNU General Public License v3.0 or later - see the [LICENSE](LICENSE) file for details.
+This project is licensed under the **GNU General Public License v3.0** or later - see the [LICENSE](LICENSE) file for details.

app.py (232 changed lines)

@@ -8,37 +8,20 @@ import argparse
 import urllib.parse
 from dotenv import load_dotenv
-# Load environment variables from .env file
-load_dotenv("753DataSync.env")
 # Configuration
 BASE_URL = "{}/{}/{}"
-log_level = os.getenv('LOG_LEVEL', 'INFO').upper()
 # Setup logging
 logger = logging.getLogger()
-# Set the log level for the logger
-if log_level == 'DEBUG':
-    logger.setLevel(logging.DEBUG)
-elif log_level == 'INFO':
-    logger.setLevel(logging.INFO)
-elif log_level == 'WARNING':
-    logger.setLevel(logging.WARNING)
-elif log_level == 'ERROR':
-    logger.setLevel(logging.ERROR)
-elif log_level == 'CRITICAL':
-    logger.setLevel(logging.CRITICAL)
-else:
-    logger.setLevel(logging.INFO)
+logger.setLevel(logging.INFO)
 # File handler
 file_handler = logging.FileHandler('753DataSync.log')
-file_handler.setLevel(getattr(logging, log_level))
+file_handler.setLevel(logging.INFO)
 # Stream handler (console output)
 stream_handler = logging.StreamHandler(sys.stdout)
-stream_handler.setLevel(getattr(logging, log_level))
+stream_handler.setLevel(logging.INFO)
 # Log format
 formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
@@ -54,30 +37,23 @@ def fetch_data(api_url, page_number, results_per_page):
     url = BASE_URL.format(api_url, page_number, results_per_page)
     try:
-        logger.info(f"Making request to: {url} with page_number={page_number} and results_per_page={results_per_page}")
+        logger.info(f"Making request to: {url}")
         response = requests.get(url)
         # Check for HTTP errors
         response.raise_for_status()
-        # Success log
-        logger.info(f"Successfully fetched data from {url}. Status code: {response.status_code}.")
-        # Debug log with additional response details
-        logger.debug(f"GET request to {url} completed with status code {response.status_code}. "
-                     f"Response time: {response.elapsed.total_seconds()} seconds.")
         # Return JSON data
         return response.json()
     except requests.exceptions.HTTPError as http_err:
-        logger.error(f"HTTP error occurred while fetching data from {url}: {http_err}")
+        logger.error(f"HTTP error occurred: {http_err}")
         sys.exit(1)
     except requests.exceptions.RequestException as req_err:
-        logger.error(f"Request error occurred while fetching data from {url}: {req_err}")
+        logger.error(f"Request error occurred: {req_err}")
         sys.exit(1)
     except Exception as err:
-        logger.exception(f"An unexpected error occurred while fetching data from {url}: {err}")
+        logger.error(f"An unexpected error occurred: {err}")
        sys.exit(1)
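The stop-when-a-page-comes-back-short rule that callers of `fetch_data` rely on can be sketched in isolation. `page_url` mirrors `BASE_URL.format(...)` from the diff; `fetch_all` is a hypothetical helper, not code from the repository.

```python
def page_url(api_url, page_number, results_per_page):
    """Mirror of BASE_URL.format(...): the API serves /<page>/<page_size> paths."""
    return "{}/{}/{}".format(api_url, page_number, results_per_page)

def fetch_all(fetch_page, results_per_page):
    """Drain a paginated endpoint: stop once a page comes back short."""
    all_data, page_number = [], 1
    while True:
        # fetch_page stands in for fetch_data(api_url, page, per_page)
        data = fetch_page(page_number, results_per_page)
        all_data.extend(data)
        if len(data) < results_per_page:
            break  # a short page signals the last page
        page_number += 1
    return all_data
```

Separating the loop from the HTTP call makes the termination condition easy to test without a network.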
 def save_json(data, filename):
@@ -86,22 +62,15 @@ def save_json(data, filename):
         # Ensure directory exists
         if not os.path.exists('data'):
             os.makedirs('data')
-            logger.info(f"Directory 'data' created.")
         # Save data to file
         with open(filename, 'w', encoding='utf-8') as f:
             json.dump(data, f, ensure_ascii=False, indent=4)
-        logger.info(f"Data successfully saved to {filename}")
+        logger.info(f"Data saved to {filename}")
-    except OSError as e:
-        logger.error(f"OS error occurred while saving JSON data to {filename}: {e}")
-        sys.exit(1)
-    except IOError as e:
-        logger.error(f"I/O error occurred while saving JSON data to {filename}: {e}")
-        sys.exit(1)
     except Exception as e:
-        logger.error(f"Unexpected error occurred while saving JSON data to {filename}: {e}")
+        logger.error(f"Error saving JSON data: {e}")
         sys.exit(1)
 def parse_arguments():
@@ -127,36 +96,14 @@ def generate_token(username, password, url="https://www.arcgis.com/sharing/rest/
         'expiration': '120'
     }
     headers = {}
     try:
-        logger.info(f"Generating token for username '{username}' using URL: {url}")
         response = requests.post(url, headers=headers, data=payload)
-        # Log the request status and response time
-        logger.debug(f"POST request to {url} completed with status code {response.status_code}. "
-                     f"Response time: {response.elapsed.total_seconds()} seconds.")
         response.raise_for_status()  # Raise an error for bad status codes
-        # Extract token from the response
-        token = response.json().get('token')
-        if token:
-            logger.info("Token generated successfully.")
-        else:
-            logger.error("Token not found in the response.")
-            sys.exit(1)
+        token = response.json()['token']
+        logger.info("Token generated successfully.")
         return token
     except requests.exceptions.RequestException as e:
-        logger.error(f"Error generating token for username '{username}': {e}")
-        sys.exit(1)
-    except KeyError as e:
-        logger.error(f"Error extracting token from the response: Missing key {e}")
-        sys.exit(1)
-    except Exception as e:
-        logger.exception(f"Unexpected error generating token for username '{username}': {e}")
+        logger.error(f"Error generating token: {e}")
         sys.exit(1)
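For context on the form body `generate_token` posts (the hunk above only shows the `'expiration': '120'` tail), a sketch of a typical ArcGIS `generateToken` payload follows. The fields beyond `expiration` are assumptions modeled on the public ArcGIS REST API, not values taken from this repository.

```python
def token_payload(username, password, referer="https://www.arcgis.com", expiration="120"):
    """Form fields for an ArcGIS generateToken request.

    'client': 'referer' ties the token to the given referer URL; the default
    referer here is an assumption, as is the 120-minute lifetime.
    """
    return {
        'username': username,
        'password': password,
        'client': 'referer',
        'referer': referer,
        'expiration': expiration,  # token lifetime in minutes
        'f': 'json',               # ask for a JSON response
    }
```

Keeping the payload in one helper also makes it harder to accidentally log the password alongside the other fields.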
 def truncate(token, hostname, instance, fs, layer, secure=True):
@@ -166,17 +113,10 @@ def truncate(token, hostname, instance, fs, layer, secure=True):
     url = f"{protocol}{hostname}/{instance}/arcgis/rest/admin/services/{fs}/FeatureServer/{layer}/truncate?token={token}&async=true&f=json"
     try:
-        # Attempt the POST request
         logging.info(f"Attempting to truncate layer {layer} on {hostname}...")
-        # Debug logging for the URL being used
-        logging.debug(f"Truncate URL: {url}")
         response = requests.post(url, timeout=30)
-        # Log response time
-        logging.debug(f"POST request to {url} completed with status code {response.status_code}. "
-                      f"Response time: {response.elapsed.total_seconds()} seconds.")
         # Check for HTTP errors
         response.raise_for_status()  # Raise an exception for HTTP errors (4xx, 5xx)
@@ -184,30 +124,28 @@ def truncate(token, hostname, instance, fs, layer, secure=True):
         if response.status_code == 200:
             result = response.json()
             if 'error' in result:
-                logging.error(f"Error truncating layer {layer}: {result['error']}")
+                logging.error(f"Error truncating layer: {result['error']}")
                 return None
             logging.info(f"Successfully truncated layer: {protocol}{hostname}/{instance}/arcgis/rest/admin/services/{fs}/FeatureServer/{layer}.")
             return result
         else:
-            logging.error(f"Unexpected response for layer {layer}: {response.status_code} - {response.text}")
+            logging.error(f"Unexpected response: {response.status_code} - {response.text}")
             return None
-    except requests.exceptions.Timeout as e:
-        logging.error(f"Request timed out while truncating layer {layer}: {e}")
-        return None
     except requests.exceptions.RequestException as e:
-        logging.error(f"Request failed while truncating layer {layer}: {e}")
+        # Catch network-related errors, timeouts, etc.
+        logging.error(f"Request failed: {e}")
         return None
     except Exception as e:
-        logging.error(f"An unexpected error occurred while truncating layer {layer}: {e}")
+        # Catch any other unexpected errors
+        logging.error(f"An unexpected error occurred: {e}")
         return None
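Because the truncate URL passes `async=true`, the job may still be running when the POST returns. A hedged sketch of waiting on such a job is below; the `'status'`/`'Completed'` keys are assumptions modeled on ArcGIS async admin operations, and `wait_for_async` is not part of the repository.

```python
import time

def wait_for_async(get_status, attempts=10, delay=0.0):
    """Poll an async admin job until it reports completion.

    get_status is any zero-argument callable returning the job's JSON dict;
    the 'status' == 'Completed' convention is an assumption. Returns the
    final status dict, or None if the job never completes within attempts.
    """
    for _ in range(attempts):
        status = get_status()
        if status.get('status') == 'Completed':
            return status
        time.sleep(delay)  # back off between polls
    return None
```

Injecting `get_status` keeps the polling logic testable without hitting a live server.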
 def add_features(token, hostname, instance, fs, layer, aggregated_data, secure=True):
     """Add features to a feature service."""
     protocol = 'https://' if secure else 'http://'
     url = f"{protocol}{hostname}/{instance}/arcgis/rest/services/{fs}/FeatureServer/{layer}/addFeatures?token={token}&rollbackOnFailure=true&f=json"
-    logger.info(f"Attempting to add features on {protocol}{hostname}/{instance}/arcgis/rest/services/{fs}/FeatureServer/{layer}...")
+    logger.info(f"Attempting to add features to {protocol}{hostname}/{instance}/arcgis/rest/services/{fs}/FeatureServer/{layer}...")
     # Prepare features data as the payload
     features_json = json.dumps(aggregated_data)  # Convert aggregated data to JSON string
@@ -221,119 +159,73 @@ def add_features(token, hostname, instance, fs, layer, aggregated_data, secure=True):
     }
     try:
-        # Log request details (but avoid logging sensitive data)
-        logger.debug(f"Request URL: {url}")
-        logger.debug(f"Payload size: {len(features_json)} characters")
         response = requests.post(url, headers=headers, data=payload, timeout=180)
-        # Log the response time and status code
-        logger.debug(f"POST request to {url} completed with status code {response.status_code}. "
-                     f"Response time: {response.elapsed.total_seconds()} seconds.")
         response.raise_for_status()  # Raise an error for bad status codes
         logger.info("Features added successfully.")
-        # Log any successful response details
-        if response.status_code == 200:
-            logger.debug(f"Response JSON size: {len(response.text)} characters.")
         return response.json()
-    except requests.exceptions.Timeout as e:
-        logger.error(f"Request timed out while adding features: {e}")
-        return {'error': 'Request timed out'}
     except requests.exceptions.RequestException as e:
-        logger.error(f"Request error occurred while adding features: {e}")
+        logger.error(f"Request error: {e}")
         return {'error': str(e)}
     except json.JSONDecodeError as e:
-        logger.error(f"Error decoding JSON response while adding features: {e}")
+        logger.error(f"Error decoding JSON response: {e}")
         return {'error': 'Invalid JSON response'}
-    except Exception as e:
-        logger.error(f"An unexpected error occurred while adding features: {e}")
-        return {'error': str(e)}
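One common way to keep a single `addFeatures` POST from growing unbounded (the request above sends all aggregated features in one payload under a 180 s timeout) is to batch the features. This `chunk_features` helper and its 1000-feature ceiling are illustrative assumptions, not repository code or a documented server limit.

```python
def chunk_features(features, batch_size=1000):
    """Split a feature list into fixed-size batches for separate addFeatures posts.

    The 1000-per-request ceiling is an assumed, conservative default; servers
    differ, so tune it to the target feature service.
    """
    return [features[i:i + batch_size] for i in range(0, len(features), batch_size)]
```

Each batch could then be passed to `add_features` in turn, so one oversized request cannot time out the whole upload.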
 def main():
     """Main entry point for the script."""
-    try:
-        logger.info("Starting script execution.")
-        # Parse command-line arguments
-        results_per_page = parse_arguments()
-        logger.info(f"Parsed arguments: results_per_page={results_per_page}")
-        # Load environment variables
-        logger.info("Loading environment variables.")
-        load_dotenv("753DataSync.env")
-        api_url = os.getenv('API_URL')
-        if not api_url:
-            logger.error("API_URL environment variable not found.")
-            return
-        # Generate the token
-        username = os.getenv('AGOL_USER')
-        password = os.getenv('AGOL_PASSWORD')
-        if not username or not password:
-            logger.error("Missing AGOL_USER or AGOL_PASSWORD in environment variables.")
-            return
-        token = generate_token(username, password)
-        # Set ArcGIS host details
-        hostname = os.getenv('HOSTNAME')
-        instance = os.getenv('INSTANCE')
-        fs = os.getenv('FS')
-        layer = os.getenv('LAYER')
-        # Truncate the layer before adding new features
-        truncate(token, hostname, instance, fs, layer)
-        all_data = []
-        page_number = 1
-        while True:
-            try:
-                # Fetch data from the API
-                data = fetch_data(api_url, page_number, results_per_page)
-                # Append features data to the aggregated list
-                all_data.extend(data)
-                timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
-                page_filename = f"data/enforcement_page_{page_number}_results_{results_per_page}_{timestamp}.json"
-                # Save individual page data
-                if log_level == 'DEBUG':
-                    save_json(data, page_filename)
-                # Check if the number of records is less than the results_per_page, indicating last page
-                if len(data) < results_per_page:
-                    logger.info("No more data to fetch, stopping pagination.")
-                    break
-                page_number += 1
-            except Exception as e:
-                logger.error(f"Error fetching or saving data for page {page_number}: {e}", exc_info=True)
-                break
-        # Prepare aggregated data
-        aggregated_data = all_data  # Just use the collected features directly
-        # Save aggregated data to a single JSON file
-        aggregated_filename = f"data/aggregated_enforcement_results_{timestamp}.json"
-        logger.info(f"Saving aggregated data to {aggregated_filename}.")
-        save_json(aggregated_data, aggregated_filename)
-        # Add the features to the feature layer
-        response = add_features(token, hostname, instance, fs, layer, aggregated_data)
-    except Exception as e:
-        logger.error(f"An unexpected error occurred: {e}", exc_info=True)
-        return
-    finally:
-        logger.info("Script execution completed.")
+    # Parse command-line arguments
+    results_per_page = parse_arguments()
+    load_dotenv("753DataSync.env")
+    api_url = os.getenv('API_URL')
+    # Generate the token
+    username = os.getenv('AGOL_USER')
+    password = os.getenv('AGOL_PASSWORD')
+    token = generate_token(username, password)
+    # Set ArcGIS host details
+    hostname = os.getenv('HOSTNAME')
+    instance = os.getenv('INSTANCE')
+    fs = os.getenv('FS')
+    layer = os.getenv('LAYER')
+    # Truncate the layer before adding new features
+    truncate(token, hostname, instance, fs, layer)
+    all_data = []
+    page_number = 1
+    while True:
+        # Fetch data from the API
+        data = fetch_data(api_url, page_number, results_per_page)
+        # Append features data to the aggregated list
+        all_data.extend(data)  # Data is now a list of features
+        # Generate filename with timestamp for the individual page
+        timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
+        page_filename = f"data/enforcement_page_{page_number}_results_{results_per_page}_{timestamp}.json"
+        # Save individual page data
+        save_json(data, page_filename)
+        # Check if the number of records is less than the results_per_page, indicating last page
+        if len(data) < results_per_page:
+            logger.info("No more data to fetch, stopping pagination.")
+            break
+        page_number += 1
+    # Prepare aggregated data
+    aggregated_data = all_data  # Just use the collected features directly
+    # Save aggregated data to a single JSON file
+    aggregated_filename = f"data/aggregated_enforcement_results_{timestamp}.json"
+    save_json(aggregated_data, aggregated_filename)
+    # Add the features to the feature layer
+    response = add_features(token, hostname, instance, fs, layer, aggregated_data)
+    logger.info(f"Add features response: {json.dumps(response, indent=2)}")
 if __name__ == "__main__":
     main()

logo.png

Binary file not shown (deleted; previously 6.7 KiB).