diff --git a/README.md b/README.md index a99519c..808fe8b 100644 --- a/README.md +++ b/README.md @@ -1,12 +1,14 @@ +![753 Data Sync logo](https://git.nickhepler.cloud/nick/gotifyer/raw/branch/master/logo.png) # 753 Data Sync +![Gitea Release](https://img.shields.io/gitea/v/release/nick/753-Data-Sync?gitea_url=https%3A%2F%2Fgit.nickhepler.cloud) This script fetches enforcement data from an external API, truncates a specified feature layer in ArcGIS, and adds the fetched data as features to the layer. The script performs the following tasks: -1. **Truncate the specified layer** in ArcGIS to clear any previous features before adding new ones. -2. **Fetch data** from an API in paginated form. -3. **Save data** from each API response to individual JSON files. -4. **Aggregate all data** from all pages into one JSON file. -5. **Add the aggregated data** as features to an ArcGIS feature service. +- **Truncate** the specified layer in ArcGIS to clear any previous features before adding new ones. +- **Fetch** data from an API in paginated form. +- **Save** data from each API response to individual JSON files. +- **Aggregate** all data from all pages into one JSON file. +- **Add** the aggregated data as features to an ArcGIS feature service. ## Requirements @@ -15,17 +17,24 @@ This script fetches enforcement data from an external API, truncates a specified - ArcGIS Online credentials (username and password) - `.env` file for configuration (see below for details) -### Install dependencies +## Install Dependencies -You can install the required dependencies using `pip`: +To install the required dependencies, use the following command: ```bash pip install -r requirements.txt ``` +Alternatively, you can install the necessary packages individually: + +```bash +pip install requests +pip install python-dotenv +``` + ## Configuration -Before running the script, you'll need to configure some environment variables. 
Create a `.env` file with the following details: +Before running the script, you need to configure some environment variables. Create a `.env` file in the root of your project with the following details: ```env API_URL=your_api_url @@ -35,9 +44,10 @@ HOSTNAME=your_arcgis_host INSTANCE=your_arcgis_instance FS=your_feature_service LAYER=your_layer_id +LOG_LEVEL=your_log_level # e.g., DEBUG, INFO, WARNING, ERROR, CRITICAL ``` -### Variables +### Environment Variables: - **API_URL**: The URL of the API you are fetching data from. - **AGOL_USER**: Your ArcGIS Online username. @@ -46,6 +56,7 @@ LAYER=your_layer_id - **INSTANCE**: The instance name of your ArcGIS Online service. - **FS**: The name of the feature service you are working with. - **LAYER**: The ID or name of the layer to truncate and add features to. +- **LOG_LEVEL**: The desired logging level (e.g., `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`). ## Script Usage @@ -55,36 +66,34 @@ You can run the script with the following command: python 753DataSync.py --results_per_page ``` -### Arguments +### Arguments: -- **--results_per_page** (optional): The number of results to fetch per page (default: 100). +- `--results_per_page` (optional): The number of results to fetch per page (default: 100). ## Functionality -1. **Truncate Layer**: Before fetching and adding any new data, the script will call the `truncate` function to clear out any existing features from the specified layer. This ensures that the feature layer is empty and ready for the new data. +### 1. **Truncate Layer**: +Before fetching and adding any new data, the script will call the `truncate` function to clear out any existing features from the specified layer. This ensures that the feature layer is empty and ready for the new data. -2. **Fetch Data**: The script will then fetch data from the specified API in pages. Each page is fetched sequentially until all data is retrieved. +### 2. 
**Fetch Data**: +The script will then fetch data from the specified API in pages. Each page is fetched sequentially until all data is retrieved. -3. **Save Data**: Data from each page will be saved to an individual JSON file, with the filename including the page number and timestamp. The aggregated data (all pages combined) is saved to a separate file. +### 3. **Save Data**: +Data from each page will be saved to an individual JSON file, with the filename including the page number and timestamp. The aggregated data (all pages combined) is saved to a separate file. -4. **Add Features**: After all the data has been fetched and saved, the script will send the aggregated data as features to the specified ArcGIS feature layer. +### 4. **Add Features**: +After all the data has been fetched and saved, the script will send the aggregated data as features to the specified ArcGIS feature layer. -### Example Output +## Example Output - Individual page files are saved in the `data/` directory with filenames like `enforcement_page_1_results_100_2025-03-26_14-30-45.json`. - The aggregated file is saved as `aggregated_enforcement_results_2025-03-26_14-30-45.json`. - + Logs will also be generated in the `753DataSync.log` file and printed to the console. -## Error Handling - -- If an error occurs while fetching data, the script will log the error and stop execution. -- If the `truncate` or `add_features` operations fail, the script will log the error and stop execution. -- The script handles HTTP errors and network-related errors gracefully. - ## Example Output (Log) -``` +```text 2025-03-26 14:30:45 - INFO - Attempting to truncate layer on https://www.arcgis.com/... 2025-03-26 14:30:50 - INFO - Successfully truncated layer: https://www.arcgis.com/... 
2025-03-26 14:30:51 - INFO - Making request to: https://api.example.com/1/100 @@ -94,6 +103,14 @@ Logs will also be generated in the `753DataSync.log` file and printed to the con 2025-03-26 14:31:00 - INFO - Features added successfully. ``` +## Error Handling + +The script handles errors gracefully, including: + +- If an error occurs while fetching data, the script will log the error and stop execution. +- If the `truncate` or `add_features` operations fail, the script will log the error and stop execution. +- The script handles HTTP errors and network-related errors gracefully, ensuring that any issues are logged with detailed information. + ## Troubleshooting - If the script stops unexpectedly, check the logs (`753DataSync.log`) for detailed error information. @@ -102,4 +119,4 @@ Logs will also be generated in the `753DataSync.log` file and printed to the con ## License -This project is licensed under the **GNU General Public License v3.0** or later - see the [LICENSE](LICENSE) file for details. \ No newline at end of file +This project is licensed under the GNU General Public License v3.0 or later - see the [LICENSE](LICENSE) file for details. 
\ No newline at end of file diff --git a/app.py b/app.py index 4729c21..69e176e 100644 --- a/app.py +++ b/app.py @@ -8,20 +8,37 @@ import argparse import urllib.parse from dotenv import load_dotenv +# Load environment variables from .env file +load_dotenv("753DataSync.env") + # Configuration BASE_URL = "{}/{}/{}" +log_level = os.getenv('LOG_LEVEL', 'INFO').upper() # Setup logging logger = logging.getLogger() -logger.setLevel(logging.INFO) + +# Set the log level for the logger +if log_level == 'DEBUG': + logger.setLevel(logging.DEBUG) +elif log_level == 'INFO': + logger.setLevel(logging.INFO) +elif log_level == 'WARNING': + logger.setLevel(logging.WARNING) +elif log_level == 'ERROR': + logger.setLevel(logging.ERROR) +elif log_level == 'CRITICAL': + logger.setLevel(logging.CRITICAL) +else: + logger.setLevel(logging.INFO) # File handler file_handler = logging.FileHandler('753DataSync.log') -file_handler.setLevel(logging.INFO) +file_handler.setLevel(getattr(logging, log_level)) # Stream handler (console output) stream_handler = logging.StreamHandler(sys.stdout) -stream_handler.setLevel(logging.INFO) +stream_handler.setLevel(getattr(logging, log_level)) # Log format formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s') @@ -37,23 +54,30 @@ def fetch_data(api_url, page_number, results_per_page): url = BASE_URL.format(api_url, page_number, results_per_page) try: - logger.info(f"Making request to: {url}") + logger.info(f"Making request to: {url} with page_number={page_number} and results_per_page={results_per_page}") response = requests.get(url) # Check for HTTP errors response.raise_for_status() + # Success log + logger.info(f"Successfully fetched data from {url}. Status code: {response.status_code}.") + + # Debug log with additional response details + logger.debug(f"GET request to {url} completed with status code {response.status_code}. 
" + f"Response time: {response.elapsed.total_seconds()} seconds.") + # Return JSON data return response.json() except requests.exceptions.HTTPError as http_err: - logger.error(f"HTTP error occurred: {http_err}") + logger.error(f"HTTP error occurred while fetching data from {url}: {http_err}") sys.exit(1) except requests.exceptions.RequestException as req_err: - logger.error(f"Request error occurred: {req_err}") + logger.error(f"Request error occurred while fetching data from {url}: {req_err}") sys.exit(1) except Exception as err: - logger.error(f"An unexpected error occurred: {err}") + logger.exception(f"An unexpected error occurred while fetching data from {url}: {err}") sys.exit(1) def save_json(data, filename): @@ -62,15 +86,22 @@ def save_json(data, filename): # Ensure directory exists if not os.path.exists('data'): os.makedirs('data') - + logger.info(f"Directory 'data' created.") + # Save data to file with open(filename, 'w', encoding='utf-8') as f: json.dump(data, f, ensure_ascii=False, indent=4) - logger.info(f"Data saved to {filename}") + logger.info(f"Data successfully saved to {filename}") + except OSError as e: + logger.error(f"OS error occurred while saving JSON data to {filename}: {e}") + sys.exit(1) + except IOError as e: + logger.error(f"I/O error occurred while saving JSON data to {filename}: {e}") + sys.exit(1) except Exception as e: - logger.error(f"Error saving JSON data: {e}") + logger.error(f"Unexpected error occurred while saving JSON data to {filename}: {e}") sys.exit(1) def parse_arguments(): @@ -96,14 +127,36 @@ def generate_token(username, password, url="https://www.arcgis.com/sharing/rest/ 'expiration': '120' } headers = {} + try: + logger.info(f"Generating token for username '{username}' using URL: {url}") response = requests.post(url, headers=headers, data=payload) + + # Log the request status and response time + logger.debug(f"POST request to {url} completed with status code {response.status_code}. 
" + f"Response time: {response.elapsed.total_seconds()} seconds.") + response.raise_for_status() # Raise an error for bad status codes - token = response.json()['token'] - logger.info("Token generated successfully.") + + # Extract token from the response + token = response.json().get('token') + + if token: + logger.info("Token generated successfully.") + else: + logger.error("Token not found in the response.") + sys.exit(1) + return token + except requests.exceptions.RequestException as e: - logger.error(f"Error generating token: {e}") + logger.error(f"Error generating token for username '{username}': {e}") + sys.exit(1) + except KeyError as e: + logger.error(f"Error extracting token from the response: Missing key {e}") + sys.exit(1) + except Exception as e: + logger.exception(f"Unexpected error generating token for username '{username}': {e}") sys.exit(1) def truncate(token, hostname, instance, fs, layer, secure=True): @@ -113,10 +166,17 @@ def truncate(token, hostname, instance, fs, layer, secure=True): url = f"{protocol}{hostname}/{instance}/arcgis/rest/admin/services/{fs}/FeatureServer/{layer}/truncate?token={token}&async=true&f=json" try: - # Attempt the POST request logging.info(f"Attempting to truncate layer {layer} on {hostname}...") + + # Debug logging for the URL being used + logging.debug(f"Truncate URL: {url}") + response = requests.post(url, timeout=30) + # Log response time + logging.debug(f"POST request to {url} completed with status code {response.status_code}. 
" + f"Response time: {response.elapsed.total_seconds()} seconds.") + # Check for HTTP errors response.raise_for_status() # Raise an exception for HTTP errors (4xx, 5xx) @@ -124,28 +184,30 @@ def truncate(token, hostname, instance, fs, layer, secure=True): if response.status_code == 200: result = response.json() if 'error' in result: - logging.error(f"Error truncating layer: {result['error']}") + logging.error(f"Error truncating layer {layer}: {result['error']}") return None logging.info(f"Successfully truncated layer: {protocol}{hostname}/{instance}/arcgis/rest/admin/services/{fs}/FeatureServer/{layer}.") return result else: - logging.error(f"Unexpected response: {response.status_code} - {response.text}") + logging.error(f"Unexpected response for layer {layer}: {response.status_code} - {response.text}") return None + except requests.exceptions.Timeout as e: + logging.error(f"Request timed out while truncating layer {layer}: {e}") + return None except requests.exceptions.RequestException as e: - # Catch network-related errors, timeouts, etc. 
- logging.error(f"Request failed: {e}") + logging.error(f"Request failed while truncating layer {layer}: {e}") return None except Exception as e: - # Catch any other unexpected errors - logging.error(f"An unexpected error occurred: {e}") + logging.error(f"An unexpected error occurred while truncating layer {layer}: {e}") return None def add_features(token, hostname, instance, fs, layer, aggregated_data, secure=True): """Add features to a feature service.""" protocol = 'https://' if secure else 'http://' url = f"{protocol}{hostname}/{instance}/arcgis/rest/services/{fs}/FeatureServer/{layer}/addFeatures?token={token}&rollbackOnFailure=true&f=json" - logger.info(f"Attempting to add features on {protocol}{hostname}/{instance}/arcgis/rest/services/{fs}/FeatureServer/{layer}...") + + logger.info(f"Attempting to add features to {protocol}{hostname}/{instance}/arcgis/rest/services/{fs}/FeatureServer/{layer}...") # Prepare features data as the payload features_json = json.dumps(aggregated_data) # Convert aggregated data to JSON string @@ -159,73 +221,119 @@ def add_features(token, hostname, instance, fs, layer, aggregated_data, secure=T } try: + # Log request details (but avoid logging sensitive data) + logger.debug(f"Request URL: {url}") + logger.debug(f"Payload size: {len(features_json)} characters") + response = requests.post(url, headers=headers, data=payload, timeout=180) + + # Log the response time and status code + logger.debug(f"POST request to {url} completed with status code {response.status_code}. 
" + f"Response time: {response.elapsed.total_seconds()} seconds.") + response.raise_for_status() # Raise an error for bad status codes + logger.info("Features added successfully.") + + # Log any successful response details + if response.status_code == 200: + logger.debug(f"Response JSON size: {len(response.text)} characters.") + return response.json() + + except requests.exceptions.Timeout as e: + logger.error(f"Request timed out while adding features: {e}") + return {'error': 'Request timed out'} + except requests.exceptions.RequestException as e: - logger.error(f"Request error: {e}") + logger.error(f"Request error occurred while adding features: {e}") return {'error': str(e)} + except json.JSONDecodeError as e: - logger.error(f"Error decoding JSON response: {e}") + logger.error(f"Error decoding JSON response while adding features: {e}") return {'error': 'Invalid JSON response'} + except Exception as e: + logger.error(f"An unexpected error occurred while adding features: {e}") + return {'error': str(e)} + def main(): """Main entry point for the script.""" - # Parse command-line arguments - results_per_page = parse_arguments() + try: + logger.info("Starting script execution.") - load_dotenv("753DataSync.env") - api_url = os.getenv('API_URL') + # Parse command-line arguments + results_per_page = parse_arguments() + logger.info(f"Parsed arguments: results_per_page={results_per_page}") - # Generate the token - username = os.getenv('AGOL_USER') - password = os.getenv('AGOL_PASSWORD') - token = generate_token(username, password) + # Load environment variables + logger.info("Loading environment variables.") + load_dotenv("753DataSync.env") + api_url = os.getenv('API_URL') + if not api_url: + logger.error("API_URL environment variable not found.") + return - # Set ArcGIS host details - hostname = os.getenv('HOSTNAME') - instance = os.getenv('INSTANCE') - fs = os.getenv('FS') - layer = os.getenv('LAYER') + # Generate the token + username = os.getenv('AGOL_USER') + password 
= os.getenv('AGOL_PASSWORD') + if not username or not password: + logger.error("Missing AGOL_USER or AGOL_PASSWORD in environment variables.") + return + token = generate_token(username, password) - # Truncate the layer before adding new features - truncate(token, hostname, instance, fs, layer) + # Set ArcGIS host details + hostname = os.getenv('HOSTNAME') + instance = os.getenv('INSTANCE') + fs = os.getenv('FS') + layer = os.getenv('LAYER') - all_data = [] - page_number = 1 + # Truncate the layer before adding new features + truncate(token, hostname, instance, fs, layer) - while True: - # Fetch data from the API - data = fetch_data(api_url, page_number, results_per_page) + all_data = [] + page_number = 1 - # Append features data to the aggregated list - all_data.extend(data) # Data is now a list of features + while True: + try: + # Fetch data from the API + data = fetch_data(api_url, page_number, results_per_page) - # Generate filename with timestamp for the individual page - timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S") - page_filename = f"data/enforcement_page_{page_number}_results_{results_per_page}_{timestamp}.json" - - # Save individual page data - save_json(data, page_filename) + # Append features data to the aggregated list + all_data.extend(data) - # Check if the number of records is less than the results_per_page, indicating last page - if len(data) < results_per_page: - logger.info("No more data to fetch, stopping pagination.") - break + timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S") + page_filename = f"data/enforcement_page_{page_number}_results_{results_per_page}_{timestamp}.json" + + # Save individual page data + if log_level == 'DEBUG': + save_json(data, page_filename) - page_number += 1 + # Check if the number of records is less than the results_per_page, indicating last page + if len(data) < results_per_page: + logger.info("No more data to fetch, stopping pagination.") + break - # Prepare aggregated data - aggregated_data = 
all_data # Just use the collected features directly + page_number += 1 + except Exception as e: + logger.error(f"Error fetching or saving data for page {page_number}: {e}", exc_info=True) + break - # Save aggregated data to a single JSON file - aggregated_filename = f"data/aggregated_enforcement_results_{timestamp}.json" - save_json(aggregated_data, aggregated_filename) + # Prepare aggregated data + aggregated_data = all_data # Just use the collected features directly - # Add the features to the feature layer - response = add_features(token, hostname, instance, fs, layer, aggregated_data) - logger.info(f"Add features response: {json.dumps(response, indent=2)}") + # Save aggregated data to a single JSON file + aggregated_filename = f"data/aggregated_enforcement_results_{timestamp}.json" + logger.info(f"Saving aggregated data to {aggregated_filename}.") + save_json(aggregated_data, aggregated_filename) + + # Add the features to the feature layer + response = add_features(token, hostname, instance, fs, layer, aggregated_data) + except Exception as e: + logger.error(f"An unexpected error occurred: {e}", exc_info=True) + return + finally: + logger.info("Script execution completed.") if __name__ == "__main__": - main() + main() \ No newline at end of file diff --git a/logo.png b/logo.png new file mode 100644 index 0000000..93a31f0 Binary files /dev/null and b/logo.png differ
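
Review note on the `app.py` logging hunk: the logger falls back to `INFO` when `LOG_LEVEL` is not one of the five recognised names, but both handlers call `getattr(logging, log_level)` directly, which would raise `AttributeError` for that same value. The sketch below shows one possible way to resolve the level once and reuse it for the logger and its handlers. The names `resolve_log_level` and `configure_handlers` are hypothetical and do not appear in the diff; this is an illustration under those assumptions, not the author's implementation.

```python
import logging

# Hypothetical helper, not part of the diff: the five level names the README
# documents, mapped to their numeric values in the standard logging module.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def resolve_log_level(name, default=logging.INFO):
    """Return the numeric level for `name`, or `default` if it is not recognised."""
    if not name:
        return default
    return _LEVELS.get(name.strip().upper(), default)


def configure_handlers(logger, handlers, level_name):
    """Apply one resolved level to the logger and to every handler."""
    level = resolve_log_level(level_name)
    logger.setLevel(level)
    for handler in handlers:
        handler.setLevel(level)
    return level
```

With a helper like this, the `if`/`elif` chain and the two `getattr` calls could collapse into a single call such as `configure_handlers(logger, [file_handler, stream_handler], os.getenv('LOG_LEVEL'))`, so an invalid value degrades to `INFO` everywhere instead of failing at import time.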