Met data Automation Report

A.    Project Overview

The project automates the process of downloading climate data, extracting essential weather parameters (Tmax, Tmin, Rainfall), and generating insightful graphs to monitor trends. The project includes:

  • Automated daily PDF downloads from the meteorology website https://meteo.gov.lk/index.php?lang=en.
  • Extraction of Tmax, Tmin, and Rainfall data from predefined locations and hydro catchment areas.
  • Zone-wise calculations for 8-day averages of Tmin, Tmax, and Rainfall.
  • Graphical representation of trends for the past 30 days.
  • Full integration with GitHub Actions for seamless automation.

B.    Files and Functionalities

 Core Scripts

1.     main.py:

·        Purpose: Downloads daily climate data PDFs from the meteorology website.

·        Key Features:

o   Scrapes the webpage for the latest PDF link.

o   Checks whether the PDF is new by comparing its SHA-256 hash with the hashes of previously downloaded files.

o   Saves the downloaded PDF in the metdata/ folder.

·        Output: Daily climate PDFs (e.g., daily_climate_update_YYYY-MM-DD.pdf).
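
The duplicate check described above can be sketched as follows. This is a minimal illustration and not the project's actual main.py: the helper names sha256_of_file and save_if_new are hypothetical, and the sketch assumes that "new" means no PDF already in metdata/ has the same SHA-256 digest. The download step itself (requests, BeautifulSoup4) is left out so that the example runs with the standard library only.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def save_if_new(content: bytes, folder: Path, filename: str) -> bool:
    """Write content to folder/filename unless an identical PDF is already stored.

    Returns True when the file was written, False when it was a duplicate.
    """
    folder.mkdir(parents=True, exist_ok=True)
    new_hash = hashlib.sha256(content).hexdigest()
    # Hash every PDF already in the folder (hypothetical policy: compare against all).
    known_hashes = {sha256_of_file(p) for p in folder.glob("*.pdf")}
    if new_hash in known_hashes:
        return False
    (folder / filename).write_bytes(content)
    return True
```

Comparing against every stored file is the simplest policy; if the folder grows large, comparing only against the most recent PDF would be a cheaper alternative.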

2.     extract_metdata.py:

·        Purpose: Extracts Tmax, Tmin, and Rainfall data from the daily climate PDF.

·        Key Features:

o   Handles missing PDFs by filling extracted data with NA values.

o   Extracts data for predefined locations.

o   Saves extracted data into extracted_data/extracted_climate_metdata.csv.

·        Output: extracted_data/extracted_climate_metdata.csv with columns for Date, Variable (Tmax, Tmin, Rainfall), and data for each predefined location.

3.     hydro_catchment_extract.py:

·        Purpose: Extracts rainfall data specific to hydro catchment stations.

·        Key Features:

o   Extracts rainfall for predefined hydro catchment stations.

o   Handles missing or corrupted PDFs by filling extracted data with NA values.

·        Output: extracted_data/hydro_catchment_data.csv with columns for Date and rainfall data for each hydro catchment station.
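
Both extraction scripts fill the output with NA values when a PDF is missing or unreadable. A minimal sketch of that fallback is shown below, under stated assumptions: the station names are placeholders (the real predefined locations are not listed in this report), the helper names na_rows and append_rows are hypothetical, and the standard-library csv module is used in place of pandas so the example has no dependencies.

```python
import csv
from pathlib import Path

VARIABLES = ["Tmax", "Tmin", "Rainfall"]
# Placeholder station names; the project's real location list is not shown here.
LOCATIONS = ["Station_A", "Station_B", "Station_C"]


def na_rows(date: str, locations=LOCATIONS) -> list:
    """Build one NA-filled row per variable for a day whose PDF is missing."""
    return [
        {"Date": date, "Variable": variable, **{loc: "NA" for loc in locations}}
        for variable in VARIABLES
    ]


def append_rows(csv_path: Path, rows: list, locations=LOCATIONS) -> None:
    """Append rows to the CSV, writing the header only when the file is new."""
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not csv_path.exists()
    with csv_path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["Date", "Variable", *locations])
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
```

Writing explicit NA rows keeps one record per day and variable, so later averaging steps can see that a day is missing instead of silently skipping it.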

Analysis Scripts

1.     metstation_8days_Tmin.py:

·        Purpose: Extracts Tmin data and calculates 8-day averages.

·        Key Features:

o   Processes Tmin data for predefined locations.

o   Calculates zone-wise averages using predefined zones (e.g., Northern Plains, Eastern Plains).

o   Appends daily Tmin data and 8-day averages to the CSV file.

·        Output: extracted_data/metstation_tmin_data.csv with columns for Date, Variable (Tmin), location data, and zone-wise 8-day averages.

 

2.     metstation_8days_Tmax.py:

·        Purpose: Extracts Tmax data and calculates 8-day averages.

·        Key Features:

o   Processes Tmax data for predefined locations.

o   Calculates zone-wise averages for Tmax.

o   Appends daily Tmax data and 8-day averages to the CSV file.

·        Output: extracted_data/metstation_tmax_data.csv with similar structure to metstation_tmin_data.csv.

 

3.     metstation_8days_rainfall.py:

·        Purpose: Extracts rainfall data and calculates 8-day averages.

·        Key Features:

o   Processes rainfall data for predefined locations.

o   Calculates zone-wise averages for rainfall.

o   Appends daily rainfall data and 8-day averages to the CSV file.

·        Output: extracted_data/metstation_rainfall_data.csv with similar structure to the Tmin and Tmax data files.
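
The zone-wise 8-day averaging used by the three scripts above could look roughly like this. It is a sketch under assumptions, not the project's code: the zone-to-station mapping is a placeholder, the function names are hypothetical, and "8-day average" is assumed to mean the mean of the daily zone averages over the most recent 8 days, with missing readings skipped.

```python
from statistics import mean

# Placeholder mapping; the real zones (e.g., Northern Plains) use the project's own stations.
ZONES = {
    "Northern Plains": ["Station_A", "Station_B"],
    "Eastern Plains": ["Station_C"],
}


def zone_daily_mean(day_values: dict, stations: list):
    """Mean over the stations that reported that day; None if all are missing."""
    readings = [day_values[s] for s in stations if day_values.get(s) is not None]
    return mean(readings) if readings else None


def zone_8day_averages(daily: list, zones=ZONES, window: int = 8) -> dict:
    """Average each zone's daily means over the last `window` days.

    `daily` is a list of {station: value or None} dicts, oldest day first.
    """
    recent = daily[-window:]
    result = {}
    for zone, stations in zones.items():
        daily_means = [zone_daily_mean(day, stations) for day in recent]
        valid = [m for m in daily_means if m is not None]
        result[zone] = round(mean(valid), 2) if valid else None
    return result
```

The same function works for Tmin, Tmax, and rainfall because it only depends on the shape of the daily records, not on the variable.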

 

 

Visualization Scripts

1.     daily_rainfall_avg_graph.py:

·        Purpose: Generates a bar chart for the daily average rainfall over the past 30 days.

·        Key Features:

o   Filters data from the last 30 days.

o   Plots daily rainfall as a bar chart.

o   Saves the graph in the Graphs/Avg_RF/ folder.

·        Output: Graphs/Avg_RF/daily_rainfall_average_past_30_days_YYYYMMDD_HHMMSS.png.

 

2.     daily_tmin_avg_graph.py:

·        Purpose: Generates a line graph for the daily average Tmin over the past 30 days.

·        Key Features:

o   Filters data from the last 30 days.

o   Plots daily Tmin as a line graph.

o   Saves the graph in the Graphs/Avg_Tmin/ folder.

·        Output: Graphs/Avg_Tmin/daily_tmin_average_past_30_days_YYYYMMDD_HHMMSS.png.

 

3.     daily_tmax_avg_graph.py:

·        Purpose: Generates a line graph for the daily average Tmax over the past 30 days.

·        Key Features:

o   Filters data from the last 30 days.

o   Plots daily Tmax as a line graph.

o   Saves the graph in the Graphs/Avg_Tmax/ folder.

·        Output: Graphs/Avg_Tmax/daily_tmax_average_past_30_days_YYYYMMDD_HHMMSS.png.
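
The two steps shared by all three graph scripts, filtering to the last 30 days and building the timestamped file name, can be sketched as below. The function names are hypothetical, and the sketch assumes the 30-day window ends on and includes the current day. The plotting call itself (matplotlib) is omitted to keep the example dependency-free.

```python
from datetime import date, datetime, timedelta


def last_n_days(rows: list, today: date, days: int = 30) -> list:
    """Keep (date, value) pairs from the `days` days up to and including today."""
    cutoff = today - timedelta(days=days)
    return [(d, v) for d, v in rows if cutoff < d <= today]


def graph_filename(variable: str, now: datetime) -> str:
    """Build a name such as daily_rainfall_average_past_30_days_YYYYMMDD_HHMMSS.png."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return f"daily_{variable}_average_past_30_days_{stamp}.png"
```

Including the timestamp in the file name means each weekly run adds a new image to the folder instead of overwriting the previous one.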


C.    Data Flow

1.     Input:

Daily climate PDFs downloaded from the meteorology website.

2.      Processing:

a.      main.py: Downloads PDFs.

b.      extract_metdata.py and hydro_catchment_extract.py: Extract Tmax, Tmin, and Rainfall data.

c.      metstation_8days_* scripts: Calculate daily and 8-day averages for each variable.

d.      daily_*_avg_graph.py scripts: Generate the 30-day graphs for each variable.

3.      Output:

a.      Extracted Data: CSV files stored in the extracted_data/ directory.

b.      Graphs: Graphs for daily rainfall, Tmin, and Tmax stored in the respective subdirectories within Graphs/.

D.    Dependencies

·        Python Libraries:

o   Third-party: pandas, matplotlib, pdfplumber, requests, BeautifulSoup4.

o   Standard library: os, datetime.

·        Directory Structure:

o   metdata/: Stores downloaded PDFs.

o   extracted_data/: Stores CSV files with extracted data.

o   Graphs/: Stores graphs categorized by type:

§  Avg_RF/: Rainfall graphs.

§  Avg_Tmin/: Tmin graphs.

§  Avg_Tmax/: Tmax graphs.
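
As a small illustration, the directory layout above can be created up front so that no script fails on a missing folder. This is a sketch only; the helper name ensure_dirs is hypothetical and the project may create these folders differently.

```python
from pathlib import Path

# Folder names taken from the directory structure listed above.
PROJECT_DIRS = [
    "metdata",
    "extracted_data",
    "Graphs/Avg_RF",
    "Graphs/Avg_Tmin",
    "Graphs/Avg_Tmax",
]


def ensure_dirs(root: Path) -> list:
    """Create every project folder under root if it does not exist yet."""
    created = []
    for relative in PROJECT_DIRS:
        path = root / relative
        path.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
        created.append(path)
    return created
```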

E.    Automation Workflow

·       Daily Automation:

o   Triggered by GitHub Actions to:

§  Download the daily PDF.

§  Extract and process data.

§  Generate graphs.

·       Artifacts:

o   Save CSV files and graphs as artifacts for easy access.

·       Weekly Summary:

o   Generate cumulative rainfall reports and other insights weekly.

 

·       main.yml - Download PDF and Upload to Metdata Folder:

o   Schedule: Daily at 2:30 PM SLT (Sri Lanka Time, UTC+5:30).

o   Steps:

§  Downloads the daily climate update PDF using main.py.

§  Saves the PDF in the metdata/ folder.

§  Commits and pushes changes to the repository.

·       met_extract.yml - Extract Data from PDF:

o   Schedule: Daily at 4:30 PM SLT.

o   Steps:

§  Runs the extract_metdata.py script to extract Tmax, Tmin, and Rainfall data.

§  Commits the extracted data to extracted_data/.

·       hydro_catchment_extract.yml - Extract Hydro Catchment Data:

o   Schedule: Daily at 3:50 PM SLT.

o   Steps:

§  Runs the hydro_catchment_extract.py script.

§  Commits the extracted hydro catchment data to extracted_data/.

·       metstation_8days_rainfall.yml - Extract Metstation Rainfall Data:

o   Schedule: Daily at 3:30 PM SLT.

o   Steps:

§  Runs the metstation_8days_rainfall.py script.

§  Commits 8-day rainfall averages to extracted_data/.

·       metstation_8days_tmin.yml - Extract Metstation Tmin Data:

o   Schedule: Daily at 3:50 PM SLT.

o   Steps:

§  Runs the metstation_8days_Tmin.py script.

§  Commits 8-day Tmin averages to extracted_data/.

·       metstation_8days_tmax.yml - Extract Metstation Tmax Data:

o   Schedule: Daily at 3:40 PM SLT.

o   Steps:

§  Runs the metstation_8days_Tmax.py script.

§  Commits 8-day Tmax averages to extracted_data/.

·       rainfall_graph.yml - Generate Rainfall Graph:

o   Schedule: Weekly on Thursday at 5:00 PM SLT.

o   Steps:

§  Runs the daily_rainfall_avg_graph.py script to generate a rainfall graph.

§  Commits and pushes the graph to Graphs/Avg_RF/.

·       tmin_graph.yml - Generate Tmin Graph:

o   Schedule: Weekly on Thursday at 5:00 PM SLT.

o   Steps:

§  Runs the daily_tmin_avg_graph.py script to generate a Tmin graph.

§  Commits and pushes the graph to Graphs/Avg_Tmin/.

·       tmax_graph.yml - Generate Tmax Graph:

o   Schedule: Weekly on Thursday at 5:30 PM SLT.

o   Steps:

§  Runs the daily_tmax_avg_graph.py script to generate a Tmax graph.

§  Commits and pushes the graph to Graphs/Avg_Tmax/.
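
GitHub Actions evaluates schedule cron expressions in UTC, so each SLT time above has to be shifted before it goes into a workflow file. The helper below is a sketch of that conversion, assuming SLT means Sri Lanka Time (UTC+5:30); the function name slt_to_utc_cron is hypothetical and the actual workflow files may have been written by hand.

```python
SLT_OFFSET_MINUTES = 5 * 60 + 30  # assumption: SLT = Sri Lanka Time, UTC+05:30


def slt_to_utc_cron(hour: int, minute: int, weekday=None) -> str:
    """Convert an SLT wall-clock time to a five-field cron expression in UTC.

    `weekday` uses cron numbering (0 = Sunday ... 6 = Saturday); None means daily.
    """
    total = hour * 60 + minute - SLT_OFFSET_MINUTES
    if total < 0:
        # Before 05:30 SLT the UTC time falls on the previous day.
        total += 24 * 60
        if weekday is not None:
            weekday = (weekday - 1) % 7
    utc_hour, utc_minute = divmod(total, 60)
    day_field = "*" if weekday is None else str(weekday)
    return f"{utc_minute} {utc_hour} * * {day_field}"


print(slt_to_utc_cron(14, 30))             # daily 2:30 PM SLT download
print(slt_to_utc_cron(17, 0, weekday=4))   # Thursday 5:00 PM SLT graph job
```

Under that assumption, the daily 2:30 PM SLT download corresponds to `0 9 * * *` and the Thursday 5:00 PM SLT graph jobs to `30 11 * * 4`.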


 Here you can find the source code of all the scripts: Click here

 

