Met data Automation Report
A. Project Overview
The project automates downloading daily climate data, extracting three weather parameters (Tmax, Tmin, Rainfall), and generating graphs to monitor trends. It includes:
- Automated daily PDF downloads from the meteorology website https://meteo.gov.lk/index.php?lang=en.
- Extraction of Tmax, Tmin, and Rainfall data for predefined locations and hydro catchment areas.
- Zone-wise 8-day averages of Tmin, Tmax, and Rainfall.
- Graphs of trends over the past 30 days.
- Scheduled runs of every step through GitHub Actions.
B. Files and Functionalities
Core Scripts
1. main.py:
   · Purpose: Downloads daily climate data PDFs from the meteorology website.
   · Key Features:
     o Scrapes the webpage for the latest PDF link.
     o Validates if the PDF is new by comparing SHA256 hashes of files.
     o Saves the downloaded PDF in the metdata/ folder.
   · Output: Daily climate PDFs (e.g., daily_climate_update_YYYY-MM-DD.pdf).
2. extract_metdata.py:
   · Purpose: Extracts Tmax, Tmin, and Rainfall data from the daily climate PDF.
   · Key Features:
     o Handles missing PDFs by filling extracted data with NA values.
     o Extracts data for predefined locations.
     o Saves extracted data into extracted_data/extracted_climate_metdata.csv.
   · Output: extracted_data/extracted_climate_metdata.csv with columns for Date, Variable (Tmax, Tmin, Rainfall), and data for each predefined location.
3. hydro_catchment_extract.py:
   · Purpose: Extracts rainfall data specific to hydro catchment stations.
   · Key Features:
     o Extracts rainfall for predefined hydro catchment stations.
     o Handles missing or corrupted PDFs by filling extracted data with NA values.
   · Output: extracted_data/hydro_catchment_data.csv with columns for Date and rainfall data for each hydro catchment station.
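
The new-PDF check that main.py performs with SHA256 hashes could look roughly like the sketch below. It is an illustration only, not the project's code: the function names (sha256_of_file, is_new_pdf, save_if_new) are hypothetical, and it assumes the downloaded PDF is already available as bytes.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of_file(path: Path) -> str:
    """Hash a file in chunks so large PDFs are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_new_pdf(pdf_bytes: bytes, folder: Path) -> bool:
    """Return True when no PDF already in `folder` has the same SHA256 hash."""
    new_hash = hashlib.sha256(pdf_bytes).hexdigest()
    existing = {sha256_of_file(p) for p in folder.glob("*.pdf")}
    return new_hash not in existing


def save_if_new(pdf_bytes: bytes, folder: Path, date_str: str) -> Optional[Path]:
    """Save the PDF under the report's naming pattern, or return None if it is a duplicate."""
    folder.mkdir(parents=True, exist_ok=True)
    if not is_new_pdf(pdf_bytes, folder):
        return None
    target = folder / f"daily_climate_update_{date_str}.pdf"
    target.write_bytes(pdf_bytes)
    return target
```

Comparing content hashes rather than file names means a PDF that the website republishes unchanged on a later day is not stored twice.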
Analysis Scripts
1. metstation_8days_Tmin.py:
   · Purpose: Extracts Tmin data and calculates 8-day averages.
   · Key Features:
     o Processes Tmin data for predefined locations.
     o Calculates zone-wise averages using predefined zones (e.g., Northern Plains, Eastern Plains).
     o Appends daily Tmin data and 8-day averages to the CSV file.
   · Output: extracted_data/metstation_tmin_data.csv with columns for Date, Variable (Tmin), location data, and zone-wise 8-day averages.
2. metstation_8days_Tmax.py:
   · Purpose: Extracts Tmax data and calculates 8-day averages.
   · Key Features:
     o Processes Tmax data for predefined locations.
     o Calculates zone-wise averages for Tmax.
     o Appends daily Tmax data and 8-day averages to the CSV file.
   · Output: extracted_data/metstation_tmax_data.csv with similar structure to metstation_tmin_data.csv.
3. metstation_8days_rainfall.py:
   · Purpose: Extracts rainfall data and calculates 8-day averages.
   · Key Features:
     o Processes rainfall data for predefined locations.
     o Calculates zone-wise averages for rainfall.
     o Appends daily rainfall data and 8-day averages to the CSV file.
   · Output: extracted_data/metstation_rainfall_data.csv with similar structure to the Tmin and Tmax data files.
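
A zone-wise 8-day average of the kind these scripts compute might be sketched as follows. The station names and the zone-to-station mapping are hypothetical placeholders, and the sketch assumes the average pools every non-missing reading from the zone's stations over the most recent 8 daily rows; the actual scripts may weight or group values differently.

```python
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical mapping: the zone names come from the report, the stations do not.
ZONES: Dict[str, List[str]] = {
    "Northern Plains": ["Station_A", "Station_B"],
    "Eastern Plains": ["Station_C"],
}

Row = Dict[str, Optional[float]]  # station name -> reading, None for NA


def zone_8day_average(rows: List[Row], stations: List[str], window: int = 8) -> Optional[float]:
    """Average all non-missing readings for `stations` over the last `window` rows.

    `rows` must be ordered oldest to newest, one row per day.
    Returns None when every reading in the window is missing.
    """
    recent = rows[-window:]
    values = [row[s] for row in recent for s in stations if row.get(s) is not None]
    return round(mean(values), 2) if values else None


def all_zone_averages(rows: List[Row]) -> Dict[str, Optional[float]]:
    """Compute the 8-day average for every zone in ZONES."""
    return {zone: zone_8day_average(rows, stations) for zone, stations in ZONES.items()}
```

Skipping missing readings instead of treating them as zero keeps a single NA day from dragging a zone's average down.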
Visualization Scripts
1. daily_rainfall_avg_graph.py:
   · Purpose: Generates a bar chart for the daily average rainfall over the past 30 days.
   · Key Features:
     o Filters data from the last 30 days.
     o Plots daily rainfall as a bar chart.
     o Saves the graph in the Graphs/Avg_RF/ folder.
   · Output: Graphs/Avg_RF/daily_rainfall_average_past_30_days_YYYYMMDD_HHMMSS.png.
2. daily_tmin_avg_graph.py:
   · Purpose: Generates a line graph for the daily average Tmin over the past 30 days.
   · Key Features:
     o Filters data from the last 30 days.
     o Plots daily Tmin as a line graph.
     o Saves the graph in the Graphs/Avg_Tmin/ folder.
   · Output: Graphs/Avg_Tmin/daily_tmin_average_past_30_days_YYYYMMDD_HHMMSS.png.
3. daily_tmax_avg_graph.py:
   · Purpose: Generates a line graph for the daily average Tmax over the past 30 days.
   · Key Features:
     o Filters data from the last 30 days.
     o Plots daily Tmax as a line graph.
     o Saves the graph in the Graphs/Avg_Tmax/ folder.
   · Output: Graphs/Avg_Tmax/daily_tmax_average_past_30_days_YYYYMMDD_HHMMSS.png.
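
The three visualization scripts share the same shape: filter to the last 30 days, plot, and save under a timestamped name. The sketch below illustrates that shape for the rainfall bar chart. The helper names (last_n_days, graph_path, plot_rainfall) are hypothetical, the window is assumed to include today, and matplotlib is imported only inside the plotting function so the filtering helpers work without it.

```python
from datetime import date, datetime, timedelta
from pathlib import Path
from typing import Dict, List, Tuple


def last_n_days(series: Dict[date, float], today: date, days: int = 30) -> List[Tuple[date, float]]:
    """Return (date, value) pairs from the `days` most recent calendar days, oldest first."""
    cutoff = today - timedelta(days=days)
    return sorted((d, v) for d, v in series.items() if cutoff < d <= today)


def graph_path(folder: str, stem: str, now: datetime) -> Path:
    """Build an output path ending in _YYYYMMDD_HHMMSS.png, as in the report's file names."""
    return Path(folder) / f"{stem}_{now:%Y%m%d_%H%M%S}.png"


def plot_rainfall(points: List[Tuple[date, float]], out_path: Path) -> None:
    """Draw the filtered values as a bar chart and save it to `out_path`."""
    import matplotlib
    matplotlib.use("Agg")  # headless backend, suitable for CI runners
    import matplotlib.pyplot as plt

    dates = [d for d, _ in points]
    values = [v for _, v in points]
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.bar(dates, values)
    ax.set_xlabel("Date")
    ax.set_ylabel("Average rainfall")
    ax.set_title("Daily average rainfall, past 30 days")
    fig.autofmt_xdate()
    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path)
    plt.close(fig)
```

For the Tmin and Tmax graphs, the same filter applies and ax.bar would be swapped for ax.plot to get a line graph.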
C. Data Flow
1. Input: Daily climate PDFs downloaded from the meteorology website.
2. Processing:
   a. main.py: Downloads PDFs.
   b. extract_metdata.py and hydro_catchment_extract.py: Extract Tmax, Tmin, and Rainfall data.
   c. metstation_8days_* scripts: Calculate daily and 8-day averages for each variable.
3. Output:
   a. Extracted Data:
      i. CSV files stored in the extracted_data/ directory.
   b. Graphs:
      i. Graphs for daily rainfall, Tmin, and Tmax stored in respective subdirectories within Graphs/.
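
The extraction step in this flow writes one CSV row per variable and fills NA when a day's PDF is missing. A minimal sketch of that behaviour follows, using only the standard csv module. The location names and the function names are hypothetical, and the parsed readings are assumed to arrive as a nested dictionary (or None when no PDF could be read).

```python
import csv
from pathlib import Path
from typing import Dict, List, Optional

LOCATIONS = ["Station_A", "Station_B"]  # hypothetical predefined locations
VARIABLES = ["Tmax", "Tmin", "Rainfall"]

Readings = Dict[str, Dict[str, float]]  # variable -> location -> value


def build_rows(date_str: str, readings: Optional[Readings]) -> List[dict]:
    """Build one row per variable; any missing value (or a missing PDF) becomes "NA"."""
    rows = []
    for variable in VARIABLES:
        per_location = (readings or {}).get(variable, {})
        row = {"Date": date_str, "Variable": variable}
        for location in LOCATIONS:
            value = per_location.get(location)
            row[location] = "NA" if value is None else value
        rows.append(row)
    return rows


def append_rows(csv_path: Path, rows: List[dict]) -> None:
    """Append rows to the CSV, writing the header only when the file is first created."""
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not csv_path.exists()
    with csv_path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["Date", "Variable", *LOCATIONS])
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
```

Writing an NA row for a missing day keeps the date sequence unbroken, which the downstream 8-day and 30-day windows rely on.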
D. Dependencies
· Python Libraries:
  o pandas, matplotlib, pdfplumber, requests, BeautifulSoup4, plus the standard-library modules os and datetime.
· Directory Structure:
  o metdata/: Stores downloaded PDFs.
  o extracted_data/: Stores CSV files with extracted data.
  o Graphs/: Stores graphs categorized by type:
    § Avg_RF/: Rainfall graphs.
    § Avg_Tmin/: Tmin graphs.
    § Avg_Tmax/: Tmax graphs.
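
On a fresh checkout the directory structure above may not exist yet. The sketch below shows one way to create it before any script runs; ensure_directories is a hypothetical helper, not part of the project.

```python
from pathlib import Path
from typing import List

# Directory layout as listed in the report, relative to the repository root.
DIRECTORIES = [
    "metdata",
    "extracted_data",
    "Graphs/Avg_RF",
    "Graphs/Avg_Tmin",
    "Graphs/Avg_Tmax",
]


def ensure_directories(root: Path) -> List[Path]:
    """Create every expected directory under `root`; existing ones are left untouched."""
    created = []
    for relative in DIRECTORIES:
        path = root / relative
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```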
E. Automation Workflow
· Daily Automation:
  o Triggered by GitHub Actions to:
    § Download the daily PDF.
    § Extract and process data.
    § Generate graphs.
· Artifacts:
  o Save CSV files and graphs as artifacts for easy access.
· Weekly Summary:
  o Generate cumulative rainfall reports and other insights weekly.
· main.yml - Download PDF and Upload to Metdata Folder:
  o Schedule: Daily at 2:30 PM SLT.
  o Steps:
    § Downloads the daily climate update PDF using main.py.
    § Saves the PDF in the metdata/ folder.
    § Commits and pushes changes to the repository.
· met_extract.yml - Extract Data from PDF:
  o Schedule: Daily at 4:30 PM SLT.
  o Steps:
    § Runs the extract_metdata.py script to extract Tmax, Tmin, and Rainfall data.
    § Commits the extracted data to extracted_data/.
· hydro_catchment_extract.yml - Extract Hydro Catchment Data:
  o Schedule: Daily at 3:50 PM SLT.
  o Steps:
    § Runs the hydro_catchment_extract.py script.
    § Commits the extracted hydro catchment data to extracted_data/.
· metstation_8days_rainfall.yml - Extract Metstation Rainfall Data:
  o Schedule: Daily at 3:30 PM SLT.
  o Steps:
    § Runs the metstation_8days_rainfall.py script.
    § Commits 8-day rainfall averages to extracted_data/.
· metstation_8days_tmin.yml - Extract Metstation Tmin Data:
  o Schedule: Daily at 3:50 PM SLT.
  o Steps:
    § Runs the metstation_8days_tmin.py script.
    § Commits 8-day Tmin averages to extracted_data/.
· metstation_8days_tmax.yml - Extract Metstation Tmax Data:
  o Schedule: Daily at 3:40 PM SLT.
  o Steps:
    § Runs the metstation_8days_tmax.py script.
    § Commits 8-day Tmax averages to extracted_data/.
· rainfall_graph.yml - Generate Rainfall Graph:
  o Schedule: Weekly on Thursday at 5:00 PM SLT.
  o Steps:
    § Runs the daily_rainfall_avg_graph.py script to generate a rainfall graph.
    § Commits and pushes the graph to Graphs/Avg_RF/.
· tmin_graph.yml - Generate Tmin Graph:
  o Schedule: Weekly on Thursday at 5:00 PM SLT.
  o Steps:
    § Runs the daily_tmin_avg_graph.py script to generate a Tmin graph.
    § Commits and pushes the graph to Graphs/Avg_Tmin/.
· tmax_graph.yml - Generate Tmax Graph:
  o Schedule: Weekly on Thursday at 5:30 PM SLT.
  o Steps:
    § Runs the daily_tmax_avg_graph.py script to generate a Tmax graph.
    § Commits and pushes the graph to Graphs/Avg_Tmax/.
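
GitHub Actions evaluates schedule cron expressions in UTC, so the SLT times listed above have to be shifted by Sri Lanka's UTC+05:30 offset when written into a workflow file. The helper below is a hypothetical sketch of that conversion; whether the project's workflow files use exactly these expressions is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

SLT_OFFSET = timedelta(hours=5, minutes=30)  # Sri Lanka Time is UTC+05:30


def slt_to_utc_cron(hour: int, minute: int, weekday: Optional[int] = None) -> str:
    """Convert an SLT wall-clock time to a UTC cron expression.

    `weekday` uses cron numbering (0 = Sunday ... 6 = Saturday) in SLT;
    None means the job runs every day.
    """
    base = datetime(2024, 1, 7)  # a known Sunday, used only for weekday arithmetic
    local = base + timedelta(days=weekday or 0, hours=hour, minutes=minute)
    utc = local - SLT_OFFSET
    if weekday is None:
        day_field = "*"
    else:
        # Python numbers Monday as 0; cron numbers Sunday as 0.
        day_field = str((utc.weekday() + 1) % 7)
    return f"{utc.minute} {utc.hour} * * {day_field}"


# Daily 2:30 PM SLT (main.yml) and Thursday 5:00 PM SLT (rainfall_graph.yml)
print(slt_to_utc_cron(14, 30))             # 0 9 * * *
print(slt_to_utc_cron(17, 0, weekday=4))   # 30 11 * * 4
```

The weekday handling matters for early-morning SLT times, which fall on the previous day in UTC.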