Author Topic: MySQL help, shrink database by combining multiple rows into max/min/average  (Read 1799 times)

0 Members and 1 Guest are viewing this topic.

Offline lightmaster

  • MB-HWS Template Dev
  • Forecaster
  • *****
  • Posts: 324
    • Baxley,GA Weather
I have a database that has my weather written to it every 10 seconds. After around 10 months, I now have a little over 1.4 million records, so I'd like to try to compress that a bit to improve performance and reduce the size over long term. I'd like to shrink the data that's older than 1 month by combining the data into 5 min intervals, using things like the max temp in the 5 min window, average wind direction, etc. So in a 10 min period, instead of having 60 records with 10 seconds between them, I'd have 2 records with 5 minutes between them.

I've used queries like this to see data based on a time range, but not sure how to do a query that loops based on the timestamp and creates another table with that new data.

Code: [Select]
SELECT MIN(TempInCur), MAX(TempInCur), MIN(HumInCur), max(HumInCur), min(PressCur), max(PressCur), min(TempOutCur), max(TempOutCur), min(HumOutCur), max(HumOutCur), min(DewCur), max(DewCur), min(HeatIdxCur), max(HeatIdxCur), max(WindSpeedCur), max(WindGust10), max(WindAvgSpeedCur), min(WindChillCur), max(WindChillCur), MAX(RainYear), MAX(RainRateCur), max(UVIndex), MAX(Evapotranspiration), MAX(Radiation) FROM `KGABAXLE9` WHERE `DateTime` >= '2019-01-01' ORDER BY `DateTime` DESC LIMIT 1

If it helps, here's the format of my table:

Code: [Select]
CREATE TABLE `KGABAXLE9` (
 `ID` int(11) NOT NULL AUTO_INCREMENT,
 `DateTime` datetime NOT NULL COMMENT 'Date and Time of Readings',
 `TempOutCur` decimal(4,1) NOT NULL COMMENT 'Current Outdoor Temperature',
 `HumOutCur` int(11) NOT NULL COMMENT 'Current Outdoor Humidity',
 `PressCur` decimal(4,2) NOT NULL COMMENT 'Current Barometric Pressure',
 `DewCur` decimal(4,1) NOT NULL COMMENT 'Current Dew Point',
 `HeatIdxCur` decimal(4,1) NOT NULL COMMENT 'Current Heat Index',
 `WindChillCur` decimal(4,1) NOT NULL COMMENT 'Current Wind Chill',
 `TempInCur` decimal(4,1) NOT NULL COMMENT 'Current Indoor Temperature',
 `HumInCur` int(11) NOT NULL COMMENT 'Current Indoor Humidity',
 `WindSpeedCur` decimal(4,1) NOT NULL COMMENT 'Current Wind Speed',
 `WindAvgSpeedCur` decimal(4,1) NOT NULL COMMENT 'Current Average Wind Speed',
 `WindDirCur` int(11) NOT NULL COMMENT 'Current Wind Direction (Degrees)',
 `WindDirCurEng` varchar(3) NOT NULL COMMENT 'Current Wind Direction (English)',
 `WindGust10` decimal(4,1) NOT NULL COMMENT 'Max Wind Gust for Past 10 Mins',
 `WindDirAvg10` int(11) NOT NULL COMMENT 'Average Wind Direction (Degrees) for Past 10 Mins',
 `WindDirAvg10Eng` varchar(3) NOT NULL COMMENT 'Average Wind Direction (English) for Past 10 Mins',
 `RainRateCur` decimal(5,2) NOT NULL COMMENT 'Current Rain Rate',
 `RainDay` decimal(4,2) NOT NULL COMMENT 'Total Rain for Today',
 `RainYest` decimal(4,2) NOT NULL COMMENT 'Total Rain for Yesterday',
 `RainMonth` decimal(5,2) NOT NULL COMMENT 'Total Rain this Month',
 `RainYear` decimal(5,2) NOT NULL COMMENT 'Total Rain this Year',
 `Radiation` decimal(6,2) NOT NULL COMMENT 'Current amount of Solar Radiation (W/m^2)',
 `Evapotranspiration` decimal(5,2) NOT NULL COMMENT 'Solar Evapotranspiration',
 `UVIndex` decimal(3,1) NOT NULL COMMENT 'Current UV Index (uvi)',
 `ForecastRule` int(11) NOT NULL COMMENT 'Forecast number from Davis',
 PRIMARY KEY (`ID`),
 KEY `TempInCur` (`TempInCur`),
 KEY `TempOutCur` (`TempOutCur`),
 KEY `HumOutCur` (`HumOutCur`,`PressCur`,`DewCur`,`HeatIdxCur`,`WindChillCur`,`HumInCur`,`WindSpeedCur`,`WindGust10`,`RainDay`,`RainYest`,`RainMonth`,`RainYear`),
 KEY `DateTime` (`DateTime`)
) ENGINE=InnoDB AUTO_INCREMENT=1410532 DEFAULT CHARSET=utf8

Offline davidg_nz

  • Member
  • *
  • Posts: 34
    • Sandy Bay Weather
I don't know how well MySQL implements SQL92 date/time stuff but this is pretty easy to do in PostgreSQL. I use a query similar to below in my own software for computing 30 minute averages where the time-stamp is the time at the start of the 30 minute period (the javascript chart library I used was struggling with a month of data at 5 minute resolution).

Code: [Select]
with weather_data as (
    select
        *,
        (extract(epoch from "DateTime"::date) + extract(hour from "DateTime")::int) * 10 + extract(minute from "DateTime")::int / 5 as period
    from "KGABAXLE9"
)
SELECT
    min("DateTime"),
MIN("TempInCur"),
MAX("TempInCur"),
    MIN("HumInCur"),
    max("HumInCur"),
    min("PressCur"),
    max("PressCur"),
    min("TempOutCur"),
    max("TempOutCur"),
    min("HumOutCur"),
    max("HumOutCur"),
    min("DewCur"),
    max("DewCur"),
    min("HeatIdxCur"),
    max("HeatIdxCur"),
    max("WindSpeedCur"),
    max("WindGust10"),
    max("WindAvgSpeedCur"),
    min("WindChillCur"),
    max("WindChillCur"),
    MAX("RainYear"),
    MAX("RainRateCur"),
    max("UVIndex"),
    MAX("Evapotranspiration"),
    MAX("Radiation")
FROM weather_data
group by period
order by period desc

Note that the :: syntax above ("DateTime"::date) is a Postgres-specific cast shortcut - the correct way in most other database engines would be cast("DateTime" as date).
Port Charles, Coromandel, New Zealand | Wired Vantage Pro2+, Ubiquiti AirCam
Hamilton, Waikato, New Zealand | Wireless Vantage Pro2+, 24h FARS, Leaf+Soil Station, Envoy