Author Topic: What Wx Data backup method(s) should I be doing?  (Read 3716 times)

0 Members and 1 Guest are viewing this topic.

Offline galfert

  • Global Moderator
  • Forecaster
  • *****
  • Posts: 6822
What Wx Data backup method(s) should I be doing?
« on: October 08, 2018, 01:45:30 PM »
I just started with my first weather station this year. Who knows where the hobby will take me. I went with the Meteobridge because it seemed like a simple easy solution to publish data to many places. I want to continue to test with other software. I'll probably give Weather Display a go and WeeWx also. And who knows what else.

My biggest concern is finding something I really like at some point in the future and having missed out on having historical data because I didn't find or implement that software soon enough. Or it could be that some software not created yet comes out out. Or maybe a new online service comes out that allows uploading historical data.

So what backup safeguard methods can I do now to ensure having historical data to import, in an easy to use manner?

I'm currently uploading to an SQL database from the Meteobridge to my Synology NAS (running PHP and MariaDB 10). I'm not yet doing anything with that data. I wonder if that would be sufficient backup? My entire NAS (weather data, movies, pictures, files) is also being backed up remotely.

Should I also be saving to some other format? How easy would it be to convert my SQL data to some other format? Rather than worry about converting I'm wondering if I should for example also be doing some sort of FTP to a  .csv file?  If yes, then using what criteria for the data, frequency, units, etc. Is there a good standard? It would probably be easier to be saving now to multiple formats than worry about converting later maybe?

I'm also wondering if my SQL should be saving to multiple units. I just realized I'm only saving imperial but with hpa (mb) pressure because that is my preference. Maybe it needs to be either all imperial or all metric, or I'll have import issues? Or maybe I should save both units? Or should it be to two separate databases for each unit type? I may be overthinking and worrying about nothing since I'm sure it can be converted but I want to minimize problems down the line.

I'm particularly concerned to have this historical data backup strategy in place by January 1st 2019 so that I can have complete yearly rainfall starting then. So I need to figure this out before year's end.

Also for those of you doing a similar backup what frequency are you using to send backup data? I'm currently sending to my SQL only every 5 minutes. Should that be every 1 minute? Or is that too much? Since every popular online service seems to use 5 minutes (WU, CWOP, AWEKAS) I thought 5 minutes was good.

« Last Edit: October 11, 2018, 08:15:32 PM by galfert »
Ecowitt GW1000 | Meteobridge on Raspberry Pi
WU: KFLWINTE111  |  PWSweather: KFLWINTE111
CWOP: FW3708  |  AWEKAS: 14814
Windy: pws-f075acbe
Weather Underground Issue Tracking
Tele-Pole

Offline dupreezd

  • Forecaster
  • *****
  • Posts: 512
Re: What Wx Data backup method(s) should I be doing?
« Reply #1 on: October 08, 2018, 02:40:18 PM »
Galfert, I don't have any experience with MySQL, but MS SQL allows you to export to many formats. .csv is the most common and from there you can import to excel for example. You can also specify which columns to export as well as a host of other options.

The problem comes when you have funny delimiters/characters  in your data which you then have to escape using quotes or regex.

Once you have done it, it get easier every time.

Create a user in the database with read only access and use that account to play with to ensure you don't accidently overwrite, change or delete data.

Davis VP2 6163 | WiFi Logger
CWOP - FW0717
Blitzortung 2100

Offline bchwdlks

  • Senior Contributor
  • ****
  • Posts: 196
Re: What Wx Data backup method(s) should I be doing?
« Reply #2 on: October 09, 2018, 02:34:56 PM »
If you are collecting 5 minute interval data into a MySQL, you should have enough granularity to do almost any analysis. The data really should be in a single format, either metric or imperial. If you want to view fields differently, it is very easy to use the SQL select statement to convert the data as you select it.

The templates that I have used are Meteotemplate and Home Weather Station, but either of these read from an MySQL database and use parameters and configuration screens to customize your data display AND both are relatively easy to get started with. Either one would be a good starting point for you to test how useful your data records are going to be. Both of them are Meteobridge friendly.

The Meteotemplate package has a LOT of tools for viewing/editing the database tables. My concern with your mixed data would be letting the template know that one field is metric and another is imperial. I believe that the configuration tools would allow you to do this, but I believe it is not a good practice. I spent 20+ years using relational databases, and I would have a real problem with a developer that mixed standards in single data record. To me that is as bad a habit as storing timestamps as local timezone instead of zulu time.  :?  It makes calculation of elapsed time very messy.

Also - make sure that you have included a station ID in your record - you probably will end up with multiple stations  :lol:


Offline galfert

  • Global Moderator
  • Forecaster
  • *****
  • Posts: 6822
Re: What Wx Data backup method(s) should I be doing?
« Reply #3 on: October 09, 2018, 05:42:51 PM »
bchwdlks,
Thank you for your input. I'll change my SQL to to be all imperial then. And good 5 minutes it is.

I see your point in having a station ID when you have more than one station, with one database. But if I were to have another station I would just create a separate database instance rather than dump the data into the same bucket.

Thanks for the tip on the Meteotemplate tools.

I'm going to fight you on the UTC time. I need the data to make sense to me. Rain totals reset at midnight. And rain totals for the year or day start and end on my time not UTC time. Plus having the time stamp as UTC just makes it that much harder to generate reports. If I want a report for a given day if you use UTC then now you can't just look at the date stamp as you would also always need to consider the time to get the desired report results. I can see though UTC being used as a standard in other database applications.
« Last Edit: October 09, 2018, 05:58:52 PM by galfert »
Ecowitt GW1000 | Meteobridge on Raspberry Pi
WU: KFLWINTE111  |  PWSweather: KFLWINTE111
CWOP: FW3708  |  AWEKAS: 14814
Windy: pws-f075acbe
Weather Underground Issue Tracking
Tele-Pole

Offline TomKeffer

  • Member
  • *
  • Posts: 37
  • Author of weeWX
Re: What Wx Data backup method(s) should I be doing?
« Reply #4 on: October 10, 2018, 08:45:14 AM »
I'd say you are both right.

When I wrote WeeWX, I decided to use the wview database schema. Now, with nearly 10 years of experience behind me, I would do things slightly differently.
  • All observation types in a record should use the same unit system (US Customary, metric, whatever). Fortunately, wview does this.
  • The primary key of a record should be a UTC timestamp. Wview does this as well.
  • However, I would add a field UTCOffset, with the local time offset from UTC. This would make it a lot easier to calculate midnight-to-midnight statistics, particularly across DST boundaries (which are a nightmare).
  • If you really want to future proof your schema, add fields for latitude and longitude, in case you want to support mobile devices.

Another option for your application is to use one of the purpose-built "time-series databases" that have come out recently. I used InfluxDB for WeeRT, the realtime version of WeeWX. I really like it, although it still doesn't offer you any help when dealing with local times. You still have to calculate midnight-to-midnight yourself. Hopefully, they'll fix that some day.

-tk
(author of WeeWX)

Offline Bushman

  • Forecaster
  • *****
  • Posts: 7549
    • Eagle Bay Weather
Re: What Wx Data backup method(s) should I be doing?
« Reply #5 on: October 10, 2018, 12:40:43 PM »
I gotta disagree with you on #2 - timestamp  is not a good primary key.  Best practice is an immutable, nonsensical (from a human perspective), unique key.  And before you say timestamp is unique, if your DBMS gathers data from multiple stations you could have two or more identical primary keys.
Need low cost IP monitoring?  http://wirelesstag.net/wta.aspx?link=NisJxz6FhUa4V67/cwCRWA or PM me for 50% off Wirelesstags!!

Offline TomKeffer

  • Member
  • *
  • Posts: 37
  • Author of weeWX
Re: What Wx Data backup method(s) should I be doing?
« Reply #6 on: October 10, 2018, 08:49:42 PM »
Take a look at the InfluxDB documentation. It's a fine example of using timestamps as a primary key.

To solve your posed problem, they also index "tags," which, in your case, would include a station ID. It's exceedingly efficient, and much easier to understand than using opaque UUIDs or something similar.

This is essentially what WeeRT does.

Cheers,

-tk
« Last Edit: October 10, 2018, 09:51:48 PM by TomKeffer »

Offline 92merc

  • BismarckWeather.net
  • Forecaster
  • *****
  • Posts: 1308
  • BismarckWeather.net
    • BismarckWeather.net
Re: What Wx Data backup method(s) should I be doing?
« Reply #7 on: October 11, 2018, 09:23:10 AM »
I know everyone has been talking RPI and weewx backups.  But I thought I would throw out my solution.

I have a Windows PC running Cumulus MX.  I'm using a product called Folder Clone.  It copies the files up to my network drive.  It does a basic time stamp sync of the files.  It does it fairly quick as well.  For the price, I have 3 copies.  I have it installed on my main PC and the wife's as well to do data backups.

http://www.folderclone.com/
https://www.BismarckWeather.net
Davis VP2, Cumulus, WeatherDisplay, Blitzortung, Saratoga Scripts, NOAA Stream via PI

Offline Bushman

  • Forecaster
  • *****
  • Posts: 7549
    • Eagle Bay Weather
Re: What Wx Data backup method(s) should I be doing?
« Reply #8 on: October 11, 2018, 02:38:06 PM »
Take a look at the InfluxDB documentation. It's a fine example of using timestamps as a primary key.

To solve your posed problem, they also index "tags," which, in your case, would include a station ID. It's exceedingly efficient, and much easier to understand than using opaque UUIDs or something similar.

This is essentially what WeeRT does.

Cheers,

-tk

I see what you are saying, but to solve the problem you then have a compound key.  Not best practice.  Especially if you need to link systems.  I deal in petabyte databases and they would die with a timestamp only.
Need low cost IP monitoring?  http://wirelesstag.net/wta.aspx?link=NisJxz6FhUa4V67/cwCRWA or PM me for 50% off Wirelesstags!!

Offline 92merc

  • BismarckWeather.net
  • Forecaster
  • *****
  • Posts: 1308
  • BismarckWeather.net
    • BismarckWeather.net
Re: What Wx Data backup method(s) should I be doing?
« Reply #9 on: October 11, 2018, 04:09:48 PM »
For those Windows users, I ran across another utility built into Windows.  You could make a regular windows scheduled task to run it at selected times.

https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/robocopy

RoboCopy, or as MS calls it "Robust File Copy"
https://www.BismarckWeather.net
Davis VP2, Cumulus, WeatherDisplay, Blitzortung, Saratoga Scripts, NOAA Stream via PI

Offline dupreezd

  • Forecaster
  • *****
  • Posts: 512
Re: What Wx Data backup method(s) should I be doing?
« Reply #10 on: October 11, 2018, 04:23:35 PM »
For those Windows users, I ran across another utility built into Windows.  You could make a regular windows scheduled task to run it at selected times.

https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/robocopy

RoboCopy, or as MS calls it "Robust File Copy"

+1, a great utility. Even copies MS security permissions.
When setup as a backup option, it will only copy files that changed since last copy.
« Last Edit: October 11, 2018, 04:26:41 PM by dupreezd »
Davis VP2 6163 | WiFi Logger
CWOP - FW0717
Blitzortung 2100

Offline TomKeffer

  • Member
  • *
  • Posts: 37
  • Author of weeWX
Re: What Wx Data backup method(s) should I be doing?
« Reply #11 on: October 11, 2018, 07:53:46 PM »
I see what you are saying, but to solve the problem you then have a compound key.  Not best practice.  Especially if you need to link systems.  I deal in petabyte databases and they would die with a timestamp only.

Interesting discussion!

It all depend on use cases. Time-series data is seldom joined via a primary key. It is almost always used to generate statistics: highest, lowest, average, RMS, etc. For example, atmospheric eddy heat flux is calculated by multiplying a time series of velocities against a time series of temperature deviations, point by point. You need to make sure velocity at t is multiplied against temperature at t (or, at least, near t). For this use case, a timestamp is the perfect primary key. Data frequently comes in at a very high-speed, but are also generally immutable. Data are frequently retained for only a set period of time. All these properties make timestamp a great primary key. There is no need to "link systems." Having to consult external tables through a UUID would be slow and awkward.

A purpose-built time-series database, such as InfluxDB, has been optimized around these characteristics and can easily handle trillions of rows of data, albeit in the very simple table form sufficient for time-series data.

In summary, the use cases are quite different from more traditional applications, such as CRMs.


Offline Bushman

  • Forecaster
  • *****
  • Posts: 7549
    • Eagle Bay Weather
Re: What Wx Data backup method(s) should I be doing?
« Reply #12 on: October 12, 2018, 12:16:09 AM »
Go to https://docs.influxdata.com/influxdb/v0.13/concepts/crosswalk/and retrieve a single record using ONLY timestamp as the primary key.   :)

I can assure that while influx might work for regular time series data it will fail for disaggregated  data like that found in geophysical or medical imaging databases.

I don;t think there is anything wrong with influx per se, but it is limited to specific time series and even then, the lack of an inviolate key will cause issues when you try to JOIN (which is not supported) to say, a georeferenced DB.

Like you say, interesting discussion.  :)
Need low cost IP monitoring?  http://wirelesstag.net/wta.aspx?link=NisJxz6FhUa4V67/cwCRWA or PM me for 50% off Wirelesstags!!

 

anything