Good Morning!
I will try to keep this as brief as possible, but you mention a couple of points here that I would like to chime in on.
For item #2, these of course are your data files. When new observations are recorded, data is always written to the latest YYYY-MM.wlk file. You mention something that I have always found peculiar: the first data file always gets modified in this process too. In my situation, my first data file is 2008-04. I've never been clear as to why this file would get continuously touched as well. When April gets here, you should then notice that both your 2017-02 and 2017-04 files get modified. Like I said, I'm still curious as to why this happens, so if anyone else has a theory, I'd love to hear it.
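If you want to watch this happen on your own machine, a rough Python sketch like the one below lists which .wlk files have been modified recently. It only reads file timestamps and makes no attempt to explain why the first file gets touched. The function name `recently_modified_wlk` and the folder path in the usage line are made up for illustration, so point it at wherever your station's data files actually live.

```python
from datetime import datetime, timedelta
from pathlib import Path


def recently_modified_wlk(folder, within_hours=24, now=None):
    """Return the names of .wlk files in `folder` modified in the last `within_hours` hours."""
    now = now or datetime.now()
    cutoff = now - timedelta(hours=within_hours)
    hits = []
    for path in Path(folder).glob("*.wlk"):
        # st_mtime is the last-modified time reported by the file system.
        modified = datetime.fromtimestamp(path.stat().st_mtime)
        if modified >= cutoff:
            hits.append(path.name)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical path; replace with your own station folder.
    for name in recently_modified_wlk(r"C:\WeatherLink\MyStation"):
        print(name)
```

If the behaviour described above holds for you, the output should show your very first file alongside the current month's file.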
Personally, I am paranoid about data loss. At home, on my importance scale, my weather data finishes a close second to my family pictures. How many resources you throw at the problem really comes down to one question: how much of the data are you prepared to lose? I try to take a multi-tier approach to protecting my weather data.
I will keep the shameless plugging to a minimum, but I wrote a PowerShell script to help with my weather data backup strategy. My strategy looks like this:
- On my home server, my live weather data exists on two mirrored SSDs.
- At the top of every hour, my script copies the entire WeatherLink folder to a temporary, single SSD. Once the copy is on this temporary SSD, the script calls 7-Zip to compress it. This takes the size from ~300 MB to ~30 MB. The script then copies the compressed archive to large, slower drives in a RAID 5 configuration for archiving. I have it configured to keep hourly backups for 30 days.
- Every month I then make a complete backup of the entire server to an external drive that I keep off-site.
- My data also exists in a cloud-based SQL server. That copy is not complete yet, but I am working on a new feature for the program I use to store the data in the database, which would allow me to recreate WLK files from the observations in the database. When it is done, this will be another avenue for recovery. (The capability to repair/edit files is a goal as well.)
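For anyone who wants to try something similar, the hourly copy, compress, archive and prune step from the list above can be sketched roughly as follows. This is not my PowerShell script, just a minimal Python illustration of the same idea. It uses the zip support in Python's standard library instead of 7-Zip, and the function name `backup_weather_data`, the `weatherlink_` file prefix, and all of the paths are assumptions made up for the example.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_weather_data(source_dir, staging_dir, archive_dir, keep_days=30, now=None):
    """Snapshot source_dir, zip it, store the zip in archive_dir, and prune old zips."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d_%H%M")
    staging = Path(staging_dir)
    archive = Path(archive_dir)
    staging.mkdir(parents=True, exist_ok=True)
    archive.mkdir(parents=True, exist_ok=True)

    # 1. Copy the live folder to the staging drive, so compression
    #    works on a snapshot rather than on the live files.
    snapshot = staging / f"weatherlink_{stamp}"
    shutil.copytree(source_dir, snapshot)

    # 2. Compress the snapshot (stdlib zip here, standing in for 7-Zip),
    #    then remove the uncompressed copy.
    zip_path = shutil.make_archive(str(snapshot), "zip", root_dir=snapshot)
    shutil.rmtree(snapshot)

    # 3. Move the compressed archive to the long-term archive drive.
    final_path = Path(shutil.move(zip_path, str(archive / Path(zip_path).name)))

    # 4. Delete archives older than keep_days, judged by modification time.
    cutoff = now.timestamp() - keep_days * 86400
    for old in archive.glob("weatherlink_*.zip"):
        if old.stat().st_mtime < cutoff:
            old.unlink()
    return final_path


if __name__ == "__main__":
    # Hypothetical paths; adjust to your own drives.
    backup_weather_data(r"C:\WeatherLink", r"D:\staging", r"E:\archive")
```

Run it hourly from Task Scheduler or cron. With hourly backups kept for 30 days, that is 24 × 30 = 720 archives, so at roughly 30 MB each you should plan for about 21.6 GB on the archive drives.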
In the event of drive failures, I feel that I am at risk of losing only up to an hour of data. The chances are small that I would lose both the mirrored set and the RAID 5 at the same time.
What I am not well protected against right now is my home being destroyed, or my house being broken into and my equipment stolen. I haven't gotten around to adding this to the backup script yet, but to protect myself against those scenarios, I am going to add a step that copies the archive to a cloud-based storage service such as Dropbox. Since the compressed backup is only ~30 MB, this should not take up a lot of bandwidth or cloud storage.
Sorry for the rambling; this is something I unfortunately think about all the time.