Author Topic: Intermittent pressure WS-1400-IP (Read 4277 times)

daman · « **Reply #25 on:** March 08, 2017, 10:23:27 AM »

Quote from: btw-nc on March 06, 2017, 10:44:41 AM

This is the note on the website, which specifically mentions the 1400-IP:

On March 1, 2017, it appears that Amazon Web Services changed how the ObserverIP opens and closes ports on the server. This only occurs on the WS-1400-IP system. This causes other tasks to slow down, and manifests itself in the indoor temperature-humidity-barometer to act sporadically. We are working on resolving the issue. Any updates will be posted on this page.

Just so members know, the AcuRite smartHUB is also experiencing the same problem with gaps on WU. We're still having issues there; they say they are working on the problem.

greatg · « **Reply #26 on:** March 08, 2017, 11:20:30 AM »

My WU data is still very sporadic after the update. This is my station IWELLAND9.

btw-nc · « **Reply #27 on:** March 08, 2017, 12:08:02 PM »

Your graphs are not like mine. The only one that I had with breaks in it was the pressure graph. All the others looked unbroken.

BroPete · « **Reply #28 on:** March 08, 2017, 12:12:48 PM »

My station is KTXSTEPH27. I just updated my firmware and already the pressure graph is unbroken. Earlier today it was very choppy. (The big pressure drop was before I reset the relative offset.)

Yank61s · « **Reply #29 on:** March 08, 2017, 12:28:45 PM »

I updated the firmware this morning. It seems to be working for now.

greatg · « **Reply #30 on:** March 08, 2017, 04:02:52 PM »

A question for all those folks that have the sporadic data problem fixed with the March 1 update - are you on DHCP or Static IP?

Rychu · « **Reply #31 on:** March 08, 2017, 04:24:37 PM »

Quote from: greatg on March 08, 2017, 04:02:52 PM

A question for all those folks that have the sporadic data problem fixed with the March 1 update - are you on DHCP or Static IP?

DHCP ... but mine still has breaks, because there is no firmware for my WS1400 IP clone, the WH2600 (in Europe).

Ambient wrote that static IP needs update 3.1.2 or higher, but they have blocked all updates below 3.1.3.

It follows that 3.1.3 is probably now meant for both DHCP and static IP.

BroPete · « **Reply #32 on:** March 08, 2017, 04:32:39 PM »

DHCP. My data stream looks unbroken since updating.

greatg · « **Reply #33 on:** March 08, 2017, 04:38:22 PM »

Then why is mine still broken?

I'm using DHCP; connected through switch (tried directly to router); all lights on Observer are blue for indoor & outdoor sensors; RF sensor is blue and sometimes flashes; noticed a couple of observer reboots and server light blinks on and off?

Rychu · « **Reply #34 on:** March 08, 2017, 04:56:25 PM »

Quote from: greatg on March 08, 2017, 04:38:22 PM

(...) RF sensor is blue and sometimes flashes; noticed a couple of observer reboots and server light blinks on and off?

If the Server LED is blinking, it means that the ObserverIP has not connected to the server for 10 minutes.
The RF LED blinks once after receiving a signal from your sensor, which is how it should be.

If you have a 1400-IP, why do you not see Solar and UV?

greatg · « **Reply #35 on:** March 08, 2017, 05:02:06 PM »

My problem is not the solar or UV. It's the sporadic pressure readings and sometimes temp is also sporadic. This is my station IWELLAND9.

Rychu · « **Reply #36 on:** March 08, 2017, 05:07:59 PM »

OK, but why are solar and UV not showing on WU?

IWELLAND9

I edited my post above about the Server and RF LEDs. Please read it.

greatg · « **Reply #37 on:** March 08, 2017, 05:11:14 PM »

Thanks, but it's the Pressure that I don't see.

btw-nc · « **Reply #38 on:** March 08, 2017, 08:52:27 PM »

It is odd that you don't have data showing on WU for UV.

annetoal · « **Reply #39 on:** March 09, 2017, 07:52:27 AM »

They just posted a firmware upgrade to version 3.1.3. Seems Amazon changed something in their servers which was messing up the timing with the way the ObserverIP sent data. The problems people are posting here--indoor sensor data gaps--are what resulted from Amazon's change. http://www.ambientweather.com/wswsfiup.html

Anne
KTXEDINB9

Rychu · « **Reply #40 on:** March 09, 2017, 08:32:15 AM »

Anne,

KTXEDINB9 did you update to 3.1.3?

btw-nc · « **Reply #41 on:** March 09, 2017, 09:20:53 AM »

We were discussing this on page 1 of the topic.

Quote from: annetoal on March 09, 2017, 07:52:27 AM

They just posted a firmware upgrade to version 3.1.3. Seems Amazon changed something in their servers which was messing up the timing with the way the ObserverIP sent data.

BroPete · « **Reply #42 on:** March 09, 2017, 09:51:21 AM »

I upgraded (Ambient calls it an upgrade; I believe a resolution of a problem should be an update). I think that may be the solution for a lot of problems people have had.

Since I've upgraded/dated I've had no problems for almost 24 hours. I hope that more people find this solution because there are a fair number of people whose stations are showing the same characteristics.

Something is happening this morning where Weather Underground is reporting that somewhere in the path there is an interruption. Occasionally when I switch between the airport and my weather station, it says that there are no other stations available in my area. I am optimistic and hoping it is no more than an intermittent problem soon to be resolved.

Rychu · « **Reply #43 on:** March 09, 2017, 10:51:55 AM »

Quote from: BroPete on March 09, 2017, 09:51:21 AM

I upgraded (...)

Did you upgrade to 3.1.3? Can you give the station ID of your WS1400 IP?

BroPete · « **Reply #44 on:** March 09, 2017, 12:17:51 PM »

Yes, Rychu, I upgraded to 3.1.3; my station id is KTXSTEPH27.

Rychu · « **Reply #45 on:** March 09, 2017, 12:24:56 PM »

Quote from: BroPete on March 09, 2017, 12:17:51 PM

(...) my station id is KTXSTEPH27.

I had not noticed that you entered the ID above.

KTXSTEPH27

dolfs · « **Reply #46 on:** March 09, 2017, 10:03:43 PM »

I too had this problem and updated (to 3.1.3). Since then I've had the problem once in the past 20 hours. I believe the solution is not complete. Here is what I know based on communication from Ambient:

The code in the firmware is single threaded. This means that while an HTTP request is outstanding to (for example) Wunderground.com and the reply has not yet been received, no other code runs. This includes code to receive data wirelessly from the indoor sensor package.
To prevent an unresponsive server from completely stopping all code forever, a timeout is built in so that if no response is received for a certain time interval, the request is considered failed and reporting will have to wait until the next scheduled attempt. I am not 100% sure what interval the ObserverIP is using to update WU. Likewise I do not know their (fixed) timeout interval.
Inspecting network traffic I can see that in my case the WU update frequency is about once every two seconds (version 3.1.3 firmware). Oddly enough it uses the parameter "rtfreq=5" to WU which I would have expected to be 2 as well then. Or, in other words, why attempt this every two seconds when you are indicating a resolution no better than 5? I suspect mine ends up being 2 seconds because my requests fail (see below).
The move, on or around March 1, of the Wunderground infrastructure from private physical servers to the Amazon Web Services infrastructure introduced different delays in the API request/response cycles, one of which is used by the ObserverIP to deliver sensor data to your account. This delay is likely a function of running on different hardware (virtualized), and possibly not enough scaling so that high (transient) load can cause longer delays than what we were used to on the private infrastructure.

So now, consider the following scenario with firmware ≤ 3.1.0, which has a relatively short timeout interval. At time 0 the ObserverIP attempts to report its currently known set of sensor values (indoor+outdoor) to Wunderground. It starts a request. The new Amazon infrastructure is busy or, in general, takes a little longer than the timeout interval (which had previously been set to a value that would rarely, if ever, be exceeded). The request fails and is immediately retried (or at least is retried before the indoor sensors are serviced). After a few retries, the request finally succeeds. Meanwhile, however, no indoor sensor data was collected (because of the single-threaded issue). Because of the delays, it is already time for the next scheduled upload to WU, and a new request is made, but it now contains "--" for the indoor sensor values (I have observed this only for pressure though).
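To make the scenario above concrete, here is a small toy model in Python. It is only a sketch of how I imagine the loop behaves, not Ambient's actual code: the 5 second upload period, the 4 second indoor sensor period, the retry cap and all latency numbers are made up for illustration, and `simulate` is my own hypothetical helper.

```python
from itertools import cycle

def simulate(latencies_ms, timeout_ms, uploads=6,
             upload_period_ms=5000, sensor_period_ms=4000, max_retries=5):
    """Toy single-threaded loop; returns what each upload carries for the
    indoor value: "ok" (fresh reading) or "--" (nothing received).

    All numbers are hypothetical. The indoor sensor is assumed to transmit
    at multiples of sensor_period_ms and is only heard while the loop is
    not blocked inside an HTTP request.
    """
    latency = cycle(latencies_ms)   # server response times, reused in order
    now = 0                         # time at which the loop became free
    payloads = []
    for i in range(1, uploads + 1):
        start = max(now, i * upload_period_ms)   # next scheduled upload
        # First sensor transmission at or after `now` (integer ceiling).
        first_tick = -(-now // sensor_period_ms) * sensor_period_ms
        # The reading is fresh only if it arrived while the loop was free.
        payloads.append("ok" if first_tick < start else "--")
        now = start
        for _ in range(max_retries):
            wait = next(latency)
            if wait <= timeout_ms:   # reply arrived in time
                now += wait
                break
            now += timeout_ms        # timed out, retry immediately
    return payloads

print(simulate([300], timeout_ms=1000))
print(simulate([1500, 1500, 1500, 800], timeout_ms=1000))
```

With fast replies every upload carries a fresh indoor value. With replies that are a little slower than the timeout, three retries eat most of each 5 second window and most uploads go out with "--", which is the pattern I see in the pressure graph.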

So, Ambient's solution in the new firmware is to increase the timeout interval. While this appears to solve the issue, it really does not do so in all cases. Here is why. You would have to make the timeout large enough to handle the worst case scenario for the Amazon infrastructure, or the above scenario repeats. Nobody can of course be sure in the absolute sense of what this largest value should be, but assume something is found that covers 95% of the new scenarios. That means that occasionally the problem can still occur. If they choose a larger value, it might be 99% or 99.99%, but...

With a longer timeout, if there is an actual long delay, the code will do nothing else until this now much longer time has expired. This increases the likelihood that the indoor sensors are not serviced. Once the timeout happens, a retry still prevents handling of the indoor sensors as described above. If retries do not happen immediately (I don't know if this is the case or not), things can still fail, depending on how the indoor sensor data retrieval is scheduled. If this is scheduled on a time basis, the delay may have caused the collection time to have passed, and we'll have to wait for the next scheduled one. Meanwhile, the WU request might be scheduled again. Simply manipulating the timeout value cannot completely solve the problem, and the single-threaded nature makes things worse.
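The trade-off can be put in numbers with a few lines of Python. This is only an illustration: the response times below are invented (I have no real measurements of the WU servers), and `timeout_tradeoff` is a hypothetical helper.

```python
def timeout_tradeoff(latencies_ms, timeout_ms):
    """Return (fraction of requests that time out,
               longest time a single attempt blocks the loop, in ms)."""
    failures = sum(1 for t in latencies_ms if t > timeout_ms)
    # An attempt blocks until the reply arrives or the timeout fires.
    worst_block = max(min(t, timeout_ms) for t in latencies_ms)
    return failures / len(latencies_ms), worst_block

# Invented sample of server response times in milliseconds.
sample = [200, 250, 300, 300, 350, 400, 900, 1200, 2500, 6000]

for timeout in (500, 3000, 10000):
    print(timeout, timeout_tradeoff(sample, timeout))
```

A short timeout fails often but never blocks for long. A long timeout rarely fails, but a single slow reply then freezes everything else, including the indoor sensor handling, for several seconds.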

Note, BTW that while such delays happen, everything slows down. It also explains why sometimes the Live Data page seems sluggish...

While I was investigating, I noticed another problem too. I actually use MeteoBridge to read values from the ObserverIP and send them to WU and other places. Consequently I have not configured my ObserverIP to send to WU, by leaving station ID and password blank. When I first read about the problems and the new firmware I thought I would not be affected because of this. My ObserverIP had no reason to interact with WU. Boy was I wrong. I inspected network traffic through my router and discovered that the ObserverIP nevertheless sends a sensor data upload request to WU at regular intervals. The stationID and password are passed as empty, so this request is bound to fail, and it does. However, because of this I (and in fact all of us) am still subject to the delay-induced issues. Of course it also wastes bandwidth. In my case the request plus the failure response consume about 0.5 KB, and doing this every two seconds means 43,200 times a day, using about 21 MB each day. I have a great connection, so it is not that big a deal, but for some remote locations this might matter. I have proposed to Ambient that if either stationID or password is empty they not attempt this at all.
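For anyone who wants to check the bandwidth figure with their own numbers, the arithmetic is simple. In this Python sketch `daily_overhead` is just a name I made up, and I take 1 KB as 1000 bytes and 1 MB as 1,000,000 bytes.

```python
def daily_overhead(interval_s, kb_per_exchange):
    """Return (requests per day, megabytes per day) for a request that is
    repeated every interval_s seconds and costs kb_per_exchange kilobytes
    for request plus response (1 KB = 1000 bytes, 1 MB = 1000 KB)."""
    seconds_per_day = 24 * 60 * 60
    requests_per_day = seconds_per_day // interval_s
    mb_per_day = requests_per_day * kb_per_exchange / 1000
    return requests_per_day, mb_per_day

# My observed values: one exchange of about 0.5 KB every two seconds.
print(daily_overhead(interval_s=2, kb_per_exchange=0.5))
```

With my values this gives 43,200 requests and 21.6 MB per day, which is where the "about 21 MB" comes from.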

I have written to Ambient and proposed a better solution for the timeout issue: a dynamically controlled timeout interval (similar to what the TCP protocol does), so that the timeout is minimal until problems develop, then scales up (to a certain maximum) to see if that solves the issue, and scales back down when appropriate. They may also have to change the scheduling of the indoor unit value retrieval to guarantee it always happens before a WU request is done.
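To show what I mean by a dynamically controlled timeout, here is a rough Python sketch. It borrows the smoothing constants TCP uses for its retransmission timer (1/8 and 1/4 weights, estimate plus four times the variation, doubling after a timeout). The class name, the 0.5 second minimum and the 8 second maximum are my own assumptions, and I have no idea how much of this would fit in the ObserverIP firmware.

```python
class AdaptiveTimeout:
    """Timeout that follows observed response times, loosely modelled on
    TCP's retransmission timer. Bounds are hypothetical, in seconds."""

    def __init__(self, minimum=0.5, maximum=8.0):
        self.minimum = minimum
        self.maximum = maximum
        self.srtt = None        # smoothed response time
        self.rttvar = None      # smoothed variation of the response time
        self.timeout = minimum  # start out with the shortest timeout

    def _clamp(self, value):
        return max(self.minimum, min(self.maximum, value))

    def on_success(self, response_time):
        """Call with the measured response time of a successful request."""
        if self.srtt is None:
            self.srtt = response_time
            self.rttvar = response_time / 2
        else:
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - response_time)
            self.srtt = 0.875 * self.srtt + 0.125 * response_time
        self.timeout = self._clamp(self.srtt + 4 * self.rttvar)

    def on_timeout(self):
        """Call when a request timed out: back off, up to the maximum."""
        self.timeout = self._clamp(self.timeout * 2)


t = AdaptiveTimeout()
for _ in range(3):
    t.on_timeout()
print(t.timeout)        # scaled up after repeated timeouts
for _ in range(5):
    t.on_success(0.2)
print(t.timeout)        # scaled back down once replies are fast again
```

The point of the upper bound is that even in the worst case the loop is never blocked longer than a known maximum, so the indoor sensor retrieval can be scheduled around it.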

Even if the suggestion is implemented, users like myself may have a problem because the MeteoBridge (and also Ambient's version of this) gets values by requesting the Live Data page and scraping the values out of it. It is, therefore, possible to request these before the indoor data has been grabbed. If Ambient implements the optimization around empty stationID, the odds of this happening will be greatly reduced because this request is the main source of delays.

So the combination of both suggestions, if implemented, will likely bring all of us back to a normal or much improved scenario.

btw-nc · « **Reply #47 on:** March 10, 2017, 10:12:15 AM »

Very interesting research, Dolf. It looks like the Ambient firmware has a LOT of room for improvement.

greatg · « **Reply #48 on:** March 10, 2017, 10:18:48 AM »

Quote from: dolfs on March 09, 2017, 10:03:43 PM

I too had this problem and updated (to 3.1.3). (...)

Well, I have spent this week emailing Ambient (Ed) about the ongoing sporadic issues. As usual, it's a WU problem. In addition, Ambient tells me that AcuRite users are having the same issue. That's great, but I'm not an AcuRite customer! So, let's hope they listen to this and make the necessary improvements!

Rychu · « **Reply #49 on:** March 10, 2017, 03:56:41 PM »

My WH2600 (a WS1400 IP clone) is already reporting to WU without interruption. There was no software update (I have 2.1.9, Europe). The return to normal operation took two days, 9 and 10 March. You could see that with each passing hour there were fewer interruptions. Amazon restored the server settings to the state they had on the WU servers.

Pressure on March 8 - state during failure

Pressure on March 9 - state during repair

Pressure on March 10 - state after the completion of repair

I am still running software 2.1.9. Amazon adapted to the client, not the client to Amazon.
