100% agreed
When we work at a pro level here at work, we never base comparison on position alone. Well, actually we do, but elevation is probably even more important. And even that is not enough: other factors include whether the station is urban or rural, and its exposure to wind (which depends on the siting — a station in an open field at an airport will obviously record different wind than a pro station in a built-up city center).
You are absolutely right, they should look more at spikes in the data within that one station: if it was 20° one minute and 5° the next, then something is wrong.
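A within-station check like that is simple to sketch. This is just an illustration, not WU's actual algorithm; the 3 °C/min threshold is my own assumption for the example:

```python
def flag_spikes(readings, max_step=3.0):
    """readings: list of (minute, temp_c) tuples, in time order.

    Flags any reading whose rate of change since the previous
    reading exceeds max_step (°C per minute) — a physically
    implausible jump. Returns the indices of suspect readings.
    """
    suspect = []
    for i in range(1, len(readings)):
        dt = readings[i][0] - readings[i - 1][0]
        if dt <= 0:
            continue  # skip out-of-order or duplicate timestamps
        rate = abs(readings[i][1] - readings[i - 1][1]) / dt
        if rate > max_step:
            suspect.append(i)
    return suspect

# 20° one minute, 5° the next — clearly a sensor glitch
print(flag_spikes([(0, 20.0), (1, 20.1), (2, 5.0)]))  # -> [2]
```

The point is that this check needs no neighboring stations at all: the station's own history is enough to catch the obvious garbage.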
I talked about this before. For example, this trend of more and more people buying a NetAtmo is dangerous if data validation is done this way. Think of this scenario: you have a high-quality, accurate, well-sited station. Your neighbor, who has no clue about weather, buys a NetAtmo (or a similar design gadget) and starts sending data to WU as well. Then your other neighbor does the same. They both see it more as fun, mount it somewhere close to a heated outside wall of the house, exposed to the sun... a cool gadget for their home.

Now WU receives data from three stations very close to each other: yours and the two NetAtmos. The two NetAtmos are likely to show similar nonsense. So if WU applies the algorithm they use (comparison with neighboring stations), which station's data do you think will be evaluated as inaccurate? And there is absolutely nothing you can do about it... unless you want to buy 3 additional accurate stations and place them all in your garden to outvote the others.
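To see why the accurate station loses, here is a minimal sketch of a neighbor-comparison check — my own assumption of how such a check might work, not WU's real QC code. It flags the station furthest from the local median, so when two badly sited NetAtmos agree with each other, the correct reading is the one that gets thrown out:

```python
def flag_outlier(stations):
    """stations: dict of name -> reported temperature (°C).

    Flags the station whose reading deviates most from the
    neighborhood median — a simple majority-vote QC check.
    """
    temps = sorted(stations.values())
    median = temps[len(temps) // 2]
    return max(stations, key=lambda name: abs(stations[name] - median))

# Hypothetical readings: the two NetAtmos near a sun-heated wall
# read several degrees too warm, but they agree with each other.
neighborhood = {
    "accurate_station": 14.0,  # well sited, correct reading
    "netatmo_1": 21.5,         # heated wall, direct sun
    "netatmo_2": 20.8,         # same kind of siting error
}
print(flag_outlier(neighborhood))  # -> 'accurate_station'
```

The median lands between the two NetAtmos, so the well-sited station is the "outlier" — exactly the failure mode described above.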