Not sure if this is a reasonable suggestion, but I was thinking about reasons why runs at different times of the day might have better or worse verification stats, as posted above.
I assume that exactly the same computer model is used each time and the data input is from the same sources, but I wonder whether the fact that the new data is collected at a different time of the day for each run has some influence which cannot be fully corrected for.
Obviously temperatures vary diurnally, as do local wind patterns. Maybe the corrections that are applied to account for these or similar factors are not quite sufficient?
This may be a complete load of rubbish; feel free to correct me if so.