Magnus Hagander <magnus@xxxxxxxxxxxx> writes:
> On Fri, Jan 26, 2007 at 09:55:39AM -0500, Tom Lane wrote:
>> Keep in mind also that we have seen the stats-test failure on
>> non-Windows machines, so we still need to explain that ...

> Yeah. But it *could* be two different stats issues lurking. Perhaps the
> issue we've seen on non-windows can be fixed by the settings Alvaro had
> me try (increasing autovacuum_vacuum_cost_delay or the delay in the
> regression test).

I had a sudden thought about that: the stats machinery is designed to be
non-reliable, ie, drop messages under load. Maybe the occasional stats
failures we see are just an artifact of that happening. It would be
pretty unfortunate if the stats test and autovacuum together were
sufficient load to cause message drops, but I doubt that's the
explanation.

I think the important change here has been the default enablement of
stats_row_level. That means that some of the tests terminating just
before the stats test starts may still be trying to dump statistics out
to the collector at the same time the stats test is. (Keep in mind that
psql does not wait around for the backend to be actually gone before it
exits, hence backend-exit cleanup is very likely to happen in parallel
with the start of the next test.) This idea explains why we mostly see
the failure in parallel tests not serial: in the serial schedule there's
no opportunity to have a gang of backends all exiting at the critical
time.

If this theory is correct, then we can improve the reliability of the
stats test a good deal if we put a sleep() at the *start* of the test,
to let any old backends get out of the way. It seems worth a try anyway.
I'll add this to HEAD and if the stats failure noise seems to go down,
we can back-port it.

			regards, tom lane
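
For illustration only, the theory above can be sketched as a toy model. This is not PostgreSQL code: the class, the function, and every number in it (buffer capacity, drain rate, number of exiting backends) are hypothetical, chosen only to show how a burst of exit-time messages on a drop-when-full channel could crowd out the stats test's own messages, and how a pause at the start of the test might avoid that.

```python
from collections import deque


class LossyCollector:
    """Toy stand-in for a collector fed through a bounded, drop-on-full buffer.

    Hypothetical model; it does not mirror the real stats collector.
    """

    def __init__(self, capacity, drain_per_tick):
        self.buffer = deque()
        self.capacity = capacity
        self.drain_per_tick = drain_per_tick
        self.received = []
        self.dropped = []

    def send(self, msg):
        # Fire-and-forget: a full buffer means the message is silently lost.
        if len(self.buffer) >= self.capacity:
            self.dropped.append(msg)
        else:
            self.buffer.append(msg)

    def tick(self):
        # The collector processes a limited number of messages per time step.
        for _ in range(min(self.drain_per_tick, len(self.buffer))):
            self.received.append(self.buffer.popleft())


def run(initial_delay_ticks, exiting_backends=20, capacity=8, drain_per_tick=4):
    """Return how many of the stats test's 4 messages reach the collector."""
    collector = LossyCollector(capacity, drain_per_tick)

    # Backends left over from earlier tests flush their stats all at once.
    for i in range(exiting_backends):
        collector.send(("old_backend", i))

    # The proposed fix: wait at the *start* of the test so the burst drains.
    for _ in range(initial_delay_ticks):
        collector.tick()

    # Now the stats test sends its own messages.
    for i in range(4):
        collector.send(("stats_test", i))

    # Let the collector finish whatever is still buffered.
    while collector.buffer:
        collector.tick()

    return sum(1 for kind, _ in collector.received if kind == "stats_test")


if __name__ == "__main__":
    print("no initial delay:", run(initial_delay_ticks=0))
    print("with initial delay:", run(initial_delay_ticks=2))
```

With these made-up parameters, the run without a delay loses all of the stats test's messages, while the run with a delay delivers all four. Whether real message drops behave this way is exactly what the experiment on HEAD is meant to find out.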