On Thu, Sep 24, 2020 at 10:33:45AM -0400, Theodore Y. Ts'o wrote:
> HOWEVER, thanks to a hint from a colleague at $WORK, and realizing
> that one of the stack traces had virtio balloon in the trace, I
> realized that when I switched the GCE VM type from e1-standard-2 to
> n1-standard-2 (where e1 VM's are cheaper because they use
> virtio-balloon to better manage host OS memory utilization), problem
> has become, much, *much* rarer (and possibly has gone away, although
> I'm going to want to run a lot more tests before I say that
> conclusively) on my test setup.  At the very least, using an n1 VM
> (which doesn't have virtio-balloon enabled in the hypervisor) is
> enough to unblock ext4 development.

.... and I spoke too soon.  A number of runs using -rc6 are now
failing even with the n1-standard-2 VM, so virtio-balloon may not be
an indicator.

This is why debugging this is frustrating; it is very much a heisenbug
--- although 5.8 seems to work completely reliably, as do commits
before 37f4a24c2469.  Anything after that point will show random
failures.  :-(

					- Ted