On Thu, Jul 21, 2022 at 03:26:05PM +0800, Boyang Xue wrote:
> > > I find generic/476 easily goes into an infinite run on top of NFS. When it
> >
> > Infinite? It's only supposed to start 25000*nr_cpus*TIME_FACTOR
> > operations, so it /should/ conclude eventually. That includes driving
> > the filesystem completely out of space, but there ought to be enough
> > unlink/rmdir/truncate calls to free up space every now and then...
>
> Yes. I'm not sure about the calculations inside, but when the size of the
> scratch device < 27GB (can be 26GB when the backing storage is ext4
> rather than xfs), the test runs infinitely. I'm aware that the test
> should be slow, especially on NFS, but I see the test never finish
> even after multiple days. This problem happens in both localhost exported
> NFS and remote exported NFS configurations.

I can partially confirm this. I had noted a few weeks ago that I
needed to exclude generic/476 or the test VM would hang for over 24
hours, at which point I lost patience and terminated the VM. I had
gotten as far as

	gce-xfstests -c nfs -g auto -X generic/476

(which is a loopback config) using 5.19-rc4 in order to get a test run
to complete. Note: this was also triggering failures of generic/426
and generic/551, which I also haven't had time to investigate, not
being an NFS developer. :-)

I wasn't sure whether generic/476 never terminating was caused by a
loopback-triggered deadlock, or something else. But it sounds like
you've isolated it to the scratch device being *too* small, and that
the failure occurred even on a configuration where the client and
server were on different machines/VMs, correct?

> > > _require_scratch
> > > +_require_scratch_size $((27 * 1024 * 1024)) # 27GB
> >
> > ...so IDGI, this test works as intended. Are you saying that NFS
> > command overhead is so high that this test takes too long?

I interpreted this as "if the drive is too small, we're hitting some
kind of problem".
This *could* be some kind of problem which triggers
on ENOSPC; perhaps it's just much more likely on a smaller device?

So it's possible this is not a test bug, but an NFS problem. Perhaps
we should forward this off to the NFS folks first?

- Ted