Hi Jeff,

On Wed, Jan 10, 2024 at 9:30 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>
> I've been seeing some failures of generic/465 across all NFS versions
> for a long time. I finally had some time to track down the cause, but
> I'm not quite sure whether it's fixable.
>
> The test failures usually look like this (though often at a random
> offset):
>
> SECTION -- default
> FSTYP -- nfs
> PLATFORM -- Linux/x86_64 kdevops-nfs-default 6.7.0-g2f76af849100 #80 SMP PREEMPT_DYNAMIC Wed Jan 10 06:33:59 EST 2024
> MKFS_OPTIONS -- kdevops-nfsd:/export/kdevops-nfs-default-fstests-s
> MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 kdevops-nfsd:/export/kdevops-nfs-default-fstests-s /media/scratch
>
> generic/465 8s ... - output mismatch (see /data/fstests-install/xfstests/results/kdevops-nfs-default/6.7.0-g2f76af849100/default/generic/465.out.bad)
> --- tests/generic/465.out 2024-01-10 06:39:53.500389434 -0500
> +++ /data/fstests-install/xfstests/results/kdevops-nfs-default/6.7.0-g2f76af849100/default/generic/465.out.bad 2024-01-10 08:57:00.536146701 -0500
> @@ -1,3 +1,4 @@
>  QA output created by 465
>  non-aio dio test
> +encounter an error: block 117 offset 0, content 0

Looking through my test history, I have this one mostly passing but
with the occasional failure that looks like this.

>  aio-dio test
> ...
> (Run 'diff -u /data/fstests-install/xfstests/tests/generic/465.out /data/fstests-install/xfstests/results/kdevops-nfs-default/6.7.0-g2f76af849100/default/generic/465.out.bad' to see the entire diff)
> Ran: generic/465
> Failures: generic/465
> Failed 1 of 1 tests
>
> The test kicks off a thread that tries to read the file using DIO while
> the parent task writes 1M blocks of 'a' to it sequentially using DIO. It
> expects that the reader will always see 'a' in the read result, or a
> short read. In the above case, it got back a read with '\0' in it.
>
> The blocks in this test are 1M, so this block starts at offset
> 122683392. Looking at a capture, I caught this:
>
> 65161 40.392338 192.168.122.173 → 192.168.122.227 NFS 3702 V4 Call WRITE StateID: 0x9e68 Offset: 123207680 Len: 524288 ; V4 Call READ_PLUS StateID: 0x9e68 Offset: 122683392 Len: 524288 ; V4 Call READ_PLUS StateID: 0x9e68 Offset: 123207680 Len: 524288
> 65171 40.393230 192.168.122.173 → 192.168.122.227 NFS 3286 V4 Call WRITE StateID: 0x9e68 Offset: 122683392 Len: 524288
> 65172 40.393401 192.168.122.227 → 192.168.122.173 NFS 182 V4 Reply (Call In 65161) WRITE
> 65181 40.394844 192.168.122.227 → 192.168.122.173 NFS 6794 V4 Reply (Call In 65161) READ_PLUS
> 65195 40.395506 192.168.122.227 → 192.168.122.173 NFS 6794 V4 Reply (Call In 65161) READ_PLUS
>
> It looks like the DIO writes got reordered here so the size of the file
> probably increased briefly before the second write got processed, and
> the read_plus got processed in between the two.
>
> While we might be able to force the client to send the WRITEs in order
> of increasing offset in this situation, the server is under no
> obligation to process concurrent RPCs in any particular order. I don't
> think this is fundamentally fixable due to that.
>
> Am I wrong? If not, then I'll plan to send an fstests patch to skip this
> test on NFS.

I'm cool with this one being skipped. I'm assuming it passing in my
setup is mostly accidental, which means it's not a very useful test.

Anna

> --
> Jeff Layton <jlayton@xxxxxxxxxx>
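
[For anyone not familiar with the test being discussed: the sketch below
is a rough illustration of the access pattern described above -- a parent
appending 1M O_DIRECT writes of 'a' while a child reads the same region
with O_DIRECT and expects either 'a' bytes or a short read. The file
path, block count, and program name are made up for the example; this is
not the actual fstests source.]

/* dio_race.c -- simplified sketch of the generic/465 pattern.
 * build: cc -O2 -o dio_race dio_race.c -lpthread
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLKSZ	(1024 * 1024)		/* 1M blocks, as in the test */
#define NBLOCKS	256

static const char *path = "/mnt/scratch/465.tmp";	/* hypothetical path */

static void *reader(void *arg)
{
	char *buf;
	off_t off = 0;
	int fd = open(path, O_RDONLY | O_DIRECT);

	if (fd < 0 || posix_memalign((void **)&buf, 4096, BLKSZ))
		exit(1);

	while (off < (off_t)NBLOCKS * BLKSZ) {
		ssize_t ret = pread(fd, buf, BLKSZ, off);

		if (ret <= 0)
			continue;	/* short read / not written yet: allowed */
		for (ssize_t i = 0; i < ret; i++) {
			if (buf[i] != 'a') {
				/* this is the failure generic/465 reports */
				fprintf(stderr, "bad byte at offset %lld\n",
					(long long)(off + i));
				exit(1);
			}
		}
		off += ret;
	}
	close(fd);
	free(buf);
	return NULL;
}

int main(void)
{
	pthread_t tid;
	char *buf;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);

	if (fd < 0 || posix_memalign((void **)&buf, 4096, BLKSZ))
		return 1;
	memset(buf, 'a', BLKSZ);

	/* reader chases the writer through the growing file */
	pthread_create(&tid, NULL, reader, NULL);
	for (int i = 0; i < NBLOCKS; i++)
		if (pwrite(fd, buf, BLKSZ, (off_t)i * BLKSZ) != BLKSZ)
			return 1;

	pthread_join(tid, NULL);
	close(fd);
	free(buf);
	return 0;
}

[The point of the sketch is only that the reader polls a region the
writer is still extending, so any window where the file size has advanced
before the lower-offset WRITE lands on the server -- as in the capture
above -- can show up as zero bytes in the read.]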