Re: fstests generic/465 failures on NFS

[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]



On Wed, 2024-01-10 at 09:30 -0500, Jeff Layton wrote:
> I've been seeing some failures of generic/465 across all NFS versions
> for a long time. I finally had some time to track down the cause, but
> I'm not quite sure whether it's fixable.
> 
> The test failures usually look like this (though often at a random
> offset):
> 
> SECTION       -- default
> FSTYP         -- nfs
> PLATFORM      -- Linux/x86_64 kdevops-nfs-default 6.7.0-g2f76af849100
> #80 SMP PREEMPT_DYNAMIC Wed Jan 10 06:33:59 EST 2024
> MKFS_OPTIONS  -- kdevops-nfsd:/export/kdevops-nfs-default-fstests-s
> MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 kdevops-
> nfsd:/export/kdevops-nfs-default-fstests-s /media/scratch
> 
> generic/465 8s ... - output mismatch (see /data/fstests-
> install/xfstests/results/kdevops-nfs-default/6.7.0-
> g2f76af849100/default/generic/465.out.bad)
>     --- tests/generic/465.out	2024-01-10 06:39:53.500389434 -0500
>     +++ /data/fstests-install/xfstests/results/kdevops-nfs-
> default/6.7.0-g2f76af849100/default/generic/465.out.bad	2024-01-10
> 08:57:00.536146701 -0500
>     @@ -1,3 +1,4 @@
>      QA output created by 465
>      non-aio dio test
>     +encounter an error: block 117 offset 0, content 0
>      aio-dio test
>     ...
>     (Run 'diff -u /data/fstests-
> install/xfstests/tests/generic/465.out /data/fstests-
> install/xfstests/results/kdevops-nfs-default/6.7.0-
> g2f76af849100/default/generic/465.out.bad'  to see the entire diff)
> Ran: generic/465
> Failures: generic/465
> Failed 1 of 1 tests
> 
> The test kicks off a thread that tries to read the file using DIO
> while
> the parent task writes 1M blocks of 'a' to it sequentially using DIO.
> It
> expects that the reader will always see 'a' in the read result, or a
> short read. In the above case, it got back a read with '\0' in it.
> 
> The blocks in this test are 1M, so this block starts at offset
> 122683392. Looking at a capture, I caught this:
> 
> 65161  40.392338 192.168.122.173 → 192.168.122.227 NFS 3702 V4 Call
> WRITE StateID: 0x9e68 Offset: 123207680 Len: 524288 ; V4 Call
> READ_PLUS StateID: 0x9e68 Offset: 122683392 Len: 524288  ; V4 Call
> READ_PLUS StateID: 0x9e68 Offset: 123207680 Len: 524288
> 65171  40.393230 192.168.122.173 → 192.168.122.227 NFS 3286 V4 Call
> WRITE StateID: 0x9e68 Offset: 122683392 Len: 524288
> 65172  40.393401 192.168.122.227 → 192.168.122.173 NFS 182 V4 Reply
> (Call In 65161) WRITE
> 65181  40.394844 192.168.122.227 → 192.168.122.173 NFS 6794 V4 Reply
> (Call In 65161) READ_PLUS
> 65195  40.395506 192.168.122.227 → 192.168.122.173 NFS 6794 V4 Reply
> (Call In 65161) READ_PLUS
> 
> It looks like the DIO writes got reordered here so the size of the
> file
> probably increased briefly before the second write got processed, and
> the read_plus got processed in between the two.
> 
> While we might be able to force the client to send the WRITEs in
> order
> of increasing offset in this situation, the server is under no
> obligation to process concurrent RPCs in any particular order. I
> don't
> think this is fundamentally fixable due to that.
> 
> Am I wrong? If not, then I'll plan to send an fstests patch to skip
> this
> test on NFS.

This is an issue that we've known about for a while now. I regularly
see this test failing, and it is indeed because we do not serialise
reads vs writes in O_DIRECT. Until someone describes an application
that relies on that behaviour to, I'd prefer to avoid having to call
inode_dio_wait(inode) on the off chance that someone is doing an
O_DIRECT read from the same set of data that they are doing an O_DIRECT
write to.

IOW: I'd be happy to see us just skip that test.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx






[Index of Archives]     [Linux Filesystems Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux