Re: fstests generic/465 failures on NFS

[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]



Hi Jeff,

On Wed, Jan 10, 2024 at 9:30 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>
> I've been seeing some failures of generic/465 across all NFS versions
> for a long time. I finally had some time to track down the cause, but
> I'm not quite sure whether it's fixable.
>
> The test failures usually look like this (though often at a random
> offset):
>
> SECTION       -- default
> FSTYP         -- nfs
> PLATFORM      -- Linux/x86_64 kdevops-nfs-default 6.7.0-g2f76af849100 #80 SMP PREEMPT_DYNAMIC Wed Jan 10 06:33:59 EST 2024
> MKFS_OPTIONS  -- kdevops-nfsd:/export/kdevops-nfs-default-fstests-s
> MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 kdevops-nfsd:/export/kdevops-nfs-default-fstests-s /media/scratch
>
> generic/465 8s ... - output mismatch (see /data/fstests-install/xfstests/results/kdevops-nfs-default/6.7.0-g2f76af849100/default/generic/465.out.bad)
>     --- tests/generic/465.out   2024-01-10 06:39:53.500389434 -0500
>     +++ /data/fstests-install/xfstests/results/kdevops-nfs-default/6.7.0-g2f76af849100/default/generic/465.out.bad      2024-01-10 08:57:00.536146701 -0500
>     @@ -1,3 +1,4 @@
>      QA output created by 465
>      non-aio dio test
>     +encounter an error: block 117 offset 0, content 0

Looking through my test history, I have this one mostly passing but
with the occasional failure that looks like this.

>      aio-dio test
>     ...
>     (Run 'diff -u /data/fstests-install/xfstests/tests/generic/465.out /data/fstests-install/xfstests/results/kdevops-nfs-default/6.7.0-g2f76af849100/default/generic/465.out.bad'  to see the entire diff)
> Ran: generic/465
> Failures: generic/465
> Failed 1 of 1 tests
>
> The test kicks off a thread that tries to read the file using DIO while
> the parent task writes 1M blocks of 'a' to it sequentially using DIO. It
> expects that the reader will always see 'a' in the read result, or a
> short read. In the above case, it got back a read with '\0' in it.
>
> The blocks in this test are 1M, so this block starts at offset
> 122683392. Looking at a capture, I caught this:
>
> 65161  40.392338 192.168.122.173 → 192.168.122.227 NFS 3702 V4 Call WRITE StateID: 0x9e68 Offset: 123207680 Len: 524288 ; V4 Call READ_PLUS StateID: 0x9e68 Offset: 122683392 Len: 524288  ; V4 Call READ_PLUS StateID: 0x9e68 Offset: 123207680 Len: 524288
> 65171  40.393230 192.168.122.173 → 192.168.122.227 NFS 3286 V4 Call WRITE StateID: 0x9e68 Offset: 122683392 Len: 524288
> 65172  40.393401 192.168.122.227 → 192.168.122.173 NFS 182 V4 Reply (Call In 65161) WRITE
> 65181  40.394844 192.168.122.227 → 192.168.122.173 NFS 6794 V4 Reply (Call In 65161) READ_PLUS
> 65195  40.395506 192.168.122.227 → 192.168.122.173 NFS 6794 V4 Reply (Call In 65161) READ_PLUS
>
> It looks like the DIO writes got reordered here so the size of the file
> probably increased briefly before the second write got processed, and
> the read_plus got processed in between the two.
>
> While we might be able to force the client to send the WRITEs in order
> of increasing offset in this situation, the server is under no
> obligation to process concurrent RPCs in any particular order. I don't
> think this is fundamentally fixable due to that.
>
> Am I wrong? If not, then I'll plan to send an fstests patch to skip this
> test on NFS.

I'm cool with this one being skipped. I'm assuming it passing in my
setup is mostly accidental, which means it's not a very useful test.

Anna

> --
> Jeff Layton <jlayton@xxxxxxxxxx>





[Index of Archives]     [Linux Filesystems Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux