> On Apr 6, 2022, at 8:18 PM, Hugh Dickins <hughd@xxxxxxxxxx> wrote: > > On Wed, 6 Apr 2022, Chuck Lever III wrote: > >> Good day, Hugh- > > Huh! If you were really wishing me a good day, would you tell me this ;-? > >> >> I noticed that several fsx-related tests in the xfstests suite are >> failing after updating my NFS server to v5.18-rc1. I normally test >> against xfs, ext4, btrfs, and tmpfs exports. tmpfs is the only export >> that sees these new failures: >> >> generic/075 2s ... [failed, exit status 1]- output mismatch (see /home/cel/src/xfstests/results//generic/075.out.bad) >> --- tests/generic/075.out 2014-02-13 15:40:45.000000000 -0500 >> +++ /home/cel/src/xfstests/results//generic/075.out.bad 2022-04-05 16:39:59.145991520 -0400 >> @@ -4,15 +4,5 @@ >> ----------------------------------------------- >> fsx.0 : -d -N numops -S 0 >> ----------------------------------------------- >> - >> ------------------------------------------------ >> -fsx.1 : -d -N numops -S 0 -x >> ------------------------------------------------ >> ... >> (Run 'diff -u /home/cel/src/xfstests/tests/generic/075.out /home/cel/src/xfstests/results//generic/075.out.bad' to see the entire diff) >> >> generic/091 9s ... [failed, exit status 1]- output mismatch (see /home/cel/src/xfstests/results//generic/091.out.bad) >> --- tests/generic/091.out 2014-02-13 15:40:45.000000000 -0500 >> +++ /home/cel/src/xfstests/results//generic/091.out.bad 2022-04-05 16:41:24.329063277 -0400 >> @@ -1,7 +1,75 @@ >> QA output created by 091 >> fsx -N 10000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W >> -fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W >> -fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W >> -fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W >> -fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W >> -fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -W >> ... >> (Run 'diff -u /home/cel/src/xfstests/tests/generic/091.out /home/cel/src/xfstests/results//generic/091.out.bad' to see the entire diff) >> >> generic/112 2s ... [failed, exit status 1]- output mismatch (see /home/cel/src/xfstests/results//generic/112.out.bad) >> --- tests/generic/112.out 2014-02-13 15:40:45.000000000 -0500 >> +++ /home/cel/src/xfstests/results//generic/112.out.bad 2022-04-05 16:41:38.511075170 -0400 >> @@ -4,15 +4,4 @@ >> ----------------------------------------------- >> fsx.0 : -A -d -N numops -S 0 >> ----------------------------------------------- >> - >> ------------------------------------------------ >> -fsx.1 : -A -d -N numops -S 0 -x >> ------------------------------------------------ >> ... >> (Run 'diff -u /home/cel/src/xfstests/tests/generic/112.out /home/cel/src/xfstests/results//generic/112.out.bad' to see the entire diff) >> >> generic/127 49s ... - output mismatch (see /home/cel/src/xfstests/results//generic/127.out.bad) >> --- tests/generic/127.out 2016-08-28 12:16:20.000000000 -0400 >> +++ /home/cel/src/xfstests/results//generic/127.out.bad 2022-04-05 16:42:07.655099652 -0400 >> @@ -4,10 +4,198 @@ >> === FSX Light Mode, Memory Mapping === >> All 100000 operations completed A-OK! >> === FSX Standard Mode, No Memory Mapping === >> -All 100000 operations completed A-OK! >> +ltp/fsx -q -l 262144 -o 65536 -S 191110531 -N 100000 -R -W fsx_std_nommap >> +READ BAD DATA: offset = 0x9cb7, size = 0xfae3, fname = /tmp/mnt/manet.ib-2323703/fsx_std_nommap >> +OFFSET GOOD BAD RANGE >> ... >> (Run 'diff -u /home/cel/src/xfstests/tests/generic/127.out /home/cel/src/xfstests/results//generic/127.out.bad' to see the entire diff) >> >> I bisected the problem to: >> >> 56a8c8eb1eaf ("tmpfs: do not allocate pages on read") >> >> generic/075 fails almost immediately without any NFS-level errors. >> Likely this is data corruption rather than an overt I/O error. > > That's sad. Thanks for bisecting and reporting. Sorry for the nuisance. > > I suspect this patch is heading for a revert, because I shall not have > time to debug and investigate. Cc'ing fsdevel and a few people who have > an interest in it, to warn of that likely upcoming revert. > > But if it's okay with everyone, please may we leave it in for -rc2? > Given that having it in -rc1 already smoked out another issue (problem > of SetPageUptodate(ZERO_PAGE(0)) without CONFIG_MMU), I think keeping > it in a little longer might smoke out even more. > > The xfstests info above doesn't actually tell very much, beyond that > generic/075 generic/091 generic/112 generic/127, each a test with fsx, > all fall at their first hurdle. If you have time, please rerun and > tar up the results/generic directory (maybe filter just those failing) > and send as attachment. But don't go to any trouble, it's unlikely > that I shall even untar it - it would be mainly to go on record if > anyone has time to look into it later. And, frankly, it's unlikely > to tell us anything more enlightening, than that the data seen was > not as expected: which we do already know. > > I've had no problem with xfstests generic 075,091,112,127 testing > tmpfs here, not before and not in the month or two I've had that > patch in: so it's something in the way that NFS exercises tmpfs > that reveals it. If I had time to duplicate your procedure, I'd be > asking for detailed instructions: but no, I won't have a chance. > > But I can sit here and try to guess. I notice fs/nfsd checks > file->f_op->splice_read, and employs fallback if not available: > if you have time, please try rerunning those xfstests on an -rc1 > kernel, but with mm/shmem.c's .splice_read line commented out. > My guess is that will then pass the tests, and we shall know more. This seemed like the most probative next step, so I commented out the .splice_read call-out in mm/shmem.c and ran the tests again. Yes, that change enables the fsx-related tests to pass as expected. > What could be going wrong there? I've thought of two possibilities. > A minor, hopefully easily fixed, issue would be if fs/nfsd has > trouble with seeing the same page twice in a row: since tmpfs is > now using the ZERO_PAGE(0) for all pages of a hole, and I think I > caught sight of code which looks to see if the latest page is the > same as the one before. It's easy to imagine that might go wrong. Are you referring to this function in fs/nfsd/vfs.c ? 847 static int 848 nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, 849 struct splice_desc *sd) 850 { 851 struct svc_rqst *rqstp = sd->u.data; 852 struct page **pp = rqstp->rq_next_page; 853 struct page *page = buf->page; 854 855 if (rqstp->rq_res.page_len == 0) { 856 svc_rqst_replace_page(rqstp, page); 857 rqstp->rq_res.page_base = buf->offset; 858 } else if (page != pp[-1]) { 859 svc_rqst_replace_page(rqstp, page); 860 } 861 rqstp->rq_res.page_len += sd->len; 862 863 return sd->len; 864 } rq_next_page should point to the first unused element of rqstp->rq_pages, so IIUC that check is looking for the final page that is part of the READ payload. But that does suggest that if page -> ZERO_PAGE and so does pp[-1], then svc_rqst_replace_page() would not be invoked. > A more difficult issue would be, if fsx is racing writes and reads, > in a way that it can guarantee the correct result, but that correct > result is no longer delivered: because the writes go into freshly > allocated tmpfs cache pages, while reads are still delivering > stale ZERO_PAGEs from the pipe. I'm hazy on the guarantees there. > > But unless someone has time to help out, we're heading for a revert. > > Thanks, > Hugh -- Chuck Lever