On Fri, 28 Jul 2023, Chuck Lever wrote: > From: David Howells <dhowells@xxxxxxxxxx> > > nfsd_splice_actor() has a clause in its loop that chops up a compound page > into individual pages such that if the same page is seen twice in a row, it > is discarded the second time. This is a problem with the advent of > shmem_splice_read() as that inserts zero_pages into the pipe in lieu of > pages that aren't present in the pagecache. > > Fix this by assuming that the last page is being extended only if the > currently stored length + starting offset is not currently on a page > boundary. > > This can be tested by NFS-exporting a tmpfs filesystem on the test machine > and truncating it to more than a page in size (eg. truncate -s 8192) and > then reading it by NFS. The first page will be all zeros, but thereafter > garbage will be read. > > Note: I wonder if we can ever get a situation now where we get a splice > that gives us contiguous parts of a page in separate actor calls. As NFSD > can only be splicing from a file (I think), there are only three sources of > the page: copy_splice_read(), shmem_splice_read() and file_splice_read(). > The first allocates pages for the data it reads, so the problem cannot > occur; the second should never see a partial page; and the third waits for > each page to become available before we're allowed to read from it. > > Fixes: bd194b187115 ("shmem: Implement splice-read") > Reported-by: Chuck Lever <chuck.lever@xxxxxxxxxx> > Signed-off-by: David Howells <dhowells@xxxxxxxxxx> > Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx> > cc: Hugh Dickins <hughd@xxxxxxxxxx> > cc: Jens Axboe <axboe@xxxxxxxxx> > cc: Matthew Wilcox <willy@xxxxxxxxxxxxx> > cc: linux-nfs@xxxxxxxxxxxxxxx > cc: linux-fsdevel@xxxxxxxxxxxxxxx > cc: linux-mm@xxxxxxxxx > Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> > --- > fs/nfsd/vfs.c | 9 ++++++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > index 59b7d60ae33e..ee3bbaa79478 100644 > --- a/fs/nfsd/vfs.c > +++ b/fs/nfsd/vfs.c > @@ -956,10 +956,13 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, > last_page = page + (offset + sd->len - 1) / PAGE_SIZE; > for (page += offset / PAGE_SIZE; page <= last_page; page++) { > /* > - * Skip page replacement when extending the contents > - * of the current page. > + * Skip page replacement when extending the contents of the > + * current page. But note that we may get two zero_pages in a > + * row from shmem. > */ > - if (page == *(rqstp->rq_next_page - 1)) > + if (page == *(rqstp->rq_next_page - 1) && > + offset_in_page(rqstp->rq_res.page_base + > + rqstp->rq_res.page_len)) This seems fragile in that it makes assumptions about the pages being sent and their alignment. Given that it was broken by the splice-read change, that confirms it is fragile. Maybe we could make the code a bit more explicit about what is expected. Also, I don't think this test can ever be relevant after the first time through the loop. So I think it would be clearest to have the interesting case outside the loop. page += offset / PAGE_SIZE; if (rqstp->rq_res.pages_len > 0) { /* appending to page list - check alignment */ if (offset % PAGE_SIZE != (rqstp->rq_res.page_base + rqstp-.rq_res.page_len) % PAGE_SIZE) return -EIO; if (offset % PAGE_SIZE != 0) { /* continuing previous page */ if (page != rqstp->rq_next_page[-1]) return -EIO; page += 1; } } else /* Starting new page list */ rqstp->rq_res.page_base = offset % PAGE_SIZE; for ( ; page <= last_page ; page++) if (unlikely(!svc_rqst_replace_page(rqstp, page))) return -EIO; rqstp->rq_res.page_len += sd->len; return sd->len; Also, the name "svc_rqst_replace_page" doesn't give any hint that the next_page pointer is advanced. Maybe svc_rqst_add_page() ??? Not great I admit. NeilBrown > continue; > if (unlikely(!svc_rqst_replace_page(rqstp, page))) > return -EIO; > > >