On Sat, Jul 29, 2023 at 09:54:58AM +1000, NeilBrown wrote: > On Fri, 28 Jul 2023, Chuck Lever wrote: > > From: David Howells <dhowells@xxxxxxxxxx> > > > > nfsd_splice_actor() has a clause in its loop that chops up a compound page > > into individual pages such that if the same page is seen twice in a row, it > > is discarded the second time. This is a problem with the advent of > > shmem_splice_read() as that inserts zero_pages into the pipe in lieu of > > pages that aren't present in the pagecache. > > > > Fix this by assuming that the last page is being extended only if the > > currently stored length + starting offset is not currently on a page > > boundary. > > > > This can be tested by NFS-exporting a tmpfs filesystem on the test machine > > and truncating it to more than a page in size (eg. truncate -s 8192) and > > then reading it by NFS. The first page will be all zeros, but thereafter > > garbage will be read. > > > > Note: I wonder if we can ever get a situation now where we get a splice > > that gives us contiguous parts of a page in separate actor calls. As NFSD > > can only be splicing from a file (I think), there are only three sources of > > the page: copy_splice_read(), shmem_splice_read() and file_splice_read(). > > The first allocates pages for the data it reads, so the problem cannot > > occur; the second should never see a partial page; and the third waits for > > each page to become available before we're allowed to read from it. > > > > Fixes: bd194b187115 ("shmem: Implement splice-read") > > Reported-by: Chuck Lever <chuck.lever@xxxxxxxxxx> > > Signed-off-by: David Howells <dhowells@xxxxxxxxxx> > > Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx> > > cc: Hugh Dickins <hughd@xxxxxxxxxx> > > cc: Jens Axboe <axboe@xxxxxxxxx> > > cc: Matthew Wilcox <willy@xxxxxxxxxxxxx> > > cc: linux-nfs@xxxxxxxxxxxxxxx > > cc: linux-fsdevel@xxxxxxxxxxxxxxx > > cc: linux-mm@xxxxxxxxx > > Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> > > --- > > fs/nfsd/vfs.c | 9 ++++++--- > > 1 file changed, 6 insertions(+), 3 deletions(-) > > > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > > index 59b7d60ae33e..ee3bbaa79478 100644 > > --- a/fs/nfsd/vfs.c > > +++ b/fs/nfsd/vfs.c > > @@ -956,10 +956,13 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, > > last_page = page + (offset + sd->len - 1) / PAGE_SIZE; > > for (page += offset / PAGE_SIZE; page <= last_page; page++) { > > /* > > - * Skip page replacement when extending the contents > > - * of the current page. > > + * Skip page replacement when extending the contents of the > > + * current page. But note that we may get two zero_pages in a > > + * row from shmem. > > */ > > - if (page == *(rqstp->rq_next_page - 1)) > > + if (page == *(rqstp->rq_next_page - 1) && > > + offset_in_page(rqstp->rq_res.page_base + > > + rqstp->rq_res.page_len)) > > This seems fragile in that it makes assumptions about the pages being > sent and their alignment. > Given that it was broken by the splice-read change, that confirms it is > fragile. Maybe we could make the code a bit more explicit about what is > expected. Indeed, this code is brittle. This is not even the only time the actor has been broken in the past four or five kernel releases. IMO the problem is that there is no API contract or documentation for splice actors. And as far as I am aware, only a few other examples are in use to learn from. > Also, I don't think this test can ever be relevant after the first time > through the loop. So I think it would be clearest to have the > interesting case outside the loop. > > page += offset / PAGE_SIZE; > if (rqstp->rq_res.pages_len > 0) { > /* appending to page list - check alignment */ > if (offset % PAGE_SIZE != (rqstp->rq_res.page_base + > rqstp-.rq_res.page_len) % PAGE_SIZE) > return -EIO; > if (offset % PAGE_SIZE != 0) { > /* continuing previous page */ > if (page != rqstp->rq_next_page[-1]) > return -EIO; > page += 1; > } > } else > /* Starting new page list */ > rqstp->rq_res.page_base = offset % PAGE_SIZE; > > for ( ; page <= last_page ; page++) > if (unlikely(!svc_rqst_replace_page(rqstp, page))) > return -EIO; > > rqstp->rq_res.page_len += sd->len; > return sd->len; > > > Also, the name "svc_rqst_replace_page" doesn't give any hint that the > next_page pointer is advanced. Maybe svc_rqst_add_page() ??? Not great > I admit. All reasonable suggestions. However, I'm getting ready to replace the splice read code with... je ne ce pas. - There are reports that splice read doesn't perform well - It's a brittle piece of engineering, as observed - The "zero copy" read path will need to support folios, hopefully sooner rather than later - We want the server's read path to use iomap when that is more broadly available in local filesystems - This fix is destined for 6.5-rc, which limits the amount of clean up and optimization we should be doing I'd like to apply David's fix as-is, unless it's truly broken or someone has a better quick solution. > > continue; > > if (unlikely(!svc_rqst_replace_page(rqstp, page))) > > return -EIO; > > > > > > > -- Chuck Lever