Re: [PATCH 3/3] NFS: Avoid copy of xdr padding in read()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2020-11-19 at 17:58 -0500, Chuck Lever wrote:
> 
> 
> > On Nov 19, 2020, at 9:34 AM, Trond Myklebust <
> > trondmy@xxxxxxxxxxxxxxx> wrote:
> > 
> > On Thu, 2020-11-19 at 09:31 -0500, Chuck Lever wrote:
> > > 
> > > 
> > > > On Nov 19, 2020, at 9:30 AM, Trond Myklebust <
> > > > trondmy@xxxxxxxxxxxxxxx> wrote:
> > > > 
> > > > On Thu, 2020-11-19 at 09:17 -0500, Chuck Lever wrote:
> > > > > 
> > > > > 
> > > > > > On Nov 18, 2020, at 5:19 PM, trondmy@xxxxxxxxxx wrote:
> > > > > > 
> > > > > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > > > > 
> > > > > > When doing a read() into a page, we also don't care if the
> > > > > > nul
> > > > > > padding
> > > > > > stays in that last page when the data length is not 32-bit
> > > > > > aligned.
> > > > > 
> > > > > What if the READ payload lands in the middle of a file? The
> > > > > pad on the end will overwrite file content just past where
> > > > > the READ payload lands.
> > > > 
> > > > If the size > buf->page_len, then it gets truncated in
> > > > xdr_align_pages() afaik.
> > > 
> > > I will need to check how RPC/RDMA behaves. It might build a
> > > chunk that includes the pad in this case, which would break
> > > things.
> > 
> > That would be a bug in the existing code too, then. It shouldn't be
> > writing beyond the buffer size we set in the NFS layer.
> 
> Testing now with xfstests, which should include fsx with direct
> I/O of odd sizes. So far I haven't seen any unexpected behavior.
> 
> But I'm not sure what copy you're trying to avoid. This one in
> xdr_align_pages() ?
> 
> 1189         else if (nwords < xdr->nwords) {
> 1190                 /* Truncate page data and move it into the tail
> */
> 1191                 offset = buf->page_len - len;
> 1192                 copied = xdr_shrink_pagelen(buf, offset);
> 1193                 trace_rpc_xdr_alignment(xdr, offset, copied);
> 1194                 xdr->nwords = XDR_QUADLEN(buf->len - cur);
> 1195         }
> 
> We set up the receive buffer already to avoid this copy. It should
> rarely, if ever, happen. That's the point of
> rpc_prepare_reply_pages().


...and the point of padding here is to avoid unaligned access to
memory. That is completely broken by this whole mechanism, which causes
us to place the real data in the tail in an unaligned buffer that
follows this padding.

Furthermore, rpc_prepare_reply_pages() only ever places the padding in
the tail _if_ our buffer size is already not 32-bit aligned. Otherwise,
we're engaging in this pointless exercise of making the tail buffer
data unaligned after the fact.

> 
> > > > > > Signed-off-by: Trond Myklebust
> > > > > > <trond.myklebust@xxxxxxxxxxxxxxx>
> > > > > > ---
> > > > > > fs/nfs/nfs2xdr.c | 2 +-
> > > > > > fs/nfs/nfs3xdr.c | 2 +-
> > > > > > fs/nfs/nfs4xdr.c | 2 +-
> > > > > > 3 files changed, 3 insertions(+), 3 deletions(-)
> > > > > > 
> > > > > > diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c
> > > > > > index db9c265ad9e1..468bfbfe44d7 100644
> > > > > > --- a/fs/nfs/nfs2xdr.c
> > > > > > +++ b/fs/nfs/nfs2xdr.c
> > > > > > @@ -102,7 +102,7 @@ static int decode_nfsdata(struct
> > > > > > xdr_stream
> > > > > > *xdr, struct nfs_pgio_res *result)
> > > > > >         if (unlikely(!p))
> > > > > >                 return -EIO;
> > > > > >         count = be32_to_cpup(p);
> > > > > > -       recvd = xdr_read_pages(xdr, count);
> > > > > > +       recvd = xdr_read_pages(xdr, xdr_align_size(count));
> > > > > >         if (unlikely(count > recvd))
> > > > > >                 goto out_cheating;
> > > > > > out:
> > > > > > diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
> > > > > > index d3e1726d538b..8ef7c961d3e2 100644
> > > > > > --- a/fs/nfs/nfs3xdr.c
> > > > > > +++ b/fs/nfs/nfs3xdr.c
> > > > > > @@ -1611,7 +1611,7 @@ static int decode_read3resok(struct
> > > > > > xdr_stream *xdr,
> > > > > >         ocount = be32_to_cpup(p++);
> > > > > >         if (unlikely(ocount != count))
> > > > > >                 goto out_mismatch;
> > > > > > -       recvd = xdr_read_pages(xdr, count);
> > > > > > +       recvd = xdr_read_pages(xdr, xdr_align_size(count));
> > > > > >         if (unlikely(count > recvd))
> > > > > >                 goto out_cheating;
> > > > > > out:
> > > > > > diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> > > > > > index 755b556e85c3..5baa767106dc 100644
> > > > > > --- a/fs/nfs/nfs4xdr.c
> > > > > > +++ b/fs/nfs/nfs4xdr.c
> > > > > > @@ -5202,7 +5202,7 @@ static int decode_read(struct
> > > > > > xdr_stream
> > > > > > *xdr, struct rpc_rqst *req,
> > > > > >                 return -EIO;
> > > > > >         eof = be32_to_cpup(p++);
> > > > > >         count = be32_to_cpup(p);
> > > > > > -       recvd = xdr_read_pages(xdr, count);
> > > > > > +       recvd = xdr_read_pages(xdr, xdr_align_size(count));
> > > > > >         if (count > recvd) {
> > > > > >                 dprintk("NFS: server cheating in read
> > > > > > reply: "
> > > > > >                                 "count %u > recvd %u\n",
> > > > > > count,
> > > > > > recvd);
> > > > > > -- 
> > > > > > 2.28.0
> > 
> > -- 
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > trond.myklebust@xxxxxxxxxxxxxxx
> > 
> > 
> 
> --
> Chuck Lever
> 
> 
> 

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx






[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux