Re: [RFC PATCH] NFSD: Encode COMPOUND operation status on page boundaries

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Dec 22, 2024 at 5:25 PM Rick Macklem <rick.macklem@xxxxxxxxx> wrote:
>
> On Sun, Dec 22, 2024 at 4:10 PM <cel@xxxxxxxxxx> wrote:
> >
> > From: Chuck Lever <chuck.lever@xxxxxxxxxx>
> >
> > J. David reports an odd corruption of a READDIR reply sent to a
> > FreeBSD client.
> >
> > xdr_reserve_space() has to do a special trick when the @nbytes value
> > requests more space than there is in the current page of the XDR
> > buffer.
> >
> > In that case, xdr_reserve_space() returns a pointer to the start of
> > the next page, and then the next call to xdr_reserve_space() invokes
> > __xdr_commit_encode() to copy enough of the data item back into the
> > previous page to make that data item contiguous across the page
> > boundary.
> >
> > But we need to be careful in the case where buffer space is reserved
> > early for a data item that will be inserted into the buffer later.
> >
> > One such caller, nfsd4_encode_operation(), reserves 8 bytes in the
> > encoding buffer for each COMPOUND operation. However, a READDIR
> > result can sometimes encode file names so that there are only 4
> > bytes left at the end of the current XDR buffer page (though plenty
> > of pages are left to handle the remaining encoding tasks).
> >
> > If a COMPOUND operation follows the READDIR result (say, a GETATTR),
> > then nfsd4_encode_operation() will reserve 8 bytes for the op number
> > (9) and the op status (usually NFS4_OK). In this weird case,
> > xdr_reserve_space() returns a pointer to byte zero of the next buffer
> > page, as it assumes the data item will be copied back into place (in
> > the previous page) on the next call to xdr_reserve_space().
> >
> > nfsd4_encode_operation() writes the op num into the buffer, then
> > saves the next 4-byte location for the op's status code. The next
> > xdr_reserve_space() call is part of GETATTR encoding, so the op num
> > gets copied back into the previous page, but the saved location for
> > the op status continues to point to the wrong spot in the current
> > XDR buffer page because __xdr_commit_encode() moved that data item.
> >
> > After GETATTR encoding is complete, nfsd4_encode_operation() writes
> > the op status over the first XDR data item in the GETATTR result.
> > The NFS4_OK status code (0) makes it look like there are zero items
> > in the GETATTR's attribute bitmask.
> I can confirm that this patch fixes the test case I have, which is based on
> J. David's reproducer.
>
> I also think the analysis sounds right. I had gotten to the point where
> I thought it was some oddball case related to xdr_reserve_space() and
> saw that the pointer used by GETATTR's nfsd4_encode_bitmap4() was
> 4 bytes into a page, for the broken case.
> (As an aside, it appears that "%p" is broken for 32bit arches. I needed to
> use "%x" with a (unsigned int) cast to printk() pointers. I doubt anyone  much
> cares about 32bits any more, but I might look to see if I can fix it.)
>
> Good work Chuck!
>
> >
> > The patch description of commit 2825a7f90753 ("nfsd4: allow encoding
> > across page boundaries") [2014] remarks that NFSD "can't handle a
> > new operation starting close to the end of a page." This behavior
> > appears to be one reason for that remark.
> >
> > Break up the reservation of the COMPOUND op num and op status data
> > items into two distinct 4-octet reservations. Thanks to XDR data
> > item alignment restrictions, a 4-octet buffer reservation can never
> > straddle a page boundary.
> I don't know enough about the code to comment w.r.t. whether or not this
> is a correct fix, although it seems to work for the test case.
>
> I will note that it is a pretty subtle trap and it would be nice if something
> could be done to avoid this coding mistake from happening again.
> All I can think of is defining a new function that must be used instead of
> xdr_reserve_space() if there will be other encoding calls done before the
> pointer is used. Something like xdr_reserve_u32_long_term(). (Chuck is
> much better at naming stuff than I am.)
> --> If your fix is correct, in general, this function would just call
> xdr_reserve_space(xdr, XDR_UNIT),
>       but it could also set a flag in the xdr structure so that it can
> only be done once until
>       the flag is cleared (by a function call when the code is done
> with the pointer), or something like that?
I take back this latter bit w.r.t. a flag. If the function only
allocates 4bytes,
it will never straddle pages and cause problems.

My suggestion would just clarify that "long term" pointers can only be
used for a 4 byte (XDR_UNIT) allocation.

rick

>
> Anyhow, others probably can suggest better changes that would avoid this trap?
>
> Good work, rick
>
> >
> > Reported-by: J David <j.david.lists@xxxxxxxxx>
> > Cc: stable@xxxxxxxxxxxxxxx
> > Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> > ---
> >  fs/nfsd/nfs4xdr.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > index 53fac037611c..8780da884197 100644
> > --- a/fs/nfsd/nfs4xdr.c
> > +++ b/fs/nfsd/nfs4xdr.c
> > @@ -5764,10 +5764,11 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
> >         nfsd4_enc encoder;
> >         __be32 *p;
> >
> > -       p = xdr_reserve_space(xdr, 8);
> > +       if (xdr_stream_encode_u32(xdr, op->opnum) != XDR_UNIT)
> > +               goto release;
> > +       p = xdr_reserve_space(xdr, XDR_UNIT);
> >         if (!p)
> >                 goto release;
> > -       *p++ = cpu_to_be32(op->opnum);
> >         post_err_offset = xdr->buf->len;
> >
> >         if (op->opnum == OP_ILLEGAL)
> > --
> > 2.47.0
> >





[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux