Re: [RFC PATCH] NFSD: Encode COMPOUND operation status on page boundaries

Rick Macklem <rick.macklem@xxxxxxxxx> · Sun, 22 Dec 2024 17:25:45 -0800

On Sun, Dec 22, 2024 at 4:10 PM <cel@xxxxxxxxxx> wrote:
>
> From: Chuck Lever <chuck.lever@xxxxxxxxxx>
>
> J. David reports an odd corruption of a READDIR reply sent to a
> FreeBSD client.
>
> xdr_reserve_space() has to do a special trick when the @nbytes value
> requests more space than there is in the current page of the XDR
> buffer.
>
> In that case, xdr_reserve_space() returns a pointer to the start of
> the next page, and then the next call to xdr_reserve_space() invokes
> __xdr_commit_encode() to copy enough of the data item back into the
> previous page to make that data item contiguous across the page
> boundary.
>
> But we need to be careful in the case where buffer space is reserved
> early for a data item that will be inserted into the buffer later.
>
> One such caller, nfsd4_encode_operation(), reserves 8 bytes in the
> encoding buffer for each COMPOUND operation. However, a READDIR
> result can sometimes encode file names so that there are only 4
> bytes left at the end of the current XDR buffer page (though plenty
> of pages are left to handle the remaining encoding tasks).
>
> If a COMPOUND operation follows the READDIR result (say, a GETATTR),
> then nfsd4_encode_operation() will reserve 8 bytes for the op number
> (9) and the op status (usually NFS4_OK). In this weird case,
> xdr_reserve_space() returns a pointer to byte zero of the next buffer
> page, as it assumes the data item will be copied back into place (in
> the previous page) on the next call to xdr_reserve_space().
>
> nfsd4_encode_operation() writes the op num into the buffer, then
> saves the next 4-byte location for the op's status code. The next
> xdr_reserve_space() call is part of GETATTR encoding, so the op num
> gets copied back into the previous page, but the saved location for
> the op status continues to point to the wrong spot in the current
> XDR buffer page because __xdr_commit_encode() moved that data item.
>
> After GETATTR encoding is complete, nfsd4_encode_operation() writes
> the op status over the first XDR data item in the GETATTR result.
> The NFS4_OK status code (0) makes it look like there are zero items
> in the GETATTR's attribute bitmask.
I can confirm that this patch fixes the test case I have, which is based on
J. David's reproducer.

I also think the analysis sounds right. I had gotten to the point where
I thought it was some oddball case related to xdr_reserve_space() and
saw that the pointer used by GETATTR's nfsd4_encode_bitmap4() was
4 bytes into a page, for the broken case.
(As an aside, it appears that "%p" is broken for 32bit arches. I needed to
use "%x" with a (unsigned int) cast to printk() pointers. I doubt anyone  much
cares about 32bits any more, but I might look to see if I can fix it.)

Good work Chuck!

>
> The patch description of commit 2825a7f90753 ("nfsd4: allow encoding
> across page boundaries") [2014] remarks that NFSD "can't handle a
> new operation starting close to the end of a page." This behavior
> appears to be one reason for that remark.
>
> Break up the reservation of the COMPOUND op num and op status data
> items into two distinct 4-octet reservations. Thanks to XDR data
> item alignment restrictions, a 4-octet buffer reservation can never
> straddle a page boundary.
I don't know enough about the code to comment w.r.t. whether or not this
is a correct fix, although it seems to work for the test case.

I will note that it is a pretty subtle trap and it would be nice if something
could be done to avoid this coding mistake from happening again.
All I can think of is defining a new function that must be used instead of
xdr_reserve_space() if there will be other encoding calls done before the
pointer is used. Something like xdr_reserve_u32_long_term(). (Chuck is
much better at naming stuff than I am.)
--> If your fix is correct, in general, this function would just call
xdr_reserve_space(xdr, XDR_UNIT),
      but it could also set a flag in the xdr structure so that it can
only be done once until
      the flag is cleared (by a function call when the code is done
with the pointer), or something like that?

Anyhow, others probably can suggest better changes that would avoid this trap?

Good work, rick

>
> Reported-by: J David <j.david.lists@xxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> ---
>  fs/nfsd/nfs4xdr.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 53fac037611c..8780da884197 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -5764,10 +5764,11 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
>         nfsd4_enc encoder;
>         __be32 *p;
>
> -       p = xdr_reserve_space(xdr, 8);
> +       if (xdr_stream_encode_u32(xdr, op->opnum) != XDR_UNIT)
> +               goto release;
> +       p = xdr_reserve_space(xdr, XDR_UNIT);
>         if (!p)
>                 goto release;
> -       *p++ = cpu_to_be32(op->opnum);
>         post_err_offset = xdr->buf->len;
>
>         if (op->opnum == OP_ILLEGAL)
> --
> 2.47.0
>