On Sun, Dec 22, 2024 at 5:25 PM Rick Macklem <rick.macklem@xxxxxxxxx> wrote: > > On Sun, Dec 22, 2024 at 4:10 PM <cel@xxxxxxxxxx> wrote: > > > > From: Chuck Lever <chuck.lever@xxxxxxxxxx> > > > > J. David reports an odd corruption of a READDIR reply sent to a > > FreeBSD client. > > > > xdr_reserve_space() has to do a special trick when the @nbytes value > > requests more space than there is in the current page of the XDR > > buffer. > > > > In that case, xdr_reserve_space() returns a pointer to the start of > > the next page, and then the next call to xdr_reserve_space() invokes > > __xdr_commit_encode() to copy enough of the data item back into the > > previous page to make that data item contiguous across the page > > boundary. > > > > But we need to be careful in the case where buffer space is reserved > > early for a data item that will be inserted into the buffer later. > > > > One such caller, nfsd4_encode_operation(), reserves 8 bytes in the > > encoding buffer for each COMPOUND operation. However, a READDIR > > result can sometimes encode file names so that there are only 4 > > bytes left at the end of the current XDR buffer page (though plenty > > of pages are left to handle the remaining encoding tasks). > > > > If a COMPOUND operation follows the READDIR result (say, a GETATTR), > > then nfsd4_encode_operation() will reserve 8 bytes for the op number > > (9) and the op status (usually NFS4_OK). In this weird case, > > xdr_reserve_space() returns a pointer to byte zero of the next buffer > > page, as it assumes the data item will be copied back into place (in > > the previous page) on the next call to xdr_reserve_space(). > > > > nfsd4_encode_operation() writes the op num into the buffer, then > > saves the next 4-byte location for the op's status code. The next > > xdr_reserve_space() call is part of GETATTR encoding, so the op num > > gets copied back into the previous page, but the saved location for > > the op status continues to point to the wrong spot in the current > > XDR buffer page because __xdr_commit_encode() moved that data item. > > > > After GETATTR encoding is complete, nfsd4_encode_operation() writes > > the op status over the first XDR data item in the GETATTR result. > > The NFS4_OK status code (0) makes it look like there are zero items > > in the GETATTR's attribute bitmask. > I can confirm that this patch fixes the test case I have, which is based on > J. David's reproducer. > > I also think the analysis sounds right. I had gotten to the point where > I thought it was some oddball case related to xdr_reserve_space() and > saw that the pointer used by GETATTR's nfsd4_encode_bitmap4() was > 4 bytes into a page, for the broken case. > (As an aside, it appears that "%p" is broken for 32bit arches. I needed to > use "%x" with a (unsigned int) cast to printk() pointers. I doubt anyone much > cares about 32bits any more, but I might look to see if I can fix it.) > > Good work Chuck! > > > > > The patch description of commit 2825a7f90753 ("nfsd4: allow encoding > > across page boundaries") [2014] remarks that NFSD "can't handle a > > new operation starting close to the end of a page." This behavior > > appears to be one reason for that remark. > > > > Break up the reservation of the COMPOUND op num and op status data > > items into two distinct 4-octet reservations. Thanks to XDR data > > item alignment restrictions, a 4-octet buffer reservation can never > > straddle a page boundary. > I don't know enough about the code to comment w.r.t. whether or not this > is a correct fix, although it seems to work for the test case. > > I will note that it is a pretty subtle trap and it would be nice if something > could be done to avoid this coding mistake from happening again. > All I can think of is defining a new function that must be used instead of > xdr_reserve_space() if there will be other encoding calls done before the > pointer is used. Something like xdr_reserve_u32_long_term(). (Chuck is > much better at naming stuff than I am.) > --> If your fix is correct, in general, this function would just call > xdr_reserve_space(xdr, XDR_UNIT), > but it could also set a flag in the xdr structure so that it can > only be done once until > the flag is cleared (by a function call when the code is done > with the pointer), or something like that? I take back this latter bit w.r.t. a flag. If the function only allocates 4bytes, it will never straddle pages and cause problems. My suggestion would just clarify that "long term" pointers can only be used for a 4 byte (XDR_UNIT) allocation. rick > > Anyhow, others probably can suggest better changes that would avoid this trap? > > Good work, rick > > > > > Reported-by: J David <j.david.lists@xxxxxxxxx> > > Cc: stable@xxxxxxxxxxxxxxx > > Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> > > --- > > fs/nfsd/nfs4xdr.c | 5 +++-- > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c > > index 53fac037611c..8780da884197 100644 > > --- a/fs/nfsd/nfs4xdr.c > > +++ b/fs/nfsd/nfs4xdr.c > > @@ -5764,10 +5764,11 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) > > nfsd4_enc encoder; > > __be32 *p; > > > > - p = xdr_reserve_space(xdr, 8); > > + if (xdr_stream_encode_u32(xdr, op->opnum) != XDR_UNIT) > > + goto release; > > + p = xdr_reserve_space(xdr, XDR_UNIT); > > if (!p) > > goto release; > > - *p++ = cpu_to_be32(op->opnum); > > post_err_offset = xdr->buf->len; > > > > if (op->opnum == OP_ILLEGAL) > > -- > > 2.47.0 > >