On Fri, Jun 24, 2022 at 1:40 PM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote: > > Hi Anna! > > > On Jun 24, 2022, at 11:47 AM, Anna Schumaker <anna@xxxxxxxxxx> wrote: > > > > From: Anna Schumaker <Anna.Schumaker@xxxxxxxxxx> > > > > If we were in a HOLE segement, but vfs_llseek() claimed we were encoding > > DATA then we would switch over to the DATA encoding function. This > > conflicts with Chuck's latest xdr cleanup patches and can result in a > > crash or silent hang. Let's just return nfserr_io if we find we are in > > this situation, which will cause the encoder to return to the client > > with the number of segments already encoded. The client can then try the > > READ_PLUS call again. > > > > Fxes: 6c254bf3b637 (SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer()) > > checkpatch complains about this tag. Okay. I'll look into that. > > Also, Bruce said he wasn't able to reproduce the issue on > 6c254bf3b637, only on the whole series. Were you able to hit > it with just this commit applied? I've been having trouble hitting it in general. I think I only got an oops once, but most of the time it just silently hangs. I'll try a bit longer to see if I can hit it again. > > > > Signed-off-by: Anna Schumaker <Anna.Schumaker@xxxxxxxxxx> > > I find this somewhat problematic. I can't apply this patch in > good conscience: > > * The usual guideline for applying a workaround upstream is > that there's been a demonstration that it will be impossible > to find and fix the real problem. I don't see that here. > > * We still don't understand the failure. The XDR code itself > might be broken? Therefore we don't understand if this > workaround is 100% effective > > * Usually with a workaround, there's a hint of a long-term > fix... is this the long-term fix? No, the long term fix is my new READ_PLUS code. I'm hoping to have a v2 ready soon, but I tested v1 and didn't have any problem (no oops or silent hang) > > In other words, I might give this patch to a customer who > needed to get back on her feet quickly. I'm hesitant to take > it as an upstream change without further justification. > > IMHO a fix in an XDR consumer (like READ_PLUS) needs to > demonstrate that the current XDR code is working as designed. > Otherwise, the XDR code itself is what needs to be fixed. > > > > --- > > fs/nfsd/nfs4xdr.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c > > index 61b2aae81abb..dc97d7c7e595 100644 > > --- a/fs/nfsd/nfs4xdr.c > > +++ b/fs/nfsd/nfs4xdr.c > > @@ -4792,7 +4792,7 @@ nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp, > > if (data_pos == -ENXIO) > > data_pos = f_size; > > else if (data_pos <= read->rd_offset || (data_pos < f_size && data_pos % PAGE_SIZE)) > > - return nfsd4_encode_read_plus_data(resp, read, maxcount, eof, &f_size); > > + return nfserr_io; > > count = data_pos - read->rd_offset; > > > > /* Content type, offset, byte count */ > > -- > > 2.36.1 > > > > -- > Chuck Lever > > >