> On Jan 26, 2018, at 12:13 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote: > > On Fri, Jan 26, 2018 at 2:29 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote: >> >> >>> On Jan 26, 2018, at 11:05 AM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote: >>> >>> On Fri, Jan 26, 2018 at 1:23 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote: >>>> >>>> >>>>> On Jan 26, 2018, at 9:24 AM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote: >>>>> >>>>> Hi Bruce/Chuck, >>>>> >>>>> There is a problem with nfsd doing a WRITE of size 4093,4094,4095. To >>>>> reproduce, mount with RDMA and do a simple dd "dd if=/dev/zero >>>>> of=/mnt/testfile bs=4093 count=1". What happens is that nfsd fails to >>>>> parse GETATTR after the WRITE in the compound. It fails the operation >>>>> with ERR_OP_ILLEGAL. >>>>> >>>>> The problem happens for the values where XDR round up ends up rounding >>>>> up to the page size. I don't know if my fix is the appropriate way to >>>>> fix this but with it I don't get the error: >>>>> >>>>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c >>>>> index 2c61c6b..a8489c3 100644 >>>>> --- a/fs/nfsd/nfs4xdr.c >>>>> +++ b/fs/nfsd/nfs4xdr.c >>>>> @@ -1289,11 +1289,12 @@ static __be32 nfsd4_decode_opaque(struct nfsd4_compounda >>>>> >>>>> len = XDR_QUADLEN(write->wr_buflen) << 2; >>>>> if (len >= avail) { >>>>> - int pages; >>>>> + int pages = 0; >>>>> >>>>> len -= avail; >>>>> >>>>> - pages = len >> PAGE_SHIFT; >>>>> + if (write->wr_buflen >= PAGE_SIZE) >>>>> + pages = len >> PAGE_SHIFT; >>>>> argp->pagelist += pages; >>>>> argp->pagelen -= pages * PAGE_SIZE; >>>>> len -= pages * PAGE_SIZE; >>>>> >>>>> So the problem is the using "len" instead of "write->wr_buflen" leads >>>>> for the values 4093,4094,4095 that are rounded up to 4096, it makes >>>>> pages=1 and the argp->pagelen ends up being a negative value leading >>>>> to problems of parsing the GETATTR. >>>> >>>> Would this also be a problem near any page boundary, say, a >>>> write length of 8191 bytes? >>>> >>>> Instead of using the rounded up "len", why not try this: >>>> >>>> - pages = len >> PAGE_SHIFT; >>>> + pages = write->wr_buflen >> PAGE_SHIFT; >>> >>> You are right. It needs to be that. Otherwise 8191 fails the same way. >>> >>>> And be sure to test with TCP as well. >>> >>> Sigh. It breaks normal (non-RDMA) mounts. I'll figure out why. >> >> OK. >> >> Remember that a Read chunk's length does not have to be >> rounded up. Maybe the transport needs to round up the >> length of the unmarshaled data content on behalf of the >> NFSv4 write decoder. >> > > The problem of simply taking write->wr_buflen was that len before that > could have been adjusted by avail value in then non-RDAM mounts. > > Again, I'm not sure if this is the right fix. But this one works for > both non-RDMA and RDMA mounts. > > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c > index 2c61c6b..3178997 100644 > --- a/fs/nfsd/nfs4xdr.c > +++ b/fs/nfsd/nfs4xdr.c > @@ -1293,7 +1293,10 @@ static __be32 nfsd4_decode_opaque(struct > nfsd4_compoundargs *argp, struct xdr_ne > > len -= avail; > > - pages = len >> PAGE_SHIFT; > + if (!avail) > + pages = write->wr_buflen >> PAGE_SHIFT; > + else > + pages = len >> PAGE_SHIFT; > argp->pagelist += pages; > argp->pagelen -= pages * PAGE_SIZE; > len -= pages * PAGE_SIZE; This code has been around since 2012. Has it always been broken for RDMA? I suspect the root problem is in the transport, not here in the decoder. Can you bisect to find out where the problem started to occur? >>>>> If this looks OK, I can send a patch. >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >>>> -- >>>> Chuck Lever >>>> >>>> >>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> -- >> Chuck Lever >> >> >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html