Re: problem on nfsd doing RDMA write

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Jan 26, 2018, at 12:13 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
> 
> On Fri, Jan 26, 2018 at 2:29 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>> 
>> 
>>> On Jan 26, 2018, at 11:05 AM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>> 
>>> On Fri, Jan 26, 2018 at 1:23 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>>>> 
>>>> 
>>>>> On Jan 26, 2018, at 9:24 AM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>>> 
>>>>> Hi Bruce/Chuck,
>>>>> 
>>>>> There is a problem with nfsd doing a WRITE of size 4093,4094,4095. To
>>>>> reproduce, mount with RDMA and do a simple dd "dd if=/dev/zero
>>>>> of=/mnt/testfile bs=4093 count=1". What happens is that nfsd fails to
>>>>> parse GETATTR after the WRITE in the compound. It fails the operation
>>>>> with ERR_OP_ILLEGAL.
>>>>> 
>>>>> The problem happens for the values where XDR round up ends up rounding
>>>>> up to the page size. I don't know if my fix is the appropriate way to
>>>>> fix this but with it I don't get the error:
>>>>> 
>>>>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>>>>> index 2c61c6b..a8489c3 100644
>>>>> --- a/fs/nfsd/nfs4xdr.c
>>>>> +++ b/fs/nfsd/nfs4xdr.c
>>>>> @@ -1289,11 +1289,12 @@ static __be32 nfsd4_decode_opaque(struct nfsd4_compounda
>>>>> 
>>>>>      len = XDR_QUADLEN(write->wr_buflen) << 2;
>>>>>      if (len >= avail) {
>>>>> -               int pages;
>>>>> +               int pages = 0;
>>>>> 
>>>>>              len -= avail;
>>>>> 
>>>>> -               pages = len >> PAGE_SHIFT;
>>>>> +               if (write->wr_buflen >= PAGE_SIZE)
>>>>> +                       pages = len >> PAGE_SHIFT;
>>>>>              argp->pagelist += pages;
>>>>>              argp->pagelen -= pages * PAGE_SIZE;
>>>>>              len -= pages * PAGE_SIZE;
>>>>> 
>>>>> So the problem is the using "len" instead of "write->wr_buflen" leads
>>>>> for the values 4093,4094,4095 that are rounded up to 4096, it makes
>>>>> pages=1 and the argp->pagelen ends up being a negative value leading
>>>>> to problems of parsing the GETATTR.
>>>> 
>>>> Would this also be a problem near any page boundary, say, a
>>>> write length of 8191 bytes?
>>>> 
>>>> Instead of using the rounded up "len", why not try this:
>>>> 
>>>> -               pages = len >> PAGE_SHIFT;
>>>> +               pages = write->wr_buflen >> PAGE_SHIFT;
>>> 
>>> You are right. It needs to be that. Otherwise 8191 fails the same way.
>>> 
>>>> And be sure to test with TCP as well.
>>> 
>>> Sigh. It breaks normal (non-RDMA) mounts. I'll figure out why.
>> 
>> OK.
>> 
>> Remember that a Read chunk's length does not have to be
>> rounded up. Maybe the transport needs to round up the
>> length of the unmarshaled data content on behalf of the
>> NFSv4 write decoder.
>> 
> 
> The problem of simply taking write->wr_buflen was that len before that
> could have been adjusted by avail value in then non-RDAM mounts.
> 
> Again, I'm not sure if this is the right fix. But this one works for
> both non-RDMA and RDMA mounts.
> 
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 2c61c6b..3178997 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -1293,7 +1293,10 @@ static __be32 nfsd4_decode_opaque(struct
> nfsd4_compoundargs *argp, struct xdr_ne
> 
>                len -= avail;
> 
> -               pages = len >> PAGE_SHIFT;
> +               if (!avail)
> +                       pages = write->wr_buflen >> PAGE_SHIFT;
> +               else
> +                       pages = len >> PAGE_SHIFT;
>                argp->pagelist += pages;
>                argp->pagelen -= pages * PAGE_SIZE;
>                len -= pages * PAGE_SIZE;

This code has been around since 2012. Has it always been
broken for RDMA? I suspect the root problem is in the
transport, not here in the decoder. Can you bisect to
find out where the problem started to occur?


>>>>> If this looks OK, I can send a patch.
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>> 
>>>> --
>>>> Chuck Lever
>>>> 
>>>> 
>>>> 
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux