Actually, this was a co-incidence, which means that buf points to more or less a random place. Tigran. ----- Original Message ----- > From: "Tigran Mkrtchyan" <tigran.mkrtchyan@xxxxxxx> > To: "trondmy" <trondmy@xxxxxxxxxxxxxxx> > Cc: "linux-nfs" <linux-nfs@xxxxxxxxxxxxxxx>, "Anna Schumaker" <anna.schumaker@xxxxxxxxxx> > Sent: Tuesday, 1 December, 2020 11:59:22 > Subject: Re: [PATCH v3 00/11] Add RDMA support to the pNFS file+flexfiles data channels > More investigation... > > After xdr_read_pages call the xdr stream points to the > beginning of RPC package and return the xid value on > the next read (which should be the size of the notification bitmap). > > > Regards, > Tigran. > > ----- Original Message ----- >> From: "Tigran Mkrtchyan" <tigran.mkrtchyan@xxxxxxx> >> To: "trondmy" <trondmy@xxxxxxxxxxxxxxx> >> Cc: "linux-nfs" <linux-nfs@xxxxxxxxxxxxxxx>, "Anna Schumaker" >> <anna.schumaker@xxxxxxxxxx> >> Sent: Thursday, 26 November, 2020 18:17:41 >> Subject: Re: [PATCH v3 00/11] Add RDMA support to the pNFS file+flexfiles data >> channels > >> I have added some debug info Ind got this: >> >> decode_getdeviceinfo: layout type 4 >> decode_getdeviceinfo: layout size 64 >> decode_getdeviceinfo: layout notification bitmap size 1094994757 >> >> So it looks like that xdr_read_pages set xdr pointer to a wrong position >> and next read, which is bitmap size >> >> 5857 pdev->mincount = be32_to_cpup(p); >> 5858 dprintk("%s: layout size %u\n", __func__, pdev->mincount); >> 5859 if (xdr_read_pages(xdr, pdev->mincount) != pdev->mincount) >> 5860 return -EIO; >> 5861 >> 5862 /* Parse notification bitmap, verifying that it is zero. */ >> 5863 p = xdr_inline_decode(xdr, 4); >> 5864 if (unlikely(!p)) >> 5865 return -EIO; >> 5866 len = be32_to_cpup(p); >> 5867 dprintk("%s: layout notification bitmap size %u\n", __func__, len); >> 5868 if (len) { >> 5869 uint32_t i; >> >> >> Tigran. >> >> >> ----- Original Message ----- >>> From: "Tigran Mkrtchyan" <tigran.mkrtchyan@xxxxxxx> >>> To: "trondmy" <trondmy@xxxxxxxxxxxxxxx> >>> Cc: "linux-nfs" <linux-nfs@xxxxxxxxxxxxxxx>, "Anna Schumaker" >>> <anna.schumaker@xxxxxxxxxx> >>> Sent: Tuesday, 17 November, 2020 15:50:57 >>> Subject: Re: [PATCH v3 00/11] Add RDMA support to the pNFS file+flexfiles data >>> channels >> >>> Here is the result: >>> >>> >>> $ git bisect bad >>> c567552612ece787b178e3b147b5854ad422a836 is the first bad commit >>> commit c567552612ece787b178e3b147b5854ad422a836 >>> Author: Anna Schumaker <Anna.Schumaker@xxxxxxxxxx> >>> Date: Wed May 28 13:41:22 2014 -0400 >>> >>> NFS: Add READ_PLUS data segment support >>> >>> This patch adds client support for decoding a single NFS4_CONTENT_DATA >>> segment returned by the server. This is the simplest implementation >>> possible, since it does not account for any hole segments in the reply. >>> >>> Signed-off-by: Anna Schumaker <Anna.Schumaker@xxxxxxxxxx> >>> >>> fs/nfs/nfs42xdr.c | 141 ++++++++++++++++++++++++++++++++++++++++++++++ >>> fs/nfs/nfs4client.c | 2 + >>> fs/nfs/nfs4proc.c | 43 +++++++++++++- >>> fs/nfs/nfs4xdr.c | 1 + >>> include/linux/nfs4.h | 2 +- >>> include/linux/nfs_fs_sb.h | 1 + >>> include/linux/nfs_xdr.h | 2 +- >>> 7 files changed, 187 insertions(+), 5 deletions(-) >>> >>> >>> Regards, >>> Tigran. >>> >>> >>> >>> ----- Original Message ----- >>>> From: "Tigran Mkrtchyan" <tigran.mkrtchyan@xxxxxxx> >>>> To: "trondmy" <trondmy@xxxxxxxxxxxxxxx> >>>> Cc: "linux-nfs" <linux-nfs@xxxxxxxxxxxxxxx> >>>> Sent: Monday, 16 November, 2020 21:55:50 >>>> Subject: Re: [PATCH v3 00/11] Add RDMA support to the pNFS file+flexfiles data >>>> channels >>> >>>> Hi Trond, >>>> >>>> I am afraid, that the fix didn't work. I bisecting it.... >>>> >>>> >>>> Tigran. >>>> >>>> >>>> ----- Original Message ----- >>>>> From: "trondmy" <trondmy@xxxxxxxxxxxxxxx> >>>>> To: "Tigran Mkrtchyan" <tigran.mkrtchyan@xxxxxxx> >>>>> Cc: "linux-nfs" <linux-nfs@xxxxxxxxxxxxxxx> >>>>> Sent: Saturday, 14 November, 2020 15:29:01 >>>>> Subject: Re: [PATCH v3 00/11] Add RDMA support to the pNFS file+flexfiles data >>>>> channels >>>> >>>>> On Sat, 2020-11-14 at 00:46 +0100, Mkrtchyan, Tigran wrote: >>>>>> >>>>>> >>>>>> ----- Original Message ----- >>>>>> > From: "trondmy" <trondmy@xxxxxxxxxxxxxxx> >>>>>> > To: "Tigran Mkrtchyan" <tigran.mkrtchyan@xxxxxxx> >>>>>> > Cc: "linux-nfs" <linux-nfs@xxxxxxxxxxxxxxx> >>>>>> > Sent: Friday, 13 November, 2020 23:45:00 >>>>>> > Subject: Re: [PATCH v3 00/11] Add RDMA support to the pNFS >>>>>> > file+flexfiles data channels >>>>>> >>>>>> > On Fri, 2020-11-13 at 22:30 +0100, Mkrtchyan, Tigran wrote: >>>>>> > > >>>>>> > > After more testing, it looks like that client doesn't like >>>>>> > > notification bitmap: >>>>>> > > >>>>>> > > >>>>>> > > [31576.789492] --> _nfs4_proc_getdeviceinfo >>>>>> > > [31576.789503] --> nfs41_call_sync_prepare data->seq_server >>>>>> > > 000000001d17c43e >>>>>> > > [31576.789507] --> nfs4_alloc_slot used_slots=0000 >>>>>> > > highest_used=4294967295 max_slots=16 >>>>>> > > [31576.789510] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 >>>>>> > > slotid=0 >>>>>> > > [31576.789527] encode_sequence: >>>>>> > > sessionid=2910695007:150995712:0:16777216 seqid=92462 slotid=0 >>>>>> > > max_slotid=0 cache_this=0 >>>>>> > > [31576.789991] decode_getdeviceinfo: unsupported notification >>>>>> > >>>>>> > According to this, you appear to be returning a deviceinfo bitmap >>>>>> > with >>>>>> > at least one non-zero entry that is not in the first 32-bit word. >>>>>> > We >>>>>> > only ask for notifications for NOTIFY_DEVICEID4_CHANGE and >>>>>> > NOTIFY_DEVICEID4_DELETE, so we only expect bitmap[0] to have non- >>>>>> > zero >>>>>> > entries. >>>>>> >>>>>> >>>>>> according to packet capture only bitmap[0] has non zero bits set. >>>>>> This is the reply of compound starting from nfs staus code, tag >>>>>> length and so on. >>>>>> >>>>>> >>>>>> 0000 00 00 00 00 00 00 00 00 00 00 00 02 00 00 00 35 >>>>>> 0010 00 00 00 00 5f ae 7d ad 00 03 00 09 00 00 00 00 >>>>>> 0020 00 00 00 01 00 00 00 4c 00 00 00 00 00 00 00 0f >>>>>> 0030 00 00 00 0f 00 00 00 00 00 00 00 2f 00 00 00 00 >>>>>> 0040 00 00 00 04 00 00 00 40 00 00 00 01 00 00 00 03 >>>>>> 0050 74 63 70 00 00 00 00 16 31 33 31 2e 31 36 39 2e >>>>>> 0060 31 39 31 2e 31 34 33 2e 31 32 35 2e 34 39 00 00 >>>>>> 0070 00 00 00 01 00 00 00 04 00 00 00 01 00 10 00 00 >>>>>> 0080 00 10 00 00 00 00 00 01 00 00 00 02 00 00 00 06 >>>>>> 0090 00 00 00 00 >>>>>> >>>>>> >>>>>> the last 12 bytes : bitmap size, bitmap[0], bitmap[1] >>>>>> >>>>>> >>>>>> This part of code in the didn't change since 2010, and I >>>>>> have no issues to use 5.8 kernel. I am pretty sure, that >>>>>> tests with 5.9 did pass as expected. I will try to bisec it. >>>>> >>>>> I don't think I've introduced this bug. I did not touch anything in the >>>>> getdeviceinfo proc or XDR code. >>>>> Does the following patch help? >>>>> >>>>> 8<------------------------------------------------------- >>>>> From e92b2d4e39e91d379ec1147115820ab5dfe4c89a Mon Sep 17 00:00:00 2001 >>>>> From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> >>>>> Date: Fri, 13 Nov 2020 21:42:16 -0500 >>>>> Subject: [PATCH] NFSv4: Fix the alignment of page data in the getdeviceinfo >>>>> reply >>>>> >>>>> We can fit the device_addr4 opaque data padding in the pages. >>>>> >>>>> Fixes: cf500bac8fd4 ("SUNRPC: Introduce rpc_prepare_reply_pages()") >>>>> Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> >>>>> --- >>>>> fs/nfs/nfs4xdr.c | 14 ++++++++++---- >>>>> 1 file changed, 10 insertions(+), 4 deletions(-) >>>>> >>>>> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c >>>>> index c6dbfcae7517..c8714381d511 100644 >>>>> --- a/fs/nfs/nfs4xdr.c >>>>> +++ b/fs/nfs/nfs4xdr.c >>>>> @@ -3009,15 +3009,19 @@ static void nfs4_xdr_enc_getdeviceinfo(struct rpc_rqst >>>>> *req, >>>>> struct compound_hdr hdr = { >>>>> .minorversion = nfs4_xdr_minorversion(&args->seq_args), >>>>> }; >>>>> + uint32_t replen; >>>>> >>>>> encode_compound_hdr(xdr, req, &hdr); >>>>> encode_sequence(xdr, &args->seq_args, &hdr); >>>>> + >>>>> + replen = hdr.replen + op_decode_hdr_maxsz; >>>>> + >>>>> encode_getdeviceinfo(xdr, args, &hdr); >>>>> >>>>> - /* set up reply kvec. Subtract notification bitmap max size (2) >>>>> - * so that notification bitmap is put in xdr_buf tail */ >>>>> + /* set up reply kvec. device_addr4 opaque data is read into the >>>>> + * pages */ >>>>> rpc_prepare_reply_pages(req, args->pdev->pages, args->pdev->pgbase, >>>>> - args->pdev->pglen, hdr.replen - 2); >>>>> + args->pdev->pglen, replen + 2); >>>>> encode_nops(&hdr); >>>>> } >>>>> >>>>> @@ -5848,7 +5852,9 @@ static int decode_getdeviceinfo(struct xdr_stream *xdr, >>>>> * and places the remaining xdr data in xdr_buf->tail >>>>> */ >>>>> pdev->mincount = be32_to_cpup(p); >>>>> - if (xdr_read_pages(xdr, pdev->mincount) != pdev->mincount) >>>>> + /* Calculate padding */ >>>>> + len = xdr_align_size(pdev->mincount); >>>>> + if (xdr_read_pages(xdr, len) != len) >>>>> return -EIO; >>>>> >>>>> /* Parse notification bitmap, verifying that it is zero. */ >>>>> -- >>>>> 2.28.0 >>>>> >>>>> >>>>> >>>>> -- >>>>> Trond Myklebust >>>>> Linux NFS client maintainer, Hammerspace > > > > > trond.myklebust@xxxxxxxxxxxxxxx