----- Original Message -----
> From: "Max Gurtovoy" <maxg@xxxxxxxxxxxx>
> To: "Laurence Oberman" <loberman@xxxxxxxxxx>
> Cc: "Leon Romanovsky" <leon@xxxxxxxxxx>, "Bart Van Assche" <Bart.VanAssche@xxxxxxxxxxx>, hch@xxxxxx,
> israelr@xxxxxxxxxxxx, linux-rdma@xxxxxxxxxxxxxxx, dledford@xxxxxxxxxx
> Sent: Tuesday, February 14, 2017 12:15:20 PM
> Subject: Re: v4.10-rc SRP + mlx5 regression
>
>
>
> On 2/14/2017 3:31 PM, Laurence Oberman wrote:
> >
> >
> > ----- Original Message -----
> >> From: "Max Gurtovoy" <maxg@xxxxxxxxxxxx>
> >> To: "Leon Romanovsky" <leon@xxxxxxxxxx>, "Laurence Oberman"
> >> <loberman@xxxxxxxxxx>
> >> Cc: "Bart Van Assche" <Bart.VanAssche@xxxxxxxxxxx>, hch@xxxxxx,
> >> israelr@xxxxxxxxxxxx, linux-rdma@xxxxxxxxxxxxxxx,
> >> dledford@xxxxxxxxxx
> >> Sent: Tuesday, February 14, 2017 5:00:04 AM
> >> Subject: Re: v4.10-rc SRP + mlx5 regression
> >>
> >> Hi Laurence,
> >> can you specify the test that repro these failures ?
> >> have you tried running with CX5 HCA or only CX4 ?
> >> I think this commit is right and we have issues in other places.
> >>
> >>
> >> On 2/14/2017 8:39 AM, Leon Romanovsky wrote:
> >>> On Mon, Feb 13, 2017 at 09:19:54PM -0500, Laurence Oberman wrote:
> >>>>
> >>>>
> >>>> ----- Original Message -----
> >>>>> From: "Bart Van Assche" <Bart.VanAssche@xxxxxxxxxxx>
> >>>>> To: leon@xxxxxxxxxx, loberman@xxxxxxxxxx
> >>>>> Cc: hch@xxxxxx, maxg@xxxxxxxxxxxx, israelr@xxxxxxxxxxxx,
> >>>>> linux-rdma@xxxxxxxxxxxxxxx, dledford@xxxxxxxxxx
> >>>>> Sent: Monday, February 13, 2017 4:52:28 PM
> >>>>> Subject: Re: v4.10-rc SRP + mlx5 regression
> >>>>>
> >>>>> On Mon, 2017-02-13 at 16:46 -0500, Laurence Oberman wrote:
> >>>>>> I will have to run through this again and see where the bisect went
> >>>>>> wrong.
> >>>>>
> >>>>> Hello Laurence,
> >>>>>
> >>>>> If you would be considering to repeat the bisect, did you know that a
> >>>>> bisect
> >>>>> can be sped up by specifying the names of the files and/or directories
> >>>>> that
> >>>>> are suspected? An example:
> >>>>>
> >>>>> git bisect start */infiniband */net
> >>>>>
> >>>>> Bart.--
> >>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> >>>>> in
> >>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
> >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >>>>>
> >>>>
> >>>> Hello Bart,
> >>>>
> >>>> Much better news this time :), worked late on this but got it figured
> >>>> out.
> >>>>
> >>>> OK, so we got to this one, which makes a lot more sense and is right in
> >>>> the area where we are having issues.
> >>>> I must have answered wrong to one of the steps the first time I did the
> >>>> bisect.
> >>>>
> >>>> Reverted this in the master tree of rc8 and rebuilt the kernel
> >>>> Now all tests pass on Linus's tree - 4.10.0_rc8+
> >>>>
> >>>> The interesting point here is that this commit is in rc5 but rc5 was not
> >>>> failing so we have an interoperability issue with this commit
> >>>>
> >>>>
> >>>> [loberman@ibclient linux]$ git bisect good
> >>>> Bisecting: 0 revisions left to test after this (roughly 1 step)
> >>>> [ad8e66b4a80182174f73487ed25fd2140cf43361] IB/srp: fix mr allocation
> >>>> when
> >>>> the device supports sg gaps
> >>>>
> >>>> [loberman@ibclient linux]$ git show
> >>>> ad8e66b4a80182174f73487ed25fd2140cf43361
> >>>> commit ad8e66b4a80182174f73487ed25fd2140cf43361
> >>>> Author: Israel Rukshin <israelr@xxxxxxxxxxxx>
> >>>> Date: Wed Dec 28 12:48:28 2016 +0200
> >>>>
> >>>>     IB/srp: fix mr allocation when the device supports sg gaps
> >>>>
> >>>>     If the device support arbitrary sg list mapping (device cap
> >>>>     IB_DEVICE_SG_GAPS_REG set) we allocate the memory regions with
> >>>>     IB_MR_TYPE_SG_GAPS.
> >>>>
> >>>>     Fixes: 509c5f33f4f6 ("IB/srp: Prevent mapping failures")
> >>>>     Cc: <stable@xxxxxxxxxxxxxxx> # 4.7+
> >>>>     Signed-off-by: Israel Rukshin <israelr@xxxxxxxxxxxx>
> >>>>     Signed-off-by: Max Gurtovoy <maxg@xxxxxxxxxxxx>
> >>>>     Reviewed-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> >>>>     Reviewed-by: Mark Bloch <markb@xxxxxxxxxxxx>
> >>>>     Reviewed-by: Yuval Shaia <yuval.shaia@xxxxxxxxxx>
> >>>>     Reviewed-by: Bart Van Assche <bart.vanassche@xxxxxxxxxxx>
> >>>>     Signed-off-by: Doug Ledford <dledford@xxxxxxxxxx>
> >>>>
> >>>> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c
> >>>> b/drivers/infiniband/ulp/srp/ib_srp.c
> >>>> index 8ddc071..0f67cf9 100644
> >>>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> >>>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> >>>> @@ -371,6 +371,7 @@ static struct srp_fr_pool *srp_create_fr_pool(struct
> >>>> ib_device *device,
> >>>>      struct srp_fr_desc *d;
> >>>>      struct ib_mr *mr;
> >>>>      int i, ret = -EINVAL;
> >>>> +    enum ib_mr_type mr_type;
> >>>>
> >>>>      if (pool_size <= 0)
> >>>>          goto err;
> >>>> @@ -384,9 +385,13 @@ static struct srp_fr_pool
> >>>> *srp_create_fr_pool(struct
> >>>> ib_device *device,
> >>>>      spin_lock_init(&pool->lock);
> >>>>      INIT_LIST_HEAD(&pool->free_list);
> >>>>
> >>>> +    if (device->attrs.device_cap_flags & IB_DEVICE_SG_GAPS_REG)
> >>>> +        mr_type = IB_MR_TYPE_SG_GAPS;
> >>>> +    else
> >>>> +        mr_type = IB_MR_TYPE_MEM_REG;
> >>>> +
> >>>>      for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
> >>>> -        mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
> >>>> -                 max_page_list_len);
> >>>> +        mr = ib_alloc_mr(pd, mr_type, max_page_list_len);
> >>>
> >>> First, ib_alloc_mr receives u32 as a third parameter, but int was
> >>> supplied. Second (I can be wrong here), shouldn't max_page_list_len be
> >>> replaced with max_fast_reg_page_list_len?
> >>>
> >>> Thanks
> >>
> >> there is a statement that:
> >>
> >> if (srp_dev->use_fast_reg) {
> >>     srp_dev->max_pages_per_mr =
> >>         min_t(u32, srp_dev->max_pages_per_mr,
> >>               attr->max_fast_reg_page_list_len);
> >> }
> >>
> >> so we take the max_fast_reg_page_list_len in this case.
> >>
> >>>
> >>>>          if (IS_ERR(mr)) {
> >>>>              ret = PTR_ERR(mr);
> >>>>              if (ret == -ENOMEM)
> >>>> (END)
> >>>>
> >>>>
> >>>> So here is the revert patch, but you need to decide how you want to deal
> >>>> with this.
> >>>>
> >>>> Revert "IB/srp: fix mr allocation when the device supports sg gaps"
> >>>> Laurence Oberman
> >>>> Traced after bisection to a cause for this failure
> >>>>
> >>>> Tested-by: Laurence Oberman <loberman@xxxxxxxxxx>
> >>>> Signed-off-by: Laurence Oberman <loberman@xxxxxxxxxx>
> >>>>
> >>>> commit 90d169d312a173d5350c1bb36d6daab04c592127
> >>>> Author: Laurence Oberman <loberman@xxxxxxxxxx>
> >>>> Date: Mon Feb 13 20:33:32 2017 -0500
> >>>>
> >>>>     Revert "IB/srp: fix mr allocation when the device supports sg gaps"
> >>>>     Laurence Oberman
> >>>>     Traced after bisection to a cause for this failure
> >>>>
> >>>> [ 130.437603] mlx5_0:dump_cqe:262:(pid 3812): dump error cqe
> >>>> [ 130.437682] scsi host1: ib_srp: failed RECV status WR flushed (5)
> >>>> for CQE ffff8817f0edbfb0
> >>>> [ 130.510899] 00000000 00000000 00000000 00000000
> >>>> [ 130.536455] 00000000 00000000 00000000 00000000
> >>>> [ 130.561878] 00000000 00000000 00000000 00000000
> >>>> [ 130.585904] 00000000 0f007806 2500002a db0ec4d0
> >>>> [ 145.842925] fast_io_fail_tmo expired for SRP port-1:1 / host1.
> >>>> [ 146.530439] scsi host1: ib_srp: reconnect succeeded
> >>>> [ 146.566629] mlx5_0:dump_cqe:262:(pid 3293): dump error cqe
> >>>> [ 146.597635] 00000000 00000000 00000000 00000000
> >>>> [ 146.623545] 00000000 00000000 00000000 00000000
> >>>> [ 146.649599] 00000000 00000000 00000000 00000000
> >>>> [ 146.673938] 00000000 0f007806 25000032 000c46d0
> >>>> [ 146.697969] scsi host1: ib_srp: failed FAST REG status memory
> >>>> management operation error (6) for CQE ffff88
> >>>> [ 162.225247] fast_io_fail_tmo expired for SRP port-1:1 / host1.
> >>>> [ 162.256337] scsi host1: ib_srp: reconnect succeeded
> >>>> [ 162.293396] scsi host1: ib_srp: failed RECV status WR flushed (5)
> >>>> for CQE ffff8817f0412ef0
> >>>>
> >>>> This reverts commit ad8e66b4a80182174f73487ed25fd2140cf43361.
> >>>>
> >>>> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c
> >>>> b/drivers/infiniband/ulp/srp/ib_srp.c
> >>>> index 79bf484..01338c8 100644
> >>>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> >>>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> >>>> @@ -371,7 +371,6 @@ static struct srp_fr_pool *srp_create_fr_pool(struct
> >>>> ib_device *device,
> >>>>      struct srp_fr_desc *d;
> >>>>      struct ib_mr *mr;
> >>>>      int i, ret = -EINVAL;
> >>>> -    enum ib_mr_type mr_type;
> >>>>
> >>>>      if (pool_size <= 0)
> >>>>          goto err;
> >>>> @@ -385,13 +384,9 @@ static struct srp_fr_pool
> >>>> *srp_create_fr_pool(struct
> >>>> ib_device *device,
> >>>>      spin_lock_init(&pool->lock);
> >>>>      INIT_LIST_HEAD(&pool->free_list);
> >>>>
> >>>> -    if (device->attrs.device_cap_flags & IB_DEVICE_SG_GAPS_REG)
> >>>> -        mr_type = IB_MR_TYPE_SG_GAPS;
> >>>> -    else
> >>>> -        mr_type = IB_MR_TYPE_MEM_REG;
> >>>> -
> >>>>      for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
> >>>> -        mr = ib_alloc_mr(pd, mr_type, max_page_list_len);
> >>>> +        mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
> >>>> +                 max_page_list_len);
> >>>>          if (IS_ERR(mr)) {
> >>>>              ret = PTR_ERR(mr);
> >>>>              if (ret == -ENOMEM)
> >>>>
> >>>>
> >>>>
> >>>> Now moving on to what got me here in the first place.
> >>>> Bart, let me know if the 7 of the 8 patches in your most recent series
> >>>> are
> >>>> all still valid after this revert
> >>>> Otherwise let me know which ones you want me to apply.
> >>>>
> >>>> patch 6 - I am thinking is no longer valid.
> >>>> "
> >>>> If a HCA supports the SG_GAPS_REG feature then a single memory
> >>>> region of type IB_MR_TYPE_SG_GAPS is sufficient. This patch
> >>>> reduces the number of memory regions that is allocated per SRP
> >>>> session.
> >>>> "
> >>>>
> >>>> Thanks
> >>>> Laurence
> >>
> > Hello Max,
> >
> > I only have CX4 and CX3 in my lab, this test bed only has CX4.
> >
> > CA 'mlx5_0'
> >     CA type: MT4115
> >     Number of ports: 1
> >     Firmware version: 12.14.2036
> >     Hardware version: 0
> >     Node GUID: 0x7cfe900300726ed2
> >     System image GUID: 0x7cfe900300726ed2
> >     Port 1:
> >         State: Active
> >         Physical state: LinkUp
> >         Rate: 100
> >         Base lid: 3
> >         LMC: 0
> >         SM lid: 3
> >         Capability mask: 0x2651e84a
> >         Port GUID: 0x7cfe900300726ed2
> >         Link layer: InfiniBand
> >
> > The test is simple, it's the same one I start with every time because it
> > always
> > brings out issues with mapping for large I/O sizes and mem registration if
> > such issues exist.
> >
> > I have a server running LIO with memory backed LUNS.
> > These are served via a dual port mlx5 (CX4) over ib_srpt
> >
> > The client mounts these LUNS via ib_srp (mlx5) and device-mapper-multipath
> > and I run a simple dd on the XFS file system.
> >
> > #!/bin/bash
> > while true
> > do
> >     dd if=/dev/zero of=/data-$1/bigfile bs=4096k count=900
> >     sync;
> >     rm -rf /data-$1/bigfile
> > done
> >
> > Once this passes I run a suite of other tests read/write, direct and
> > buffered.
>
> Laurence,
> this is 4MB transactions. can you increase the cmd_sg_entries to the
> maximum and run the test again ?
>
> >
> > Thanks
> > Laurence
>

Hello Max,

Yes 4MB is very important for one of our biggest RHEL customers and I worked
many hours with Bart last year to stabilize large 4MB buffered and direct I/O
for ib_srp/ib_srpt.

I am already running with:

options ib_srp cmd_sg_entries=255 indirect_sg_entries=2048

Regards and thanks for your assistance
Laurence
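
For anyone reproducing this, here is a small sketch of the sizes the dd loop
above implies, plus a way to read back the ib_srp options on a running system.
Two assumptions that are not stated in the thread: a 4 KiB page size, and that
ib_srp exposes its parameters under /sys/module/ib_srp/parameters/. The
function names are only illustrative.

```shell
#!/bin/bash
# Sketch only: sizes implied by the dd test (bs=4096k count=900).
# PAGE_KIB=4 is an assumption (4 KiB pages), not taken from the thread.

BS_KIB=4096      # dd block size in KiB (bs=4096k)
COUNT=900        # dd block count (count=900)
PAGE_KIB=4       # assumed page size in KiB

# Total data written by one dd pass, in MiB.
total_mib() {
    echo $(( BS_KIB * COUNT / 1024 ))
}

# Number of pages covered by a single request of BS_KIB.
pages_per_request() {
    echo $(( BS_KIB / PAGE_KIB ))
}

# Print an ib_srp module parameter if it is readable in sysfs.
# Assumes the parameters live under /sys/module/ib_srp/parameters/;
# prints "<name>=n/a" when the file is absent (e.g. module not loaded).
show_param() {
    local f="/sys/module/ib_srp/parameters/$1"
    if [ -r "$f" ]; then
        echo "$1=$(cat "$f")"
    else
        echo "$1=n/a"
    fi
}

echo "data per dd pass: $(total_mib) MiB"
echo "pages per request: $(pages_per_request)"
show_param cmd_sg_entries
show_param indirect_sg_entries
```

On a host without ib_srp loaded the last two lines simply report n/a, so the
script is safe to run anywhere before comparing against the options line above.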