On 8/25/21 3:58 PM, Bart Van Assche wrote:
> On 8/25/21 11:22 AM, Bart Van Assche wrote:
>> On 8/25/21 9:32 AM, Jason Gunthorpe wrote:
>>> On Wed, Aug 25, 2021 at 11:02:14AM +0800, Zhu Yanjun wrote:
>>>> On Tue, Aug 24, 2021 at 11:02 AM Bart Van Assche <bvanassche@xxxxxxx> wrote:
>>>>>
>>>>> Hi Bob,
>>>>>
>>>>> If I run the following test against Linus' master branch then that test
>>>>> passes (commit d5ae8d7f85b7 ("Revert "media: dvb header files: move some
>>>>> headers to staging"")):
>>>>>
>>>>> # export use_siw=1 && modprobe brd && (cd blktests && ./check -q srp/002)
>>>>> srp/002 (File I/O on top of multipath concurrently with logout and login (mq)) [passed]
>>>>>     runtime  ...  48.849s
>>>>>
>>>>> The following test fails:
>>>>>
>>>>> # export use_siw= && modprobe brd && (cd blktests && ./check -q srp/002)
>>>>> srp/002 (File I/O on top of multipath concurrently with logout and login (mq)) [failed]
>>>>>     runtime  48.849s  ...  15.024s
>>>>>     +++ /home/bart/software/blktests/results/nodev/srp/002.out.bad 2021-08-23 19:51:05.182958728 -0700
>>>>>     @@ -1,2 +1 @@
>>>>>      Configured SRP target driver
>>>>>     -Passed
>>>>
>>>> Can this commit "RDMA/rxe: Zero out index member of struct rxe_queue"
>>>> in the link https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?h=wip/jgg-for-rc
>>>> fix this problem?
>>>>
>>>> And the commit will be merged into linux upstream very soon.
>>>
>>> Please let me know Bart, if the rxe driver is still broken I will
>>> definitely punt all the changes for RXE to the next cycle until it can
>>> be fixed.
>>
>> Hi Jason,
>>
>> Thanks for having offered to revert the RXE changes from this merge window.
>> Unfortunately that wouldn't be sufficient. My test results so far for test
>> srp/002 in combination with the rdma_rxe driver are as follows:
>> * Kernel v5.12: test passes.
>> * Kernel v5.13: test fails.
>> * Kernel v5.14-rc7: test fails.
>>
>> For the rdma_rxe tests for kernel v5.14-rc7 I found the following in the kernel
>> log:
>>
>> ib_srp:add_target_store: ib_srp: max_sectors = 1024; max_pages_per_mr = 512; mr_page_size = 4096; max_sectors_per_mr = 4096; mr_per_cmd = 2
>> ib_srp: enp1s0_rxe: ib_alloc_mr() failed. Try to reduce max_cmd_per_lun, max_sect or ch_count
>>
>> There is sufficient memory available in the VM in which I ran the tests. It is
>> not clear to me why ib_alloc_mr() fails with these parameters when using the
>> rdma_rxe driver? As one can see in srp_alloc_fr_pool() the SRP initiator driver
>> respects the max_pages_per_mr RDMA driver limit.
>
> A correction: test srp/002 passes on my setup against kernel v5.13. I probably
> selected the wrong kernel from the GRUB boot menu before I sent my previous email.
> So the test failure is something that happens with v5.14-rc but not with v5.13.
>
> Applying the following patch on top Linus' master branch did not help:
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h
> index 742e6ec93686..643b80e47c82 100644
> --- a/drivers/infiniband/sw/rxe/rxe_param.h
> +++ b/drivers/infiniband/sw/rxe/rxe_param.h
> @@ -88,7 +88,7 @@ enum rxe_device_param {
>         RXE_MIN_SRQ_INDEX               = 0x00020001,
>         RXE_MAX_SRQ_INDEX               = 0x00040000,
>
> -       RXE_MAX_MR                      = 0x00001000,
> +       RXE_MAX_MR                      = 0x00100000,
>         RXE_MAX_MW                      = 0x00001000,
>         RXE_MIN_MR_INDEX                = 0x00000001,
>         RXE_MAX_MR_INDEX                = 0x00010000,
>
> Bart.

Bart,

Are you seeing the ib_alloc_mr() failure in 5.14? I thought that was just a
5.13 thing. I am still not seeing that error in my test setup. I am getting a
soft lockup error after ~20 seconds. During most of that there is a constant
exchange of req/ack packets with nothing else happening.

If you want I can send you a patch to print out error messages from MR
allocation.

Bob
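
The numbers in the quoted ib_srp log line and in the RXE_MAX_MR patch can be cross-checked with a few lines of Python. This is only a sketch: it assumes 512-byte sectors, the helper name parse_params is hypothetical, and it says nothing about why ib_alloc_mr() fails or how mr_per_cmd is derived.

```python
import re

# Log line copied from the quoted kernel log above.
LOG_LINE = ("ib_srp:add_target_store: ib_srp: max_sectors = 1024; "
            "max_pages_per_mr = 512; mr_page_size = 4096; "
            "max_sectors_per_mr = 4096; mr_per_cmd = 2")

SECTOR_SIZE = 512  # assumption: 512-byte sectors


def parse_params(line):
    """Hypothetical helper: collect every 'name = number' pair in the line."""
    return {name: int(value) for name, value in re.findall(r"(\w+) = (\d+)", line)}


params = parse_params(LOG_LINE)

# One MR covers max_pages_per_mr pages of mr_page_size bytes each.
bytes_per_mr = params["max_pages_per_mr"] * params["mr_page_size"]
sectors_per_mr = bytes_per_mr // SECTOR_SIZE

# RXE_MAX_MR before and after the patch quoted above.
old_max_mr = 0x00001000
new_max_mr = 0x00100000

print("bytes per MR:        ", bytes_per_mr)
print("sectors per MR:      ", sectors_per_mr)
print("RXE_MAX_MR old / new:", old_max_mr, "/", new_max_mr)
```

The computed sectors-per-MR value agrees with the max_sectors_per_mr field in the log, and a single request of max_sectors sectors fits within one MR of that size. The patch raises RXE_MAX_MR by a factor of 256, which according to the thread did not make the test pass.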