Re: v5.14 RXE driver broken?

On 8/25/21 11:22 AM, Bart Van Assche wrote:
On 8/25/21 9:32 AM, Jason Gunthorpe wrote:
On Wed, Aug 25, 2021 at 11:02:14AM +0800, Zhu Yanjun wrote:
On Tue, Aug 24, 2021 at 11:02 AM Bart Van Assche <bvanassche@xxxxxxx> wrote:

Hi Bob,

If I run the following test against Linus' master branch then that test
passes (commit d5ae8d7f85b7 ("Revert "media: dvb header files: move some
headers to staging"")):

# export use_siw=1 && modprobe brd && (cd blktests && ./check -q srp/002)
srp/002 (File I/O on top of multipath concurrently with logout and login (mq)) [passed]
     runtime    ...  48.849s

The following test fails:

# export use_siw= && modprobe brd && (cd blktests && ./check -q srp/002)
srp/002 (File I/O on top of multipath concurrently with logout and login (mq)) [failed]
     runtime  48.849s  ...  15.024s
     +++ /home/bart/software/blktests/results/nodev/srp/002.out.bad      2021-08-23 19:51:05.182958728 -0700
     @@ -1,2 +1 @@
      Configured SRP target driver
     -Passed

Could the commit "RDMA/rxe: Zero out index member of struct rxe_queue"
at https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?h=wip/jgg-for-rc
fix this problem?

That commit will be merged into Linux upstream very soon.

Please let me know, Bart: if the rxe driver is still broken, I will
definitely punt all the RXE changes to the next cycle until it can
be fixed.

Hi Jason,

Thanks for having offered to revert the RXE changes from this merge window.
Unfortunately that wouldn't be sufficient. My test results so far for test
srp/002 in combination with the rdma_rxe driver are as follows:
* Kernel v5.12: test passes.
* Kernel v5.13: test fails.
* Kernel v5.14-rc7: test fails.

For the rdma_rxe tests against kernel v5.14-rc7, I found the following in the kernel
log:

ib_srp:add_target_store: ib_srp: max_sectors = 1024; max_pages_per_mr = 512; mr_page_size = 4096; max_sectors_per_mr = 4096; mr_per_cmd = 2
ib_srp: enp1s0_rxe: ib_alloc_mr() failed. Try to reduce max_cmd_per_lun, max_sect or ch_count

There is sufficient memory available in the VM in which I ran the tests. It is
not clear to me why ib_alloc_mr() fails with these parameters when using the
rdma_rxe driver. As one can see in srp_alloc_fr_pool(), the SRP initiator driver
respects the max_pages_per_mr RDMA driver limit.

A correction: test srp/002 passes on my setup against kernel v5.13. I probably
selected the wrong kernel from the GRUB boot menu before I sent my previous email.
So the test failure is something that happens with v5.14-rc but not with v5.13.

Applying the following patch on top of Linus' master branch did not help:

diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h
index 742e6ec93686..643b80e47c82 100644
--- a/drivers/infiniband/sw/rxe/rxe_param.h
+++ b/drivers/infiniband/sw/rxe/rxe_param.h
@@ -88,7 +88,7 @@ enum rxe_device_param {
 	RXE_MIN_SRQ_INDEX		= 0x00020001,
 	RXE_MAX_SRQ_INDEX		= 0x00040000,

-	RXE_MAX_MR			= 0x00001000,
+	RXE_MAX_MR			= 0x00100000,
 	RXE_MAX_MW			= 0x00001000,
 	RXE_MIN_MR_INDEX		= 0x00000001,
 	RXE_MAX_MR_INDEX		= 0x00010000,

Bart.


