RE: [bug report][bisected] rdma_rxe: blktests srp lead kernel panic with 64k page size

Hello Yi,

As I replied in the other thread, I believe the issue comes from a device
attribute of the rxe driver that is hardcoded for systems with 4k pages.
Cf. https://lore.kernel.org/all/OS3PR01MB98651C7454C46841B8A78F11E5D2A@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

Unfortunately, I have no aarch64 machine available to verify this.
Sorry to trouble you, but could you apply the change below and check
whether it resolves the issue?
=====
diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h
index d2f57ead78ad..dc0f28c264b9 100644
--- a/drivers/infiniband/sw/rxe/rxe_param.h
+++ b/drivers/infiniband/sw/rxe/rxe_param.h
@@ -38,7 +38,7 @@ static inline enum ib_mtu eth_mtu_int_to_enum(int mtu)
 /* default/initial rxe device parameter settings */
 enum rxe_device_param {
        RXE_MAX_MR_SIZE                 = -1ull,
-       RXE_PAGE_SIZE_CAP               = 0xfffff000,
+       RXE_PAGE_SIZE_CAP               = 0xffffffff - (PAGE_SIZE - 1),
        RXE_MAX_QP_WR                   = DEFAULT_MAX_VALUE,
        RXE_DEVICE_CAP_FLAGS            = IB_DEVICE_BAD_PKEY_CNTR

=====
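
For reference, here is a small user-space sketch of what the changed
expression evaluates to for 4k and 64k pages. It assumes the attribute is
read as a bitmask in which each set bit marks a supported page size; the
helper names page_size_cap() and min_supported_page_size() are hypothetical
and exist only for this illustration, they are not part of the rxe driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: same arithmetic as the proposed
 * RXE_PAGE_SIZE_CAP expression, with the page size passed in
 * explicitly instead of taken from the kernel's PAGE_SIZE.
 */
static inline uint32_t page_size_cap(uint32_t page_size)
{
	return 0xffffffffu - (page_size - 1);
}

/*
 * Hypothetical helper: lowest set bit of the mask, i.e. the smallest
 * page size advertised under the bitmask assumption stated above.
 */
static inline uint32_t min_supported_page_size(uint32_t cap)
{
	return cap & (~cap + 1u);
}

/* Sanity checks for the two page sizes discussed in this thread. */
static inline void check_page_size_caps(void)
{
	/* 4k pages: identical to the old hardcoded value. */
	assert(page_size_cap(4096u) == 0xfffff000u);
	assert(min_supported_page_size(page_size_cap(4096u)) == 4096u);

	/* 64k pages: the 4k..32k bits are no longer set. */
	assert(page_size_cap(65536u) == 0xffff0000u);
	assert(min_supported_page_size(page_size_cap(65536u)) == 65536u);
}
```

With 4k pages the new expression gives the same value as before, so only
kernels built with larger pages should see a difference.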

Regards,
Daisuke Matsuda

On Wed, Oct 11, 2023 9:33 AM Yi Zhang wrote:
> On Tue, Oct 10, 2023 at 9:37 PM Zhu Yanjun <yanjun.zhu@xxxxxxxxx> wrote:
> >
> >
> > On 2023/10/10 19:35, Jason Gunthorpe wrote:
> > > On Tue, Oct 10, 2023 at 06:41:17PM +0800, Zhu Yanjun wrote:
> > >> On 2023/10/9 12:35, Yi Zhang wrote:
> > >>> Hello
> > >>>
> > >>> blktests srp leads to a kernel panic[2] on aarch64 when the kernel is
> > >>> built with CONFIG_ARM64_64K_PAGES. Bisecting shows it was introduced by
> > >>> commit[1]; please help check it and let me know if you need any
> > >>> info/testing for it, thanks.
> > >>>
> > >>> [1]
> > >>> commit 325a7eb85199ec9c5b5a7af812f43ea16b735569
> > >>> Author: Bob Pearson <rpearsonhpe@xxxxxxxxx>
> > >>> Date:   Thu Jan 19 17:59:36 2023 -0600
> > >>>
> > >>>       RDMA/rxe: Cleanup page variables in rxe_mr.c
> > >>>
> > >>>       Cleanup usage of mr->page_shift and mr->page_mask and introduce
> > >>>       an extractor for mr->ibmr.page_size. Normal usage in the kernel
> > >>>       has page_mask masking out offset in page rather than masking out
> > >>>       the page number. The rxe driver had reversed that which was confusing.
> > >>>       Implicitly there can be a per mr page_size which was not uniformly
> > >>>       supported.
> > >>>
> > >>>       Link: https://lore.kernel.org/r/20230119235936.19728-6-rpearsonhpe@xxxxxxxxx
> > >>>       Signed-off-by: Bob Pearson <rpearsonhpe@xxxxxxxxx>
> > >>>       Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > >>>
> > >> Hi, Yi
> > >>
> > >> I delved into the commit. It cannot be reverted cleanly, so I made the
> > >> following diff to try to revert it. After this diff is applied, rping
> > >> works well.
> 
> Hi Yanjun
> 
> With the change, blktests srp works now.
> 
> > > We can't keep reverting things for what are probably small bugs. Fix
> > > the issues please!
> >
> >
> > This is not an official commit. Because the reporter mentioned that the
> > commit causes this problem, we just confirmed that. If we confirm that
> > this commit is the root cause, we will analyze it and then fix it.
> >
> > Zhu Yanjun
> >
> >
> > >
> > > Jason
> >
> 
> --
> Best Regards,
>   Yi Zhang




