Re: [PATCH 1/1] RDMA/rxe: Fix blktests srp lead kernel panic with 64k page size

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



在 2023/10/16 14:07, Daisuke Matsuda (Fujitsu) 写道:
On Fri, Oct 13, 2023 10:44 PM Rain River <rain.1986.08.12@xxxxxxxxx> wrote:
On Friday, October 13, 2023 9:29 PM Zhu Yanjun:
在 2023/10/13 20:01, Daisuke Matsuda (Fujitsu) 写道:
On Fri, Oct 13, 2023 10:18 AM Zhu Yanjun wrote:
From: Zhu Yanjun <yanjun.zhu@xxxxxxxxx>

The page_size of mr is set in infiniband core originally. In the commit
325a7eb85199 ("RDMA/rxe: Cleanup page variables in rxe_mr.c"), the
page_size is also set. Sometime this will cause conflict.

I appreciate your prompt action, but I do not think this commit deals with
the root cause. I agree that the problem lies in rxe driver, but what is wrong
with assigning actual page size to ibmr.page_size?

Please check the source code. ibmr.page_size is assigned in
infiniband/core. And then it is assigned in rxe.
When the 2 are different, the problem will occur.

In the first place, the two must always be equal.
Is there any situation there are two different page sizes for a MR?
I think I have explained the value assigned in the core layer is wrong
when the PAGE_SIZE is bigger than 4k, and that is why they are inconsistent.

As I have no environment to test the kernel panic, it remains speculative,
but I have provided enough information for everyone to rebut my argument.
It is possible I am wrong. I hope someone will tell me specifically where
my guess is wrong, or Yi would kindly take the trouble to verify it.

I made a quick test.

With Zhu's patch, the problem fixed.
With your patch, this problem exists. And other problems occur. I do
not know why.
Will continue to make tests.

Thank you for your time and work.
I will try to find out why there were two different page sizes from
different perspectives. This may take a while because I am busy
with other projects now. If anybody need the driver without the crash
issue right now, I do not object to partially undoing "RDMA/rxe: Cleanup
page variables in rxe_mr.c" as Zhu suggests, but that should be interim
and not a final solution.

Sorry, it is late to reply.

This is a difficult and complicated problem because it involves rdma and block/blktests. This commit is based on the fact that the commit 325a7eb85199 ("RDMA/rxe: Cleanup page variables in rxe_mr.c") causes this problem.

So I delved into this commit. In this commit, the core part is to change the size of ibmr.page_size. From this, I checked the source code. I found that this page_size is also set in infiniband/core. So I try to remove the change of ibmr.page_size.

I will continue to delve into this problem and the source code to find the root cause that you also agree with.

Zhu Yanjun

Thanks,
Daisuke



Thanks,
Daisuke Matsuda

Please add logs to infiniband/core and rxe to check them.

Zhu Yanjun


IMO, the problem comes from the device attribute of rxe driver, which is used
in ulp/srp layer to calculate the page_size.
=====
static int srp_add_one(struct ib_device *device)
{
          struct srp_device *srp_dev;
          struct ib_device_attr *attr = &device->attrs;
<...>
          /*
           * Use the smallest page size supported by the HCA, down to a
           * minimum of 4096 bytes. We're unlikely to build large sglists
           * out of smaller entries.
           */
          mr_page_shift           = max(12, ffs(attr->page_size_cap) - 1);
          srp_dev->mr_page_size   = 1 << mr_page_shift;
=====
On initialization of srp driver, mr_page_size is calculated here.
Note that the device attribute is used to calculate the value of page shift
when the device is trying to use a page size larger than 4096. Since Yi specified
CONFIG_ARM64_64K_PAGES, the system naturally met the condition.

=====
static int srp_map_finish_fr(struct srp_map_state *state,
                               struct srp_request *req,
                               struct srp_rdma_ch *ch, int sg_nents,
                               unsigned int *sg_offset_p)
{
          struct srp_target_port *target = ch->target;
          struct srp_device *dev = target->srp_host->srp_dev;
<...>
          n = ib_map_mr_sg(desc->mr, state->sg, sg_nents, sg_offset_p,
                           dev->mr_page_size);
=====
After that, mr_page_size is presumably passed to ib_core layer.

=====
int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
                   unsigned int *sg_offset, unsigned int page_size)
{
          if (unlikely(!mr->device->ops.map_mr_sg))
                  return -EOPNOTSUPP;

          mr->page_size = page_size;

          return mr->device->ops.map_mr_sg(mr, sg, sg_nents, sg_offset);
}
EXPORT_SYMBOL(ib_map_mr_sg);
=====
Consequently, the page size calculated in srp driver is set to ibmr.page_size.

Coming back to rxe, the device attribute is set here:
=====
rxe.c
<...>
/* initialize rxe device parameters */
static void rxe_init_device_param(struct rxe_dev *rxe)
{
          rxe->max_inline_data                    = RXE_MAX_INLINE_DATA;

          rxe->attr.vendor_id                     = RXE_VENDOR_ID;
          rxe->attr.max_mr_size                   = RXE_MAX_MR_SIZE;
          rxe->attr.page_size_cap                 = RXE_PAGE_SIZE_CAP;
          rxe->attr.max_qp                        = RXE_MAX_QP;
---
rxe_param.h
<...>
/* default/initial rxe device parameter settings */
enum rxe_device_param {
          RXE_MAX_MR_SIZE                 = -1ull,
          RXE_PAGE_SIZE_CAP               = 0xfffff000,
          RXE_MAX_QP_WR                   = DEFAULT_MAX_VALUE,
=====
rxe_init_device_param() sets the attributes to rxe_dev->attr, and it is later copied
to ib_device->attrs in setup_device()@core/device.c.
See that the page size cap is hardcoded to 4096 bytes. I suspect this led to
incorrect page_size being set to ibmr.page_size, resulting in the kernel crash.

I think rxe driver is made to be able to handle arbitrary page sizes.
Probably, we can just modify RXE_PAGE_SIZE_CAP to fix the issue.
What do you guys think?

Thanks,
Daisuke Matsuda






[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux