On Fri, Oct 13, 2023 at 9:01 AM Daisuke Matsuda (Fujitsu) <matsuda-daisuke@xxxxxxxxxxx> wrote: > > On Friday, October 13, 2023 9:29 PM Zhu Yanjun: > > 在 2023/10/13 20:01, Daisuke Matsuda (Fujitsu) 写道: > > > On Fri, Oct 13, 2023 10:18 AM Zhu Yanjun wrote: > > >> From: Zhu Yanjun <yanjun.zhu@xxxxxxxxx> > > >> > > >> The page_size of mr is set in infiniband core originally. In the commit > > >> 325a7eb85199 ("RDMA/rxe: Cleanup page variables in rxe_mr.c"), the > > >> page_size is also set. Sometime this will cause conflict. > > > > > > I appreciate your prompt action, but I do not think this commit deals with > > > the root cause. I agree that the problem lies in rxe driver, but what is wrong > > > with assigning actual page size to ibmr.page_size? > > > > Please check the source code. ibmr.page_size is assigned in > > infiniband/core. And then it is assigned in rxe. > > When the 2 are different, the problem will occur. > > In the first place, the two must always be equal. > Is there any situation there are two different page sizes for a MR? > I think I have explained the value assigned in the core layer is wrong > when the PAGE_SIZE is bigger than 4k, and that is why they are inconsistent. > > As I have no environment to test the kernel panic, it remains speculative, > but I have provided enough information for everyone to rebut my argument. > It is possible I am wrong. I hope someone will tell me specifically where > my guess is wrong, or Yi would kindly take the trouble to verify it. I made a quick test. With Zhu's patch, the problem fixed. With your patch, this problem exists. And other problems occur. I do not know why. Will continue to make tests. > > Thanks, > Daisuke Matsuda > > > Please add logs to infiniband/core and rxe to check them. > > > > Zhu Yanjun > > > > > > > > IMO, the problem comes from the device attribute of rxe driver, which is used > > > in ulp/srp layer to calculate the page_size. > > > ===== > > > static int srp_add_one(struct ib_device *device) > > > { > > > struct srp_device *srp_dev; > > > struct ib_device_attr *attr = &device->attrs; > > > <...> > > > /* > > > * Use the smallest page size supported by the HCA, down to a > > > * minimum of 4096 bytes. We're unlikely to build large sglists > > > * out of smaller entries. > > > */ > > > mr_page_shift = max(12, ffs(attr->page_size_cap) - 1); > > > srp_dev->mr_page_size = 1 << mr_page_shift; > > > ===== > > > On initialization of srp driver, mr_page_size is calculated here. > > > Note that the device attribute is used to calculate the value of page shift > > > when the device is trying to use a page size larger than 4096. Since Yi specified > > > CONFIG_ARM64_64K_PAGES, the system naturally met the condition. > > > > > > ===== > > > static int srp_map_finish_fr(struct srp_map_state *state, > > > struct srp_request *req, > > > struct srp_rdma_ch *ch, int sg_nents, > > > unsigned int *sg_offset_p) > > > { > > > struct srp_target_port *target = ch->target; > > > struct srp_device *dev = target->srp_host->srp_dev; > > > <...> > > > n = ib_map_mr_sg(desc->mr, state->sg, sg_nents, sg_offset_p, > > > dev->mr_page_size); > > > ===== > > > After that, mr_page_size is presumably passed to ib_core layer. > > > > > > ===== > > > int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents, > > > unsigned int *sg_offset, unsigned int page_size) > > > { > > > if (unlikely(!mr->device->ops.map_mr_sg)) > > > return -EOPNOTSUPP; > > > > > > mr->page_size = page_size; > > > > > > return mr->device->ops.map_mr_sg(mr, sg, sg_nents, sg_offset); > > > } > > > EXPORT_SYMBOL(ib_map_mr_sg); > > > ===== > > > Consequently, the page size calculated in srp driver is set to ibmr.page_size. > > > > > > Coming back to rxe, the device attribute is set here: > > > ===== > > > rxe.c > > > <...> > > > /* initialize rxe device parameters */ > > > static void rxe_init_device_param(struct rxe_dev *rxe) > > > { > > > rxe->max_inline_data = RXE_MAX_INLINE_DATA; > > > > > > rxe->attr.vendor_id = RXE_VENDOR_ID; > > > rxe->attr.max_mr_size = RXE_MAX_MR_SIZE; > > > rxe->attr.page_size_cap = RXE_PAGE_SIZE_CAP; > > > rxe->attr.max_qp = RXE_MAX_QP; > > > --- > > > rxe_param.h > > > <...> > > > /* default/initial rxe device parameter settings */ > > > enum rxe_device_param { > > > RXE_MAX_MR_SIZE = -1ull, > > > RXE_PAGE_SIZE_CAP = 0xfffff000, > > > RXE_MAX_QP_WR = DEFAULT_MAX_VALUE, > > > ===== > > > rxe_init_device_param() sets the attributes to rxe_dev->attr, and it is later copied > > > to ib_device->attrs in setup_device()@core/device.c. > > > See that the page size cap is hardcoded to 4096 bytes. I suspect this led to > > > incorrect page_size being set to ibmr.page_size, resulting in the kernel crash. > > > > > > I think rxe driver is made to be able to handle arbitrary page sizes. > > > Probably, we can just modify RXE_PAGE_SIZE_CAP to fix the issue. > > > What do you guys think? > > > > > > Thanks, > > > Daisuke Matsuda > > > >