On Fri, Oct 13, 2023 10:18 AM Zhu Yanjun wrote: > From: Zhu Yanjun <yanjun.zhu@xxxxxxxxx> > > The page_size of mr is set in infiniband core originally. In the commit > 325a7eb85199 ("RDMA/rxe: Cleanup page variables in rxe_mr.c"), the > page_size is also set. Sometime this will cause conflict. I appreciate your prompt action, but I do not think this commit deals with the root cause. I agree that the problem lies in rxe driver, but what is wrong with assigning actual page size to ibmr.page_size? IMO, the problem comes from the device attribute of rxe driver, which is used in ulp/srp layer to calculate the page_size. ===== static int srp_add_one(struct ib_device *device) { struct srp_device *srp_dev; struct ib_device_attr *attr = &device->attrs; <...> /* * Use the smallest page size supported by the HCA, down to a * minimum of 4096 bytes. We're unlikely to build large sglists * out of smaller entries. */ mr_page_shift = max(12, ffs(attr->page_size_cap) - 1); srp_dev->mr_page_size = 1 << mr_page_shift; ===== On initialization of srp driver, mr_page_size is calculated here. Note that the device attribute is used to calculate the value of page shift when the device is trying to use a page size larger than 4096. Since Yi specified CONFIG_ARM64_64K_PAGES, the system naturally met the condition. ===== static int srp_map_finish_fr(struct srp_map_state *state, struct srp_request *req, struct srp_rdma_ch *ch, int sg_nents, unsigned int *sg_offset_p) { struct srp_target_port *target = ch->target; struct srp_device *dev = target->srp_host->srp_dev; <...> n = ib_map_mr_sg(desc->mr, state->sg, sg_nents, sg_offset_p, dev->mr_page_size); ===== After that, mr_page_size is presumably passed to ib_core layer. ===== int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents, unsigned int *sg_offset, unsigned int page_size) { if (unlikely(!mr->device->ops.map_mr_sg)) return -EOPNOTSUPP; mr->page_size = page_size; return mr->device->ops.map_mr_sg(mr, sg, sg_nents, sg_offset); } EXPORT_SYMBOL(ib_map_mr_sg); ===== Consequently, the page size calculated in srp driver is set to ibmr.page_size. Coming back to rxe, the device attribute is set here: ===== rxe.c <...> /* initialize rxe device parameters */ static void rxe_init_device_param(struct rxe_dev *rxe) { rxe->max_inline_data = RXE_MAX_INLINE_DATA; rxe->attr.vendor_id = RXE_VENDOR_ID; rxe->attr.max_mr_size = RXE_MAX_MR_SIZE; rxe->attr.page_size_cap = RXE_PAGE_SIZE_CAP; rxe->attr.max_qp = RXE_MAX_QP; --- rxe_param.h <...> /* default/initial rxe device parameter settings */ enum rxe_device_param { RXE_MAX_MR_SIZE = -1ull, RXE_PAGE_SIZE_CAP = 0xfffff000, RXE_MAX_QP_WR = DEFAULT_MAX_VALUE, ===== rxe_init_device_param() sets the attributes to rxe_dev->attr, and it is later copied to ib_device->attrs in setup_device()@core/device.c. See that the page size cap is hardcoded to 4096 bytes. I suspect this led to incorrect page_size being set to ibmr.page_size, resulting in the kernel crash. I think rxe driver is made to be able to handle arbitrary page sizes. Probably, we can just modify RXE_PAGE_SIZE_CAP to fix the issue. What do you guys think? Thanks, Daisuke Matsuda