On 24/10/2023 16:15, Zhijian Li (Fujitsu) wrote: > > > On 23/10/2023 18:45, Yi Zhang wrote: >> On Mon, Oct 23, 2023 at 11:52 AM Zhijian Li (Fujitsu) >> <lizhijian@xxxxxxxxxxx> wrote: >>> >>> >>> >>> On 20/10/2023 22:01, Jason Gunthorpe wrote: >>>> On Fri, Oct 20, 2023 at 03:47:05AM +0000, Zhijian Li (Fujitsu) wrote: >>>>> CC Bart >>>>> >>>>> On 13/10/2023 20:01, Daisuke Matsuda (Fujitsu) wrote: >>>>>> On Fri, Oct 13, 2023 10:18 AM Zhu Yanjun wrote: >>>>>>> From: Zhu Yanjun<yanjun.zhu@xxxxxxxxx> >>>>>>> >>>>>>> The page_size of mr is set in infiniband core originally. In the commit >>>>>>> 325a7eb85199 ("RDMA/rxe: Cleanup page variables in rxe_mr.c"), the >>>>>>> page_size is also set. Sometime this will cause conflict. >>>>>> I appreciate your prompt action, but I do not think this commit deals with >>>>>> the root cause. I agree that the problem lies in rxe driver, but what is wrong >>>>>> with assigning actual page size to ibmr.page_size? >>>>>> >>>>>> IMO, the problem comes from the device attribute of rxe driver, which is used >>>>>> in ulp/srp layer to calculate the page_size. >>>>>> ===== >>>>>> static int srp_add_one(struct ib_device *device) >>>>>> { >>>>>> struct srp_device *srp_dev; >>>>>> struct ib_device_attr *attr = &device->attrs; >>>>>> <...> >>>>>> /* >>>>>> * Use the smallest page size supported by the HCA, down to a >>>>>> * minimum of 4096 bytes. We're unlikely to build large sglists >>>>>> * out of smaller entries. >>>>>> */ >>>>>> mr_page_shift = max(12, ffs(attr->page_size_cap) - 1); >>>>> >>>>> >>>>> You light me up. >>>>> RXE provides attr.page_size_cap(RXE_PAGE_SIZE_CAP) which means it can support 4K-2G page size >>>> >>>> That doesn't seem right even in concept.> >>>> I think the multi-size support in the new xarray code does not work >>>> right, just looking at it makes me think it does not work right. It >>>> looks like it can do less than PAGE_SIZE but more than PAGE_SIZE will >>>> explode because kmap_local_page() does only 4K. >>>> >>>> If RXE_PAGE_SIZE_CAP == PAGE_SIZE will everything work? >>>> >>> >>> Yeah, this should work(even though i only verified hardcoding mr_page_shift to PAGE_SHIFT). >> >> Hi Zhijian >> >> Did you try blktests nvme/rdma use_rxe on your environment, it still >> failed on my side. >> > > Thanks for your testing. > I just did that, it also failed in my environment. After a glance debug, I found there are > other places still registered 4K MR. I will dig into it later. nvme intend to register 4K mr, but it should work IMO, at least the RXE have handled it correctly. 1293 static int nvme_rdma_map_sg_fr(struct nvme_rdma_queue *queue, 1294 struct nvme_rdma_request *req, struct nvme_command *c, 1295 int count) 1296 { 1297 struct nvme_keyed_sgl_desc *sg = &c->common.dptr.ksgl; 1298 int nr; 1299 1300 req->mr = ib_mr_pool_get(queue->qp, &queue->qp->rdma_mrs); 1301 if (WARN_ON_ONCE(!req->mr)) 1302 return -EAGAIN; 1303 1304 /* 1305 * Align the MR to a 4K page size to match the ctrl page size and 1306 * the block virtual boundary. 1307 */ 1308 nr = ib_map_mr_sg(req->mr, req->data_sgl.sg_table.sgl, count, NULL, 1309 SZ_4K); Anyway, i will go ahead. if you have any thought, please let me know. > > Thanks > Zhijian > > > > >> # use_rxe=1 nvme_trtype=rdma ./check nvme/003 >> nvme/003 (test if we're sending keep-alives to a discovery controller) [failed] >> runtime 12.179s ... 11.941s >> --- tests/nvme/003.out 2023-10-22 10:54:43.041749537 -0400 >> +++ /root/blktests/results/nodev/nvme/003.out.bad 2023-10-23 >> 05:52:27.882759168 -0400 >> @@ -1,3 +1,3 @@ >> Running nvme/003 >> -NQN:nqn.2014-08.org.nvmexpress.discovery disconnected 1 controller(s) >> +NQN:nqn.2014-08.org.nvmexpress.discovery disconnected 0 controller(s) >> Test complete >> >> [ 7033.431910] rdma_rxe: loaded >> [ 7033.456341] run blktests nvme/003 at 2023-10-23 05:52:15 >> [ 7033.502306] (null): rxe_set_mtu: Set mtu to 1024 >> [ 7033.510969] infiniband enP2p1s0v0_rxe: set active >> [ 7033.510980] infiniband enP2p1s0v0_rxe: added enP2p1s0v0 >> [ 7033.549301] loop0: detected capacity change from 0 to 2097152 >> [ 7033.556966] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 >> [ 7033.566711] nvmet_rdma: enabling port 0 (10.19.240.81:4420) >> [ 7033.588605] nvmet: connect attempt for invalid controller ID 0x808 >> [ 7033.594909] nvme nvme0: Connect Invalid Data Parameter, cntlid: 65535 >> [ 7033.601504] nvme nvme0: failed to connect queue: 0 ret=16770 >> [ 7046.317861] rdma_rxe: unloaded >> >> >>> >>>>>> import ctypes >>>>>> libc = ctypes.cdll.LoadLibrary('libc.so.6') >>>>>> hex(65536) >>> '0x10000' >>>>>> libc.ffs(0x10000) - 1 >>> 16 >>>>>> 1 << 16 >>> 65536 >>> >>> so >>> mr_page_shift = max(12, ffs(attr->page_size_cap) - 1) = max(12, 16) = 16; >>> >>> >>> So I think Daisuke's patch should work as well. >>> >>> https://lore.kernel.org/linux-rdma/OS3PR01MB98652B2EC2E85DAEC6DDE484E5D2A@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/T/#md133060414f0ba6a3dbaf7b4ad2374c8a347cfd1 >>> >>> >>>> Jason >> >> >> >> -- >> Best Regards, >> Yi Zhang