Re: [PATCH 1/1] RDMA/rxe: Fix blktests srp lead kernel panic with 64k page size

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 24/10/2023 16:15, Zhijian Li (Fujitsu) wrote:
> 
> 
> On 23/10/2023 18:45, Yi Zhang wrote:
>> On Mon, Oct 23, 2023 at 11:52 AM Zhijian Li (Fujitsu)
>> <lizhijian@xxxxxxxxxxx> wrote:
>>>
>>>
>>>
>>> On 20/10/2023 22:01, Jason Gunthorpe wrote:
>>>> On Fri, Oct 20, 2023 at 03:47:05AM +0000, Zhijian Li (Fujitsu) wrote:
>>>>> CC Bart
>>>>>
>>>>> On 13/10/2023 20:01, Daisuke Matsuda (Fujitsu) wrote:
>>>>>> On Fri, Oct 13, 2023 10:18 AM Zhu Yanjun wrote:
>>>>>>> From: Zhu Yanjun<yanjun.zhu@xxxxxxxxx>
>>>>>>>
>>>>>>> The page_size of mr is set in infiniband core originally. In the commit
>>>>>>> 325a7eb85199 ("RDMA/rxe: Cleanup page variables in rxe_mr.c"), the
>>>>>>> page_size is also set. Sometime this will cause conflict.
>>>>>> I appreciate your prompt action, but I do not think this commit deals with
>>>>>> the root cause. I agree that the problem lies in rxe driver, but what is wrong
>>>>>> with assigning actual page size to ibmr.page_size?
>>>>>>
>>>>>> IMO, the problem comes from the device attribute of rxe driver, which is used
>>>>>> in ulp/srp layer to calculate the page_size.
>>>>>> =====
>>>>>> static int srp_add_one(struct ib_device *device)
>>>>>> {
>>>>>>             struct srp_device *srp_dev;
>>>>>>             struct ib_device_attr *attr = &device->attrs;
>>>>>> <...>
>>>>>>             /*
>>>>>>              * Use the smallest page size supported by the HCA, down to a
>>>>>>              * minimum of 4096 bytes. We're unlikely to build large sglists
>>>>>>              * out of smaller entries.
>>>>>>              */
>>>>>>             mr_page_shift           = max(12, ffs(attr->page_size_cap) - 1);
>>>>>
>>>>>
>>>>> You light me up.
>>>>> RXE provides attr.page_size_cap(RXE_PAGE_SIZE_CAP) which means it can support 4K-2G page size
>>>>
>>>> That doesn't seem right even in concept.>
>>>> I think the multi-size support in the new xarray code does not work
>>>> right, just looking at it makes me think it does not work right. It
>>>> looks like it can do less than PAGE_SIZE but more than PAGE_SIZE will
>>>> explode because kmap_local_page() does only 4K.
>>>>
>>>> If RXE_PAGE_SIZE_CAP == PAGE_SIZE  will everything work?
>>>>
>>>
>>> Yeah, this should work(even though i only verified hardcoding mr_page_shift to PAGE_SHIFT).
>>
>> Hi Zhijian
>>
>> Did you try blktests nvme/rdma use_rxe on your environment, it still
>> failed on my side.
>>
> 
> Thanks for your testing.
> I just did that, it also failed in my environment. After a glance debug, I found there are
> other places still registered 4K MR. I will dig into it later.


nvme intend to register 4K mr, but it should work IMO, at least the RXE have handled it correctly.


1293 static int nvme_rdma_map_sg_fr(struct nvme_rdma_queue *queue,
1294                 struct nvme_rdma_request *req, struct nvme_command *c,
1295                 int count)
1296 {
1297         struct nvme_keyed_sgl_desc *sg = &c->common.dptr.ksgl;
1298         int nr;
1299
1300         req->mr = ib_mr_pool_get(queue->qp, &queue->qp->rdma_mrs);
1301         if (WARN_ON_ONCE(!req->mr))
1302                 return -EAGAIN;
1303
1304         /*
1305          * Align the MR to a 4K page size to match the ctrl page size and
1306          * the block virtual boundary.
1307          */
1308         nr = ib_map_mr_sg(req->mr, req->data_sgl.sg_table.sgl, count, NULL,
1309                           SZ_4K);


Anyway, i will go ahead. if you have any thought, please let me know.



> 
> Thanks
> Zhijian
> 
> 
> 
> 
>> # use_rxe=1 nvme_trtype=rdma  ./check nvme/003
>> nvme/003 (test if we're sending keep-alives to a discovery controller) [failed]
>>       runtime  12.179s  ...  11.941s
>>       --- tests/nvme/003.out 2023-10-22 10:54:43.041749537 -0400
>>       +++ /root/blktests/results/nodev/nvme/003.out.bad 2023-10-23
>> 05:52:27.882759168 -0400
>>       @@ -1,3 +1,3 @@
>>        Running nvme/003
>>       -NQN:nqn.2014-08.org.nvmexpress.discovery disconnected 1 controller(s)
>>       +NQN:nqn.2014-08.org.nvmexpress.discovery disconnected 0 controller(s)
>>        Test complete
>>
>> [ 7033.431910] rdma_rxe: loaded
>> [ 7033.456341] run blktests nvme/003 at 2023-10-23 05:52:15
>> [ 7033.502306] (null): rxe_set_mtu: Set mtu to 1024
>> [ 7033.510969] infiniband enP2p1s0v0_rxe: set active
>> [ 7033.510980] infiniband enP2p1s0v0_rxe: added enP2p1s0v0
>> [ 7033.549301] loop0: detected capacity change from 0 to 2097152
>> [ 7033.556966] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>> [ 7033.566711] nvmet_rdma: enabling port 0 (10.19.240.81:4420)
>> [ 7033.588605] nvmet: connect attempt for invalid controller ID 0x808
>> [ 7033.594909] nvme nvme0: Connect Invalid Data Parameter, cntlid: 65535
>> [ 7033.601504] nvme nvme0: failed to connect queue: 0 ret=16770
>> [ 7046.317861] rdma_rxe: unloaded
>>
>>
>>>
>>>>>> import ctypes
>>>>>> libc = ctypes.cdll.LoadLibrary('libc.so.6')
>>>>>> hex(65536)
>>> '0x10000'
>>>>>> libc.ffs(0x10000) - 1
>>> 16
>>>>>> 1 << 16
>>> 65536
>>>
>>> so
>>> mr_page_shift = max(12, ffs(attr->page_size_cap) - 1) = max(12, 16) = 16;
>>>
>>>
>>> So I think Daisuke's patch should work as well.
>>>
>>> https://lore.kernel.org/linux-rdma/OS3PR01MB98652B2EC2E85DAEC6DDE484E5D2A@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/T/#md133060414f0ba6a3dbaf7b4ad2374c8a347cfd1
>>>
>>>
>>>> Jason
>>
>>
>>
>> --
>> Best Regards,
>>     Yi Zhang




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux