> On 17 Dec 2021, at 08:03, Yi Zhang <yi.zhang@xxxxxxxxxx> wrote:
> 
> On Fri, Dec 17, 2021 at 1:34 AM Haakon Bugge <haakon.bugge@xxxxxxxxxx> wrote:
>> 
>> 
>> 
>>> On 16 Dec 2021, at 17:32, Yi Zhang <yi.zhang@xxxxxxxxxx> wrote:
>>> 
>>> On Thu, Dec 16, 2021 at 9:21 PM Max Gurtovoy <mgurtovoy@xxxxxxxxxx> wrote:
>>>> 
>>>> 
>>>> On 12/16/2021 4:18 AM, Yi Zhang wrote:
>>>>> On Wed, Dec 15, 2021 at 8:10 PM Max Gurtovoy <mgurtovoy@xxxxxxxxxx> wrote:
>>>>>> 
>>>>>> On 12/15/2021 3:15 AM, Yi Zhang wrote:
>>>>>>> On Tue, Dec 14, 2021 at 8:01 PM Max Gurtovoy <mgurtovoy@xxxxxxxxxx> wrote:
>>>>>>>> On 12/14/2021 12:39 PM, Sagi Grimberg wrote:
>>>>>>>>>>>> Hi Sagi
>>>>>>>>>>>> It is still reproducible with the change; here is the log:
>>>>>>>>>>>> 
>>>>>>>>>>>> # time nvme reset /dev/nvme0
>>>>>>>>>>>> 
>>>>>>>>>>>> real    0m12.973s
>>>>>>>>>>>> user    0m0.000s
>>>>>>>>>>>> sys     0m0.006s
>>>>>>>>>>>> # time nvme reset /dev/nvme0
>>>>>>>>>>>> 
>>>>>>>>>>>> real    1m15.606s
>>>>>>>>>>>> user    0m0.000s
>>>>>>>>>>>> sys     0m0.007s
>>>>>>>>>>> Does it speed up if you use fewer queues (i.e. connect with -i 4)?
>>>>>>>>>> Yes, with -i 4 it is a stable 1.3s:
>>>>>>>>>> # time nvme reset /dev/nvme0
>>>>>>>>> So it appears that destroying a qp takes a long time on
>>>>>>>>> IB for some reason...
>>>>>>>>> 
>>>>>>>>>> real    0m1.225s
>>>>>>>>>> user    0m0.000s
>>>>>>>>>> sys     0m0.007s
>>>>>>>>>> 
>>>>>>>>>>>> # dmesg | grep nvme
>>>>>>>>>>>> [  900.634877] nvme nvme0: resetting controller
>>>>>>>>>>>> [  909.026958] nvme nvme0: creating 40 I/O queues.
>>>>>>>>>>>> [  913.604297] nvme nvme0: mapped 40/0/0 default/read/poll queues.
>>>>>>>>>>>> [  917.600993] nvme nvme0: resetting controller
>>>>>>>>>>>> [  988.562230] nvme nvme0: I/O 2 QID 0 timeout
>>>>>>>>>>>> [  988.567607] nvme nvme0: Property Set error: 881, offset 0x14
>>>>>>>>>>>> [  988.608181] nvme nvme0: creating 40 I/O queues.
>>>>>>>>>>>> [  993.203495] nvme nvme0: mapped 40/0/0 default/read/poll queues.
>>>>>>>>>>>> 
>>>>>>>>>>>> BTW, this issue cannot be reproduced on my NVMe/RoCE environment.
>>>>>>>>>>> Then I think that we need the rdma folks to help here...
>>>>>>>>> Max?
>>>>>>>> It took me 12s to reset a controller with 63 I/O queues on 5.16-rc3+.
>>>>>>>> 
>>>>>>>> Can you try to reproduce with the latest versions, please?
>>>>>>>> 
>>>>>>>> Or give the exact scenario?
>>>>>>> Yeah, both target and client are using Mellanox Technologies MT27700
>>>>>>> Family [ConnectX-4]. Could you try stressing "nvme reset /dev/nvme0"?
>>>>>>> The first reset takes 12s, and the issue can always be reproduced on
>>>>>>> the second reset operation.
>>>>>> I created a target with 1 namespace backed by null_blk and connected to
>>>>>> it from the same server over a loopback rdma connection using the
>>>>>> ConnectX-4 adapter.
>>>>> Could you share your loop.json file so I can try it on my environment?
>>>> 
>>>> {
>>>>   "hosts": [],
>>>>   "ports": [
>>>>     {
>>>>       "addr": {
>>>>         "adrfam": "ipv4",
>>>>         "traddr": "<ip>",
>>>>         "treq": "not specified",
>>>>         "trsvcid": "4420",
>>>>         "trtype": "rdma"
>>>>       },
>>>>       "portid": 1,
>>>>       "referrals": [],
>>>>       "subsystems": [
>>>>         "testsubsystem_0"
>>>>       ]
>>>>     }
>>>>   ],
>>>>   "subsystems": [
>>>>     {
>>>>       "allowed_hosts": [],
>>>>       "attr": {
>>>>         "allow_any_host": "1",
>>>>         "cntlid_max": "65519",
>>>>         "cntlid_min": "1",
>>>>         "model": "Linux",
>>>>         "serial": "3d83c78b76623f1d",
>>>>         "version": "1.3"
>>>>       },
>>>>       "namespaces": [
>>>>         {
>>>>           "device": {
>>>>             "nguid": "5b722b05-e9b6-542d-ba80-62010b57775d",
>>>>             "path": "/dev/nullb0",
>>>>             "uuid": "26ffc8ce-73b4-321d-9685-7d7a9872c460"
>>>>           },
>>>>           "enable": 1,
>>>>           "nsid": 1
>>>>         }
>>>>       ],
>>>>       "nqn": "testsubsystem_0"
>>>>     }
>>>>   ]
>>>> }
>>> 
>>> Thanks, I reproduced it with your json file on one server with a
>>> loopback rdma connection:
>>> # time nvme connect -t rdma -a 172.31.0.202 -s 4420 -n testsubsystem_0
>>> 
>>> real    0m4.557s
>>> user    0m0.000s
>>> sys     0m0.005s
>>> # time nvme reset /dev/nvme0
>>> 
>>> real    0m13.176s
>>> user    0m0.000s
>>> sys     0m0.007s
>>> # time nvme reset /dev/nvme0
>>> 
>>> real    1m16.577s
>>> user    0m0.002s
>>> sys     0m0.005s
>>> # time nvme disconnect -n testsubsystem_0
>> 
>> What does:
>> 
>> # rdma res show qp | grep -c ERR
>> 
>> say, when it is slow?
> 
> Hi Haakon
> Here is the output during the reset operation:
> 
> 38
> 33
> 28
> 24
> 19
> 14
> 9
> 4
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 1
> 0
> 0
> 0

OK. I had a hunch that many QPs in the ERR state could be the reason.
But this is not many, so I rest my case :-)

Thanks for trying though!


Thxs, Håkon

> 
>> 
>> 
>> 
>> Thxs, Håkon
>> 
> 
> 
> 
> -- 
> Best Regards,
> Yi Zhang
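
For anyone wanting to reproduce this, the target side can be brought up
from Max's json above with nvmetcli. A minimal sketch, assuming the
configuration is saved as loop.json with <ip> filled in, and that the
null_blk and nvmet-rdma modules are available on the target machine:

  # create the null block device that backs the namespace, load the
  # RDMA transport for the NVMe target, then restore the configuration
  modprobe null_blk
  modprobe nvmet-rdma
  nvmetcli restore loop.json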
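
On the initiator side, the commands used throughout the thread can be
collected into a small script that also samples the QP-state count
Haakon asked about. Again only a sketch: <target-ip> is a placeholder
for the target address, and the timings in the comments are simply the
ones reported above:

  #!/bin/bash
  # connect with the default number of I/O queues; add "-i 4" to check
  # whether fewer queues avoid the slow reset, as discussed above
  nvme connect -t rdma -a <target-ip> -s 4420 -n testsubsystem_0

  # sample the number of QPs in the ERR state once per second in the
  # background while the resets run
  while true; do rdma res show qp | grep -c ERR; sleep 1; done &
  mon=$!

  time nvme reset /dev/nvme0   # ~13s in the reports above
  time nvme reset /dev/nvme0   # ~75s once the problem appears

  kill $mon
  nvme disconnect -n testsubsystem_0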