On Fri, Dec 17, 2021 at 1:34 AM Haakon Bugge <haakon.bugge@xxxxxxxxxx> wrote:
>
>
>
> > On 16 Dec 2021, at 17:32, Yi Zhang <yi.zhang@xxxxxxxxxx> wrote:
> >
> > On Thu, Dec 16, 2021 at 9:21 PM Max Gurtovoy <mgurtovoy@xxxxxxxxxx> wrote:
> >>
> >>
> >> On 12/16/2021 4:18 AM, Yi Zhang wrote:
> >>> On Wed, Dec 15, 2021 at 8:10 PM Max Gurtovoy <mgurtovoy@xxxxxxxxxx> wrote:
> >>>>
> >>>> On 12/15/2021 3:15 AM, Yi Zhang wrote:
> >>>>> On Tue, Dec 14, 2021 at 8:01 PM Max Gurtovoy <mgurtovoy@xxxxxxxxxx> wrote:
> >>>>>> On 12/14/2021 12:39 PM, Sagi Grimberg wrote:
> >>>>>>>>>> Hi Sagi
> >>>>>>>>>> It is still reproducible with the change, here is the log:
> >>>>>>>>>>
> >>>>>>>>>> # time nvme reset /dev/nvme0
> >>>>>>>>>>
> >>>>>>>>>> real    0m12.973s
> >>>>>>>>>> user    0m0.000s
> >>>>>>>>>> sys     0m0.006s
> >>>>>>>>>> # time nvme reset /dev/nvme0
> >>>>>>>>>>
> >>>>>>>>>> real    1m15.606s
> >>>>>>>>>> user    0m0.000s
> >>>>>>>>>> sys     0m0.007s
> >>>>>>>>> Does it speed up if you use fewer queues (i.e. connect with -i 4)?
> >>>>>>>> Yes, with -i 4, it is a stable 1.3s:
> >>>>>>>> # time nvme reset /dev/nvme0
> >>>>>>> So it appears that destroying a qp takes a long time on
> >>>>>>> IB for some reason...
> >>>>>>>
> >>>>>>>> real    0m1.225s
> >>>>>>>> user    0m0.000s
> >>>>>>>> sys     0m0.007s
> >>>>>>>>
> >>>>>>>>>> # dmesg | grep nvme
> >>>>>>>>>> [  900.634877] nvme nvme0: resetting controller
> >>>>>>>>>> [  909.026958] nvme nvme0: creating 40 I/O queues.
> >>>>>>>>>> [  913.604297] nvme nvme0: mapped 40/0/0 default/read/poll queues.
> >>>>>>>>>> [  917.600993] nvme nvme0: resetting controller
> >>>>>>>>>> [  988.562230] nvme nvme0: I/O 2 QID 0 timeout
> >>>>>>>>>> [  988.567607] nvme nvme0: Property Set error: 881, offset 0x14
> >>>>>>>>>> [  988.608181] nvme nvme0: creating 40 I/O queues.
> >>>>>>>>>> [  993.203495] nvme nvme0: mapped 40/0/0 default/read/poll queues.
> >>>>>>>>>>
> >>>>>>>>>> BTW, this issue cannot be reproduced on my NVMe/RoCE environment.
> >>>>>>>>> Then I think that we need the rdma folks to help here...
> >>>>>>> Max?
> >>>>>> It took me 12s to reset a controller with 63 IO queues with 5.16-rc3+.
> >>>>>>
> >>>>>> Can you try repro with latest versions please ?
> >>>>>>
> >>>>>> Or give the exact scenario ?
> >>>>> Yeah, both target and client are using Mellanox Technologies MT27700
> >>>>> Family [ConnectX-4], could you try stress "nvme reset /dev/nvme0", the
> >>>>> first time reset will take 12s, and it always can be reproduced at the
> >>>>> second reset operation.
> >>>> I created a target with 1 namespace backed by null_blk and connected to
> >>>> it from the same server in loopback rdma connection using the ConnectX-4
> >>>> adapter.
> >>> Could you share your loop.json file so I can try it on my environment?
> >>
> >> {
> >>   "hosts": [],
> >>   "ports": [
> >>     {
> >>       "addr": {
> >>         "adrfam": "ipv4",
> >>         "traddr": "<ip>",
> >>         "treq": "not specified",
> >>         "trsvcid": "4420",
> >>         "trtype": "rdma"
> >>       },
> >>       "portid": 1,
> >>       "referrals": [],
> >>       "subsystems": [
> >>         "testsubsystem_0"
> >>       ]
> >>     }
> >>   ],
> >>   "subsystems": [
> >>     {
> >>       "allowed_hosts": [],
> >>       "attr": {
> >>         "allow_any_host": "1",
> >>         "cntlid_max": "65519",
> >>         "cntlid_min": "1",
> >>         "model": "Linux",
> >>         "serial": "3d83c78b76623f1d",
> >>         "version": "1.3"
> >>       },
> >>       "namespaces": [
> >>         {
> >>           "device": {
> >>             "nguid": "5b722b05-e9b6-542d-ba80-62010b57775d",
> >>             "path": "/dev/nullb0",
> >>             "uuid": "26ffc8ce-73b4-321d-9685-7d7a9872c460"
> >>           },
> >>           "enable": 1,
> >>           "nsid": 1
> >>         }
> >>       ],
> >>       "nqn": "testsubsystem_0"
> >>     }
> >>   ]
> >> }
> >
> > Thanks, I reproduced it with your json file on one server with
> > loopback rdma connection:
> > # time nvme connect -t rdma -a 172.31.0.202 -s 4420 -n testsubsystem_0
> >
> > real    0m4.557s
> > user    0m0.000s
> > sys     0m0.005s
> > # time nvme reset /dev/nvme0
> >
> > real    0m13.176s
> > user    0m0.000s
> > sys     0m0.007s
> > # time nvme reset /dev/nvme0
> >
> > real    1m16.577s
> > user    0m0.002s
> > sys     0m0.005s
> > # time nvme disconnect -n testsubsystem_0
>
> What does:
>
> # rdma res show qp | grep -c ERR
>
> say, when it is slow?

Hi Haakon

Here is the output during the reset operation:

38 33 28 24 19 14 9 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 0 0 0

> Thxs, Håkon
>

--
Best Regards,
  Yi Zhang
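[Editorial note on the samples above: collapsing adjacent duplicates makes the drain pattern easier to read. The sample values are taken from the thread; the grouping one-liner itself is mine, not something anyone in the thread ran. It shows the ~40 QPs of the old connection leaving the ERR state quickly, then a single QP sitting in ERR for 28 consecutive samples, which lines up with the "I/O 2 QID 0 timeout" in the dmesg log.]

```shell
# Collapse the sampled "rdma res show qp | grep -c ERR" column into
# (run-length, value) pairs with uniq -c. The sample values come from
# the thread; this script only regenerates and groups them.
{
  printf '%s\n' 38 33 28 24 19 14 9 4   # old QPs draining out of ERR
  for i in $(seq 41); do echo 0; done   # quiet period
  for i in $(seq 28); do echo 1; done   # one QP stuck in ERR ~28 samples
  printf '%s\n' 0 0 0                   # finally clears
} | uniq -c
```

The long `28 1` run is the interesting part: one queue pair lingers in ERR for roughly as long as the keep-alive/admin timeout that dmesg reports.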
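[Editorial note on the timings: the two measured resets also support Sagi's QP-destroy theory with a back-of-the-envelope estimate. With 40 I/O queues the slow reset takes ~75.6s; with -i 4 it takes ~1.2s; the arithmetic below is mine, not from the thread, and assumes the extra time scales linearly with queue count.]

```shell
# Rough per-queue teardown cost implied by the two resets above:
# 1m15.606s with 40 I/O queues vs 0m1.225s with 4 I/O queues.
awk 'BEGIN { printf "%.1f s per queue\n", (75.606 - 1.225) / (40 - 4) }'
# prints: 2.1 s per queue
```

About two seconds per QP teardown is far above normal and points at the RDMA CM/QP destroy path rather than at the NVMe layer.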