On Sun, Dec 12, 2021 at 5:45 PM Sagi Grimberg <sagi@xxxxxxxxxxx> wrote:
>
>
>
> On 12/11/21 5:01 AM, Yi Zhang wrote:
> > On Fri, Jun 25, 2021 at 12:14 AM Yi Zhang <yi.zhang@xxxxxxxxxx> wrote:
> >>
> >> On Thu, Jun 24, 2021 at 5:32 AM Sagi Grimberg <sagi@xxxxxxxxxxx> wrote:
> >>>
> >>>
> >>>> Hello
> >>>>
> >>>> Gentle ping here, this issue still exists on latest 5.13-rc7
> >>>>
> >>>> # time nvme reset /dev/nvme0
> >>>>
> >>>> real 0m12.636s
> >>>> user 0m0.002s
> >>>> sys 0m0.005s
> >>>> # time nvme reset /dev/nvme0
> >>>>
> >>>> real 0m12.641s
> >>>> user 0m0.000s
> >>>> sys 0m0.007s
> >>>
> >>> Strange that even normal resets take so long...
> >>> What device are you using?
> >>
> >> Hi Sagi
> >>
> >> Here is the device info:
> >> Mellanox Technologies MT27700 Family [ConnectX-4]
> >>
> >>>
> >>>> # time nvme reset /dev/nvme0
> >>>>
> >>>> real 1m16.133s
> >>>> user 0m0.000s
> >>>> sys 0m0.007s
> >>>
> >>> There seems to be a spurious command timeout here, but maybe this
> >>> is due to the fact that the queues take so long to connect and
> >>> the target expires the keep-alive timer.
> >>>
> >>> Does this patch help?
> >>
> >> The issue still exists, let me know if you need more testing for it. :)
> >
> > Hi Sagi
> > ping, this issue still can be reproduced on the latest
> > linux-block/for-next, do you have a chance to recheck it, thanks.
>
> Can you check if it happens with the below patch:

Hi Sagi

It is still reproducible with the change; here is the log:

# time nvme reset /dev/nvme0

real    0m12.973s
user    0m0.000s
sys     0m0.006s
# time nvme reset /dev/nvme0

real    1m15.606s
user    0m0.000s
sys     0m0.007s
# dmesg | grep nvme
[  900.634877] nvme nvme0: resetting controller
[  909.026958] nvme nvme0: creating 40 I/O queues.
[  913.604297] nvme nvme0: mapped 40/0/0 default/read/poll queues.
[  917.600993] nvme nvme0: resetting controller
[  988.562230] nvme nvme0: I/O 2 QID 0 timeout
[  988.567607] nvme nvme0: Property Set error: 881, offset 0x14
[  988.608181] nvme nvme0: creating 40 I/O queues.
[  993.203495] nvme nvme0: mapped 40/0/0 default/read/poll queues.

BTW, this issue cannot be reproduced in my NVMe/RoCE environment.

> --
> diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c
> index f91a56180d3d..6e5aadfb07a0 100644
> --- a/drivers/nvme/target/fabrics-cmd.c
> +++ b/drivers/nvme/target/fabrics-cmd.c
> @@ -191,6 +191,14 @@ static u16 nvmet_install_queue(struct nvmet_ctrl *ctrl, struct nvmet_req *req)
>                 }
>         }
>
> +       /*
> +        * Controller establishment flow may take some time, and the host may not
> +        * send us keep-alive during this period, hence reset the
> +        * traffic based keep-alive timer so we don't trigger a
> +        * controller teardown as a result of a keep-alive expiration.
> +        */
> +       ctrl->reset_tbkas = true;
> +
>         return 0;
>
> err:
> --

--
Best Regards,
Yi Zhang
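
For readers following the thread: the reset_tbkas flag set by the patch above is the flag the target's traffic-based keep-alive timer consults when it fires; if the flag is set, the timer re-arms itself for another KATO period instead of treating the missing keep-alive as fatal. Below is a rough sketch of that consumer, paraphrased from drivers/nvme/target/core.c (exact names and details may differ between kernel versions):

static void nvmet_keep_alive_timer(struct work_struct *work)
{
	struct nvmet_ctrl *ctrl = container_of(to_delayed_work(work),
			struct nvmet_ctrl, ka_work);
	bool reset_tbkas = ctrl->reset_tbkas;

	/* Consume the flag so each arming grants a single grace period. */
	ctrl->reset_tbkas = false;
	if (reset_tbkas) {
		/*
		 * Traffic (or, with the patch above, queue establishment)
		 * was seen recently: re-arm rather than expire.
		 */
		schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
		return;
	}

	pr_err("ctrl %d keep-alive timer (%d seconds) expired!\n",
	       ctrl->cntlid, ctrl->kato);
	nvmet_ctrl_fatal_error(ctrl);
}

With the patch, each nvmet_install_queue() call during a slow connect re-arms that grace period, so a keep-alive expiration alone should no longer tear the controller down mid-reset; per the log above, however, the host-side admin command timeout ("I/O 2 QID 0 timeout") still occurs.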