Hi Laurence,

Thanks a lot for your quick test!

On Fri, May 11, 2018 at 5:59 AM, Laurence Oberman <loberman@xxxxxxxxxx> wrote:
> On Thu, 2018-05-10 at 18:28 +0800, Ming Lei wrote:
>> On Sat, May 05, 2018 at 07:11:33PM -0400, Laurence Oberman wrote:
>> > On Sat, 2018-05-05 at 21:58 +0800, Ming Lei wrote:
>> > > Hi,
>> > >
>> > > The 1st patch introduces blk_quiesce_timeout() and
>> > > blk_unquiesce_timeout() for NVMe, and meanwhile fixes
>> > > blk_sync_queue().
>> > >
>> > > The 2nd patch covers timeouts for admin commands used to recover
>> > > the controller, to avoid a possible deadlock.
>> > >
>> > > The 3rd and 4th patches avoid waiting for the freeze on queues
>> > > which aren't frozen.
>> > >
>> > > The last 4 patches fix several races w.r.t. the NVMe timeout
>> > > handler, and finally make blktests block/011 pass. Meanwhile the
>> > > NVMe PCI timeout mechanism becomes much more robust than before.
>> > >
>> > > gitweb:
>> > > https://github.com/ming1/linux/commits/v4.17-rc-nvme-timeout.V4
>> > >
>> > > V4:
>> > > - fix nvme_init_set_host_mem_cmd()
>> > > - use the nested EH model, and run both nvme_dev_disable() and
>> > >   the reset in the same context
>> > >
>> > > V3:
>> > > - fix one new race related to freezing in patch 4;
>> > >   nvme_reset_work() may hang forever without this patch
>> > > - rewrite the last 3 patches, and avoid breaking nvme_reset_ctrl*()
>> > >
>> > > V2:
>> > > - fix draining of the timeout work, so there is no need to change
>> > >   the return value of .timeout()
>> > > - fix the race between nvme_start_freeze() and nvme_unfreeze()
>> > > - cover timeouts for admin commands running in EH
>> > >
>> > > Ming Lei (7):
>> > >   block: introduce blk_quiesce_timeout() and blk_unquiesce_timeout()
>> > >   nvme: pci: cover timeout for admin commands running in EH
>> > >   nvme: pci: only wait freezing if queue is frozen
>> > >   nvme: pci: freeze queue in nvme_dev_disable() in case of error recovery
>> > >   nvme: core: introduce 'reset_lock' for sync reset state and reset activities
>> > >   nvme: pci: prepare for supporting error recovery from resetting context
>> > >   nvme: pci: support nested EH
>> > >
>> > >  block/blk-core.c         |  21 +++-
>> > >  block/blk-mq.c           |   9 ++
>> > >  block/blk-timeout.c      |   5 +-
>> > >  drivers/nvme/host/core.c |  46 ++++++-
>> > >  drivers/nvme/host/nvme.h |   5 +
>> > >  drivers/nvme/host/pci.c  | 304 ++++++++++++++++++++++++++++++++++++++++-------
>> > >  include/linux/blkdev.h   |  13 ++
>> > >  7 files changed, 356 insertions(+), 47 deletions(-)
>> > >
>> > > Cc: Jianchao Wang <jianchao.w.wang@xxxxxxxxxx>
>> > > Cc: Christoph Hellwig <hch@xxxxxx>
>> > > Cc: Sagi Grimberg <sagi@xxxxxxxxxxx>
>> > > Cc: linux-nvme@xxxxxxxxxxxxxxxxxxx
>> > > Cc: Laurence Oberman <loberman@xxxxxxxxxx>
>> >
>> > Hello Ming
>> >
>> > I have a two-node NUMA system here running your kernel tree
>> > 4.17.0-rc3.ming.nvme+
>> >
>> > [root@segstorage1 ~]# numactl --hardware
>> > available: 2 nodes (0-1)
>> > node 0 cpus: 0 3 5 6 8 11 13 14
>> > node 0 size: 63922 MB
>> > node 0 free: 61310 MB
>> > node 1 cpus: 1 2 4 7 9 10 12 15
>> > node 1 size: 64422 MB
>> > node 1 free: 62372 MB
>> > node distances:
>> > node   0   1
>> >   0:  10  20
>> >   1:  20  10
>> >
>> > I ran block/011
>> >
>> > [root@segstorage1 blktests]# ./check block/011
>> > block/011 => nvme0n1 (disable PCI device while doing I/O) [failed]
>> > runtime ... 106.936s
>> > --- tests/block/011.out 2018-05-05 18:01:14.268414752 -0400
>> > +++ results/nvme0n1/block/011.out.bad 2018-05-05 19:07:21.028634858 -0400
>> > @@ -1,2 +1,36 @@
>> > Running block/011
>> > +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags & IO_U_F_FLIGHT) == 0' failed.
>> > +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags & IO_U_F_FLIGHT) == 0' failed.
>> > +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags & IO_U_F_FLIGHT) == 0' failed.
>> > +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags & IO_U_F_FLIGHT) == 0' failed.
>> > +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags & IO_U_F_FLIGHT) == 0' failed.
>> > +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags & IO_U_F_FLIGHT) == 0' failed.
>> > ...
>> > (Run 'diff -u tests/block/011.out results/nvme0n1/block/011.out.bad' to see the entire diff)
>> >
>> > [ 1421.738551] run blktests block/011 at 2018-05-05 19:05:34
>> > [ 1452.676351] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
>> > [ 1452.718221] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
>> > [ 1452.718239] nvme nvme0: EH 0: before shutdown
>> > [ 1452.760890] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
>> > [ 1452.760894] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
>> > [ 1452.760897] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
>> > [ 1452.760900] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
>> > [ 1452.760903] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
>> > [ 1452.760906] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
>> > [ 1452.760909] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
>> > [ 1452.760912] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
>> > [ 1452.760915] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
>> > [ 1452.760918] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
>> > [ 1452.760921] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
>> > [ 1452.760923] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
>> > [ 1452.760926] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
>> > [ 1453.330251] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
>> > [ 1453.391713] nvme nvme0: EH 0: after shutdown
>> > [ 1456.804695] device-mapper: multipath: Failing path 259:0.
>> > [ 1526.721196] nvme nvme0: I/O 15 QID 0 timeout, disable controller
>> > [ 1526.754335] nvme nvme0: EH 1: before shutdown
>> > [ 1526.793257] nvme nvme0: EH 1: after shutdown
>> > [ 1526.793327] nvme nvme0: Identify Controller failed (-4)
>> > [ 1526.847869] nvme nvme0: Removing after probe failure status: -5
>> > [ 1526.888206] nvme nvme0: EH 0: after recovery
>> > [ 1526.888212] nvme0n1: detected capacity change from 400088457216 to 0
>> > [ 1526.947520] print_req_error: 1 callbacks suppressed
>> > [ 1526.947522] print_req_error: I/O error, dev nvme0n1, sector 794920
>> > [ 1526.947534] print_req_error: I/O error, dev nvme0n1, sector 569328
>> > [ 1526.947540] print_req_error: I/O error, dev nvme0n1, sector 1234608
>> > [ 1526.947556] print_req_error: I/O error, dev nvme0n1, sector 389296
>> > [ 1526.947564] print_req_error: I/O error, dev nvme0n1, sector 712432
>> > [ 1526.947566] print_req_error: I/O error, dev nvme0n1, sector 889304
>> > [ 1526.947572] print_req_error: I/O error, dev nvme0n1, sector 205776
>> > [ 1526.947574] print_req_error: I/O error, dev nvme0n1, sector 126480
>> > [ 1526.947575] print_req_error: I/O error, dev nvme0n1, sector 1601232
>> > [ 1526.947580] print_req_error: I/O error, dev nvme0n1, sector 1234360
>> > [ 1526.947745] Pid 683(fio) over core_pipe_limit
>> > [ 1526.947746] Skipping core dump
>> > [ 1526.947747] Pid 675(fio) over core_pipe_limit
>> > [ 1526.947748] Skipping core dump
>> > [ 1526.947863] Pid 672(fio) over core_pipe_limit
>> > [ 1526.947863] Skipping core dump
>> > [ 1526.947865] Pid 674(fio) over core_pipe_limit
>> > [ 1526.947866] Skipping core dump
>> > [ 1526.947870] Pid 676(fio) over core_pipe_limit
>> > [ 1526.947871] Pid 679(fio) over core_pipe_limit
>> > [ 1526.947872] Skipping core dump
>> > [ 1526.947872] Skipping core dump
>> > [ 1526.948197] Pid 677(fio) over core_pipe_limit
>> > [ 1526.948197] Skipping core dump
>> > [ 1526.948245] Pid 686(fio) over core_pipe_limit
>> > [ 1526.948245] Skipping core dump
>> > [ 1526.974610] Pid 680(fio) over core_pipe_limit
>> > [ 1526.974611] Pid 684(fio) over core_pipe_limit
>> > [ 1526.974611] Skipping core dump
>> > [ 1526.980370] nvme nvme0: failed to mark controller CONNECTING
>> > [ 1526.980373] nvme nvme0: Removing after probe failure status: -19
>> > [ 1526.980385] nvme nvme0: EH 1: after recovery
>> > [ 1526.980477] Pid 687(fio) over core_pipe_limit
>> > [ 1526.980478] Skipping core dump
>> > [ 1527.858207] Skipping core dump
>> >
>> > And leaves me looping here
>> >
>> > [ 1721.272276] INFO: task kworker/u66:0:24214 blocked for more than 120 seconds.
>> > [ 1721.311263] Tainted: G I 4.17.0-rc3.ming.nvme+ #1
>> > [ 1721.348027] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> > [ 1721.392957] kworker/u66:0 D 0 24214 2 0x80000080
>> > [ 1721.424425] Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
>> > [ 1721.458568] Call Trace:
>> > [ 1721.472499] ? __schedule+0x290/0x870
>> > [ 1721.493515] schedule+0x32/0x80
>> > [ 1721.511656] blk_mq_freeze_queue_wait+0x46/0xb0
>> > [ 1721.537609] ? remove_wait_queue+0x60/0x60
>> > [ 1721.561081] blk_cleanup_queue+0x7e/0x180
>> > [ 1721.584637] nvme_ns_remove+0x106/0x140 [nvme_core]
>> > [ 1721.612589] nvme_remove_namespaces+0x8e/0xd0 [nvme_core]
>> > [ 1721.643163] nvme_remove+0x80/0x120 [nvme]
>> > [ 1721.666188] pci_device_remove+0x3b/0xc0
>> > [ 1721.688553] device_release_driver_internal+0x148/0x220
>> > [ 1721.719332] nvme_remove_dead_ctrl_work+0x29/0x40 [nvme]
>> > [ 1721.750474] process_one_work+0x158/0x360
>> > [ 1721.772632] worker_thread+0x47/0x3e0
>> > [ 1721.792471] kthread+0xf8/0x130
>> > [ 1721.810354] ? max_active_store+0x80/0x80
>> > [ 1721.832459] ? kthread_bind+0x10/0x10
>> > [ 1721.852845] ret_from_fork+0x35/0x40
>> >
>> > Did I do something wrong?
>> >
>> > I never set anything else; the nvme0n1 was not mounted, etc.
>>
>> Hi Laurence,
>>
>> Thanks for your test!
>>
>> Could you run the following V5 (not posted yet) and see if
>> the issues you triggered can be fixed? If not, please provide
>> me the dmesg log.
>>
>> https://github.com/ming1/linux/commits/v4.17-rc-nvme-timeout.V5
>>
>> BTW, the main change is in handling reset failure; in V5, only
>> the failure from the top EH is handled.
>>
>> Thanks,
>> Ming
>
> Hello Ming
>
> Seems better; I had a failure on the first test, but no panics.
> The following tests have all passed.
>
> [root@segstorage1 blktests]# ./check block/011
> block/011 => nvme0n1 (disable PCI device while doing I/O) [failed]
> runtime 41.790s ... 79.184s
> --- tests/block/011.out 2018-05-05 18:01:14.268414752 -0400
> +++ results/nvme0n1/block/011.out.bad 2018-05-10 17:48:34.792080746 -0400
> @@ -1,2 +1,3 @@
> Running block/011
> +tests/block/011: line 47: echo: write error: Input/output error
> Test complete

This is an expected result, since a request may finally complete with an
error after being retried 5 times.

I will post V5 soon.

> [root@segstorage1 blktests]# ./check block/011
> block/011 => nvme0n1 (disable PCI device while doing I/O) [passed]
> runtime 79.184s ... 42.196s
> [root@segstorage1 blktests]# ./check block/011
> block/011 => nvme0n1 (disable PCI device while doing I/O) [passed]
> runtime 42.196s ... 41.390s
> [root@segstorage1 blktests]# ./check block/011
> block/011 => nvme0n1 (disable PCI device while doing I/O) [passed]
> runtime 41.390s ... 42.193s
>
> Kernel 4.17.0-rc3.ming.v5+ on an x86_64
>
> segstorage1 login: [ 631.297687] run blktests block/011 at 2018-05-10 17:47:15
> [ 661.951541] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
> [ 661.990218] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
> [ 661.990257] nvme nvme0: EH 0: before shutdown
> [ 662.031388] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
> [ 662.031395] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
> [ 662.031398] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
> [ 662.031402] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
> [ 662.031405] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
> [ 662.031409] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
> [ 662.031412] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
> [ 662.031416] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
> [ 662.031420] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
> [ 662.436080] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
> [ 662.477826] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
> [ 662.519368] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
> [ 662.560755] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
> [ 662.602456] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
> [ 662.657904] nvme nvme0: EH 0: after shutdown
> [ 668.730405] nvme nvme0: EH 0: after recovery 0
> [ 738.859987] run blktests block/011 at 2018-05-10 17:49:03
> [ 810.586431] run blktests block/011 at 2018-05-10 17:50:14
> [ 1065.694108] run blktests block/011 at 2018-05-10 17:54:29

Thanks,
Ming Lei
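P.S. For anyone following along, the core idea of the series can be pictured as the
bracketing below. This is only an illustrative sketch, not code from the patches:
blk_quiesce_timeout() and blk_unquiesce_timeout() are the helpers the series introduces,
but their exact signatures are assumed here (taking the request_queue to protect),
nvme_eh_recover_sketch() is a made-up name, and only the admin queue is shown even
though the real series also has to cover the namespace queues. It is written as if it
sat in drivers/nvme/host/pci.c, next to nvme_dev_disable().

	/*
	 * Sketch only: pause block-layer timeout handling around the controller
	 * shutdown, so a late-firing timeout cannot race with nvme_dev_disable(),
	 * then re-enable it and kick the normal reset to bring the device back.
	 */
	static void nvme_eh_recover_sketch(struct nvme_dev *dev)
	{
		struct request_queue *q = dev->ctrl.admin_q;

		blk_quiesce_timeout(q);		/* stop timeout handling for now */
		nvme_dev_disable(dev, false);	/* tear the controller down (not a clean shutdown) */
		blk_unquiesce_timeout(q);	/* timeouts are meaningful again */

		nvme_reset_ctrl(&dev->ctrl);	/* schedule the usual reset path */
	}

Running nvme_dev_disable() and the reset in one context like this is, as far as I can
tell, what the nested "EH 0" / "EH 1" numbering in the logs above reflects: if the first
recovery attempt itself gets stuck, the same bracketing is applied again one level up
instead of giving up on the controller right away. (And the "retried 5 times" mentioned
earlier should just be the nvme_core max_retries default, visible under
/sys/module/nvme_core/parameters/max_retries, if I remember correctly.)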