On Thu, Apr 14, 2022 at 06:09:45PM -0700, Davidlohr Bueso wrote:
> No idea, however, why this would happen when using qemu as opposed to
> nbd.

Sorry, I keep saying nbd when I mean null_blk, but yeah, same concept.
What I meant is that an nvme controller is involved *with* the RCU splat.
The qemu nvme driver and the qemu controller would be one runtime code
surface to consider here; the two consecutive nvme controller IO timeouts
seem to be telling of a backlog of some sort:

[493305.769531] run blktests zbd/006 at 2022-04-14 20:03:55
[493336.979482] nvme nvme9: I/O 192 QID 5 timeout, aborting
[493336.981666] nvme nvme9: Abort status: 0x0
[493367.699440] nvme nvme9: I/O 192 QID 5 timeout, reset controller

<-- rcu possible stall and NMIs -->

[493426.706021] nvme nvme9: 8/0/0 default/read/poll queues

So that seems to be the pattern here. The first puzzle is *why* we end up
with the IO timeout on the qemu nvme controller, and for this reason I
CC'd Klaus, in case he might have any ideas.

The first "timeout, aborting" happens when a request for IO tag 192 sent
to the controller times out for the first time. This seems to happen
about 30 seconds after test zbd/006 starts. The "reset controller" will
happen if the IO command with tag 192 had already been issued before,
had already timed out, and still hadn't been returned to the driver.
Resetting the controller is the stop-gap measure. nvme_timeout() is just
the shared blk_mq_ops timeout handler between nvme_mq_admin_ops and
nvme_mq_ops.

Inspecting the logs from running tests zbd/005 and zbd/006 in a loop, I
can see that not all nvme controller resets end up in the RCU stall splat
though. The IO timeout happens about 30 seconds after the test starts,
then the reset 30 seconds after that first timeout, then the RCU stall
about 30 seconds after that. For instance, follow these now-mapped times:

Apr 13 23:02:33 linux517-blktests-zbd unknown: run blktests zbd/006 at 2022-04-13 23:02:33

30 seconds go by ...

Apr 13 23:03:05 linux517-blktests-zbd kernel: nvme nvme9: I/O 651 QID 3 timeout, aborting
Apr 13 23:03:05 linux517-blktests-zbd kernel: nvme nvme9: Abort status: 0x0

30 seconds go by ...

Apr 13 23:03:36 linux517-blktests-zbd kernel: nvme nvme9: I/O 651 QID 3 timeout, reset controller

30 seconds go by ...

Apr 13 23:04:04 linux517-blktests-zbd kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:

So the reset happens and yet we stay put. The reset work for the PCI nvme
controller is nvme_reset_work() in drivers/nvme/host/pci.c, and I don't
see that issuing stuff back up the admin queue. However, the first IO
timeout did send an admin abort, in the same file:

drivers/nvme/host/pci.c

static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
{
	...
	dev_warn(nvmeq->dev->ctrl.device,
		 "I/O %d QID %d timeout, aborting\n",
		 req->tag, nvmeq->qid);

	abort_req = blk_mq_alloc_request(dev->ctrl.admin_q, nvme_req_op(&cmd),
					 BLK_MQ_REQ_NOWAIT);
	if (IS_ERR(abort_req)) {
		atomic_inc(&dev->ctrl.abort_limit);
		return BLK_EH_RESET_TIMER;
	}

	nvme_init_request(abort_req, &cmd);

	abort_req->end_io_data = NULL;
	blk_execute_rq_nowait(abort_req, false, abort_endio);
	...
}

So this all seems by design. What's puzzling is why it takes about 30
seconds to get to the possible RCU stall splat *after* we reset the nvme
controller. It makes me wonder if we left something lingering in a black
hole somewhere after our first reset.

Is it right to gather from the RCU splat that whatever it was, it was
stuck in a long spin lock?

  Luis
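
PS: in case it helps to have the escalation in one place, below is a tiny
userspace model of how I read the nvme_timeout() decision quoted above:
the first timeout sends an Abort on the admin queue and re-arms the
timer, a second timeout of the same tag escalates to a controller reset.
Everything in it (model_timeout(), struct req_state, the enum) is made up
for illustration; only the ordering is meant to mirror the driver.

/*
 * Standalone model (userspace, not kernel code) of the timeout
 * escalation described above. Compile with: cc -o model model.c
 */
#include <stdbool.h>
#include <stdio.h>

enum eh_ret { EH_RESET_TIMER, EH_DONE };

struct req_state {
	int tag;
	int qid;
	bool aborted;	/* has an Abort already been sent for this tag? */
};

static enum eh_ret model_timeout(struct req_state *req)
{
	if (req->aborted) {
		/* Second timeout of the same tag: stop-gap, reset controller */
		printf("I/O %d QID %d timeout, reset controller\n",
		       req->tag, req->qid);
		return EH_DONE;
	}

	/* First timeout: send an Abort on the admin queue and wait again */
	printf("I/O %d QID %d timeout, aborting\n", req->tag, req->qid);
	req->aborted = true;
	return EH_RESET_TIMER;
}

int main(void)
{
	struct req_state req = { .tag = 192, .qid = 5, .aborted = false };

	/* ~30s after the test starts */
	model_timeout(&req);	/* -> abort */
	/* ~30s later, the same request is still outstanding */
	model_timeout(&req);	/* -> reset controller */
	/* ~30s after that the RCU stall splat shows up; the model stops here */
	return 0;
}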