Patch "nvme-fabrics: fix kernel crash while shutting down controller" has been added to the 6.11-stable tree

This is a note to let you know that I've just added the patch titled

    nvme-fabrics: fix kernel crash while shutting down controller

to the 6.11-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     nvme-fabrics-fix-kernel-crash-while-shutting-down-co.patch
and it can be found in the queue-6.11 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 4365cf2dac213a0d94c0d300c5ea23439f79b135
Author: Nilay Shroff <nilay@xxxxxxxxxxxxx>
Date:   Tue Nov 5 11:42:09 2024 +0530

    nvme-fabrics: fix kernel crash while shutting down controller
    
    [ Upstream commit e9869c85c81168a1275f909d5972a3fc435304be ]
    
    The nvme keep-alive operation, which executes at a periodic interval,
    could potentially sneak in while shutting down a fabric controller.
    This may lead to a race between the fabric controller admin queue
    destroy code path (invoked while shutting down the controller) and the
    hw/hctx queue dispatcher called from the nvme keep-alive async request
    queuing operation. This race could lead to the kernel crash shown
    below; a condensed sketch of the two racing paths follows the trace:
    
    Call Trace:
        autoremove_wake_function+0x0/0xbc (unreliable)
        __blk_mq_sched_dispatch_requests+0x114/0x24c
        blk_mq_sched_dispatch_requests+0x44/0x84
        blk_mq_run_hw_queue+0x140/0x220
        nvme_keep_alive_work+0xc8/0x19c [nvme_core]
        process_one_work+0x200/0x4e0
        worker_thread+0x340/0x504
        kthread+0x138/0x140
        start_kernel_thread+0x14/0x18
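    
    For clarity, here is a condensed sketch of the two racing code paths
    (illustration only; heavily simplified and not the actual kernel
    implementation):
    
        #include <linux/blk-mq.h>
    
        /* CPU A: fabric controller shutdown tearing down the admin queue. */
        static void shutdown_side(struct request_queue *admin_q)
        {
                /*
                 * blk_mq_destroy_queue() waits for admin_q->q_usage_counter
                 * to drain and then tears the queue down; afterwards the
                 * hw/hctx structures behind admin_q must not be touched.
                 */
                blk_mq_destroy_queue(admin_q);
                blk_put_queue(admin_q);
        }
    
        /* CPU B: keep-alive work dispatching its admin command. */
        static void keep_alive_side(struct blk_mq_hw_ctx *hctx)
        {
                /*
                 * If the keep-alive request has already been flushed and
                 * its q_usage_counter reference dropped, CPU A can finish
                 * blk_mq_destroy_queue() while this dispatch is still
                 * walking the hctx -- the use-after-free seen in the
                 * trace above.
                 */
                blk_mq_run_hw_queue(hctx, true);
        }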
    
    If an nvme keep-alive request sneaks in while the fabric controller is
    shutting down, it is flushed off. The nvme_keep_alive_end_io function
    is then invoked to handle the end of the keep-alive operation; it
    decrements admin->q_usage_counter, and if this was the last/only
    request in the admin queue, the counter drops to zero. At that point
    the blk-mq destroy queue operation (blk_mq_destroy_queue()), which may
    be running simultaneously on another cpu (as this is the controller
    shutdown code path), makes forward progress and deletes the admin
    queue. From this point onward the admin queue resources must not be
    accessed. However, the nvme keep-alive thread running the hw/hctx
    queue dispatch operation has not yet finished its work, so it can
    still access the admin queue resources after the admin queue has
    already been deleted, and that causes the above crash.
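    
    The same interplay can be modelled in a few lines of userspace C (a
    minimal, self-contained model, not kernel code; here "users" stands in
    for admin->q_usage_counter and "struct queue" for the admin queue
    state the dispatcher touches):
    
        #include <pthread.h>
        #include <stdatomic.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <unistd.h>
    
        struct queue {
                atomic_int users;       /* models admin->q_usage_counter */
                int data;               /* models the hctx state */
        };
    
        static struct queue *q;
    
        /* Models the keep-alive path: end_io drops the reference, but the
         * dispatcher keeps using the queue afterwards. */
        static void *dispatcher(void *arg)
        {
                (void)arg;
                atomic_fetch_sub(&q->users, 1); /* nvme_keep_alive_end_io */
                usleep(1000);                   /* dispatch still running */
                printf("dispatch reads %d\n", q->data); /* may touch freed memory */
                return NULL;
        }
    
        /* Models blk_mq_destroy_queue(): waits for users == 0, then frees. */
        static void *shutdown_path(void *arg)
        {
                (void)arg;
                while (atomic_load(&q->users) != 0)
                        ;
                free(q);        /* queue gone; dispatcher may still run */
                return NULL;
        }
    
        int main(void)
        {
                pthread_t d, s;
    
                q = calloc(1, sizeof(*q));
                q->data = 42;
                atomic_store(&q->users, 1);     /* the in-flight keep-alive */
    
                pthread_create(&d, NULL, dispatcher, NULL);
                pthread_create(&s, NULL, shutdown_path, NULL);
                pthread_join(d, NULL);
                pthread_join(s, NULL);
                return 0;
        }
    
    Built with "cc -pthread -fsanitize=address", a run of this model
    typically reports a heap use-after-free, the userspace analogue of the
    crash above.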
    
    The above kernel crash is a regression caused by the changes made in
    commit a54a93d0e359 ("nvme: move stopping keep-alive into
    nvme_uninit_ctrl()"). Ideally we should stop keep-alive before
    destroying the admin queue and freeing the admin tagset so that it
    cannot sneak in during the shutdown operation. However, that commit
    removed the keep-alive stop operation from the beginning of the
    controller shutdown code path and moved it into nvme_uninit_ctrl(),
    which executes very late in the shutdown code path, after the admin
    queue has been destroyed and its tagset removed. This change created
    the possibility of keep-alive sneaking in and interfering with the
    shutdown operation, causing the observed kernel crash.
    
    To fix the observed crash, we move nvme_stop_keep_alive() from
    nvme_uninit_ctrl() to nvme_remove_admin_tag_set(). This ensures that
    we do not make forward progress and delete the admin queue until the
    keep-alive operation has finished (if it is in flight) or been
    cancelled, which contains the race condition explained above and
    hence avoids the crash.
    
    Moving nvme_stop_keep_alive() to nvme_remove_admin_tag_set(), instead
    of adding it back to the beginning of the controller shutdown code
    path in nvme_stop_ctrl() (as was the case before commit a54a93d0e359
    ("nvme: move stopping keep-alive into nvme_uninit_ctrl()")), also
    saves one callsite of nvme_stop_keep_alive().
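    
    This placement is sufficient because nvme_stop_keep_alive() is
    essentially a synchronous cancel of the keep-alive delayed work
    (condensed sketch below; the upstream implementation may differ in
    detail), so once it returns no keep-alive dispatch can still be
    touching the admin queue when blk_mq_destroy_queue() runs:
    
        void nvme_stop_keep_alive(struct nvme_ctrl *ctrl)
        {
                if (ctrl->kato == 0)
                        return;         /* keep-alive was never armed */
    
                /*
                 * cancel_delayed_work_sync() both cancels a pending
                 * ka_work and waits for an already-running instance to
                 * complete, so the queue teardown that follows in
                 * nvme_remove_admin_tag_set() cannot race with it.
                 */
                cancel_delayed_work_sync(&ctrl->ka_work);
        }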
    
    Fixes: a54a93d0e359 ("nvme: move stopping keep-alive into nvme_uninit_ctrl()")
    Link: https://lore.kernel.org/all/1a21f37b-0f2a-4745-8c56-4dc8628d3983@xxxxxxxxxxxxx/
    Reviewed-by: Ming Lei <ming.lei@xxxxxxxxxx>
    Signed-off-by: Nilay Shroff <nilay@xxxxxxxxxxxxx>
    Signed-off-by: Keith Busch <kbusch@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 128932c849a1a..f49431cbc8dfc 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4551,6 +4551,11 @@ EXPORT_SYMBOL_GPL(nvme_alloc_admin_tag_set);
 
 void nvme_remove_admin_tag_set(struct nvme_ctrl *ctrl)
 {
+	/*
+	 * As we're about to destroy the queue and free tagset
+	 * we can not have keep-alive work running.
+	 */
+	nvme_stop_keep_alive(ctrl);
 	blk_mq_destroy_queue(ctrl->admin_q);
 	blk_put_queue(ctrl->admin_q);
 	if (ctrl->ops->flags & NVME_F_FABRICS) {



