Re: [PATCH 4.9] nvme: validate admin queue before force-start on removal

Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> · Wed, 11 Jul 2018 15:41:15 +0200

On Wed, Jul 11, 2018 at 03:21:52PM +0200, Simon Veith wrote:
> Commit 4aae4388165a2611fa4206363ccb243c1622446c ("nvme: fix hang in remove
> path"), which was introduced in Linux 4.9.94, changed nvme_kill_queues()
> to also forcibly start admin queues in order to avoid getting stuck during
> device removal.
> 
> If a device is being removed because it did not respond during device
> initialization (e.g., if it is not ready yet at boot time), we will end up
> trying to start an admin queue that has not yet been set up at all. This
> attempt will lead to a NULL pointer dereference.
> 
> To avoid hitting this bug, we add a sanity check around the invocation of
> blk_mq_start_hw_queues() to ensure that the admin queue has actually been
> set up already.
> 
> Upstream already has this check in place since commit
> 7dd1ab163c17e11473a65b11f7e748db30618ebb ("nvme: validate admin queue
> before unquiesce"), and thus 4.14 contains it as well. Linux 4.4 is not
> affected by this particular issue since it does not have the force-start
> behavior yet.
> 
> Fixes: 4aae4388165a2611fa42 ("nvme: fix hang in remove path")
> Signed-off-by: Simon Veith <sveith@xxxxxxxxx>
> Signed-off-by: David Woodhouse <dwmw@xxxxxxxxxxxx>
> ---
>  drivers/nvme/host/core.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index c823e93..8a30478 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2041,8 +2041,10 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
>  
>  	mutex_lock(&ctrl->namespaces_mutex);
>  
> -	/* Forcibly start all queues to avoid having stuck requests */
> -	blk_mq_start_hw_queues(ctrl->admin_q);
> +	if (ctrl->admin_q) {
> +		/* Forcibly start all queues to avoid having stuck requests */
> +		blk_mq_start_hw_queues(ctrl->admin_q);
> +	}
>  

Why have you rewritten commit 7dd1ab163c17 ("nvme: validate admin queue
before unquiesce") here?  Why not just backport it directly?

confused,

greg k-h