Re: untangle the request_queue refcounting from the queue kobject v2

Al Viro <viro@xxxxxxxxxxxxxxxxxx> · Sat, 19 Nov 2022 03:00:39 +0000

On Sat, Nov 19, 2022 at 02:19:43AM +0000, Al Viro wrote:
> On Mon, Nov 14, 2022 at 05:26:32AM +0100, Christoph Hellwig wrote:
> > Hi Jens,
> > 
> > this series cleans up the registration of the "queue/" kobject, and
> > untangles it from the request_queue refcounting.
> > 
> > Changes since v1:
> >  - also change the blk_crypto_sysfs_unregister prototype
> >  - add two patches to fix the error handling in blk_register_queue
> 
> Umm...  Do we ever want access to queue parameters of the stuff that has
> a queue, but no associated gendisk?  SCSI tape, for example...
> 
> 	Re refcounting: AFAICS, blk_mq_alloc_disk_for_queue() is broken.

[snip]

> can't be right - we might fail in blk_get_queue(), returning NULL with
> unchanged refcount, we might succeed and return the new gendisk that
> has consumed the extra reference grabbed by blk_get_queue() *OR*
> we might grab an extra reference, fail in __alloc_disk_node() and
> return NULL with refcount on q bumped.  No way for caller to tell these
> failure modes from each other...  The callers (both sd and sr) treat
> both as "no reference grabbed", i.e. leak the queue refcount if they
> fail past grabbing the queue.

Speaking of leaks, how can this
	q = blk_mq_init_queue(&sdev->host->tag_set);
	if (IS_ERR(q)) {
		/* release fn is set up in scsi_sysfs_device_initialise, so
		 * have to free and put manually here */
		put_device(&starget->dev);
		kfree(sdev);
		goto out;
	}
	kref_get(&sdev->host->tagset_refcnt);
	sdev->request_queue = q;
	q->queuedata = sdev;
	__scsi_init_queue(sdev->host, q);

	depth = sdev->host->cmd_per_lun ?: 1;

	/*
	 * Use .can_queue as budget map's depth because we have to
	 * support adjusting queue depth from sysfs. Meantime use
	 * default device queue depth to figure out sbitmap shift
	 * since we use this queue depth most of times.
	 */
	if (scsi_realloc_sdev_budget_map(sdev, depth)) {
		put_device(&starget->dev);
		kfree(sdev);
		goto out;
	}
	...
out:
	if (display_failure_msg)
		printk(ALLOC_FAILURE_MSG, __func__);
	return NULL;

in scsi_alloc_sdev() possibly avoid leaking sdev->request_queue on the
second failure exit?  AFAICS scsi_realloc_sdev_budget_map() will see
NULL in sdev->budget_map.map, attempt
        ret = sbitmap_init_node(&sdev->budget_map,
                                scsi_device_max_queue_depth(sdev),
                                new_shift, GFP_KERNEL,
                                sdev->request_queue->node, false, true);
and if that fails - return without having even looked at sdev->request_queue.
Then we drop starget->dev (which has no way to observe sdev or anything in
it) and kfree sdev, which leaves q the only place where we have the address
of queue.  And we don't look at q after that point...

Shouldn't we do blk_mq_destroy_queue()/blk_put_queue() on that failure
exit?
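
For illustration only, here is a userspace toy model of that failure exit.
It is a sketch, not kernel code: every toy_* name and the live_queues
counter are hypothetical stand-ins, toy_put_queue() plays the role that
blk_mq_destroy_queue()/blk_put_queue() would play, and the budget_fails
flag simulates sbitmap_init_node() failing.  Whether that pair is the
right cleanup in the real scsi_alloc_sdev() is exactly the open question
above.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins, NOT the kernel structures. */
struct toy_queue { int refcount; };
struct toy_sdev  { struct toy_queue *request_queue; };

/* Number of queues allocated and not yet freed. */
static int live_queues;

static struct toy_queue *toy_init_queue(void)
{
	struct toy_queue *q = malloc(sizeof(*q));

	if (!q)
		return NULL;
	q->refcount = 1;	/* the reference owned by the allocator's caller */
	live_queues++;
	return q;
}

static void toy_put_queue(struct toy_queue *q)
{
	assert(q->refcount > 0);
	if (--q->refcount == 0) {
		live_queues--;
		free(q);
	}
}

/*
 * budget_fails:  simulate the budget map allocation failing.
 * release_queue: apply the proposed cleanup on that failure exit.
 */
static struct toy_sdev *toy_alloc_sdev(int budget_fails, int release_queue)
{
	struct toy_sdev *sdev = calloc(1, sizeof(*sdev));
	struct toy_queue *q;

	if (!sdev)
		return NULL;
	q = toy_init_queue();
	if (!q) {
		free(sdev);
		return NULL;
	}
	sdev->request_queue = q;

	if (budget_fails) {
		if (release_queue)
			toy_put_queue(q);	/* proposed: drop the queue first */
		/* without the put, q is the last copy of the address */
		free(sdev);
		return NULL;
	}
	return sdev;
}

static void toy_free_sdev(struct toy_sdev *sdev)
{
	toy_put_queue(sdev->request_queue);
	free(sdev);
}
```

With budget_fails set and release_queue clear, the function returns NULL
and live_queues stays bumped, which is the leak described above; with
release_queue set, the counter is unchanged across the failed call.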