On Mon, Nov 14, 2022 at 05:26:32AM +0100, Christoph Hellwig wrote: > Hi Jens, > > this series cleans up the registration of the "queue/" kobject, and given > untangles it from the request_queue refcounting. > > Changes since v1: > - also change the blk_crypto_sysfs_unregister prototype > - add two patches to fix the error handling in blk_register_queue Umm... Do we ever want access to queue parameters of the stuff that has a queue, but no associated gendisk? SCSI tape, for example... Re refcounting: AFAICS, blk_mq_alloc_disk_for_queue() is broken. __alloc_disk_node() consumes queue reference (and stuffs it into gendisk->queue) on success; on failure it leaves the reference alone. E.g. this struct gendisk *__blk_mq_alloc_disk(struct blk_mq_tag_set *set, void *queuedata, struct lock_class_key *lkclass) { struct request_queue *q; struct gendisk *disk; q = blk_mq_init_queue_data(set, queuedata); if (IS_ERR(q)) return ERR_CAST(q); // we hold the initial reference to q disk = __alloc_disk_node(q, set->numa_node, lkclass); if (!disk) { // __alloc_disk_node() failed, we are still holding q blk_mq_destroy_queue(q); blk_put_queue(q); // reference dropped return ERR_PTR(-ENOMEM); } set_bit(GD_OWNS_QUEUE, &disk->state); return disk; // ... and on success, the reference is consumed by disk, which is returned to caller } is fine; however, this struct gendisk *blk_mq_alloc_disk_for_queue(struct request_queue *q, struct lock_class_key *lkclass) { if (!blk_get_queue(q)) return NULL; return __alloc_disk_node(q, NUMA_NO_NODE, lkclass); } can't be right - we might fail in blk_get_queue(), returning NULL with unchanged refcount, we might succeed and return the new gendisk that has consumed the extra reference grabbed by blk_get_queue() *OR* we might grab an extra reference, fail in __alloc_disk_node() and return NULL with refcount on q bumped. No way for caller to tell these failure modes from each other... The callers (both sd and sr) treat both as "no reference grabbed", i.e. leak the queue refcount if they fail past grabbing the queue. Looks like we should drop the queue if __alloc_disk_node() fails. As in struct gendisk *blk_mq_alloc_disk_for_queue(struct request_queue *q, struct lock_class_key *lkclass) { struct gendisk *disk; if (!blk_get_queue(q)) return NULL; disk = __alloc_disk_node(q, NUMA_NO_NODE, lkclass); if (!disk) blk_put_queue(q); return disk; } Objections?