Bart Van Assche <bvanassche@xxxxxxx> writes: > On 2020-04-05 18:27, Eric Sandeen wrote: >> The thing I can't figure out from reading the change log is >> >> 1) what the root cause of the problem is, and >> 2) how this patch fixes it? > > I think that the root cause is that do_blk_trace_setup() uses > debugfs_lookup() and that debugfs_lookup() may return a pointer > associated with a previous incarnation of the block device. That's correct, the debugfs_lookup() can find a previous incarnation's dir of the same name which is about to get removed from a not yet schedule work. I.e. something like the following is possible: LOOP_CTL_DEL(loop0) /* schedule __blk_release_queue() work_struct */ LOOP_CTL_ADD(loop0) /* debugfs_create_dir() from * blk_mq_debugfs_register() fails with EEXIST */ BLKTRACE_SETUP(loop0) /* debugfs_lookup() finds the directory about to * get deleted and blktrace files will be created * thereunder. */ The work_struct gets scheduled and the debugfs dir debugfs_remove()ed recursively, which includes the blktrace files just created. blktrace's dentry pointers are now dangling and there will be a UAF when it attempts to delete those again. Luis' patch [2/3] fixes the issue of the debugfs_lookup() from blk_mq_debugfs_register() potentially returning an existing directory associated with a previous block device incarnation of the same name and thus, fixes the UAF. However, the problem that the debugfs_create_dir() from blk_mq_debugfs_register() in a sequence of LOOP_CTL_DEL(loop0) LOOP_CTL_ADD(loop0) could silently fail still remains. The RFC patch [3/3] from Luis attempts to address this issue by folding the delayed __blk_release_queue() work back into blk_release_queue(), the release handler associated with the queue kobject and executed from the final blk_queue_put(). However, that's still no full solution, because the kobject release handler can run asynchronously, long after blk_unregister_queue() has returned (c.f. also CONFIG_DEBUG_KOBJECT_RELEASE). Note that I proposed this change here (internally) as a potential cleanup, because I missed the kobject_del() from blk_unregister_queue() and *wrongly* concluded that blk_queue_put() must be allowed to sleep nowadays. However, that kobject_del() is in place and moreover, the analysis requested by Bart (c.f. [1] in this thread) revealed that there are indeed a couple of sites calling blk_queue_put() from atomic context. So I'd suggest to drop patch [3/3] from this series and modify this patch [2/3] here to move the blk_q_debugfs_unregister(q) invocation from __blk_release_queue() to blk_unregister_queue() instead. > Additionally, I think the following changes fix that problem by using > q->debugfs_dir in the blktrace code instead of debugfs_lookup(): That would fix the UAF, but !queue_is_mq() queues wouldn't get a debugfs directory created for them by blktrace anymore? Thanks, Nicolai [1] https://lkml.kernel.org/r/87o8saj62m.fsf@xxxxxxx > [ ... ] > --- a/kernel/trace/blktrace.c > +++ b/kernel/trace/blktrace.c > @@ -311,7 +311,6 @@ static void blk_trace_free(struct blk_trace *bt) > debugfs_remove(bt->msg_file); > debugfs_remove(bt->dropped_file); > relay_close(bt->rchan); > - debugfs_remove(bt->dir); > free_percpu(bt->sequence); > free_percpu(bt->msg_data); > kfree(bt); > [ ... ] > @@ -509,21 +510,19 @@ static int do_blk_trace_setup(struct request_queue > *q, char *name, dev_t dev, > > ret = -ENOENT; > > - dir = debugfs_lookup(buts->name, blk_debugfs_root); > - if (!dir) > - bt->dir = dir = debugfs_create_dir(buts->name, blk_debugfs_root); > - > bt->dev = dev; > atomic_set(&bt->dropped, 0); > INIT_LIST_HEAD(&bt->running_list); > > ret = -EIO; > - bt->dropped_file = debugfs_create_file("dropped", 0444, dir, bt, > + bt->dropped_file = debugfs_create_file("dropped", 0444, > + q->debugfs_dir, bt, > &blk_dropped_fops); > [ ... ] > > Bart. -- SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg), GF: Felix Imendörffer