Re: [PATCH v4 05/10] blk-mq: Unregister debugfs attributes earlier

Bart Van Assche <Bart.VanAssche@xxxxxxxxxxx> · Mon, 24 Apr 2017 17:34:46 +0000

On Mon, 2017-04-24 at 10:29 -0700, Omar Sandoval wrote:
> On Mon, Apr 24, 2017 at 10:26:15AM -0700, Omar Sandoval wrote:
> > On Mon, Apr 24, 2017 at 05:24:13PM +0000, Bart Van Assche wrote:
> > > On Mon, 2017-04-24 at 10:17 -0700, Omar Sandoval wrote:
> > > > On Mon, Apr 24, 2017 at 05:12:05PM +0000, Bart Van Assche wrote:
> > > > > On Mon, 2017-04-24 at 09:55 -0700, Omar Sandoval wrote:
> > > > > > On Fri, Apr 21, 2017 at 04:40:21PM -0700, Bart Van Assche wrote:
> > > > > > > @@ -566,6 +566,11 @@ void blk_cleanup_queue(struct request_queue *q)
> > > > > > >  	spin_lock_irq(lock);
> > > > > > >  	if (!q->mq_ops)
> > > > > > >  		__blk_drain_queue(q, true);
> > > > > > > +	spin_unlock_irq(lock);
> > > > > > > +
> > > > > > > +	blk_mq_debugfs_unregister_mq(q);
> > > > > > > +
> > > > > > > +	spin_lock_irq(lock);
> > > > > > >  	queue_flag_set(QUEUE_FLAG_DEAD, q);
> > > > > > >  	spin_unlock_irq(lock);
> > > > > > 
> > > > > > Do we actually have to hold the queue lock when we set QUEUE_FLAG_DEAD?
> > > > > 
> > > > > It's way easier to keep that spin_lock()/spin_unlock() pair than to analyze
> > > > > the block driver core and all block drivers to see whether or not any
> > > > > concurrent queue flag changes could occur.
> > > > 
> > > > Ah, I didn't realize that queue_flag_set() did a non-atomic set. I'm
> > > > wondering if anything bad could happen if something raced between when
> > > > we drop the lock and regrab it. Maybe just move the
> > > > blk_mq_debugfs_unregister_mq() before we grab the lock the first time
> > > > instead?
> > > 
> > > That would have the disadvantage that debugfs attributes would be unregistered
> > > before __blk_drain_queue() is called and hence that these debugfs attributes
> > > would not be available to debug hangs in queue draining for traditional block
> > > layer queues ...
> > 
> > True, this is probably fine, then.
> 
> Actually, if we drop this lock, then for non-mq, can't we end up with
> some I/O's sneaking in between when we drain the queue and mark it dead
> while the lock is dropped?

That's a good question but I don't think so. Queuing new I/O after a queue
has been marked as "dying" is not allowed. For both blk-sq and blk-mq queues
blk_get_request() returns -ENODEV if that function is called after the "dying"
flag has been set.

Bart.