Re: [PATCH 3/5] blktrace: refcount the request_queue during ioctl

Luis Chamberlain <mcgrof@xxxxxxxxxx> · Thu, 16 Apr 2020 01:12:47 +0000

On Wed, Apr 15, 2020 at 07:18:22AM -0700, Bart Van Assche wrote:
> On 2020-04-15 05:34, Luis Chamberlain wrote:
> > On Wed, Apr 15, 2020 at 12:14:25AM -0700, Christoph Hellwig wrote:
> >> Btw, Isn't blk_get_queue racy as well?  Shouldn't we check
> >> blk_queue_dying after getting the reference and undo it if the queue is
> >> indeed dying?
> > 
> > Yes that race should be possible:
> > 
> > bool blk_get_queue(struct request_queue *q)                                     
> > {                                                                               
> > 	if (likely(!blk_queue_dying(q))) {
> >        ----------> we can get the queue to go dying here <---------
> > 		__blk_get_queue(q);
> > 		return true;
> > 	}                                                                       
> > 
> > 	return false;
> > }                                                                               
> > EXPORT_SYMBOL(blk_get_queue);
> > 
> > I'll pile up a fix. I've also considered doing a full review of callers
> > outside of the core block layer using it, and maybe just unexporting
> > this. It was originally exported due to commit d86e0e83b ("block: export
> > blk_{get,put}_queue()") to fix a scsi bug, but I can't find such
> > respective fix. I suspect that using bdgrab()/bdput() seems more likely
> > what drivers should be using. That would allow us to keep this
> > functionality internal.
> 
> blk_get_queue() prevents concurrent freeing of struct request_queue but
> does not prevent concurrent blk_cleanup_queue() calls.

Wouldn't concurrent blk_cleanup_queue() calls be a bug? If so, should
I make it clear that they are, or simply prevent them?

> Callers of
> blk_get_queue() may encounter a change of the queue state from normal
> into dying any time during the blk_get_queue() call or after
> blk_get_queue() has finished. Maybe I'm overlooking something but I
> doubt that modifying blk_get_queue() will help.

Good point. To fix the race Christoph described we'd have to take the
request_queue refcount into consideration and prevent a queue from
changing state to dying while the refcount is > 1. However, that would
also mean never allowing the request_queue to die.

One way to resolve this could be to keep track of a quiesce/dying
request, from that point on prevent blk_get_queue() from taking new
references, and once the refcount is down to 1, flip the switch to
dying.
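
To make the idea concrete, here is a rough userspace model of it, not
kernel code and not a proposed patch. Everything in it (struct
queue_model, the qm_*() helpers, the QM_* bit layout) is hypothetical
and made up for illustration. It packs the refcount and a "dying
requested" bit into one atomic word, so the check and the increment in
the get path happen as a single compare-and-swap:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical layout: one flag bit above the reference count. */
#define QM_DYING_REQUESTED	0x40000000u
#define QM_REF_MASK		(QM_DYING_REQUESTED - 1)

/* Userspace stand-in for a request_queue; not the kernel structure. */
struct queue_model {
	atomic_uint state;	/* flag bit | refcount */
};

/* The owner starts out holding the only reference. */
static void qm_init(struct queue_model *q)
{
	atomic_init(&q->state, 1u);
}

/*
 * Models the get path: check and increment are one atomic step, so a
 * dying request cannot slip in between them.
 */
static bool qm_get(struct queue_model *q)
{
	unsigned int old = atomic_load(&q->state);

	do {
		if (old & QM_DYING_REQUESTED)
			return false;	/* no new references allowed */
	} while (!atomic_compare_exchange_weak(&q->state, &old, old + 1));

	return true;
}

/* Drop a reference taken with qm_get(); the owner's ref stays. */
static void qm_put(struct queue_model *q)
{
	unsigned int old = atomic_fetch_sub(&q->state, 1u);

	assert((old & QM_REF_MASK) > 1);
}

/* Record the dying request; existing references remain valid. */
static void qm_request_dying(struct queue_model *q)
{
	atomic_fetch_or(&q->state, QM_DYING_REQUESTED);
}

/* "Flip the switch": dying once requested and only the owner's ref is left. */
static bool qm_is_dying(struct queue_model *q)
{
	return atomic_load(&q->state) == (QM_DYING_REQUESTED | 1u);
}
```

In this model a holder that took its reference before the request keeps
it, later qm_get() calls fail, and the queue only reads as dying after
that holder calls qm_put(). It says nothing about how the real teardown
path would wait for that point.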

  Luis