Re: block: Check that queue is alive in blk_insert_cloned_request()

Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> · Tue, 12 Jul 2011 11:24:54 -0400 (EDT)

On Mon, 11 Jul 2011, Vivek Goyal wrote:

> > > There's still the issue that Stefan Richter pointed out: The test for a 
> > > dead queue must be made _after_ acquiring the queue lock, not _before_.
> > 
> > Yes, quite important.
> > 
> > Jens, can you tweak the patch or should Roland send a v2?
> 
> I do not think that we should do queue dead check after taking a spinlock.
> The reason being that there are life time issues of two objects.
> 
> - Validity of request queue pointer
> - Validity of q->spin_lock pointer
> 
> If the dm has taken the reference to the request queue in the beginning
> then it can be sure request queue pointer is valid. But spin_lock might
> be coming from driver and might be in one of driver allocated structures.
> So it might happen that driver has called blk_cleanup_queue() and freed
> up structures which contained the spin lock.

Surely this is a bug in the design of the block layer?

> So if queue is not dead, we know that q->spin_lock is valid. I think
> only race present here is that whole operation is not atomic. First
> we check for queue not dead flag and then go on to acquire request
> queue lock. So this leaves a small window for race. I think I have
> seen other code written in such manner (__generic_make_request()). So
> it proably reasonably safe to do here too.

"Probably reasonably safe" = "unsafe".  The fact that it will usually
work out okay means that when it does fail, it will be very difficult
to track down.

It needs to be fixed _now_, when people are aware of the issue.  Not 
five years from now, when everybody has forgotten about it.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html