On Thu, 11 Aug 2011, James Bottomley wrote:

> > > Well, it's just hiding the problem.  The essential problem is that only
> > > block has the correctly refcounted knowledge to know the last release of
> > > the queue reference.  Until that time, the holder of the reference can
> > > use the queue regardless of whether blk_cleanup_queue() has been called.
> > > This is the race you complain about since use of the queue involves the
> > > lock which should be guarded by QUEUE_DEAD checks.
> > >
> > > This is essentially unfixable with function calls.  The only way to fix
> > > it is to have a callback model for freeing the external lock.
> >
> > Assuming the queue is associated with a device, the queue could take a
> > reference to the device, dropping that reference when the queue is
> > freed.  Then the lock could safely be freed at the same time as the
> > device.
>
> If that assumption is correct, there's no point refcounting the queue at
> all because its use is entirely subordinated to the lifecycle of the
> associated device.

That's true.  Why wasn't it done that way originally?  Are there queues
that aren't associated with devices?

> Plus all the wittering about my previous patch is
> pointless, because blk_cleanup_queue() has to do the final put of the
> queue in the lock free path (otherwise the assumption is violated).
>
> However, much as I'd like to accept this rosy view, the original oops
> that started all of this in 2.6.38 was someone caught something with a
> reference to a SCSI queue after the device release function had been
> called.

Not according to your commit log.  You wrote that the reference was
taken after scsi_remove_device() had been called -- but the device
release function is scsi_device_dev_release_usercontext().

Alan Stern

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel