On Wed, 26 Mar 2008 21:18:02 -0500
Mike Christie <michaelc@xxxxxxxxxxx> wrote:

> Mike Christie wrote:
> > FUJITA Tomonori wrote:
> >> On Wed, 26 Mar 2008 07:36:26 -0700
> >> James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >>
> >>> On Wed, 2008-03-26 at 23:22 +0900, FUJITA Tomonori wrote:
> >>>> On Sat, 22 Mar 2008 11:06:00 -0500
> >>>> James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >>>>
> >>>>> On Tue, 2008-03-11 at 00:36 -0500, Mike Christie wrote:
> >>>>>> Mike Christie wrote:
> >>>>>>> Pete Wyckoff wrote:
> >>>>>>>> I think this used not to happen; not sure. But I changed two
> >>>>>>>> things
> >>>>>>> This most likely did not happen before 2.6.25-rc* or it broke in
> >>>>>>> slightly different ways, because iscsi used to try and do
> >>>>>>>
> >>>>>>> echo 1 > /sys/block/sdX/device/delete
> >>>>>>>
> >>>>>>> from userspace instead of calling scsi_remove_target from the
> >>>>>>> kernel.
> >>>>>>>
> >>>>>>> As you know around 2.6.21, the behavior of doing the echo to the
> >>>>>>> delete file changed due to a driver model and scsi change and
> >>>>>>> that broke the iscsi tools. The iscsi tools userspace removal was
> >>>>>>> sort of a hack in the first place and was racy, so we switched to
> >>>>>>> removing devices/target like the FC class.
> >>>>>>>
> >>>>>>>
> >>>>>>>> lately. 2.6.25-rc1 to -rc4 and fedora 8 iscsi-initiator-utils
> >>>>>>>> (865) to fedora devel (868). Bidi and varlen patches always too.
> >>>>>>>>
> >>>>>>>> I'll follow with some more variations on this theme. Looks like
> >>>>>>>> bsg needs to protect more carefully against the device going
> >>>>>>>> away. Any ideas how best to do this? What was the approach in sg?
> >>>>>>>>
> >>>>>>> I think sg is broken in similar ways. The iser guys have some
> >>>>>>> test cases that have broken sg while IO is outstanding. I am
> >>>>>>> ccing Erez.
> >>>>>> Actually one of the problems looks a little different than some of
> >>>>>> the problems hit with sg and are caused because we remove the bsg
> >>>>>> device too soon. I think we want to wait until all the references
> >>>>>> from the commands/requests are released. The attached patch
> >>>>>> (untested) moves the bsg unreg call to the scsi device release fn.
> >>>>> Well, this fix is now upstream. However, it's causing all our
> >>>>> scsi_devices never to get released, which is a serious regression.
> >>>>> We're also doing spurious bsg_unregister_queue() for things that never
> >>>>> actually registered one (all scan devices that return DID_NO_CONNECT),
> >>>>> but bsg doesn't seem to be complaining about this.
> >>>>>
> >>>>> The essence of the problem is that bsg_register_queue() takes a ref to
> >>>>> the sdev_gendev, so you can't move bsg_unregister_queue() into the
> >>>>> release function because nothing ever puts bsg's device ref and so
> >>>>> release is never called.
> >>>>>
> >>>>> Options for fixing this before 2.6.25 are
> >>>>>
> >>>>>      1. revert the patch
> >>>>>      2. Do an additional put for the bsg reference in
> >>>>>         __scsi_remove_device (patch below). It's nasty but it
> >>>>>         preserves the semantics and does what you want
> >>>> After some investigation, this patch doesn't fix the bug that Pete
> >>>> reported (I'll send a new patch shortly).
> >>>>
> >>>> Can you revert the commit 4b6f5b3a993cbe34b4280f252bccc76967c185c8
> >>>> instead of merging this?
> >>> Sure ... I didn't like the hack either. As long as iSCSI is fine with
> >>> the reversion it's the quickest way to fix the problem.
> >>
> >> How about this? With the commit reversion, I confirmed that this patch
> >> fixes the first bug that Pete reported:
> >>
> >> http://marc.info/?l=linux-scsi&m=120508166505141&w=2
> >>
> >> I suspect that this could fix the rest too.
> >>
> >> =
> >> From: FUJITA Tomonori <fujita.tomonori@xxxxxxxxxxxxx>
> >> Subject: [PATCH] bsg: takes a ref to struct device in fops->open
> >>
> >> bsg_register_queue() takes a ref to struct device that a caller
> >> passes. For example, it takes a ref to the sdev_gendev with scsi
> >> devices. However, bsg doesn't take a ref to it in fops->open. So
> >> while an application opens a bsg device, the scsi device that the bsg
> >> device holds can go away (bsg also takes a ref to a queue, but it
> >> doesn't prevent the device from going away).
> >>
> >> With this, bsg takes a ref to struct device in fops->open and frees it
> >> in fops->release.
> >>
> >> Note that bsg doesn't need to take a ref to a queue for SCSI devices
> >> at least. I think that it would be better to remove the code but I let
> >> it alone for now.
> >>
> >
> > Why does bsg_add_device do kobject_get instead of blk_get_queue?
> >
> > It seems like if we added a blk_get_queue when we opened the device and
> > a blk_put_queue when bsg_release is called we could remove the
> > get/put_device calls. I am not sure if that is cleaner or not. I was
> > just thinking that bsg goes from bsg->request_queue->scsi_device so
> > maybe it should not worry about the device.
>
> Doh, I guess we sort of do this today. It looks like the blk_execute
> functions are bypassing the QUEUE_FLAG_DEAD checks, so
> scsi_device_dev_release_usercontext could have called scsi_free_queue,
> but if bsg calls blk_execute at the same time then it could stick a
> request into the queue and end up calling the scsi_request_fn (maybe
> this is what happens in #2 and when scsi_request_fn calls get_device we
> get that weird error since the refcount on the device is zero).

With the patch, the device is still held, so a command is rejected
properly by scsi_prep_state_check.
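
To make the reference-counting argument concrete, here is a minimal
userspace sketch in C. It is not kernel code and not the patch itself:
toy_device, toy_handle and every toy_* function are hypothetical names
standing in for struct device, an open bsg file handle, get_device() /
put_device(), and the fops->open / fops->release hooks. It only models
the idea that a reference taken at open time keeps the device from being
released until the handle is closed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device: just a reference count. */
struct toy_device {
    int refcount;
    bool released;      /* set once the last reference is dropped */
};

/* Hypothetical stand-in for an open bsg file handle. */
struct toy_handle {
    struct toy_device *dev;
};

/* The creator of the device holds the initial reference. */
void toy_device_init(struct toy_device *dev)
{
    dev->refcount = 1;
    dev->released = false;
}

/* Models get_device(): take one more reference. */
struct toy_device *toy_get_device(struct toy_device *dev)
{
    dev->refcount++;
    return dev;
}

/* Models put_device(): the final put stands in for the release callback. */
void toy_put_device(struct toy_device *dev)
{
    if (--dev->refcount == 0)
        dev->released = true;
}

/* Models fops->open: pin the device for the lifetime of the handle. */
bool toy_open(struct toy_handle *h, struct toy_device *dev)
{
    if (dev->released)
        return false;   /* too late, the device is already gone */
    h->dev = toy_get_device(dev);
    return true;
}

/* Models fops->release: drop the reference taken in toy_open(). */
void toy_release(struct toy_handle *h)
{
    toy_put_device(h->dev);
    h->dev = NULL;
}

/* True while the handle still points at a device that was not released. */
bool toy_handle_device_alive(const struct toy_handle *h)
{
    return h->dev != NULL && !h->dev->released;
}
```

In this model, if the creator drops its reference with toy_put_device()
while a handle is open, the device stays alive and is only released when
toy_release() runs. Without the get in toy_open(), that same put would
release the device underneath the open handle.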