> -----Original Message----- > From: Menny_Hamburger@xxxxxxxx [mailto:Menny_Hamburger@xxxxxxxx] > Sent: Tuesday, February 01, 2011 2:03 AM > To: Moger, Babu > Cc: dm-devel@xxxxxxxxxx > Subject: RE: [PATCH]: scsi_dh_rdac: fix BUG_ON in get_rdac_data > from within send_mode_select > > Hi Babu, > > I ran the tests with my earlier patch intact. > > I reviewed your patches and I can see how they solve the same issue. > I also acknowledge that we have two different approaches here: > 1) The patches you suggest solves the issue in a wider range (for all SCSI > device handlers). > The patches are quite complex (several pieces of code) so the potential > for bugs is higher, and they will require a wider range of testing. > 2) The patch I submitted solves this issue only in the RDAC hardware > driver. > Since we do have time to market issues involved, if the patch has no > logical flaws and solves the problem locally I think I will apply it > on our side. > > I would like to suggest these action items: > 1) After you modify the patches to fit the current code, I will test the > patches here with our RDAC environment that seems quite adequate for > detecting > these problems. > 2) Please review my patch and see if it has any logical flaws. I tested > the patch in our test scenario a few hundreds of times and it > I seems to be doing what it's supposed to. > > Best Regards, > Menny > This panic comes from get_rdac_data, which is called from within the > send_mode_select code. > My analysis picked up the following problem: > The RDAC detach code nullifies the scsi_dh_data before flushing the mode > select workqueue, and as result a pending send_mode_select will BUG_ON in > get_rdac_data. > The following patch (tested over RHEL54) flushes the workqueue before > setting scsi_dh_data to NULL: Your code looks bit different than what I have. I don't have RHEL5.4. I looked at RHEL 5.3. I don't see get_rdac_data in send_mode_select. However, I got the logic behind your code. Your logic helps in cases where get_rdac_data is used in the same context as mode select submission. But, if you use get_rdac_data in other places then this code does not help. I see get_rdac_data in rdac_activate which is used before send_mode_select. > diff -r -U 2 a/drivers/scsi/device_handler/scsi_dh_rdac.c > b/drivers/scsi/device_handler/scsi_dh_rdac.c > --- a/drivers/scsi/device_handler/scsi_dh_rdac.c 2011-01-30 > 11:02:31.271426000 +0200 > +++ b/drivers/scsi/device_handler/scsi_dh_rdac.c 2011-01-30 > 11:02:31.327069000 +0200 > @@ -853,8 +853,22 @@ > spin_lock_irqsave(sdev->request_queue->queue_lock, flags); > scsi_dh_data = retrieve_scsi_dh_data(sdev); > - store_scsi_dh_data(sdev, NULL); > spin_unlock_irqrestore(sdev->request_queue->queue_lock, flags); > > h = (struct rdac_dh_data *) scsi_dh_data->buf; > + if (h->ctlr) { > + int flush; > + > + spin_lock(&h->ctlr->ms_lock); > + flush = (h->ctlr->ms_sdev == sdev); > + spin_unlock(&h->ctlr->ms_lock); > + > + if (flush) > + flush_workqueue(kmpath_rdacd); > + } > + > + spin_lock_irqsave(sdev->request_queue->queue_lock, flags); > + store_scsi_dh_data(sdev, NULL); > + spin_unlock_irqrestore(sdev->request_queue->queue_lock, flags); > + > if (h->ctlr) > kref_put(&h->ctlr->kref, release_controller); > > > > The patch ensures that we run flush only when detaching the device the > "owns" the send_mode_select (ms_sdev). > > Menny > > -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel