On Tue, Jul 05, 2011 at 10:48:00PM +0530, Desai, Kashyap wrote: > > -----Original Message----- > > From: Matthew Wilcox [mailto:willy@xxxxxxxxxxxxxxx] > > Sent: Tuesday, July 05, 2011 8:52 PM > > To: Desai, Kashyap > > Cc: Matthew Wilcox; linux-scsi@xxxxxxxxxxxxxxx; DL-MPT Fusion Linux; > > nab@xxxxxxxxxxxxxxx > > Subject: Re: [PATCH] mpt2sas: Remove queuecommand wrapper > > void scsi_eh_wakeup(struct Scsi_Host *shost) > > { > > if (shost->host_busy == shost->host_failed) { > > I think you are right here. > > I did below experiment. > Since now host_lock is removed from dispatch() callback. > I added sleep() inside scsih_qcmd().. Just to mimic preemption. And called Host reset using > "Sg_reset -h" at the sametime when driver is sleeping inside qcmd(). I see the kernel crash due to invalid scsi command pointer access after driver comes out from sleep(). OK, but I don't think we've introduced a new race here, just expanded the window. If you follow the path down from userspace, we get to scsi_nonblockable_ioctl() (doesn't take the host_lock), then scsi_reset_provider (which takes and releases the host_lock to set tmf_in_progress), then scsi_try_host_reset() which calls eh_host_reset_handler() without holding the host_lock. So it's entirely possible to hit the same race on an SMP machine already as there's nothing to synchronise the eh_host_reset_handler call with queuecommand. I think someone who uses sg_reset ought to be aware of the possibility that they can trash their system. Doug, what do you think? > As you said, due to actual Error handling will not wake up, but if user/customer do testing using sg_tools > They can kick off Host reset on demand. > > What is your input on my above test case/observation ? > > > trace_scsi_eh_wakeup(shost); > > wake_up_process(shost->ehandler); > > > > So the SCSI error handler doesn't run until host_busy == host_failed. > > host_busy is incremented in scsi_request_fn() (before scsi_dispatch_cmd > > is called). host_failed is incremented in scsi_eh_scmd_add(). So after > > a command has been issued, it prevents the EH from running until it > > too fails. That means queuecommand cannot be running at the same time > > as the EH. > > > > That's documented in Documentation/scsi/scsi_eh.txt section [1-3]. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html