On Fri, 2010-09-24 at 16:10 -0500, Brian King wrote: > On 09/24/2010 03:44 PM, Nicholas A. Bellinger wrote: > > On Fri, 2010-09-24 at 08:41 -0500, Brian King wrote: > >> On 09/23/2010 06:37 PM, Nicholas A. Bellinger wrote: > >>> @@ -651,7 +655,6 @@ static inline void scsi_cmd_get_serial(struct Scsi_Host *host, struct scsi_cmnd > >>> int scsi_dispatch_cmd(struct scsi_cmnd *cmd) > >>> { > >>> struct Scsi_Host *host = cmd->device->host; > >>> - unsigned long flags = 0; > >>> unsigned long timeout; > >>> int rtn = 0; > >>> > >>> @@ -736,15 +739,11 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd) > >>> scsi_done(cmd); > >>> goto out; > >>> } > >>> - > >>> - spin_lock_irqsave(host->host_lock, flags); > >>> /* > >>> - * AK: unlikely race here: for some reason the timer could > >>> - * expire before the serial number is set up below. > >>> - * > >>> - * TODO: kill serial or move to blk layer > >>> + * Note that scsi_cmd_get_serial() used to be called here, but > >>> + * now we expect the legacy SCSI LLDs that actually need this > >>> + * to call it directly within their SHT->queuecommand() caller. > >>> */ > >>> - scsi_cmd_get_serial(host, cmd); > >>> > >>> if (unlikely(host->shost_state == SHOST_DEL)) { > >>> cmd->result = (DID_NO_CONNECT << 16); > >>> @@ -753,7 +752,7 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd) > >>> trace_scsi_dispatch_cmd_start(cmd); > >>> rtn = host->hostt->queuecommand(cmd, scsi_done); > >>> } > >>> - spin_unlock_irqrestore(host->host_lock, flags); > >>> + > >>> if (rtn) { > >>> trace_scsi_dispatch_cmd_error(cmd, rtn); > >>> if (rtn != SCSI_MLQUEUE_DEVICE_BUSY && > >> > >> Are you planning a future revision that moves the acquiring of the host lock > >> into the LLDD's queuecommand for all the other drivers you don't currently > >> touch in this patch set? > >> > > > > Hi Brian, > > > > I was under the impression that this would be unnecessary for the vast > > majority of existing LLD drivers, but if you are aware of specific LLDs > > that would still need the struct Scsi_Host->host_lock held in their > > SHT->queuecommand() for whaterver reason please let me know and I would > > be happy to include this into an RFCv4. > > I would think that most drivers might have issues without some pretty careful > auditing. When Christoph did this for the EH handlers, the first step was to > simply move acquiring the host lock into the LLDs. That way we can optimize > drivers one at a time after ensuring they can run lockless in their queuecommand > handler. > > A couple examples of possible issues with drivers I'm familiar with (ibmvfc, ipr): > > * Some drivers will do list manipulation for resources needed to send commands. If > done lockless, this could result in list corruption with multiple readers/writers. > > * Some drivers check the state of the hardware before sending a command. Failing to > do this when the hardware is being reset may result in nasty things like PCI bus > errors or even sending a command to the wrong device. > Indeed, I can very much see some older LLDs making these types of assumptions. > These are all the sorts of errors that will be very difficult to hit but have > pretty bad consequence when they are hit. > I think pretty bad would be an under-statement when running into either of the above two items in ancient LLD code. In that case, I will re-spin a v4 series that contains a legacy host_lock held w/ SHT->queuecomand() for all of the "historically high host_lock optimized in ->queuecommand()" LLDs that are in RFCv3, and include the other specific ones (namely mpt/fusion and mpt2sas) that we know are safe to drop host_lock. Many thanks for your invaluable input on some of the non-obvious items at play here. Best, --nab -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html