On Tue, 2014-08-19 at 10:30 -0500, scameron@xxxxxxxxxxxxxxxxxx wrote: > > Is it permitted for a LLD to call scsi_done() for a command which > has been or is attempting to be aborted via the LLD abort error handler > function? > > I am looking at the scsi mid layer code, and I'm a bit confused. Hannes asked a similar question; does this help: http://marc.info/?l=linux-scsi&m=140267838320488 James > There seem to be a few ways the LLD abort handler function can > get called. > > One path: > > scsi_times_out() calls scsi_abort_command() which queues > cmd->abort_work in 10ms. This will cause scmd_eh_abort_handler() > to be called, which can call scsi_try_to_abort_cmd() which > calls the LLD abort error handler function. If the LLD abort > error handler function returns SUCCESS, then scmd_eh_abort_handler() > may requeue the command via scsi_queue_insert() re-using the same > scsi_cmnd pointer. Or, it may call scsi_finish_cmnd() on that > scsi_cmnd. > > Both of those actions, requeueing the same scsi command, or calling > scsi_finish_cmnd(), seem incompatible with the LLD also having > called scsi_done() on the command. > > I think, scsi_done eventually frees the scsi_cmnd... > > scsi_done() -> blk_complete_request -> __blk_complete_request -> > -> raise_softirq_irqoff -> ??? -> scsi_softirq_done() -> > -> scsi_finish_cmnd() -> scsi_io_completion -> scsi_end_request() > -> scsi_mq_uninit_cmd() ... etc. > > It seems like either calling scsi_finish_cmnd() too many times (once via > LLD calling scsi_done, and once via scmd_eh_abort_handler calling > scsi_finish_cmnd()), or calling scsi_done() via LLD concurrently with > re-using that same scsi command (via scmd_eh_abort_handler calling > scsi_queue_insert()) would both be bad things. Correct? > > So, without completely understanding it, it seems like if the LLD > abort handler is going to return SUCCESS, it must be sure that it > hasn't already called scsi_done() on the command. Or to put it > another way, the LLD must make sure there is not a race between > the abort handler and the interrupt handler, and the instant > the LLD abort handler is called for a command, calling scsi_done() > for that command is no longer an option for the LLD. But, I do > not see how to synchronize that. > > I hope I'm missing something here, because I am having a hard time > seeing how to eliminate the race between the LLD abort handler and > command completion in the LLD, esp. since now (due to my hardware's > specific weirdness) I may in the interrupt handler requeue the command > internally in a workqueue within the LLD for submission to HW a 2nd > time before completion back to the mid layer under some circumstances > (e.g. fast path failed, try slower but more robust RAID stack path.) > > But, then there is this comment attached to scsi_times_out() which > makes me feel better: > > * Notes: > * We do not need to lock this. There is the potential for a race > * only in that the normal completion handling might run, but if the > * normal completion function determines that the timer has already > * fired, then it mustn't do anything. > */ > > So, perhaps that is the answer I'm looking for. If the comment is > correct and my understanding of it is correct, then the LLD *is* free > to call scsi_done() even though it races with the LLD abort handler? > > I'm not finding the code that implements what that comment claims though. > Is it in the block layer code? > > -- steve > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html