On 03/08/2016 08:58 PM, ygardi@xxxxxxxxxxxxxx wrote: >> On 03/08/2016 02:01 PM, Hannes Reinecke wrote: >>> On 03/08/2016 01:35 PM, Yaniv Gardi wrote: >>>> A race condition exists between request requeueing and scsi layer >>>> error handling: >>>> When UFS driver queuecommand returns a busy status for a request, >>>> it will be requeued and its tag will be freed and set to -1. >>>> At the same time it is possible that the request will timeout and >>>> scsi layer will start error handling for it. The scsi layer reuses >>>> the request and its tag to send error related commands to the device, >>>> however its tag is no longer valid. >>>> As this request was never really sent to the device, there is no >>>> point to start error handling with the device. >>>> Implement the scsi error handling timeout callback and bypass SCSI >>>> error handling for request that were not actually sent to the device. >>>> For such requests simply reset the block layer timer. Otherwise, let >>>> SCSI layer perform the usual error handling. >>>> >>>> Reviewed-by: Dolev Raviv <draviv@xxxxxxxxxxxxxx> >>>> Signed-off-by: Gilad Broner <gbroner@xxxxxxxxxxxxxx> >>>> Signed-off-by: Yaniv Gardi <ygardi@xxxxxxxxxxxxxx> >>>> >>>> --- >>>> drivers/scsi/ufs/ufshcd.c | 36 ++++++++++++++++++++++++++++++++++++ >>>> 1 file changed, 36 insertions(+) >>>> >>> Having a timeout handler is always a good idea, even though this >>> doesn't do anything here. >>> Are we sure that the requests will return eventually? >>> Does the UFS spec provide for a command abort? >>> >> In fact, looking at the UFS spec there _is_ a command abort. >> I would recommend implementing a task management request UPIO with >> type 'ABORT TASK' here for any task found to be pending. >> In the end, you might run into a _valid_ timeout, at which point you >> really want to abort the command... >> > > but this is not what we'd like to achieve. > we don't want to abort a task that was not even dispatched to the UFS driver. > in those cases we need to re-queue the request and reset the timer. > Fully understood. > Hannes, i appreciate your time, but I really don't understand why you > insist on coming up with suggestions, when we already implemented one that > is working. more over, your solution doesn't fix the race condition which is the > reason for this patch. > as i don't have HW to test anything at the moment, I think it's better to > stick with this solution that also fix the BUG and also was verified and > tested. > Ah. Didn't know that. I was under the impression that you _had_ the hardware available. If not then of course it's not easy to verify anything. So, all things considered: Reviewed-by: Hannes Reinecke <hare@xxxxxxx> Cheers, Hannes -- Dr. Hannes Reinecke Teamlead Storage & Networking hare@xxxxxxx +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html