On Thu, 2016-06-02 at 16:42 +0800, Wei Fang wrote:
> sas_ata_strategy_handler() queues the ata error handler work items
> on system_unbound_wq. This workqueue runs work items asynchronously,
> so the ata error handler can run concurrently on different CPUs.
> In this case, ->host_failed is decremented simultaneously in
> scsi_eh_finish_cmd() on different CPUs, and ends up with a wrong
> value.
>
> This leaves ->host_failed and ->host_busy permanently unequal, so
> the scsi error handler thread is never woken up again. I/O errors
> after that point are never handled.
>
> Since all scmds must have been handled in the strategy handler, just
> remove the decrement in scsi_eh_finish_cmd() and zero ->host_failed
> after the strategy handler to fix this race.
>
> This fixes the problem introduced in
> commit 50824d6c5657 ("[SCSI] libsas: async ata-eh").
>
> Signed-off-by: Wei Fang <fangwei1@xxxxxxxxxx>
> ---
> Changes v1->v2:
> - update Documentation/scsi/scsi_eh.txt about ->host_failed
> Changes v2->v3:
> - don't use atomic type, just zero ->host_failed after the strategy
>   handler

Reviewed-by: James Bottomley <jejb@xxxxxxxxxxxxxxxxxx>
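
For readers following the thread, the race and the fix can be sketched with a small user-space model. This is only an illustration under stated assumptions, not the kernel code: struct model_host and every model_* function are hypothetical names, the wakeup test is assumed to be "->host_failed is non-zero and equals ->host_busy", and the lost update is replayed deterministically instead of using real threads.

```c
#include <assert.h>

/* Hypothetical stand-in for the two counters discussed in the thread. */
struct model_host {
    int host_busy;    /* commands outstanding on the host */
    int host_failed;  /* commands waiting for error handling */
};

/*
 * Replays the pre-patch race: two workers both read ->host_failed
 * before either writes back, so one decrement is lost.
 */
int model_lost_update(int host_failed)
{
    int seen_by_a = host_failed;   /* worker A reads */
    int seen_by_b = host_failed;   /* worker B reads the same value */

    host_failed = seen_by_a - 1;   /* worker A writes */
    host_failed = seen_by_b - 1;   /* worker B overwrites A's result */
    return host_failed;
}

/* Assumed wakeup condition for the error handler thread. */
int model_eh_should_run(const struct model_host *h)
{
    return h->host_failed != 0 && h->host_failed == h->host_busy;
}

/*
 * Models the patched flow: finishing a command no longer touches
 * ->host_failed; the counter is zeroed once after the strategy handler.
 */
void model_eh_pass_fixed(struct model_host *h)
{
    assert(h->host_failed >= 0);
    /* ... the strategy handler would run here, possibly on several CPUs ... */
    h->host_failed = 0;            /* all scmds have been handled */
}

/*
 * One new command fails after an EH pass that left stale_failed behind.
 * Returns 1 if the error handler would be woken, 0 if it stays asleep.
 */
int model_next_failure_wakes_eh(int stale_failed)
{
    struct model_host h = { .host_busy = 1, .host_failed = stale_failed + 1 };

    return model_eh_should_run(&h);
}
```

In this model, a stale count of 1 left behind by model_lost_update(2) makes the next failure see ->host_failed == 2 against ->host_busy == 1, so the handler stays asleep. With the counter zeroed after the strategy handler, the same failure gives 1 == 1 and the handler runs.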