On 28 October 2010 13:22, weasel@xxxxxxxx <weasel@xxxxxxxx> wrote:
> scsi_decide_disposition() looking at a command w/ DID_RESET returns
> SUCCESS and then scsi_io_completion() unconditionally retries the
> command. w/ our injector set up to corrupt alternating data frames i can
> induce an 'infinite loop' in the case where the controller's reset works
> and then the scsi READ(10) fails.
>
> anyway, i'd like to avoid this loop regardless of 'odd' controller
> behavior, so i am asking about the two possible fixes i see, and whether
> there's some other way.
>
> - make scsi_decide_disposition()'s handling of DID_RESET just use the
> maybe_retry logic.
>
> however, (being new to this area) scsi_io_completion() calls
> scsi_end_request() in case some data actually got read. i don't see that
> a request w/ DID_RESET set can have data, but if it really can, then...
>
> - copy the retry logic to scsi_io_completion() in the DID_RESET case. using
> cmd->retries seems safe here since it won't have gotten updated in
> scsi_decide_disposition(), but maybe a comment there might be nice.

in the hopes that i can get a fix pushed upstream of us, i've attached
patches for the two alternatives listed above.

i realize that my error is now artificially induced, but it did start off
as a real error where something as simple as 'mount /dev/sdb1 /mnt/foo'
went off into the weeds. yes, the controller can do better, but i think we
already have the infrastructure in place to avoid this.
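
for anyone skimming the thread, here is a rough user-space sketch of the idea behind both alternatives: count each DID_RESET completion against the command's retry budget instead of requeueing unconditionally. this is not the attached patch and not kernel code. the names model_cmd, decide_bounded(), run_until_done(), HOST_RESET and the DISP_* values are all made up for illustration, and reporting DISP_FAILED when the budget runs out is a simplification of what the midlayer actually hands back to the upper level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum disposition { DISP_SUCCESS, DISP_NEEDS_RETRY, DISP_FAILED };
enum host_status { HOST_OK, HOST_RESET };

/* Minimal model of the two counters the retry decision depends on. */
struct model_cmd {
	int retries;	/* retries consumed so far */
	int allowed;	/* retry budget for this command */
};

/*
 * Bounded handling of a reset completion: every HOST_RESET consumes one
 * retry, and the command is failed once the budget is spent.
 */
static enum disposition decide_bounded(struct model_cmd *cmd,
				       enum host_status host)
{
	assert(cmd != NULL && cmd->allowed >= 0);

	if (host == HOST_OK)
		return DISP_SUCCESS;

	if (++cmd->retries <= cmd->allowed)
		return DISP_NEEDS_RETRY;

	return DISP_FAILED;
}

/*
 * Keep reissuing the command while the model asks for a retry, with the
 * device returning the same host status every time. Returns how many
 * times the command was issued before a final disposition was reached.
 */
static int run_until_done(struct model_cmd *cmd, enum host_status always)
{
	int issued = 0;

	for (;;) {
		issued++;
		if (decide_bounded(cmd, always) != DISP_NEEDS_RETRY)
			break;
	}
	return issued;
}
```

with allowed set to 5 and a device that answers HOST_RESET every time, run_until_done() issues the command six times (the first attempt plus five retries) and then stops, rather than looping forever.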
Attachment: scsi_decide_disposition.diff
Attachment: scsi_io_completion.diff