Alan Stern wrote:
> On Sun, 19 Oct 2008, Boaz Harrosh wrote:
>
>> Alan Stern wrote:
>>> We do have a problem with infinite retry loops.  I'm not sure which
>>> kernels are affected, but there's a good chance 2.6.27 is and an
>>> excellent chance that 2.6.28-rc1 will be.
> ...
>
>> Do you have the scsi_io_completion patchset on a public git somewhere?
>> I would like to re-test them and review them again.
>
> They aren't in any git repositories, so I am including the two patches
> as attachments to this message.  The first patch changes the failure
> analysis logic in scsi_io_completion() along the lines suggested by
> James, and the second gets rid of scsi_end_request().  They are based
> roughly on 2.6.27, so they might not apply cleanly after the merge
> window.
>

Thanks. I will apply them in my trees and run with them for a while.
Once the merge window is over, if you resend them (please do), I will
send my Reviewed-by: (I hope to have reviewed them by then).

> Neither patch addresses the infinite-retry problem; I wanted to keep
> the issues separate.
>
>> Did you try them with above problem and do they solve the issue?
>
> At this point I can't remember exactly which combinations I tried!  :-)
> However I don't think these patches will have any effect on the retry
> loop.
>
>> Also have you looked farther into the retries/timeout issues from
>> block layer?
>
> Not yet.  I'm waiting for 2.6.28-rc1 to appear.
>

I just want to make one comment for your consideration, for when you get
to re-examine all this.

Users of SCSI devices, like file systems, /dev/sg, or any other source,
do not directly see scsi-devices per se. Even scsi_execute() will just
issue blk_execute_req commands. At this level, that of block-request
users, there are two user parameters: @retries and @timeout. Whatever
the semantics are:
  a. MAX_TOTAL_TIME=(@retries * @timeout) or
  b. MAX_TOTAL_TIME=(@timeout or @retries, whichever is shorter)
the SCSI-ml should implement that policy.
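
To make the two options concrete, here is a rough sketch of how such a
budget could be checked. All names in it (req_budget, max_total_time,
may_retry, the "ticks" unit) are made up for illustration; this is not
kernel code and not what the SCSI-ml does today. I read option (b) as
"stop at @timeout total or after @retries issues, whichever comes first".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is existing kernel API. */
enum budget_policy {
	BUDGET_RETRIES_TIMES_TIMEOUT,	/* option (a) */
	BUDGET_WHICHEVER_FIRST,		/* option (b), as interpreted above */
};

struct req_budget {
	unsigned int retries;		/* max times the command may be issued */
	unsigned int timeout;		/* caller-supplied timeout, in ticks */
	enum budget_policy policy;
};

/* Upper bound on the time until the caller gets a return code. */
static unsigned int max_total_time(const struct req_budget *b)
{
	assert(b != NULL);
	if (b->policy == BUDGET_RETRIES_TIMES_TIMEOUT)
		return b->retries * b->timeout;
	return b->timeout;
}

/*
 * May the command be issued once more?
 * issued:  how many times it has already been sent to the target
 * elapsed: ticks since the request was first submitted
 * The failure reason is deliberately not an input.
 */
static bool may_retry(const struct req_budget *b,
		      unsigned int issued, unsigned int elapsed)
{
	assert(b != NULL);
	if (issued >= b->retries)
		return false;
	return elapsed < max_total_time(b);
}
```

With @retries=5 and @timeout=30, option (a) gives a budget of 150 ticks
and option (b) a budget of 30; in both cases the fifth issue is the last.
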
So at the end of the day, if an fs sends a request it should take at most
MAX_TOTAL_TIME. Even if a brain-dead device short-circuits the scsi logic,
the time-frame/retries set at the block level should be kept, no matter
the reason. For me that means two things: under no condition should a
transport/target see the same command more than @retries times, and the
MAX_TOTAL_TIME until a user gets a return code, success or failure, is
some constant.

It seems to me that the current scsi-block-device breaks both
assumptions. What it does with @retries and @timeout is beyond me.

> Alan Stern

Again, thanks for looking into this.

Boaz