On Sun, 21 Sep 2008, James Bottomley wrote:

> > For example, suppose a buggy device (without removable media) always
> > replies with UNIT ATTENTION without making any forward progress.
> > We'll just call scsi_requeue_command each time and get stuck.
>
> That's a separate bug from the current one ... fortunately one I don't
> think we've actually seen manifest.
>
> > But I'm still concerned about the possibility of getting stuck doing
> > the same command or request over and over. Both structures have a
> > "retries" field, but I'm not clear on how/where they get used.
>
> Block relies on the lower layers for retry ... it just transmits the
> status, so we get to fix it.

Okay -- I'll keep it in the back of my mind for later...

> > To be honest, I don't know what sort of requests get marked as
> > non-retryable in the block layer. Maybe you're right and we don't need
> > to worry about them.
>
> They tend to be device mapper ones. Anything that wants a fast failure
> to do path switch over for instance.

If all the retry paths in scsi_io_completion jump to a common location,
it will be easy to add the test there.

> > I tested your simple fix, and it does indeed solve the problem of tasks
> > hanging because of an uncompleted request. In view of Boaz's concerns,
> > should this change be postponed until 2.6.27.stable so that it can get
> > wider testing?
>
> We can ... I think it's safe enough though given it only affects
> multiple transaction commands.

The decision's yours. Let me know when and in which tree it is merged, so
I can start writing some patches for a more-complete fix.

Alan Stern