On Sat, 20 Sep 2008, James Bottomley wrote: > > Except that those places don't do the blk_noretry_request test. Should > > they? And even if they do, what's to prevent an infinite retry loop? > > Well, that's another oddity ... the retries occur at a lower level > (scsi_decide_disposition) ... there's no reason to make that check if > all the error paths complete everything. Are you referring to the NEEDS_RETRY case in scsi_softirq_done? Yet another looping mechanism! This one requeues the scsi_cmnd, whereas I was talking about requeuing the request. However either one could in theory lead to unending retries. For example, suppose a buggy device (without removable media) always replies with UNIT ATTENTION without making any forward progress. We'll just call scsi_requeue_command each time and get stuck. > Now if we find an error where > we should be moving on to the next transaction, then we'd need to do > that test. However, I think I've demonstrated so far that there isn't > one. I also don't think we want to make the test for the obviously > retryable conditions like UNIT_ATTENTION because that will cause paths > to flip on irrelevant AEN conditions. To be honest, I don't know what sort of requests get marked as non-retryable in the block layer. Maybe you're right and we don't need to worry about them. But I'm still concerned about the possibility of getting stuck doing the same command or request over and over. Both structures have a "retries" field, but I'm not clear on how/where they get used. > > I take your point. Which leads to the question: Why was the code ever > > calling scsi_end_request(cmd, -EIO, this_count, 1) in the first place? > > Apparently all of these paths should be setting the last argument to 0, > > always. > > Yes, that was the question I asked in my first reply (where I said the > requeue looks superfluous). I think the answer is that there's no > point ... that's why I advocated simply eliminating scsi_end_request() > in favour of either scsi_requeue_request/blk_run_queue or You mean just scsi_requeue_command? There is no scsi_requeue_request. > end_dequeued_request(). Yes. Those new "goto"s in scsi_io_completion could be rearranged to end up calling either scsi_requeue_command or end_dequeued_request. That still leaves the initial call to scsi_end_request; I guess this is where you would move that routine's contents into scsi_io_completion. > So are you happy with the simple fix proposal ... I think rearranging > this code needs more debate and discussion. I tested your simple fix, and it does indeed solve the problem of tasks hanging because of an uncompleted request. In view of Boaz's concerns, should this change be postponed until 2.6.27.stable so that it can get wider testing? Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html