On Thu, Oct 28, 2010 at 2:18 AM, Mike Christie <michaelc@xxxxxxxxxxx> wrote: > On 10/26/2010 06:09 PM, James Bottomley wrote: >> >> On Tue, 2010-10-26 at 18:07 -0500, Mike Christie wrote: >>> >>> On 10/25/2010 03:53 PM, James Bottomley wrote: >>>> >>>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c >>>> index 1de30eb..484b128 100644 >>>> --- a/drivers/scsi/scsi_error.c >>>> +++ b/drivers/scsi/scsi_error.c >>>> @@ -1156,51 +1156,40 @@ static int scsi_eh_target_reset(struct Scsi_Host >>>> *shost, >>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Âstruct list_head *work_q, >>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Âstruct list_head *done_q) >>>> Â { >>>> - Â Â Â struct scsi_cmnd *scmd, *tgtr_scmd, *next; >>>> - Â Â Â unsigned int id = 0; >>>> - Â Â Â int rtn; >>>> + Â Â Â LIST_HEAD(tmp_list); >>>> >>>> - Â Â Â do { >>>> - Â Â Â Â Â Â Â tgtr_scmd = NULL; >>>> - Â Â Â Â Â Â Â list_for_each_entry(scmd, work_q, eh_entry) { >>>> - Â Â Â Â Â Â Â Â Â Â Â if (id == scmd_id(scmd)) { >>>> - Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â tgtr_scmd = scmd; >>>> - Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â break; >>>> - Â Â Â Â Â Â Â Â Â Â Â } >>>> - Â Â Â Â Â Â Â } >>>> - Â Â Â Â Â Â Â if (!tgtr_scmd) { >>>> - Â Â Â Â Â Â Â Â Â Â Â /* not one exactly equal; find the next highest >>>> */ >>>> - Â Â Â Â Â Â Â Â Â Â Â list_for_each_entry(scmd, work_q, eh_entry) { >>>> - Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â if (scmd_id(scmd)> Â id&& >>>> - Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â (!tgtr_scmd || >>>> - Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Âscmd_id(tgtr_scmd)> >>>> scmd_id(scmd))) >>>> - Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â tgtr_scmd = scmd; >>>> - Â Â Â Â Â Â Â Â Â Â Â } >>>> - Â Â Â Â Â Â Â } >>>> - Â Â Â Â Â Â Â if (!tgtr_scmd) >>>> - Â Â Â Â Â Â Â Â Â Â Â /* no more commands, that's it */ >>>> - Â Â Â Â Â Â Â Â Â Â Â break; >>>> + Â Â Â list_splice(work_q,&tmp_list); >>> >>> >>> I think I am hitting multiple bugs. One might be with this patch, but >>> strangely they both hit __list_add bugs in the scsi eh. I think this >>> needs to be list_splice_init(). >> >> I think that's exactly right ... I keep getting tripped up by the _init >> requirements for reusing lists (and list_heads). >> >>>> + Â Â Â Â Â Â Â list_for_each_entry_safe(scmd, next,&tmp_list, eh_entry) >>>> { >>>> + Â Â Â Â Â Â Â Â Â Â Â if (scmd_id(scmd) != id) >>>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â continue; >>>> + >>>> + Â Â Â Â Â Â Â Â Â Â Â if ((rtn == SUCCESS || rtn == FAST_IO_FAIL) >>>> + Â Â Â Â Â Â Â Â Â Â Â && Â (!scsi_device_online(scmd->device) || >>>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Ârtn == FAST_IO_FAIL || >>>> !scsi_eh_tur(scmd))) >>>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â scsi_eh_finish_cmd(scmd, done_q); >>> >>> Because if the target resets work and move all the commands to the >>> done_q here. >>> >>> >>>> + Â Â Â Â Â Â Â Â Â Â Â else >>>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â /* push back on work queue for further >>>> processing */ >>>> + Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â list_move(&scmd->eh_entry, work_q); >>>> + Â Â Â Â Â Â Â } >>>> >>>> Â Â Â Âreturn list_empty(work_q); >>>> Â } >>> >>> >>> Then this work_q is showing garbage in it from the splice I think. >>> Without the list_splice_init I get that trace I cut and pasted in the >>> other mail when I just force the scsi eh to run the target reset code. >>> >>> In the test, the target reset succeeds. The scsi_eh_finish_cmd call >>> above moves all the cmds to the done_q. But the work_q still has junk in >>> it, and so we return that we need to do more work and eventually we end >>> up running scsi_eh_offline_sdevs which calls scsi_eh_finish_cmd on a >>> command we already moved to the done_q. >> >> So does changing to the _init version fix at least the junk done_q >> problem? >> > > Yeah. > Blame me if it is buggy, please. --- --- a/drivers/scsi/scsi_error.c 2010-09-13 07:07:38.000000000 +0800 +++ b/drivers/scsi/scsi_error.c 2010-10-28 21:27:24.000000000 +0800 @@ -1156,51 +1156,42 @@ static int scsi_eh_target_reset(struct S struct list_head *work_q, struct list_head *done_q) { - struct scsi_cmnd *scmd, *tgtr_scmd, *next; + struct scsi_cmnd *scmd, *next; unsigned int id = 0; int rtn; + int success; + LIST_HEAD(head); - do { - tgtr_scmd = NULL; - list_for_each_entry(scmd, work_q, eh_entry) { - if (id == scmd_id(scmd)) { - tgtr_scmd = scmd; - break; - } - } - if (!tgtr_scmd) { - /* not one exactly equal; find the next highest */ - list_for_each_entry(scmd, work_q, eh_entry) { - if (scmd_id(scmd) > id && - (!tgtr_scmd || - scmd_id(tgtr_scmd) > scmd_id(scmd))) - tgtr_scmd = scmd; - } - } - if (!tgtr_scmd) - /* no more commands, that's it */ - break; + while (! list_empty(work_q)) { + scmd = list_entry(work_q->next, struct scsi_cmnd, eh_entry); + id = scmd_id(scmd); SCSI_LOG_ERROR_RECOVERY(3, printk("%s: Sending target reset " "to target %d\n", current->comm, id)); - rtn = scsi_try_target_reset(tgtr_scmd); - if (rtn == SUCCESS || rtn == FAST_IO_FAIL) { - list_for_each_entry_safe(scmd, next, work_q, eh_entry) { - if (id == scmd_id(scmd)) - if (!scsi_device_online(scmd->device) || - rtn == FAST_IO_FAIL || - !scsi_eh_tur(tgtr_scmd)) - scsi_eh_finish_cmd(scmd, - done_q); - } - } else + rtn = scsi_try_target_reset(scmd); + success = (rtn == SUCCESS || rtn == FAST_IO_FAIL); + if (! success) SCSI_LOG_ERROR_RECOVERY(3, printk("%s: Target reset" " failed target: " "%d\n", current->comm, id)); - id++; - } while(id != 0); + list_for_each_entry_safe(scmd, next, work_q, eh_entry) { + if (scmd_id(scmd) != id) + continue; + if (! success) { + list_move(&scmd->eh_entry, &head); + continue; + } + if (!scsi_device_online(scmd->device) || + rtn == FAST_IO_FAIL || + !scsi_eh_tur(scmd)) + scsi_eh_finish_cmd(scmd, done_q); + else + list_move(&scmd->eh_entry, &head); + } + } + list_splice(&head, work_q); return list_empty(work_q); } -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html