Dear Tomonori, I applied the patch. The symptom still exists. I add 2 print statements in the patch. I can't find it in the log after tgtd segfault. Please download the log if you need: http://dl.dropbox.com/u/8354750/tgtd/20110718/messages.gz the code: diff --git a/Makefile b/Makefile index 5df544d..2712b86 100644 --- a/Makefile +++ b/Makefile @@ -1,4 +1,4 @@ -VERSION ?= 1.0.16 +VERSION ?= 1.0.16.0718 CHECK_CC = cgcc CHECK_CC_FLAGS = '$(CHECK_CC) -Wbitwise -Wno-return-void -no-compile $(ARCH)' diff --git a/usr/iscsi/iscsid.c b/usr/iscsi/iscsid.c index ee17ef3..5c6ff34 100644 --- a/usr/iscsi/iscsid.c +++ b/usr/iscsi/iscsid.c @@ -1200,6 +1200,13 @@ static int iscsi_scsi_cmd_done(uint64_t nid, int result, struct scsi_cmd *scmd) struct iscsi_task *task = ITASK(scmd); uint32_t read_len = scsi_get_in_length(scmd); + if (result == TASK_ABORTED) { + eprintf("TASK_ABORTED, remove task %p\n", task); + list_del(&task->c_hlist); + iscsi_free_task(task); + return 0; + } + /* * Since the connection is closed we just free the task. * We could delay the closing of the conn in some cases and send diff --git a/usr/target.c b/usr/target.c index 24efb13..9fb1d18 100644 --- a/usr/target.c +++ b/usr/target.c @@ -1134,7 +1134,10 @@ static int abort_cmd(struct target* target, struct mgmt_req *mreq, cmd->mreq = mreq; err = -EBUSY; } else { - cmd->dev->cmd_done(target, cmd); + eprintf("remove cmd tag %" PRIx64 "\n", cmd->tag); + cmd_hlist_remove(cmd); + list_del(&cmd->qlist); + target_cmd_io_done(cmd, TASK_ABORTED); } return err; Thanks a lot. 2011/7/16 Kiefer Chang <zapchang@xxxxxxxxx>: > Dear Tomonori, > > I will try it. Thanks! > > By the way, using single thread for each LUN seems avoid the symptom > (at least before weekend...) > I will report if there are any progresses. > > 2011/7/16 FUJITA Tomonori <fujita.tomonori@xxxxxxxxxxxxx>: >> On Thu, 14 Jul 2011 16:52:42 +0900 >> FUJITA Tomonori <fujita.tomonori@xxxxxxxxxxxxx> wrote: >> >>> On Tue, 12 Jul 2011 17:31:30 -0700 >>> Andy Grover <agrover@xxxxxxxxxx> wrote: >>> >>> > We are also seeing this issue reported, yes based on aborting tasks: >>> > >>> > https://bugzilla.redhat.com/show_bug.cgi?id=719687 >>> > >>> > From looking at the code, it looks like target_cmd_io_done() may be >>> > called twice for the same command, which leads to iscsi_scsi_cmd_done >>> > being called twice, and double-freeing the iscsi_task? >>> > >>> > 1st: abort_task_set -> abort_cmd -> target_cmd_io_done >>> > 2nd: abort_task_set -> abort_cmd -> cmd->dev->cmd_done() [__cmd_done] -> >>> > post_cmd_done -> target_cmd_io_done >>> >>> Yeah, I think that you are right. Surely, that code looks buggy. >>> >>> The command that is not in the 'processed' state should live in >>> tgt_cmd_queue. So what we need to do is that unlinking the command >>> from the queue and freeing the command resource. >> >> How about the following patch? Kiefer, Can you try this? >> >> diff --git a/usr/iscsi/iscsid.c b/usr/iscsi/iscsid.c >> index 22a21cc..c970a49 100644 >> --- a/usr/iscsi/iscsid.c >> +++ b/usr/iscsi/iscsid.c >> @@ -1209,6 +1209,12 @@ static int iscsi_scsi_cmd_done(uint64_t nid, int result, struct scsi_cmd *scmd) >> struct iscsi_task *task = ITASK(scmd); >> uint32_t read_len = scsi_get_in_length(scmd); >> >> + if (result == TASK_ABORTED) { >> + list_del(&task->c_hlist); >> + iscsi_free_task(task); >> + return 0; >> + } >> + >> /* >> * Since the connection is closed we just free the task. >> * We could delay the closing of the conn in some cases and send >> diff --git a/usr/target.c b/usr/target.c >> index 21575dc..786bbb9 100644 >> --- a/usr/target.c >> +++ b/usr/target.c >> @@ -1134,7 +1134,9 @@ static int abort_cmd(struct target* target, struct mgmt_req *mreq, >> cmd->mreq = mreq; >> err = -EBUSY; >> } else { >> - cmd->dev->cmd_done(target, cmd); >> + cmd_hlist_remove(cmd); >> + list_del(&cmd->qlist); >> + >> target_cmd_io_done(cmd, TASK_ABORTED); >> } >> return err; >> >> > -- To unsubscribe from this list: send the line "unsubscribe stgt" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html