Kiefer, How are you configuring tgtd to use a single thread for each LUN? On Mon, Jul 18, 2011 at 08:26:07PM +0800, Kiefer Chang wrote: > Dear Tomonori, > > I applied the patch. The symptom still exists. > I add 2 print statements in the patch. I can't find it in the log > after tgtd segfault. > Please download the log if you need: > http://dl.dropbox.com/u/8354750/tgtd/20110718/messages.gz > > the code: > > diff --git a/Makefile b/Makefile > index 5df544d..2712b86 100644 > --- a/Makefile > +++ b/Makefile > @@ -1,4 +1,4 @@ > -VERSION ?= 1.0.16 > +VERSION ?= 1.0.16.0718 > > CHECK_CC = cgcc > CHECK_CC_FLAGS = '$(CHECK_CC) -Wbitwise -Wno-return-void -no-compile $(ARCH)' > diff --git a/usr/iscsi/iscsid.c b/usr/iscsi/iscsid.c > index ee17ef3..5c6ff34 100644 > --- a/usr/iscsi/iscsid.c > +++ b/usr/iscsi/iscsid.c > @@ -1200,6 +1200,13 @@ static int iscsi_scsi_cmd_done(uint64_t nid, > int result, struct scsi_cmd *scmd) > struct iscsi_task *task = ITASK(scmd); > uint32_t read_len = scsi_get_in_length(scmd); > > + if (result == TASK_ABORTED) { > + eprintf("TASK_ABORTED, remove task %p\n", task); > + list_del(&task->c_hlist); > + iscsi_free_task(task); > + return 0; > + } > + > /* > * Since the connection is closed we just free the task. > * We could delay the closing of the conn in some cases and send > diff --git a/usr/target.c b/usr/target.c > index 24efb13..9fb1d18 100644 > --- a/usr/target.c > +++ b/usr/target.c > @@ -1134,7 +1134,10 @@ static int abort_cmd(struct target* target, > struct mgmt_req *mreq, > cmd->mreq = mreq; > err = -EBUSY; > } else { > - cmd->dev->cmd_done(target, cmd); > + eprintf("remove cmd tag %" PRIx64 "\n", cmd->tag); > + cmd_hlist_remove(cmd); > + list_del(&cmd->qlist); > + > target_cmd_io_done(cmd, TASK_ABORTED); > } > return err; > > > > Thanks a lot. > > > > 2011/7/16 Kiefer Chang <zapchang@xxxxxxxxx>: > > Dear Tomonori, > > > > I will try it. Thanks! > > > > By the way, using single thread for each LUN seems avoid the symptom > > (at least before weekend...) > > I will report if there are any progresses. > > > > 2011/7/16 FUJITA Tomonori <fujita.tomonori@xxxxxxxxxxxxx>: > >> On Thu, 14 Jul 2011 16:52:42 +0900 > >> FUJITA Tomonori <fujita.tomonori@xxxxxxxxxxxxx> wrote: > >> > >>> On Tue, 12 Jul 2011 17:31:30 -0700 > >>> Andy Grover <agrover@xxxxxxxxxx> wrote: > >>> > >>> > We are also seeing this issue reported, yes based on aborting tasks: > >>> > > >>> > https://bugzilla.redhat.com/show_bug.cgi?id=719687 > >>> > > >>> > From looking at the code, it looks like target_cmd_io_done() may be > >>> > called twice for the same command, which leads to iscsi_scsi_cmd_done > >>> > being called twice, and double-freeing the iscsi_task? > >>> > > >>> > 1st: abort_task_set -> abort_cmd -> target_cmd_io_done > >>> > 2nd: abort_task_set -> abort_cmd -> cmd->dev->cmd_done() [__cmd_done] -> > >>> > post_cmd_done -> target_cmd_io_done > >>> > >>> Yeah, I think that you are right. Surely, that code looks buggy. > >>> > >>> The command that is not in the 'processed' state should live in > >>> tgt_cmd_queue. So what we need to do is that unlinking the command > >>> from the queue and freeing the command resource. > >> > >> How about the following patch? Kiefer, Can you try this? > >> > >> diff --git a/usr/iscsi/iscsid.c b/usr/iscsi/iscsid.c > >> index 22a21cc..c970a49 100644 > >> --- a/usr/iscsi/iscsid.c > >> +++ b/usr/iscsi/iscsid.c > >> @@ -1209,6 +1209,12 @@ static int iscsi_scsi_cmd_done(uint64_t nid, int result, struct scsi_cmd *scmd) > >> struct iscsi_task *task = ITASK(scmd); > >> uint32_t read_len = scsi_get_in_length(scmd); > >> > >> + if (result == TASK_ABORTED) { > >> + list_del(&task->c_hlist); > >> + iscsi_free_task(task); > >> + return 0; > >> + } > >> + > >> /* > >> * Since the connection is closed we just free the task. > >> * We could delay the closing of the conn in some cases and send > >> diff --git a/usr/target.c b/usr/target.c > >> index 21575dc..786bbb9 100644 > >> --- a/usr/target.c > >> +++ b/usr/target.c > >> @@ -1134,7 +1134,9 @@ static int abort_cmd(struct target* target, struct mgmt_req *mreq, > >> cmd->mreq = mreq; > >> err = -EBUSY; > >> } else { > >> - cmd->dev->cmd_done(target, cmd); > >> + cmd_hlist_remove(cmd); > >> + list_del(&cmd->qlist); > >> + > >> target_cmd_io_done(cmd, TASK_ABORTED); > >> } > >> return err; > >> > >> > > > -- > To unsubscribe from this list: send the line "unsubscribe stgt" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- James R. Leu Software Architect INOC 608.204.0203 608.663.4555 fax jleu@xxxxxxxx www.inoc.com *** DELIVERING UPTIME ***
Attachment:
pgpri5_fsEsBP.pgp
Description: PGP signature