Re: tgtadm: this logical unit is still active

I have written the following patch and tested it; it works correctly
when io_submit() fails.

diff --git a/usr/bs_aio.c b/usr/bs_aio.c
index 1f46a2a..f00a974 100644
--- a/usr/bs_aio.c
+++ b/usr/bs_aio.c
@@ -152,6 +152,13 @@ static int bs_aio_submit_dev_batch(struct bs_aio_info *info)
                                ", err: %d\n",
                                nsubmit, info->lu->tgt->tid,
                                info->lu->lun, -nsuccess);
+                       for (i = nsubmit - 1; i >= 0; i--) {
+                            cmd = info->iocb_arr[i].data;
+                            clear_cmd_async(cmd);
+                            info->nwaiting--;
+                            if (!info->nwaiting)
+                                list_del(&info->dev_list_entry);
+                       }
                        return nsuccess;
                }
        }
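
To illustrate the idea behind the patch, here is a minimal standalone
sketch, not tgt code. The names demo_cmd, demo_dev, demo_queue_cmd and
demo_submit_batch are hypothetical, and the model assumes that unwinding
a failed batch also drops an active-command counter directly; in tgt
that accounting goes through code paths not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_BATCH 8

/* Hypothetical, simplified stand-in for a SCSI command. */
struct demo_cmd {
	int async;	/* 1 while the backing store owns the command */
	int result;	/* 0 on success, negative errno on failure */
};

/* Hypothetical, simplified stand-in for the per-device AIO state. */
struct demo_dev {
	struct demo_cmd *batch[DEMO_MAX_BATCH];
	int nwaiting;	/* commands queued but not yet completed */
	int active_cmd;	/* plays the role of lu->cmd_queue.active_cmd */
};

/* Queue a command: it becomes active and waits for submission. */
static int demo_queue_cmd(struct demo_dev *dev, struct demo_cmd *cmd)
{
	assert(dev != NULL && cmd != NULL);
	if (dev->nwaiting >= DEMO_MAX_BATCH)
		return -1;
	cmd->async = 1;
	cmd->result = 0;
	dev->batch[dev->nwaiting++] = cmd;
	dev->active_cmd++;
	return 0;
}

/*
 * Submit the waiting batch. submit_rc stands in for the return value
 * of io_submit(); a negative value simulates a hard failure. On
 * failure every waiting command is unwound, so nothing stays counted
 * as active. The success path is not modelled.
 */
static int demo_submit_batch(struct demo_dev *dev, int submit_rc)
{
	int i;

	assert(dev != NULL);
	if (submit_rc >= 0)
		return submit_rc;

	for (i = dev->nwaiting - 1; i >= 0; i--) {
		struct demo_cmd *cmd = dev->batch[i];

		cmd->async = 0;		/* no longer owned by the backing store */
		cmd->result = submit_rc;	/* report the command as failed */
		dev->active_cmd--;
	}
	dev->nwaiting = 0;
	return submit_rc;
}
```

Without the unwind loop, active_cmd in this model would stay non-zero
after a failed submit, which is the symptom reported in the original
mail.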

On Fri, Apr 24, 2015 at 6:24 PM, FUJITA Tomonori
<fujita.tomonori@xxxxxxxxxxxxx> wrote:
> On Fri, 24 Apr 2015 18:02:53 +0800
> Dong Wu <archer.wudong@xxxxxxxxx> wrote:
>
>> Hi all,
>>
>> I use aio as the tgt backing store. When io_submit() fails in bs_aio and
>> I then delete that LUN, I get the error
>> 'tgtadm: this logical unit is still active'.
>>
>> Is this a bug?
>
> Definitely a bug. Thanks a lot for the detailed investigation. bs_aio
> needs to complete a command as failure when io_submit() fails (like bs_rdwr).
>
> Can you send a patch?
>
>> Here is how to reproduce it.
>>
>> 1. Create the target and LUN
>> # tgtadm --lld iscsi --mode target --op new --tid=1000 \
>>     --targetname=iqn.tgt.test.1
>> # tgtadm --lld iscsi --mode logicalunit --op new --tid=1000 --lun 1 \
>>     --backing-store=/dev/vdb --bstype aio
>> # tgtadm --lld iscsi --mode target --op bind --tid=1000 --initiator-address ALL
>>
>> 2. Use open-iscsi to connect to the target
>> # iscsiadm -m discovery -t st -p 10.184.17.14:3260
>> # iscsiadm -m node -T iqn.tgt.test.1 -p 10.184.17.14:3260 -l
>>
>>
>> # iscsiadm -m session -P3
>> iSCSI Transport Class version 2.0-870
>> version 2.0-873
>> Target: iqn.tgt.test.1
>>     Current Portal: 10.184.17.14:3260,1
>>     Persistent Portal: 10.184.17.14:3260,1
>>         **********
>>         Interface:
>>         **********
>>         Iface Name: default
>>         Iface Transport: tcp
>>         Iface Initiatorname: iqn.1993-08.org.debian:01:743f99353c6
>>         Iface IPaddress: 10.184.17.14
>>         Iface HWaddress: <empty>
>>         Iface Netdev: <empty>
>>         SID: 3
>>         iSCSI Connection State: LOGGED IN
>>         iSCSI Session State: LOGGED_IN
>>         Internal iscsid Session State: NO CHANGE
>>         *********
>>         Timeouts:
>>         *********
>>         Recovery Timeout: 120
>>         Target Reset Timeout: 30
>>         LUN Reset Timeout: 30
>>         Abort Timeout: 15
>>         *****
>>         CHAP:
>>         *****
>>         username: <empty>
>>         password: ********
>>         username_in: <empty>
>>         password_in: ********
>>         ************************
>>         Negotiated iSCSI params:
>>         ************************
>>         HeaderDigest: None
>>         DataDigest: None
>>         MaxRecvDataSegmentLength: 262144
>>         MaxXmitDataSegmentLength: 8192
>>         FirstBurstLength: 65536
>>         MaxBurstLength: 262144
>>         ImmediateData: Yes
>>         InitialR2T: Yes
>>         MaxOutstandingR2T: 1
>>         ************************
>>         Attached SCSI devices:
>>         ************************
>>         Host Number: 4    State: running
>>         scsi4 Channel 00 Id 0 Lun: 0
>>         scsi4 Channel 00 Id 0 Lun: 1
>>             Attached scsi disk sda        State: running
>>
>>
>> 3. Simulate an io_submit() failure
>> 1) Here is the relevant code from bs_aio.c:
>>
>> 143         nsuccess = io_submit(info->ctx, nsubmit, info->piocb_arr);
>> 144         if (unlikely(nsuccess < 0)) {
>> 145                 if (nsuccess == -EAGAIN) {
>> 146                         eprintf("delayed submit %d cmds to tgt:%d lun:%"PRId64 "\n",
>> 147                                 nsubmit, info->lu->tgt->tid, info->lu->lun);
>> 148                         nsuccess = 0; /* leave the dev pending with all cmds */
>> 149                 }
>> 150                 else {
>> 151                         eprintf("failed to submit %d cmds to tgt:%d lun:%"PRId64
>> 152                                 ", err: %d\n",
>> 153                                 nsubmit, info->lu->tgt->tid,
>> 154                                 info->lu->lun, -nsuccess);
>> 155                         return nsuccess;
>> 156                 }
>> 157         }
>>
>> 2) Then attach gdb to the running tgt process and set a breakpoint at bs_aio.c:143.
>> # gdb -p 8891
>> (gdb) b bs_aio.c:143
>> Breakpoint 1 at 0x41b7c4: file bs_aio.c, line 143.
>> (gdb) c
>> Continuing.
>>
>> 3) In another shell, use dd to read the iSCSI device. dd hangs here
>> because the read triggers the gdb breakpoint.
>> # dd if=/dev/sda of=/dev/null bs=4k count=1 iflag=direct
>>
>> 4) Back in the gdb shell, I set nsubmit=-1 so that io_submit() fails
>> (it returns -22, i.e. -EINVAL), and repeat this several times.
>>
>> (gdb) b bs_aio.c:143
>> Breakpoint 1 at 0x41b7c4: file bs_aio.c, line 143.
>> (gdb) c
>> Continuing.
>>
>> Breakpoint 1, bs_aio_submit_dev_batch (info=0x73b908) at bs_aio.c:143
>> 143        nsuccess = io_submit(info->ctx, nsubmit, info->piocb_arr);
>> (gdb) p nsubmit
>> $2 = 1
>> (gdb) set nsubmit=-1
>> (gdb) n
>> 144        if (unlikely(nsuccess < 0)) {
>> (gdb) p nsuccess
>> $4 = -22
>> (gdb) n
>> 145            if (nsuccess == -EAGAIN) {
>> (gdb) n
>> 151                eprintf("failed to submit %d cmds to tgt:%d lun:%"PRId64
>> (gdb) c
>> Continuing.
>>
>> Breakpoint 1, bs_aio_submit_dev_batch (info=0x73b908) at bs_aio.c:143
>> 143        nsuccess = io_submit(info->ctx, nsubmit, info->piocb_arr);
>> (gdb) p nsubmit
>> $5 = 2
>> (gdb) set nsubmit=-1
>> (gdb) n
>> 144        if (unlikely(nsuccess < 0)) {
>> (gdb) p nsuccess
>> $6 = -22
>> (gdb) c
>> Continuing.
>>
>> Breakpoint 1, bs_aio_submit_dev_batch (info=0x73b908) at bs_aio.c:143
>> 143        nsuccess = io_submit(info->ctx, nsubmit, info->piocb_arr);
>> (gdb) p nsubmit
>> $7 = 3
>> (gdb) set nsubmit=-1
>> (gdb) n
>> 144        if (unlikely(nsuccess < 0)) {
>> (gdb) p nsuccess
>> $8 = -22
>> (gdb) c
>> Continuing.
>>
>> Breakpoint 1, bs_aio_submit_dev_batch (info=0x73b908) at bs_aio.c:143
>> 143        nsuccess = io_submit(info->ctx, nsubmit, info->piocb_arr);
>> (gdb) p nsubmit
>> $9 = 4
>> (gdb) set nsubmit=-1
>> (gdb) n
>> 144        if (unlikely(nsuccess < 0)) {
>> (gdb) p nsuccess
>> $10 = -22
>> (gdb) c
>> Continuing.
>>
>> Breakpoint 1, bs_aio_submit_dev_batch (info=0x73b908) at bs_aio.c:143
>> 143        nsuccess = io_submit(info->ctx, nsubmit, info->piocb_arr);
>> (gdb) p nsubmit
>> $11 = 5
>> (gdb) set nsubmit=-1
>> (gdb) p nsubmit
>> $12 = -1
>> (gdb) n
>> 144        if (unlikely(nsuccess < 0)) {
>> (gdb) p nsuccess
>> $13 = -22
>> (gdb) c
>> Continuing.
>>
>> Breakpoint 1, bs_aio_submit_dev_batch (info=0x73b908) at bs_aio.c:143
>> 143        nsuccess = io_submit(info->ctx, nsubmit, info->piocb_arr);
>> (gdb) set nsubmit=-1
>> (gdb) n
>> 144        if (unlikely(nsuccess < 0)) {
>> (gdb) p nsuccess
>> $14 = -22
>> (gdb) c
>> Continuing.
>>
>> 5) After simulating the io_submit() failure several times, the dd process fails.
>> # dd if=/dev/sda of=/dev/null bs=4k count=1 iflag=direct
>> dd: reading `/dev/sda': Input/output error
>> 0+0 records in
>> 0+0 records out
>> 0 bytes (0 B) copied, 130.426 s, 0.0 kB/s
>>
>> 6) Then I delete the LUN and get an error.
>> # tgtadm --lld iscsi --mode logicalunit --op delete --tid=1000 --lun=1
>> tgtadm: this logical unit is still active
>>
>> 7) Here is the relevant LUN deletion logic in target.c:tgt_device_destroy.
>>  737         if (!list_empty(&lu->cmd_queue.queue) || lu->cmd_queue.active_cmd)
>>  738                 return TGTADM_LUN_ACTIVE;
>>
>> 8) I debug tgt with gdb again, this time breaking at target.c:731. Here
>> is the debug output.
>> (gdb) b target.c:731
>> Breakpoint 2 at 0x421a7d: file target.c, line 731.
>> (gdb) c
>> Continuing.
>>
>> Breakpoint 2, tgt_device_destroy (tid=1000, lun=1, force=0) at target.c:731
>> 731        lu = __device_lookup(tid, lun, &target);
>> (gdb) n
>> 732        if (!lu) {
>> (gdb) n
>> 737        if (!list_empty(&lu->cmd_queue.queue) || lu->cmd_queue.active_cmd)
>> (gdb) p lu->cmd_queue.queue
>> $15 = {next = 0x739bb0, prev = 0x739bb0}
>> (gdb) p lu->cmd_queue.active_cmd
>> $16 = 6
>> (gdb) c
>> Continuing.
>>
>> 9) Based on the above, I guess that when io_submit() fails, bs_aio takes
>> the error path but never decrements lu->cmd_queue.active_cmd, and this
>> causes the "this logical unit is still active" error when I delete the
>> LUN.
>>
>> So, is this a bug or just normal tgt behavior?
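
The check in step 7 can be modelled in isolation. The sketch below is a
standalone illustration, not tgt code: demo_list, demo_lu and
demo_can_destroy are hypothetical names, and it assumes the queue is a
circular list whose head points at itself when empty. The gdb output in
step 8 (next equal to prev) is consistent with an empty queue only if
both point at the list head, which the printout does not show.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list head (hypothetical stand-in). */
struct demo_list {
	struct demo_list *next, *prev;
};

static void demo_list_init(struct demo_list *head)
{
	head->next = head;
	head->prev = head;
}

/* A list is empty when its head points back at itself. */
static int demo_list_empty(const struct demo_list *head)
{
	return head->next == head;
}

/* Hypothetical stand-in for the parts of a logical unit used here. */
struct demo_lu {
	struct demo_list queue;	/* commands waiting to run */
	int active_cmd;		/* commands handed to the backing store */
};

enum { DEMO_OK = 0, DEMO_LUN_ACTIVE = 1 };

/*
 * Same shape as the test quoted in step 7: deletion is refused while
 * commands are queued OR while any command is still counted active.
 */
static int demo_can_destroy(const struct demo_lu *lu)
{
	assert(lu != NULL);
	if (!demo_list_empty(&lu->queue) || lu->active_cmd)
		return DEMO_LUN_ACTIVE;
	return DEMO_OK;
}
```

With an empty queue and active_cmd set to 6, as in the gdb session,
demo_can_destroy() still reports the unit as active; it returns DEMO_OK
only once the counter is back at zero.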
>> --
>> To unsubscribe from this list: send the line "unsubscribe stgt" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html