Nicholas A. Bellinger <nab <at> linux-iscsi.org> writes:

>
> On Sun, 2012-05-06 at 18:31 +0200, Henning Becker wrote:
> > On Saturday, 14 April 2012, 14:37:47, Nicholas A. Bellinger wrote:
> > > On Sat, 2012-04-14 at 18:35 +0200, Henning Becker wrote:
> > > > On Tuesday, 10 April 2012, 23:54:06, Nicholas A. Bellinger wrote:
>
> <SNIP>
>
> > > Hi Henning,
> > >
> > > Ok, I think I've identified the cause of this oops within iscsi-target.
> > >
> > > It has to do with the ordering in which your scripts are tearing down
> > > the configfs layout. Looking at the inotify log again I see the
> > > following ordering:
> > >
> > > # Tear down LUN=0 from TPG=1
> > > /sys/kernel/config/target/iscsi/iqn.2012-04.lan.storage:iscsi.storage/tpgt_1/lun/lun_0/ DELETE,ISDIR statistics
> > > /sys/kernel/config/target/iscsi/iqn.2012-04.lan.storage:iscsi.storage/tpgt_1/lun/lun_0/ DELETE_SELF
> > > /sys/kernel/config/target/iscsi/iqn.2012-04.lan.storage:iscsi.storage/tpgt_1/lun/ DELETE,ISDIR lun_0
> > >
> > > # Release IBLOCK backend device
> > > /sys/kernel/config/target/core/iblock_0/iscsiLUNTest/ DELETE_SELF
> > > /sys/kernel/config/target/core/iblock_0/ DELETE,ISDIR iscsiLUNTest
> > >
> > > # Echo '0 > enable' to disable TPG
> > > /sys/kernel/config/target/iscsi/iqn.2012-04.lan.storage:iscsi.storage/ CLOSE_NOWRITE,CLOSE,ISDIR
> > > /sys/kernel/config/target/iscsi/iqn.2012-04.lan.storage:iscsi.storage/tpgt_1/ MODIFY enable
> > > /sys/kernel/config/target/iscsi/iqn.2012-04.lan.storage:iscsi.storage/tpgt_1/ OPEN enable
> > >
> > > So it appears with your custom scripts that the LUN=0 + IBLOCK backend
> > > is being released *before* explicitly disabling the TPG and forcing all
> > > of the active sessions to shut down.
> > >
> > > The OOPs itself is being caused by the removal of the IBLOCK backend, as
> > > there is code in the iscsi_cmd descriptor release path that depends upon
> > > the backend being in place (although removing the TPG LUN is OK).. This
> > > is a genuine bug, for which I'll need to think some more to best resolve
> > > in order to avoid extra overhead within the existing data I/O fast
> > > path..
> > >
> > > That said, the work-around for this bug is to change your custom scripts
> > > to follow what rtslib/lio-utils currently does for TPG removal. That
> > > is:
> > >
> > > 1: Echo '0 > enable' to disable TPG
> > > 2: Tear down NodeACLs+MappedLUNs from TPG
> > > 3: Tear down LUN from TPG
> > > 4: Tear down entire TPG
> > > 5: Release IBLOCK backend device
> >
> > Hi Nicholas,
> > I have been running my cluster according to your specs for 3 weeks now and
> > the problem has not occurred again.
> >
>
> Hi Henning,
>
> Thanks for confirmation that the backend device shutdown ordering is the
> root cause trigger for the bug you've seen.. As mentioned, I still need
> to think some more about what the proper resolution should actually be
> here..
>
> > > I'm quite certain this will avoid the bug in question by forcing
> > > shutdown of all active sessions at step #1, instead of doing this part
> > > at the end of the sequence as done in your current setup.
> > >
> > > Please give it a shot and let me know if you have problems getting your
> > > scripts to sync with what the official userspace code is doing here.
> >
> > Which official userspace code does that? I'm currently just calling lio_node
> > and it did not refuse to release an iblock which is still connected to a
> > portal.
> >
>
> What I meant here is that the important part is currently disabling the
> TPG before bringing down the TPG LUN associated with the backend with
> active IO, ahead of the backend itself.
> This will shut down all active
> iSCSI sessions (and hence outstanding I/Os) to underlying backend
> devices, and after it's completed it will be safe to remove an
> associated backend device.
>
> So the main issue is still the final release of the backend device
> (after it's been released from TPG LUN) to ensure that any remaining
> outstanding I/O that is still referencing se_device memory is allowed to
> complete before 'rmdir /sys/kernel/config/target/core/$HBA/$DEV' is
> releasing se_device.
>
> --nab
>
>

We are experiencing what I believe to be this same oops on kernel version
3.4.1 during removal of an iSCSI target.

BUG: unable to handle kernel paging request at 000000066474e5d9
IP: [<ffffffffa049e278>] transport_free_dev_tasks+0xf8/0x120 [target_core_mod]
PGD 0
Oops: 0000 [#1] PREEMPT SMP
CPU 1
Modules linked in: md5 ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables 8021q garp bridge stp llc target_core_file target_core_iblock iscsi_target_mod sunrpc af_packet ipv6 binfmt_misc target_core_mod configfs vhost_net macvtap macvlan tun kvm container coretemp microcode serio_raw pcspkr i2c_i801 iTCO_wdt iTCO_vendor_support ixgbe(O) i5000_edac edac_core i5k_amb ioatdma dca sg ses enclosure e1000e shpchp pci_hotplug ext4 mbcache jbd2 sd_mod crc_t10dif ahci libahci qla2xxx scsi_transport_fc scsi_tgt megaraid_sas button radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: mperf]

Pid: 3525, comm: iscsi_ttx Tainted: G O 3.4.1-1.BH #3 Supermicro X7DB8/X7DB8
RIP: 0010:[<ffffffffa049e278>] [<ffffffffa049e278>] transport_free_dev_tasks+0xf8/0x120 [target_core_mod]
RSP: 0018:ffff88040b515e00 EFLAGS: 00010246
RAX: 000000066474e551 RBX: ffff88041941c270 RCX: ffff88041941c440
RDX: ffff88040b515e00 RSI: 0000000000000286 RDI: ffff8804181449c0
RBP: ffff88040b515e30 R08: ffff88041941c470 R09: dead000000200200
R10: dead000000100100 R11: 0000000000000001 R12: ffff88040b515e00
R13: ffff8804181449c0 R14: 0000000000000000 R15: ffff8803fe2c04c0
FS: 0000000000000000(0000) GS:ffff88042fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000066474e5d9 CR3: 0000000400ed5000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process iscsi_ttx (pid: 3525, threadinfo ffff88040b514000, task ffff8803fe2c04c0)
Stack:
 ffff88040b515e00 ffff88040b515e00 0000000000000000 ffff88041941c270
 ffff88041941c128 ffff88041941c040 ffff88040b515e50 ffffffffa04a26f1
 ffff88040b515e60 ffff880419dd1c00 ffff88040b515e60 ffffffffa0594071
Call Trace:
 [<ffffffffa04a26f1>] transport_generic_free_cmd+0x61/0x90 [target_core_mod]
 [<ffffffffa0594071>] iscsit_free_cmd+0x21/0x50 [iscsi_target_mod]
 [<ffffffffa059acc7>] iscsi_target_tx_thread+0x497/0x680 [iscsi_target_mod]
 [<ffffffffa059a830>] ? iscsit_send_text_rsp+0x330/0x330 [iscsi_target_mod]
 [<ffffffffa059a830>] ? iscsit_send_text_rsp+0x330/0x330 [iscsi_target_mod]
 [<ffffffff8105f626>] kthread+0x96/0xa0
 [<ffffffff815125e4>] kernel_thread_helper+0x4/0x10
 [<ffffffff8105f590>] ? kthread_freezable_should_stop+0x60/0x60
 [<ffffffff815125e0>] ? gs_change+0x13/0x13
Code: 00 00 ad de 49 b9 00 02 20 00 00 00 ad de 4c 89 ef 48 89 42 08 48 89 10 4d 89 55 30 4d 89 4d 38 48 8b 43 78 48 8b 80 88 01 00 00 <ff> 90 88 00 00 00 48 8b 45 d0 4c 39 e0 75 99 48 83 c4 18 5b 41
RIP [<ffffffffa049e278>] transport_free_dev_tasks+0xf8/0x120 [target_core_mod]
 RSP <ffff88040b515e00>
CR2: 000000066474e5d9
---[ end trace 30b5ec5ccc64a33d ]---

We do use a custom script, but it wraps targetcli rather than writing to
configfs directly.
Our process is:

1) targetcli <target iqn>/tpg1/luns delete lunX
2) targetcli /iscsi delete <target iqn> (if this was the last LUN being
   exported)
3) targetcli /backstores/block delete <backstore name>

According to the earlier response to Henning, the best approach is to
disable the TPG prior to teardown, but that is not desirable here because we
may have several other LUNs exported to which we do not wish to lose
communication.

If we delay the teardown of the backstore until several minutes later and
cause the initiator to issue a 'delete' in the intervening period, would we
still need to disable the TPG to delete it safely?

Are there any further thoughts on addressing the bug in the iscsi_cmd
descriptor release path to eliminate the need for this workaround?

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
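
For reference, the teardown ordering recommended earlier in this thread (disable the TPG first, release the backend last) can be written down as a small planning helper. This is only a sketch under stated assumptions: the function name `teardown_plan`, the action labels, and the `acls` directory name are illustrative and do not come from rtslib or lio-utils. It prints the order and touches nothing under /sys/kernel/config.

```python
from pathlib import PurePosixPath

# configfs root for the target subsystem, as seen in the inotify log above
CONFIGFS_TARGET = PurePosixPath("/sys/kernel/config/target")


def teardown_plan(iqn, tpgt, luns, hba, dev, root=CONFIGFS_TARGET):
    """Return (step, action, path) tuples in the recommended order.

    Nothing is executed; the caller decides how to apply each action.
    Hypothetical helper: the action strings are labels, not commands.
    """
    tpg = root / "iscsi" / iqn / f"tpgt_{tpgt}"
    plan = [
        # 1: disable the TPG first so all active sessions are shut down
        (1, "write 0", tpg / "enable"),
        # 2: NodeACLs + MappedLUNs (assumed here to live under 'acls')
        (2, "remove children", tpg / "acls"),
    ]
    # 3: remove each LUN from the TPG
    for lun in luns:
        plan.append((3, "rmdir", tpg / "lun" / f"lun_{lun}"))
    # 4: remove the TPG itself
    plan.append((4, "rmdir", tpg))
    # 5: only now release the backend device
    plan.append((5, "rmdir", root / "core" / hba / dev))
    return plan


if __name__ == "__main__":
    for step, action, path in teardown_plan(
        "iqn.2012-04.lan.storage:iscsi.storage", 1, [0],
        "iblock_0", "iscsiLUNTest",
    ):
        print(step, action, path)
```

Note that this covers only the case where the whole TPG is being removed; it does not answer the question of removing a single LUN while the TPG stays enabled.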