Hi Zhu & Co, Thanks for the detailed logs. Comments below. On Mon, 2016-10-31 at 16:51 +0800, Zhu Lingshan wrote: > Hi Nicholas, > > (sorry it would be a long mail) > > Sorry for the delay, I spent some test and debug work. I find the patch > http://www.spinics.net/lists/target-devel/msg13530.html can solve two > issues: > (a). iscsit_stop_session() on the top of iscsi_np stack. > (b).iscsi_check_for_session_reinstatement() on the top of iscsi_np > stack, it is a must to reboot to recover. > Great, thanks for confirming the patch above. The key wrt this scenario, and other scenarios below is once target-core ABORT_TASK logic locates a se_cmd descriptor for IBLOCK backend I/O still outstanding, both iscsi session reinstatement in iscsi_np context and iscsi session shutdown in iscsi_t[t,r]x context are blocked until the specific outstanding se_cmd I/Os are completed, back to target-core backend driver. Note this is expected behavior during target-core ABORT_TASK and iscsi-target session reinstatement + session shutdown. > But I also find two more issues. > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Please let me explain my setup: > I have a target server, kernel version 4.4.21, it has a SATA HDD as a > LUN, only one LUN. > I have two initiators, both login to the target. > Create two partitions on the LUN, each initiator mount a certain > partition. Like initiator1 mount /dev/sdc1, initiator2 mount /dev/sdc2. > looks like this: > lszhu_DEV:~ # lsscsi > [0:0:0:0] cd/dvd HL-DT-ST DVD+-RW GHB0N A1C0 /dev/sr0 > [4:0:0:0] disk ATA TOSHIBA DT01ACA2 MX4O /dev/sda > [9:0:0:0] disk LIO-ORG IBLOCK 4.0 /dev/sdc > lszhu_DEV:~ # lsblk > NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT > sda 8:0 0 1.8T 0 disk > ├─sda1 8:1 0 16G 0 part [SWAP] > ├─sda2 8:2 0 200G 0 part / > └─sda3 8:3 0 1.6T 0 part /home > sdc 8:32 0 465.8G 0 disk > ├─sdc1 8:33 0 200G 0 part > └─sdc2 8:34 0 265.8G 0 part /mnt > sr0 11:0 1 1024M 0 rom > > bj-ha-5:~ # lsblk > NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT > sda 8:0 0 465.8G 0 disk > ├─sda1 8:1 0 8G 0 part [SWAP] > ├─sda2 8:2 0 100G 0 part / > └─sda3 8:3 0 357.8G 0 part /home > sdb 8:16 0 465.8G 0 disk > sdc 8:32 0 465.8G 0 disk > ├─sdc1 8:33 0 200G 0 part /mnt > └─sdc2 8:34 0 265.8G 0 part > sr0 11:0 1 1024M 0 rom > > so you can see each initiator will read / write on their own partition > from the LUN. > > mount the partition like this: > mount -o dioread_nolock /dev/sdc1 /mnt, option dioread_nolock can help > we reproduce this bug a little quicker. > > Then run fio on both initiators like this: > fio -filename=/mnt/file1 -direct=1 -rw=randread -iodepth=64 > -ioengin=libaio -bs=64K -size=5G -numjobs=24 -runtime=60000 -time_based > -group_reporting -name=init1 > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > Here are the three more issues I found, kernel version 4.4.21 > (1) The first one looks like this, it is quite rare, only once, I failed > to get stack information for iscsi_trx and hard to reproduce, call stack > of iscsi_np is here: > > bj-ha-3:~ # ps -aux | grep iscsi > root 10501 0.0 0.0 0 0 ? D 16:45 0:00 [iscsi_np] > root 10519 0.4 0.0 0 0 ? S 16:49 0:02 [iscsi_ttx] > root 10520 2.2 0.0 0 0 ? S 16:49 0:14 [iscsi_trx] > root 10533 0.9 0.0 0 0 ? D 16:54 0:03 [iscsi_trx] > root 10547 0.0 0.0 10508 1552 pts/0 S+ 16:59 0:00 grep > --color=auto iscsi > bj-ha-3:~ # cat /proc/10501/stack > [<ffffffff815334c9>] inet_csk_accept+0x269/0x2e0 > [<ffffffff8155f4aa>] inet_accept+0x2a/0x100 > [<ffffffff814d2a88>] kernel_accept+0x48/0xa0 > [<ffffffffa06fe871>] iscsit_accept_np+0x31/0x230 [iscsi_target_mod] > [<ffffffffa06fefab>] iscsi_target_login_thread+0xeb/0xfd0 [iscsi_target_mod] > [<ffffffff810996bd>] kthread+0xbd/0xe0 > [<ffffffff815e15bf>] ret_from_fork+0x3f/0x70 > [<ffffffff81099600>] kthread+0x0/0xe0 > [<ffffffffffffffff>] 0xffffffffffffffff The iscsi_np here is looks like it's doing normal Linux/NET accept. Is the uninterruptible sleep for PID=10533 waiting for outstanding I/O to complete during iscsit_close_connection() shutdown after ABORT_TASK..? > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > (2) The second issue I found has iscsi_check_for_session_reinstatement() > on the top of iscsi_np call stack, but different from case (b) we > mentioned before. In case (b), we must reboot the target to recover, but > in this case, it will auto recover, here is the details: > dmesg: > [ 100.487421] iSCSI/iqn.1996-04.de.suse:01:faad5846cde9: Unsupported > SCSI Opcode 0xa3, sending CHECK_CONDITION. > [ 182.582323] ABORT_TASK: Found referenced iSCSI task_tag: 15 > [ 197.616800] Unexpected ret: -32 send data 48 > [ 197.712278] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 15 > [ 263.603305] ABORT_TASK: Found referenced iSCSI task_tag: 268435537 > [ 278.640172] Unexpected ret: -32 send data 48 > [ 278.671299] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: > 268435537 > [ 341.584329] ABORT_TASK: Found referenced iSCSI task_tag: 536871038 > [ 357.423710] Unexpected ret: -32 send data 48 > [ 373.660476] iSCSI Login timeout on Network Portal 192.168.100.233:3260 > Great, recovering from uninterruptible sleep once the backend (finally) completes outstanding I/O here confirms the earlier SCF_ACK_KREF patch. So the backend device in question looks like it's repeatably taking extended time to complete I/O. At least 15 seconds plus the host initiator side ABORT_TASK timeout. Is the device failing..? > call stacks: > bj-ha-3:~ # ps -aux | grep iscsi > root 3063 0.0 0.0 0 0 ? D 15:36 0:00 [iscsi_np] > root 3073 0.3 0.0 0 0 ? S 15:36 0:01 [iscsi_ttx] > root 3074 0.4 0.0 0 0 ? S 15:36 0:01 [iscsi_trx] > root 3094 0.3 0.0 0 0 ? D 15:39 0:00 [iscsi_trx] > root 3116 0.0 0.0 10508 1616 pts/0 S+ 15:42 0:00 grep > --color=auto iscsi > bj-ha-3:~ # cat /proc/3063/stack > [<ffffffffa03fcd97>] iscsi_check_for_session_reinstatement+0x1d7/0x270 > [iscsi_target_mod] > [<ffffffffa03ff971>] iscsi_target_do_login+0x111/0x5f0 [iscsi_target_mod] > [<ffffffffa0400b17>] iscsi_target_start_negotiation+0x17/0xa0 > [iscsi_target_mod] > [<ffffffffa03fe932>] iscsi_target_login_thread+0xa72/0xfd0 > [iscsi_target_mod] > [<ffffffff810997cd>] kthread+0xbd/0xe0 > [<ffffffff815e177f>] ret_from_fork+0x3f/0x70 > [<ffffffff81099710>] kthread+0x0/0xe0 > [<ffffffffffffffff>] 0xffffffffffffffff > bj-ha-3:~ # cat /proc/3094/stack > [<ffffffffa03adcc6>] transport_generic_free_cmd+0x76/0x110 [target_core_mod] > [<ffffffffa0404917>] iscsit_free_cmd+0x67/0x130 [iscsi_target_mod] > [<ffffffffa040b929>] iscsit_close_connection+0x4a9/0x860 [iscsi_target_mod] > [<ffffffffa040b0b3>] iscsi_target_rx_thread+0x93/0xa0 [iscsi_target_mod] > [<ffffffff810997cd>] kthread+0xbd/0xe0 > [<ffffffff815e177f>] ret_from_fork+0x3f/0x70 > [<ffffffff81099710>] kthread+0x0/0xe0 > [<ffffffffffffffff>] 0xffffffffffffffff > > At this moment, we can see IO errors at one initiator side, the other > will survive. Once we killed the survival side FIO processes or wait a > moment, the target would auto recover, looks like this: > > bj-ha-3:~ # ps -aux | grep iscsi > root 3063 0.0 0.0 0 0 ? S 15:36 0:00 [iscsi_np] > root 3142 0.0 0.0 0 0 ? S 15:43 0:00 [iscsi_ttx] > root 3143 0.0 0.0 0 0 ? S 15:43 0:00 [iscsi_trx] > root 3146 0.6 0.0 0 0 ? S 15:43 0:00 [iscsi_ttx] > root 3147 0.8 0.0 0 0 ? S 15:43 0:00 [iscsi_trx] > root 3149 0.0 0.0 10508 1636 pts/0 S+ 15:43 0:00 grep > --color=auto iscsi > > dmesg at this moment: > [ 90.507364] iSCSI/iqn.1996-04.de.suse:01:4039926d2313: Unsupported > SCSI Opcode 0xa3, sending CHECK_CONDITION. > [ 100.487421] iSCSI/iqn.1996-04.de.suse:01:faad5846cde9: Unsupported > SCSI Opcode 0xa3, sending CHECK_CONDITION. > [ 182.582323] ABORT_TASK: Found referenced iSCSI task_tag: 15 > [ 197.616800] Unexpected ret: -32 send data 48 > [ 197.712278] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 15 > [ 263.603305] ABORT_TASK: Found referenced iSCSI task_tag: 268435537 > [ 278.640172] Unexpected ret: -32 send data 48 > [ 278.671299] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: > 268435537 > [ 341.584329] ABORT_TASK: Found referenced iSCSI task_tag: 536871038 > [ 357.423710] Unexpected ret: -32 send data 48 > [ 373.660476] iSCSI Login timeout on Network Portal 192.168.100.233:3260 > [ 535.852903] Unexpected ret: -32 send data 48 > [ 535.923803] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: > 536871038 > [ 535.923811] ABORT_TASK: Found referenced iSCSI task_tag: 76 > [ 535.923815] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 76 > [ 535.924325] Unexpected ret: -104 send data 360 > [ 535.924329] tx_data returned -32, expecting 360. > [ 535.924489] iSCSI Login negotiation failed. > [ 535.924916] Unexpected ret: -104 send data 360 > [ 535.924918] tx_data returned -32, expecting 360. > [ 535.924943] iSCSI Login negotiation failed. > [ 535.925424] Unexpected ret: -104 send data 360 > [ 535.925428] tx_data returned -32, expecting 360. > [ 535.925650] iSCSI Login negotiation failed. > [ 535.926095] Unexpected ret: -104 send data 360 > [ 535.926097] tx_data returned -32, expecting 360. > [ 535.926132] iSCSI Login negotiation failed. > [ 535.926349] Unexpected ret: -104 send data 360 > [ 535.926351] tx_data returned -32, expecting 360. > [ 535.926369] iSCSI Login negotiation failed. > [ 535.926576] Unexpected ret: -104 send data 360 > [ 535.926577] tx_data returned -32, expecting 360. > [ 535.926593] iSCSI Login negotiation failed. > [ 535.926750] Unexpected ret: -104 send data 360 > [ 535.926752] tx_data returned -32, expecting 360. > [ 535.926768] iSCSI Login negotiation failed. > [ 535.926905] Unexpected ret: -104 send data 360 > [ 535.926906] tx_data returned -32, expecting 360. > [ 535.926925] iSCSI Login negotiation failed. > [ 535.927064] Unexpected ret: -104 send data 360 > [ 535.927065] tx_data returned -32, expecting 360. > [ 535.927082] iSCSI Login negotiation failed. > [ 535.927219] Unexpected ret: -104 send data 360 > [ 535.927220] tx_data returned -32, expecting 360. > [ 535.927237] iSCSI Login negotiation failed. > > I am still debugging on this case, I have core dumps, I think the dump > files are too big for a mail list, I can send it off the list if needed. AFAICT, this is all expected behavior from target-core + iscsi-target when the backend device is not completing I/O for extended periods of time. > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > (3) In this case, dmesg and call stack looks like this: > bj-ha-3:~ # dmesg > [ 803.570868] ABORT_TASK: Found referenced iSCSI task_tag: 96 > [ 818.584038] Unexpected ret: -32 send data 48 > [ 818.620210] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 96 > [ 882.555923] ABORT_TASK: Found referenced iSCSI task_tag: 268435496 > [ 897.587925] Unexpected ret: -32 send data 48 > [ 898.845646] Unexpected ret: -32 send data 48 > [ 898.916090] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: > 268435496 > [ 898.916095] ABORT_TASK: Found referenced iSCSI task_tag: 45 > [ 898.916377] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 45 > [ 962.785419] ABORT_TASK: Found referenced iSCSI task_tag: 268435489 > > bj-ha-3:~ # ps -aux | grep iscsi > root 3063 0.0 0.0 0 0 ? D 15:36 0:00 [iscsi_np] > root 3296 1.9 0.0 0 0 ? S 15:49 0:01 [iscsi_ttx] > root 3297 1.9 0.0 0 0 ? R 15:49 0:01 [iscsi_trx] > root 3299 0.0 0.0 0 0 ? D 15:49 0:00 [iscsi_trx] > root 3506 0.0 0.0 10508 1516 pts/0 S+ 15:51 0:00 grep > --color=auto iscsi > > > bj-ha-3:~ # cat /proc/3063/stack > [<ffffffffa03fcd97>] iscsi_check_for_session_reinstatement+0x1d7/0x270 > [iscsi_target_mod] > [<ffffffffa03ff971>] iscsi_target_do_login+0x111/0x5f0 [iscsi_target_mod] > [<ffffffffa0400b17>] iscsi_target_start_negotiation+0x17/0xa0 > [iscsi_target_mod] > [<ffffffffa03fe932>] iscsi_target_login_thread+0xa72/0xfd0 > [iscsi_target_mod] > [<ffffffff810997cd>] kthread+0xbd/0xe0 > [<ffffffff815e177f>] ret_from_fork+0x3f/0x70 > [<ffffffff81099710>] kthread+0x0/0xe0 > [<ffffffffffffffff>] 0xffffffffffffffff > bj-ha-3:~ # cat /proc/3099/stack > [<ffffffffa03ac784>] __transport_wait_for_tasks+0xb4/0x190 [target_core_mod] > [<ffffffffa03ac905>] transport_wait_for_tasks+0x45/0x60 [target_core_mod] > [<ffffffffa03aa034>] core_tmr_abort_task+0xf4/0x160 [target_core_mod] > [<ffffffffa03aca43>] target_tmr_work+0x123/0x140 [target_core_mod] > [<ffffffff81093ace>] process_one_work+0x14e/0x410 > [<ffffffff81094326>] worker_thread+0x116/0x490 > [<ffffffff810997cd>] kthread+0xbd/0xe0 > [<ffffffff815e177f>] ret_from_fork+0x3f/0x70 > [<ffffffff81099710>] kthread+0x0/0xe0 > [<ffffffffffffffff>] 0xffffffffffffffff > > This case is also rare, only once, but I got the core dump files. Looks the same as #2. > ------------------------------------------------------------------------------------------------------------------------------------------------- > Sorry again for the delay and send such a long mail, I am still working > on there cases, I will keep the list update once I find anything new. Thanks again for confirming the earlier patch. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html