Re: [PATCH 2/2] target: iscsi: fix a race condition when aborting a task

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello Mike,

Dne 22. 10. 20 v 4:42 Mike Christie napsal(a):
> If we free the cmd from the abort path, then for your conn stop plus abort race case, could we do:
> 
> 1. thread1 runs iscsit_release_commands_from_conn and sets CMD_T_FABRIC_STOP.
> 2. thread2 runs iscsit_aborted_task and then does __iscsit_free_cmd. It then returns from the aborted_task callout and we finish target_handle_abort and do:
> 
> target_handle_abort -> transport_cmd_check_stop_to_fabric -> lio_check_stop_free -> target_put_sess_cmd
> 
> The cmd is now freed.
> 3. thread1 now finishes iscsit_release_commands_from_conn and runs iscsit_free_cmd while accessing a command we just released.
> 
> 

Thanks for the review!

There are definitely some problems with task aborts and commands' refcounting *
but this is a different bug than the one this patch is trying to solve (a race to list_del_init());
unless you are saying that abort tasks should never be executed when the connection 
is going down and we have to prevent such cases from happening at all.

* I did a test a few days ago: 
1) blocked the backstore device on the target
2) issued a write command on the initiator
3) waited for abort task and connection shutdown
4) switched the target backstore device to state "running"

I triggered the following warnings in target_handle_abort():

        } else {
                /*
                 * Allow the fabric driver to unmap any resources before
                 * releasing the descriptor via TFO->release_cmd().
                 */
                cmd->se_tfo->aborted_task(cmd);
                if (ack_kref)
                        WARN_ON_ONCE(target_put_sess_cmd(cmd) != 0); <-----------------------
                /*
                 * To do: establish a unit attention condition on the I_T
                 * nexus associated with cmd. See also the paragraph "Aborting
                 * commands" in SAM.
                 */
        }

        WARN_ON_ONCE(kref_read(&cmd->cmd_kref) == 0); <----------------------

        transport_lun_remove_cmd(cmd);

        transport_cmd_check_stop_to_fabric(cmd);
}

target_handle_abort() is executed against a command which has already been released.

[  240.883865] iSCSI/iqn.1994-05.com.redhat:9489e049fb5e: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[  353.848593] ABORT_TASK: Found referenced iSCSI task_tag: 101
[  386.521582] iSCSI Login timeout on Network Portal 0.0.0.0:3260
[  396.537133] WARNING: CPU: 1 PID: 2936 at drivers/target/target_core_transport.c:805 target_handle_abort+0xce/0x180 [target_core_mod]
[  396.537139] Modules linked in: iscsi_target_mod target_core_user uio target_core_pscsi target_core_file target_core_iblock target_core_mod uinput nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc bochs_drm drm_vram_helper drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ppdev drm joydev pcspkr i2c_piix4 parport_pc parport xfs libcrc32c sr_mod cdrom sd_mod sg ata_generic ata_piix libata virtio_net net_failover failover serio_raw fuse [last unloaded: ip_tables]
[  396.537302] CPU: 1 PID: 2936 Comm: kworker/1:2 Kdump: loaded Not tainted 4.18.0-240.7.el8.x86_64 #1
[  396.537306] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
[  396.537353] Workqueue: target_completion target_abort_work [target_core_mod]
[  396.537394] RIP: 0010:target_handle_abort+0xce/0x180 [target_core_mod]
[  396.537399] Code: 00 e8 e6 db 00 f1 41 81 e4 00 00 40 00 74 9d 48 8d bb 0c 01 00 00 f0 ff 8b 0c 01 00 00 0f 88 ed cd 00 00 75 87 e8 32 ed ff ff <0f> 0b e9 7b ff ff ff c6 03 40 0f 1f 44 00 00 0f 1f 44 00 00 48 8b
[  396.537402] RSP: 0018:ffffac1b818bbe80 EFLAGS: 00010286
[  396.537407] RAX: 0000312f0022d690 RBX: ffff9aec21595350 RCX: 0000000000000000
[  396.537410] RDX: 0000000000000000 RSI: 0000000000000282 RDI: 0000000000000282
[  396.537413] RBP: 0000000000000000 R08: ffff9aebf4d65a00 R09: ffff9aebf4c12d98
[  396.537416] R10: 0000000000000082 R11: 000000000000002e R12: 0000000000400000
[  396.537418] R13: 0000000000000000 R14: ffff9aec02cd9c00 R15: ffff9aec21595488
[  396.537423] FS:  0000000000000000(0000) GS:ffff9aec7fa80000(0000) knlGS:0000000000000000
[  396.537427] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  396.537430] CR2: 000055bfa25e9110 CR3: 00000000614c6000 CR4: 00000000000006e0
[  396.537444] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  396.537447] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  396.537449] Call Trace:
[  396.537504]  process_one_work+0x1a7/0x360
[  396.537525]  worker_thread+0x30/0x390
[  396.537532]  ? create_worker+0x1a0/0x1a0
[  396.537536]  kthread+0x112/0x130
[  396.537542]  ? kthread_flush_work_fn+0x10/0x10
[  396.537550]  ret_from_fork+0x35/0x40
[  396.537570] ---[ end trace d6b5bf9aea218e55 ]---
[  396.537734] WARNING: CPU: 1 PID: 2936 at drivers/target/target_core_transport.c:813 target_handle_abort+0x178/0x180 [target_core_mod]
[  396.537737] Modules linked in: iscsi_target_mod target_core_user uio target_core_pscsi target_core_file target_core_iblock target_core_mod uinput nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc bochs_drm drm_vram_helper drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ppdev drm joydev pcspkr i2c_piix4 parport_pc parport xfs libcrc32c sr_mod cdrom sd_mod sg ata_generic ata_piix libata virtio_net net_failover failover serio_raw fuse [last unloaded: ip_tables]
[  396.537811] CPU: 1 PID: 2936 Comm: kworker/1:2 Kdump: loaded Tainted: G        W        --------- -  - 4.18.0-240.7.el8.x86_64 #1
[  396.537814] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
[  396.537860] Workqueue: target_completion target_abort_work [target_core_mod]
[  396.537903] RIP: 0010:target_handle_abort+0x178/0x180 [target_core_mod]
[  396.537909] Code: 67 03 00 48 85 ed 74 1d 48 8b 45 00 48 8b 7d 08 48 83 c5 18 48 89 de e8 26 db 00 f1 48 8b 45 00 48 85 c0 75 e7 e9 6a ff ff ff <0f> 0b e9 df fe ff ff 90 0f 1f 44 00 00 55 48 8d af 08 01 00 00 53
[  396.537912] RSP: 0018:ffffac1b818bbe80 EFLAGS: 00010246
[  396.537916] RAX: 0000000000000000 RBX: ffff9aec21595350 RCX: 0000000000000000
[  396.537919] RDX: 0000000000000000 RSI: 0000000000000282 RDI: 0000000000000282
[  396.537922] RBP: 0000000000000000 R08: ffff9aebf4d65a00 R09: ffff9aebf4c12d98
[  396.537925] R10: 0000000000000082 R11: 000000000000002e R12: 0000000000400000
[  396.537928] R13: 0000000000000000 R14: ffff9aec02cd9c00 R15: ffff9aec21595488
[  396.537933] FS:  0000000000000000(0000) GS:ffff9aec7fa80000(0000) knlGS:0000000000000000
[  396.537936] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  396.537939] CR2: 000055bfa25e9110 CR3: 00000000614c6000 CR4: 00000000000006e0
[  396.537951] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  396.537954] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  396.537956] Call Trace:
[  396.537971]  process_one_work+0x1a7/0x360
[  396.537979]  worker_thread+0x30/0x390
[  396.537985]  ? create_worker+0x1a0/0x1a0
[  396.537989]  kthread+0x112/0x130
[  396.537994]  ? kthread_flush_work_fn+0x10/0x10
[  396.538009]  ret_from_fork+0x35/0x40
[  396.538016] ---[ end trace d6b5bf9aea218e56 ]---
[  396.538221] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 101

Maurizio




[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux