On Sun, Jul 30, 2017 at 5:15 PM, Nicholas A. Bellinger <nab@xxxxxxxxxxxxxxx> wrote: > On Thu, 2017-07-27 at 13:56 -0700, Justin Maggard wrote: >> I ran across a GPF after disabling demo-mode on an iSCSI target. The >> backtrace pointed to target_put_nacl(), which was called from >> transport_free_session(). > > Can we see actual stack trace, and kernel version please..? > Sure. kernel: general protection fault: 0000 [#1] SMP kernel: Modules linked in: kernel: CPU: 0 PID: 11047 Comm: iscsi_ttx Not tainted 4.13.0-rc2.x86_64.1+ #20 kernel: Hardware name: Intel Corporation S5500BC/S5500BC, BIOS S5500.86B.01.00.0064.050520141428 05/05/2014 kernel: task: ffff88026939e800 task.stack: ffffc90007884000 kernel: RIP: 0010:target_put_nacl+0x49/0xb0 kernel: RSP: 0018:ffffc90007887d70 EFLAGS: 00010246 kernel: RAX: dead000000000200 RBX: ffff8802556ca000 RCX: 0000000000000000 kernel: RDX: dead000000000100 RSI: 0000000000000246 RDI: ffff8802556ce028 kernel: RBP: ffffc90007887d88 R08: 0000000000000001 R09: 0000000000000000 kernel: R10: ffffc90007887df8 R11: ffffea0009986900 R12: ffff8802556ce020 kernel: R13: ffff8802556ce028 R14: ffff8802556ce028 R15: ffffffff88d85540 kernel: FS: 0000000000000000(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 00007fffe36f5f94 CR3: 0000000009209000 CR4: 00000000003406f0 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 kernel: Call Trace: kernel: transport_free_session+0x67/0x140 kernel: transport_deregister_session+0x7a/0xc0 kernel: iscsit_close_session+0x92/0x210 kernel: iscsit_close_connection+0x5f9/0x840 kernel: iscsit_take_action_for_connection_exit+0xfe/0x110 kernel: iscsi_target_tx_thread+0x140/0x1e0 kernel: ? wait_woken+0x90/0x90 kernel: kthread+0x124/0x160 kernel: ? iscsit_thread_get_cpumask+0x90/0x90 kernel: ? kthread_create_on_node+0x40/0x40 kernel: ret_from_fork+0x22/0x30 kernel: Code: 00 48 89 fb 4c 8b a7 48 01 00 00 74 68 4d 8d 6c 24 08 4c 89 ef e8 e8 28 43 00 48 8b 93 20 04 00 00 48 8b 83 28 04 00 00 4c 89 ef <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 20 kernel: RIP: target_put_nacl+0x49/0xb0 RSP: ffffc90007887d70 kernel: ---[ end trace f12821adbfd46fed ]--- >> A look at that function shows a pretty clear problem: >> >> 543 mutex_unlock(&se_tpg->acl_node_mutex); >> 544 >> 545 if (se_nacl->dynamic_stop) >> 546 target_put_nacl(se_nacl); >> 547 >> 548 target_put_nacl(se_nacl); >> 549 } >> >> So we make two calls to target_put_nacl() when se_nacl->dynamic_stop is true. > > Correct. > >> This got introduced by 01d4d6735589 (target: Fix multi-session dynamic >> se_node_acl double free OOPs). It looks to me like we can just >> eliminate that first call, but I could easily be missing something. > > The double put in transport_free_session() when se_nacl->dynamic_stop is > set is expected behavior, because when a dynamic se_node_acl is > generated it takes the initial reference in target_alloc_node_acl(), and > then a second reference in core_tpg_check_initiator_node_acl(): > > /* > * When allocating a dynamically generated node_acl, go ahead > * and take the extra kref now before returning to the fabric > * driver caller. > * > * Note this reference will be released at session shutdown > * time within transport_free_session() code. > */ > kref_get(&acl->acl_kref); > acl->dynamic_node_acl = 1; > > So for se_node_acl->dynamic_node_acl = 0 in transport_free_session(), > it's just dropping the se_node_acl->nacl_kref obtained by > core_tpg_get_initiator_node_acl(). Eg, it's not dropping the last > reference.. > > But during the se_node_acl->dynamic_node_acl = 1 free special case, > transport_free_session() is responsible for both target_put_nacl() calls > to actually free se_node_acl. > > That is, normally for non dynamic ACLs, it's the configfs callback for > se_node_acl->acl_group rmdir into core_tpg_del_initiator_node_acl() that > drops the reference: > > target_put_nacl(acl); > /* > * Wait for last target_put_nacl() to complete in target_complete_nacl() > * for active fabric session transport_deregister_session() callbacks. > */ > wait_for_completion(&acl->acl_free_comp); > > But since se_node_acl->dynamic_node_acl = 1 doesn't have a corresponding > configfs entry for user-space to remove, it's left up to > transport_free_session() to set se_node_acl->dynamic_stop and drop both > references to free se_node_acl. > Thanks a lot, that clears up my misunderstanding. I should have read the kref documentation carefully first. So it looks like what's *really* happening is, transport_free_session() does a list_del(acl_list). At least in my case, this empties the list. Then, target_put_nacl() calls target_complete_nacl(), which also does a list_del(acl_list) and crashes. -Justin -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html