NULL dereferences in latest 3.0-rc4 LIO

Martin Svec <martin.svec@xxxxxxxx> · Fri, 08 Jul 2011 15:03:58 +0200

Hello,

I've pulled the latest lio-4.1 branch (head 2a940ce682163c), and there
are new NULL dereferences, introduced since 2.6.39, that I can hit quite
regularly with my iSCSI cluster setup. They occur when removing busy
iSCSI targets while initiators are doing I/O.

First oops:

[  345.869604] BUG: unable to handle kernel NULL pointer dereference at 0000000000000168
[  345.869866] IP: [<ffffffffa00871b0>] iscsit_add_cmd_to_response_queue+0xa0/0xe0 [iscsi_target_mod]
[  345.870099] PGD 31ab98067 PUD 31aba6067 PMD 0
[  345.870335] Oops: 0000 [#1] SMP
[  345.870527] CPU 0
[  345.870573] Modules linked in: target_core_iblock target_core_file target_core_pscsi target_core_stgt scsi_tgt iscsi_target_mod target_core_mod bonding
[  345.871239]
[  345.871342] Pid: 5983, comm: LIO_iblock Not tainted 3.0.0-rc6+ #58 Dell Inc. PowerEdge R510/00HDP0
[  345.871644] RIP: 0010:[<ffffffffa00871b0>]  [<ffffffffa00871b0>] iscsit_add_cmd_to_response_queue+0xa0/0xe0 [iscsi_target_mod]
[  345.871867] RSP: 0018:ffff88031979dda0  EFLAGS: 00010246
[  345.871976] RAX: 0000000000000000 RBX: ffff88031d504c00 RCX: 0000000000000000
[  345.872090] RDX: 0000000000000028 RSI: 0000000000000000 RDI: ffffffffa00871a9
[  345.872206] RBP: ffff88031979ddd0 R08: 0000000000000000 R09: ffff880319714ef0
[  345.872319] R10: dead000000200200 R11: 000000000000004d R12: ffff880321d0a080
[  345.872320] R13: ffff88031d504f28 R14: ffff880319714ef0 R15: ffff880319714f00
[  345.872322] FS:  0000000000000000(0000) GS:ffff88032f200000(0000) knlGS:0000000000000000
[  345.872324] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  345.872325] CR2: 0000000000000168 CR3: 0000000319b2e000 CR4: 00000000000006f0
[  345.872327] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  345.872329] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  345.872331] Process LIO_iblock (pid: 5983, threadinfo ffff88031979c000, task ffff8803196b4670)
[  345.872332] Stack:
[  345.872333]  ffff88031979dde0 ffff880321d0a340 ffff880320272000 ffff880320272230
[  345.872335]  ffff880320272180 0000000000000000 ffff88031979dde0 ffffffffa00926e6
[  345.872338]  ffff88031979ded0 ffffffffa003cdfa ffff88031979de00 ffff88031979de70
[  345.872340] Call Trace:
[  345.872350]  [<ffffffffa00926e6>] lio_queue_data_in+0x26/0x30 [iscsi_target_mod]
[  345.872370]  [<ffffffffa003cdfa>] transport_processing_thread+0x70a/0xdc0 [target_core_mod]
[  345.872376]  [<ffffffff81033776>] ? finish_task_switch+0x66/0xd0
[  345.872381]  [<ffffffff8153c5b1>] ? schedule+0x271/0x6e0
[  345.872386]  [<ffffffff8105bef0>] ? wake_up_bit+0x40/0x40
[  345.872393]  [<ffffffffa003c6f0>] ? transport_handle_cdb_direct+0x70/0x70 [target_core_mod]
[  345.872395]  [<ffffffff8105ba16>] kthread+0x96/0xa0
[  345.872402]  [<ffffffff815403d4>] kernel_thread_helper+0x4/0x10
[  345.872404]  [<ffffffff8105b980>] ? __init_kthread_worker+0x40/0x40
[  345.872406]  [<ffffffff815403d0>] ? gs_change+0xb/0xb
[  345.872407] Code: 00 4c 89 bb b0 03 00 00 49 89 56 10 49 89 46 18 4c 89 38 f0 41 ff 84 24 dc 00 00 00 4c 89 ef e8 c7 78 4b e1 48 8b 83 e8 03 00 00
[  345.872417]  8b b8 68 01 00 00 e8 f4 21 fb e0 48 8b 5d d8 4c 8b 65 e0 4c
[  345.872421] RIP  [<ffffffffa00871b0>] iscsit_add_cmd_to_response_queue+0xa0/0xe0 [iscsi_target_mod]
[  345.872426]  RSP <ffff88031979dda0>
[  345.872427] CR2: 0000000000000168
[  345.872454] ---[ end trace e7eac49507444d66 ]---

Gdb output for iscsit_add_cmd_to_response_queue+0xa0:

(gdb) list *(iscsit_add_cmd_to_response_queue+0xa0)
0x131e0 is in iscsit_add_cmd_to_response_queue (drivers/target/iscsi/iscsi_target_util.c:721).
716             spin_lock_bh(&conn->response_queue_lock);
717             list_add_tail(&qr->qr_list, &conn->response_queue_list);
718             atomic_inc(&cmd->response_queue_count);
719             spin_unlock_bh(&conn->response_queue_lock);
720
721             wake_up_process(conn->thread_set->tx_thread);
722     }
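
For anyone decoding the first oops: CR2 is 0x168 and RAX is 0, which is consistent with conn->thread_set being NULL at line 721 and tx_thread sitting 0x168 bytes into the thread-set structure. A minimal sketch of that arithmetic follows. The struct below is hypothetical; its padding is chosen only to reproduce the offset and it is not the real LIO thread-set layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: padding chosen only so that tx_thread lands at 0x168. */
struct thread_set_model {
	char other_members[0x168];	/* stands in for the preceding members */
	void *tx_thread;		/* the member read at line 721 */
};

/* Address the CPU touches when it loads base->tx_thread. */
uintptr_t tx_thread_fault_address(uintptr_t base)
{
	return base + offsetof(struct thread_set_model, tx_thread);
}

/* Sanity check that the model really places tx_thread at 0x168. */
void check_model_layout(void)
{
	assert(offsetof(struct thread_set_model, tx_thread) == 0x168);
}
```

With a NULL base pointer the computed address is the member offset itself, which is why a small non-zero CR2 usually points at a member access through a NULL struct pointer rather than at a NULL pointer being read directly.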

Second oops:

[  346.012397] Target_Core_ConfigFS: Calling se_free_virtual_device() for se_dev_ptr: ffff880320272000
[  346.012405] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  346.012407] IP: [<ffffffff810629fa>] exit_creds+0x1a/0x90
[  346.012417] PGD 30abe4067 PUD 309fc0067 PMD 0
[  346.012419] Oops: 0000 [#2] SMP
[  346.012422] CPU 1
[  346.012424] Modules linked in: target_core_iblock target_core_file target_core_pscsi target_core_stgt scsi_tgt iscsi_target_mod target_core_mod bonding
[  346.012430]
[  346.012432] Pid: 6434, comm: liofx Tainted: G      D     3.0.0-rc6+ #58 Dell Inc. PowerEdge R510/00HDP0
[  346.012435] RIP: 0010:[<ffffffff810629fa>]  [<ffffffff810629fa>] exit_creds+0x1a/0x90
[  346.012438] RSP: 0018:ffff8803197dbcb8  EFLAGS: 00010296
[  346.012439] RAX: 0000000000000000 RBX: ffff8803196b4670 RCX: 00000000000001f2
[  346.012441] RDX: 0000000000000006 RSI: ffff880309d8ac80 RDI: 0000000000000000
[  346.012442] RBP: ffff8803197dbcc8 R08: 0000000000000020 R09: 0000000000000005
[  346.012443] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[  346.012445] R13: ffffffffa00d2440 R14: 0000000000000000 R15: 0000000000000000
[  346.012446] FS:  00007f0cdd7eb720(0000) GS:ffff88032f220000(0000) knlGS:0000000000000000
[  346.012448] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  346.012449] CR2: 0000000000000000 CR3: 00000003197e7000 CR4: 00000000000006e0
[  346.012451] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  346.012453] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  346.012454] Process liofx (pid: 6434, threadinfo ffff8803197da000, task ffff880321cfb990)
[  346.012456] Stack:
[  346.012456]  ffff8803200c3640 ffff8803196b4670 ffff8803197dbce8 ffffffff8103b8f5
[  346.012459]  ffff8803197dbce8 ffff8803196b4670 ffff8803197dbd08 ffffffff8105bd58
[  346.012462]  ffff880320272000 ffff8803200c3600 ffff8803197dbd38 ffffffffa002714b
[  346.012464] Call Trace:
[  346.012472]  [<ffffffff8103b8f5>] __put_task_struct+0x35/0xa0
[  346.012477]  [<ffffffff8105bd58>] kthread_stop+0x78/0xe0
[  346.012496]  [<ffffffffa002714b>] se_release_device_for_hba+0x3b/0xe0 [target_core_mod]
[  346.012502]  [<ffffffffa002721c>] se_free_virtual_device+0x2c/0x40 [target_core_mod]
[  346.012507]  [<ffffffffa0024fdd>] target_core_dev_release+0x6d/0xc0 [target_core_mod]
[  346.012512]  [<ffffffff81157810>] ? config_item_put+0x20/0x20
[  346.012514]  [<ffffffff81157875>] config_item_release+0x65/0xa0
[  346.012517]  [<ffffffff81157810>] ? config_item_put+0x20/0x20
[  346.012521]  [<ffffffff81282247>] kref_put+0x37/0x70
[  346.012523]  [<ffffffff81157809>] config_item_put+0x19/0x20
[  346.012525]  [<ffffffff8115632d>] configfs_rmdir+0x18d/0x240
[  346.012529]  [<ffffffff810fa208>] vfs_rmdir+0x88/0xc0
[  346.012531]  [<ffffffff810fe1db>] do_rmdir+0x10b/0x120
[  346.012534]  [<ffffffff810ef9ed>] ? vfs_write+0x12d/0x180
[  346.012536]  [<ffffffff810efb2c>] ? sys_write+0x4c/0x90
[  346.012538]  [<ffffffff810fe241>] sys_rmdir+0x11/0x20
[  346.012546]  [<ffffffff8153f37b>] system_call_fastpath+0x16/0x1b

Gdb output for se_release_device_for_hba+0x3b:

(gdb) list *(se_release_device_for_hba+0x3b)
0x714b is in se_release_device_for_hba (drivers/target/target_core_device.c:734).
729                 (dev->dev_status & TRANSPORT_DEVICE_OFFLINE_DEACTIVATED))
730                     se_dev_stop(dev);
731
732             if (dev->dev_ptr) {
733                     kthread_stop(dev->process_thread);
734                     if (dev->transport->free_device)
735                             dev->transport->free_device(dev->dev_ptr);
736             }
737
738             spin_lock(&hba->device_lock);

The second oops usually follows shortly after the first one; the task
being released in it (ffff8803196b4670) is the LIO_iblock thread that
crashed in the first oops. Also note the similarity of the first oops
to the bug fixed in 70e69281e0616d18414f65a10d31e80efb91a51d
("iscsi-target: Move conn->thread_set = NULL assignment after
iscsi_release_thread_set").
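
To illustrate the kind of check that seems to be missing on the queueing path, here is a userspace sketch. It is not a proposed patch: all names (conn_model, thread_set_model, queue_response_model, release_thread_set_model) are hypothetical stand-ins, and the locking or reference counting a real fix would need against teardown is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the real LIO structures. */
struct thread_set_model {
	int tx_wakeups;		/* counts wake-ups in place of wake_up_process() */
};

struct conn_model {
	struct thread_set_model *thread_set;	/* NULL once teardown has run */
	int queued;				/* responses placed on the queue */
};

/* Teardown path: detach the thread set from the connection. */
void release_thread_set_model(struct conn_model *conn)
{
	assert(conn != NULL);
	conn->thread_set = NULL;
}

/*
 * Queueing path: always queue, but wake the TX side only if it is still
 * attached. In the kernel this check would also have to be serialized
 * against teardown (lock or reference); that part is omitted here.
 */
bool queue_response_model(struct conn_model *conn)
{
	struct thread_set_model *ts;

	assert(conn != NULL);
	conn->queued++;

	ts = conn->thread_set;	/* read the pointer once */
	if (ts == NULL)
		return false;	/* nobody left to wake */

	ts->tx_wakeups++;
	return true;
}
```

The sketch only shows where an unconditional conn->thread_set->tx_thread access would fault once teardown has cleared the pointer; whether a NULL check, a reordering of the teardown, or something else is the right fix is a question for the maintainers.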

Any ideas?

Martin
