Hi Martin,

On Mon, 2015-09-21 at 17:26 +0200, Martin Svec wrote:
> Hello,
>
> On 3.9.2015 at 9:20, Nicholas A. Bellinger wrote:
> > (RESENDING)
> >
> > On Wed, 2015-09-02 at 21:14 -0400, Alex Gorbachev wrote:
> >> We have experienced a repeatable issue when performing the following:
> >>
> >> Ceph backend with no issues, we can repeat any time at will in lab and
> >> production. Cloning an ESXi VM to another VM on the same datastore on
> >> which the original VM resides. Practically instantly, the LIO machine
> >> becomes unresponsive, Pacemaker fails over to another LIO machine and
> >> that too becomes unresponsive.
> >>
> >> Both running Ubuntu 14.04, kernel 4.1 (4.1.0-040100-generic x86_64),
> >> Ceph Hammer 0.94.2, and have been able to take quite a workload with no
> >> issues.
> >>
> >> output of /var/log/syslog below. I also have a screen dump of a
> >> frozen system - attached.
> >>
> >> Thank you,
> >> Alex
> >>
> > The bug-fix patch to address this NULL pointer dereference with >= v4.1
> > sbc_check_prot() sanity checks + EXTENDED_COPY I/O emulation has been
> > sent-out with your Reported-by.
> >
> > Please verify with your v4.1 environment that it resolves the original
> > ESX VAAI CLONE regression with a proper Tested-by tag.
> >
> > For now, it has also been queued to target-pending.git/for-next with a
> > stable CC'.
> >
> > Thanks for reporting!
> >
> > --nab
>
> I have the same issue when migrating a VMDK from one datastore to another. LIO target hangs
> immediately inside EXTENDED_COPY and the above patch doesn't fix it. According to the oops dump,
> there's another NULL pointer dereference in target_scsi3_ua_check.
>
> Kernel: 4.1.6-stable + "Attach EXTENDED_COPY local I/O descriptors to xcopy_pt_sess" patch
>
> Dmesg output and target_scsi3_ua_check disassembly are attached below.
>
> Thank you,
> Martin
>
>
> [ 1858.639055] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 1858.639106] IP: [<ffffffffa01d3774>] target_scsi3_ua_check+0x24/0x60 [target_core_mod]
> [ 1858.639160] PGD 0
> [ 1858.639174] Oops: 0000 [#1] SMP
> [ 1858.639195] Modules linked in: target_core_pscsi target_core_file cbc rbd libceph snd_pcm
> snd_timer snd coretemp mgag200 ttm iTCO_wdt psmouse serio_raw drm_kms_helper drm soundcore
> iTCO_vendor_support evdev i2c_algo_bit dcdbas joydev pcspkr acpi_power_meter wmi ipmi_devintf kvm
> tpm_tis tpm 8250_fintek ipmi_si i7core_edac ipmi_msghandler lpc_ich mfd_core edac_core shpchp button
> acpi_cpufreq processor thermal_sys iscsi_target_mod target_core_iblock target_core_mod configfs
> autofs4 xfs dm_mod sd_modsg sr_mod cdrom hid_generic uas ata_generic usbhid usb_storage hid mptsas
> scsi_transport_sas ata_piix uhci_hcd bnx2x ehci_pci ptp pps_core
> ehci_hcd libata mdio mptscsih mptbase crc32c_generic usbcore crc32c_intel usb_common scsi_mod
> libcrc32c bnx2
> [ 1858.639654] CPU: 2 PID: 1293 Comm: kworker/2:1 Tainted: G I 4.1.6-fixxcopy+ #1
> [ 1858.639699] Hardware name: Dell Inc. PowerEdge R410/0N83VF, BIOS 1.11.0 07/20/2012
> [ 1858.639747] Workqueue: xcopy_wq target_xcopy_do_work [target_core_mod]
> [ 1858.639782] task: ffff880036f0cbe0 ti: ffff880317940000 task.ti: ffff880317940000
> [ 1858.639822] RIP: 0010:[<ffffffffa01d3774>] [<ffffffffa01d3774>] target_scsi3_ua_check+0x24/0x60
> [target_core_mod]
> [ 1858.639884] RSP: 0018:ffff880317943ce0 EFLAGS: 00010282
> [ 1858.639913] RAX: 0000000000000000 RBX: ffff880317943dc0 RCX: 0000000000000000
> [ 1858.639950] RDX: 0000000000000000 RSI: ffff880317943dd0 RDI: ffff88030eaee408
> [ 1858.639987] RBP: ffff88030eaee408 R08: 0000000000000001 R09: 0000000000000001
> [ 1858.640025] R10: 0000000000000000 R11: 00000000000706e0 R12: ffff880315e0a000
> [ 1858.640062] R13: ffff88030eaee408 R14: 0000000000000001 R15: ffff88030eaee408
> [ 1858.640100] FS: 0000000000000000(0000) GS:ffff880322e80000(0000) knlGS:0000000000000000
> [ 1858.640143] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 1858.640173] CR2: 0000000000000000 CR3: 000000000180d000 CR4: 00000000000006e0
> [ 1858.640210] Stack:
> [ 1858.640223] ffffffffa01cadfa ffff88030eaee400 ffff880318e7c340 ffff880315e0a000
> [ 1858.640267] ffffffffa01d8c25 ffff8800cae809e0 0000000000000400 0000000000000400
> [ 1858.640310] ffff880318e7c3d0 0000000006b75800 0000000000080000 ffff88030eaee400
> [ 1858.640354] Call Trace:
> [ 1858.640379] [<ffffffffa01cadfa>] ? target_setup_cmd_from_cdb+0x13a/0x2c0 [target_core_mod]
> [ 1858.640429] [<ffffffffa01d8c25>] ? target_xcopy_setup_pt_cmd+0x85/0x320 [target_core_mod]
> [ 1858.640479] [<ffffffffa01d9424>] ? target_xcopy_do_work+0x264/0x700 [target_core_mod]
> [ 1858.640526] [<ffffffff810ac3a0>] ? pick_next_task_fair+0x720/0x8f0
> [ 1858.640562] [<ffffffff8108b3fb>] ? process_one_work+0x14b/0x430
> [ 1858.640595] [<ffffffff8108bf5b>] ? worker_thread+0x6b/0x560
> [ 1858.640627] [<ffffffff8108bef0>] ? rescuer_thread+0x390/0x390
> [ 1858.640661] [<ffffffff810913b3>] ? kthread+0xd3/0xf0
> [ 1858.640689] [<ffffffff810912e0>] ? kthread_create_on_node+0x180/0x180
>
> Dump of assembler code for function target_scsi3_ua_check:
> 0x000000000001f780 <+0>: callq 0x1f785 <target_scsi3_ua_check+5>
> 0x000000000001f785 <+5>: mov 0x80(%rdi),%rax
> 0x000000000001f78c <+12>: test %rax,%rax
> 0x000000000001f78f <+15>: je 0x1f7d0 <target_scsi3_ua_check+80>
> 0x000000000001f791 <+17>: mov 0x18(%rax),%rax
> 0x000000000001f795 <+21>: test %rax,%rax
> 0x000000000001f798 <+24>: je 0x1f7d0 <target_scsi3_ua_check+80>
> 0x000000000001f79a <+26>: mov 0x30(%rdi),%edx
> 0x000000000001f79d <+29>: mov 0x138(%rax),%rax
> 0x000000000001f7a4 <+36>: mov (%rax,%rdx,8),%rax
> 0x000000000001f7a8 <+40>: mov 0x38(%rax),%edx
> 0x000000000001f7ab <+43>: xor %eax,%eax
> 0x000000000001f7ad <+45>: test %edx,%edx
> 0x000000000001f7af <+47>: je 0x1f7d2 <target_scsi3_ua_check+82>
> 0x000000000001f7b1 <+49>: mov 0xe8(%rdi),%rax
> 0x000000000001f7b8 <+56>: movzbl (%rax),%eax
> 0x000000000001f7bb <+59>: cmp $0x12,%al
> 0x000000000001f7bd <+61>: je 0x1f7d0 <target_scsi3_ua_check+80>
> 0x000000000001f7bf <+63>: cmp $0xa0,%al
> 0x000000000001f7c1 <+65>: je 0x1f7d0 <target_scsi3_ua_check+80>
> 0x000000000001f7c3 <+67>: cmp $0x3,%al
> 0x000000000001f7c5 <+69>: je 0x1f7d0 <target_scsi3_ua_check+80>
> 0x000000000001f7c7 <+71>: mov $0xe,%eax
> 0x000000000001f7cc <+76>: retq
> 0x000000000001f7cd <+77>: nopl (%rax)
> 0x000000000001f7d0 <+80>: xor %eax,%eax
> 0x000000000001f7d2 <+82>: repz retq
> End of assembler dump.
>
> (gdb) list *(0x000000000001f780 + 0x24)
> 0x1f7a4 is in target_scsi3_ua_check (drivers/target/target_core_ua.c:54).
> 49
> 50		nacl = sess->se_node_acl;
> 51		if (!nacl)
> 52			return 0;
> 53
> 54		deve = nacl->device_list[cmd->orig_fe_lun];
> 55		if (!atomic_read(&deve->ua_count))
> 56			return 0;
> 57		/*
> 58		 * From sam4r14, section 5.14 Unit attention condition:
>

Thanks for this detailed bug report.
This is a < v4.2 (pre-RCU) se_node_acl->device_list[] NULL pointer
dereference regression that is affecting v4.1.y-specific code.

Here's a compile-tested patch to add NULL ->device_list[] sanity checks
in the UNIT_ATTENTION and PR non-holder code paths, which AFAICT should
get EXTENDED_COPY I/O functioning on v4.1.y.

Please verify.

From 4e43c61ff27d558af316afc8ff80d29e5babbf86 Mon Sep 17 00:00:00 2001
From: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
Date: Mon, 21 Sep 2015 23:03:56 -0700
Subject: [PATCH] target: Fix v4.1 se_node_acl->device_list[] NULL pointer bug

Reported-by: Martin Svec <martin.svec@xxxxxxxx>
Signed-off-by: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
---
 drivers/target/target_core_pr.c | 3 +++
 drivers/target/target_core_ua.c | 8 ++++----
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/target/target_core_pr.c b/drivers/target/target_core_pr.c
index a15411c..08aa7cc 100644
--- a/drivers/target/target_core_pr.c
+++ b/drivers/target/target_core_pr.c
@@ -328,6 +328,9 @@ static int core_scsi3_pr_seq_non_holder(
 	int legacy = 0; /* Act like a legacy device and return
 			 * RESERVATION CONFLICT on some CDBs */
 
+	if (!se_sess->se_node_acl->device_list)
+		return 0;
+
 	se_deve = se_sess->se_node_acl->device_list[cmd->orig_fe_lun];
 	/*
 	 * Determine if the registration should be ignored due to
diff --git a/drivers/target/target_core_ua.c b/drivers/target/target_core_ua.c
index 1738b16..9fc33e8 100644
--- a/drivers/target/target_core_ua.c
+++ b/drivers/target/target_core_ua.c
@@ -48,7 +48,7 @@ target_scsi3_ua_check(struct se_cmd *cmd)
 		return 0;
 
 	nacl = sess->se_node_acl;
-	if (!nacl)
+	if (!nacl || !nacl->device_list)
 		return 0;
 
 	deve = nacl->device_list[cmd->orig_fe_lun];
@@ -90,7 +90,7 @@ int core_scsi3_ua_allocate(
 	/*
 	 * PASSTHROUGH OPS
 	 */
-	if (!nacl)
+	if (!nacl || !nacl->device_list)
 		return -EINVAL;
 
 	ua = kmem_cache_zalloc(se_ua_cache, GFP_ATOMIC);
@@ -208,7 +208,7 @@ void core_scsi3_ua_for_check_condition(
 		return;
 
 	nacl = sess->se_node_acl;
-	if (!nacl)
+	if (!nacl || !nacl->device_list)
 		return;
 
 	spin_lock_irq(&nacl->device_list_lock);
@@ -276,7 +276,7 @@ int core_scsi3_ua_clear_for_request_sense(
 		return -EINVAL;
 
 	nacl = sess->se_node_acl;
-	if (!nacl)
+	if (!nacl || !nacl->device_list)
 		return -EINVAL;
 
 	spin_lock_irq(&nacl->device_list_lock);