Hello,

On 3.9.2015 at 9:20, Nicholas A. Bellinger wrote:
> (RESENDING)
>
> On Wed, 2015-09-02 at 21:14 -0400, Alex Gorbachev wrote:
>> We have experienced a repeatable issue when performing the following:
>>
>> Ceph backend with no issues, we can repeat any time at will in lab and
>> production. Cloning an ESXi VM to another VM on the same datastore on
>> which the original VM resides. Practically instantly, the LIO machine
>> becomes unresponsive, Pacemaker fails over to another LIO machine and
>> that too becomes unresponsive.
>>
>> Both running Ubuntu 14.04, kernel 4.1 (4.1.0-040100-generic x86_64),
>> Ceph Hammer 0.94.2, and have been able to take quite a workload with no
>> issues.
>>
>> Output of /var/log/syslog below. I also have a screen dump of a
>> frozen system - attached.
>>
>> Thank you,
>> Alex
>>
> The bug-fix patch to address this NULL pointer dereference with >= v4.1
> sbc_check_prot() sanity checks + EXTENDED_COPY I/O emulation has been
> sent-out with your Reported-by.
>
> Please verify with your v4.1 environment that it resolves the original
> ESX VAAI CLONE regression with a proper Tested-by tag.
>
> For now, it has also been queued to target-pending.git/for-next with a
> stable CC'.
>
> Thanks for reporting!
>
> --nab

I have the same issue when migrating a VMDK from one datastore to
another. The LIO target hangs immediately inside EXTENDED_COPY, and the
above patch doesn't fix it. According to the oops dump, there's another
NULL pointer dereference in target_scsi3_ua_check.

Kernel: 4.1.6-stable + "Attach EXTENDED_COPY local I/O descriptors to
xcopy_pt_sess" patch

Dmesg output and target_scsi3_ua_check disassembly are attached below.
Thank you,
Martin

[ 1858.639055] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 1858.639106] IP: [<ffffffffa01d3774>] target_scsi3_ua_check+0x24/0x60 [target_core_mod]
[ 1858.639160] PGD 0
[ 1858.639174] Oops: 0000 [#1] SMP
[ 1858.639195] Modules linked in: target_core_pscsi target_core_file cbc rbd libceph snd_pcm snd_timer snd coretemp mgag200 ttm iTCO_wdt psmouse serio_raw drm_kms_helper drm soundcore iTCO_vendor_support evdev i2c_algo_bit dcdbas joydev pcspkr acpi_power_meter wmi ipmi_devintf kvm tpm_tis tpm 8250_fintek ipmi_si i7core_edac ipmi_msghandler lpc_ich mfd_core edac_core shpchp button acpi_cpufreq processor thermal_sys iscsi_target_mod target_core_iblock target_core_mod configfs autofs4 xfs dm_mod sd_mod sg sr_mod cdrom hid_generic uas ata_generic usbhid usb_storage hid mptsas scsi_transport_sas ata_piix uhci_hcd bnx2x ehci_pci ptp pps_core ehci_hcd libata mdio mptscsih mptbase crc32c_generic usbcore crc32c_intel usb_common scsi_mod libcrc32c bnx2
[ 1858.639654] CPU: 2 PID: 1293 Comm: kworker/2:1 Tainted: G I 4.1.6-fixxcopy+ #1
[ 1858.639699] Hardware name: Dell Inc. PowerEdge R410/0N83VF, BIOS 1.11.0 07/20/2012
[ 1858.639747] Workqueue: xcopy_wq target_xcopy_do_work [target_core_mod]
[ 1858.639782] task: ffff880036f0cbe0 ti: ffff880317940000 task.ti: ffff880317940000
[ 1858.639822] RIP: 0010:[<ffffffffa01d3774>] [<ffffffffa01d3774>] target_scsi3_ua_check+0x24/0x60 [target_core_mod]
[ 1858.639884] RSP: 0018:ffff880317943ce0 EFLAGS: 00010282
[ 1858.639913] RAX: 0000000000000000 RBX: ffff880317943dc0 RCX: 0000000000000000
[ 1858.639950] RDX: 0000000000000000 RSI: ffff880317943dd0 RDI: ffff88030eaee408
[ 1858.639987] RBP: ffff88030eaee408 R08: 0000000000000001 R09: 0000000000000001
[ 1858.640025] R10: 0000000000000000 R11: 00000000000706e0 R12: ffff880315e0a000
[ 1858.640062] R13: ffff88030eaee408 R14: 0000000000000001 R15: ffff88030eaee408
[ 1858.640100] FS: 0000000000000000(0000) GS:ffff880322e80000(0000) knlGS:0000000000000000
[ 1858.640143] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1858.640173] CR2: 0000000000000000 CR3: 000000000180d000 CR4: 00000000000006e0
[ 1858.640210] Stack:
[ 1858.640223] ffffffffa01cadfa ffff88030eaee400 ffff880318e7c340 ffff880315e0a000
[ 1858.640267] ffffffffa01d8c25 ffff8800cae809e0 0000000000000400 0000000000000400
[ 1858.640310] ffff880318e7c3d0 0000000006b75800 0000000000080000 ffff88030eaee400
[ 1858.640354] Call Trace:
[ 1858.640379] [<ffffffffa01cadfa>] ? target_setup_cmd_from_cdb+0x13a/0x2c0 [target_core_mod]
[ 1858.640429] [<ffffffffa01d8c25>] ? target_xcopy_setup_pt_cmd+0x85/0x320 [target_core_mod]
[ 1858.640479] [<ffffffffa01d9424>] ? target_xcopy_do_work+0x264/0x700 [target_core_mod]
[ 1858.640526] [<ffffffff810ac3a0>] ? pick_next_task_fair+0x720/0x8f0
[ 1858.640562] [<ffffffff8108b3fb>] ? process_one_work+0x14b/0x430
[ 1858.640595] [<ffffffff8108bf5b>] ? worker_thread+0x6b/0x560
[ 1858.640627] [<ffffffff8108bef0>] ? rescuer_thread+0x390/0x390
[ 1858.640661] [<ffffffff810913b3>] ? kthread+0xd3/0xf0
[ 1858.640689] [<ffffffff810912e0>] ? kthread_create_on_node+0x180/0x180

Dump of assembler code for function target_scsi3_ua_check:
   0x000000000001f780 <+0>:     callq  0x1f785 <target_scsi3_ua_check+5>
   0x000000000001f785 <+5>:     mov    0x80(%rdi),%rax
   0x000000000001f78c <+12>:    test   %rax,%rax
   0x000000000001f78f <+15>:    je     0x1f7d0 <target_scsi3_ua_check+80>
   0x000000000001f791 <+17>:    mov    0x18(%rax),%rax
   0x000000000001f795 <+21>:    test   %rax,%rax
   0x000000000001f798 <+24>:    je     0x1f7d0 <target_scsi3_ua_check+80>
   0x000000000001f79a <+26>:    mov    0x30(%rdi),%edx
   0x000000000001f79d <+29>:    mov    0x138(%rax),%rax
   0x000000000001f7a4 <+36>:    mov    (%rax,%rdx,8),%rax
   0x000000000001f7a8 <+40>:    mov    0x38(%rax),%edx
   0x000000000001f7ab <+43>:    xor    %eax,%eax
   0x000000000001f7ad <+45>:    test   %edx,%edx
   0x000000000001f7af <+47>:    je     0x1f7d2 <target_scsi3_ua_check+82>
   0x000000000001f7b1 <+49>:    mov    0xe8(%rdi),%rax
   0x000000000001f7b8 <+56>:    movzbl (%rax),%eax
   0x000000000001f7bb <+59>:    cmp    $0x12,%al
   0x000000000001f7bd <+61>:    je     0x1f7d0 <target_scsi3_ua_check+80>
   0x000000000001f7bf <+63>:    cmp    $0xa0,%al
   0x000000000001f7c1 <+65>:    je     0x1f7d0 <target_scsi3_ua_check+80>
   0x000000000001f7c3 <+67>:    cmp    $0x3,%al
   0x000000000001f7c5 <+69>:    je     0x1f7d0 <target_scsi3_ua_check+80>
   0x000000000001f7c7 <+71>:    mov    $0xe,%eax
   0x000000000001f7cc <+76>:    retq
   0x000000000001f7cd <+77>:    nopl   (%rax)
   0x000000000001f7d0 <+80>:    xor    %eax,%eax
   0x000000000001f7d2 <+82>:    repz retq
End of assembler dump.

(gdb) list *(0x000000000001f780 + 0x24)
0x1f7a4 is in target_scsi3_ua_check (drivers/target/target_core_ua.c:54).
49
50              nacl = sess->se_node_acl;
51              if (!nacl)
52                      return 0;
53
54              deve = nacl->device_list[cmd->orig_fe_lun];
55              if (!atomic_read(&deve->ua_count))
56                      return 0;
57              /*
58               * From sam4r14, section 5.14 Unit attention condition:
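
Reading the dump against the disassembly: the faulting instruction at +0x24 is the indexed load "mov (%rax,%rdx,8),%rax", and both RAX and RDX are 0 in the register dump (CR2 is 0 as well). So it looks like nacl->device_list itself is NULL for this session, and the read of device_list[0] is what faults. Below is a minimal user-space sketch of that pattern with one possible defensive guard. The struct and function names (the *_sketch types, ua_pending_sketch) are hypothetical stand-ins, not the kernel definitions. The real function additionally exempts certain opcodes and uses atomic_read(), neither of which is modelled here. I am not claiming this guard is the right fix; it only illustrates where the dereference goes wrong.

```c
#include <stddef.h>

/*
 * Hypothetical, simplified stand-ins for the kernel structures. Only the
 * fields touched by the source lines shown in the gdb listing are modelled.
 */
struct se_dev_entry_sketch {
	int ua_count;			/* the kernel uses atomic_t here */
};

struct se_node_acl_sketch {
	struct se_dev_entry_sketch **device_list;
};

struct se_session_sketch {
	struct se_node_acl_sketch *se_node_acl;
};

struct se_cmd_sketch {
	struct se_session_sketch *se_sess;
	unsigned int orig_fe_lun;
};

/* Returns 1 if a unit attention is pending for the command's LUN, else 0. */
int ua_pending_sketch(const struct se_cmd_sketch *cmd)
{
	const struct se_session_sketch *sess = cmd->se_sess;
	const struct se_node_acl_sketch *nacl;
	const struct se_dev_entry_sketch *deve;

	if (!sess)
		return 0;

	nacl = sess->se_node_acl;
	if (!nacl)
		return 0;

	/*
	 * Assumed extra guard (not in the listed source): a node ACL whose
	 * device_list was never allocated would otherwise fault on the
	 * array read below, which matches RAX == 0 in the oops.
	 */
	if (!nacl->device_list)
		return 0;

	deve = nacl->device_list[cmd->orig_fe_lun];
	if (!deve)
		return 0;

	return deve->ua_count != 0;
}
```

With the guard in place, a command whose session has a node ACL but no device_list returns 0 instead of faulting. Whether the proper fix is a guard like this or skipping the UA check for xcopy_pt_sess commands altogether is a question for the maintainers.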