Re: ESXi/LIO/RBD repeatable problem, hang when cloning VM

Hello,

On 3.9.2015 at 9:20, Nicholas A. Bellinger wrote:
> (RESENDING)
>
> On Wed, 2015-09-02 at 21:14 -0400, Alex Gorbachev wrote:
>> We have experienced a repeatable issue when performing the following:
>>
>> Ceph backend with no issues, we can repeat any time at will in lab and
>> production.  Cloning an ESXi VM to another VM on the same datastore on
>> which the original VM resides.  Practically instantly, the LIO machine
>> becomes unresponsive, Pacemaker fails over to another LIO machine and
>> that too becomes unresponsive.
>>
>> Both running Ubuntu 14.04, kernel 4.1 (4.1.0-040100-generic x86_64),
>> Ceph Hammer 0.94.2, and have been able to take quite a workload with no
>> issues.
>>
>> Output of /var/log/syslog below.  I also have a screen dump of a
>> frozen system - attached.
>>
>> Thank you,
>> Alex
>>
> The bug-fix patch to address this NULL pointer dereference with >= v4.1
> sbc_check_prot() sanity checks + EXTENDED_COPY I/O emulation has been
> sent-out with your Reported-by.
>
> Please verify with your v4.1 environment that it resolves the original
> ESX VAAI CLONE regression with a proper Tested-by tag.
>
> For now, it has also been queued to target-pending.git/for-next with a
> stable CC'.
>
> Thanks for reporting!
>
> --nab

I see the same issue when migrating a VMDK from one datastore to another. The LIO target hangs
immediately inside EXTENDED_COPY, and the patch above does not fix it. According to the oops dump,
there is another NULL pointer dereference, this time in target_scsi3_ua_check.

Kernel: 4.1.6-stable + "Attach EXTENDED_COPY local I/O descriptors to xcopy_pt_sess" patch

Dmesg output and target_scsi3_ua_check disassembly are attached below.

Thank you,
Martin


[ 1858.639055] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1858.639106] IP: [<ffffffffa01d3774>] target_scsi3_ua_check+0x24/0x60 [target_core_mod]
[ 1858.639160] PGD 0
[ 1858.639174] Oops: 0000 [#1] SMP
[ 1858.639195] Modules linked in: target_core_pscsi target_core_file cbc rbd libceph snd_pcm
snd_timer snd coretemp mgag200 ttm iTCO_wdt psmouse serio_raw drm_kms_helper drm soundcore
iTCO_vendor_support evdev i2c_algo_bit dcdbas joydev pcspkr acpi_power_meter wmi ipmi_devintf kvm
tpm_tis tpm 8250_fintek ipmi_si i7core_edac ipmi_msghandler lpc_ich mfd_core edac_core shpchp button
acpi_cpufreq processor thermal_sys iscsi_target_mod target_core_iblock target_core_mod configfs
autofs4 xfs dm_mod sd_mod sg sr_mod cdrom hid_generic uas ata_generic usbhid usb_storage hid mptsas
scsi_transport_sas ata_piix uhci_hcd bnx2x ehci_pci ptp pps_core
ehci_hcd libata mdio mptscsih mptbase crc32c_generic usbcore crc32c_intel usb_common scsi_mod
libcrc32c bnx2
[ 1858.639654] CPU: 2 PID: 1293 Comm: kworker/2:1 Tainted: G          I     4.1.6-fixxcopy+ #1
[ 1858.639699] Hardware name: Dell Inc. PowerEdge R410/0N83VF, BIOS 1.11.0 07/20/2012
[ 1858.639747] Workqueue: xcopy_wq target_xcopy_do_work [target_core_mod]
[ 1858.639782] task: ffff880036f0cbe0 ti: ffff880317940000 task.ti: ffff880317940000
[ 1858.639822] RIP: 0010:[<ffffffffa01d3774>]  [<ffffffffa01d3774>] target_scsi3_ua_check+0x24/0x60
[target_core_mod]
[ 1858.639884] RSP: 0018:ffff880317943ce0  EFLAGS: 00010282
[ 1858.639913] RAX: 0000000000000000 RBX: ffff880317943dc0 RCX: 0000000000000000
[ 1858.639950] RDX: 0000000000000000 RSI: ffff880317943dd0 RDI: ffff88030eaee408
[ 1858.639987] RBP: ffff88030eaee408 R08: 0000000000000001 R09: 0000000000000001
[ 1858.640025] R10: 0000000000000000 R11: 00000000000706e0 R12: ffff880315e0a000
[ 1858.640062] R13: ffff88030eaee408 R14: 0000000000000001 R15: ffff88030eaee408
[ 1858.640100] FS:  0000000000000000(0000) GS:ffff880322e80000(0000) knlGS:0000000000000000
[ 1858.640143] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1858.640173] CR2: 0000000000000000 CR3: 000000000180d000 CR4: 00000000000006e0
[ 1858.640210] Stack:
[ 1858.640223]  ffffffffa01cadfa ffff88030eaee400 ffff880318e7c340 ffff880315e0a000
[ 1858.640267]  ffffffffa01d8c25 ffff8800cae809e0 0000000000000400 0000000000000400
[ 1858.640310]  ffff880318e7c3d0 0000000006b75800 0000000000080000 ffff88030eaee400
[ 1858.640354] Call Trace:
[ 1858.640379]  [<ffffffffa01cadfa>] ? target_setup_cmd_from_cdb+0x13a/0x2c0 [target_core_mod]
[ 1858.640429]  [<ffffffffa01d8c25>] ? target_xcopy_setup_pt_cmd+0x85/0x320 [target_core_mod]
[ 1858.640479]  [<ffffffffa01d9424>] ? target_xcopy_do_work+0x264/0x700 [target_core_mod]
[ 1858.640526]  [<ffffffff810ac3a0>] ? pick_next_task_fair+0x720/0x8f0
[ 1858.640562]  [<ffffffff8108b3fb>] ? process_one_work+0x14b/0x430
[ 1858.640595]  [<ffffffff8108bf5b>] ? worker_thread+0x6b/0x560
[ 1858.640627]  [<ffffffff8108bef0>] ? rescuer_thread+0x390/0x390
[ 1858.640661]  [<ffffffff810913b3>] ? kthread+0xd3/0xf0
[ 1858.640689]  [<ffffffff810912e0>] ? kthread_create_on_node+0x180/0x180

Dump of assembler code for function target_scsi3_ua_check:
   0x000000000001f780 <+0>:     callq  0x1f785 <target_scsi3_ua_check+5>
   0x000000000001f785 <+5>:     mov    0x80(%rdi),%rax
   0x000000000001f78c <+12>:    test   %rax,%rax
   0x000000000001f78f <+15>:    je     0x1f7d0 <target_scsi3_ua_check+80>
   0x000000000001f791 <+17>:    mov    0x18(%rax),%rax
   0x000000000001f795 <+21>:    test   %rax,%rax
   0x000000000001f798 <+24>:    je     0x1f7d0 <target_scsi3_ua_check+80>
   0x000000000001f79a <+26>:    mov    0x30(%rdi),%edx
   0x000000000001f79d <+29>:    mov    0x138(%rax),%rax
   0x000000000001f7a4 <+36>:    mov    (%rax,%rdx,8),%rax
   0x000000000001f7a8 <+40>:    mov    0x38(%rax),%edx
   0x000000000001f7ab <+43>:    xor    %eax,%eax
   0x000000000001f7ad <+45>:    test   %edx,%edx
   0x000000000001f7af <+47>:    je     0x1f7d2 <target_scsi3_ua_check+82>
   0x000000000001f7b1 <+49>:    mov    0xe8(%rdi),%rax
   0x000000000001f7b8 <+56>:    movzbl (%rax),%eax
   0x000000000001f7bb <+59>:    cmp    $0x12,%al
   0x000000000001f7bd <+61>:    je     0x1f7d0 <target_scsi3_ua_check+80>
   0x000000000001f7bf <+63>:    cmp    $0xa0,%al
   0x000000000001f7c1 <+65>:    je     0x1f7d0 <target_scsi3_ua_check+80>
   0x000000000001f7c3 <+67>:    cmp    $0x3,%al
   0x000000000001f7c5 <+69>:    je     0x1f7d0 <target_scsi3_ua_check+80>
   0x000000000001f7c7 <+71>:    mov    $0xe,%eax
   0x000000000001f7cc <+76>:    retq
   0x000000000001f7cd <+77>:    nopl   (%rax)
   0x000000000001f7d0 <+80>:    xor    %eax,%eax
   0x000000000001f7d2 <+82>:    repz retq
End of assembler dump.

(gdb) list *(0x000000000001f780 + 0x24)
0x1f7a4 is in target_scsi3_ua_check (drivers/target/target_core_ua.c:54).
49
50              nacl = sess->se_node_acl;
51              if (!nacl)
52                      return 0;
53
54              deve = nacl->device_list[cmd->orig_fe_lun];
55              if (!atomic_read(&deve->ua_count))
56                      return 0;
57              /*
58               * From sam4r14, section 5.14 Unit attention condition:
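
Reading the registers against the disassembly: the faulting instruction at +0x24 is
"mov (%rax,%rdx,8),%rax", with RAX = 0, RDX = 0 and CR2 = 0. RAX was loaded just before
from 0x138(%rax), so nacl->device_list itself appears to be NULL for this session, and
line 54 indexes a NULL array. The following is only a minimal user-space sketch of that
condition and of one possible guard. The struct layouts and the names ua_check_sketch and
demo_xcopy_session_check are hypothetical simplifications, not the kernel's definitions,
and the guard is an assumption, not the fix that was or will be merged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct dev_entry { int ua_count; };
struct node_acl  { struct dev_entry **device_list; };
struct session   { struct node_acl *se_node_acl; };
struct cmd       { struct session *se_sess; unsigned int orig_fe_lun; };

/*
 * Models the start of target_scsi3_ua_check() as shown in the gdb listing.
 * Returns 0 when no unit attention is pending, 1 otherwise.
 */
static int ua_check_sketch(const struct cmd *cmd)
{
	const struct session *sess = cmd->se_sess;
	const struct node_acl *nacl;
	const struct dev_entry *deve;

	if (!sess)
		return 0;

	nacl = sess->se_node_acl;
	if (!nacl)
		return 0;

	/* Assumed guard: a session whose ACL has no device_list has no UA state. */
	if (!nacl->device_list)
		return 0;

	deve = nacl->device_list[cmd->orig_fe_lun];
	if (!deve || !deve->ua_count)
		return 0;

	return 1;
}

/* Simulates the state seen in the oops: ACL present, device_list NULL, LUN 0. */
static int demo_xcopy_session_check(void)
{
	struct node_acl acl = { NULL };
	struct session sess = { &acl };
	struct cmd cmd = { &sess, 0 };

	return ua_check_sketch(&cmd);
}
```

Without the device_list check, demo_xcopy_session_check() would dereference address 0,
which matches the CR2 value in the oops.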
