Re: ESXi/LIO/RBD repeatable problem, hang when cloning VM

On Thu, Sep 3, 2015 at 6:58 AM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
> EnhanceIO? I'd say get rid of that first and then try reproducing it.

Jan, EnhanceIO has not been used in this case; in fact, we have never
had a problem with it in read cache mode.

Thank you,
Alex
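
The "Modules linked in" line in the log below lists every loaded module
(enhanceio included), which is not the same as the modules the faulting
thread actually passed through. A minimal sketch for listing only the
modules named in the call-trace frames, assuming the quoted log has been
saved as a string; the function name modules_in_trace is hypothetical and
the regular expression only targets the frame format seen in this log:

```python
import re

# Matches call-trace entries such as
#   [<ffffffffc03e528a>] sbc_parse_cdb+0x66a/0xa20 [target_core_mod]
# The trailing "[module]" is absent for symbols built into the kernel.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(\w+)\])?"
)


def modules_in_trace(text):
    """Return the set of module names referenced by stack frames in text."""
    # Mail clients wrap long log lines and prefix them with "> ";
    # join the wrapped pieces back together before matching.
    unwrapped = re.sub(r"\n[> ]*", " ", text)
    return {m.group(2) for m in FRAME_RE.finditer(unwrapped) if m.group(2)}
```

Run against the first call trace quoted below, this should report only
target_core_mod and target_core_iblock, since those are the only bracketed
module names on its frames.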

>
> Jan
>
>> On 03 Sep 2015, at 03:14, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
>>
>> We have experienced a repeatable issue when performing the following:
>>
>> Ceph backend with no issues, we can repeat any time at will in lab and
>> production.  Cloning an ESXi VM to another VM on the same datastore on
>> which the original VM resides.  Practically instantly, the LIO machine
>> becomes unresponsive, Pacemaker fails over to another LIO machine and
>> that too becomes unresponsive.
>>
>> Both running Ubuntu 14.04, kernel 4.1 (4.1.0-040100-generic x86_64),
>> Ceph Hammer 0.94.2, and have been able to take quite a workload with no
>> issues.
>>
>> Output of /var/log/syslog below.  I also have a screen dump of a
>> frozen system - attached.
>>
>> Thank you,
>> Alex
>>
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886254] CPU: 22 PID:
>> 18130 Comm: kworker/22:1 Tainted: G         C OE
>> 4.1.0-040100-generic #201506220235
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886303] Hardware name:
>> Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a
>> 12/05/2013
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886364] Workqueue:
>> xcopy_wq target_xcopy_do_work [target_core_mod]
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886395] task:
>> ffff8810441c3250 ti: ffff88105bb40000 task.ti: ffff88105bb40000
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886440] RIP:
>> 0010:[<ffffffffc03e4529>]  [<ffffffffc03e4529>]
>> sbc_check_prot+0x49/0x210 [target_core_mod]
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886498] RSP:
>> 0018:ffff88105bb43b88  EFLAGS: 00010246
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886525] RAX:
>> 0000000000000400 RBX: ffff8810589eb008 RCX: 0000000000000400
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886554] RDX:
>> ffff8810589eb0f8 RSI: 0000000000000000 RDI: 0000000000000000
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886584] RBP:
>> ffff88105bb43bc8 R08: 0000000000000000 R09: 0000000000000001
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886613] R10:
>> 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886643] R13:
>> ffff88084860c000 R14: ffffffffc02372c0 R15: 0000000000000400
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886673] FS:
>> 0000000000000000(0000) GS:ffff88105f480000(0000)
>> knlGS:0000000000000000
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886719] CS:  0010 DS:
>> 0000 ES: 0000 CR0: 0000000080050033
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886747] CR2:
>> 0000000000000010 CR3: 0000000001e0f000 CR4: 00000000001407e0
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886777] Stack:
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886798]  0000000b00000000
>> 000000000000000c 0000000000000000 ffff8810589eb0f8
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886851]  ffff8810589eb008
>> ffff88084860c000 ffffffffc02372c0 0000000000000400
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886904]  ffff88105bb43c28
>> ffffffffc03e528a 0000000c00000000 000400000000000c
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886957] Call Trace:
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886989]
>> [<ffffffffc03e528a>] sbc_parse_cdb+0x66a/0xa20 [target_core_mod]
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887022]
>> [<ffffffffc0233195>] iblock_parse_cdb+0x15/0x20 [target_core_iblock]
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887077]
>> [<ffffffffc03de950>] target_setup_cmd_from_cdb+0x1c0/0x260
>> [target_core_mod]
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887133]
>> [<ffffffffc03ed1bd>] target_xcopy_setup_pt_cmd+0x8d/0x170
>> [target_core_mod]
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887188]
>> [<ffffffffc03edb16>] target_xcopy_read_source.isra.12+0x126/0x220
>> [target_core_mod]
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887243]
>> [<ffffffff81020509>] ? sched_clock+0x9/0x10
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887279]
>> [<ffffffffc03edf01>] target_xcopy_do_work+0xf1/0x370 [target_core_mod]
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887329]
>> [<ffffffff810146a6>] ? __switch_to+0x1e6/0x580
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887361]
>> [<ffffffff81096414>] process_one_work+0x144/0x490
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887390]
>> [<ffffffff81096e7e>] worker_thread+0x11e/0x460
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887418]
>> [<ffffffff81096d60>] ? create_worker+0x1f0/0x1f0
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887449]
>> [<ffffffff8109ce59>] kthread+0xc9/0xe0
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887477]
>> [<ffffffff8109cd90>] ? flush_kthread_worker+0x90/0x90
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887510]
>> [<ffffffff8180d6a2>] ret_from_fork+0x42/0x70
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887538]
>> [<ffffffff8109cd90>] ? flush_kthread_worker+0x90/0x90
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.890342] Code: 7d f8 49 89
>> fd 4c 89 65 e0 44 0f b6 62 01 41 89 cf 48 8b be 80 00 00 00 41 8b b5
>> 18 04 00 00 41 c0 ec 05 48 83 bb f0 01 00 00 00 <8b> 4f 10 41 89 f6 74
>> 0a 8b 83 f8 01 00 00 85 c0 75 14 45 84 e4
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.890580] RIP
>> [<ffffffffc03e4529>] sbc_check_prot+0x49/0x210 [target_core_mod]
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.890636]  RSP <ffff88105bb43b88>
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.890659] CR2: 0000000000000010
>> Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.890956] ---[ end trace
>> 894b2880b8116889 ]---
>> Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.204150] BUG: unable to
>> handle kernel paging request at ffffffffffffffd8
>> Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.204291] IP:
>> [<ffffffff8109d220>] kthread_data+0x10/0x20
>> Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.204392] PGD 1e12067 PUD
>> 1e14067 PMD 0
>> Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.204563] Oops: 0000 [#2] SMP
>> Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.204695] Modules linked
>> in: enhanceio_rand(OE) enhanceio_lru(OE) enhanceio_fifo(OE)
>> enhanceio(OE) target_core_user uio rbd libceph libcrc32c
>> iscsi_target_mod target_core_file target_core_pscsi target_core_iblock
>> target_core_mod configfs xt_multiport iptable_filter ip_tables
>> x_tables ipmi_devintf ipmi_ssif bonding x86_pkg_temp_thermal
>> intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul
>> ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper
>> ablk_helper cryptd 8021q garp mrp stp llc sb_edac joydev edac_core
>> mei_me lpc_ich mei ioatdma ses enclosure ipmi_si 8250_fintek
>> ipmi_msghandler wmi shpchp mac_hid lp parport mlx4_en vxlan
>> ip6_udp_tunnel udp_tunnel hid_generic igb usbhid ahci hid mpt2sas
>> i2c_algo_bit libahci dca ptp raid_class mlx4_core scsi_transport_sas
>> pps_core
>> Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.207888] CPU: 22 PID:
>> 18130 Comm: kworker/22:1 Tainted: G      D  C OE
>> 4.1.0-040100-generic #201506220235
>> Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.207972] Hardware name:
>> Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a
>> 12/05/2013
>> Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.208062] task:
>> ffff8810441c3250 ti: ffff88105bb40000 task.ti: ffff88105bb40000
>> Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.208141] RIP:
>> 0010:[<ffffffff8109d220>]  [<ffffffff8109d220>] kthread_data+0x10/0x20
>> Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.208261] RSP:
>> 0018:ffff88105bb43838  EFLAGS: 00010096
>> Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.208322] RAX:
>> 0000000000000000 RBX: 0000000000000016 RCX: ffffffff820ea340
>> Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.208374] ABORT_TASK: Found
>> referenced iSCSI task_tag: 3511431
>> Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.208375] ABORT_TASK:
>> ref_tag: 3511431 already complete, skipping
>> Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.208376] ABORT_TASK:
>> Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 3511431
>> <2015-09-02_21-07-15.png>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>