ESXi/LIO/RBD repeatable problem, hang when cloning VM

We have a repeatable issue that we can trigger at will, in both lab and
production, while the Ceph backend itself shows no issues.

The trigger is cloning an ESXi VM to another VM on the same datastore
on which the original VM resides.  Almost instantly the LIO machine
becomes unresponsive; Pacemaker fails over to another LIO machine, and
that one becomes unresponsive too.

Both are running Ubuntu 14.04, kernel 4.1 (4.1.0-040100-generic x86_64),
and Ceph Hammer 0.94.2, and have been able to take quite a workload with
no issues.

The output of /var/log/syslog is below.  A screen dump of a frozen
system is attached.

Thank you,
Alex

Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886254] CPU: 22 PID:
18130 Comm: kworker/22:1 Tainted: G         C OE
4.1.0-040100-generic #201506220235
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886303] Hardware name:
Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a
12/05/2013
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886364] Workqueue:
xcopy_wq target_xcopy_do_work [target_core_mod]
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886395] task:
ffff8810441c3250 ti: ffff88105bb40000 task.ti: ffff88105bb40000
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886440] RIP:
0010:[<ffffffffc03e4529>]  [<ffffffffc03e4529>]
sbc_check_prot+0x49/0x210 [target_core_mod]
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886498] RSP:
0018:ffff88105bb43b88  EFLAGS: 00010246
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886525] RAX:
0000000000000400 RBX: ffff8810589eb008 RCX: 0000000000000400
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886554] RDX:
ffff8810589eb0f8 RSI: 0000000000000000 RDI: 0000000000000000
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886584] RBP:
ffff88105bb43bc8 R08: 0000000000000000 R09: 0000000000000001
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886613] R10:
0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886643] R13:
ffff88084860c000 R14: ffffffffc02372c0 R15: 0000000000000400
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886673] FS:
0000000000000000(0000) GS:ffff88105f480000(0000)
knlGS:0000000000000000
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886719] CS:  0010 DS:
0000 ES: 0000 CR0: 0000000080050033
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886747] CR2:
0000000000000010 CR3: 0000000001e0f000 CR4: 00000000001407e0
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886777] Stack:
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886798]  0000000b00000000
000000000000000c 0000000000000000 ffff8810589eb0f8
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886851]  ffff8810589eb008
ffff88084860c000 ffffffffc02372c0 0000000000000400
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886904]  ffff88105bb43c28
ffffffffc03e528a 0000000c00000000 000400000000000c
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886957] Call Trace:
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.886989]
[<ffffffffc03e528a>] sbc_parse_cdb+0x66a/0xa20 [target_core_mod]
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887022]
[<ffffffffc0233195>] iblock_parse_cdb+0x15/0x20 [target_core_iblock]
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887077]
[<ffffffffc03de950>] target_setup_cmd_from_cdb+0x1c0/0x260
[target_core_mod]
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887133]
[<ffffffffc03ed1bd>] target_xcopy_setup_pt_cmd+0x8d/0x170
[target_core_mod]
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887188]
[<ffffffffc03edb16>] target_xcopy_read_source.isra.12+0x126/0x220
[target_core_mod]
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887243]
[<ffffffff81020509>] ? sched_clock+0x9/0x10
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887279]
[<ffffffffc03edf01>] target_xcopy_do_work+0xf1/0x370 [target_core_mod]
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887329]
[<ffffffff810146a6>] ? __switch_to+0x1e6/0x580
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887361]
[<ffffffff81096414>] process_one_work+0x144/0x490
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887390]
[<ffffffff81096e7e>] worker_thread+0x11e/0x460
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887418]
[<ffffffff81096d60>] ? create_worker+0x1f0/0x1f0
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887449]
[<ffffffff8109ce59>] kthread+0xc9/0xe0
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887477]
[<ffffffff8109cd90>] ? flush_kthread_worker+0x90/0x90
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887510]
[<ffffffff8180d6a2>] ret_from_fork+0x42/0x70
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.887538]
[<ffffffff8109cd90>] ? flush_kthread_worker+0x90/0x90
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.890342] Code: 7d f8 49 89
fd 4c 89 65 e0 44 0f b6 62 01 41 89 cf 48 8b be 80 00 00 00 41 8b b5
18 04 00 00 41 c0 ec 05 48 83 bb f0 01 00 00 00 <8b> 4f 10 41 89 f6 74
0a 8b 83 f8 01 00 00 85 c0 75 14 45 84 e4
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.890580] RIP
[<ffffffffc03e4529>] sbc_check_prot+0x49/0x210 [target_core_mod]
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.890636]  RSP <ffff88105bb43b88>
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.890659] CR2: 0000000000000010
Sep  2 12:11:55 roc-4r-scd214 kernel: [86831.890956] ---[ end trace
894b2880b8116889 ]---
Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.204150] BUG: unable to
handle kernel paging request at ffffffffffffffd8
Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.204291] IP:
[<ffffffff8109d220>] kthread_data+0x10/0x20
Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.204392] PGD 1e12067 PUD
1e14067 PMD 0
Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.204563] Oops: 0000 [#2] SMP
Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.204695] Modules linked
in: enhanceio_rand(OE) enhanceio_lru(OE) enhanceio_fifo(OE)
enhanceio(OE) target_core_user uio rbd libceph libcrc32c
iscsi_target_mod target_core_file target_core_pscsi target_core_iblock
target_core_mod configfs xt_multiport iptable_filter ip_tables
x_tables ipmi_devintf ipmi_ssif bonding x86_pkg_temp_thermal
intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper
ablk_helper cryptd 8021q garp mrp stp llc sb_edac joydev edac_core
mei_me lpc_ich mei ioatdma ses enclosure ipmi_si 8250_fintek
ipmi_msghandler wmi shpchp mac_hid lp parport mlx4_en vxlan
ip6_udp_tunnel udp_tunnel hid_generic igb usbhid ahci hid mpt2sas
i2c_algo_bit libahci dca ptp raid_class mlx4_core scsi_transport_sas
pps_core
Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.207888] CPU: 22 PID:
18130 Comm: kworker/22:1 Tainted: G      D  C OE
4.1.0-040100-generic #201506220235
Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.207972] Hardware name:
Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a
12/05/2013
Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.208062] task:
ffff8810441c3250 ti: ffff88105bb40000 task.ti: ffff88105bb40000
Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.208141] RIP:
0010:[<ffffffff8109d220>]  [<ffffffff8109d220>] kthread_data+0x10/0x20
Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.208261] RSP:
0018:ffff88105bb43838  EFLAGS: 00010096
Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.208322] RAX:
0000000000000000 RBX: 0000000000000016 RCX: ffffffff820ea340
Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.208374] ABORT_TASK: Found
referenced iSCSI task_tag: 3511431
Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.208375] ABORT_TASK:
ref_tag: 3511431 already complete, skipping
Sep  2 12:12:04 roc-4r-scd214 kernel: [86833.208376] ABORT_TASK:
Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 3511431

Attachment: 2015-09-02_21-07-15.png
Description: PNG image
