We have experienced a repeatable issue that we can reproduce at will, in both lab and production. The Ceph backend itself shows no issues.

The trigger is cloning an ESXi VM to another VM on the same datastore on which the original VM resides. Practically instantly, the LIO machine becomes unresponsive; Pacemaker fails over to another LIO machine, and that one becomes unresponsive too.

Both machines run Ubuntu 14.04, kernel 4.1 (4.1.0-040100-generic x86_64), and Ceph Hammer 0.94.2, and have been able to take quite a workload with no issues.

Output of /var/log/syslog is below. I have also attached a screen dump of a frozen system.

Thank you,
Alex

Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.886254] CPU: 22 PID: 18130 Comm: kworker/22:1 Tainted: G C OE 4.1.0-040100-generic #201506220235
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.886303] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a 12/05/2013
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.886364] Workqueue: xcopy_wq target_xcopy_do_work [target_core_mod]
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.886395] task: ffff8810441c3250 ti: ffff88105bb40000 task.ti: ffff88105bb40000
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.886440] RIP: 0010:[<ffffffffc03e4529>] [<ffffffffc03e4529>] sbc_check_prot+0x49/0x210 [target_core_mod]
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.886498] RSP: 0018:ffff88105bb43b88 EFLAGS: 00010246
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.886525] RAX: 0000000000000400 RBX: ffff8810589eb008 RCX: 0000000000000400
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.886554] RDX: ffff8810589eb0f8 RSI: 0000000000000000 RDI: 0000000000000000
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.886584] RBP: ffff88105bb43bc8 R08: 0000000000000000 R09: 0000000000000001
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.886613] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.886643] R13: ffff88084860c000 R14: ffffffffc02372c0 R15: 0000000000000400
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.886673] FS: 0000000000000000(0000) GS:ffff88105f480000(0000) knlGS:0000000000000000
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.886719] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.886747] CR2: 0000000000000010 CR3: 0000000001e0f000 CR4: 00000000001407e0
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.886777] Stack:
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.886798] 0000000b00000000 000000000000000c 0000000000000000 ffff8810589eb0f8
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.886851] ffff8810589eb008 ffff88084860c000 ffffffffc02372c0 0000000000000400
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.886904] ffff88105bb43c28 ffffffffc03e528a 0000000c00000000 000400000000000c
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.886957] Call Trace:
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.886989] [<ffffffffc03e528a>] sbc_parse_cdb+0x66a/0xa20 [target_core_mod]
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.887022] [<ffffffffc0233195>] iblock_parse_cdb+0x15/0x20 [target_core_iblock]
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.887077] [<ffffffffc03de950>] target_setup_cmd_from_cdb+0x1c0/0x260 [target_core_mod]
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.887133] [<ffffffffc03ed1bd>] target_xcopy_setup_pt_cmd+0x8d/0x170 [target_core_mod]
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.887188] [<ffffffffc03edb16>] target_xcopy_read_source.isra.12+0x126/0x220 [target_core_mod]
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.887243] [<ffffffff81020509>] ? sched_clock+0x9/0x10
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.887279] [<ffffffffc03edf01>] target_xcopy_do_work+0xf1/0x370 [target_core_mod]
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.887329] [<ffffffff810146a6>] ? __switch_to+0x1e6/0x580
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.887361] [<ffffffff81096414>] process_one_work+0x144/0x490
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.887390] [<ffffffff81096e7e>] worker_thread+0x11e/0x460
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.887418] [<ffffffff81096d60>] ? create_worker+0x1f0/0x1f0
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.887449] [<ffffffff8109ce59>] kthread+0xc9/0xe0
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.887477] [<ffffffff8109cd90>] ? flush_kthread_worker+0x90/0x90
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.887510] [<ffffffff8180d6a2>] ret_from_fork+0x42/0x70
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.887538] [<ffffffff8109cd90>] ? flush_kthread_worker+0x90/0x90
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.890342] Code: 7d f8 49 89 fd 4c 89 65 e0 44 0f b6 62 01 41 89 cf 48 8b be 80 00 00 00 41 8b b5 18 04 00 00 41 c0 ec 05 48 83 bb f0 01 00 00 00 <8b> 4f 10 41 89 f6 74 0a 8b 83 f8 01 00 00 85 c0 75 14 45 84 e4
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.890580] RIP [<ffffffffc03e4529>] sbc_check_prot+0x49/0x210 [target_core_mod]
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.890636] RSP <ffff88105bb43b88>
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.890659] CR2: 0000000000000010
Sep 2 12:11:55 roc-4r-scd214 kernel: [86831.890956] ---[ end trace 894b2880b8116889 ]---
Sep 2 12:12:04 roc-4r-scd214 kernel: [86833.204150] BUG: unable to handle kernel paging request at ffffffffffffffd8
Sep 2 12:12:04 roc-4r-scd214 kernel: [86833.204291] IP: [<ffffffff8109d220>] kthread_data+0x10/0x20
Sep 2 12:12:04 roc-4r-scd214 kernel: [86833.204392] PGD 1e12067 PUD 1e14067 PMD 0
Sep 2 12:12:04 roc-4r-scd214 kernel: [86833.204563] Oops: 0000 [#2] SMP
Sep 2 12:12:04 roc-4r-scd214 kernel: [86833.204695] Modules linked in: enhanceio_rand(OE) enhanceio_lru(OE) enhanceio_fifo(OE) enhanceio(OE) target_core_user uio rbd libceph libcrc32c iscsi_target_mod target_core_file target_core_pscsi target_core_iblock target_core_mod configfs xt_multiport iptable_filter ip_tables x_tables ipmi_devintf ipmi_ssif bonding x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd 8021q garp mrp stp llc sb_edac joydev edac_core mei_me lpc_ich mei ioatdma ses enclosure ipmi_si 8250_fintek ipmi_msghandler wmi shpchp mac_hid lp parport mlx4_en vxlan ip6_udp_tunnel udp_tunnel hid_generic igb usbhid ahci hid mpt2sas i2c_algo_bit libahci dca ptp raid_class mlx4_core scsi_transport_sas pps_core
Sep 2 12:12:04 roc-4r-scd214 kernel: [86833.207888] CPU: 22 PID: 18130 Comm: kworker/22:1 Tainted: G D C OE 4.1.0-040100-generic #201506220235
Sep 2 12:12:04 roc-4r-scd214 kernel: [86833.207972] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a 12/05/2013
Sep 2 12:12:04 roc-4r-scd214 kernel: [86833.208062] task: ffff8810441c3250 ti: ffff88105bb40000 task.ti: ffff88105bb40000
Sep 2 12:12:04 roc-4r-scd214 kernel: [86833.208141] RIP: 0010:[<ffffffff8109d220>] [<ffffffff8109d220>] kthread_data+0x10/0x20
Sep 2 12:12:04 roc-4r-scd214 kernel: [86833.208261] RSP: 0018:ffff88105bb43838 EFLAGS: 00010096
Sep 2 12:12:04 roc-4r-scd214 kernel: [86833.208322] RAX: 0000000000000000 RBX: 0000000000000016 RCX: ffffffff820ea340
Sep 2 12:12:04 roc-4r-scd214 kernel: [86833.208374] ABORT_TASK: Found referenced iSCSI task_tag: 3511431
Sep 2 12:12:04 roc-4r-scd214 kernel: [86833.208375] ABORT_TASK: ref_tag: 3511431 already complete, skipping
Sep 2 12:12:04 roc-4r-scd214 kernel: [86833.208376] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 3511431
Attachment:
2015-09-02_21-07-15.png
Description: PNG image