Hey all,
I've got a Ceph cluster (0.56.4) set up on a custom CentOS image
(base CentOS 6, plus kernel 3.6.9) running as a Xen dom0. I'm seeing a
lot of messages like the ones at the bottom of this message. I'm
entirely willing to believe the hardware is going bad (it's donated
hardware), but I have run stress tests on some of these machines and
can't figure out what could be failing. I'm inclined to blame the
Myricom fiber cards (they're old, and I had to hack the driver a bit to
get them to run here), but the traces below don't appear to involve
them.
Any help or advice is appreciated.
Thanks in advance,
Steve
2013-04-19 17:47:50.360892 osd.0 [WRN] slow request 33.009444 seconds
old, received at 2013-04-19 17:47:17.351358: osd_op(client.6318.1:285339
rb.0.1211.238e1f29.0000000000cd [write 3674112~520192] 2.2e1e015e RETRY)
currently waiting for ondisk
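For anyone who wants to tally warnings like this one, here is a minimal Python sketch that pulls the fields out of a single slow-request message. The regular expression and the names `SLOW_RE` and `parse_slow_request` are assumptions made up for illustration; the pattern is inferred only from the one warning quoted above and may not match other Ceph versions or message variants.

```python
import re

# Assumed format, inferred only from the single warning quoted above.
SLOW_RE = re.compile(
    r"(?P<osd>osd\.\d+) \[WRN\] slow request (?P<age>\d+\.\d+) seconds old, "
    r"received at (?P<received>\S+ \S+): "
    r"osd_op\((?P<client>\S+) (?P<object>\S+) \[(?P<op>[^\]]+)\]"
    r".*currently (?P<state>.+)$"
)


def parse_slow_request(text):
    """Return a dict of fields for one slow-request warning, or None."""
    # The warning was wrapped across several lines; collapse whitespace first.
    flat = " ".join(text.split())
    match = SLOW_RE.search(flat)
    if match is None:
        return None
    fields = match.groupdict()
    fields["age"] = float(fields["age"])  # request age in seconds
    return fields


sample = """2013-04-19 17:47:50.360892 osd.0 [WRN] slow request 33.009444 seconds
old, received at 2013-04-19 17:47:17.351358: osd_op(client.6318.1:285339
rb.0.1211.238e1f29.0000000000cd [write 3674112~520192] 2.2e1e015e RETRY)
currently waiting for ondisk"""

info = parse_slow_request(sample)
print(info["osd"], info["age"], info["op"], info["state"])
```

Grouping the parsed results by `osd` or by `state` would show whether the slow requests cluster on one OSD or on one stage such as "waiting for ondisk".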
These slow requests are (eventually) accompanied by a panic much like:
general protection fault: 0000 [#1] SMP
Modules linked in: cbc ip6table_filter ip6_tables ebtable_nat ebtables
ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle
iptable_filter ip_tables bridge stp llc xen_pciback xen_netback
xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs xen_privcmd btrfs
zlib_deflate openafs(PO) mx_driver(PO) mx_mcp(PO) autofs4 nfsv4
auth_rpcgss nfs fscache lockd sunrpc ipv6 tg3 ppdev freq_table mperf
pcspkr serio_raw k8temp edac_core edac_mce_amd i2c_amd756 amd_rng
i2c_amd8111 i2c_core parport_pc parport sg shpchp ext4 mbcache jbd2
sd_mod crc_t10dif floppy sata_sil pata_acpi ata_generic pata_amd
dm_mirror dm_region_hash dm_log dm_mod
CPU 0
Pid: 8478, comm: ceph-osd Tainted: P O 3.6.9-7.el6.x86_64 #1
To be filled by O.E.M. To be filles by O.E.M./S2880 Thunder K8S
RIP: e030:[<ffffffff811221f9>] [<ffffffff811221f9>] put_page+0x9/0x40
RSP: e02b:ffff88000b247dc8 EFLAGS: 00010212
RAX: 0000000000000110 RBX: 0000000000000012 RCX: 0000000000000010
RDX: ffff8800517a2ec0 RSI: ffff880076616840 RDI: 483f76ecd761f21e
RBP: ffff88000b247dc8 R08: ffffea00004b78c8 R09: 0000000000000000
R10: ffff880076454e80 R11: 0000000000000293 R12: ffff88006cf9cc80
R13: 000000000000d1b8 R14: ffff88006cf7e230 R15: ffff880061cfdd70
FS: 00007ffc0d9e7700(0000) GS:ffff880076600000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffff600400 CR3: 0000000062175000 CR4: 0000000000000660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ceph-osd (pid: 8478, threadinfo ffff88000b246000, task
ffff880003ec7460)
Stack:
ffff88000b247de8 ffffffff8146cb03 ffff88006cf9cc80 0000000000000000
ffff88000b247e08 ffffffff8146cb9e 0000000000000000 ffff88006cf7e1c0
ffff88000b247e38 ffffffff814bd53e ffff88006cf7e1c0 ffff880061cfdd40
Call Trace:
[<ffffffff8146cb03>] skb_release_data+0x73/0xf0
[<ffffffff8146cb9e>] __kfree_skb+0x1e/0xa0
[<ffffffff814bd53e>] tcp_close+0x8e/0x3d0
[<ffffffff814e4c5e>] inet_release+0x5e/0x80
[<ffffffff81461969>] sock_release+0x29/0x90
[<ffffffff814619e7>] sock_close+0x17/0x30
[<ffffffff8116e573>] __fput+0xb3/0x260
[<ffffffff8116e78e>] ____fput+0xe/0x10
[<ffffffff81073f7c>] task_work_run+0x6c/0x90
[<ffffffff810149ac>] do_notify_resume+0x8c/0xa0
[<ffffffff8116ad83>] ? filp_close+0x63/0x90
[<ffffffff81562362>] int_signal+0x12/0x17
Code: 66 90 e9 0f ff ff ff 89 c1 e9 2e ff ff ff f3 90 48 8b 02 a9 00 00
00 01 75 f4 eb 88 66 0f 1f 44 00 00 55 48 89 e5 66 66 66 66 90 <66> f7
07 00 c0 75 17 f0 ff 4f 1c 0f 94 c0 84 c0 75 05 c9 c3 0f
RIP [<ffffffff811221f9>] put_page+0x9/0x40
RSP <ffff88000b247dc8>
---[ end trace a36b2ce7db9c7446 ]---
general protection fault: 0000 [#2] SMP
Modules linked in: cbc ip6table_filter ip6_tables ebtable_nat ebtables
ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle
iptable_filter ip_tables bridge stp llc xen_pciback xen_netback
xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs xen_privcmd btrfs
zlib_deflate openafs(PO) mx_driver(PO) mx_mcp(PO) autofs4 nfsv4
auth_rpcgss nfs fscache lockd sunrpc ipv6 tg3 ppdev freq_table mperf
pcspkr serio_raw k8temp edac_core edac_mce_amd i2c_amd756 amd_rng
i2c_amd8111 i2c_core parport_pc parport sg shpchp ext4 mbcache jbd2
sd_mod crc_t10dif floppy sata_sil pata_acpi ata_generic pata_amd
dm_mirror dm_region_hash dm_log dm_mod
CPU 1
Pid: 6400, comm: ceph-osd Tainted: P D O 3.6.9-7.el6.x86_64 #1
To be filled by O.E.M. To be filles by O.E.M./S2880 Thunder K8S
RIP: e030:[<ffffffff811221f9>] [<ffffffff811221f9>] put_page+0x9/0x40
RSP: e02b:ffff8800620b1dc8 EFLAGS: 00010212
RAX: 0000000000000110 RBX: 0000000000000012 RCX: 0000000000000010
RDX: ffff88004b8feec0 RSI: ffff880076716840 RDI: b90db395813cdde1
RBP: ffff8800620b1dc8 R08: ffffea000017b1d8 R09: 0000000000000000
R10: ffff880076454e80 R11: 0000000000000293 R12: ffff880067434ac0
R13: 000000000000d1b8 R14: ffff88006e2ad7b0 R15: ffff88006ffc70b0
FS: 00007ffc45680700(0000) GS:ffff880076700000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffff600400 CR3: 0000000062175000 CR4: 0000000000000660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ceph-osd (pid: 6400, threadinfo ffff8800620b0000, task
ffff88006e2a54e0)
Stack:
ffff8800620b1de8 ffffffff8146cb03 ffff880067434ac0 0000000000000000
ffff8800620b1e08 ffffffff8146cb9e 0000000000000000 ffff88006e2ad740
ffff8800620b1e38 ffffffff814bd53e ffff88006e2ad740 ffff88006ffc7080
Call Trace:
[<ffffffff8146cb03>] skb_release_data+0x73/0xf0
[<ffffffff8146cb9e>] __kfree_skb+0x1e/0xa0
[<ffffffff814bd53e>] tcp_close+0x8e/0x3d0
[<ffffffff814e4c5e>] inet_release+0x5e/0x80
[<ffffffff81461969>] sock_release+0x29/0x90
[<ffffffff814619e7>] sock_close+0x17/0x30
[<ffffffff8116e573>] __fput+0xb3/0x260
[<ffffffff8116e78e>] ____fput+0xe/0x10
[<ffffffff81073f7c>] task_work_run+0x6c/0x90
[<ffffffff810149ac>] do_notify_resume+0x8c/0xa0
[<ffffffff8116ad83>] ? filp_close+0x63/0x90
[<ffffffff81562362>] int_signal+0x12/0x17
Code: 66 90 e9 0f ff ff ff 89 c1 e9 2e ff ff ff f3 90 48 8b 02 a9 00 00
00 01 75 f4 eb 88 66 0f 1f 44 00 00 55 48 89 e5 66 66 66 66 90 <66> f7
07 00 c0 75 17 f0 ff 4f 1c 0f 94 c0 84 c0 75 05 c9 c3 0f
RIP [<ffffffff811221f9>] put_page+0x9/0x40
RSP <ffff8800620b1dc8>
---[ end trace a36b2ce7db9c7447 ]---
kernel BUG at net/ceph/osd_client.c:598!
invalid opcode: 0000 [#4] SMP
Modules linked in: cbc ip6table_filter ip6_tables ebtable_nat ebtables
ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle
iptable_filter ip_tables bridge stp llc xen_pciback xen_netback
xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs xen_privcmd btrfs
zlib_deflate openafs(PO) mx_driver(PO) mx_mcp(PO) autofs4 nfsv4
auth_rpcgss nfs fscache lockd sunrpc ipv6 tg3 ppdev freq_table mperf
pcspkr serio_raw k8temp edac_core edac_mce_amd i2c_amd756 amd_rng
i2c_amd8111 i2c_core parport_pc parport sg shpchp ext4 mbcache jbd2
sd_mod crc_t10dif floppy sata_sil pata_acpi ata_generic pata_amd
dm_mirror dm_region_hash dm_log dm_mod
CPU 0
Pid: 8556, comm: kworker/0:0 Tainted: P D O 3.6.9-7.el6.x86_64
#1 To be filled by O.E.M. To be filles by O.E.M./S2880 Thunder K8S
RIP: e030:[<ffffffff815320ea>] [<ffffffff815320ea>]
__kick_osd_requests+0x17a/0x1f0
RSP: e02b:ffff88004e587d20 EFLAGS: 00010297
RAX: ffff8800159fb820 RBX: ffff8800046f1450 RCX: 0000000000000000
RDX: ffff8800159fb850 RSI: ffff880076600200 RDI: ffff880076600200
RBP: ffff88004e587d60 R08: ffff88007660e420 R09: ffd8264126645403
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880062018740
R13: ffff88006c9f8cb0 R14: ffff8800159fb800 R15: ffffffff8182b3a2
FS: 00007f25814cc7c0(0000) GS:ffff880076600000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffff600400 CR3: 000000006d49d000 CR4: 0000000000000660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/0:0 (pid: 8556, threadinfo ffff88004e586000, task
ffff880003fff560)
Stack:
ffff88004e587d40 ffff88006c9f8800 0000000000000000 ffff88006c9f8800
ffff880062018740 ffff880062018750 ffff880062018798 ffffffff8182b3a2
ffff88004e587d90 ffffffff815324fa ffff88006c9f8830 ffff88006c9f8860
Call Trace:
[<ffffffff815324fa>] osd_reset+0x5a/0xb0
[<ffffffff8152c0ab>] ceph_fault+0xfb/0x380
[<ffffffff8152d690>] ? try_read+0x6a0/0x6a0
[<ffffffff8152d84b>] con_work+0x1bb/0x2c0
[<ffffffff810706d9>] process_one_work+0x179/0x4b0
[<ffffffff81071dd6>] worker_thread+0x126/0x320
[<ffffffff81071cb0>] ? manage_workers+0x190/0x190
[<ffffffff8107766e>] kthread+0x9e/0xb0
[<ffffffff81563144>] kernel_thread_helper+0x4/0x10
[<ffffffff8155a0b8>] ? retint_restore_args+0x5/0x6
[<ffffffff81563140>] ? gs_change+0x13/0x13
Code: e7 e8 db f5 ff ff f6 05 56 b0 5a 00 04 75 20 48 8d 43 50 48 8b 53
50 49 39 c5 75 8d 48 83 c4 18 5b 41 5c 41 5d 41 5e 41 5f c9 c3 <0f> 0b
eb fe 48 8b 55 c8 49 8b 0e 48 c7 c6 18 35 83 81 48 c7 c7
RIP [<ffffffff815320ea>] __kick_osd_requests+0x17a/0x1f0
RSP <ffff88004e587d20>
---[ end trace a36b2ce7db9c7449 ]---
BUG: unable to handle kernel paging request at fffffffffffffff8
IP: [<ffffffff810770a0>] kthread_data+0x10/0x20
PGD 1a0d067 PUD 1a0e067 PMD 0
Oops: 0000 [#5] SMP
Modules linked in: cbc ip6table_filter ip6_tables ebtable_nat ebtables
ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle
iptable_filter ip_tables bridge stp llc xen_pciback xen_netback
xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs xen_privcmd btrfs
zlib_deflate openafs(PO) mx_driver(PO) mx_mcp(PO) autofs4 nfsv4
auth_rpcgss nfs fscache lockd sunrpc ipv6 tg3 ppdev freq_table mperf
pcspkr serio_raw k8temp edac_core edac_mce_amd i2c_amd756 amd_rng
i2c_amd8111 i2c_core parport_pc parport sg shpchp ext4 mbcache jbd2
sd_mod crc_t10dif floppy sata_sil pata_acpi ata_generic pata_amd
dm_mirror dm_region_hash dm_log dm_mod
CPU 0
Pid: 8556, comm: kworker/0:0 Tainted: P D O 3.6.9-7.el6.x86_64
#1 To be filled by O.E.M. To be filles by O.E.M./S2880 Thunder K8S
RIP: e030:[<ffffffff810770a0>] [<ffffffff810770a0>] kthread_data+0x10/0x20
RSP: e02b:ffff88004e5879f8 EFLAGS: 00010082
RAX: 0000000000000000 RBX: ffff8800766139c0 RCX: 0000000000000000
RDX: ffffffff81d55aa0 RSI: 0000000000000000 RDI: ffff880003fff560
RBP: ffff88004e5879f8 R08: ffff880003fff5d0 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000006
FS: 00007f272ffff700(0000) GS:ffff880076600000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: fffffffffffffff8 CR3: 000000001589a000 CR4: 0000000000000660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/0:0 (pid: 8556, threadinfo ffff88004e586000, task
ffff880003fff560)
Stack:
ffff88004e587a28 ffffffff8106fc41 ffff88004e587a28 ffff8800766139c0
0000000000000000 ffff880003fffb10 ffff88004e587ab8 ffffffff81558d03
ffff88004e587fd8 00000000000139c0 ffff88004e586010 00000000000139c0
Call Trace:
[<ffffffff8106fc41>] wq_worker_sleeping+0x21/0xa0
[<ffffffff81558d03>] __schedule+0x593/0x700
[<ffffffff81559199>] schedule+0x29/0x70
[<ffffffff8105a4cd>] do_exit+0x2bd/0x470
[<ffffffff8155aebb>] oops_end+0xab/0xf0
[<ffffffff810175db>] die+0x5b/0x90
[<ffffffff8155aa14>] do_trap+0xc4/0x160
[<ffffffff810154f5>] do_invalid_op+0x95/0xb0
[<ffffffff815320ea>] ? __kick_osd_requests+0x17a/0x1f0
[<ffffffff8106fded>] ? __queue_work+0xfd/0x3d0
[<ffffffff810df03d>] ? __call_rcu_core+0xbd/0x170
[<ffffffff81562fbb>] invalid_op+0x1b/0x20
[<ffffffff815320ea>] ? __kick_osd_requests+0x17a/0x1f0
[<ffffffff815324fa>] osd_reset+0x5a/0xb0
[<ffffffff8152c0ab>] ceph_fault+0xfb/0x380
[<ffffffff8152d690>] ? try_read+0x6a0/0x6a0
[<ffffffff8152d84b>] con_work+0x1bb/0x2c0
[<ffffffff810706d9>] process_one_work+0x179/0x4b0
[<ffffffff81071dd6>] worker_thread+0x126/0x320
[<ffffffff81071cb0>] ? manage_workers+0x190/0x190
[<ffffffff8107766e>] kthread+0x9e/0xb0
[<ffffffff81563144>] kernel_thread_helper+0x4/0x10
[<ffffffff8155a0b8>] ? retint_restore_args+0x5/0x6
[<ffffffff81563140>] ? gs_change+0x13/0x13
Code: 66 66 66 90 65 48 8b 04 25 40 c6 00 00 48 8b 80 58 05 00 00 8b 40
f0 c9 c3 66 90 55 48 89 e5 66 66 66 66 90 48 8b 87 58 05 00 00 <48> 8b
40 f8 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66
RIP [<ffffffff810770a0>] kthread_data+0x10/0x20
RSP <ffff88004e5879f8>
CR2: fffffffffffffff8
---[ end trace a36b2ce7db9c744a ]---
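To see at a glance which process and which kernel function each report faulted in, the traces above can be condensed with a small Python sketch like the following. The patterns and the name `summarize_oopses` are hypothetical and assume the exact line layout pasted above (a `Pid: ..., comm: ...` line, later followed by a closing `RIP [<addr>] function+offset/length` line); other kernels may format these differently.

```python
import re
from collections import Counter

# Patterns are assumptions based on the trace layout pasted above.
PID_RE = re.compile(r"^Pid: (\d+), comm: (\S+)")
# Matches the closing "RIP [<addr>] func+0x.../0x..." line, not the "RIP: e030:" line.
RIP_RE = re.compile(r"^RIP\s+\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_oopses(lines):
    """Count (comm, faulting function) pairs across oops reports."""
    counts = Counter()
    comm = None
    for line in lines:
        line = line.strip()
        match = PID_RE.match(line)
        if match:
            comm = match.group(2)  # remember the process for this report
            continue
        match = RIP_RE.match(line)
        if match:
            counts[(comm, match.group(1))] += 1
            comm = None
    return counts


# Trimmed excerpt of the reports above, for illustration only.
sample = """Pid: 8478, comm: ceph-osd Tainted: P O 3.6.9-7.el6.x86_64 #1
RIP: e030:[<ffffffff811221f9>] [<ffffffff811221f9>] put_page+0x9/0x40
RIP [<ffffffff811221f9>] put_page+0x9/0x40
Pid: 6400, comm: ceph-osd Tainted: P D O 3.6.9-7.el6.x86_64 #1
RIP [<ffffffff811221f9>] put_page+0x9/0x40
Pid: 8556, comm: kworker/0:0 Tainted: P D O 3.6.9-7.el6.x86_64
RIP [<ffffffff815320ea>] __kick_osd_requests+0x17a/0x1f0
""".splitlines()

summary = summarize_oopses(sample)
for (comm, func), n in summary.items():
    print(comm, func, n)
```

On the excerpt, this groups the two `ceph-osd` faults in `put_page` together and lists the `kworker` fault in `__kick_osd_requests` separately, which matches the order of events in the pasted log.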
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com