Hello, we are experiencing severe OSD timeouts. The OSDs are not taken out, and we see the following in syslog on Ubuntu 14.04.2 with Firefly 0.80.9.
Thank you for any advice.
Alex
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.261899] BUG: unable to handle kernel paging request at 000000190000001c
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.261923] IP: [<ffffffff8118e476>] find_get_entries+0x66/0x160
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.261941] PGD 1035954067 PUD 0
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.261955] Oops: 0000 [#1] SMP
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.261969] Modules linked in: xfs libcrc32c ipmi_ssif intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac edac_core lpc_ich joydev mei_me mei ioatdma wmi 8021q ipmi_si garp 8250_fintek mrp ipmi_msghandler stp llc bonding mac_hid lp parport mlx4_en vxlan ip6_udp_tunnel udp_tunnel hid_generic usbhid hid igb ahci mpt2sas mlx4_core i2c_algo_bit libahci dca raid_class ptp scsi_transport_sas pps_core arcmsr
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262182] CPU: 10 PID: 8711 Comm: ceph-osd Not tainted 4.1.0-040100-generic #201506220235
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262197] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a 12/05/2013
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262215] task: ffff8800721f1420 ti: ffff880fbad54000 task.ti: ffff880fbad54000
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262229] RIP: 0010:[<ffffffff8118e476>] [<ffffffff8118e476>] find_get_entries+0x66/0x160
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262248] RSP: 0018:ffff880fbad571a8 EFLAGS: 00010246
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262258] RAX: ffff880004000158 RBX: 000000000000000e RCX: 0000000000000000
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262303] RDX: ffff880004000158 RSI: ffff880fbad571c0 RDI: 0000001900000000
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262347] RBP: ffff880fbad57208 R08: 00000000000000c0 R09: 00000000000000ff
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262391] R10: 0000000000000000 R11: 0000000000000220 R12: 00000000000000b6
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262435] R13: ffff880fbad57268 R14: 000000000000000a R15: ffff880fbad572d8
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262479] FS: 00007f98cb0e0700(0000) GS:ffff88103f480000(0000) knlGS:0000000000000000
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262524] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262551] CR2: 000000190000001c CR3: 0000001034f0e000 CR4: 00000000000407e0
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262596] Stack:
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262618] ffff880fbad571f8 ffff880cf6076b30 ffff880bdde05da8 00000000000000e6
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262669] 0000000000000100 ffff880cf6076b28 00000000000000b5 ffff880fbad57258
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262721] ffff880fbad57258 ffff880fbad572d8 ffffffffffffffff ffff880cf6076b28
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262772] Call Trace:
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262801] [<ffffffff8119b482>] pagevec_lookup_entries+0x22/0x30
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262831] [<ffffffff8119bd84>] truncate_inode_pages_range+0xf4/0x700
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262862] [<ffffffff8119c415>] truncate_inode_pages+0x15/0x20
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262891] [<ffffffff8119c53f>] truncate_inode_pages_final+0x5f/0xa0
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262949] [<ffffffffc0431c2c>] xfs_fs_evict_inode+0x3c/0xe0 [xfs]
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.262981] [<ffffffff81220558>] evict+0xb8/0x190
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263009] [<ffffffff81220671>] dispose_list+0x41/0x50
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263037] [<ffffffff8122176f>] prune_icache_sb+0x4f/0x60
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263067] [<ffffffff81208ab5>] super_cache_scan+0x155/0x1a0
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263096] [<ffffffff8119d26f>] do_shrink_slab+0x13f/0x2c0
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263126] [<ffffffff811a22b0>] ? shrink_lruvec+0x330/0x370
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263157] [<ffffffff811b4189>] ? isolate_migratepages_block+0x299/0x5c0
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263188] [<ffffffff8119d558>] shrink_slab+0xd8/0x110
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263217] [<ffffffff811a25bf>] shrink_zone+0x2cf/0x300
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263246] [<ffffffff811b4d3d>] ? compact_zone+0x7d/0x4f0
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263275] [<ffffffff811a2a64>] shrink_zones+0x104/0x2a0
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263304] [<ffffffff811b53ad>] ? compact_zone_order+0x5d/0x70
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263336] [<ffffffff810f1666>] ? ktime_get+0x46/0xb0
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263365] [<ffffffff811a2cd7>] do_try_to_free_pages+0xd7/0x160
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263396] [<ffffffff811a3017>] try_to_free_pages+0xb7/0x170
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263427] [<ffffffff8119571a>] __alloc_pages_nodemask+0x5ba/0x9c0
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263460] [<ffffffff811dc9bc>] alloc_pages_current+0x9c/0x110
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263492] [<ffffffff811e4f2a>] allocate_slab+0x20a/0x2e0
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263522] [<ffffffff811e5031>] new_slab+0x31/0x1f0
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263553] [<ffffffff817f8dd9>] __slab_alloc+0x18e/0x2a3
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263584] [<ffffffff816d7817>] ? __alloc_skb+0x87/0x2b0
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263614] [<ffffffff816d77e7>] ? __alloc_skb+0x57/0x2b0
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263643] [<ffffffff811e9b7b>] __kmalloc_node_track_caller+0xbb/0x2b0
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263675] [<ffffffff816d7817>] ? __alloc_skb+0x87/0x2b0
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263704] [<ffffffff816d737c>] __kmalloc_reserve.isra.57+0x3c/0xa0
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263734] [<ffffffff816d7817>] __alloc_skb+0x87/0x2b0
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263766] [<ffffffff81737de1>] sk_stream_alloc_skb+0x41/0x130
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263796] [<ffffffff817388b3>] tcp_sendmsg+0x2d3/0xa90
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263827] [<ffffffff81764477>] inet_sendmsg+0x67/0xa0
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263858] [<ffffffff816cea54>] ? copy_msghdr_from_user+0x154/0x1b0
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263891] [<ffffffff816cdcfd>] sock_sendmsg+0x4d/0x60
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263920] [<ffffffff816cef93>] ___sys_sendmsg+0x2b3/0x2c0
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263950] [<ffffffff810a853c>] ? ttwu_do_wakeup+0x2c/0x100
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.263979] [<ffffffff810a8826>] ? ttwu_do_activate.constprop.121+0x66/0x70
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264011] [<ffffffff810abef5>] ? try_to_wake_up+0x215/0x2a0
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264040] [<ffffffff810abfb0>] ? wake_up_state+0x10/0x20
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264071] [<ffffffff810fce86>] ? wake_futex+0x76/0xb0
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264099] [<ffffffff810fe192>] ? futex_wake+0x72/0x140
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264127] [<ffffffff81222675>] ? __fget_light+0x25/0x70
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264155] [<ffffffff816cf9b9>] __sys_sendmsg+0x49/0x90
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264184] [<ffffffff816cfa19>] SyS_sendmsg+0x19/0x20
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264215] [<ffffffff8180d272>] system_call_fastpath+0x16/0x75
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264243] Code: 00 4c 89 65 c0 31 d2 e9 86 00 00 00 66 0f 1f 84 00 00 00 00 00 48 8b 3a 48 85 ff 0f 84 ad 00 00 00 40 f6 c7 03 0f 85 a9 00 00 00 <8b> 4f 1c 85 c9 74 e3 8d 71 01 4c 8d 47 1c 89 c8 f0 0f b1 77 1c
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264467] RIP [<ffffffff8118e476>] find_get_entries+0x66/0x160
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264499] RSP <ffff880fbad571a8>
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264522] CR2: 000000190000001c
Jul 3 03:42:06 roc-4r-sca020 kernel: [554036.264824] ---[ end trace ae271fe24c8d817e ]---
Jul 3 03:45:01 roc-4r-sca020 CRON[801140]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 2 06:28:21 roc-4r-sca020 rsyslogd: message repeated 6 times: [ [origin software="rsyslogd" swVersion="7.4.4" x-pid="722" x-info="http://www.rsyslog.com"] rsyslogd was HUPed]