OSD crashes

Hello, we are experiencing severe OSD timeouts. The OSDs are not taken out, and we see the following in syslog on Ubuntu 14.04.2 (kernel 4.1.0-040100-generic) with Firefly 0.80.9.

Thank you for any advice.

Alex


Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.261899] BUG: unable to handle kernel paging request at 000000190000001c
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.261923] IP: [<ffffffff8118e476>] find_get_entries+0x66/0x160
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.261941] PGD 1035954067 PUD 0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.261955] Oops: 0000 [#1] SMP
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.261969] Modules linked in: xfs libcrc32c ipmi_ssif intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac edac_core lpc_ich joydev mei_me mei ioatdma wmi 8021q ipmi_si garp 8250_fintek mrp ipmi_msghandler stp llc bonding mac_hid lp parport mlx4_en vxlan ip6_udp_tunnel udp_tunnel hid_generic usbhid hid igb ahci mpt2sas mlx4_core i2c_algo_bit libahci dca raid_class ptp scsi_transport_sas pps_core arcmsr
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262182] CPU: 10 PID: 8711 Comm: ceph-osd Not tainted 4.1.0-040100-generic #201506220235
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262197] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a 12/05/2013
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262215] task: ffff8800721f1420 ti: ffff880fbad54000 task.ti: ffff880fbad54000
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262229] RIP: 0010:[<ffffffff8118e476>]  [<ffffffff8118e476>] find_get_entries+0x66/0x160
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262248] RSP: 0018:ffff880fbad571a8  EFLAGS: 00010246
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262258] RAX: ffff880004000158 RBX: 000000000000000e RCX: 0000000000000000
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262303] RDX: ffff880004000158 RSI: ffff880fbad571c0 RDI: 0000001900000000
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262347] RBP: ffff880fbad57208 R08: 00000000000000c0 R09: 00000000000000ff
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262391] R10: 0000000000000000 R11: 0000000000000220 R12: 00000000000000b6
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262435] R13: ffff880fbad57268 R14: 000000000000000a R15: ffff880fbad572d8
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262479] FS:  00007f98cb0e0700(0000) GS:ffff88103f480000(0000) knlGS:0000000000000000
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262524] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262551] CR2: 000000190000001c CR3: 0000001034f0e000 CR4: 00000000000407e0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262596] Stack:
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262618]  ffff880fbad571f8 ffff880cf6076b30 ffff880bdde05da8 00000000000000e6
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262669]  0000000000000100 ffff880cf6076b28 00000000000000b5 ffff880fbad57258
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262721]  ffff880fbad57258 ffff880fbad572d8 ffffffffffffffff ffff880cf6076b28
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262772] Call Trace:
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262801]  [<ffffffff8119b482>] pagevec_lookup_entries+0x22/0x30
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262831]  [<ffffffff8119bd84>] truncate_inode_pages_range+0xf4/0x700
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262862]  [<ffffffff8119c415>] truncate_inode_pages+0x15/0x20
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262891]  [<ffffffff8119c53f>] truncate_inode_pages_final+0x5f/0xa0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262949]  [<ffffffffc0431c2c>] xfs_fs_evict_inode+0x3c/0xe0 [xfs]
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262981]  [<ffffffff81220558>] evict+0xb8/0x190
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263009]  [<ffffffff81220671>] dispose_list+0x41/0x50
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263037]  [<ffffffff8122176f>] prune_icache_sb+0x4f/0x60
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263067]  [<ffffffff81208ab5>] super_cache_scan+0x155/0x1a0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263096]  [<ffffffff8119d26f>] do_shrink_slab+0x13f/0x2c0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263126]  [<ffffffff811a22b0>] ? shrink_lruvec+0x330/0x370
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263157]  [<ffffffff811b4189>] ? isolate_migratepages_block+0x299/0x5c0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263188]  [<ffffffff8119d558>] shrink_slab+0xd8/0x110
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263217]  [<ffffffff811a25bf>] shrink_zone+0x2cf/0x300
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263246]  [<ffffffff811b4d3d>] ? compact_zone+0x7d/0x4f0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263275]  [<ffffffff811a2a64>] shrink_zones+0x104/0x2a0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263304]  [<ffffffff811b53ad>] ? compact_zone_order+0x5d/0x70
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263336]  [<ffffffff810f1666>] ? ktime_get+0x46/0xb0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263365]  [<ffffffff811a2cd7>] do_try_to_free_pages+0xd7/0x160
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263396]  [<ffffffff811a3017>] try_to_free_pages+0xb7/0x170
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263427]  [<ffffffff8119571a>] __alloc_pages_nodemask+0x5ba/0x9c0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263460]  [<ffffffff811dc9bc>] alloc_pages_current+0x9c/0x110
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263492]  [<ffffffff811e4f2a>] allocate_slab+0x20a/0x2e0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263522]  [<ffffffff811e5031>] new_slab+0x31/0x1f0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263553]  [<ffffffff817f8dd9>] __slab_alloc+0x18e/0x2a3
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263584]  [<ffffffff816d7817>] ? __alloc_skb+0x87/0x2b0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263614]  [<ffffffff816d77e7>] ? __alloc_skb+0x57/0x2b0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263643]  [<ffffffff811e9b7b>] __kmalloc_node_track_caller+0xbb/0x2b0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263675]  [<ffffffff816d7817>] ? __alloc_skb+0x87/0x2b0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263704]  [<ffffffff816d737c>] __kmalloc_reserve.isra.57+0x3c/0xa0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263734]  [<ffffffff816d7817>] __alloc_skb+0x87/0x2b0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263766]  [<ffffffff81737de1>] sk_stream_alloc_skb+0x41/0x130
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263796]  [<ffffffff817388b3>] tcp_sendmsg+0x2d3/0xa90
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263827]  [<ffffffff81764477>] inet_sendmsg+0x67/0xa0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263858]  [<ffffffff816cea54>] ? copy_msghdr_from_user+0x154/0x1b0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263891]  [<ffffffff816cdcfd>] sock_sendmsg+0x4d/0x60
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263920]  [<ffffffff816cef93>] ___sys_sendmsg+0x2b3/0x2c0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263950]  [<ffffffff810a853c>] ? ttwu_do_wakeup+0x2c/0x100
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263979]  [<ffffffff810a8826>] ? ttwu_do_activate.constprop.121+0x66/0x70
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264011]  [<ffffffff810abef5>] ? try_to_wake_up+0x215/0x2a0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264040]  [<ffffffff810abfb0>] ? wake_up_state+0x10/0x20
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264071]  [<ffffffff810fce86>] ? wake_futex+0x76/0xb0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264099]  [<ffffffff810fe192>] ? futex_wake+0x72/0x140
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264127]  [<ffffffff81222675>] ? __fget_light+0x25/0x70
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264155]  [<ffffffff816cf9b9>] __sys_sendmsg+0x49/0x90
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264184]  [<ffffffff816cfa19>] SyS_sendmsg+0x19/0x20
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264215]  [<ffffffff8180d272>] system_call_fastpath+0x16/0x75
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264243] Code: 00 4c 89 65 c0 31 d2 e9 86 00 00 00 66 0f 1f 84 00 00 00 00 00 48 8b 3a 48 85 ff 0f 84 ad 00 00 00 40 f6 c7 03 0f 85 a9 00 00 00 <8b> 4f 1c 85 c9 74 e3 8d 71 01 4c 8d 47 1c 89 c8 f0 0f b1 77 1c
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264467] RIP  [<ffffffff8118e476>] find_get_entries+0x66/0x160
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264499]  RSP <ffff880fbad571a8>
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264522] CR2: 000000190000001c
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264824] ---[ end trace ae271fe24c8d817e ]---
Jul  3 03:45:01 roc-4r-sca020 CRON[801140]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul  2 06:28:21 roc-4r-sca020 rsyslogd: message repeated 6 times: [ [origin software="rsyslogd" swVersion="7.4.4" x-pid="722" x-info="http://www.rsyslog.com"] rsyslogd was HUPed]
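
In case it helps anyone reading the trace, here is a minimal sketch for pulling the reliable frames (those not prefixed with "?") out of syslog lines in the format above. The regular expression and the function name reliable_frames are my own assumptions for illustration, not part of any kernel or Ceph tooling, and the pattern only targets this 4.1-era "[<address>] symbol+0xoff/0xlen" layout.

```python
import re

# Matches one call-trace frame as it appears in the syslog lines above, e.g.
#   "[554036.262801]  [<ffffffff8119b482>] pagevec_lookup_entries+0x22/0x30"
# An optional "? " before the symbol marks a frame the unwinder is unsure of.
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(lines):
    """Return symbol names of frames not marked '?', innermost first."""
    names = []
    in_trace = False
    for line in lines:
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match is None:
            break  # first non-frame line (e.g. "Code: ...") ends the trace
        if match.group(1) is None:
            names.append(match.group(2))
    return names


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python frames.py < oops.log
    for name in reliable_frames(sys.stdin):
        print(name)
```

Read bottom-up, the reliable frames in the trace above run from SyS_sendmsg through tcp_sendmsg and __alloc_skb into try_to_free_pages, then shrink_slab, prune_icache_sb, evict and xfs_fs_evict_inode, and end in truncate_inode_pages_final and find_get_entries, where the fault is reported.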

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
