Re: OSD crashes

Thanks, Jan.  /proc/sys/vm/min_free_kbytes was set to 32M; I have raised it to 256M on a system with 64 GB of RAM.  My swappiness was also set to 0. That caused no problems in lab tests, but I wonder whether we hit some limit under 24/7 OSD operation.

I will update after a few days of running with these parameters.  Best regards, Alex
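
For anyone checking the same setting, here is a minimal sketch of the arithmetic. The file takes a value in kilobytes, so the sketch assumes "32M" and "256M" above mean MiB. The helper names are illustrative only and are not part of any Ceph or kernel tooling.

```python
from pathlib import Path

# Kernel tunable discussed in this thread; the value is expressed in kB.
MIN_FREE_PATH = Path("/proc/sys/vm/min_free_kbytes")


def mib_to_kbytes(mib: int) -> int:
    """Convert a size in MiB to the kB value the tunable expects."""
    return mib * 1024


def reserve_fraction(min_free_kb: int, total_ram_gib: int) -> float:
    """Fraction of total RAM that the reserve represents."""
    total_kb = total_ram_gib * 1024 * 1024
    return min_free_kb / total_kb


def read_min_free_kbytes(path: Path = MIN_FREE_PATH) -> int:
    """Read the current value (Linux only)."""
    return int(path.read_text().strip())


if __name__ == "__main__":
    target = mib_to_kbytes(256)
    print("target value in kB:", target)
    print("share of 64 GiB RAM: {:.4%}".format(reserve_fraction(target, 64)))
    if MIN_FREE_PATH.exists():
        print("current value in kB:", read_min_free_kbytes())
```

Under that assumption, 256M is 262144 kB, roughly 0.4% of 64 GB. The value can be applied at runtime with `sysctl -w vm.min_free_kbytes=262144` and, if it helps, persisted in /etc/sysctl.conf.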

On Fri, Jul 3, 2015 at 6:27 AM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
What’s the value of /proc/sys/vm/min_free_kbytes on your system? Increase it to 256M (best done while there is plenty of free memory) and see if it helps.
It can also be set too high; it is hard to find any formula for setting it correctly...

Jan


On 03 Jul 2015, at 10:16, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:

Hello, we are experiencing severe OSD timeouts. The OSDs are not taken out, and we see the following in syslog on Ubuntu 14.04.2 with Firefly 0.80.9.

Thank you for any advice.

Alex


Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.261899] BUG: unable to handle kernel paging request at 000000190000001c
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.261923] IP: [<ffffffff8118e476>] find_get_entries+0x66/0x160
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.261941] PGD 1035954067 PUD 0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.261955] Oops: 0000 [#1] SMP
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.261969] Modules linked in: xfs libcrc32c ipmi_ssif intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac edac_core lpc_ich joydev mei_me mei ioatdma wmi 8021q ipmi_si garp 8250_fintek mrp ipmi_msghandler stp llc bonding mac_hid lp parport mlx4_en vxlan ip6_udp_tunnel udp_tunnel hid_generic usbhid hid igb ahci mpt2sas mlx4_core i2c_algo_bit libahci dca raid_class ptp scsi_transport_sas pps_core arcmsr
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262182] CPU: 10 PID: 8711 Comm: ceph-osd Not tainted 4.1.0-040100-generic #201506220235
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262197] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0a 12/05/2013
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262215] task: ffff8800721f1420 ti: ffff880fbad54000 task.ti: ffff880fbad54000
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262229] RIP: 0010:[<ffffffff8118e476>]  [<ffffffff8118e476>] find_get_entries+0x66/0x160
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262248] RSP: 0018:ffff880fbad571a8  EFLAGS: 00010246
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262258] RAX: ffff880004000158 RBX: 000000000000000e RCX: 0000000000000000
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262303] RDX: ffff880004000158 RSI: ffff880fbad571c0 RDI: 0000001900000000
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262347] RBP: ffff880fbad57208 R08: 00000000000000c0 R09: 00000000000000ff
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262391] R10: 0000000000000000 R11: 0000000000000220 R12: 00000000000000b6
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262435] R13: ffff880fbad57268 R14: 000000000000000a R15: ffff880fbad572d8
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262479] FS:  00007f98cb0e0700(0000) GS:ffff88103f480000(0000) knlGS:0000000000000000
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262524] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262551] CR2: 000000190000001c CR3: 0000001034f0e000 CR4: 00000000000407e0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262596] Stack:
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262618]  ffff880fbad571f8 ffff880cf6076b30 ffff880bdde05da8 00000000000000e6
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262669]  0000000000000100 ffff880cf6076b28 00000000000000b5 ffff880fbad57258
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262721]  ffff880fbad57258 ffff880fbad572d8 ffffffffffffffff ffff880cf6076b28
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262772] Call Trace:
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262801]  [<ffffffff8119b482>] pagevec_lookup_entries+0x22/0x30
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262831]  [<ffffffff8119bd84>] truncate_inode_pages_range+0xf4/0x700
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262862]  [<ffffffff8119c415>] truncate_inode_pages+0x15/0x20
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262891]  [<ffffffff8119c53f>] truncate_inode_pages_final+0x5f/0xa0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262949]  [<ffffffffc0431c2c>] xfs_fs_evict_inode+0x3c/0xe0 [xfs]
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.262981]  [<ffffffff81220558>] evict+0xb8/0x190
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263009]  [<ffffffff81220671>] dispose_list+0x41/0x50
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263037]  [<ffffffff8122176f>] prune_icache_sb+0x4f/0x60
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263067]  [<ffffffff81208ab5>] super_cache_scan+0x155/0x1a0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263096]  [<ffffffff8119d26f>] do_shrink_slab+0x13f/0x2c0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263126]  [<ffffffff811a22b0>] ? shrink_lruvec+0x330/0x370
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263157]  [<ffffffff811b4189>] ? isolate_migratepages_block+0x299/0x5c0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263188]  [<ffffffff8119d558>] shrink_slab+0xd8/0x110
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263217]  [<ffffffff811a25bf>] shrink_zone+0x2cf/0x300
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263246]  [<ffffffff811b4d3d>] ? compact_zone+0x7d/0x4f0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263275]  [<ffffffff811a2a64>] shrink_zones+0x104/0x2a0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263304]  [<ffffffff811b53ad>] ? compact_zone_order+0x5d/0x70
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263336]  [<ffffffff810f1666>] ? ktime_get+0x46/0xb0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263365]  [<ffffffff811a2cd7>] do_try_to_free_pages+0xd7/0x160
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263396]  [<ffffffff811a3017>] try_to_free_pages+0xb7/0x170
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263427]  [<ffffffff8119571a>] __alloc_pages_nodemask+0x5ba/0x9c0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263460]  [<ffffffff811dc9bc>] alloc_pages_current+0x9c/0x110
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263492]  [<ffffffff811e4f2a>] allocate_slab+0x20a/0x2e0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263522]  [<ffffffff811e5031>] new_slab+0x31/0x1f0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263553]  [<ffffffff817f8dd9>] __slab_alloc+0x18e/0x2a3
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263584]  [<ffffffff816d7817>] ? __alloc_skb+0x87/0x2b0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263614]  [<ffffffff816d77e7>] ? __alloc_skb+0x57/0x2b0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263643]  [<ffffffff811e9b7b>] __kmalloc_node_track_caller+0xbb/0x2b0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263675]  [<ffffffff816d7817>] ? __alloc_skb+0x87/0x2b0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263704]  [<ffffffff816d737c>] __kmalloc_reserve.isra.57+0x3c/0xa0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263734]  [<ffffffff816d7817>] __alloc_skb+0x87/0x2b0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263766]  [<ffffffff81737de1>] sk_stream_alloc_skb+0x41/0x130
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263796]  [<ffffffff817388b3>] tcp_sendmsg+0x2d3/0xa90
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263827]  [<ffffffff81764477>] inet_sendmsg+0x67/0xa0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263858]  [<ffffffff816cea54>] ? copy_msghdr_from_user+0x154/0x1b0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263891]  [<ffffffff816cdcfd>] sock_sendmsg+0x4d/0x60
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263920]  [<ffffffff816cef93>] ___sys_sendmsg+0x2b3/0x2c0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263950]  [<ffffffff810a853c>] ? ttwu_do_wakeup+0x2c/0x100
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.263979]  [<ffffffff810a8826>] ? ttwu_do_activate.constprop.121+0x66/0x70
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264011]  [<ffffffff810abef5>] ? try_to_wake_up+0x215/0x2a0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264040]  [<ffffffff810abfb0>] ? wake_up_state+0x10/0x20
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264071]  [<ffffffff810fce86>] ? wake_futex+0x76/0xb0
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264099]  [<ffffffff810fe192>] ? futex_wake+0x72/0x140
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264127]  [<ffffffff81222675>] ? __fget_light+0x25/0x70
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264155]  [<ffffffff816cf9b9>] __sys_sendmsg+0x49/0x90
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264184]  [<ffffffff816cfa19>] SyS_sendmsg+0x19/0x20
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264215]  [<ffffffff8180d272>] system_call_fastpath+0x16/0x75
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264243] Code: 00 4c 89 65 c0 31 d2 e9 86 00 00 00 66 0f 1f 84 00 00 00 00 00 48 8b 3a 48 85 ff 0f 84 ad 00 00 00 40 f6 c7 03 0f 85 a9 00 00 00 <8b> 4f 1c 85 c9 74 e3 8d 71 01 4c 8d 47 1c 89 c8 f0 0f b1 77 1c
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264467] RIP  [<ffffffff8118e476>] find_get_entries+0x66/0x160
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264499]  RSP <ffff880fbad571a8>
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264522] CR2: 000000190000001c
Jul  3 03:42:06 roc-4r-sca020 kernel: [554036.264824] ---[ end trace ae271fe24c8d817e ]---
Jul  3 03:45:01 roc-4r-sca020 CRON[801140]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul  2 06:28:21 roc-4r-sca020 rsyslogd: message repeated 6 times: [ [origin software="rsyslogd" swVersion="7.4.4" x-pid="722" x-info="http://www.rsyslog.com"] rsyslogd was HUPed]
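
Read from the bottom up, the trace above shows a sendmsg call from ceph-osd allocating an skb, entering direct reclaim (try_to_free_pages, shrink_slab), evicting an XFS inode, and faulting in find_get_entries. To make a trace like this easier to skim, here is a rough sketch that extracts the symbol names from the "Call Trace:" section of a syslog excerpt. The function name and the regular expression are my own and assume the line format shown above. Frames the kernel prints with a leading "?" are ones it is not sure about, so they are flagged as unreliable.

```python
import re

# Matches e.g. "[<ffffffff8119b482>] pagevec_lookup_entries+0x22/0x30"
# and "[<ffffffff811a22b0>] ? shrink_lruvec+0x330/0x370".
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<guess>\?\s+)?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_call_trace(log_text):
    """Return (symbol, reliable) pairs for frames listed after 'Call Trace:'."""
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match is None:
            break  # first non-frame line (e.g. "Code: ...") ends the trace
        frames.append((match.group("sym"), match.group("guess") is None))
    return frames


# A few lines copied from the log above, with the host prefix shortened.
SAMPLE = """\
kernel: [554036.262772] Call Trace:
kernel: [554036.262801]  [<ffffffff8119b482>] pagevec_lookup_entries+0x22/0x30
kernel: [554036.262949]  [<ffffffffc0431c2c>] xfs_fs_evict_inode+0x3c/0xe0 [xfs]
kernel: [554036.263126]  [<ffffffff811a22b0>] ? shrink_lruvec+0x330/0x370
kernel: [554036.263704]  [<ffffffff816d737c>] __kmalloc_reserve.isra.57+0x3c/0xa0
kernel: [554036.264243] Code: 00 4c 89 65 c0
"""

if __name__ == "__main__":
    for symbol, reliable in parse_call_trace(SAMPLE):
        print(symbol, "" if reliable else "(unreliable)")
```

Filtering on the reliable flag leaves the chain of calls that actually led to the fault, which is usually the part worth quoting when reporting a problem like this one.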

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

