Re: osd become unusable, blocked by xfsaild (?) and load > 5000

Jan Schermer <jan@xxxxxxxxxxx> · Tue, 8 Dec 2015 08:36:20 +0100

And how many pids do you have currently?
This should show it, I think:
# ps axH | wc -l
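
If you want that number next to the limit, here is a rough sketch. pid_headroom is just a helper name I made up, and the guarded part at the bottom assumes a Linux box with procps installed and a readable /proc/sys/kernel/pid_max.

```shell
#!/bin/sh
# Report how close a thread count is to a PID limit.
# usage: pid_headroom <threads_in_use> <limit>
# (pid_headroom is a hypothetical helper, not an existing tool)
pid_headroom() {
    used=$1
    limit=$2
    # Integer percentage of the limit that is currently in use.
    pct=$(( used * 100 / limit ))
    echo "threads=$used limit=$limit used=${pct}%"
}

# Only query the live system where ps and /proc are available.
if command -v ps >/dev/null 2>&1 && [ -r /proc/sys/kernel/pid_max ]; then
    # "ps axH" prints one line per thread plus a header line, so subtract 1.
    live=$(( $(ps axH | wc -l) - 1 ))
    pid_headroom "$live" "$(cat /proc/sys/kernel/pid_max)"
fi
```

If the percentage gets anywhere near 100, creating new threads can start to fail.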

Jan

> On 08 Dec 2015, at 08:26, Benedikt Fraunhofer <fraunhofer@xxxxxxxxxx> wrote:
> 
> Hi Jan,
> 
> we initially had to bump it once we had more than 12 osds
> per box. But I'll change that to the values you provided.
> 
> Thx!
> 
> Benedikt
> 
> 2015-12-08 8:15 GMT+01:00 Jan Schermer <jan@xxxxxxxxxxx>:
>> What is the setting of sysctl kernel.pid_max?
>> You really need to have this:
>> kernel.pid_max = 4194304
>> (I think it also sets this as well: kernel.threads-max = 4194304)
>> 
>> I think you are running out of process IDs.
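
To check a box against the kernel.pid_max value quoted above, something like this sketch might do. pid_max_fix is a made-up helper name; it only prints the sysctl command it would suggest and changes nothing by itself.

```shell
#!/bin/sh
# Print the command needed to raise kernel.pid_max,
# or nothing if the current value is already high enough.
# usage: pid_max_fix <current> [wanted]
# (pid_max_fix is a hypothetical helper, not an existing tool)
pid_max_fix() {
    current=$1
    wanted=${2:-4194304}
    if [ "$current" -lt "$wanted" ]; then
        echo "sysctl -w kernel.pid_max=$wanted"
    fi
}

# Only look at the live value where /proc exposes it.
if [ -r /proc/sys/kernel/pid_max ]; then
    pid_max_fix "$(cat /proc/sys/kernel/pid_max)"
fi
```

The printed command has to be run as root. To keep the value across reboots, the "kernel.pid_max = 4194304" line can go into /etc/sysctl.conf or a file under /etc/sysctl.d/.
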
>> 
>> Jan
>> 
>>> On 08 Dec 2015, at 08:10, Benedikt Fraunhofer <fraunhofer@xxxxxxxxxx> wrote:
>>> 
>>> Hello Cephers,
>>> 
>>> lately, our ceph-cluster started to show some weird behavior:
>>> 
>>> the osd boxes show a load of 5000-15000 before the osds get marked down.
>>> Usually the box is fully usable, even "apt-get dist-upgrade" runs smoothly,
>>> you can read and write to any disk, only things you can't do are strace the osd
>>> processes, sync or reboot.
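
A load that high on a box that is otherwise usable would fit the way Linux computes load: tasks in uninterruptible sleep (state D) count towards it, not only runnable ones. Here is a rough sketch to see which commands those threads belong to. count_d_state is a made-up name, and the ps call at the bottom assumes procps.

```shell
#!/bin/sh
# Count threads in uninterruptible sleep (state D) per command name.
# Reads "STATE COMMAND" lines on stdin; only the first word of the
# command name is used. (count_d_state is a hypothetical helper.)
count_d_state() {
    awk '$1 ~ /^D/ { n[$2]++ } END { for (c in n) print n[c], c }' | sort -rn
}

# -L lists every thread; the trailing "=" suppresses the column headers.
if command -v ps >/dev/null 2>&1; then
    ps -eLo state=,comm= 2>/dev/null | count_d_state
fi
```

If most of the D-state threads turn out to be ceph-osd or xfsaild, that at least tells you where the load number comes from.
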
>>> 
>>> we only find some logs about the "xfsaild = XFS Active Item List Daemon"
>>> as hung_task warnings.
>>> 
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016108] [<ffffffff81093790>] ? kthread_create_on_node+0x1c0/0x1c0
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016112] INFO: task xfsaild/dm-1:1445 blocked for more than 120 seconds.
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016329]       Tainted: G         C     3.19.0-39-generic #44~14.04.1-Ubuntu
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016558] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016802] xfsaild/dm-1 D ffff8807faa03af8     0  1445      2 0x00000000
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016805] ffff8807faa03af8 ffff8808098989d0 0000000000013e80 ffff8807faa03fd8
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016808] 0000000000013e80 ffff88080bb775c0 ffff8808098989d0 ffff88011381b2a8
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016812] ffff8807faa03c50 7fffffffffffffff ffff8807faa03c48 ffff8808098989d0
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016815] Call Trace:
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016819] [<ffffffff817b2fd9>] schedule+0x29/0x70
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016823] [<ffffffff817b609c>] schedule_timeout+0x20c/0x280
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016826] [<ffffffff810a40a5>] ? sched_clock_cpu+0x85/0xc0
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016830] [<ffffffff810a0911>] ? try_to_wake_up+0x1f1/0x340
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016834] [<ffffffff817b3d04>] wait_for_completion+0xa4/0x170
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016836] [<ffffffff810a0ad0>] ? wake_up_state+0x20/0x20
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016840] [<ffffffff8108e86d>] flush_work+0xed/0x1c0
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016846] [<ffffffff8108acc0>] ? destroy_worker+0x90/0x90
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016870] [<ffffffffc06f556e>] xlog_cil_force_lsn+0x7e/0x1f0 [xfs]
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016873] [<ffffffff810daddb>] ? lock_timer_base.isra.36+0x2b/0x50
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016878] [<ffffffff810dbdcf>] ? try_to_del_timer_sync+0x4f/0x70
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016901] [<ffffffffc06f3980>] _xfs_log_force+0x60/0x270 [xfs]
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016904] [<ffffffff810daba0>] ? internal_add_timer+0x80/0x80
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016926] [<ffffffffc06f3bba>] xfs_log_force+0x2a/0x90 [xfs]
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016948] [<ffffffffc06fe340>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016970] [<ffffffffc06fe480>] xfsaild+0x140/0x5a0 [xfs]
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016992] [<ffffffffc06fe340>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.016996] [<ffffffff81093862>] kthread+0xd2/0xf0
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.017000] [<ffffffff81093790>] ? kthread_create_on_node+0x1c0/0x1c0
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.017005] [<ffffffff817b72d8>] ret_from_fork+0x58/0x90
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.017009] [<ffffffff81093790>] ? kthread_create_on_node+0x1c0/0x1c0
>>> Dec  7 15:36:32 ceph1-store204 kernel: [152066.017013] INFO: task xfsaild/dm-6:1616 blocked for more than 120 seconds.
>>> 
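
To get an overview of which tasks the kernel reports as hung, a small filter along these lines could help. It is only a sketch: hung_tasks is a name I made up, and it assumes each kernel message sits on one line (not wrapped by a mail client). The two sample lines are the INFO messages from the log above.

```shell
#!/bin/sh
# Summarise hung-task warnings per task name.
# Reads kernel log lines on stdin, one message per line.
# (hung_tasks is a hypothetical helper, not an existing tool)
hung_tasks() {
    # Keep only the task name, dropping the trailing ":<pid>".
    sed -n 's/.*INFO: task \(.*\):[0-9][0-9]* blocked for more than.*/\1/p' |
        awk '{ n[$0]++ } END { for (t in n) print n[t], t }' |
        sort -rn
}

# Example input: the two INFO lines from the log in this thread.
hung_tasks <<'EOF'
Dec  7 15:36:32 ceph1-store204 kernel: [152066.016112] INFO: task xfsaild/dm-1:1445 blocked for more than 120 seconds.
Dec  7 15:36:32 ceph1-store204 kernel: [152066.017013] INFO: task xfsaild/dm-6:1616 blocked for more than 120 seconds.
EOF
```

On a live box the input could come from dmesg or the kernel log file instead of the here-document.
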
>>> kswapd is also reported as hung, but we don't have swap on the osds.
>>> 
>>> It looks like either all ceph-osd-threads are reporting in as willing to work,
>>> or it's the xfs-maintenance-process itself, as described in [1,2].
>>> 
>>> Usually, if we aren't fast enough setting no{out,scrub,deep-scrub}, this has
>>> an avalanche effect where we end up ipmi-power-cycling half of the cluster
>>> because all the osd-nodes are busy doing nothing (according to iostat or top,
>>> except for the load).
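
For being fast enough with those flags, a tiny wrapper might help. This is only a sketch: osd_flags and the DRY_RUN variable are names I made up, and by default it just prints the "ceph osd set ..." commands instead of running them.

```shell
#!/bin/sh
# Set (or, with "unset", clear) the noout, noscrub and nodeep-scrub flags.
# With DRY_RUN=1 (the default here) the commands are only printed.
# (osd_flags and DRY_RUN are hypothetical names, not part of ceph)
osd_flags() {
    action=${1:-set}
    for flag in noout noscrub nodeep-scrub; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ceph osd $action $flag"
        else
            ceph osd "$action" "$flag"
        fi
    done
}

# Show what would be run to set all three flags.
osd_flags set
```

Running it with DRY_RUN=0 on a node that has an admin keyring would actually apply the flags, and "osd_flags unset" clears them again once the cluster has settled.
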
>>> 
>>> Is this a known bug for kernel 3.19.0-39 (Ubuntu 14.04 with the vivid kernel)?
>>> Do the xfs-tweaks described here
>>> https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg25295.html
>>> (I know this is for a pull request modifying the write-paths)
>>> look decent or worth a try?
>>> 
>>> Currently we're running with "back to defaults" and less load
>>> (desperate try with the filestore settings, didn't change anything)
>>> ceph.conf-osd section:
>>> 
>>> [osd]
>>> filestore max sync interval = 15
>>> filestore min sync interval = 1
>>> osd max backfills = 1
>>> osd recovery op priority = 1
>>> 
>>> 
>>> as a baffled try to get it to survive more than a day at a stretch.
>>> 
>>> Maybe kernel 4.2 is worth a try?
>>> 
>>> Thx for any input
>>> Benedikt
>>> 
>>> 
>>> [1] https://www.reddit.com/r/linux/comments/18kvdb/xfsaild_is_creating_tons_of_system_threads_and/
>>> [2] http://serverfault.com/questions/497049/the-xfs-filesystem-is-broken-in-rhel-centos-6-x-what-can-i-do-about-it
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
