Re: Fwd: Ceph OSD suicide himself

On 11/07/2016 04:48, 한승진 wrote:
> Hi cephers,
>
> I need your help with some issues.
>
> The Ceph cluster version is Jewel (10.2.1), and the filesystem is btrfs.
>
> I run 1 MON and 48 OSDs on 4 nodes (each node has 12 OSDs).
>
> One of my OSDs killed itself.
>
> It always logs a suicide timeout message before dying.

This is probably a fragmentation problem: typical RBD access patterns
cause heavy BTRFS fragmentation.
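
A quick way to check this (a sketch, assuming the default
/var/lib/ceph/osd/ceph-<id> data path) is to look at extent counts
with filefrag; object files with hundreds of extents are heavily
fragmented:

  # sample some object files and report their extent counts
  find /var/lib/ceph/osd/ceph-0/current -type f -size +1M \
    | head -n 20 | xargs filefrag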

If you already use the autodefrag mount option, you can try this
scheduler instead, which performs much better for us:
https://github.com/jtek/ceph-utils/blob/master/btrfs-defrag-scheduler.rb

Note that it can take some time to fully defragment the filesystems,
but it shouldn't put more stress on them than autodefrag while doing so.
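
If you'd rather test with the stock btrfs tooling first (not the
scheduler above, just a one-off pass over one OSD's data directory,
assuming the default path):

  # recursive one-shot defragmentation; this generates a lot of I/O,
  # so run it off-peak
  btrfs filesystem defragment -r /var/lib/ceph/osd/ceph-0/current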

If you don't already use it, set:

  filestore btrfs snap = false

in ceph.conf and restart your OSDs.
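
For example (a minimal sketch; the [global] section would work too,
and the unit name assumes a Jewel systemd deployment):

  [osd]
  filestore btrfs snap = false

  # then restart each OSD, e.g.:
  systemctl restart ceph-osd@0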

Finally, if your journals live on the filesystem instead of on
dedicated partitions, you'll have to recreate them with the NoCow
attribute (there's otherwise no way to defragment journals without
killing performance).
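
A minimal sketch of that recreation, assuming the default journal
path and OSD id 0; the NoCow attribute must be set while the file is
still empty, and the OSD must be stopped first:

  # stop the OSD, then flush and remove the old journal
  ceph-osd -i 0 --flush-journal
  rm /var/lib/ceph/osd/ceph-0/journal
  # create an empty file, mark it NoCow, then let ceph-osd populate it
  touch /var/lib/ceph/osd/ceph-0/journal
  chattr +C /var/lib/ceph/osd/ceph-0/journal
  ceph-osd -i 0 --mkjournal
  # restart the OSD afterwards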

Best regards,

Lionel