On Mon, Jul 11, 2016 at 04:53:36PM +0200, Lionel Bouton wrote:
> On 11/07/2016 11:56, Brad Hubbard wrote:
> > On Mon, Jul 11, 2016 at 7:18 PM, Lionel Bouton
> > <lionel-subscription@xxxxxxxxxxx> wrote:
> >> On 11/07/2016 04:48, 한승진 wrote:
> >>> Hi cephers.
> >>>
> >>> I need your help with some issues.
> >>>
> >>> The ceph cluster version is Jewel (10.2.1), and the filesystem is btrfs.
> >>>
> >>> I run 1 Mon and 48 OSDs on 4 nodes (each node has 12 OSDs).
> >>>
> >>> I've experienced one of the OSDs killing itself.
> >>>
> >>> It always issued a suicide timeout message.
> >> This is probably a fragmentation problem: typical rbd access patterns
> >> cause heavy BTRFS fragmentation.
> > To the extent that operations take over 120 seconds to complete? Really?
>
> Yes, really. I had these too. By default Ceph/RBD uses BTRFS in a very
> aggressive way, rewriting data all over the place and creating/deleting
> snapshots every filestore sync interval (5 seconds max by default IIRC).
>
> As I said, there are 3 main causes of performance degradation:
> - the snapshots,
> - the journal in a standard copy-on-write file (move it out of the FS or
>   use NoCow),
> - the weak auto defragmentation of BTRFS (autodefrag mount option).
>
> Each one of them is enough to degrade or even destroy performance in the
> long run. The 3 combined make BTRFS unusable by default. This is why
> BTRFS is not recommended: if you want to use it you have to be prepared
> for some (heavy) tuning. The first 2 points are easy to address; for the
> last (which becomes noticeable once you accumulate rewrites on your
> data) I'm not aware of any tool other than the one we developed and
> published on github (link provided in my previous mail).
>
> Another thing: you'd better have a recent 4.1.x or 4.4.x kernel on your
> OSDs if you use BTRFS. We've used it since 3.19.x, but I wouldn't advise
> that now and would recommend 4.4.x if possible for you, 4.1.x otherwise.

Thanks for the information. I wasn't aware things were that bad with
BTRFS as I haven't had much to do with it up to this point.

Cheers,
Brad

>
> Best regards,
>
> Lionel

--
Cheers,
Brad

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
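
For reference, the tuning Lionel describes above usually boils down to
settings along these lines. This is only an illustrative sketch: the OSD
id, device names, and paths are placeholders, not values taken from the
thread, and Lionel's own defragmentation tool is not shown (only the link
in his earlier mail covers that).

  # ceph.conf, [osd] section -- stop the per-sync btrfs snapshots
  filestore btrfs snap = false

  # Keep the journal off the copy-on-write path. Either point it at a
  # separate partition/device (placeholder path shown):
  osd journal = /dev/disk/by-partlabel/osd-0-journal

  # ...or, if the journal stays on the btrfs filesystem, create it as a
  # NoCOW file first (chattr +C only takes effect on a new/empty file):
  touch /var/lib/ceph/osd/ceph-0/journal
  chattr +C /var/lib/ceph/osd/ceph-0/journal

  # /etc/fstab -- mount the OSD filesystem with autodefrag (placeholder
  # device); per Lionel this alone is too weak under Ceph's write pattern,
  # hence the external defragmentation tool he mentions.
  /dev/sdb1  /var/lib/ceph/osd/ceph-0  btrfs  noatime,autodefrag  0  0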