Re: 2x replication: A BIG warning

Wido den Hollander <wido@xxxxxxxx> · Wed, 7 Dec 2016 14:58:58 +0100 (CET)

> On 7 December 2016 at 11:29, Kees Meijs <kees@xxxxxxxx> wrote:
> 
> 
> Hi Wido,
> 
> Valid point. At the moment we're using a cache pool with size = 2 and
> would like to "upgrade" to size = 3.
> 
> Again, you're absolutely right... ;-)
> 
> Anyway, is there anything to consider, or could we just:
> 
>  1. Run "ceph osd pool set cache size 3".
>  2. Wait for rebalancing to complete.
>  3. Run "ceph osd pool set cache min_size 2".
> 

Indeed! It is as simple as that.

Your cache pool can also contain very valuable data you do not want to lose.
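Kees's three steps could be scripted roughly as below. This is a minimal sketch, not a tested tool: `run_ceph` and `upgrade_pool_to_3x` are hypothetical helper names, and it assumes the `ceph` CLI is on the PATH with admin credentials. It also assumes the cluster was HEALTH_OK before starting, so a return to HEALTH_OK is taken to mean rebalancing has finished. The order is the same as in the steps above: raise size, wait, then raise min_size.

```python
import subprocess
import time
from typing import Callable, List


def run_ceph(args: List[str]) -> str:
    """Hypothetical helper: run `ceph <args>` and return stdout (raises on non-zero exit)."""
    result = subprocess.run(["ceph", *args], check=True, capture_output=True, text=True)
    return result.stdout.strip()


def upgrade_pool_to_3x(
    pool: str,
    run: Callable[[List[str]], str] = run_ceph,
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Hypothetical helper: move `pool` to size = 3, wait, then set min_size = 2."""
    # Step 1: ask Ceph to keep three replicas; it starts creating the third copy.
    run(["osd", "pool", "set", pool, "size", "3"])
    # Step 2: wait for rebalancing. Assumption: the cluster was HEALTH_OK before,
    # so seeing HEALTH_OK again means the new replicas are in place.
    while not run(["health"]).startswith("HEALTH_OK"):
        sleep(poll_seconds)
    # Step 3: from now on, block I/O unless at least two copies are available.
    run(["osd", "pool", "set", pool, "min_size", "2"])
```

For Kees's pool this would be `upgrade_pool_to_3x("cache")`. The `run` and `sleep` parameters exist only so the function can be exercised without a live cluster; the same call could also be pointed at a CephFS metadata pool, whose name varies per cluster.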

Wido

> Thanks!
> 
> Regards,
> Kees
> 
> On 07-12-16 09:08, Wido den Hollander wrote:
> > As a Ceph consultant I get numerous calls throughout the year to help people with getting their broken Ceph clusters back online.
> >
> > The causes of downtime vary vastly, but one of the biggest is that people use 2x replication: size = 2, min_size = 1.
> >
> > In 2016 the number of cases I had where data was lost due to these settings grew exponentially.
> >
> > Usually a disk fails, recovery kicks in, and while recovery is happening a second disk fails, causing PGs to become incomplete.
> >
> > There have been too many times when I had to use xfs_repair on broken disks and use ceph-objectstore-tool to export/import PGs.
> >
> > I really don't like these cases, mainly because they can be prevented easily by using size = 3 and min_size = 2 for all pools.
> >
> > With size = 2 you enter the danger zone as soon as a single disk/daemon fails. With size = 3, a single failure still leaves you two copies, keeping your data safe(r).
> >
> > If you are running CephFS, at least consider running the 'metadata' pool with size = 3 to keep the MDS happy.
> >
> > Please, let this be a big warning to everybody who is running with size = 2. The downtime and problems caused by missing objects/replicas are usually severe, and it takes days to recover from them. Very often data is also lost and/or corrupted, which causes even more problems.
> >
> > I can't stress this enough. Running with size = 2 in production is a SERIOUS hazard and should not be done imho.
> >
> > To anyone out there running with size = 2, please reconsider this!
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
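The failure scenario in the original warning can be made concrete with a deliberately simplified model of one PG: it ignores peering details and recovery timing and only compares the number of surviving copies with min_size. `pg_state` and `risky_pools` are hypothetical names. `risky_pools` assumes the JSON printed by `ceph osd dump --format json`, with a `pools` list whose entries carry `pool_name`, `size` and `min_size`; check that layout against your release before relying on it.

```python
import json
from typing import List


def pg_state(size: int, min_size: int, failed: int) -> str:
    """Simplified state of one PG after `failed` of its `size` replicas are gone."""
    remaining = max(size - failed, 0)
    if remaining == 0:
        return "lost"      # no copy of the data is left anywhere
    if remaining < min_size:
        return "inactive"  # data still exists, but client I/O blocks
    return "active"        # I/O continues on the remaining copies


def risky_pools(osd_dump_json: str) -> List[str]:
    """Hypothetical audit: pools keeping fewer than 3 copies or accepting I/O on 1 copy."""
    pools = json.loads(osd_dump_json)["pools"]
    return [p["pool_name"] for p in pools if p["size"] < 3 or p["min_size"] < 2]


if __name__ == "__main__":
    for size, min_size in [(2, 1), (3, 2)]:
        for failed in range(size + 1):
            print(f"size={size} min_size={min_size} failed={failed}: "
                  f"{pg_state(size, min_size, failed)}")
```

Running it prints:

```
size=2 min_size=1 failed=0: active
size=2 min_size=1 failed=1: active
size=2 min_size=1 failed=2: lost
size=3 min_size=2 failed=0: active
size=3 min_size=2 failed=1: active
size=3 min_size=2 failed=2: inactive
size=3 min_size=2 failed=3: lost
```

With size = 2, min_size = 1 the PG stays active after one failure, so new writes land on a single copy and the next failure loses data. With size = 3, min_size = 2 a second failure blocks I/O instead of accepting single-copy writes, and data is only gone once all three copies are.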