> On 7 December 2016 at 11:29, Kees Meijs <kees@xxxxxxxx> wrote:
>
>
> Hi Wido,
>
> Valid point. At this moment, we're using a cache pool with size = 2 and
> would like to "upgrade" to size = 3.
>
> Again, you're absolutely right... ;-)
>
> Anyway, is there anything to consider, or could we just:
>
> 1. Run "ceph osd pool set cache size 3".
> 2. Wait for rebalancing to complete.
> 3. Run "ceph osd pool set cache min_size 2".
>

Indeed! It is as simple as that. Your cache pool can also contain very valuable data you do not want to lose.

Wido

> Thanks!
>
> Regards,
> Kees
>
> On 07-12-16 09:08, Wido den Hollander wrote:
> > As a Ceph consultant I get numerous calls throughout the year to help people get their broken Ceph clusters back online.
> >
> > The causes of downtime vary widely, but one of the biggest is 2x replication: size = 2, min_size = 1.
> >
> > In 2016 the number of cases where data was lost due to these settings grew sharply.
> >
> > Usually a disk fails, recovery kicks in, and while recovery is happening a second disk fails, causing PGs to become incomplete.
> >
> > There have been too many times when I had to run xfs_repair on broken disks and use ceph-objectstore-tool to export/import PGs.
> >
> > I really don't like these cases, mainly because they can easily be prevented by using size = 3 and min_size = 2 for all pools.
> >
> > With size = 2 you enter the danger zone as soon as a single disk/daemon fails. With size = 3 you always have two additional copies left, thus keeping your data safe(r).
> >
> > If you are running CephFS, at least consider running the 'metadata' pool with size = 3 to keep the MDS happy.
> >
> > Please, let this be a big warning to everybody who is running with size = 2. The downtime and problems caused by missing objects/replicas are usually severe, and it takes days to recover from them. Very often data is also lost and/or corrupted, which causes even more problems.
> >
> > I can't stress this enough: running with size = 2 in production is a SERIOUS hazard and should not be done, imho.
> >
> > To anyone out there running with size = 2, please reconsider this!
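
As a minimal sketch of the procedure discussed above (not part of the original thread; the pool name "cache" is taken from Kees's steps, and the health checks shown are just one way to confirm that recovery has finished), the sequence could look like:

  # 1. Raise the replica count on the cache pool.
  ceph osd pool set cache size 3

  # 2. Wait for recovery/backfill to finish; all PGs should report active+clean.
  ceph -s
  ceph health detail

  # 3. Only then require two replicas for I/O.
  ceph osd pool set cache min_size 2

  # Review size/min_size for every pool in the cluster.
  ceph osd pool ls detail

To make 3/2 the default for newly created pools, the usual options in the [global] section of ceph.conf are:

  osd_pool_default_size = 3
  osd_pool_default_min_size = 2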