Re: 2x replica with NVMe

Hi,

I think the risk of 2x replication is the same on HDD and SSD. You should read the quote from Wido below:

""Hi,

As a Ceph consultant I get numerous calls throughout the year to help people with getting their broken Ceph clusters back online.

The causes of downtime vary vastly, but one of the biggest causes is that people use 2x replication: size = 2, min_size = 1.

In 2016 the number of cases I had where data was lost due to these settings grew exponentially.

Usually a disk fails, recovery kicks in, and while recovery is happening a second disk fails, causing PGs to become incomplete.

There have been too many times where I had to use xfs_repair on broken disks and use ceph-objectstore-tool to export/import PGs.

I really don't like these cases, mainly because they can be prevented easily by using size = 3 and min_size = 2 for all pools.

With size = 2 you go into the danger zone as soon as a single disk/daemon fails. With size = 3 you always have two additional copies left, thus keeping your data safe(r).

If you are running CephFS, at least consider running the 'metadata' pool with size = 3 to keep the MDS happy.

Please, let this be a big warning to everybody who is running with size = 2. The downtime and problems caused by missing objects/replicas are usually big and it takes days to recover from those. But very often data is lost and/or corrupted which causes even more problems.

I can't stress this enough. Running with size = 2 in production is a SERIOUS hazard and should not be done imho.

To anyone out there running with size = 2, please reconsider this!

Thanks,

Wido""
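
To make the arithmetic in Wido's message concrete, here is a minimal Python sketch. It is my own illustration, not Ceph code. It assumes a simplified model in which a placement group keeps serving I/O as long as the number of surviving replicas is at least min_size, and data is gone only when no replica survives. The function name pg_state and its return labels are hypothetical.

```python
def pg_state(size: int, min_size: int, failed: int) -> str:
    """Classify a placement group after `failed` replicas are lost.

    Simplified model (an assumption, not Ceph's real peering logic):
    - no surviving replica      -> "lost"
    - fewer than min_size left  -> "inactive" (I/O blocked until recovery)
    - fewer than size left      -> "degraded" (still serving I/O)
    - otherwise                 -> "healthy"
    """
    if not 1 <= min_size <= size:
        raise ValueError("need 1 <= min_size <= size")
    if failed < 0:
        raise ValueError("failed must be non-negative")
    remaining = max(size - failed, 0)
    if remaining == 0:
        return "lost"
    if remaining < min_size:
        return "inactive"
    if remaining < size:
        return "degraded"
    return "healthy"


if __name__ == "__main__":
    # Compare the two settings discussed in the thread.
    for size, min_size in [(2, 1), (3, 2)]:
        for failed in range(3):
            state = pg_state(size, min_size, failed)
            print(f"size={size} min_size={min_size} failed={failed}: {state}")
```

In this model, size = 2 / min_size = 1 keeps accepting writes on a single remaining copy, so a second failure during recovery leaves nothing behind. With size = 3 / min_size = 2, a second failure blocks I/O but one copy still exists to recover from.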

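If you want to move existing pools to the size = 3, min_size = 2 settings recommended above, the sketch below only builds the `ceph osd pool set` command lines so you can review them before running anything. The pool names are hypothetical placeholders and the helper name replication_commands is my own; substitute the pools from your cluster.

```python
def replication_commands(pools, size=3, min_size=2):
    """Return argv lists that set size and min_size on each pool.

    Nothing is executed here; the caller decides whether to run the
    commands. Pool names passed in are whatever the caller supplies.
    """
    if not 1 <= min_size <= size:
        raise ValueError("need 1 <= min_size <= size")
    commands = []
    for pool in pools:
        commands.append(["ceph", "osd", "pool", "set", pool, "size", str(size)])
        commands.append(
            ["ceph", "osd", "pool", "set", pool, "min_size", str(min_size)]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical pool names, for illustration only.
    for argv in replication_commands(["rbd", "cephfs_metadata"]):
        print(" ".join(argv))
```

Raising size on a pool that already holds data triggers backfill to create the third copy, so it is probably best done outside peak hours.
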
On Thu, Jun 8, 2017 at 5:32 PM, <info@xxxxxxxxx> wrote:
Hi all,

I'm going to build an all-flash Ceph cluster. Looking through the existing documentation, I see lots of guides and use-case scenarios from various vendors testing Ceph with 2x replication.

Now, I'm an old-school Ceph user, and I have always considered 2x replication really dangerous for production data, especially when the two OSDs can't decide which replica is the good one.
Why do all NVMe storage vendors and partners use only 2x replication?
They claim it's safe because NVMe is better at handling errors, but I usually don't trust marketing claims :)
Is it true? Can someone confirm that NVMe is different from HDD and that 2x replication can therefore be considered safe for production?

Many Thanks
Giordano

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

