Re: how possible is that ceph cluster crash

Hello Pedro...

These are very generic questions and are therefore hard to answer. Nick did a good job of defining the risks.

In our case, we have been running a Ceph/CephFS system in production for over a year, and before that we spent another year trying to understand Ceph.

Ceph is incredibly good at dealing with hardware failures, so it is a powerful tool if you are using commodity hardware. If your disks fail, or even if a fraction of your hosts fail, it is able to cope and recover properly (up to a point), provided you have the proper CRUSH rules in place (the default ones do a good job on that) and free space available. To be on the safe side:
- run the mons on separate servers from the OSDs
- check the RAM requirements for your OSD servers (they depend on the number of OSDs in each server)
- have at least 3 mons in a production system
- use 3x replication
There is a good page on hardware requirements in the Ceph wiki.
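The checklist above can be expressed as a small sanity check over a cluster plan. This is only an illustrative sketch: `Host`, `check_plan` and `ram_per_osd_gb` are hypothetical names, and the per-OSD RAM budget is an assumed input that you should replace with the figure from the Ceph hardware recommendations for your release. `pool_size` stands for the replica count (the pool's `size` setting).

```python
from dataclasses import dataclass


@dataclass
class Host:
    """One server in a hypothetical cluster plan."""
    name: str
    ram_gb: int
    num_osds: int = 0
    runs_mon: bool = False


def check_plan(hosts: list[Host], pool_size: int, ram_per_osd_gb: int) -> list[str]:
    """Return warnings for the rules of thumb: >= 3 mons, mons not on OSD
    servers, enough RAM for the OSDs on each host, and 3x replication.

    ram_per_osd_gb is an assumed memory budget per OSD daemon, not an
    official Ceph figure.
    """
    warnings = []
    mons = [h for h in hosts if h.runs_mon]
    if len(mons) < 3:
        warnings.append(f"only {len(mons)} mon(s); run at least 3")
    for h in hosts:
        # Keep mons decoupled from OSD servers.
        if h.runs_mon and h.num_osds > 0:
            warnings.append(f"{h.name}: mon co-located with {h.num_osds} OSD(s)")
        # RAM needs scale with the number of OSDs on the host.
        needed = h.num_osds * ram_per_osd_gb
        if h.ram_gb < needed:
            warnings.append(f"{h.name}: {h.ram_gb} GB RAM < {needed} GB for {h.num_osds} OSDs")
    if pool_size < 3:
        warnings.append(f"pool size {pool_size}; use 3x replication")
    return warnings


if __name__ == "__main__":
    # Three dedicated mon hosts plus three OSD hosts with 5 OSDs each (15 OSDs).
    plan = [Host(f"mon{i}", ram_gb=16, runs_mon=True) for i in range(1, 4)]
    plan += [Host(f"osd{i}", ram_gb=32, num_osds=5) for i in range(1, 4)]
    # 4 GB per OSD is a placeholder budget, not an official figure.
    print(check_plan(plan, pool_size=3, ram_per_osd_gb=4))  # []
    print(check_plan(plan, pool_size=2, ram_per_osd_gb=4))  # ['pool size 2; use 3x replication']
```

With separate mon hosts and enough RAM on the OSD hosts, the only warning in this example comes from passing a 2x pool.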

However, the devil is in the details. Ceph is a complex system that is still under active development. Wrong configurations can lead to performance problems. If your network is not reliable, you may get flapping OSDs, which in turn can cause problems with your PGs. When your OSDs start to fill up, many problems may appear (a single full OSD freezes all I/O to the cluster). Finally, there are bugs. There are not many of them, and the developers and the community make a real effort to address them quickly and reliably. However, it is sometimes difficult to diagnose what is wrong because so many layers are involved. It is not unusual for us to have to dig into the source code to figure out (when possible) what is happening. So I would say there is a learning curve that I and others are still going through.
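The point about a single full OSD is easy to underestimate: the usable headroom is set by the fullest OSD, not by the cluster-wide average. The back-of-the-envelope sketch below assumes that new data lands on each OSD in proportion to its capacity (roughly what CRUSH weights aim for) and that writes stop once any OSD reaches the full ratio. It uses 0.95 because that is the usual default for `mon_osd_full_ratio`, but check your own cluster. `Osd`, `naive_headroom_tb` and `headroom_tb` are hypothetical helpers, not part of any Ceph tool.

```python
from dataclasses import dataclass

# Assumption: Ceph's usual default for mon_osd_full_ratio.
FULL_RATIO = 0.95


@dataclass
class Osd:
    name: str
    capacity_tb: float
    used_tb: float  # raw TB used, i.e. including replicas

    @property
    def utilization(self) -> float:
        return self.used_tb / self.capacity_tb


def naive_headroom_tb(osds: list[Osd], replicas: int = 3,
                      full_ratio: float = FULL_RATIO) -> float:
    """Logical TB writable if the cluster could be filled evenly up to full_ratio."""
    total = sum(o.capacity_tb for o in osds)
    used = sum(o.used_tb for o in osds)
    return max(0.0, full_ratio * total - used) / replicas


def headroom_tb(osds: list[Osd], replicas: int = 3,
                full_ratio: float = FULL_RATIO) -> float:
    """Logical TB writable before the fullest OSD reaches full_ratio.

    If each OSD receives capacity_i / total of every raw TB written, all OSD
    utilizations rise at the same rate, so the OSD that starts fullest hits
    full_ratio first and freezes writes for the whole cluster.
    """
    total = sum(o.capacity_tb for o in osds)
    worst = max(o.utilization for o in osds)
    raw_tb = max(0.0, total * (full_ratio - worst))
    return raw_tb / replicas


if __name__ == "__main__":
    # Four 4 TB OSDs; one is noticeably fuller than the others.
    osds = [Osd("osd.0", 4.0, 2.0), Osd("osd.1", 4.0, 2.0),
            Osd("osd.2", 4.0, 2.0), Osd("osd.3", 4.0, 3.2)]
    print(round(naive_headroom_tb(osds), 2))  # 2.0
    print(round(headroom_tb(osds), 2))        # 0.8
```

In this example one OSD at 80% cuts the writable headroom from about 2.0 TB to about 0.8 TB of logical (3x-replicated) data, even though the cluster as a whole is only about 57% full.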

Best regards,
Gonçalo





________________________________________
From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of Pedro Benites [pbenites@xxxxxxxxxxxxxx]
Sent: 17 November 2016 04:50
To: ceph-users@xxxxxxxxxxxxxx
Subject:  how possible is that ceph cluster crash

Hi,

I have a Ceph cluster with 50 TB and 15 OSDs. It has been working fine
for one year, and I would like to grow it and migrate all my old storage,
about 100 TB, to Ceph, but I have a question. How likely is it that the
cluster fails and everything goes very badly? How reliable is Ceph? What
is the risk of losing my data? Is it necessary to back up my data?

Regards.
Pedro.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com