Re: One mds daemon damaged, filesystem is offline. How to recover?

Hi,

3. I have allocated three (3) separate machines for the Ceph cluster. That is, I have 3 separate instances of MON, MGR, OSD and MDS running on 3 separate machines.

okay, so at least those are three different hosts, although in a production environment I would strongly recommend using a dedicated MDS server. But why only three OSDs? In case of a disk failure the cluster stays in a degraded state until you recover or rebuild that one OSD on that host. If you had more disks per node, those PGs could at least be remapped to different OSDs and the cluster could recover on its own. The other thing is to put the CephFS metadata pool on SSDs; that's a common recommendation to reduce latency. And since the metadata pool is usually quite small, it wouldn't be that expensive.
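As a rough sketch of how that could look (assuming your SSD OSDs report the device class "ssd" and your metadata pool is called cephfs_metadata, adjust the names to your setup):

  # create a replicated rule that only selects OSDs with device class "ssd"
  ceph osd crush rule create-replicated replicated-ssd default host ssd
  # assign the CephFS metadata pool to that rule
  ceph osd pool set cephfs_metadata crush_rule replicated-ssd

Ceph will then remap and backfill the metadata PGs onto the SSDs on its own.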

Increasing the number of MONs to 5 is not unreasonable, although most of our customers (as well as our own cluster) are fine with 3 MONs. But increasing the pool size to 5 can or will have an impact on performance since it also increases latency: every write has to be acknowledged 5 times instead of 3. I think you'd be fine with pool size 3 (failure domain host), but you should move the metadata to SSDs and increase the overall number of OSDs.
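To verify what you currently have, something like this should do (the pool names are just the defaults from a typical CephFS setup, yours may differ):

  # show size, min_size and crush rule of all pools
  ceph osd pool ls detail
  # keep (or explicitly set) replication size 3
  ceph osd pool set cephfs_metadata size 3
  ceph osd pool set cephfs_data size 3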

There is no guarantee; you can only reduce the risk of data loss, and prepare for it with backups.



Quoting Sagara Wijetunga <sagarawmw@xxxxxxxxx>:

On Sunday, May 23, 2021, 01:16:12 AM GMT+8, Eugen Block <eblock@xxxxxx> wrote:

Awesome! I'm glad it worked out this far! At least you have a working filesystem now, even if it means that you may have to use a backup.
But now I can say it: Having only three OSDs is really not the best idea. ;-) Are all those OSDs on the same host?

1. To be on the safe side I did a full deep scrub:

ceph osd deep-scrub all

ceph -w shows no errors, only the following line repeating:

2021-05-23 01:00:00.003140 mon.a [INF] overall HEALTH_OK
2021-05-23 02:00:00.007661 mon.a [INF] overall HEALTH_OK

That is, everything in the cluster is clean.


2. I take daily rsync-based backups.
I'm still not sure what the removed metadata object represented.


3. I have allocated three (3) separate machines for the Ceph cluster. That is, I have 3 separate instances of MON, MGR, OSD and MDS running on 3 separate machines. I agree it is better to allocate five (5) different machines with pool size 5; it further reduces the risk of losing quorum if one machine is already down.

I think to avoid this kind of mess happening again I have to use data center-grade SSDs with PLP (Power Loss Protection); mine are hard disks. The issue with data center-grade SSDs with PLP is that they are still low in capacity and very expensive. One not so expensive option is to keep the journal on a separate data center-grade SSD with PLP. But Ceph would have to guarantee that flushing or syncing the journal to the high-capacity hard disks is fail safe. What's your understanding on this? Is it fail safe? Any link for me to read further?
Best regards
Sagara


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx





