Hi,
3. I have allocated three (3) separate machines for the Ceph
cluster. That is, I have 3 separate instances of MON, MGR, OSD and
MDS running on 3 separate machines.
Okay, so at least those are three different hosts, although in a
production environment I would strongly recommend using a dedicated
MDS server. But why only three OSDs? In case of a disk failure the
cluster stays in a degraded state until you recover or rebuild that one
OSD on that host. If you had more disks per node, those PGs could at
least be remapped to a different OSD and the cluster could recover.
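For reference, the OSD-to-host layout is easy to check; with only one
OSD per host there simply is no other OSD on that host to remap the
PGs to:

  # show which OSDs live on which host and their utilization
  ceph osd tree
  ceph osd df tree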
The other thing is to have the CephFS metadata pool on SSDs; that's a
common recommendation to reduce latency [5]. And since the metadata
pool is usually quite small, it wouldn't be that expensive.
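A rough sketch of how to pin the metadata pool to SSDs (assuming the
pool is called cephfs_metadata and your SSD OSDs carry the 'ssd'
device class; adjust the names to your setup):

  # CRUSH rule that only picks OSDs with device class ssd, failure domain host
  ceph osd crush rule create-replicated replicated-ssd default host ssd
  # assign that rule to the metadata pool
  ceph osd pool set cephfs_metadata crush_rule replicated-ssd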
Increasing the number of MONs to 5 is not unreasonable, although most
of our customers (as well as our own cluster) are fine with 3 MONs.
But increasing the pool size to 5 can or will have an impact on
performance since it also increases latency: every write has to be
acked 5 times instead of 3. I think you'd be fine with pool size 3
(failure domain host), but you should move the metadata to SSDs and
increase the overall number of OSDs.
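For completeness, checking or setting the replica count per pool looks
like this (pool names assumed, adjust to yours):

  # current replica settings of the data pool
  ceph osd pool get cephfs_data size
  ceph osd pool get cephfs_data min_size
  # keep size 3 / min_size 2 with failure domain host
  ceph osd pool set cephfs_data size 3
  ceph osd pool set cephfs_data min_size 2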
There is no guarantee; you can only reduce the risk of data loss and
prepare for it with backups.
[5] https://docs.ceph.com/en/latest/cephfs/createfs/
Quoting Sagara Wijetunga <sagarawmw@xxxxxxxxx>:
On Sunday, May 23, 2021, 01:16:12 AM GMT+8, Eugen Block
<eblock@xxxxxx> wrote:
Awesome! I'm glad it worked out this far! At least you have a working
filesystem now, even if it means that you may have to use a backup.
But now I can say it: having only three OSDs is really not the best
idea. ;-) Are all those OSDs on the same host?
1. To be on the safe side I did a full deep scrub:
  ceph osd deep-scrub all
ceph -w shows no errors, only the following line repeating:
  2021-05-23 01:00:00.003140 mon.a [INF] overall HEALTH_OK
  2021-05-23 02:00:00.007661 mon.a [INF] overall HEALTH_OK
That is, whatever is in the cluster is clean.
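For reference, the usual commands to confirm that all PGs are
active+clean:

  ceph -s
  ceph pg stat
  ceph health detail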
2. I take daily rsync-based backups.
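Roughly along these lines (just an illustration, the paths are
placeholders):

  # daily rsync of the mounted CephFS tree to separate storage
  rsync -aHAX --delete /mnt/cephfs/ /backup/cephfs/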
I'm still not sure what the removed metadata object represented.
3. I have allocated three (3) separate machines for the Ceph
cluster. That is, I have 3 separate instances of MON, MGR, OSD and
MDS running on 3 separate machines.
I agree it is better to allocate five (5) different machines with pool
size 5. It further reduces the risk of losing quorum if one machine is
already down (5 MONs tolerate two failures, 3 MONs only one).
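The monitor quorum can always be checked with:

  ceph quorum_status
  ceph mon stat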
I think to avoid this kind of mess happening again, I have to use
data-center-grade SSDs with PLP (Power Loss Protection). Mine are hard
disks.
The issue with data-center-grade SSDs with PLP is that they are still
low in capacity and very expensive.
One not so expensive option is to keep the journal on a separate
data-center-grade SSD with PLP. But Ceph has to give me a guarantee
that flushing or syncing the journal to the high-capacity hard disks
is fail-safe. What's your understanding of this? Is it fail-safe? Any
link for me to read further?
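For reference, with BlueStore the rough equivalent of a separate
journal device is putting the RocksDB/WAL on the SSD while the data
stays on the HDD; a sketch (device names are just placeholders):

  # OSD with data on the HDD and DB/WAL on the SSD
  ceph-volume lvm create --data /dev/sdb --block.db /dev/nvme0n1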
Best regards
Sagara
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx