Hi everyone,

I have a Ceph cluster with 3 MON/MGR/MDS nodes and 3 OSD nodes, each OSD node hosting two OSDs (2 HDDs, 1 OSD per HDD), so 6 OSDs across 3 hosts in total. My pools are configured with a replica size of 3, and osd_pool_default_size is set to 2. The CRUSH map is plain and simple: root, then 3 hosts, each with two OSDs. The CRUSH rule chooses by HOST, not by OSD, when placing data.

I was going to do maintenance on one of my OSD nodes, so I marked its OSDs 'out' as per the Ceph manual, hoping that the data would then be redistributed among the 4 remaining active OSDs; I thought a replica size of 3 means data is replicated across OSDs, not hosts, even though the CRUSH rule uses hosts. After setting the two OSDs 'out', nothing happened except that 33% of the data became degraded. So I followed the manual, put the OSDs back 'in' and re-weighted them to 0. Again nothing happened; the data stayed 33% degraded. So I removed the OSDs completely from Ceph and from the CRUSH map. Still no migration, even though I have 4 available OSDs active and up. (I've put roughly what my CRUSH rule looks like and the commands I ran below my signature, for reference.)

My question is: do I understand correctly that I need to either
- change my CRUSH rule to select OSDs (which I know is bad) when placing objects into PGs, or
- have more OSD hosts available, so that when one of them goes down there are still 3 active hosts and Ceph can re-distribute the data between them to maintain the replica size of 3?

Or maybe I don't understand something?

Thanks!

Best regards,
Yury.
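
For reference, this is roughly what my CRUSH rule looks like, with host as the failure domain. I'm reconstructing it from memory rather than pasting a fresh decompile of the CRUSH map, so the exact name and id may differ on my cluster:

    rule replicated_rule {
        id 0
        type replicated
        step take default
        step chooseleaf firstn 0 type host
        step emit
    }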
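
And these are the commands I ran against the two OSDs on the node I was servicing. osd.4 and osd.5 are just placeholders for those two OSDs, and I may be misremembering whether I used 'ceph osd reweight' or 'ceph osd crush reweight' for the weight-0 step:

    # 1. mark both OSDs on the node out
    ceph osd out osd.4 osd.5

    # 2. put them back in and set their weight to 0
    ceph osd in osd.4 osd.5
    ceph osd crush reweight osd.4 0
    ceph osd crush reweight osd.5 0

    # 3. remove them from Ceph and from the CRUSH map completely
    ceph osd crush remove osd.4
    ceph auth del osd.4
    ceph osd rm osd.4
    # (and the same three commands for osd.5)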