Re: Replication question

Hi,

> My question is - do I understand correctly that I need to either update my
> CRUSH rule to select OSDs (which I know is bad) to place objects into PGs
> or have more OSD hosts available so when one of them is going down I would
> still have 3 active hosts and CEPH can re-distribute data between these 3
> hosts to maintain replica size of x3?
True, you could do either of those. But I think the best way would be to add a fourth OSD host.
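
If you do add a fourth host, the orchestrator (or ceph-volume) will normally create its host bucket in the CRUSH map when the first OSD is deployed. Just as an illustration, you could also create and place the bucket by hand, something like this (node4 is a placeholder hostname):

    ceph osd crush add-bucket node4 host
    ceph osd crush move node4 root=default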

> Or maybe I don't understand something?
I don’t think so. Your CRUSH rule requests a failure domain at the host level, so if only 2 hosts are available but you want 3 replicas, Ceph can’t place the third copy anywhere.
I don’t know of a way to bypass this while keeping the host failure domain.
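
For reference, a minimal replicated rule with a host failure domain looks roughly like this (the rule name and id are just examples):

    rule replicated_rule {
        id 0
        type replicated
        step take default
        step chooseleaf firstn 0 type host
        step emit
    }

With size 3 and only 2 hosts up, the "chooseleaf ... type host" step can’t find a third distinct host, so the PGs stay undersized/degraded rather than putting two copies on the same host.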

From my experience, if a host (or even a single OSD) is temporarily down you don’t want to trigger recovery.
It generates load to recover the data, and again, once the host is back in the cluster, to move the PGs back to their original OSDs.
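
For planned maintenance the usual approach is to set the noout flag before taking the host down, so the OSDs are marked down but Ceph doesn’t start rebalancing, and to unset it once the host is back:

    ceph osd set noout
    # ... stop the OSDs, service/reboot the host, bring it back ...
    ceph osd unset noout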

-
Etienne Menguy
etienne.menguy@xxxxxxxx




> On 30 Oct 2021, at 11:05, Yury Kirsanov <y.kirsanov@xxxxxxxxx> wrote:
> 
> Hi everyone,
> I have a CEPH cluster with 3 MON/MGR/MDS nodes, 3 OSD nodes each hosting
> two OSDs (2 HDDs, 1 OSD per HDD). My pools are configured with a replica x
> 3 and my osd_pool_default_size is set to 2. So I have 6 total OSDs and 3
> hosts for OSDs.
> 
> My CRUSH map is plain simple - root, then 3 hosts each having two OSDs. And
> the CRUSH rule is set to choose HOST, not OSD in order to find data.
> 
> I was going to do maintenance and service on one of my OSD nodes, so I tried
> to set its OSDs 'out' as per the CEPH manual, hoping that after that all the
> data would be redistributed among the 4 remaining active OSDs, as I thought
> that a replica size of 3 means data is replicated among OSDs, not hosts,
> even though the CRUSH rule has hosts in it.
> 
> After setting the two OSDs to 'out' nothing happened except for 33% of the
> data becoming degraded. So I followed the manual, put the OSDs back 'in' and
> re-weighted them to a weight of 0. Nothing happened again; the data stayed
> in the 33% degraded state.
> 
> So I removed OSDs completely from the CEPH system and CRUSH map. Again - no
> migration even though I have 4 available OSDs active and up.
> 
> My question is - do I understand correctly that I need to either update my
> CRUSH rule to select OSDs (which I know is bad) to place objects into PGs
> or have more OSD hosts available so when one of them is going down I would
> still have 3 active hosts and CEPH can re-distribute data between these 3
> hosts to maintain replica size of x3? Or maybe I don't understand something?
> 
> Thanks!
> 
> Best regards,
> Yury.
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
