Hi Etienne,

Thanks a lot for the clarification. Yes, I think I need to keep a spare server
in my case and not worry about the 33% of degraded objects: the objects are
still in place and the file system stays writeable anyway, and once another OSD
host was added back to the cluster, Ceph immediately re-balanced and fixed
everything.

Thanks again!

Best regards,
Yury.

On Sun, Oct 31, 2021 at 9:38 AM Etienne Menguy <etienne.menguy@xxxxxxxx> wrote:

> Hi,
>
>> My question is: do I understand correctly that I need to either update my
>> CRUSH rule to select OSDs (which I know is bad) when placing objects into
>> PGs, or have more OSD hosts available, so that when one of them goes down I
>> still have 3 active hosts and Ceph can re-distribute data between those 3
>> hosts to maintain a replica size of 3?
>
> True, you could do this. But I think the best way would be to add a fourth
> server.
>
>> Or maybe I don't understand something?
>
> I don't think so. In your CRUSH rule you request a failure domain at the
> host level, so if you only have 2 hosts left but 3 replicas, Ceph can't
> place the third copy.
> I don't know if there is a way to bypass this while keeping the failure
> domain.
>
> From my experience, if a host (or even a single OSD) is temporarily down,
> you don't want to recover. Recovery generates load, and so does moving the
> PGs back to their original OSDs once the host rejoins the cluster.
>
> -
> Etienne Menguy
> etienne.menguy@xxxxxxxx
>
>
>> On 30 Oct 2021, at 11:05, Yury Kirsanov <y.kirsanov@xxxxxxxxx> wrote:
>>
>> Hi everyone,
>>
>> I have a Ceph cluster with 3 MON/MGR/MDS nodes and 3 OSD nodes, each
>> hosting two OSDs (2 HDDs, 1 OSD per HDD). My pools are configured with a
>> replica size of 3 and my osd_pool_default_size is set to 2. So I have 6
>> OSDs in total across 3 OSD hosts.
>>
>> My CRUSH map is plain and simple: root, then 3 hosts, each with two OSDs.
>> The CRUSH rule is set to choose HOST, not OSD, when placing data.
>>
>> I was going to do maintenance on one of my OSD nodes, so I set its OSDs
>> 'out' as per the Ceph manual, hoping that all the data would be
>> redistributed among the 4 remaining active OSDs. I thought a replica size
>> of 3 meant data is replicated among OSDs, not hosts, even though the CRUSH
>> rule selects hosts.
>>
>> After setting the two OSDs 'out', nothing happened except that 33% of the
>> data became degraded. So I followed the manual, put the OSDs back 'in' and
>> re-weighted them to 0. Again nothing happened; the data stayed 33%
>> degraded.
>>
>> So I removed the OSDs completely from the Ceph cluster and the CRUSH map.
>> Again, no migration, even though I have 4 OSDs available, active and up.
>>
>> My question is: do I understand correctly that I need to either update my
>> CRUSH rule to select OSDs (which I know is bad) when placing objects into
>> PGs, or have more OSD hosts available, so that when one of them goes down I
>> still have 3 active hosts and Ceph can re-distribute data between those 3
>> hosts to maintain a replica size of 3? Or maybe I don't understand
>> something?
>>
>> Thanks!
>>
>> Best regards,
>> Yury.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
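
For anyone checking the same thing on their own cluster, a minimal sketch of
how to confirm the replica size and the host-level failure domain discussed
above; the rule name "replicated_rule" and pool name "mypool" are placeholders
for the usual defaults and may differ on your cluster:

    # Show the pool's replica size
    ceph osd pool get mypool size

    # Dump the CRUSH rule used by the pool
    ceph osd crush rule dump replicated_rule

    # Or decompile the whole CRUSH map to text
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

In the decompiled map, a host-level replicated rule typically looks something
like:

    rule replicated_rule {
        id 0
        type replicated
        step take default
        step chooseleaf firstn 0 type host
        step emit
    }

The "step chooseleaf firstn 0 type host" line is what forces each replica onto
a distinct host, so size=3 needs at least 3 hosts up.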
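
And a sketch of the "don't recover during a planned, temporary outage"
approach Etienne describes, assuming a package-based deployment with systemd
units (cephadm/containerized setups use different unit names):

    # Stop Ceph from marking down OSDs 'out', so no rebalancing is triggered
    ceph osd set noout

    # On the host being serviced, stop its OSD daemons
    systemctl stop ceph-osd.target

    # ... do the maintenance, boot the host, start the OSDs again ...

    # Restore normal behaviour once the OSDs are back up and in
    ceph osd unset noout

The affected PGs stay degraded while the host is down, but nothing gets copied
off and back again, which avoids the recovery load mentioned above.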