Hi Etienne,

Thanks a lot for the clarification. Yes, I think I need to keep a spare server
in my case and not worry about the 33% of degraded objects: the objects are
still in place and the file system stays writeable anyway, and once another OSD
host was added back to the cluster, Ceph immediately re-balanced and fixed
everything.

Thanks again!

Best regards,
Yury.

On Sun, Oct 31, 2021 at 9:38 AM Etienne Menguy <etienne.menguy@xxxxxxxx> wrote:

> Hi,
>
>> My question is: do I understand correctly that I need to either update my
>> CRUSH rule to select OSDs (which I know is bad) when placing objects into
>> PGs, or have more OSD hosts available, so that when one of them goes down I
>> still have 3 active hosts and Ceph can re-distribute data between those 3
>> hosts to maintain a replica size of 3?
>
> True, you could do this. But I think the best way would be to add a fourth
> server.
>
>> Or maybe I don't understand something?
>
> I don't think so. In your CRUSH rule you request a failure domain at the
> host level, so if you only have 2 hosts left but 3 replicas, Ceph can't
> place the third copy.
> I don't know if there is a way to bypass this while keeping the failure
> domain.
>
> From my experience, if a host (or even a single OSD) is temporarily down,
> you don't want to recover. Recovery generates load, and so does moving the
> PGs back to their original OSDs once the host rejoins the cluster.
>
> -
> Etienne Menguy
> etienne.menguy@xxxxxxxx
>
>
>> On 30 Oct 2021, at 11:05, Yury Kirsanov <y.kirsanov@xxxxxxxxx> wrote:
>>
>> Hi everyone,
>>
>> I have a Ceph cluster with 3 MON/MGR/MDS nodes and 3 OSD nodes, each
>> hosting two OSDs (2 HDDs, 1 OSD per HDD). My pools are configured with a
>> replica size of 3 and my osd_pool_default_size is set to 2. So I have 6
>> OSDs in total across 3 OSD hosts.
>>
>> My CRUSH map is plain and simple: root, then 3 hosts, each with two OSDs.
>> The CRUSH rule is set to choose HOST, not OSD, when placing data.
>>
>> I was going to do maintenance on one of my OSD nodes, so I set its OSDs
>> 'out' as per the Ceph manual, hoping that all the data would be
>> redistributed among the 4 remaining active OSDs. I thought a replica size
>> of 3 meant data is replicated among OSDs, not hosts, even though the CRUSH
>> rule selects hosts.
>>
>> After setting the two OSDs 'out', nothing happened except that 33% of the
>> data became degraded. So I followed the manual, put the OSDs back 'in' and
>> re-weighted them to 0. Again nothing happened; the data stayed 33%
>> degraded.
>>
>> So I removed the OSDs completely from the Ceph cluster and the CRUSH map.
>> Again, no migration, even though I have 4 OSDs available, active and up.
>>
>> My question is: do I understand correctly that I need to either update my
>> CRUSH rule to select OSDs (which I know is bad) when placing objects into
>> PGs, or have more OSD hosts available, so that when one of them goes down I
>> still have 3 active hosts and Ceph can re-distribute data between those 3
>> hosts to maintain a replica size of 3? Or maybe I don't understand
>> something?
>>
>> Thanks!
>>
>> Best regards,
>> Yury.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
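
For anyone checking the same thing on their own cluster, a minimal sketch of
how to confirm the replica size and the host-level failure domain discussed
above; the rule name "replicated_rule" and pool name "mypool" are placeholders
for the usual defaults and may differ on your cluster:

    # Show the pool's replica size
    ceph osd pool get mypool size

    # Dump the CRUSH rule used by the pool
    ceph osd crush rule dump replicated_rule

    # Or decompile the whole CRUSH map to text
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

In the decompiled map, a host-level replicated rule typically looks something
like:

    rule replicated_rule {
        id 0
        type replicated
        step take default
        step chooseleaf firstn 0 type host
        step emit
    }

The "step chooseleaf firstn 0 type host" line is what forces each replica onto
a distinct host, so size=3 needs at least 3 hosts up.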
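
And a sketch of the "don't recover during a planned, temporary outage"
approach Etienne describes, assuming a package-based deployment with systemd
units (cephadm/containerized setups use different unit names):

    # Stop Ceph from marking down OSDs 'out', so no rebalancing is triggered
    ceph osd set noout

    # On the host being serviced, stop its OSD daemons
    systemctl stop ceph-osd.target

    # ... do the maintenance, boot the host, start the OSDs again ...

    # Restore normal behaviour once the OSDs are back up and in
    ceph osd unset noout

The affected PGs stay degraded while the host is down, but nothing gets copied
off and back again, which avoids the recovery load mentioned above.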