Re: Ceph complete cluster failure: unknown PGs


 



Do your current CRUSH rules for your pools still apply to the new OSD map
with only those 4 nodes? If you have, for example, an EC 4+2 pool in an
8-node cluster and only 4 nodes are left, you have dropped below your
min_size, so please check that first.
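
A quick sketch of the checks I mean (standard ceph CLI run against a
reachable mon; the rule and profile names are placeholders for whatever
your pools actually use):

    ceph osd tree                                      # which hosts/OSDs are actually up
    ceph osd pool ls detail                            # size, min_size and crush_rule per pool
    ceph osd crush rule dump <rule-name>               # failure domain the rule enforces
    ceph osd erasure-code-profile get <profile-name>   # k, m and crush-failure-domain

With a 4+2 profile and host as the failure domain, each PG needs 6
distinct hosts for its shards; with only 4 hosts up, at most 4 shards per
PG are available, which is below the usual EC min_size of k+1 = 5, so
those PGs cannot go active.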

On Thu, 28 Sep 2023 at 9:24 PM, <v1tnam@xxxxxxxxx> wrote:
>
> I have an 8-node cluster with old hardware. A week ago 4 nodes went down and the Ceph cluster went nuts.
> All PGs became unknown and the monitors took too long to get in sync,
> so I reduced the number of mons to one and the mgrs to one as well.
>
> Now recovery starts with 100% unknown PGs, and then PGs start to move to inactive. It generally fails partway through and starts over from scratch.
>
> It's old hardware, the OSDs have lots of slow ops, and probably a number of bad sectors as well.
>
> Any suggestions on how to tackle this? It's a Nautilus cluster on pretty old (8-year-old) hardware.
>
> Thanks
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



