Re: Unhappy Cluster

Hello Dave,

I think your data is still intact. With k=4, m=2 the pool's min_size
defaults to k+1 = 5, so with two OSDs gone those PGs have only 4 shards
left, and Nautilus will not recover erasure-coded PGs that are below
min_size. You can try temporarily setting min_size to 4 to let recovery
proceed. This limitation was fixed in Octopus and later releases. From the
release notes at https://docs.ceph.com/en/latest/releases/octopus/:

> Ceph will allow recovery below min_size for Erasure coded pools, wherever
> possible.
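
Something along these lines should work (just a sketch; substitute your
actual EC pool name, "cephfs_data" below is only a placeholder):

    # check the current value (for a 4+2 pool it defaults to k+1 = 5)
    ceph osd pool get cephfs_data min_size

    # temporarily allow recovery with only k shards present
    ceph osd pool set cephfs_data min_size 4

    # watch the incomplete PGs go active and recover
    ceph -s
    ceph health detail

    # once everything is active+clean again, restore the default
    ceph osd pool set cephfs_data min_size 5

Keep in mind that running at min_size = k leaves no safety margin: one more
failed OSD would make those PGs inactive. Set it back to 5 as soon as the
cluster is healthy.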



On Sat, Sep 9, 2023 at 5:58 AM Dave S <bigdave.schulz@xxxxxxxxx> wrote:

> Hi Everyone,
> I've been fighting with a ceph cluster that we have recently
> physically relocated and lost 2 OSDs during the ensuing power down and
> relocation. After powering everything back on we have
>              3   incomplete
>              3   remapped+incomplete
> And indeed we have 2 OSDs that died along the way.
> The reason I'm contacting the list is that I'm surprised that these
> PGs are incomplete.  We're running erasure coding with K=4, M=2, which
> in my understanding means we should be able to lose 2 OSDs without an
> issue. Am I misunderstanding this, or does M=2 mean you can lose M-1 OSDs?
>
> Also, these two OSDs happened to be in the same server (#3 of 8 total
> servers).
>
> This is an older cluster running Nautilus 14.2.9.
>
> Any thoughts?
> Thanks
> -Dave
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>


-- 
Alexander E. Patrakov
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



