Re: The reason of recovery_unfound pg

Satoru Takeuchi <satoru.takeuchi@xxxxxxxxx> · Mon, 23 Aug 2021 10:14:40 +0900

Hi Dominic,

On Sat, Aug 21, 2021 at 7:17, <DHilsbos@xxxxxxxxxxxxxx> wrote:
>
> Satoru;
>
> Ok.  What your cluster is telling you, then, is that it doesn't know which replica is the "most current" or "correct" replica.  You will need to determine that, and let ceph know which one to use as the "good" replica.  Unfortunately, I can't help you with this.  In fact, if this is critical data, I'd seriously consider engaging a contractor to help you recover the data, and help your cluster return to a fully operational status.
>
> I have found it helpful to set noout and norebalance when I intend to reboot or offline any of my OSDs.
>
> It's also critical to allow the cluster to return to a cluster state of HEALTH_OK in between reboots.
>
> Thank you,

Thank you very much for your answer and advice!
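
For the record, here is a minimal Python sketch of the procedure as I understand it from your advice: set noout and norebalance, reboot one node, unset the flags, and wait for HEALTH_OK before touching the next node. It is only a sketch under assumptions, not something I have run on our cluster. It assumes the `ceph` CLI is on the PATH with admin credentials, and `reboot_node` is a hypothetical callback that must block until the node is back (in our case, until Kubernetes reports it Ready). All function names are my own. I unset the flags before polling because, as far as I know, set flags themselves keep the cluster in HEALTH_WARN.

```python
import subprocess
import time
from typing import Callable, Iterable, List

# Flags suggested in the thread for planned OSD downtime.
FLAGS = ("noout", "norebalance")


def run(cmd: List[str]) -> str:
    """Run a command and return its stdout; raises if the command fails."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def wait_for_health_ok(
    run_cmd: Callable[[List[str]], str] = run,
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll `ceph health` until it reports HEALTH_OK, or give up."""
    for _ in range(max_polls):
        if run_cmd(["ceph", "health"]).strip().startswith("HEALTH_OK"):
            return
        sleep(poll_seconds)
    raise TimeoutError("cluster did not return to HEALTH_OK")


def rolling_reboot(
    nodes: Iterable[str],
    reboot_node: Callable[[str], None],  # hypothetical: blocks until node is back
    run_cmd: Callable[[List[str]], str] = run,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Reboot nodes one at a time, waiting for HEALTH_OK between reboots."""
    for node in nodes:
        for flag in FLAGS:
            run_cmd(["ceph", "osd", "set", flag])
        try:
            reboot_node(node)
        finally:
            # Always clear the flags, even if the reboot step failed.
            for flag in FLAGS:
                run_cmd(["ceph", "osd", "unset", flag])
        # Do not move on to the next node until the cluster has recovered.
        wait_for_health_ok(run_cmd, sleep=sleep)
```

The difference from what we did before is the last step of the loop: the next reboot only starts after Ceph itself reports HEALTH_OK, not when Kubernetes considers the node ready.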

Best,
Satoru

>
> Dominic L. Hilsbos, MBA
> Vice President – Information Technology
> Perform Air International Inc.
> DHilsbos@xxxxxxxxxxxxxx
> www.PerformAir.com
>
>
> From: Satoru Takeuchi [mailto:satoru.takeuchi@xxxxxxxxx]
> Sent: Friday, August 20, 2021 2:48 PM
> To: Dominic Hilsbos
> Cc: ceph-users
> Subject: Re:  Re: The reason of recovery_unfound pg
>
> Hi Dominic,
>
> On Sat, Aug 21, 2021 at 1:05, <DHilsbos@xxxxxxxxxxxxxx> wrote:
> Satoru;
>
> You said "after restarting all nodes one by one."  After each reboot, did you allow the cluster the time necessary to come back to a "HEALTH_OK" status?
>
>
> No, we rebooted according to the following policy.
>
> 1. Reboot one machine.
> 2. Wait until the reboot completes at the Kubernetes level (not the Ceph cluster level).
> 3. If there are other nodes to be rebooted, go to step 1.
>
> I should have explained this logic to you as well.
> I now realize that the logic above is wrong and that I should have waited for
> the cluster to come back to HEALTH_OK. Unfortunately, I don't understand the
> meaning of the PG states well, and there seem to be several states that mean
> "PG might be lost".
>
> https://docs.ceph.com/en/latest/rados/operations/pg-states/
>
> Could you tell me whether a PG can enter the `recovery_unfound` state in this case?
>
> Thanks,
> Satoru
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx