Re: The reason of recovery_unfound pg

<DHilsbos@xxxxxxxxxxxxxx> · Fri, 20 Aug 2021 22:17:00 +0000

Satoru;

Ok.  What your cluster is telling you, then, is that it doesn't know which replica is the "most current" or "correct" replica.  You will need to determine that, and let ceph know which one to use as the "good" replica.  Unfortunately, I can't help you with this.  In fact, if this is critical data, I'd seriously consider engaging a contractor to help you recover the data, and help your cluster return to a fully operational status.

I have found it helpful to set the noout and norebalance flags whenever I intend to reboot any of my OSDs or take them offline.
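As a rough illustration of the flag handling, here is a minimal Python sketch that wraps the `ceph osd set` / `ceph osd unset` commands. The helper names `run_ceph` and `maintenance_flags` are hypothetical (they are not part of Ceph), and the sketch assumes the `ceph` CLI is on the PATH with sufficient privileges.

```python
import subprocess
from contextlib import contextmanager

# Flags mentioned above; both are set with `ceph osd set <flag>`.
FLAGS = ("noout", "norebalance")


def run_ceph(*args):
    """Run a ceph CLI command and return its stdout (assumes `ceph` is on PATH)."""
    result = subprocess.run(
        ["ceph", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


@contextmanager
def maintenance_flags(flags=FLAGS, run=run_ceph):
    """Set the given OSD flags on entry and unset them again on exit.

    `run` is injectable so the logic can be exercised without a cluster.
    """
    applied = []
    try:
        for flag in flags:
            run("osd", "set", flag)
            applied.append(flag)
        yield
    finally:
        # Unset only the flags that were actually set, in reverse order.
        for flag in reversed(applied):
            run("osd", "unset", flag)
```

Used as `with maintenance_flags(): ...`, the body would hold whatever reboots or offlines the OSD host; the flags are cleared again even if that step raises.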

It's also critical to allow the cluster to return to HEALTH_OK between reboots.
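The wait between reboots could be automated along these lines. This is only a sketch: `ceph_health` and `wait_for_health_ok` are hypothetical helper names, the polling interval and timeout are arbitrary, and it assumes `ceph health` prints a line beginning with the status (HEALTH_OK, HEALTH_WARN, or HEALTH_ERR). Note that while flags such as noout or norebalance are still set, the cluster reports HEALTH_WARN, so they need to be unset before this wait can succeed.

```python
import subprocess
import time


def ceph_health():
    """Return the output of `ceph health` (assumes `ceph` is on PATH)."""
    result = subprocess.run(
        ["ceph", "health"], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def wait_for_health_ok(
    get_health=ceph_health,
    interval=10.0,
    timeout=3600.0,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Poll the cluster until it reports HEALTH_OK, or raise TimeoutError.

    `get_health`, `sleep`, and `clock` are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        status = get_health()
        if status.startswith("HEALTH_OK"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"cluster did not reach HEALTH_OK: {status}")
        sleep(interval)
```

Calling `wait_for_health_ok()` after each node comes back, and only then moving on to the next node, keeps at most one node's worth of OSDs out of service at a time.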

Thank you,

Dominic L. Hilsbos, MBA
Vice President – Information Technology
Perform Air International Inc.
DHilsbos@xxxxxxxxxxxxxx
www.PerformAir.com

From: Satoru Takeuchi [mailto:satoru.takeuchi@xxxxxxxxx] 
Sent: Friday, August 20, 2021 2:48 PM
To: Dominic Hilsbos
Cc: ceph-users
Subject: Re:  Re: The reason of recovery_unfound pg

Hi Dominic,

On Sat, Aug 21, 2021 at 1:05, <DHilsbos@xxxxxxxxxxxxxx> wrote:
Satoru;

You said "after restarting all nodes one by one."  After each reboot, did you allow the cluster the time necessary to come back to a "HEALTH_OK" status?

No, we rebooted with the following policy.

1. Reboot one machine.
2. Wait until the reboot completes at the Kubernetes level (not at the Ceph cluster level).
3. If there are other nodes to be rebooted, go to step 1.

I should have explained this logic to you as well.
I realize that the above logic is wrong and that I should have waited for the
cluster to come back to HEALTH_OK. Unfortunately, I don't understand the meaning
of the pg states well, and there seem to be several states which mean "pg might be lost".

https://docs.ceph.com/en/latest/rados/operations/pg-states/

Could you tell me whether a pg can enter the `recovery_unfound` state in this case?

Thanks,
Satoru