Hi Dominic,

On Sat, Aug 21, 2021 at 1:05, <DHilsbos@xxxxxxxxxxxxxx> wrote:
> Satoru;
>
> You said " after restarting all nodes one by one." After each reboot, did
> you allow the cluster the time necessary to come back to a "HEALTH_OK"
> status?

No, we rebooted with the following policy:

1. Reboot one machine.
2. Wait until the reboot completes at the Kubernetes level (not at the Ceph cluster level).
3. If there are other nodes to be rebooted, go to step 1.

I should have explained this logic to you as well. I now realize that the above logic is wrong and that I should have waited for the cluster to come back to HEALTH_OK after each reboot.

Unfortunately, I don't understand the meaning of the PG states well, and there seem to be several states that mean "PG might be lost".

https://docs.ceph.com/en/latest/rados/operations/pg-states/

Could you tell me whether a PG can enter the `recovery_unfound` state in this case?

Thanks,
Satoru