Hi Cephers,
I recently had a power problem and the entire cluster was brought down, came up, went down, and came up again. Afterward, 3 OSDs were mostly dead (HDD failures). Luckily (I think) the drives were still alive enough that I could copy the data off and leave the journal alone.
Since my pool "data" has size 3... of course a couple of placement groups lived only on those three drives.
Now I've added 4 new OSDs and everything has recovered except pg 0.f3. When I query the pg, I see the cluster is looking for OSD 14 or 23 because one of them maybe_went_rw. (OSDs 5, 14, and 23 are now kaput and have been marked lost with "ceph osd lost --yes-i-really-mean-it".)
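For reference, this is roughly what I ran to inspect the pg and mark the dead OSDs (IDs are from my cluster, output trimmed):

    # peering info for the stuck pg -- this is where 14/23 and maybe_went_rw show up
    ceph pg 0.f3 query

    # tell the cluster the dead OSDs aren't coming back
    ceph osd lost 5 --yes-i-really-mean-it
    ceph osd lost 14 --yes-i-really-mean-it
    ceph osd lost 23 --yes-i-really-mean-it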
Ceph indicates OSD 29 is now the primary for pg 0.f3, so I copied all the data into the appropriate directory and started osd.29 again.
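Concretely, that step was something like the following (the source path is just where I had stashed the data copied off the dead drives, and the destination is the default filestore layout on my boxes):

    # copy the recovered pg contents into osd.29's data directory
    # (/mnt/recovered is where I stashed the data from the dead drives)
    rsync -a /mnt/recovered/0.f3_head/ /var/lib/ceph/osd/ceph-29/current/0.f3_head/
    # bring the daemon back up
    service ceph start osd.29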
Here is where my question comes in: how do I convince the cluster that it's okay to bring 0.f3 'up' and backfill to the other OSDs from 29? (I could even manually backfill 15 and 22, but I suspect the cluster will still think there's a problem.)
'ceph health detail' shows this about 0.f3:
pg 0.f3 is incomplete, acting [29,22,15] (reducing pool data min_size from 2 may help; search ceph.com/docs for 'incomplete')
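If I read that hint correctly, the suggested workaround would be something along these lines, though I haven't tried it yet:

    # temporarily allow the pg to go active with fewer replicas
    ceph osd pool set data min_size 1
    # ... wait for 0.f3 to backfill to [29,22,15] ...
    # then restore the normal setting
    ceph osd pool set data min_size 2

But given that the only complete copy of the pg is now on osd.29, I'd rather check before poking at it.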
Thanks in advance!
-Aaron