ceph pg query says all of the OSDs are being probed. If those 6 OSDs are staying up, it probably just needs some time; the OSDs need to stay up longer than 15 minutes. If any of them are getting marked down at all, that will cause problems. I'd like to see the past intervals in the recovery state get smaller. All of those entries indicate potential history that needs to be reconciled; if that array is shrinking, then recovery is proceeding.
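If you'd rather not eyeball the whole JSON dump each time, something like this can track the trend (a rough sketch; it assumes jq is installed and that your release nests past_intervals under the recovery_state entries, which varies a bit between versions):

    # Count the past-interval entries; the number should trend downward as peering progresses
    ceph pg 0.37 query | jq '[.recovery_state[] | .past_intervals? // empty] | flatten | length'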
You could try pushing it a bit with a ceph pg scrub 0.37. If that finishes without any improvement, try ceph pg deep-scrub 0.37. Sometimes it helps move things faster, and sometimes it doesn't.
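While the scrub runs, you can keep an eye on the PG's reported state with a simple poll (illustrative; any interval works):

    # Re-list the stuck PG every few seconds to catch state transitions
    watch -n 5 'ceph pg dump_stuck unclean'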
On Wed, Apr 22, 2015 at 11:54 AM, MEGATEL / Rafał Gawron <rafal.gawron@xxxxxxxxxxxxxx> wrote:
All OSDs are working fine now.
ceph osd tree
ID  WEIGHT     TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1  1080.71985 root default
-2   120.07999     host s1
 0    60.03999         osd.0       up  1.00000          1.00000
 1    60.03999         osd.1       up  1.00000          1.00000
-3   120.07999     host s2
 2    60.03999         osd.2       up  1.00000          1.00000
 3    60.03999         osd.3       up  1.00000          1.00000
-4   120.07999     host s3
 4    60.03999         osd.4       up  1.00000          1.00000
 5    60.03999         osd.5       up  1.00000          1.00000
-5   120.07999     host s4
 6    60.03999         osd.6       up  1.00000          1.00000
 7    60.03999         osd.7       up  1.00000          1.00000
-6   120.07999     host s5
 9    60.03999         osd.9       up  1.00000          1.00000
 8    60.03999         osd.8       up  1.00000          1.00000
-7   120.07999     host s6
10    60.03999         osd.10      up  1.00000          1.00000
11    60.03999         osd.11      up  1.00000          1.00000
-8   120.07999     host s7
12    60.03999         osd.12      up  1.00000          1.00000
13    60.03999         osd.13      up  1.00000          1.00000
-9   120.07999     host s8
14    60.03999         osd.14      up  1.00000          1.00000
15    60.03999         osd.15      up  1.00000          1.00000
-10  120.07999     host s9
17    60.03999         osd.17      up  1.00000          1.00000
16    60.03999         osd.16      up  1.00000          1.00000
Earlier I had a power failure and my cluster was down.
After it came back up it was recovering, but now I have:
1 pgs incomplete
1 pgs stuck inactive
1 pgs stuck unclean
The cluster can't recover this PG.
I tried taking some OSDs out and adding them back to my cluster, but the recovery after that didn't rebuild it.
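The usual first step to pin down which PG is incomplete and which OSDs it maps to is something like this (a sketch; 0.37 is the PG id discussed elsewhere in the thread):

    # List unhealthy PGs with their acting OSD sets
    ceph health detail
    # Map one specific PG to the OSDs that serve it
    ceph pg map 0.37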
From: Craig Lewis <clewis@xxxxxxxxxxxxxxxxxx>
Sent: 22 April 2015 20:40
To: MEGATEL / Rafał Gawron
Subject: Re: Odp.: CEPH 1 pgs incomplete

So you have flapping OSDs. None of the 6 OSDs involved in that PG are staying up long enough to complete the recovery.
What's happened is that, because of how quickly the OSDs are coming up and failing, no single OSD has a complete copy of the data. There should be a complete copy across the cluster, but different OSDs hold different chunks of it.
Figure out why those 6 OSDs are failing, and Ceph should recover. Do you see anything interesting in those OSD logs? If not, you might need to increase the logging levels.
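If the default logs don't show anything, injectargs is the usual way to raise the debug level on a live OSD without restarting it (illustrative values; substitute the IDs of the 6 OSDs in that PG's acting set):

    # Raise OSD and messenger debug levels on a running daemon
    ceph tell osd.0 injectargs '--debug-osd 20 --debug-ms 1'
    # Drop them back to the defaults once the failure has been captured
    ceph tell osd.0 injectargs '--debug-osd 0/5 --debug-ms 0/5'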
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com