On 07/17/2014 09:44 PM, Caius Howcroft wrote:
> I wonder if someone can just clarify something for me.
>
> I have a cluster which I have upgraded to firefly. I'm having pg
> inconsistencies due to the recently reported xfs bug. However, I'm
> running pg repair X.YYY and I would like to understand what,
> exactly, this is doing. It looks like it's copying from the primary to
> the other two (if size=3), but is it still doing this if the primary
> is the odd one out? i.e. what happens if the primary gets corrupted?
> I thought pg repair should fail in this case, but now I'm not so sure.
>

If the primary OSD is down, CRUSH will select a different OSD as the
primary.

Ceph doesn't know which object is corrupted. It simply knows that the
secondary OSD does not have the same copy as the primary. btrfs could
help here since it has online checksumming; XFS doesn't.

> Also, is there a way to get the information about which objects and on
> which osd are inconsistent, basically the stuff I see in the mon logs,
> but get it from a json dump or from the admin socket? I would like to
> track these errors better by feeding them into our metrics collection.
>

$ ceph pg <pg id> query

That should tell you more.

> Thanks
> Caius
>

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant
Phone: +31 (0)20 700 9902
Skype: contact42on
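
[Editor's note] For the second question (pulling the inconsistent PGs as
machine-readable data for a metrics pipeline), here is a minimal sketch.
It assumes the 'ceph' CLI is in PATH with a keyring that allows 'pg dump',
and that the JSON output contains a 'pg_stats' list; the exact layout of
'ceph pg dump --format json' differs between releases, so the key lookups
may need adjusting for your version.

#!/usr/bin/env python
# Sketch: emit one JSON line per inconsistent PG so it can be fed into a
# metrics/monitoring pipeline.
import json
import subprocess

def inconsistent_pgs():
    out = subprocess.check_output(['ceph', 'pg', 'dump', '--format', 'json'])
    dump = json.loads(out)
    # Older releases keep the PG list at the top level, newer ones nest it
    # under 'pg_map'; try both.
    stats = dump.get('pg_stats') or dump.get('pg_map', {}).get('pg_stats', [])
    for pg in stats:
        if 'inconsistent' in pg.get('state', ''):
            yield {
                'pgid': pg.get('pgid'),
                'state': pg.get('state'),
                'acting': pg.get('acting', []),  # OSDs currently serving this PG
            }

if __name__ == '__main__':
    for pg in inconsistent_pgs():
        print(json.dumps(pg))

'ceph health detail' also lists the inconsistent PGs in plain text, and the
scrub errors naming the affected objects show up in the cluster log and in
the primary OSD's log.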