On 07/17/2014 09:44 PM, Caius Howcroft wrote:
> I wonder if someone can just clarify something for me.
>
> I have a cluster which I have upgraded to firefly. I'm having pg
> inconsistencies due to the recently reported xfs bug. However, I'm
> running pg repair X.YYY and I would like to understand what,
> exactly, this is doing. It looks like it's copying from the primary to
> the other two (if size=3), but is it still doing this if the primary
> is the odd one out? i.e. what happens if the primary gets corrupted?
> I thought pg repair should fail in this case, but now I'm not so sure.
>

If the primary OSD is down, CRUSH will select a different OSD as the
primary.

Ceph doesn't know which object is corrupted. It simply knows that the
secondary OSD does not have the same copy as the primary. btrfs could
help here since it has online checksumming; XFS doesn't.

> Also, is there a way to get the information about which objects and on
> which osd are inconsistent, basically the stuff I see in the mon logs,
> but get it from a json dump or from the admin socket? I would like to
> track these errors better by feeding them into our metrics collection.
>

$ ceph pg <pg id> query

That should tell you more.

> Thanks
> Caius
>

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant
Phone: +31 (0)20 700 9902
Skype: contact42on
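
[Editor's note] For the second question (pulling the inconsistent PGs as
machine-readable data for a metrics pipeline), here is a minimal sketch.
It assumes the 'ceph' CLI is in PATH with a keyring that allows 'pg dump',
and that the JSON output contains a 'pg_stats' list; the exact layout of
'ceph pg dump --format json' differs between releases, so the key lookups
may need adjusting for your version.

#!/usr/bin/env python
# Sketch: emit one JSON line per inconsistent PG so it can be fed into a
# metrics/monitoring pipeline.
import json
import subprocess

def inconsistent_pgs():
    out = subprocess.check_output(['ceph', 'pg', 'dump', '--format', 'json'])
    dump = json.loads(out)
    # Older releases keep the PG list at the top level, newer ones nest it
    # under 'pg_map'; try both.
    stats = dump.get('pg_stats') or dump.get('pg_map', {}).get('pg_stats', [])
    for pg in stats:
        if 'inconsistent' in pg.get('state', ''):
            yield {
                'pgid': pg.get('pgid'),
                'state': pg.get('state'),
                'acting': pg.get('acting', []),  # OSDs currently serving this PG
            }

if __name__ == '__main__':
    for pg in inconsistent_pgs():
        print(json.dumps(pg))

'ceph health detail' also lists the inconsistent PGs in plain text, and the
scrub errors naming the affected objects show up in the cluster log and in
the primary OSD's log.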