Re: One PG active+clean+inconsistent and repair says object size mismatch

Sage Weil <sage@xxxxxxxxxxx> · Fri, 29 Mar 2013 12:56:21 -0700 (PDT)

On Fri, 29 Mar 2013, Wido den Hollander wrote:
> Hi,
> 
> I'd assume more people are going to encounter this, so I thought an e-mail to
> the ceph-users list would be best.
> 
> On a cluster I have one PG which is active+clean+inconsistent.
> 
> I tried this:
> $ ceph pg repair 2.6a5
> 
> In my logs it showed:
> 
> 2013-03-29 20:27:07.177416 osd.4 [ERR] repair 2.6a5
> 93f8cea5/rb.0.2.000000000251/head//2 on disk size (2097152) does not match
> object info size (4194304)
> 2013-03-29 20:27:07.177869 osd.4 [ERR] 2.6a5 repair 1 errors, 0 fixed
> 
> osd.4, osd.22 and osd.39 are acting where 4 is primary.
> 
> On osd.4 I verified that the on-disk size of the object is indeed 2097152
> bytes.
> 
> However, on osd.22 and osd.39 the object rb.0.2.000000000251__head_93F8CEA5__2
> is also 2097152 bytes.
> 
> According to the PG this object should be exactly 4MB big, but is that
> correct? I can't verify if it should really have been that size, since it
> could be a filesystem which only partially wrote to that object.
> 
> A stat() tells me that the last change of this file was 2012-10-17, so it
> can't be due to a recent change to the file/object.
> 
> My initial idea was to copy the object to osd.4 from one of the other OSDs,
> but the md5sum is the same on all 3 OSDs.
> 
> So my question is, why is this PG inconsistent? This object is the only object
> in that PG, so it has to be the issue.
> 
> I'm running 0.56.4 with the 3.8 kernel with btrfs.

At some point in the past, the on-disk object size got out of sync with 
the metadata.  When we've seen this, it's usually been due to crashes and 
journal replay issues (not btrfs!), although many other bugs fixed in the 
argonaut -> bobtail timeframe may also have caused this.  Is 
this the first scrub in a while?  Or was the cluster doing a lot of 
recovery/rebalancing recently?  It sounds like this happened long ago but 
was just now noticed.

Usually a bug triggers on one copy, and then the change gets propagated to 
the others during recovery.  If there was recent activity, it may be a 
problem that is still present...

In any case, the simplest way to repair is to truncate --size 4194304 
<filename> on all of the replicas.
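
For anyone who would rather script that step, here is a minimal Python
sketch of the same operation for a file that is too short. The function
name and the default size constant are illustrative assumptions, not
anything shipped with Ceph. Unlike a plain truncate, it refuses to shrink
a file. It does not cover locating the object file on each OSD or
re-running the repair afterwards.

```python
import os

# Object info size reported in the repair error above (4 MB).
EXPECTED_SIZE = 4194304


def pad_to_expected_size(path, expected_size=EXPECTED_SIZE):
    """Extend a too-short file to expected_size, filling with zero bytes.

    Hypothetical helper: has the same effect as `truncate --size` when the
    file is shorter than expected_size, but raises instead of shrinking.
    Returns the size the file had before the call.
    """
    current = os.stat(path).st_size
    if current > expected_size:
        raise ValueError(
            f"{path} is {current} bytes, larger than {expected_size}"
        )
    if current < expected_size:
        # os.truncate extends the file; the new tail reads back as zeros.
        os.truncate(path, expected_size)
    return current
```

It would be called once per replica with the path of the object file on
that OSD, and the returned value can be logged to confirm the old size
was the 2097152 bytes reported by the repair.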

sage

> 
> -- 
> Wido den Hollander
> 42on B.V.
> 
> Phone: +31 (0)20 700 9902
> Skype: contact42on
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 