Re: One PG active+clean+inconsistent and repair says object size mismatch

On 03/29/2013 08:56 PM, Sage Weil wrote:
On Fri, 29 Mar 2013, Wido den Hollander wrote:
Hi,

I assume more people are going to run into this, so I thought an e-mail to
the ceph-users list would be best.

On a cluster I have one PG which is active+clean+inconsistent.

I tried this:
$ ceph pg repair 2.6a5

In my logs it showed:

2013-03-29 20:27:07.177416 osd.4 [ERR] repair 2.6a5
93f8cea5/rb.0.2.000000000251/head//2 on disk size (2097152) does not match
object info size (4194304)
2013-03-29 20:27:07.177869 osd.4 [ERR] 2.6a5 repair 1 errors, 0 fixed
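
For anyone landing on this thread later: something like the following should show
which PG(s) are flagged as inconsistent in the first place (the PG id 2.6a5 is of
course specific to this cluster):

$ ceph health detail                  # should list the inconsistent PG(s)
$ ceph pg dump | grep inconsistent    # alternative: grep the full PG listing for the state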

osd.4, osd.22 and osd.39 are acting where 4 is primary.

On osd.4 I verified that the on-disk size of the object is indeed 2097152
bytes.

However, on osd.22 and osd.39 the object rb.0.2.000000000251__head_93F8CEA5__2
is also 2097152 bytes.
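
For reference, a check along these lines (assuming the default FileStore layout
under /var/lib/ceph and osd.4; adjust the osd id per host) shows the on-disk size
and checksum:

$ # locate the object inside the PG's directory on the OSD data dir
$ find /var/lib/ceph/osd/ceph-4/current/2.6a5_head/ -name 'rb.0.2.000000000251*'
$ # size in bytes and last change time
$ stat -c '%s %z' /var/lib/ceph/osd/ceph-4/current/2.6a5_head/rb.0.2.000000000251__head_93F8CEA5__2
$ md5sum /var/lib/ceph/osd/ceph-4/current/2.6a5_head/rb.0.2.000000000251__head_93F8CEA5__2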

According to the object info this object should be exactly 4MB (4194304 bytes),
but is that correct? I can't verify whether it really should have been that size,
since the RBD image could hold a filesystem which only partially wrote to that object.
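
The size RADOS has recorded for the object can also be queried directly with
rados stat; note the pool name below is a guess (the pool id here is 2, which is
often the default 'rbd' pool, but that may differ on this cluster):

$ rados -p rbd stat rb.0.2.000000000251    # prints the recorded mtime and size (should show 4194304)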

A stat() tells me that the last change of this file was 2012-10-17, so it
can't be due to a recent change to the file/object.

My initial idea was to copy the object to osd.4 from one of the other OSDs,
but the md5sum is the same on all 3 OSDs.

So my question is, why is this PG inconsistent? This object is the only object
in that PG, so it has to be the issue.

I'm running 0.56.4 on the 3.8 kernel with btrfs.

At some point in the past, the on-disk object size got out of sync with
the metadata.  When we've seen this, it's usually been due to crashes and
journal replay issues (not btrfs!), although there were many other bugs
fixed in the argonaut -> bobtail timeframe that may have caused this.  Is
this the first scrub in a while?  Or was the cluster doing a lot of
recovery/rebalancing recently?  It sounds like this happened long ago but
was just now noticed.

The cluster has indeed been doing a lot of rebalancing and recovery lately. Some OSDs have been out and down for quite some time in this cluster.


Usually a bug triggers on one copy, and then the change gets propagated to
the others during recovery.  If there was recent activity, it may be a
problem that is still present...

In any case, the simplest way to repair it is to run truncate --size 4194304
<filename> on all of the replicas.


Yes, that worked. I ran the truncate for all 3 files and then issued a pg repair. That resolved it.
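
For anyone hitting the same thing, the fix was along these lines on each of the
three OSDs (again assuming the default FileStore layout; adjust the osd id and
host accordingly):

$ # extend the file back to the size recorded in the object info; the added bytes read as zeros
$ truncate --size 4194304 /var/lib/ceph/osd/ceph-4/current/2.6a5_head/rb.0.2.000000000251__head_93F8CEA5__2

and then, from an admin node:

$ ceph pg repair 2.6a5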

Thanks!

Wido

sage

--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



