Re: Inconsistent PG won't repair

Hi Rich,

Is the object inconsistent and 0-bytes on all OSDs?
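One way to check this is to rerun the find from your mail on each host in the acting set, restricted to empty files. This is only a rough sketch: the function name is made up, and I'm assuming the other OSDs use the same filestore directory layout as the ceph-23 path you posted.

```shell
# List zero-byte on-disk copies of an object inside one OSD's PG directory.
# Run it on each host that holds an OSD from the acting set.
# zero_byte_copies is an illustrative helper name, not a Ceph command.
zero_byte_copies() {
    pg_dir="$1"     # e.g. /var/lib/ceph/osd/ceph-23/current/3.f05_head
    pattern="$2"    # e.g. '*19cdf512ae8944a.000000000001bb56*'
    # -size 0 matches only empty regular files
    find "$pg_dir" -type f -name "$pattern" -size 0
}

# Example (path taken from the original mail):
# zero_byte_copies /var/lib/ceph/osd/ceph-23/current/3.f05_head \
#     '*19cdf512ae8944a.000000000001bb56*'
```

If a copy exists on an OSD but is not listed, that copy is not empty.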

We ran into a similar issue on Jewel, where an object was empty on every OSD but had inconsistent metadata. We ultimately resolved it by doing a "rados get" followed by a "rados put" on the object. *However*, that was a last-ditch effort after I couldn't get any other repair option to work, and I have no idea whether it will cause any issues down the road :)
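Roughly, the get/put round trip looks like the sketch below. The pool name is a placeholder (the thread only shows pool id 3), and the DRY_RUN switch, the helper names, and the temp-file path are just illustrative. It prints the commands by default rather than running them, and whether it helps for a snapdir object like the one in your log is not something I can confirm.

```shell
#!/bin/sh
# Sketch of a "rados get" followed by a "rados put" on the same object.
# Assumptions: POOLNAME is a placeholder, and DRY_RUN, run, rewrite_object
# and the temp-file path are illustrative names, not part of Ceph.

run() {
    # Print the command unless DRY_RUN=0 is set explicitly.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

rewrite_object() {
    pool="$1"   # pool name
    obj="$2"    # RADOS object name
    tmp="$3"    # local file that holds the object's contents
    # Only write the object back if reading it succeeded.
    run rados -p "$pool" get "$obj" "$tmp" &&
    run rados -p "$pool" put "$obj" "$tmp"
}

rewrite_object "${POOL:-POOLNAME}" \
    rbd_data.19cdf512ae8944a.000000000001bb56 /tmp/object.bak
```

Setting DRY_RUN=0 and POOL to the real pool name would execute the two commands instead of printing them.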

--Lincoln

> On Oct 20, 2017, at 10:16 AM, Richard Bade <hitrich@xxxxxxxxx> wrote:
> 
> Hi Everyone,
> In our cluster running 0.94.10 we had a pg pop up as inconsistent
> during scrub. Previously when this has happened running ceph pg repair
> [pg_num] has resolved the problem. This time the repair runs but it
> remains inconsistent.
> ~$ ceph health detail
> HEALTH_ERR 1 pgs inconsistent; 2 scrub errors; noout flag(s) set
> pg 3.f05 is active+clean+inconsistent, acting [171,23,131]
> 1 scrub errors
> 
> The error in the logs is:
> cstor01 ceph-mon: osd.171 10.233.202.21:6816/12694 45 : deep-scrub
> 3.f05 3/68ab5f05/rbd_data.19cdf512ae8944a.000000000001bb56/snapdir
> expected clone 3/68ab5f05/rbd_data.19cdf512ae8944a.000000000001bb56/148d2
> 
> Now, I've tried several things to resolve this. I've tried stopping
> each of the OSDs in turn and running a repair. I've located the rbd
> image and removed it to empty out the object. The object is now zero
> bytes but still inconsistent. I've tried stopping each osd, removing
> the object and starting the osd again. It correctly identifies the
> object as missing and repair works to fix this but it still remains
> inconsistent.
> I've run out of ideas.
> The object is now zero bytes:
> ~$ find /var/lib/ceph/osd/ceph-23/current/3.f05_head/ -name "*19cdf512ae8944a.000000000001bb56*" -ls
> 537598582      0 -rw-r--r--   1 root     root            0 Oct 21 03:54 /var/lib/ceph/osd/ceph-23/current/3.f05_head/DIR_5/DIR_0/DIR_F/DIR_5/DIR_B/rbd\\udata.19cdf512ae8944a.000000000001bb56__snapdir_68AB5F05__3
> 
> How can I resolve this? Is there some way to remove the empty object
> completely? I saw reference to ceph-objectstore-tool which has some
> options to remove-clone-metadata but I don't know how to use this.
> Will using this to remove the mentioned 148d2 expected clone resolve
> this? Or would this do the opposite as it would seem that it can't
> find that clone?
> Documentation on this tool is sparse.
> 
> Any help here would be appreciated.
> 
> Regards,
> Rich
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
