Re: Problem with inconsistent PG

On Fri, 10 Feb 2012, Jens Rehpöhler wrote:
> Hi list,
> 
> Today I've got another problem.
> 
> ceph -w shows an inconsistent PG that appeared overnight:
> 
> 2012-02-10 08:38:48.701775    pg v441251: 1982 pgs: 1981 active+clean, 1
> active+clean+inconsistent; 1790 GB data, 3368 GB used, 18977 GB / 22345
> GB avail
> 2012-02-10 08:38:49.702789    pg v441252: 1982 pgs: 1981 active+clean, 1
> active+clean+inconsistent; 1790 GB data, 3368 GB used, 18977 GB / 22345
> GB avail
> 
> I've identified it with "ceph pg dump - | grep inconsistent":
> 
> 109.6    141    0    0    0    463820288    111780    111780   
> active+clean+inconsistent    485'7115    480'7301    [3,4]    [3,4]   
> 485'7061    2012-02-10 08:02:12.043986
> 
> Now I've tried to repair it with: ceph pg repair 109.6
> 
> 2012-02-10 08:35:52.276325 mon <- [pg,repair,109.6]
> 2012-02-10 08:35:52.276776 mon.1 -> 'instructing pg 109.6 on osd.3 to
> repair' (0)
> 
> but I only get the following result:
> 
> 2012-02-10 08:36:18.447553   log 2012-02-10 08:36:08.455420 osd.3
> 10.10.10.8:6801/25980 6913 : [ERR] 109.6 osd.4: soid
> 1ef398ce/rb.0.0.0000000000bd/headsize 2736128 != known size 3145728
> 2012-02-10 08:36:18.447553   log 2012-02-10 08:36:08.455426 osd.3
> 10.10.10.8:6801/25980 6914 : [ERR] 109.6 scrub 0 missing, 1 inconsistent
> objects
> 2012-02-10 08:36:18.447553   log 2012-02-10 08:36:08.455799 osd.3
> 10.10.10.8:6801/25980 6915 : [ERR] 109.6 scrub 1 errors
> 
> Can someone please explain to me what to do in this case and how to
> recover the PG?

So the "fix" is just to find the object's file in the current/ directory 
(on osd.4, per the scrub error) and truncate it to the expected size, 
3145728.  The name/path will be slightly weird; look for 
'rb.0.0.0000000000bd'.
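
The locate-and-truncate step can be sketched in Python.  This is only an 
illustration under stated assumptions, not an official Ceph tool: the 
helper names find_object_files and set_file_size are made up here, and the 
demo runs against a scratch directory with a made-up file name rather than 
a real OSD data directory.  On a real OSD you would pass the path to that 
OSD's current/ directory and check the matches by hand before changing 
anything.

```python
import os
import tempfile


def find_object_files(current_dir, name_fragment):
    """Return sorted paths under current_dir whose file name contains name_fragment."""
    matches = []
    for root, _dirs, files in os.walk(current_dir):
        for fname in files:
            if name_fragment in fname:
                matches.append(os.path.join(root, fname))
    return sorted(matches)


def set_file_size(path, expected_size):
    """Set the file at path to expected_size bytes; return (old_size, new_size)."""
    old_size = os.path.getsize(path)
    if old_size != expected_size:
        # os.truncate shrinks a longer file and zero-extends a shorter one.
        os.truncate(path, expected_size)
    return old_size, os.path.getsize(path)


if __name__ == "__main__":
    # Demo only: a scratch directory stands in for the OSD's current/ directory.
    with tempfile.TemporaryDirectory() as scratch:
        pg_dir = os.path.join(scratch, "pg_dir_example")  # hypothetical layout
        os.makedirs(pg_dir)
        # Hypothetical on-disk name; the real one will look "slightly weird".
        demo_file = os.path.join(pg_dir, "rb.0.0.0000000000bd_example")
        with open(demo_file, "wb") as f:
            f.write(b"\0" * 2736128)  # the short size reported by the scrub

        for path in find_object_files(scratch, "rb.0.0.0000000000bd"):
            old, new = set_file_size(path, 3145728)  # the "known size"
            print(path, old, "->", new)
```

Because the file on disk is shorter than the known size, the "truncate" 
here actually extends it with zero bytes, which is one more reason to treat 
the contents as suspect.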

The data is still suspect, though.  Did the ceph-osd restart or crash 
recently?  I would truncate the file, rerun the repair (it should then 
succeed), and then fsck the file system in that rbd image.

We just fixed a bug that was causing transactions to leak across 
checkpoint/snapshot boundaries.  That could be responsible for causing all 
sorts of subtle corruptions, including this one.  It'll be included in 
v0.42 (out next week).

sage
