fix active+clean+inconsistent on cephfs when digest != digest

Hi list,

I was struggling for quite a while with the problem that on my CephFS data pool some PGs stayed inconsistent and could not be repaired. The message in the OSD's log was like:


>> repair 11.23a 57b4363a/20000015b67.000006e1/head//11 on disk data digest 0x325d0322 != 0xe8c0243

and then the repair finished without fixing the error.


After searching for a long time I finally stumbled over this mail: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-July/041618.html
which helped me understand and solve the problem.

It finally turned out to be an issue that can be remediated quite easily, so if others run into the same problem this procedure may help - if you have other suggestions on how to fix this issue, please feel free to comment.
Basically it is only about identifying the problematic inode and getting rid of it, i.e. copying the file (which generates a new inode number) and removing the old one.
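
To illustrate the idea, here is a minimal Python sketch (my own addition, with a hypothetical path) showing that the copy really ends up with a new inode number, which is what detaches the data from the broken object:

import os
import shutil

src = "/cephfs/some/dir/affected_file"         # hypothetical path to the affected file
tmp = src + "_new"

print("old inode: %d" % os.stat(src).st_ino)   # inode referenced by the broken object

shutil.copy2(src, tmp)                         # copy data + metadata -> new inode
print("new inode: %d" % os.stat(tmp).st_ino)   # a different inode number

os.remove(src)                                 # drop the old inode (and its objects)
os.rename(tmp, src)                            # put the copy back under the original name

This is exactly the copy/remove/rename dance from steps 6 and 7 below, just in one place.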


My cluster:

Version: Hammer (ceph version 0.94.1)
Nodes: 4
OSDs: 20 x 2 TB
Problem pool: cephfs data pool, size = 2

The problem: during power outages all cluster nodes came back uncontrolled and disappeared again (flapping), so after a while everything got shuffled around; under normal circumstances I would have said it's broken. Ceph survived! Cool :-)

Anyway, there were these “active+clean+inconsistent” PGs that even “ceph pg repair <pgid>” could not fix anymore because of the data digest mismatch shown above.

This procedure only works for CephFS data pools. So here is what I did (a small scripted version of steps 3-5 follows after the list):

—————————

1) check which PGs are inconsistent
> ceph health detail

2) check which OSDs are active for this PG
> ceph pg map <pgid>

3) check the OSD logs for repair errors to find the affected inode
> grep <pgid> /var/log/ceph/*osd*log
>> repair 11.23a 57b4363a/20000015b67.000006e1/head//11 on disk data digest 0x325d0322 != 0xe8c0243

the object name encodes the inode number in hex followed by the object index: 20000015b67.000006e1 => the inode number is 20000015b67

4) having the hex inode number, convert it to an integer to be passed to find:
> python -c 'print(int("0x20000015b67", 0))'
2199023344487

5) now that we have the decimal inode number, we can search the cephfs mount for it:
> find /cephfs -inum 2199023344487

6) usually you'll end up with a single file, which you can simply copy:
> cp <original> <original_new>

7) remove the original file (and with it the inode), then rename the copy back into place
> rm <original> && mv <original_new> <original>

8) now run ceph pg repair once again
> ceph pg repair <pgid>

9) the inconsistency goes away, because the object with the broken digest belongs to an inode that no longer exists
> problem solved :-)


—————————
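
For convenience, here is a small Python sketch that automates steps 3-5 by pulling the inode number out of such a repair line and printing the matching find command. This is my own addition, not part of the original procedure; the example log line is the one from above, and the /cephfs mount point is an assumption:

import re

# Repair line as it appears in the OSD log (step 3).
line = ("repair 11.23a 57b4363a/20000015b67.000006e1/head//11 "
        "on disk data digest 0x325d0322 != 0xe8c0243")

# A CephFS data object is named "<inode hex>.<object index hex>".
match = re.search(r'/([0-9a-f]+)\.[0-9a-f]+/head', line)
if match:
    inode_hex = match.group(1)        # e.g. "20000015b67"
    inode_dec = int(inode_hex, 16)    # step 4: hex -> decimal
    print("inode (hex): %s" % inode_hex)
    print("inode (dec): %d" % inode_dec)
    # Step 5: locate the file on the cephfs mount (adjust the mount point).
    print("run: find /cephfs -inum %d" % inode_dec)

From there, steps 6 and 7 (copy, remove, rename) are the same as above.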

That's all - the cluster is clean again. I am quite impressed by the stability (and, for sure, the performance) and would like to thank the developers for this great piece of code ;)

Thanks and best regards

marco









