cephfs metadata damage and scrub error

Hi,

I'm having some issues with a Ceph cluster.  It's an 8-node cluster running Jewel (ceph-10.2.7-0.el7.x86_64) on CentOS 7.
This cluster provides RBDs and a CephFS filesystem to a number of clients.

ceph health detail is showing the following errors:

pg 2.9 is active+clean+inconsistent, acting [3,10,11,23]
1 scrub errors
mds0: Metadata damage detected


PG 2.9 is in the cephfs_metadata pool (id 2).

I've looked at the OSD logs for OSD 3, which is the primary for this PG, but the only thing that appears relating to this PG is the following:

log_channel(cluster) log [ERR] : 2.9 deep-scrub 1 errors

After initiating a ceph pg repair 2.9, I see the following in the primary OSD log:

log_channel(cluster) log [ERR] : 2.9 repair 1 errors, 0 fixed
log_channel(cluster) log [ERR] : 2.9 deep-scrub 1 errors


I found the command below in an earlier ceph-users post.  Running it returns the following:

# rados list-inconsistent-obj 2.9
{"epoch":23738,"inconsistents":[{"object":{"name":"10000411194.00000000","nspace":"","locator":"","snap":"head","version":14737091},"errors":["omap_digest_mismatch"],"union_shard_errors":[],"selected_object_info":"2:9758b358:::10000411194.00000000:head(33456'14737091 mds.0.214448:248532 dirty|omap|data_digest s 0 uv 14737091 dd ffffffff)","shards":[{"osd":3,"errors":[],"size":0,"omap_digest":"0x6748eef3","data_digest":"0xffffffff"},{"osd":10,"errors":[],"size":0,"omap_digest":"0xa791d5a4","data_digest":"0xffffffff"},{"osd":11,"errors":[],"size":0,"omap_digest":"0x53f46ab0","data_digest":"0xffffffff"},{"osd":23,"errors":[],"size":0,"omap_digest":"0x97b80594","data_digest":"0xffffffff"}]}]}
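The single-line JSON above is hard to read, so here is a rough Python sketch that summarises it per object. The embedded report is a trimmed copy of the output above (only the fields the sketch uses), and `summarize` is just an illustrative helper name, not part of any Ceph tool.

```python
import json
from collections import Counter

# Trimmed copy of the `rados list-inconsistent-obj 2.9` output quoted above.
REPORT = json.loads("""
{"epoch": 23738, "inconsistents": [{
  "object": {"name": "10000411194.00000000", "snap": "head"},
  "errors": ["omap_digest_mismatch"],
  "shards": [
    {"osd": 3,  "size": 0, "omap_digest": "0x6748eef3", "data_digest": "0xffffffff"},
    {"osd": 10, "size": 0, "omap_digest": "0xa791d5a4", "data_digest": "0xffffffff"},
    {"osd": 11, "size": 0, "omap_digest": "0x53f46ab0", "data_digest": "0xffffffff"},
    {"osd": 23, "size": 0, "omap_digest": "0x97b80594", "data_digest": "0xffffffff"}
  ]}]}
""")


def summarize(report):
    """Return one summary dict per inconsistent object (hypothetical helper)."""
    rows = []
    for item in report["inconsistents"]:
        shards = item["shards"]
        # Count how many replicas share each omap digest.
        digests = Counter(s["omap_digest"] for s in shards)
        _, top_count = digests.most_common(1)[0]
        rows.append({
            "object": item["object"]["name"],
            "errors": item["errors"],
            "osds": [s["osd"] for s in shards],
            "distinct_omap_digests": len(digests),
            # True only if more than half of the replicas agree on one digest.
            "has_majority": top_count > len(shards) / 2,
        })
    return rows


for row in summarize(REPORT):
    print(row)
```

For this report, each of the four replicas carries a different omap digest while the data digests all match, so no single copy stands out as the majority version.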


So from this, I think that the object in PG 2.9 with the problem is 10000411194.00000000.
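As far as I can tell, objects in the CephFS metadata pool that hold directory fragments are named `<inode number in hex>.<fragment id as 8 hex digits>`. Assuming that convention holds here, the object name can be turned into an inode number with a small sketch like the one below (the function names are made up for illustration).

```python
def object_name_to_ino(name):
    """Split '<hex ino>.<hex frag>' into (inode number, fragment id).

    Assumes the metadata-pool naming convention described in the lead-in.
    """
    ino_hex, frag_hex = name.split(".")
    return int(ino_hex, 16), int(frag_hex, 16)


def ino_to_object_name(ino, frag=0):
    """Inverse of object_name_to_ino, using the same assumed convention."""
    return "{:x}.{:08x}".format(ino, frag)


ino, frag = object_name_to_ino("10000411194.00000000")
print("inode (decimal):", ino)
print("inode (hex):    ", hex(ino))
print("fragment:       ", frag)
```

The decimal form is the one that `ceph tell mds.0 damage ls` reports in its `ino` field, which makes it possible to compare the two outputs.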

This is what I see on the filesystem on the 4 OSDs this PG resides on:

-rw-r--r--. 1 ceph ceph 0 Apr 27 12:31 /var/lib/ceph/osd/ceph-3/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
-rw-r--r--. 1 ceph ceph 0 Apr 15 22:05 /var/lib/ceph/osd/ceph-10/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
-rw-r--r--. 1 ceph ceph 0 Apr 15 22:07 /var/lib/ceph/osd/ceph-11/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
-rw-r--r--. 1 ceph ceph 0 Apr 16 03:58 /var/lib/ceph/osd/ceph-23/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2

The extended attributes are as follows, although I don't know what any of them mean.

# file: var/lib/ceph/osd/ceph-11/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
user.ceph._=0sDwj5AAAABAM1AAAAAAAAABQAAAAxMDAwMDQxMTE5NC4wMDAwMDAwMP7/////////6RrNGgAAAAAAAgAAAAAAAAAGAxwAAAACAAAAAAAAAP////8AAAAAAAAAAP//////////AAAAABUn4QAAAAAAu4IAAK4m4QAAAAAAu4IAAAICFQAAAAIAAAAAAAAAAOSZDAAAAAAAsEUDAAAAAAAAAAAAjUoIWUgWsQQCAhUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVJ+EAAAAAAAAAAAAAAAAAABwAAACNSghZESm8BP///w==
user.ceph._@1=0s//////8=
user.ceph._layout=0sAgIYAAAAAAAAAAAAAAAAAAAA//////////8AAAAA
user.ceph._parent=0sBQRPAQAAlBFBAAABAAAIAAAAAgIjAAAAjxFBAAABAAAPAAAAdHViZWFtYXRldXIubmV0qdgAAAAAAAACAh0AAAB/EUEAAAEAAAkAAAB3cC1yb2NrZXREAAAAAAAAAAICGQAAABYNQQAAAQAABQAAAGNhY2hlUgAAAAAAAAACAh4AAAAQDUEAAAEAAAoAAAB3cC1jb250ZW50NAMAAAAAAAACAhgAAAANDUEAAAEAAAQAAABodG1sIAEAAAAAAAACAikAAADagTMAAAEAABUAAABuZ2lueC1waHA3LWNsdmdmLWRhdGGJAAAAAAAAAAICMwAAADkAAAAAAQ==
user.ceph._parent@1=0sAAAfAAAANDg4LTU3YjI2NTdmMmZhMTMtbWktcHJveWVjdG8tMXSQCAAAAAAAAgIcAAAAAQAAAAAAAAAIAAAAcHJvamVjdHPBAgcAAAAAAAIAAAAAAAAAAAAAAA==
user.ceph.snapset=0sAgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==
user.cephos.seq=0sAQEQAAAAgcAqFAAAAAAAAAAAAgAAAAA=
user.cephos.spill_out=0sMAA=
getfattr: Removing leading '/' from absolute path names

# file: var/lib/ceph/osd/ceph-3/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
user.ceph._=0sDwj5AAAABAM1AAAAAAAAABQAAAAxMDAwMDQxMTE5NC4wMDAwMDAwMP7/////////6RrNGgAAAAAAAgAAAAAAAAAGAxwAAAACAAAAAAAAAP////8AAAAAAAAAAP//////////AAAAABUn4QAAAAAAu4IAAK4m4QAAAAAAu4IAAAICFQAAAAIAAAAAAAAAAOSZDAAAAAAAsEUDAAAAAAAAAAAAjUoIWUgWsQQCAhUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVJ+EAAAAAAAAAAAAAAAAAABwAAACNSghZESm8BP///w==
user.ceph._@1=0s//////8=
user.ceph._layout=0sAgIYAAAAAAAAAAAAAAAAAAAA//////////8AAAAA
user.ceph._parent=0sBQRPAQAAlBFBAAABAAAIAAAAAgIjAAAAjxFBAAABAAAPAAAAdHViZWFtYXRldXIubmV0qdgAAAAAAAACAh0AAAB/EUEAAAEAAAkAAAB3cC1yb2NrZXREAAAAAAAAAAICGQAAABYNQQAAAQAABQAAAGNhY2hlUgAAAAAAAAACAh4AAAAQDUEAAAEAAAoAAAB3cC1jb250ZW50NAMAAAAAAAACAhgAAAANDUEAAAEAAAQAAABodG1sIAEAAAAAAAACAikAAADagTMAAAEAABUAAABuZ2lueC1waHA3LWNsdmdmLWRhdGGJAAAAAAAAAAICMwAAADkAAAAAAQ==
user.ceph._parent@1=0sAAAfAAAANDg4LTU3YjI2NTdmMmZhMTMtbWktcHJveWVjdG8tMXSQCAAAAAAAAgIcAAAAAQAAAAAAAAAIAAAAcHJvamVjdHPBAgcAAAAAAAIAAAAAAAAAAAAAAA==
user.ceph.snapset=0sAgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==
user.cephos.seq=0sAQEQAAAAZaQ9GwAAAAAAAAAAAgAAAAA=
user.cephos.spill_out=0sMAA=
getfattr: Removing leading '/' from absolute path names

# file: var/lib/ceph/osd/ceph-10/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
user.ceph._=0sDwj5AAAABAM1AAAAAAAAABQAAAAxMDAwMDQxMTE5NC4wMDAwMDAwMP7/////////6RrNGgAAAAAAAgAAAAAAAAAGAxwAAAACAAAAAAAAAP////8AAAAAAAAAAP//////////AAAAABUn4QAAAAAAu4IAAK4m4QAAAAAAu4IAAAICFQAAAAIAAAAAAAAAAOSZDAAAAAAAsEUDAAAAAAAAAAAAjUoIWUgWsQQCAhUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVJ+EAAAAAAAAAAAAAAAAAABwAAACNSghZESm8BP///w==
user.ceph._@1=0s//////8=
user.ceph._layout=0sAgIYAAAAAAAAAAAAAAAAAAAA//////////8AAAAA
user.ceph._parent=0sBQRPAQAAlBFBAAABAAAIAAAAAgIjAAAAjxFBAAABAAAPAAAAdHViZWFtYXRldXIubmV0qdgAAAAAAAACAh0AAAB/EUEAAAEAAAkAAAB3cC1yb2NrZXREAAAAAAAAAAICGQAAABYNQQAAAQAABQAAAGNhY2hlUgAAAAAAAAACAh4AAAAQDUEAAAEAAAoAAAB3cC1jb250ZW50NAMAAAAAAAACAhgAAAANDUEAAAEAAAQAAABodG1sIAEAAAAAAAACAikAAADagTMAAAEAABUAAABuZ2lueC1waHA3LWNsdmdmLWRhdGGJAAAAAAAAAAICMwAAADkAAAAAAQ==
user.ceph._parent@1=0sAAAfAAAANDg4LTU3YjI2NTdmMmZhMTMtbWktcHJveWVjdG8tMXSQCAAAAAAAAgIcAAAAAQAAAAAAAAAIAAAAcHJvamVjdHPBAgcAAAAAAAIAAAAAAAAAAAAAAA==
user.ceph.snapset=0sAgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==
user.cephos.seq=0sAQEQAAAA1T1dEQAAAAAAAAAAAgAAAAA=
user.cephos.spill_out=0sMAA=
getfattr: Removing leading '/' from absolute path names

# file: var/lib/ceph/osd/ceph-23/current/2.9_head/DIR_9/DIR_E/DIR_A/DIR_1/10000411194.00000000__head_1ACD1AE9__2
user.ceph._=0sDwj5AAAABAM1AAAAAAAAABQAAAAxMDAwMDQxMTE5NC4wMDAwMDAwMP7/////////6RrNGgAAAAAAAgAAAAAAAAAGAxwAAAACAAAAAAAAAP////8AAAAAAAAAAP//////////AAAAABUn4QAAAAAAu4IAAK4m4QAAAAAAu4IAAAICFQAAAAIAAAAAAAAAAOSZDAAAAAAAsEUDAAAAAAAAAAAAjUoIWUgWsQQCAhUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVJ+EAAAAAAAAAAAAAAAAAABwAAACNSghZESm8BP///w==
user.ceph._@1=0s//////8=
user.ceph._layout=0sAgIYAAAAAAAAAAAAAAAAAAAA//////////8AAAAA
user.ceph._parent=0sBQRPAQAAlBFBAAABAAAIAAAAAgIjAAAAjxFBAAABAAAPAAAAdHViZWFtYXRldXIubmV0qdgAAAAAAAACAh0AAAB/EUEAAAEAAAkAAAB3cC1yb2NrZXREAAAAAAAAAAICGQAAABYNQQAAAQAABQAAAGNhY2hlUgAAAAAAAAACAh4AAAAQDUEAAAEAAAoAAAB3cC1jb250ZW50NAMAAAAAAAACAhgAAAANDUEAAAEAAAQAAABodG1sIAEAAAAAAAACAikAAADagTMAAAEAABUAAABuZ2lueC1waHA3LWNsdmdmLWRhdGGJAAAAAAAAAAICMwAAADkAAAAAAQ==
user.ceph._parent@1=0sAAAfAAAANDg4LTU3YjI2NTdmMmZhMTMtbWktcHJveWVjdG8tMXSQCAAAAAAAAgIcAAAAAQAAAAAAAAAIAAAAcHJvamVjdHPBAgcAAAAAAAIAAAAAAAAAAAAAAA==
user.ceph.snapset=0sAgIZAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAA==
user.cephos.seq=0sAQEQAAAADiM7AAAAAAAAAAAAAgAAAAA=
user.cephos.spill_out=0sMAA=
getfattr: Removing leading '/' from absolute path names
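For anyone wanting to peek inside these values: the `0s` prefix is how getfattr marks base64-encoded output. My understanding, which may be wrong, is that `user.ceph._parent` holds the CephFS backtrace (the chain of parent directory names) and that the `@1` attribute is the continuation of a value too long for a single xattr. Under those assumptions, the rough sketch below decodes the two chunks and scans for length-prefixed printable strings. It is a heuristic scan rather than a real parser of Ceph's encoding, and `length_prefixed_strings` is an illustrative name.

```python
import base64
import struct

# Values copied from the getfattr output above (identical on all four OSDs).
PARENT = "0sBQRPAQAAlBFBAAABAAAIAAAAAgIjAAAAjxFBAAABAAAPAAAAdHViZWFtYXRldXIubmV0qdgAAAAAAAACAh0AAAB/EUEAAAEAAAkAAAB3cC1yb2NrZXREAAAAAAAAAAICGQAAABYNQQAAAQAABQAAAGNhY2hlUgAAAAAAAAACAh4AAAAQDUEAAAEAAAoAAAB3cC1jb250ZW50NAMAAAAAAAACAhgAAAANDUEAAAEAAAQAAABodG1sIAEAAAAAAAACAikAAADagTMAAAEAABUAAABuZ2lueC1waHA3LWNsdmdmLWRhdGGJAAAAAAAAAAICMwAAADkAAAAAAQ=="
PARENT_1 = "0sAAAfAAAANDg4LTU3YjI2NTdmMmZhMTMtbWktcHJveWVjdG8tMXSQCAAAAAAAAgIcAAAAAQAAAAAAAAAIAAAAcHJvamVjdHPBAgcAAAAAAAIAAAAAAAAAAAAAAA=="


def decode_getfattr(value):
    """Strip getfattr's '0s' base64 marker and return the raw bytes."""
    if value.startswith("0s"):
        value = value[2:]
    return base64.b64decode(value)


def length_prefixed_strings(blob, max_len=255):
    """Heuristically find <u32 little-endian length><printable ASCII> runs."""
    names = []
    i = 0
    while i + 4 <= len(blob):
        (n,) = struct.unpack_from("<I", blob, i)
        chunk = blob[i + 4:i + 4 + n] if 1 <= n <= max_len else b""
        if chunk and len(chunk) == n and all(0x20 <= b <= 0x7E for b in chunk):
            names.append(chunk.decode("ascii"))
            i += 4 + n  # skip past the string we just consumed
        else:
            i += 1      # no plausible string here, slide forward one byte
    return names


# Join the two xattr chunks before scanning, since a name may straddle them.
blob = decode_getfattr(PARENT) + decode_getfattr(PARENT_1)
names = length_prefixed_strings(blob)
# The names seem to be stored child-first, so reverse them to print a path.
print("/" + "/".join(reversed(names)))
```

Because this only pattern-matches, it could in principle pick up stray byte runs that happen to look like strings, so treat the printed path as a hint rather than as authoritative.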


For the metadata damage issue, I can get the list of affected inodes with the command below.

$ ceph tell mds.0 damage ls | python -m "json.tool"
[
    {
        "damage_type": "dir_frag",
        "frag": "*",
        "id": 5129156,
        "ino": 1099556021325
    },
    {
        "damage_type": "dir_frag",
        "frag": "*",
        "id": 8983971,
        "ino": 1099548098243
    },
    {
        "damage_type": "dir_frag",
        "frag": "*",
        "id": 33278608,
        "ino": 1099548257921
    },
    {
        "damage_type": "dir_frag",
        "frag": "*",
        "id": 33455691,
        "ino": 1099548271575
    },
    {
        "damage_type": "dir_frag",
        "frag": "*",
        "id": 38203788,
        "ino": 1099548134708
    },
...
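To check whether the damaged directories overlap with the object flagged by the scrub, the `ino` values can be converted into metadata-pool object names and compared. The sketch below uses only the five entries shown above (the real list is much longer), and it assumes that a `frag` of `*` corresponds to the object suffix `00000000`; `dirfrag_object` is an illustrative name.

```python
import json

# The five entries shown above; the full `damage ls` output is longer.
DAMAGE_SAMPLE = json.loads("""
[
  {"damage_type": "dir_frag", "frag": "*", "id": 5129156,  "ino": 1099556021325},
  {"damage_type": "dir_frag", "frag": "*", "id": 8983971,  "ino": 1099548098243},
  {"damage_type": "dir_frag", "frag": "*", "id": 33278608, "ino": 1099548257921},
  {"damage_type": "dir_frag", "frag": "*", "id": 33455691, "ino": 1099548271575},
  {"damage_type": "dir_frag", "frag": "*", "id": 38203788, "ino": 1099548134708}
]
""")

# Object reported by `rados list-inconsistent-obj 2.9`.
SCRUB_OBJECT = "10000411194.00000000"


def dirfrag_object(ino, frag=0):
    """Assumed naming: '<ino in hex>.<frag as 8 hex digits>'."""
    return "{:x}.{:08x}".format(ino, frag)


damaged = {
    dirfrag_object(entry["ino"])
    for entry in DAMAGE_SAMPLE
    if entry["damage_type"] == "dir_frag"
}

for name in sorted(damaged):
    print(name)
print("scrub object among sampled damage entries:", SCRUB_OBJECT in damaged)
```

With only these five entries the scrub object does not appear, but that says nothing about the remaining entries, so the same check would need to be run against the complete `damage ls` output.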

All of the inodes (approx 800 of them) are for various directories within a WordPress cache directory.
I ran an rm -rf on each of the directories as I do not need the content.  The contents were removed, but the directories themselves cannot be removed: rmdir reports that they are not empty, even though ls lists 0 files in them.

I'm not sure if these two issues are related to each other.  They were noticed within a day of each other.  I think the metadata damage error appeared before the scrub error.

I'm at a bit of a loss as to how to proceed and I don't want to make things worse.

I'd really appreciate any help that anyone can give to try and resolve these problems.

Thanks

J
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
