(Thanks Yan for confirming the fix - we'll implement it now)

@Marc Yep - 3x replica on the metadata pools.

We have 4 clusters (all running the same version) and have experienced metadata corruption on the majority of them at some time or other - normally a scan fixes it. I suspect it's down to the use case - think LAMP stacks with various Drupal/WordPress caching plugins - running within OpenShift containers and utilising CephFS as the storage backend. These clusters have all been life-cycled up from Jewel, if that matters.

Example:

# ceph osd dump | grep metadata
pool 2 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 528539 flags hashpspool stripe_width 0 application cephfs

The only other thing of note is that on this particular cluster the metadata pool is quite large for the number of files - see below re 281GiB - the other clusters' metadata pools are a lot smaller for a similar dataset.

# ceph df
GLOBAL:
    SIZE        AVAIL       RAW USED     %RAW USED
    66.9TiB     29.1TiB     37.8TiB      56.44
POOLS:
    NAME                ID     USED        %USED     MAX AVAIL     OBJECTS
    rbd                 0      8.86TiB     64.57     4.86TiB       2637530
    cephfs_data         1      2.59TiB     34.72     4.86TiB       25341863
    cephfs_metadata     2      281GiB      5.34      4.86TiB       6755178

Cheers,
James

On 04/06/2019, 08:59, "Marc Roos" <M.Roos@xxxxxxxxxxxxxxxxx> wrote:

How did this get damaged? You had 3x replication on the pool?

-----Original Message-----
From: Yan, Zheng [mailto:ukernel@xxxxxxxxx]
Sent: dinsdag 4 juni 2019 1:14
To: James Wilkins
Cc: ceph-users
Subject: Re: CEPH MDS Damaged Metadata - recovery steps

On Mon, Jun 3, 2019 at 3:06 PM James Wilkins
<james.wilkins@xxxxxxxxxxxxx> wrote:
>
> Hi all,
>
> After a bit of advice to ensure we're approaching this the right way.
>
> (version: 12.2.12, multi-mds, dirfrag is enabled)
>
> We have corrupt metadata as identified by ceph:
>
>     health: HEALTH_ERR
>             2 MDSs report damaged metadata
>
> Asking the MDS via 'damage ls':
>
> {
>     "damage_type": "dir_frag",
>     "id": 2265410500,
>     "ino": 2199349051809,
>     "frag": "*",
>     "path": "/projects/17343-5bcdaf07f4055-managed-server-0/apache-echfq-data/html/shop/app/cache/prod/smarty/cache/iqitreviews/simple/21832/1"
> }
>
> We've done the steps outlined here ->
> http://docs.ceph.com/docs/luminous/cephfs/disaster-recovery/ namely:
>
> cephfs-journal-tool --fs:all journal reset (both ranks)
> cephfs-data-scan scan extents / inodes / links has completed
>
> However, when attempting to access the named folder we get:
>
> 2019-05-31 03:16:04.792274 7f56f6fb5700 -1 log_channel(cluster) log
> [ERR] : dir 0x200136b41a1 object missing on disk; some files may be lost
> (/projects/17343-5bcdaf07f4055-managed-server-0/apache-echfq-data/html/shop/app/cache/prod/smarty/cache/iqitreviews/simple/21832/1)
>
> We get this error, followed shortly by an MDS failover.
>
> Two questions really.
>
> What's not immediately clear from the documentation is: should we/do we
> also need to run the below?
>
> # Session table
> cephfs-table-tool 0 reset session
> # SnapServer
> cephfs-table-tool 0 reset snap
> # InoTable
> cephfs-table-tool 0 reset inode
> # Root inodes ("/" and MDS directory)
> cephfs-data-scan init
>

No, don't do this.

> And secondly - our current train of thought is we need to grab the inode
> number of the parent folder and delete this from the metadata pool via
> 'rados rmomapkey' - is this correct?
>

Yes, find the inode number of directory 21832. Check whether the omap key
'1_head' exists in object <inode of directory in hex>.00000000. If it
exists, remove it.
> Any input appreciated
>
> Cheers,
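
For anyone hitting the same dir_frag damage, a minimal sketch of the check-and-remove step Yan describes is below. It assumes a CephFS client mount at /mnt/cephfs (a hypothetical mountpoint), the cephfs_metadata pool name from the ceph df output above, and that the damaged entry is the child directory named "1" under .../iqitreviews/simple/21832; the backup step, the backup filename and the variable names are illustrative additions, not part of Yan's original instructions.

# Parent directory that holds the damaged "1" entry (path from the 'damage ls'
# output; /mnt/cephfs is an assumed client mountpoint - adjust to your cluster).
PARENT=/mnt/cephfs/projects/17343-5bcdaf07f4055-managed-server-0/apache-echfq-data/html/shop/app/cache/prod/smarty/cache/iqitreviews/simple/21832

# Find the parent directory's inode number and convert it to hex.
INO_HEX=$(printf '%x' "$(stat -c %i "$PARENT")")

# The directory's first dirfrag object in the metadata pool is <inode-hex>.00000000;
# check whether the dentry key for the child "1" is present. (If the parent
# directory is itself fragmented, the key may live in another <inode-hex>.<frag> object.)
rados -p cephfs_metadata listomapkeys "${INO_HEX}.00000000" | grep -x '1_head'

# Keep a copy of the omap value before touching it (backup filename is arbitrary).
rados -p cephfs_metadata getomapval "${INO_HEX}.00000000" 1_head /root/21832_1_head.bak

# If the key exists, remove it so the MDS no longer references the missing dirfrag object.
rados -p cephfs_metadata rmomapkey "${INO_HEX}.00000000" 1_head

As the MDS log above already warns, anything that lived under the removed entry is lost and would need restoring from backup.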