Re: Urgently need help. OpenShift cluster is down :c


 



Hi,

do you have log output from the read-only MDS, probably in debug mode?
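
A minimal sketch of how such logs could be collected, assuming the MDS daemon names from the commands quoted below and a Rook-managed OpenShift cluster (the namespace and pod name are assumptions, adjust to your deployment):

# raise MDS log verbosity temporarily (these settings are very noisy)
ceph config set mds.gml--cephfs-a debug_mds 20
ceph config set mds.gml--cephfs-a debug_ms 1

# with Rook/OpenShift the MDS log goes to the pod's stdout, e.g.
oc -n openshift-storage logs <rook-ceph-mds-pod>

# revert once the log has been captured
ceph config rm mds.gml--cephfs-a debug_mds
ceph config rm mds.gml--cephfs-a debug_ms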

Quoting kreept.sama@xxxxxxxxx:

Hello everyone, and sorry in advance. Maybe someone has already faced this problem.
A day ago we restored our OpenShift cluster; however, at the moment the PVCs cannot be attached to the pods. We looked at the Ceph status and found that our MDS daemons were in standby mode, and then discovered that the metadata was corrupted. After some manipulation we were able to bring our MDS daemons back up, but the cluster still cannot be written to; the ceph status command shows the following.

sh-4.4$ ceph -s
  cluster:
    id:     9213604e-b0b6-49d5-bcb3-f55ab3d79119
    health: HEALTH_ERR
            1 MDSs report damaged metadata
            1 MDSs are read only
            6 daemons have recently crashed
  services:
    mon: 5 daemons, quorum bd,bj,bm,bn,bo (age 26h)
    mgr: a(active, since 25h)
    mds: 1/1 daemons up, 1 hot standby
    osd: 9 osds: 9 up (since 41h), 9 in (since 42h)
    rgw: 1 daemon active (1 hosts, 1 zones)
  data:
    volumes: 1/1 healthy
    pools:   10 pools, 225 pgs
    objects: 1.60M objects, 234 GiB
    usage:   606 GiB used, 594 GiB / 1.2 TiB avail
    pgs:     225 active+clean
  io:
    client:   852 B/s rd, 1 op/s rd, 0 op/s wr

We are now trying to follow these instructions:
https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#recovery-from-missing-metadata-objects
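
That page recommends making a copy of the journal before attempting any dangerous operations; a minimal sketch, assuming <fs_name> is the filesystem name reported by "ceph fs ls":

# export a backup of the journal before any destructive step
cephfs-journal-tool --rank=<fs_name>:0 journal export backup.bin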

What else we have tried:

cephfs-journal-tool --rank=1:0 event recover_dentries summary
cephfs-journal-tool --rank=1:0 journal reset
cephfs-table-tool all reset session
ceph tell mds.gml--cephfs-a scrub start / recursive repair force
ceph tell mds.gml--cephfs-b scrub start / recursive repair force
ceph mds repaired 0
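
One side note, not part of the original mail: the scrub documentation passes the scrub options as a single comma-delimited argument, so the recursive repair scrub would look roughly like this (MDS name taken from the commands above):

ceph tell mds.gml--cephfs-a scrub start / recursive,repair,force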

ceph tell mds.gml--cephfs-a damage ls

[
    {
        "damage_type": "dir_frag",
        "id": 26851730,
        "ino": 1100162409473,
        "frag": "*",
"path": "/volumes/csi/csi-vol-5ad18c03-3205-11ed-9ba7-0a580a810206/e5664004-51e0-4bff-85c8-029944b431d8/store/096/096a1497-78ab-4802-a5a7-d09e011fd3a5/202301_1027796_1027796_0"
    },
………

    {
        "damage_type": "dir_frag",
        "id": 118336643,
        "ino": 1100162424469,
        "frag": "*",
"path": "/volumes/csi/csi-vol-5ad18c03-3205-11ed-9ba7-0a580a810206/e5664004-51e0-4bff-85c8-029944b431d8/store/096/096a1497-78ab-4802-a5a7-d09e011fd3a5/202301_1027832_1027832_0"
    },
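
A note not in the original message: once the damaged dir_frags have actually been repaired (by the repair scrub or the data-scan tools), the stale entries can be cleared from the damage table by id; a sketch using the ids from the output above:

# clear individual damage-table entries after the underlying damage is repaired
ceph tell mds.gml--cephfs-a damage rm 26851730
ceph tell mds.gml--cephfs-a damage rm 118336643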

Now we are trying:

# Session table
cephfs-table-tool 0 reset session
# SnapServer
cephfs-table-tool 0 reset snap
# InoTable
cephfs-table-tool 0 reset inode
# Journal
cephfs-journal-tool --rank=0 journal reset
# Root inodes ("/" and MDS directory)
cephfs-data-scan init

cephfs-data-scan scan_extents <data pool>
cephfs-data-scan scan_inodes <data pool>
cephfs-data-scan scan_links
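
For reference, a sketch of how the linked disaster-recovery page parameterises these steps; <fs_name>, <data pool> and the worker count are placeholders, and the exact values for this cluster are assumptions:

# the journal tool takes --rank=<fs_name>:<rank> (or "all"), per the linked docs
cephfs-journal-tool --rank=<fs_name>:0 journal reset

# scan_extents / scan_inodes can be run as parallel workers
# (example: 4 workers, only worker 0 shown; run one command per worker)
cephfs-data-scan scan_extents --worker_n 0 --worker_m 4 <data pool>
cephfs-data-scan scan_inodes --worker_n 0 --worker_m 4 <data pool>
cephfs-data-scan scan_links
cephfs-data-scan cleanup <data pool>

# afterwards, mark rank 0 repaired (as already attempted above) and let the MDS rejoin
ceph mds repaired 0
ceph fs set <fs_name> joinable true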

Is this the right way, and could it be our salvation?
Thank you!
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx





