Urgently need help. OpenShift cluster is down :c

Hello everyone, and sorry to bother you. Maybe someone has already faced this problem.
A day ago we restored our OpenShift cluster; however, at the moment the PVCs cannot be attached to their pods. We looked at the Ceph status and found that our MDS daemons were in standby mode, and then found that the metadata was corrupted. After some manipulations we were able to bring the MDS daemons up, but writes to the cluster are still not possible. The ceph status command shows the following.

sh-4.4$ ceph -s
  cluster:
    id:     9213604e-b0b6-49d5-bcb3-f55ab3d79119
    health: HEALTH_ERR
            1 MDSs report damaged metadata
            1 MDSs are read only
            6 daemons have recently crashed
  services:
    mon: 5 daemons, quorum bd,bj,bm,bn,bo (age 26h)
    mgr: a(active, since 25h)
    mds: 1/1 daemons up, 1 hot standby
    osd: 9 osds: 9 up (since 41h), 9 in (since 42h)
    rgw: 1 daemon active (1 hosts, 1 zones)
  data:
    volumes: 1/1 healthy
    pools:   10 pools, 225 pgs
    objects: 1.60M objects, 234 GiB
    usage:   606 GiB used, 594 GiB / 1.2 TiB avail
    pgs:     225 active+clean
  io:
    client:   852 B/s rd, 1 op/s rd, 0 op/s wr
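
For more detail we have also been looking at the output of the commands below (assuming these are the right ones for drilling into the HEALTH_ERR and the recently crashed daemons):

ceph health detail
ceph fs status
ceph crash ls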

Now we are trying to follow these instructions:
https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#recovery-from-missing-metadata-objects
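
As we understand that page, the journal should be exported as a backup before any further destructive steps (the <fs_name> placeholder below stands for our filesystem name and rank 0):

cephfs-journal-tool --rank=<fs_name>:0 journal export backup.bin
cephfs-journal-tool --rank=<fs_name>:0 journal inspect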

What else we have tried (a note on the scrub syntax follows the list):

cephfs-journal-tool --rank=1:0 event recover_dentries summary
cephfs-journal-tool --rank=1:0 journal reset
cephfs-table-tool all reset session
ceph tell mds.gml--cephfs-a scrub start / recursive repair force
ceph tell mds.gml--cephfs-b scrub start / recursive repair force
ceph mds repaired 0
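
We are not sure the scrub syntax above is correct; as far as we can tell from the documentation, the scrub options should be passed as a single comma-delimited string, i.e.:

ceph tell mds.gml--cephfs-a scrub start / recursive,repair,force
ceph tell mds.gml--cephfs-a scrub status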

ceph tell mds.gml--cephfs-a damage ls

[
    {
        "damage_type": "dir_frag",
        "id": 26851730,
        "ino": 1100162409473,
        "frag": "*",
        "path": "/volumes/csi/csi-vol-5ad18c03-3205-11ed-9ba7-0a580a810206/e5664004-51e0-4bff-85c8-029944b431d8/store/096/096a1497-78ab-4802-a5a7-d09e011fd3a5/202301_1027796_1027796_0"
    },
………

    {
        "damage_type": "dir_frag",
        "id": 118336643,
        "ino": 1100162424469,
        "frag": "*",
        "path": "/volumes/csi/csi-vol-5ad18c03-3205-11ed-9ba7-0a580a810206/e5664004-51e0-4bff-85c8-029944b431d8/store/096/096a1497-78ab-4802-a5a7-d09e011fd3a5/202301_1027832_1027832_0"
    },
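
Our understanding is that a damage entry should only be cleared once the corresponding dirfrag has actually been repaired or rebuilt, and that it can then be removed and re-checked like this (ID taken from the listing above):

ceph tell mds.gml--cephfs-a damage rm 26851730
ceph tell mds.gml--cephfs-a damage ls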

Now we are trying the following (a few notes on our plan follow the commands):

# Session table
cephfs-table-tool 0 reset session
# SnapServer
cephfs-table-tool 0 reset snap
# InoTable
cephfs-table-tool 0 reset inode
# Journal
cephfs-journal-tool --rank=0 journal reset
# Root inodes ("/" and MDS directory)
cephfs-data-scan init

cephfs-data-scan scan_extents <data pool>
cephfs-data-scan scan_inodes <data pool>
cephfs-data-scan scan_links
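
A few things we think we still need around those commands (please correct us if this is wrong): the filesystem probably has to be taken offline first, the scans can be split over several workers (one invocation per worker_n), and afterwards rank 0 has to be marked repaired and scrubbed again. Roughly:

# Take the filesystem offline before running the scans
ceph fs fail <fs_name>
# Example of one parallel scan worker (worker_n 0 of 4)
cephfs-data-scan scan_extents --worker_n 0 --worker_m 4 <data pool>
# ... after all scan steps have finished:
ceph fs set <fs_name> joinable true
ceph mds repaired 0
ceph tell mds.gml--cephfs-a scrub start / recursive,repair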

Is this the right way, and could it be our salvation?
Thank you!