Re: One mds daemon damaged, filesystem is offline. How to recover?

Hi Eugen,

Now Ceph is HEALTH_OK.

> I think what we need to do now is:
> 1. Get MDS.0 to recover, discarding if necessary part of the object
> 200.00006048, and bring MDS.0 up.

Yes, I agree. I just can't tell what the best way is here; maybe remove all three objects from the disks (make a backup before doing that, just in case) and try the steps to recover the journal (also make a backup of the journal first):

mds01:~ # systemctl stop ceph-mds@mds01.service

mds01:~ # cephfs-journal-tool journal export myjournal.bin

mds01:~ # cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary

mds01:~ # cephfs-journal-tool --rank=cephfs:0 journal reset

mds01:~ # cephfs-table-tool all reset session

mds01:~ # systemctl start ceph-mds@mds01.service

mds01:~ # ceph mds repaired 0

mds01:~ # ceph daemon mds.mds01 scrub_path / recursive repair
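
As an aside, for backing up the three objects before removing them I assume plain rados get/rm at the pool level would work (cephfs_metadata is only my placeholder for the real metadata pool name; maybe ceph-objectstore-tool on the OSDs was meant instead), e.g. for the first object:

mds01:~ # rados -p cephfs_metadata get 200.00006048 200.00006048.backup

mds01:~ # rados -p cephfs_metadata rm 200.00006048

and the same for the other two objects.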

Only the last step above (the scrub_path) failed as follows:
# ceph daemon mds.a scrub_path / recursive repair
"mds_not_active"
failed

But ceph -w showed:
2021-05-22 23:30:00.199164 mon.a [INF] Health check cleared: MDS_DAMAGE (was: 1 mds daemon damaged)
2021-05-22 23:30:00.208558 mon.a [INF] Standby daemon mds.c assigned to filesystem cephfs as rank 0
2021-05-22 23:30:00.208614 mon.a [INF] Health check cleared: MDS_ALL_DOWN (was: 1 filesystem is offline)
2021-05-22 23:30:04.029282 mon.a [INF] daemon mds.c is now active in filesystem cephfs as rank 0
2021-05-22 23:30:04.378670 mon.a [INF] Health check cleared: FS_DEGRADED (was: 1 filesystem is degraded)
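
Since mds.c took over rank 0, I guess the scrub has to be directed at mds.c instead of mds.a (just my assumption, reusing the same command form as above):

# ceph daemon mds.c scrub_path / recursive repair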

Since most errors were fixed, I tried to repair PG 2.44:
ceph pg repair 2.44
ceph -w
2021-05-23 00:00:00.009926 mon.a [ERR] overall HEALTH_ERR 4 scrub errors; Possible data damage: 1 pg inconsistent
2021-05-23 00:01:17.454975 mon.a [INF] Health check cleared: OSD_SCRUB_ERRORS (was: 4 scrub errors)
2021-05-23 00:01:17.454993 mon.a [INF] Health check cleared: PG_DAMAGED (was: Possible data damage: 1 pg inconsistent)
2021-05-23 00:01:17.455002 mon.a [INF] Cluster is now healthy
2021-05-23 00:01:13.544097 osd.0 [ERR] 2.44 repair : stat mismatch, got 108/109 objects, 0/0 clones, 108/109 dirty, 108/109 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 0/1555896 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
2021-05-23 00:01:13.544154 osd.0 [ERR] 2.44 repair 1 errors, 1 fixed
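
To make sure nothing is still inconsistent in that PG, I assume a fresh deep-scrub followed by rados list-inconsistent-obj would show any remaining details:

# ceph pg deep-scrub 2.44

# rados list-inconsistent-obj 2.44 --format=json-pretty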

# ceph -s
  cluster:
    id:     abc...
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 22h)
    mgr: a(active, since 22h), standbys: b, c
    mds: cephfs:1 {0=c=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 22h), 3 in (since 22h)

  task status:
    scrub status:
        mds.c: idle

  data:
    pools:   3 pools, 192 pgs
    objects: 281.06k objects, 327 GiB
    usage:   2.4 TiB used, 8.1 TiB / 11 TiB avail
    pgs:     192 active+clean


I mounted the CephFS as before and tried the following:

cephfs-data-scan pg_files /mnt/ceph/Home/sagara 2.44

But it complains about an invalid path. I'm trying to see which files are affected by the missing object in PG 2.44.
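
I will also double-check which pool PG 2.44 belongs to (if I understand correctly, the leading "2" is the pool id, and cephfs-data-scan pg_files only maps data pool objects back to files, so it may not apply if pool 2 is the metadata pool):

# ceph osd lspools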
Thank you very much for helping this far, but I would still like to understand whether any files were affected by this disaster.
Best regards,
Sagara


  