Awesome! I'm glad it worked out this far! At least you have a working
filesystem now, even if it means that you may have to use a backup.
But now I can say it: Having only three OSDs is really not the best
idea. ;-) Are all those OSDs on the same host?
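If you want to double-check that, the output of
ceph osd tree
shows the CRUSH hierarchy, i.e. which host each OSD sits under.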
Quoting Sagara Wijetunga <sagarawmw@xxxxxxxxx>:
Hi Eugen
Ceph is now HEALTH_OK.
I think what we need to do now is:
1. Get MDS rank 0 to recover, discarding part of the object
200.00006048 if necessary, and bring MDS.0 back up.
Yes, I agree, I just can't tell what the best way is here. Maybe
remove all three objects from the disks (make a backup before doing
that, just in case) and then try the steps to recover the journal
(also make a backup of the journal first); see the sketch below for
locating and backing up the object, followed by the journal recovery
steps.
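Something along these lines should work for locating and backing up
the object before touching anything; note that the metadata pool name
used here (cephfs_metadata) is only an assumption, use whatever your
metadata pool is actually called:
mds01:~ # ceph osd map cephfs_metadata 200.00006048    # shows which PG and OSDs hold the object
mds01:~ # rados -p cephfs_metadata get 200.00006048 ./200.00006048.bak    # keep a copy
Then the journal recovery steps: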
mds01:~ # systemctl stop ceph-mds@mds01.service
mds01:~ # cephfs-journal-tool journal export myjournal.bin
mds01:~ # cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
mds01:~ # cephfs-journal-tool --rank=cephfs:0 journal reset
mds01:~ # cephfs-table-tool all reset session
mds01:~ # systemctl start ceph-mds@mds01.service
mds01:~ # ceph mds repaired 0
mds01:~ # ceph daemon mds.mds01 scrub_path / recursive repair
Only the last step above failed as follows:
# ceph daemon mds.a scrub_path / recursive repair
"mds_not_active"
failed
But the ceph -w showed:
2021-05-22 23:30:00.199164 mon.a [INF] Health check cleared:
MDS_DAMAGE (was: 1 mds daemon damaged)
2021-05-22 23:30:00.208558 mon.a [INF] Standby daemon mds.c assigned
to filesystem cephfs as rank 0
2021-05-22 23:30:00.208614 mon.a [INF] Health check cleared:
MDS_ALL_DOWN (was: 1 filesystem is offline)
2021-05-22 23:30:04.029282 mon.a [INF] daemon mds.c is now active in
filesystem cephfs as rank 0
2021-05-22 23:30:04.378670 mon.a [INF] Health check cleared:
FS_DEGRADED (was: 1 filesystem is degraded)
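mds.c has taken over rank 0 now, so I assume the scrub would have to
be issued against that daemon rather than mds.a, e.g.:
ceph daemon mds.c scrub_path / recursive repair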
Since most errors were fixed, I tried to repair PG 2.44:
ceph pg repair 2.44
ceph -w
2021-05-23 00:00:00.009926 mon.a [ERR] overall HEALTH_ERR 4 scrub
errors; Possible data damage: 1 pg inconsistent
2021-05-23 00:01:17.454975 mon.a [INF] Health check cleared:
OSD_SCRUB_ERRORS (was: 4 scrub errors)
2021-05-23 00:01:17.454993 mon.a [INF] Health check cleared:
PG_DAMAGED (was: Possible data damage: 1 pg inconsistent)
2021-05-23 00:01:17.455002 mon.a [INF] Cluster is now healthy
2021-05-23 00:01:13.544097 osd.0 [ERR] 2.44 repair : stat mismatch,
got 108/109 objects, 0/0 clones, 108/109 dirty, 108/109 omap, 0/0
pinned, 0/0 hit_set_archive, 0/0 whiteouts, 0/1555896 bytes, 0/0
manifest objects, 0/0 hit_set_archive bytes.
2021-05-23 00:01:13.544154 osd.0 [ERR] 2.44 repair 1 errors, 1 fixed
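In hindsight, running something like
rados list-inconsistent-obj 2.44 --format=json-pretty
before the repair, while the scrub errors were still recorded, would
presumably have shown exactly which object in 2.44 was damaged.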
# ceph -s
  cluster:
    id:     abc...
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 22h)
    mgr: a(active, since 22h), standbys: b, c
    mds: cephfs:1 {0=c=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 22h), 3 in (since 22h)

  task status:
    scrub status:
        mds.c: idle

  data:
    pools:   3 pools, 192 pgs
    objects: 281.06k objects, 327 GiB
    usage:   2.4 TiB used, 8.1 TiB / 11 TiB avail
    pgs:     192 active+clean
I mounted the CephFS as before and tried the following:
cephfs-data-scan pg_files /mnt/ceph/Home/sagara 2.44
But it complains about an invalid path. I'm trying to see which files
are affected by the missing object in PG 2.44.
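If I understand the tool correctly, pg_files may expect a path
relative to the CephFS root rather than the local mount point, so
something like this might be worth a try (assuming /Home/sagara is the
in-filesystem path corresponding to the mount above):
cephfs-data-scan pg_files /Home/sagara 2.44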
Thank you very much for helping this far.
But I would still like to understand whether any files were affected by this disaster.
Best regards
Sagara
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx