Hello,

I've inherited a Ceph cluster from someone who left zero documentation and no handover. A couple of days ago it decided to show the entire company what it is capable of. The health report looks like this:

[root@host mnt]# ceph -s
  cluster:
    id:     809718aa-3eac-4664-b8fa-38c46cdbfdab
    health: HEALTH_ERR
            1 MDSs report damaged metadata
            1 MDSs are read only
            2 MDSs report slow requests
            6 MDSs behind on trimming
            Reduced data availability: 2 pgs stale
            Degraded data redundancy: 2593/186803520 objects degraded (0.001%), 2 pgs degraded, 2 pgs undersized
            1 slow requests are blocked > 32 sec. Implicated osds
            716 stuck requests are blocked > 4096 sec. Implicated osds 25,31,38

  services:
    mon: 3 daemons, quorum f,rook-ceph-mon2,rook-ceph-mon0
    mgr: a(active)
    mds: ceph-fs-2/2/2 up odd-fs-2/2/2 up {[ceph-fs:0]=ceph-fs-5b997cbf7b-5tjwh=up:active,[ceph-fs:1]=ceph-fs-5b997cbf7b-nstqz=up:active,[user-fs:0]=odd-fs-5668c75f9f-hflps=up:active,[user-fs:1]=odd-fs-5668c75f9f-jf59x=up:active}, 4 up:standby-replay
    osd: 39 osds: 39 up, 38 in

  data:
    pools:   5 pools, 706 pgs
    objects: 91212k objects, 4415 GB
    usage:   10415 GB used, 13024 GB / 23439 GB avail
    pgs:     2593/186803520 objects degraded (0.001%)
             703 active+clean
             2   stale+active+undersized+degraded
             1   active+clean+scrubbing+deep

  io:
    client:  168 kB/s rd, 6336 B/s wr, 10 op/s rd, 1 op/s wr

The offending broken MDS entry (damaged metadata) seems to be this:

mds.ceph-fs-5b997cbf7b-5tjwh:
[
    {
        "damage_type": "dir_frag",
        "id": 1190692215,
        "ino": 2199023258131,
        "frag": "*",
        "path": "/f/01/59"
    }
]

Does anyone have an idea how I can diagnose this and find out what is wrong? For the other issues I'm not even sure what or where I need to look.

Cheers,
Sangwhan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
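For context, a sketch of the commands that produce the details quoted above, and a few first-pass diagnostics for the stale PGs and stuck requests. These are standard Ceph admin commands; the MDS/OSD names are taken from the status output above, and since this looks like a Rook deployment, the `ceph daemon` admin-socket call is assumed to be run from inside the relevant daemon's pod or host:

```shell
# Expanded, per-item breakdown of everything behind HEALTH_ERR
ceph health detail

# The dir_frag damage entry quoted above comes from the damage
# table of the affected MDS (the one that went read-only)
ceph tell mds.ceph-fs-5b997cbf7b-5tjwh damage ls

# Identify the stale/undersized PGs and which OSDs they map to
ceph pg dump_stuck stale
ceph pg dump_stuck undersized

# Check why one OSD is "out" (39 up, 38 in) and locate the
# implicated OSDs from the stuck-request warnings
ceph osd tree
ceph osd find 25

# Inspect in-flight operations on an implicated OSD; this uses the
# admin socket, so it must run where osd.25 itself runs (in Rook,
# exec into the osd.25 pod first)
ceph daemon osd.25 dump_ops_in_flight
```

None of these change cluster state; they only read status, so they are safe to run while deciding on an actual repair (e.g. MDS scrub/journal recovery), which is a separate and more invasive step.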