On 7/25/19 6:49 AM, Sangwhan Moon wrote:
> Hello,
>
> I've inherited a Ceph cluster from someone who has left zero documentation or any handover. A couple of days ago it decided to show the entire company what it is capable of..
>
> The health report looks like this:
>
> [root@host mnt]# ceph -s
>   cluster:
>     id:     809718aa-3eac-4664-b8fa-38c46cdbfdab
>     health: HEALTH_ERR
>             1 MDSs report damaged metadata
>             1 MDSs are read only
>             2 MDSs report slow requests
>             6 MDSs behind on trimming
>             Reduced data availability: 2 pgs stale
>             Degraded data redundancy: 2593/186803520 objects degraded (0.001%), 2 pgs degraded, 2 pgs undersized
>             1 slow requests are blocked > 32 sec. Implicated osds
>             716 stuck requests are blocked > 4096 sec. Implicated osds 25,31,38

I would start here:

>   services:
>     mon: 3 daemons, quorum f,rook-ceph-mon2,rook-ceph-mon0
>     mgr: a(active)
>     mds: ceph-fs-2/2/2 up odd-fs-2/2/2 up {[ceph-fs:0]=ceph-fs-5b997cbf7b-5tjwh=up:active,[ceph-fs:1]=ceph-fs-5b997cbf7b-nstqz=up:active,[user-fs:0]=odd-fs-5668c75f9f-hflps=up:active,[user-fs:1]=odd-fs-5668c75f9f-jf59x=up:active}, 4 up:standby-replay
>     osd: 39 osds: 39 up, 38 in
>
>   data:
>     pools:   5 pools, 706 pgs
>     objects: 91212k objects, 4415 GB
>     usage:   10415 GB used, 13024 GB / 23439 GB avail
>     pgs:     2593/186803520 objects degraded (0.001%)
>              703 active+clean
>              2   stale+active+undersized+degraded

This is a problem! Can you check:

$ ceph pg dump_stuck

The PGs will start with a number like 8.1a where '8' is the pool ID.

Then check:

$ ceph df

To which pools do those PGs belong?

Then check:

$ ceph pg <PGID> query

Somewhere at the bottom it should show why these PGs are not active.

You might even want to try a restart of the OSDs involved with those two PGs.

Wido

>              1   active+clean+scrubbing+deep
>
>   io:
>     client:   168 kB/s rd, 6336 B/s wr, 10 op/s rd, 1 op/s wr
>
> The offending broken MDS entry (damaged metadata) seems to be this:
>
> mds.ceph-fs-5b997cbf7b-5tjwh: [
>     {
>         "damage_type": "dir_frag",
>         "id": 1190692215,
>         "ino": 2199023258131,
>         "frag": "*",
>         "path": "/f/01/59"
>     }
> ]
>
> Does anyone have an idea how I can diagnose and find out what is wrong? For the other issues I'm not even sure what/where I need to look into.
>
> Cheers,
> Sangwhan
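
To make the steps above concrete, something like the sequence below could work as a starting point. The PG ID 8.1a is only a placeholder, 'ceph health detail' and 'ceph pg map' are additions beyond the commands named above (both help locate the PG IDs and their OSDs), and since the mon names suggest this cluster runs under Rook, the "rook-ceph" namespace and the OSD pod name are assumptions you will need to adjust to your deployment:

$ ceph health detail            # lists the stale/degraded PGs by ID
$ ceph pg dump_stuck stale
$ ceph df                       # match the pool ID (the part before the dot) to a pool
$ ceph pg 8.1a query            # placeholder PG ID; look at "recovery_state" near the bottom
$ ceph pg map 8.1a              # shows the up/acting OSDs for that PG

If restarting the involved OSDs is the next step (the health output implicates 25, 31 and 38), under Rook that usually means deleting the OSD pod so its deployment recreates it:

$ kubectl -n rook-ceph get pods | grep osd
$ kubectl -n rook-ceph delete pod <osd-25-pod-name>   # placeholder name, taken from the previous command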