Hello,

I've inherited a Ceph cluster from someone who left zero documentation and no handover. A couple of days ago it decided to show the entire company what it is capable of. The health report looks like this:

[root@host mnt]# ceph -s
  cluster:
    id:     809718aa-3eac-4664-b8fa-38c46cdbfdab
    health: HEALTH_ERR
            1 MDSs report damaged metadata
            1 MDSs are read only
            2 MDSs report slow requests
            6 MDSs behind on trimming
            Reduced data availability: 2 pgs stale
            Degraded data redundancy: 2593/186803520 objects degraded (0.001%), 2 pgs degraded, 2 pgs undersized
            1 slow requests are blocked > 32 sec. Implicated osds
            716 stuck requests are blocked > 4096 sec. Implicated osds 25,31,38

  services:
    mon: 3 daemons, quorum f,rook-ceph-mon2,rook-ceph-mon0
    mgr: a(active)
    mds: ceph-fs-2/2/2 up odd-fs-2/2/2 up {[ceph-fs:0]=ceph-fs-5b997cbf7b-5tjwh=up:active,[ceph-fs:1]=ceph-fs-5b997cbf7b-nstqz=up:active,[user-fs:0]=odd-fs-5668c75f9f-hflps=up:active,[user-fs:1]=odd-fs-5668c75f9f-jf59x=up:active}, 4 up:standby-replay
    osd: 39 osds: 39 up, 38 in

  data:
    pools:   5 pools, 706 pgs
    objects: 91212k objects, 4415 GB
    usage:   10415 GB used, 13024 GB / 23439 GB avail
    pgs:     2593/186803520 objects degraded (0.001%)
             703 active+clean
             2   stale+active+undersized+degraded
             1   active+clean+scrubbing+deep

  io:
    client:  168 kB/s rd, 6336 B/s wr, 10 op/s rd, 1 op/s wr

The offending broken MDS entry (damaged metadata) seems to be this:

mds.ceph-fs-5b997cbf7b-5tjwh:
[
    {
        "damage_type": "dir_frag",
        "id": 1190692215,
        "ino": 2199023258131,
        "frag": "*",
        "path": "/f/01/59"
    }
]

Does anyone have an idea how I can diagnose this and find out what is wrong? For the other issues I'm not even sure what or where I need to look.

Cheers,
Sangwhan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
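For context, a sketch of the commands that produce the details quoted above, and a few first-pass diagnostics for the stale PGs and stuck requests. These are standard Ceph admin commands; the MDS/OSD names are taken from the status output above, and since this looks like a Rook deployment, the `ceph daemon` admin-socket call is assumed to be run from inside the relevant daemon's pod or host:

```shell
# Expanded, per-item breakdown of everything behind HEALTH_ERR
ceph health detail

# The dir_frag damage entry quoted above comes from the damage
# table of the affected MDS (the one that went read-only)
ceph tell mds.ceph-fs-5b997cbf7b-5tjwh damage ls

# Identify the stale/undersized PGs and which OSDs they map to
ceph pg dump_stuck stale
ceph pg dump_stuck undersized

# Check why one OSD is "out" (39 up, 38 in) and locate the
# implicated OSDs from the stuck-request warnings
ceph osd tree
ceph osd find 25

# Inspect in-flight operations on an implicated OSD; this uses the
# admin socket, so it must run where osd.25 itself runs (in Rook,
# exec into the osd.25 pod first)
ceph daemon osd.25 dump_ops_in_flight
```

None of these change cluster state; they only read status, so they are safe to run while deciding on an actual repair (e.g. MDS scrub/journal recovery), which is a separate and more invasive step.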