Re: HEALTH_ERR with a kitchen sink of problems: MDS damaged, readonly, and so forth

"Sangwhan Moon" <sangwhan@xxxxxx> · Thu, 25 Jul 2019 15:57:34 +0900

Original Message:
> 
> 
> On 7/25/19 7:49 AM, Sangwhan Moon wrote:
> > Hello,
> > 
> > Original Message:
> >>
> >>
> >> On 7/25/19 6:49 AM, Sangwhan Moon wrote:
> >>> Hello,
> >>>
> >>> I've inherited a Ceph cluster from someone who left no documentation and did no handover. A couple of days ago it decided to show the entire company what it is capable of.
> >>>
> >>> The health report looks like this:
> >>>
> >>> [root@host mnt]# ceph -s
> >>>   cluster:
> >>>     id:     809718aa-3eac-4664-b8fa-38c46cdbfdab
> >>>     health: HEALTH_ERR
> >>>             1 MDSs report damaged metadata
> >>>             1 MDSs are read only
> >>>             2 MDSs report slow requests
> >>>             6 MDSs behind on trimming
> >>>             Reduced data availability: 2 pgs stale
> >>>             Degraded data redundancy: 2593/186803520 objects degraded (0.001%), 2 pgs degraded, 2 pgs undersized
> >>>             1 slow requests are blocked > 32 sec. Implicated osds
> >>>             716 stuck requests are blocked > 4096 sec. Implicated osds 25,31,38
> >>
> >> I would start here:
> >>
> >>>
> >>>   services:
> >>>     mon: 3 daemons, quorum f,rook-ceph-mon2,rook-ceph-mon0
> >>>     mgr: a(active)
> >>>     mds: ceph-fs-2/2/2 up odd-fs-2/2/2 up  {[ceph-fs:0]=ceph-fs-5b997cbf7b-5tjwh=up:active,[ceph-fs:1]=ceph-fs-5b997cbf7b-nstqz=up:active,[user-fs:0]=odd-fs-5668c75f9f-hflps=up:active,[user-fs:1]=odd-fs-5668c75f9f-jf59x=up:active}, 4 up:standby-replay
> >>>     osd: 39 osds: 39 up, 38 in
> >>>
> >>>   data:
> >>>     pools:   5 pools, 706 pgs
> >>>     objects: 91212k objects, 4415 GB
> >>>     usage:   10415 GB used, 13024 GB / 23439 GB avail
> >>>     pgs:     2593/186803520 objects degraded (0.001%)
> >>>              703 active+clean
> >>>              2   stale+active+undersized+degraded
> >>
> >> This is a problem! Can you check:
> >>
> >> $ ceph pg dump_stuck
> >>
> >> The PGs will start with a number like 8.1a, where '8' is the pool ID.
> >>
> >> Then check:
> >>
> >> $ ceph df
> >>
> >> To which pools do those PGs belong?
> >>
> >> Then check:
> >>
> >> $ ceph pg <PGID> query
> >>
> >> Somewhere near the bottom, the output should show why these PGs are not active. You
> >> might even want to try restarting the OSDs involved with those two PGs.
> > 
> > Thanks a lot for the suggestions. I just checked, and the problematic PGs are 4.4f and 4.59, but querying them seems to result in the following error:
> > 
> > Error ENOENT: i don't have pgid 4.4f
> > 
> > (same applies for 4.59 - they do seem to show up in "ceph pg ls" though.)
> > 
> > In ceph pg ls, the UP, UP_PRIMARY, ACTING, and ACTING_PRIMARY columns each list only one OSD for these PGs (24 and 13 respectively, although the PG IDs above and these numbers probably don't help much with the diagnosis). Would restarting be a safe thing to try first?
> > 
> > ceph health detail says the following:
> > 
> > MDS_DAMAGE 1 MDSs report damaged metadata
> >     mdsceph-fs-5b997cbf7b-5tjwh(mds.0): Metadata damage detected
> > MDS_READ_ONLY 1 MDSs are read only
> >     mdsceph-fs-5b997cbf7b-5tjwh(mds.0): MDS in read-only mode
> > MDS_SLOW_REQUEST 2 MDSs report slow requests
> >     mdsuser-fs-5668c75f9f-hflps(mds.0): 3 slow requests are blocked > 30 sec
> >     mdsuser-fs-5668c75f9f-jf59x(mds.1): 980 slow requests are blocked > 30 sec
> > MDS_TRIM 6 MDSs behind on trimming
> >     mdsuser-fs-5668c75f9f-hflps(mds.0): Behind on trimming (342/128) max_segments: 128, num_segments: 342
> >     mdsuser-fs-5668c75f9f-jf59x(mds.1): Behind on trimming (461/128) max_segments: 128, num_segments: 461
> >     mdsuser-fs-5668c75f9f-h8p2t(mds.0): Behind on trimming (342/128) max_segments: 128, num_segments: 342
> >     mdsuser-fs-5668c75f9f-7gs67(mds.1): Behind on trimming (461/128) max_segments: 128, num_segments: 461
> >     mdsceph-fs-5b997cbf7b-5tjwh(mds.0): Behind on trimming (386/128) max_segments: 128, num_segments: 386
> >     mdsceph-fs-5b997cbf7b-hmrxr(mds.0): Behind on trimming (386/128) max_segments: 128, num_segments: 386
> > PG_AVAILABILITY Reduced data availability: 2 pgs stale
> >     pg 4.4f is stuck stale for 171783.855465, current state stale+active+undersized+degraded, last acting [24]
> >     pg 4.59 is stuck stale for 171751.961506, current state stale+active+undersized+degraded, last acting [13]
> > PG_DEGRADED Degraded data redundancy: 2593/186805106 objects degraded (0.001%), 2 pgs degraded, 2 pgs undersized
> >     pg 4.4f is stuck undersized for 171797.245359, current state stale+active+undersized+degraded, last acting [24]
> >     pg 4.59 is stuck undersized for 171797.257707, current state stale+active+undersized+degraded, last acting [13]
> 
> So where are osd.24 and osd.13?
> 
> To which pool do these PGs belong?
> 
> But these PGs are probably the root cause of all the issues you are seeing.
> 

Both (well, all of them) are containers issued by Rook, from what I can see.

The problematic PGs belong to the pool user-fs-data0 (ID 4, from ceph osd lspools). The ceph df output is:

GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    23439G     12993G       10446G         44.57
POOLS:
    NAME                 ID     USED       %USED     MAX AVAIL     OBJECTS
    ceph-fs-metadata     1       1060M      0.02         4193G       293159
    ceph-fs-data0        2       4326G     50.78         4193G     92870669
    user-fs-metadata     3       1652M      0.04         4193G        60806
    user-fs-data0        4      88900M      2.01         4193G       179042
    replicapool          5      12980k         0         4193G           40
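
To make the PG-to-pool mapping explicit: the number before the dot in a PG ID is the pool ID, so 4.4f and 4.59 both land in pool 4. The following is only a minimal sketch of that lookup, not a Ceph API. The names POOLS, PG_LINE, parse_stuck_pgs and SAMPLE are made up for illustration, the pool table is copied by hand from the output above, and the sample lines are pasted from the ceph health detail output earlier in the thread.

```python
import re

# Pool IDs and names copied by hand from the POOLS table above.
POOLS = {
    1: "ceph-fs-metadata",
    2: "ceph-fs-data0",
    3: "user-fs-metadata",
    4: "user-fs-data0",
    5: "replicapool",
}

# Matches "ceph health detail" lines such as:
#   pg 4.4f is stuck stale for 171783.855465, current state ..., last acting [24]
PG_LINE = re.compile(
    r"pg (\d+)\.([0-9a-f]+) is stuck (\w+) for .*last acting \[([\d,]*)\]"
)


def parse_stuck_pgs(health_detail):
    """Map each stuck PG ID to its pool name, stuck states and last acting OSDs."""
    result = {}
    for line in health_detail.splitlines():
        match = PG_LINE.search(line)
        if not match:
            continue
        pool_id, seed, state, acting = match.groups()
        pgid = f"{pool_id}.{seed}"
        entry = result.setdefault(
            pgid,
            {"pool": POOLS.get(int(pool_id), "unknown"), "stuck": set(), "acting": []},
        )
        entry["stuck"].add(state)
        entry["acting"] = [int(osd) for osd in acting.split(",") if osd]
    return result


# Two lines pasted from the health detail output quoted in this thread.
SAMPLE = """\
    pg 4.4f is stuck stale for 171783.855465, current state stale+active+undersized+degraded, last acting [24]
    pg 4.59 is stuck stale for 171751.961506, current state stale+active+undersized+degraded, last acting [13]
"""

if __name__ == "__main__":
    for pgid, info in parse_stuck_pgs(SAMPLE).items():
        print(pgid, info["pool"], sorted(info["stuck"]), info["acting"])
```

This only restates what the output already says (both PGs are in user-fs-data0, last acting on osd.24 and osd.13). It does not explain why those OSDs no longer report the PGs.
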

> > REQUEST_SLOW 3 slow requests are blocked > 32 sec. Implicated osds
> >     3 ops are blocked > 2097.15 sec
> > REQUEST_STUCK 717 stuck requests are blocked > 4096 sec. Implicated osds 25,31,38
> >     286 ops are blocked > 268435 sec
> >     211 ops are blocked > 134218 sec
> >     5 ops are blocked > 67108.9 sec
> >     2 ops are blocked > 33554.4 sec
> >     134 ops are blocked > 16777.2 sec
> >     79 ops are blocked > 8388.61 sec
> >     osds 25,31,38 have stuck requests > 268435 sec
> > 
> > Cheers,
> > Sangwhan
> > 
> >>
> >> Wido
> >>
> >>>              1   active+clean+scrubbing+deep
> >>>
> >>>   io:
> >>>     client:   168 kB/s rd, 6336 B/s wr, 10 op/s rd, 1 op/s wr
> >>>
> >>> The offending broken MDS entry (damaged metadata) seems to be this:
> >>>
> >>> mds.ceph-fs-5b997cbf7b-5tjwh: [
> >>>     {
> >>>         "damage_type": "dir_frag",
> >>>         "id": 1190692215,
> >>>         "ino": 2199023258131,
> >>>         "frag": "*",
> >>>         "path": "/f/01/59"
> >>>     }
> >>> ]
> >>>
> >>> Is there any idea how I can diagnose and find out what is wrong? For the other issues I'm not even sure what/where I need to look into.
> >>>
> >>> Cheers,
> >>> Sangwhan
> >>> _______________________________________________
> >>> ceph-users mailing list
> >>> ceph-users@xxxxxxxxxxxxxx
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>
> >>
>