Re: Slow Request on OSD

> On 31 August 2016 at 23:21, Reed Dier <reed.dier@xxxxxxxxxxx> wrote:
> 
> 
> Multiple XFS corruptions, multiple leveldb issues. They looked to be the result of write cache settings, which have now been adjusted.
> 

That is bad news, really bad.

> You'll see below that there are tons of PGs in bad states. The cluster was slowly but surely bringing the number of bad PGs down, but it seems to have hit a brick wall with this one slow request operation.
> 

No, you have more issues. You have 17 PGs which are incomplete and another 63 which are down+incomplete.

Without those PGs functioning (active+X) your MDS will probably not work.
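
To see exactly which PGs are stuck and why, something like this should help (the PG ID below is just a placeholder, substitute one from your own output):

# ceph health detail
# ceph pg dump_stuck inactive
# ceph pg dump_stuck stale
# ceph pg <pgid> query

The 'recovery_state' section at the end of the query output usually shows what the PG is waiting for, for example which OSDs it is blocked by.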

Take a look at: http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/

Make sure you get the cluster to HEALTH_WARN first; in HEALTH_ERR the MDS will never come online.
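
For the scrub errors and inconsistent PGs, the usual route is something along these lines (pool name and PG ID are placeholders; only run a repair once you trust the remaining copies):

# rados list-inconsistent-pg <pool>
# rados list-inconsistent-obj <pgid>
# ceph pg repair <pgid>

For the unfound objects, the troubleshooting page above describes marking them lost once you are sure the data is not coming back, e.g.:

# ceph pg <pgid> mark_unfound_lost revert

I would treat that as a last resort, only after the incomplete PGs are sorted out.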

Wido

> > ceph -s
> > cluster []
> >      health HEALTH_ERR
> >             292 pgs are stuck inactive for more than 300 seconds
> >             142 pgs backfill_wait
> >             135 pgs degraded
> >             63 pgs down
> >             80 pgs incomplete
> >             199 pgs inconsistent
> >             2 pgs recovering
> >             5 pgs recovery_wait
> >             1 pgs repair
> >             132 pgs stale
> >             160 pgs stuck inactive
> >             132 pgs stuck stale
> >             71 pgs stuck unclean
> >             128 pgs undersized
> >             1 requests are blocked > 32 sec
> >             recovery 5301381/46255447 objects degraded (11.461%)
> >             recovery 6335505/46255447 objects misplaced (13.697%)
> >             recovery 131/20781800 unfound (0.001%)
> >             14943 scrub errors
> >             mds cluster is degraded
> >      monmap e1: 3 mons at {core=[]:6789/0,db=[]:6789/0,dev=[]:6789/0}
> >             election epoch 262, quorum 0,1,2 core,dev,db
> >       fsmap e3627: 1/1/1 up {0=core=up:replay}
> >      osdmap e3685: 8 osds: 8 up, 8 in; 153 remapped pgs
> >             flags sortbitwise
> >       pgmap v1807138: 744 pgs, 10 pools, 7668 GB data, 20294 kobjects
> >             8998 GB used, 50598 GB / 59596 GB avail
> >             5301381/46255447 objects degraded (11.461%)
> >             6335505/46255447 objects misplaced (13.697%)
> >             131/20781800 unfound (0.001%)
> >                  209 active+clean
> >                  170 active+clean+inconsistent
> >                  112 stale+active+clean
> >                   74 undersized+degraded+remapped+wait_backfill+peered
> >                   63 down+incomplete
> >                   48 active+undersized+degraded+remapped+wait_backfill
> >                   19 stale+active+clean+inconsistent
> >                   17 incomplete
> >                   12 active+remapped+wait_backfill
> >                    5 active+recovery_wait+degraded
> >                    4 undersized+degraded+remapped+inconsistent+wait_backfill+peered
> >                    4 active+remapped+inconsistent+wait_backfill
> >                    2 active+recovering+degraded
> >                    2 undersized+degraded+remapped+peered
> >                    1 stale+active+clean+scrubbing+deep+inconsistent+repair
> >                    1 active+clean+scrubbing+deep
> >                    1 active+clean+scrubbing+inconsistent
> 
> 
> Thanks,
> 
> Reed
> 
> > On Aug 31, 2016, at 4:08 PM, Wido den Hollander <wido@xxxxxxxx> wrote:
> > 
> >> 
> >> On 31 August 2016 at 22:56, Reed Dier <reed.dier@xxxxxxxxxxx> wrote:
> >> 
> >> 
> >> After a power failure left our jewel cluster crippled, I have hit a sticking point in attempted recovery.
> >> 
> >> Out of 8 OSDs, we likely lost 5-6; we are trying to salvage what we can.
> >> 
> > 
> > That's probably too much. What do you mean by 'lost'? Is XFS crippled/corrupted? That shouldn't happen.
> > 
> >> In addition to rados pools, we were also using CephFS, and the cephfs.metadata and cephfs.data pools likely lost plenty of PGs.
> >> 
> > 
> > What is the status of all PGs? What does 'ceph -s' show?
> > 
> > Are all PGs active? Since that's something which needs to be done first.
> > 
> >> The mds has reported this ever since returning from the power loss:
> >>> # ceph mds stat
> >>> e3627: 1/1/1 up {0=core=up:replay}
> >> 
> >> 
> >> When looking at the slow request on the OSD, it shows this operation, which I can't quite figure out. Any help is appreciated.
> >> 
> > 
> > Are all clients (including MDS) and OSDs running the same version?
> > 
> > Wido
> > 
> >>> # ceph --admin-daemon /var/run/ceph/ceph-osd.5.asok dump_ops_in_flight
> >>> {
> >>>    "ops": [
> >>>        {
> >>>            "description": "osd_op(mds.0.3625:8 6.c5265ab3 (undecoded) ack+retry+read+known_if_redirected+full_force e3668)",
> >>>            "initiated_at": "2016-08-31 10:37:18.833644",
> >>>            "age": 22212.235361,
> >>>            "duration": 22212.235379,
> >>>            "type_data": [
> >>>                "no flag points reached",
> >>>                [
> >>>                    {
> >>>                        "time": "2016-08-31 10:37:18.833644",
> >>>                        "event": "initiated"
> >>>                    }
> >>>                ]
> >>>            ]
> >>>        }
> >>>    ],
> >>>    "num_ops": 1
> >>> }
> >> 
> >> Thanks,
> >> 
> >> Reed
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@xxxxxxxxxxxxxx
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
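
To follow up on the version question from my earlier mail below: a quick way to check is something like the following (the MDS daemon name is a placeholder; the daemon command runs on the MDS host via its admin socket):

# ceph tell osd.* version
# ceph daemon mds.<name> version
# ceph --version   (on each client/MDS host)

All daemons and clients should report the same Jewel release.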
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



