On Tue, Sep 24, 2019 at 4:56 AM Thomas Schneider <74cmonty@xxxxxxxxx> wrote:
>
> Can you please advise how to fix this (manually)?
> My cluster has not been healthy for 14 days now.
>
>> Reduced data availability: 33 pgs inactive, 32 pgs peering
>> Degraded data redundancy: 123285/153918525 objects degraded
>> (0.080%), 27 pgs degraded, 27 pgs undersized
>> osd: 368 osds: 368 up, 368 in; 140 remapped pgs

You have a number of PGs that are undersized, inactive, and stuck peering. That usually indicates down OSDs, yet the status reports all 368 OSDs up and in, which is odd.

I'd run `ceph health detail | grep peering` to list the stuck PGs and the OSDs acting for them. You may find one OSD that is common to all of them; that is the one I'd target, and I'd restart that OSD process (rough commands below my signature). That may unblock things and allow recovery to start.

Then you need to fix the too_full messages by reweighting some of the full OSDs, deleting unneeded data, or adding more OSDs (second sketch below).

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
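
For the peering hunt, something along these lines (only a sketch: the PG
and OSD ids are placeholders, and the exact `ceph health detail` wording
varies a bit between releases):

    # list the stuck PGs and their acting OSD sets
    ceph health detail | grep peering

    # tally which OSDs appear in the acting sets most often;
    # a single OSD common to all stuck PGs is the prime suspect
    ceph health detail | grep peering \
        | grep -o 'acting \[[0-9,]*\]' \
        | grep -oE '[0-9]+' | sort -n | uniq -c | sort -rn | head

    # for one stuck PG, see what it is waiting on -- look for
    # "peering_blocked_by" in the recovery_state section
    ceph pg 2.1a query

    # restart just the suspect OSD daemon (systemd deployments)
    systemctl restart ceph-osd@123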
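
And for the full OSDs, again only a sketch (the 120 threshold and the
osd id are placeholders to adjust for your cluster):

    # show utilization per OSD to find the full ones
    ceph osd df tree

    # dry-run, then apply, utilization-based reweighting
    # (120 = only touch OSDs above 1.2x the mean utilization)
    ceph osd test-reweight-by-utilization 120
    ceph osd reweight-by-utilization 120

    # or nudge a single over-full OSD down by hand
    ceph osd reweight 123 0.95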