Re: Troubleshooting stuck unclean PGs?

I tried this:

`sudo ceph tell 'osd.*' injectargs '--osd-max-backfills 4'`

This increased the cluster to 10 simultaneous backfills and roughly a
10x higher rate of data movement. It looks like I could increase this
further by raising the number of simultaneous recovery operations, but
changing that parameter to 20 didn't produce any change. The command
warned that the OSDs may need to be restarted before it takes effect:

`sudo ceph tell 'osd.*' injectargs '--osd-recovery-max-active 20'`

I'll let it run overnight with a higher backfill rate and see if that
is sufficient to let the cluster catch up.

The commands are from
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-January/023844.html
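
For reference, here is a rough sketch of how I believe the same settings
could be made persistent across OSD restarts using the Octopus central
config database, instead of injectargs (the values are just the ones I'm
experimenting with):

```
# Store the settings in the cluster's central config database so they
# survive OSD restarts (unlike injectargs, which only changes runtime values).
sudo ceph config set osd osd_max_backfills 4
sudo ceph config set osd osd_recovery_max_active 20

# Check what a running OSD actually reports for these options.
sudo ceph config show osd.0 osd_max_backfills
sudo ceph tell osd.0 config get osd_recovery_max_active
```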

-Matt

On Mon, Sep 21, 2020 at 7:20 PM Matt Larson <larsonmattr@xxxxxxxxx> wrote:
>
> Hi Wout,
>
>  None of the OSDs are greater than 20% full. However, only 1 PG is
> backfilling at a time, while the others are backfill_wait. I had
> recently added a large amount of data to the Ceph cluster, and this
> may have caused the number of PGs to increase, which in turn triggered
> the rebalancing and object movement.
>
>  It appears that I could increase the number of backfill operations
> that happen simultaneously by raising `osd_max_backfills` and/or
> `osd_recovery_max_active`. Increasing the number of concurrent
> backfills seems worth trying, because the overall I/O during the
> backfill is pretty small.
>
>  Does this seem reasonable? If so, with Ceph Octopus/cephadm, how can
> I adjust these parameters?
>
>  Thanks,
>    Matt
>
> On Mon, Sep 21, 2020 at 2:21 PM Wout van Heeswijk <wout@xxxxxxxx> wrote:
> >
> > Hi Matt,
> >
> > The mon data can grow while PGs are stuck unclean. Don't restart the mons.
> >
> > You need to find out why your placement groups are "backfill_wait". Likely some of your OSDs are (near)full.
> >
> > If you have space elsewhere you can use the ceph balancer module or reweighting of OSDs to rebalance data.
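> >
> > For example, roughly along these lines (only a sketch, adjust to your cluster):
> >
> > ```
> > # Check the current balancer state and mode.
> > ceph balancer status
> >
> > # Let the balancer even out data with upmap (needs luminous+ clients).
> > ceph balancer mode upmap
> > ceph balancer on
> >
> > # Or nudge data away from the fullest OSDs by reweighting.
> > ceph osd reweight-by-utilization
> > ```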
> >
> > Scrubbing will continue once the PGs are "active+clean".
> >
> > Kind regards,
> >
> > Wout
> > 42on
> >
> > ________________________________________
> > From: Matt Larson <larsonmattr@xxxxxxxxx>
> > Sent: Monday, September 21, 2020 6:22 PM
> > To: ceph-users@xxxxxxx
> > Subject:  Troubleshooting stuck unclean PGs?
> >
> > Hi,
> >
> >  Our Ceph cluster is reporting several PGs that have not been scrubbed
> > or deep-scrubbed in time. It has been over a week since these PGs were
> > last scrubbed. When I checked `ceph health detail`, there were 29 pgs
> > not deep-scrubbed in time and 22 pgs not scrubbed in time. I tried to
> > manually start a scrub on the PGs, but it appears that they are
> > actually in an unclean state that needs to be resolved first.
> >
> > This is a cluster running:
> >  ceph version 15.2.1 (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)
> >
> >  Following the information at [Troubleshooting
> > PGs](https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-pg/),
> > I checked for PGs that are stuck stale | inactive | unclean. There
> > were no PGs that are stale or inactive, but there are several that are
> > stuck unclean:
> >
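> > A listing like the one below can be produced with the dump_stuck
> > command from that guide, for example:
> >
> > ```
> > ceph pg dump_stuck unclean
> > ```
> >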
> > ```
> > PG_STAT  STATE                          UP                                UP_PRIMARY  ACTING                            ACTING_PRIMARY
> > 8.3c     active+remapped+backfill_wait  [124,41,108,8,87,16,79,157,49]    124         [139,57,16,125,154,65,109,86,45]  139
> > 8.3e     active+remapped+backfill_wait  [108,2,58,146,130,29,37,66,118]   108         [127,92,24,50,33,6,130,66,149]    127
> > 8.3f     active+remapped+backfill_wait  [19,34,86,132,59,78,153,99,6]     19          [90,45,147,4,105,61,30,66,125]    90
> > 8.40     active+remapped+backfill_wait  [19,131,80,76,42,101,61,3,144]    19          [28,106,132,3,151,36,65,60,83]    28
> > 8.3a     active+remapped+backfilling    [32,72,151,30,103,131,62,84,120]  32          [91,60,7,133,101,117,78,20,158]   91
> > 8.7e     active+remapped+backfill_wait  [108,2,58,146,130,29,37,66,118]   108         [127,92,24,50,33,6,130,66,149]    127
> > 8.3b     active+remapped+backfill_wait  [34,113,148,63,18,95,70,129,13]   34          [66,17,132,90,14,52,101,47,115]   66
> > 8.7f     active+remapped+backfill_wait  [19,34,86,132,59,78,153,99,6]     19          [90,45,147,4,105,61,30,66,125]    90
> > 8.78     active+remapped+backfill_wait  [96,113,159,63,29,133,73,8,89]    96          [138,121,15,103,55,41,146,69,18]  138
> > 8.7d     active+remapped+backfilling    [0,90,60,124,159,19,71,101,135]   0           [150,72,124,129,63,10,94,29,41]   150
> > 8.7c     active+remapped+backfill_wait  [124,41,108,8,87,16,79,157,49]    124         [139,57,16,125,154,65,109,86,45]  139
> > 8.79     active+remapped+backfill_wait  [59,15,41,82,131,20,73,156,113]   59          [13,51,120,102,29,149,42,79,132]  13
> > ```
> >
> > If I query one of the PGs that is backfilling, 8.3a, it shows its recovery state as:
> >     "recovery_state": [
> >         {
> >             "name": "Started/Primary/Active",
> >             "enter_time": "2020-09-19T20:45:44.027759+0000",
> >             "might_have_unfound": [],
> >             "recovery_progress": {
> >                 "backfill_targets": [
> >                     "30(3)",
> >                     "32(0)",
> >                     "62(6)",
> >                     "72(1)",
> >                     "84(7)",
> >                     "103(4)",
> >                     "120(8)",
> >                     "131(5)",
> >                     "151(2)"
> >                 ],
> >
> > Q1: Is there anything that I should check/fix to enable the PGs to
> > resolve from the `unclean` state?
> > Q2: I have also seen that the podman containers on one of our OSD
> > servers are taking up large amounts of disk space. Is there a way to
> > limit the growth in disk usage by podman containers when
> > administering a Ceph cluster with the `cephadm` tools? At last check, a
> > server running 16 OSDs and 1 MON is using 39G of disk space for its
> > running containers. Can restarting containers help to start with a
> > fresh slate or reduce the disk use?
> >
> > Thanks,
> >   Matt
> >
> > ------------------------
> >
> > Matt Larson
> > Associate Scientist
> > Computer Scientist/System Administrator
> > UW-Madison Cryo-EM Research Center
> > 433 Babcock Drive, Madison, WI 53706
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>
>
> --
> Matt Larson, PhD
> Madison, WI  53705 U.S.A.



-- 
Matt Larson, PhD
Madison, WI  53705 U.S.A.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


