Hi Christian,

Thanks for the tips. We do have monitoring in place, but we are currently
at a peak and the occupancy increased tremendously within a couple of
days. I solved the problem of the stuck pgs by reweighting (decreasing
the weights of) the new osds which were preventing the backfilling. Once
those 4 pgs recovered, I applied your suggestion of increasing the weight
of the least used osds. The cluster is much more balanced now and we will
add more osds soon.

It is still a mystery to me why, in my initial procedure which triggered
the problem, heavily used osds were chosen for the remapping.

Thanks for the help,
Goncalo

________________________________________
From: Christian Balzer [chibi@xxxxxxx]
Sent: 20 July 2016 19:36
To: ceph-users@xxxxxxxx
Cc: Goncalo Borges
Subject: Re: pgs stuck unclean after reweight

Hello,

On Wed, 20 Jul 2016 13:42:20 +1000 Goncalo Borges wrote:

> Hi All...
>
> Today we had a warning regarding 8 near-full osds. Looking at the osds'
> occupation, 3 of them were above 90%.

One would hope that this would have been picked up earlier, as in before
it even reaches near-full, either by monitoring (nagios, etc.) disk usage
checks and/or by graphing the usage and taking a look at it at least
daily. Since you seem to have at least 60 OSDs, going from below 85% to
90% must not have happened overnight.

> In order to solve the situation, I've decided to reweigh those first
> using
>
> ceph osd crush reweight osd.1 2.67719
>
> ceph osd crush reweight osd.26 2.67719
>
> ceph osd crush reweight osd.53 2.67719
>

What I'd do is to find the least utilized OSDs and give them higher
weights, so data will (hopefully) move there instead of potentially
pushing another OSD to near-full as with the approach above. You might
consider doing that aside from what I'm writing below.

> Please note that I've started with a very conservative step since the
> original weight for all osds was 2.72710.
>
> After some rebalancing (which has now stopped) I've seen that the
> cluster is currently in the following state
>
> # ceph health detail
> HEALTH_WARN 4 pgs backfill_toofull; 4 pgs stuck unclean; recovery
> 20/39433323 objects degraded (0.000%); recovery 77898/39433323
> objects misplaced (0.198%); 8 near full osd(s); crush map has legacy
> tunables (require bobtail, min is firefly)
>

So there are all your woes in one fell swoop.

Unless you changed the defaults, your mon_osd_nearfull_ratio and
osd_backfill_full_ratio are the same at 0.85, so any data movement
towards those 8 near-full OSDs will not go anywhere. Thus, aside from the
tip above, consider upping osd_backfill_full_ratio for those OSDs to
something like 0.92 for the time being, until things are good again (see
the example commands below).

Going forward, you will want to:
a) add more OSDs
b) re-weight things so that your OSDs are within a few % of each other,
   rather than the often encountered 20%+ variance.
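For example, a rough sketch of both adjustments (osd.40 and the 2.80000
weight are just placeholders for one of your least utilized OSDs and a
slightly higher weight; the near-full IDs are the ones from your health
output):

# check the %USE column to find the least and most utilized OSDs
ceph osd df

# temporarily allow backfill onto the near-full OSDs (repeat per OSD, or
# use osd.* to hit all of them); injected values do not survive a daemon
# restart, so revert them once the cluster is healthy again
ceph tell osd.1 injectargs '--osd_backfill_full_ratio 0.92'
ceph tell osd.26 injectargs '--osd_backfill_full_ratio 0.92'
ceph tell osd.53 injectargs '--osd_backfill_full_ratio 0.92'

# nudge data towards an underutilized OSD by raising its CRUSH weight
ceph osd crush reweight osd.40 2.80000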
Christian

> pg 6.e2 is stuck unclean for 9578.920997, current state
> active+remapped+backfill_toofull, last acting [49,38,11]
> pg 6.4 is stuck unclean for 9562.054680, current state
> active+remapped+backfill_toofull, last acting [53,6,26]
> pg 5.24 is stuck unclean for 10292.469037, current state
> active+remapped+backfill_toofull, last acting [32,13,51]
> pg 5.306 is stuck unclean for 10292.448364, current state
> active+remapped+backfill_toofull, last acting [44,7,59]
> pg 5.306 is active+remapped+backfill_toofull, acting [44,7,59]
> pg 5.24 is active+remapped+backfill_toofull, acting [32,13,51]
> pg 6.4 is active+remapped+backfill_toofull, acting [53,6,26]
> pg 6.e2 is active+remapped+backfill_toofull, acting [49,38,11]
> recovery 20/39433323 objects degraded (0.000%)
> recovery 77898/39433323 objects misplaced (0.198%)
> osd.1 is near full at 88%
> osd.14 is near full at 87%
> osd.24 is near full at 86%
> osd.26 is near full at 87%
> osd.37 is near full at 87%
> osd.53 is near full at 88%
> osd.56 is near full at 85%
> osd.62 is near full at 87%
>
> crush map has legacy tunables (require bobtail, min is firefly);
> see http://ceph.com/docs/master/rados/operations/crush-map/#tunables
>
> Not sure if it is worthwhile to mention, but after upgrading to Jewel,
> our cluster shows the warnings regarding tunables. We still have not
> migrated to the optimal tunables because the cluster will be very
> actively used during the next 3 weeks (due to one of the main
> conferences in our area) and we prefer to do that migration after this
> peak period.
>
> I am unsure what happened during the rebalancing, but the mapping of
> these 4 stuck pgs seems strange, namely the up and acting osds are
> different.
>
> # ceph pg dump_stuck unclean
> ok
> pg_stat  state                             up          up_primary  acting      acting_primary
> 6.e2     active+remapped+backfill_toofull  [8,53,38]   8           [49,38,11]  49
> 6.4      active+remapped+backfill_toofull  [53,24,6]   53          [53,6,26]   53
> 5.24     active+remapped+backfill_toofull  [32,13,56]  32          [32,13,51]  32
> 5.306    active+remapped+backfill_toofull  [44,60,26]  44          [44,7,59]   44
>
> # ceph pg map 6.e2
> osdmap e1054 pg 6.e2 (6.e2) -> up [8,53,38] acting [49,38,11]
>
> # ceph pg map 6.4
> osdmap e1054 pg 6.4 (6.4) -> up [53,24,6] acting [53,6,26]
>
> # ceph pg map 5.24
> osdmap e1054 pg 5.24 (5.24) -> up [32,13,56] acting [32,13,51]
>
> # ceph pg map 5.306
> osdmap e1054 pg 5.306 (5.306) -> up [44,60,26] acting [44,7,59]
>
> To complete this information, I am also sending the output of pg query
> for one of these problematic pgs (ceph pg 5.306 query) after this email.
>
> What should be the procedure to try to recover those pgs before
> continuing with the reweighting?
>
> Thank you in advance
> Goncalo
>

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com