Hi Christian,

Thanks for the tips. We do have monitoring in place, but we are currently
at a peak and the occupancy increased tremendously within a couple of
days. I solved the problem of the stuck pgs by reweighting (decreasing
the weights of) the new osds which were preventing the backfilling. Once
those 4 pgs recovered, I applied your suggestion of increasing the weight
of the least used osds. The cluster is much more balanced now and we will
add more osds soon.

It is still a mystery to me why, in my initial procedure which triggered
the problem, heavily used osds were chosen for the remapping.

Thanks for the help,
Goncalo

________________________________________
From: Christian Balzer [chibi@xxxxxxx]
Sent: 20 July 2016 19:36
To: ceph-users@xxxxxxxx
Cc: Goncalo Borges
Subject: Re: pgs stuck unclean after reweight

Hello,

On Wed, 20 Jul 2016 13:42:20 +1000 Goncalo Borges wrote:

> Hi All...
>
> Today we had a warning regarding 8 near-full osds. Looking at the osds'
> occupation, 3 of them were above 90%.

One would hope that this would have been picked up earlier, as in before
it even reaches near-full, either by monitoring (nagios, etc.) disk usage
checks and/or by graphing the usage and taking a look at it at least
daily. Since you seem to have at least 60 OSDs, going from below 85% to
90% must not have happened overnight.

> In order to solve the situation, I've decided to reweigh those first
> using
>
> ceph osd crush reweight osd.1 2.67719
>
> ceph osd crush reweight osd.26 2.67719
>
> ceph osd crush reweight osd.53 2.67719
>

What I'd do is to find the least utilized OSDs and give them higher
weights, so data will (hopefully) move there instead of potentially
pushing another OSD to near-full as with the approach above. You might
consider doing that aside from what I'm writing below.

> Please note that I've started with a very conservative step since the
> original weight for all osds was 2.72710.
>
> After some rebalancing (which has now stopped) I've seen that the
> cluster is currently in the following state
>
> # ceph health detail
> HEALTH_WARN 4 pgs backfill_toofull; 4 pgs stuck unclean; recovery
> 20/39433323 objects degraded (0.000%); recovery 77898/39433323
> objects misplaced (0.198%); 8 near full osd(s); crush map has legacy
> tunables (require bobtail, min is firefly)
>

So there are all your woes in one fell swoop.

Unless you changed the defaults, your mon_osd_nearfull_ratio and
osd_backfill_full_ratio are the same at 0.85, so any data movement
towards those 8 near-full OSDs will not go anywhere. Thus, aside from the
tip above, consider upping osd_backfill_full_ratio for those OSDs to
something like 0.92 for the time being, until things are good again (see
the example commands below).

Going forward, you will want to:
a) add more OSDs
b) re-weight things so that your OSDs are within a few % of each other,
   rather than the often encountered 20%+ variance.
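For example, a rough sketch of both adjustments (osd.40 and the 2.80000
weight are just placeholders for one of your least utilized OSDs and a
slightly higher weight; the near-full IDs are the ones from your health
output):

# check the %USE column to find the least and most utilized OSDs
ceph osd df

# temporarily allow backfill onto the near-full OSDs (repeat per OSD, or
# use osd.* to hit all of them); injected values do not survive a daemon
# restart, so revert them once the cluster is healthy again
ceph tell osd.1 injectargs '--osd_backfill_full_ratio 0.92'
ceph tell osd.26 injectargs '--osd_backfill_full_ratio 0.92'
ceph tell osd.53 injectargs '--osd_backfill_full_ratio 0.92'

# nudge data towards an underutilized OSD by raising its CRUSH weight
ceph osd crush reweight osd.40 2.80000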
Christian

> pg 6.e2 is stuck unclean for 9578.920997, current state
> active+remapped+backfill_toofull, last acting [49,38,11]
> pg 6.4 is stuck unclean for 9562.054680, current state
> active+remapped+backfill_toofull, last acting [53,6,26]
> pg 5.24 is stuck unclean for 10292.469037, current state
> active+remapped+backfill_toofull, last acting [32,13,51]
> pg 5.306 is stuck unclean for 10292.448364, current state
> active+remapped+backfill_toofull, last acting [44,7,59]
> pg 5.306 is active+remapped+backfill_toofull, acting [44,7,59]
> pg 5.24 is active+remapped+backfill_toofull, acting [32,13,51]
> pg 6.4 is active+remapped+backfill_toofull, acting [53,6,26]
> pg 6.e2 is active+remapped+backfill_toofull, acting [49,38,11]
> recovery 20/39433323 objects degraded (0.000%)
> recovery 77898/39433323 objects misplaced (0.198%)
> osd.1 is near full at 88%
> osd.14 is near full at 87%
> osd.24 is near full at 86%
> osd.26 is near full at 87%
> osd.37 is near full at 87%
> osd.53 is near full at 88%
> osd.56 is near full at 85%
> osd.62 is near full at 87%
>
> crush map has legacy tunables (require bobtail, min is firefly);
> see http://ceph.com/docs/master/rados/operations/crush-map/#tunables
>
> Not sure if it is worthwhile to mention, but after upgrading to Jewel,
> our cluster shows the warnings regarding tunables. We still have not
> migrated to the optimal tunables because the cluster will be very
> actively used during the next 3 weeks (due to one of the main
> conferences in our area) and we prefer to do that migration after this
> peak period.
>
> I am unsure what happened during the rebalancing, but the mapping of
> these 4 stuck pgs seems strange, namely the up and acting osds are
> different.
>
> # ceph pg dump_stuck unclean
> ok
> pg_stat  state                             up          up_primary  acting      acting_primary
> 6.e2     active+remapped+backfill_toofull  [8,53,38]   8           [49,38,11]  49
> 6.4      active+remapped+backfill_toofull  [53,24,6]   53          [53,6,26]   53
> 5.24     active+remapped+backfill_toofull  [32,13,56]  32          [32,13,51]  32
> 5.306    active+remapped+backfill_toofull  [44,60,26]  44          [44,7,59]   44
>
> # ceph pg map 6.e2
> osdmap e1054 pg 6.e2 (6.e2) -> up [8,53,38] acting [49,38,11]
>
> # ceph pg map 6.4
> osdmap e1054 pg 6.4 (6.4) -> up [53,24,6] acting [53,6,26]
>
> # ceph pg map 5.24
> osdmap e1054 pg 5.24 (5.24) -> up [32,13,56] acting [32,13,51]
>
> # ceph pg map 5.306
> osdmap e1054 pg 5.306 (5.306) -> up [44,60,26] acting [44,7,59]
>
> To complete this information, I am also sending the output of pg query
> for one of these problematic pgs (ceph pg 5.306 query) after this email.
>
> What should be the procedure to try to recover those pgs before
> continuing with the reweighting?
>
> Thank you in advance
> Goncalo
>

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com