Re: A question about HEALTH_WARN and monitors holding onto cluster maps

mon_compact_on_start was not changed from its default (false). From the logs, it looks like the monitor with the excessive resource usage (mon1) was up and winning the majority of elections throughout the period of unresponsiveness, with the other monitors occasionally winning an election without mon1 participating (presumably because it was failing to respond).
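
For reference, this is roughly how I've been checking the compaction setting and each mon's view of the election, via the admin socket on the mon hosts (a sketch; mon.$(hostname -s) assumes the mon ID matches the short hostname, which it does on our boxes):

    # confirm the compact-on-start setting on the running daemon
    ceph daemon mon.$(hostname -s) config get mon_compact_on_start

    # check whether this mon currently thinks it is leader or peon
    ceph daemon mon.$(hostname -s) mon_status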

 

That’s interesting about the false map updates. We had a short networking blip (caused by me) on some monitors shortly before the trouble started, which caused some monitors to start calling frequent elections (every few seconds). Could this rapid creation of new monmaps have the same effect as updating pool settings, causing the monitor to try to clean everything up in one go and producing the observed resource usage and unresponsiveness?
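
In case it helps, this is roughly what I've been looking at to gauge the election churn and how many old maps the mons are still holding (a sketch; I'm assuming the ceph report fields below are the right ones to read):

    # election_epoch climbs with every election, so a rapidly increasing
    # value shows the election storm
    ceph quorum_status --format json-pretty | grep election_epoch

    # the range of osdmap epochs the mons still hold; a large gap means a
    # lot of old maps left to trim
    ceph report 2>/dev/null | grep -E 'osdmap_first_committed|osdmap_last_committed'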

 

I’ve been bringing in the storage as you described. I’m in the process of adding 6PB of new storage to a ~10PB (raw) cluster with ~8PB raw utilisation, so I’m feeling around for the largest backfills we can safely do. I had been weighting up storage in steps that take ~5 days to finish, but starting the next reweight as we got to the tail end of the previous one, so the mons were never given time to compact their stores. Although it’s far from ideal in terms of the total time to get the new storage weighted up, I’ll be letting the mons compact between every backfill until I have a better idea of what went on last week.
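
Concretely, between steps the plan is something along these lines (a sketch; mon1 is one of our mon names, and the compact would be repeated for each mon):

    # wait until everything is back to active+clean
    ceph pg stat

    # then ask the mon to compact its store before the next reweight step
    ceph tell mon.mon1 compact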

 

From: David Turner <drakonstein@xxxxxxxxx>
Sent: 17 May 2018 18:57
To: Byrne, Thomas (STFC,RAL,SC) <tom.byrne@xxxxxxxxxx>
Cc: Wido den Hollander <wido@xxxxxxxx>; ceph-users@xxxxxxxxxxxxxx
Subject: Re: [ceph-users] A question about HEALTH_WARN and monitors holding onto cluster maps

 

Generally they clean up slowly by deleting 30 maps every time the maps update. You can speed that up by creating false map updates, for example by setting a pool option to the value it already has. What it sounds like happened to you is that your mon crashed and restarted. If it crashed and has the setting to compact the mon store on start, it would forcibly go through and clean everything up in one go.
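
A quick sketch of the false-map-update trick, assuming a pool named rbd and using min_size (the pool name and value are just examples; use whatever value the get returns):

    # read the current value, then set it back to the same thing; the "change"
    # still bumps the osdmap epoch, which lets the mons trim another batch of maps
    ceph osd pool get rbd min_size
    ceph osd pool set rbd min_size 2

The compact-on-start behaviour is the mon_compact_on_start option (off by default), set in the [mon] section of ceph.conf.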

 

I generally plan my backfilling to not take longer than a week. Any longer than that is pretty rough on the mons. You can achieve that by bringing in new storage with a weight of 0.0 and increasing it appropriately, as opposed to just adding it at its full weight and having everything move at once.
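
Roughly like this (a sketch; the OSD ID, host bucket, and step sizes are just examples):

    # add the new OSD to CRUSH with zero weight so nothing moves yet
    ceph osd crush add osd.123 0.0 host=newhost1

    # then raise the weight in steps, waiting for active+clean (and giving
    # the mons a chance to trim) between steps
    ceph osd crush reweight osd.123 0.5
    ceph osd crush reweight osd.123 1.0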

 

On Thu, May 17, 2018 at 12:56 PM Thomas Byrne - UKRI STFC <tom.byrne@xxxxxxxxxx> wrote:

That seems like a sane way to do it; thanks for the clarification, Wido.

As a follow-up, do you have any feeling as to whether the trimming is a particularly intensive task? We just had a fun afternoon where the monitors became unresponsive (no ceph status etc.) for several hours, seemingly due to the leader's monitor process consuming all available RAM+swap (64GB+32GB) on that monitor. This was then followed by the actual trimming of the stores (26GB->11GB), which took a few minutes and happened simultaneously across the monitors.
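
For context, this is how we've been keeping an eye on the store sizes and the warning threshold (a sketch; the store.db path is the default location, and I believe the 15GB default comes from mon_data_size_warn):

    # size of the mon store on each mon host
    du -sh /var/lib/ceph/mon/ceph-$(hostname -s)/store.db

    # the threshold behind the 'store is getting too big' warning
    ceph daemon mon.$(hostname -s) config get mon_data_size_warn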

If this is something to be expected, it'll be a good reason to plan our long backfills much more carefully in the future!

> -----Original Message-----
> From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> On Behalf Of Wido
> den Hollander
> Sent: 17 May 2018 15:40
> To: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: [ceph-users] A question about HEALTH_WARN and monitors
> holding onto cluster maps
>
>
>
> On 05/17/2018 04:37 PM, Thomas Byrne - UKRI STFC wrote:
> > Hi all,
> >
> >
> >
> > As far as I understand, the monitor stores will grow while not
> > HEALTH_OK as they hold onto all cluster maps. Is this true for all
> > HEALTH_WARN reasons? Our cluster recently went into HEALTH_WARN
> due to
> > a few weeks of backfilling onto new hardware pushing the monitors data
> > stores over the default 15GB threshold. Are they now prevented from
> > shrinking till I increase the threshold above their current size?
> >
>
> No, monitors will trim their data store when all PGs are active+clean, not when
> the cluster is HEALTH_OK.
>
> So a 'noout' flag triggers a WARN, but that doesn't prevent the MONs from
> trimming for example.
>
> Wido
>
> >
> >
> > Cheers
> >
> > Tom
> >
> >
> >
> >
> >
> >
> >

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
