Such great detail in this post, David. This will come in very handy for people in the future.

On Thu, Jul 21, 2016 at 8:24 PM, David Turner <david.turner@xxxxxxxxxxxxxxxx> wrote:
> The mon store is important, and since your cluster isn't healthy, the mons need to hold onto it to make sure that when things come back up, the mon can replay everything for them. Once you fix the 2 down and peering PGs, the mon store will fix itself in no time at all. Ceph is rightly refusing to compact that database until your cluster is healthy.
>
> It seems like there are a couple of things that might help your setup. First, I see something very easy to resolve, and that's the blocked requests. Try running the following command:
>
> ceph osd down 71
>
> That command will tell the cluster that osd.71 is down without restarting the actual osd daemon. osd.71 will come back and tell the mons it's actually up, but in the meantime the operations blocked on osd.71 will go to a secondary to get the response and clear up.
>
> Second, osd.53 looks to be causing the never-ending peering. A couple of questions to check things here: what is your osd_max_backfills set to? That is directly related to how fast osd.53 will fill back up. Something you might do to speed that up is to inject a higher setting for osd.53 alone, and not the rest of the cluster:
>
> ceph tell osd.53 injectargs '--osd_max_backfills=20'
>
> If this is the problem and the cluster is just waiting for osd.53 to finish backfilling, then this will get you there faster. I'm unfamiliar with the strategy you used to rebuild the data for osd.53. I would have removed the osd from the cluster and added it back in with the same weight. That way the osd would start right away and you would see the PGs backfilling onto it, as opposed to it sitting in a perpetual "booting" state.
>
> To remove the osd with minimal impact to the cluster, the following commands should get you there:
>
> ceph osd tree | grep 'osd.53 '
> ceph osd set nobackfill
> ceph osd set norecover
> # on the host with osd.53, stop the daemon
> ceph osd down 53
> ceph osd out 53
> ceph osd crush remove osd.53
> ceph auth rm osd.53
> ceph osd rm 53
>
> At this point osd.53 is completely removed from the cluster and you have the original weight of the osd to set it to when you bring it back in. The down and peering PGs should now be resolved. Now completely re-format the osd and add it back into the cluster. Make sure to do whatever you need for dmcrypt, journals, etc. that is specific to your environment. Once the osd is back in the cluster, up and in, reweight it to what it was before you removed it and unset norecover and nobackfill:
>
> ceph osd crush reweight osd.53 {{ weight_from_tree_command }}
> ceph osd unset nobackfill
> ceph osd unset norecover
>
> At this point everything is back to the way it was and the osd should start receiving data. The only data movement should be refilling osd.53 with the data it used to have; everything else should stay the same. Increasing the backfills for this osd will help it fill up faster, but client IO will be slower if you do. The mon stores will remain "too big" until backfilling onto osd.53 finishes, but once the data stops moving around and all of your osds are up and in, the mon stores will compact in no time.
>
> I hope this helps. Ask questions if you have any, and never run a command on your cluster that you don't understand.
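>
> If you want to confirm that the injected value took effect and keep an eye on the backfill, something along these lines should do it. This is only a rough sketch: the "ceph daemon" command has to run on the host that carries osd.53 and assumes the default admin socket location, and the grep is just a quick filter for pgs in a backfill state.
>
> # on the osd.53 host: show the runtime value of osd_max_backfills
> ceph daemon osd.53 config get osd_max_backfills
>
> # from any admin node: overall recovery progress and pgs still backfilling
> ceph -s
> ceph pg dump pgs_brief | grep -i backfill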
>
> David Turner
> ________________________________
> From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of Salwasser, Zac [zsalwass@xxxxxxxxxx]
> Sent: Thursday, July 21, 2016 12:54 PM
> To: ceph-users@xxxxxxxxxxxxxx
> Cc: Heller, Chris
> Subject: Uncompactable Monitor Store at 69GB -- Re: Cluster in warn state, not sure what to do next.
>
> Rephrasing for brevity – I have a monitor store that is 69GB and won’t compact any further on restart or with ‘tell compact’. Has anyone dealt with this before?
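>
> For reference, the compaction attempts looked roughly like the following; the mon id is one of ours, and the config option goes in the [mon] section of ceph.conf before restarting a monitor. Neither made a difference here, so treat this only as an illustration of what was tried.
>
> # ask a running monitor to compact its store
> ceph tell mon.a65-121-158-160 compact
>
> # or compact the store on the next monitor restart (ceph.conf)
> [mon]
>     mon compact on start = true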
>
> From: "Salwasser, Zac" <zsalwass@xxxxxxxxxx>
> Date: Thursday, July 21, 2016 at 1:18 PM
> To: "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
> Cc: "Salwasser, Zac" <zsalwass@xxxxxxxxxx>, "Heller, Chris" <cheller@xxxxxxxxxx>
> Subject: Cluster in warn state, not sure what to do next.
>
> Hi,
>
> I have a cluster that has been in an unhealthy state for a month or so. We eventually realized the OSDs were flapping because the user did not have access to enough file handles, but it took us a while to work that out, and we appear to have done a lot of damage to the state of the monitor store in the meantime.
>
> I’ve been trying to tackle one issue at a time, starting with the size of the monitor store. Compaction, either on restart or as a ‘tell’ operation, does not shrink the monitor store below its present size. Having no luck getting the monitor store to shrink, I switched gears to troubleshooting down placement groups. There are two remaining that I cannot fix, and they both claim to be blocked from peering by the same osd (osd.53).
>
> Two days ago, I removed the osd data for osd.53 and restarted it after a ‘mkfs’ operation. It has been in the “booting” state ever since, although there is now 72GB of data in the osd data partition for osd.53, indicating that some sort of partial “backfilling” has taken place. Watching the host file system indicates that any data coming into that partition at this point is only trickling in.
>
> Here is the output of “ceph health detail”. I’m wondering if anyone would be willing to engage with me to at least get me unstuck. I am on #ceph as salwasser.
>
> * * *
>
> HEALTH_WARN 2 pgs down; 2 pgs peering; 2 pgs stuck inactive; 2 pgs stuck unclean; 15 requests are blocked > 32 sec; 1 osds have slow requests; mds0: Behind on trimming (367/30); mds-1: Behind on trimming (364/30); mon.a65-121-158-160 store is getting too big! 74468 MB >= 15360 MB; mon.a65-121-158-161 store is getting too big! 73881 MB >= 15360 MB; mon.a65-121-158-195 store is getting too big! 64963 MB >= 15360 MB; mon.a65-121-158-196 store is getting too big! 64023 MB >= 15360 MB; mon.a65-121-158-197 store is getting too big! 63632 MB >= 15360 MB
> pg 4.285 is stuck inactive since forever, current state down+peering, last acting [28,122,114]
> pg 1.716 is stuck inactive for 969017.268003, current state down+peering, last acting [71,213,55]
> pg 4.285 is stuck unclean since forever, current state down+peering, last acting [28,122,114]
> pg 1.716 is stuck unclean for 969351.417382, current state down+peering, last acting [71,213,55]
> pg 1.716 is down+peering, acting [71,213,55]
> pg 4.285 is down+peering, acting [28,122,114]
> 5 ops are blocked > 4194.3 sec
> 10 ops are blocked > 2097.15 sec
> 5 ops are blocked > 4194.3 sec on osd.71
> 10 ops are blocked > 2097.15 sec on osd.71
> 1 osds have slow requests
> mds0: Behind on trimming (367/30)(max_segments: 30, num_segments: 367)
> mds-1: Behind on trimming (364/30)(max_segments: 30, num_segments: 364)
> mon.a65-121-158-160 store is getting too big! 74468 MB >= 15360 MB -- 53% avail
> mon.a65-121-158-161 store is getting too big! 73881 MB >= 15360 MB -- 73% avail
> mon.a65-121-158-195 store is getting too big! 64963 MB >= 15360 MB -- 81% avail
> mon.a65-121-158-196 store is getting too big! 64023 MB >= 15360 MB -- 81% avail
> mon.a65-121-158-197 store is getting too big! 63632 MB >= 15360 MB -- 81% avail

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com