Ok, great, glad you got your issue sorted. I'm still battling along with mine.

From: Karun Josy [mailto:karunjosy1@xxxxxxxxx]
Sent: 13 December 2017 12:22
To: nick@xxxxxxxxxx
Cc: ceph-users <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: Health Error : Request Stuck

Hi Nick,

Finally, I was able to correct the issue!

There were many slow requests in ceph health detail, and we found that a couple of OSDs were slowing the whole cluster down. Initially the cluster was unusable: 10 PGs were stuck in "activating+remapped" and the slow requests were concentrated on 2 OSDs. We restarted the OSD daemons one by one, which cleared the blocked requests and made the cluster usable again.

However, 4 PGs were still inactive, so I took one of the OSDs with slow requests down for a while and allowed the cluster to rebalance. To be honest, I'm not sure that is the correct way to handle it.

P.S.: I had upgraded to Luminous 12.2.2 yesterday.
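On a systemd-managed Luminous node, the restart-and-rebalance sequence described above would look roughly like the following sketch (the OSD IDs are placeholders; the thread does not name the affected OSDs):

# restart the OSD daemons reporting slow requests, one at a time,
# letting the cluster settle before moving on to the next one
systemctl restart ceph-osd@2
ceph -s
systemctl restart ceph-osd@7
ceph -s

# if PGs are still inactive, mark the misbehaving OSD out so its PGs
# get remapped to other OSDs, then watch recovery progress
ceph osd out 2
ceph -w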
On Wed, Dec 13, 2017 at 4:31 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:

Hi Karun,

I too am experiencing something very similar, with a PG stuck in the activating+remapped state after re-introducing an OSD into the cluster as Bluestore, although the new OSD is not one of those listed against the PGs stuck activating. I also see the same thing as you, where the up set is different from the acting set.

Can I just ask what Ceph version you are running, and could you post the output of ceph osd tree?

Cluster is unusable because of inactive PGs. How can we correct it?

ceph pg dump_stuck inactive
PG_STAT  STATE                UP             UP_PRIMARY  ACTING         ACTING_PRIMARY
1.4b     activating+remapped  [5,2,0,13,1]   5           [5,2,13,1,4]   5
1.35     activating+remapped  [2,7,0,1,12]   2           [2,7,1,12,9]   2
1.12     activating+remapped  [1,3,5,0,7]    1           [1,3,5,7,2]    1
1.4e     activating+remapped  [1,3,0,9,2]    1           [1,3,0,9,5]    1
2.3b     activating+remapped  [13,1,0]       13          [13,1,2]       13
1.19     activating+remapped  [2,13,8,9,0]   2           [2,13,8,9,1]   2
1.1e     activating+remapped  [2,3,1,10,0]   2           [2,3,1,10,5]   2
2.29     activating+remapped  [1,0,13]       1           [1,8,11]       1
1.6f     activating+remapped  [8,2,0,4,13]   8           [8,2,4,13,1]   8
1.74     activating+remapped  [7,13,2,0,4]   7           [7,13,2,4,1]   7
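To dig further into the stuck PGs, the usual next steps are to compare daemon versions, check the CRUSH layout, and query one of the PGs directly. A minimal sketch, using PG 1.4b from the listing above as the example:

# confirm which Ceph versions the daemons are running (Luminous and later)
ceph versions

# show the CRUSH hierarchy and per-OSD up/in state
ceph osd tree

# query one stuck PG to see which OSDs it is waiting on and why
# peering/activation is blocked
ceph pg 1.4b query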
On Wed, Dec 13, 2017 at 8:27 AM, Karun Josy <karunjosy1@xxxxxxxxx> wrote:

We added a new disk to the cluster, and while rebalancing we are getting error warnings.

Overall status: HEALTH_ERR
REQUEST_SLOW: 1824 slow requests are blocked > 32 sec
REQUEST_STUCK: 1022 stuck requests are blocked > 4096 sec

The load on the servers seems to be very low.
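A reasonable starting point for narrowing down where those blocked requests sit is to see which OSDs the health warnings are attributed to, then inspect those daemons directly; a rough sketch (osd.2 is a placeholder ID):

# list the slow/stuck requests and the OSDs they are reported against
ceph health detail

# per-OSD commit/apply latency, to spot a struggling disk
ceph osd perf

# on the host of a suspect OSD, dump the operations it currently holds
ceph daemon osd.2 dump_ops_in_flight
ceph daemon osd.2 dump_historic_ops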