On 2020-11-04 01:18, m.sliwinski@xxxxx wrote:
Just in case, the result of `ceph report` is here:
http://paste.ubuntu.com/p/D7yfr3pzr4/
Hi
We have a weird issue with our ceph cluster - almost all PGs assigned
to one specific pool became stuck, locking out all operations without
reporting any errors.
Story:
We have 3 different pools: hdd-backed, ssd-backed and nvme-backed.
The ssd pool worked fine for a few months.
Today one of the hosts assigned to the nvme pool restarted, triggering
recovery in that pool. Recovery went fast and the cluster returned to
the OK state.
During these events, or shortly after them, the ssd pool became
unresponsive: it was impossible to read from or write to it.
We decided to slowly restart first the OSDs assigned to it, then, as
that didn't help, all the mons - without breaking quorum, of course.
At this moment both the nvme and hdd pools are working fine; the ssd
one is stuck in recovery.
All OSDs in the ssd pool use a large amount of CPU and are exchanging
approx. 1 Mpps per OSD server with each other.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx