Nautilus: PGs stuck "activating" after adding OSDs. Please help!

Hello,
I am on Nautilus and today, after upgrading the operating system (from CentOS 7 to CentOS 8 Stream) on a couple of OSD servers and adding them back to the cluster, I noticed some PGs are stuck "activating". The upgraded servers are in the same "rack", and I have replica-3 pools with a 1-per-rack rule, plus 6+4 EC pools (in some cases with an SSD pool for metadata).
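
In case it helps, this is roughly how I check which rule each pool uses
(the "glance" pool name and the "replicated_racks" rule name are just
placeholders from my setup):

   # which crush rule a given pool maps to
   ceph osd pool get glance crush_rule
   # full rule definition, in JSON
   ceph osd crush rule dump replicated_racks
   # all pools at a glance (size, crush_rule, pg_num, flags)
   ceph osd pool ls detail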

More details:
- on the two OSD servers I upgraded, I ran "systemctl stop ceph.target"
   and waited a while, to verify all PGs would remain "active"
- went on with the upgrade and ceph-ansible reconfig
- as soon as I started adding OSDs I saw "slow ops"
- to exclude possible effects of the updated packages, I ran "yum update"
   on all OSD servers, and rebooted them one by one
- after 2-3 hours, the last OSD disks finally came up
- I am left with:
	about 1k "slow ops" (if I pause recovery, the number stays roughly
		stable but the max age keeps increasing)
	~200 inactive PGs (the checks I keep running are just below)
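
For reference, these are more or less the checks I keep running while the
cluster settles (all standard Nautilus commands, nothing exotic):

   # overall state, refreshed every few seconds
   watch -n 5 ceph -s
   # which PGs are inactive / which ops are slow
   ceph health detail | grep -E 'activating|slow'
   # stuck PGs with their up/acting sets
   ceph pg dump_stuck inactive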

   Most of the inactive PGs are from the object store pool:

[cephmgr@cephAdmCT1.cephAdmCT1 ~]$ ceph osd pool get default.rgw.buckets.data crush_rule
crush_rule: default.rgw.buckets.data

rule default.rgw.buckets.data {
         id 6
         type erasure
         min_size 3
         max_size 10
         step set_chooseleaf_tries 5
         step set_choose_tries 100
         step take default class big
         step chooseleaf indep 0 type host
         step emit
}
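
For completeness, the EC profile behind that pool can be checked like this
(the "myprofile" name is just a placeholder for whatever the first command
returns):

   # find the profile used by the pool
   ceph osd pool get default.rgw.buckets.data erasure_code_profile
   # then dump it, substituting the name returned above
   ceph osd erasure-code-profile get myprofile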

But "ceph pg dump_stuck inactive" also shows 4 lines for the glance replicated pool, like:

82.34 activating+remapped                     [139,50,207] 139 [139,50,284] 139
82.54 activating+undersized+degraded+remapped [139,86,5]   139 [139,74]     139
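
If it helps the diagnosis, I can also paste the peering history of one of
these PGs; this is the kind of query I would run (jq is only there to trim
the rather verbose output):

   # why is 82.34 not going active? look at its recovery_state
   ceph pg 82.34 query | jq '.recovery_state'
   # compare up vs acting for the remapped PG
   ceph pg 82.34 query | jq '{up: .up, acting: .acting, state: .state}'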


I need your help, please:

- any idea what the root cause of all this was?

- and now, how can I help the OSDs complete their activation?
   + does the procedure differ between EC and replicated pools, by the way?
   + or maybe I should first get rid of the "slow ops" issue? (the commands
     I have been using to look at them are below)
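
For the slow ops, this is how I have been looking at them so far (osd.139
is just an example, and the "ceph daemon" commands must run on the host
where that OSD lives):

   # ops currently stuck in flight on one OSD
   ceph daemon osd.139 dump_ops_in_flight
   # recently completed ops, with per-phase timings
   ceph daemon osd.139 dump_historic_ops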

I am pasting:
ceph osd df tree
   https://pastebin.ubuntu.com/p/VWhT7FWf6m/

ceph osd lspools ; ceph pg dump_stuck inactive
   https://pastebin.ubuntu.com/p/9f6rXRYMh4/

   Thanks a lot!

			Fulvio

--
Fulvio Galeazzi
GARR-CSD Department
tel.: +39-334-6533-250
skype: fgaleazzi70
