Hi all,

I'm trying to upgrade some clusters from Luminous to Nautilus 14.2.22 (I know, I know!). After the upgrade, each HDD OSD takes about 16-18 minutes to rejoin the cluster, while the SSD OSDs take only a minute or two.

The cluster is dockerized using the standard ceph/daemon stable containers, and I'm using a simple Ansible playbook to start the OSD containers. It has 42 OSD nodes, and each node has 12 x 14TB HDDs and 2 x 3.8TB SSDs. Each SSD is partitioned into 6 block.db partitions and one OSD, and the SSD pool is used for RGW metadata and indexes. I upgraded the 5 mon/mgr nodes first, of course. The nodes run Debian Stretch, which might be suboptimal, but that's what my shop uses.

The cluster is still receiving writes. With these disks down for 18 minutes, we end up with so many degraded objects that I have to wait an hour or two before doing the next node. The primary RGW data pool is 3+2 EC, so I expect recovery is a little slower than it would be in a replicated pool.

Under Luminous, the OSDs took only a few minutes to connect. Any ideas what could be happening here?

Thanks,
Trey Palmer

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx