This should only happen while upgrading. I can't remember the reason why, but there's an fsck (for stats repair, maybe?) that runs on the first boot after the upgrade. There should be a message in the OSD log about it.

Alex

On Mon, Feb 14, 2022 at 1:31 PM Trey Palmer <nerdmagicatl@xxxxxxxxx> wrote:
>
> Hi all,
>
> I'm trying to upgrade some clusters from Luminous to Nautilus 14.2.22 (I
> know, I know!).
>
> It's taking about 16-18 minutes for each HDD OSD to connect to the
> cluster after the upgrade, but only a minute or two for the SSD OSDs.
>
> The cluster is dockerized using the standard ceph/daemon stable containers,
> and I'm using a simple Ansible playbook to start the OSD dockers.
>
> The cluster has 42 OSD nodes, and each node has 12 x 14TB disks and 2 x
> 3.8TB SSDs. Each SSD is partitioned into 6 block.db devices and one OSD,
> and the SSD pool is used for RGW metadata and indexes.
>
> I have of course upgraded the 5 mon/mgr nodes beforehand.
>
> The nodes are Debian Stretch, which might be suboptimal, but that's what my
> shop uses.
>
> The cluster is still receiving writes, and with these disks down for 18
> minutes we end up with so many degraded objects that I have to wait an
> hour or two before doing the next node. The primary RGW data pool is 3+2 EC,
> so I expect recovery to be a little slower than it would be in a replicated
> pool.
>
> Under Luminous the OSDs took only a few minutes to connect.
>
> Any ideas what could be happening here?
>
> Thanks,
>
> Trey Palmer
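
For what it's worth, a rough way to check whether that first-boot fsck is what is eating the HDD OSDs' startup time, sketched against a dockerized setup like the one described above (the container name ceph-osd-0 and the osd.0 ID are illustrative placeholders, not details from the thread):

  # Look for fsck/repair chatter in the OSD's first-boot output
  # (dockerized OSDs usually log to stdout/stderr).
  docker logs ceph-osd-0 2>&1 | grep -iE 'fsck|repair'

  # From inside the container, where the admin socket lives, show which
  # fsck-on-mount settings the OSD is running with.
  docker exec ceph-osd-0 ceph daemon osd.0 config show | grep bluestore_fsck

  # Watch overall recovery of the degraded objects between nodes.
  ceph -s

If the fsck is the culprit, the startup time should roughly track how much data and metadata each OSD holds, which would fit the HDD-versus-SSD gap Trey is seeing.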