Hello,

I'm going down the long and winding road of upgrading our Ceph clusters from Mimic to the latest version, which has meant slowly going up one release at a time. I'm now going from Octopus to Pacific, which also involves upgrading the OS on the host systems from CentOS 7 to Rocky 9.

I first upgraded the monitors and managers, and those went through with no problems. Now I'm upgrading the OSD servers, and I ran into some issues that left the first system down for a couple of days. I finally got it back up and got all of its OSDs ready to come back online. But whenever I try to bring the OSDs back up, they run for a bit and the cluster looks like it is recovering and catching up, and then the OSDs all go down again. The logs show messages like:

  received signal: Interrupt from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
  osd.10 254568 *** Got signal Interrupt ***
  osd.10 254568 *** Immediate shutdown (osd_fast_shutdown=true) ***
  osd.10 254568 prepare_to_stop starting shutdown

I found this thread, which seems to describe something similar:

  https://www.spinics.net/lists/ceph-users/msg75628.html

There they claim that the cluster needs to be restarted many times in order for the OSDs to catch up to the current epoch. I have restarted the OSDs many times, and now it has gotten to a point where there doesn't seem to be any progress.

My questions are:

- Is this the right solution?
- Is there a way to see whether the OSDs are making any progress?
- Is there something else I should be trying?

Thanks for any help!

Jorge

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx