I think I figured it out. The problem was that my ceph.conf file only
listed the first machine in mon_initial_members and mon_host; I'm not sure
why. I added the other monitors, restarted the monitors and the managers,
and everything is now working as expected. I have now upgraded all the
monitors and all the managers to Pacific and Rocky 9. Now on to the OSDs.
Well, maybe next week...

On Thu, Oct 26, 2023 at 5:37 PM Tyler Stachecki <stachecki.tyler@xxxxxxxxx> wrote:

> On Thu, Oct 26, 2023, 8:11 PM Jorge Garcia <jgarcia@xxxxxxxxxxxx> wrote:
>
>> Oh, I meant that "ceph -s" just hangs. I didn't even try to look at the
>> I/O. Maybe I can do that, but the "ceph -s" hang just freaked me out.
>>
>> Also, I know that the recommended order is mon->mgr->osd->mds->rgw, but
>> when you run the mgr on the same hardware as the monitors, it's hard not
>> to upgrade both at the same time, particularly if you're upgrading the
>> whole machine at once. Here's where upgrading to the new container
>> method will help a lot! FWIW, the managers seem to be running fine.
>>
>
> I recently did something like this, so I understand that it's difficult.
> Most of my testing and prep work was centered around exactly this
> problem, which was avoided by first upgrading the mons/mgrs to an interim
> OS while remaining on Octopus -- solely for the purpose of opening an
> avenue from Octopus to Quincy separate from the OS upgrade.
>
> In my pre-prod testing, trying to upgrade the mons/mgrs without that
> middle step that allowed the mgrs to be upgraded separately did result in
> `ceph -s` locking up. Client I/O remained unaffected in this state,
> though.
>
> Maybe look at which mgr is active and/or try stopping all but the Octopus
> mgr when stopping the mon as well?
>
> Cheers,
> Tyler
>
>
>> On Thu, Oct 26, 2023 at 4:57 PM Tyler Stachecki <stachecki.tyler@xxxxxxxxx> wrote:
>>
>>> On Thu, Oct 26, 2023 at 6:52 PM Jorge Garcia <jgarcia@xxxxxxxxxxxx> wrote:
>>> >
>>> > Hi Tyler,
>>> >
>>> > Maybe you didn't read the full message, but you will notice that I'm
>>> > doing exactly that, and the problem occurred when I was doing the
>>> > upgrade from Octopus to Pacific. I'm nowhere near Quincy yet. The
>>> > original goal was to move from Nautilus to Quincy, but I have gone to
>>> > Octopus (no problems) and now to Pacific (problems).
>>>
>>> I did not, apologies -- though do see my second message about mon/mgr
>>> ordering...
>>>
>>> When you say "the cluster becomes unresponsive" -- does client I/O lock
>>> up, or do you mean that `ceph -s` and such hangs?
>>>
>>> It may help to look at the Pacific mons via the admin socket (asok) and
>>> see if they respond in that state (and check their status), if I/O is
>>> not locked up and you can afford to leave the cluster in that state for
>>> a couple of minutes:
>>> $ ceph daemon mon.name mon_status
>>>
>>> Cheers,
>>> Tyler
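
For reference, a minimal sketch of roughly what the monitor entries in
ceph.conf look like once every monitor is listed in mon_initial_members
and mon_host, as in the fix described at the top of this thread. The
monitor names (mon-a, mon-b, mon-c) and addresses (192.0.2.x) are made-up
placeholders, not values from this cluster:

    [global]
    # List every monitor, not just the first one; clients and daemons use
    # mon_host to find a monitor, so a single entry leaves them with only
    # one address to try.
    mon_initial_members = mon-a, mon-b, mon-c
    mon_host = 192.0.2.11, 192.0.2.12, 192.0.2.13

After updating the file on each node, the monitor and manager daemons need
to be restarted to pick up the change, as was done above.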