On Thu, Oct 26, 2023, 8:11 PM Jorge Garcia <jgarcia@xxxxxxxxxxxx> wrote:

> Oh, I meant that "ceph -s" just hangs. I didn't even try to look at the
> I/O. Maybe I can do that, but the "ceph -s" hang just freaked me out.
>
> Also, I know that the recommended order is mon->mgr->osd->mds->rgw, but
> when you run mgr on the same hardware as the monitors, it's hard to not
> upgrade both at the same time. Particularly if you're upgrading the whole
> machine at once. Here's where upgrading to the new container method will
> help a lot! FWIW, the managers seem to be running fine.
>

I recently did something like this, so I understand that it's difficult.
Most of my testing and prep work was centered around exactly this problem,
which I avoided by first upgrading the mons/mgrs to an interim OS while
remaining on Octopus -- solely to open a path from Octopus to Quincy
separate from the OS upgrade.

In my pre-prod testing, trying to upgrade the mons/mgrs without that middle
step (the one that allowed the mgrs to be upgraded separately) did result in
`ceph -s` locking up. Client I/O remained unaffected in that state, though.

Maybe look at which mgr is active, and/or try stopping all but the Octopus
mgr when you stop the mon as well?

Cheers,
Tyler

> On Thu, Oct 26, 2023 at 4:57 PM Tyler Stachecki <stachecki.tyler@xxxxxxxxx>
> wrote:
>
>> On Thu, Oct 26, 2023 at 6:52 PM Jorge Garcia <jgarcia@xxxxxxxxxxxx>
>> wrote:
>> >
>> > Hi Tyler,
>> >
>> > Maybe you didn't read the full message, but in the message you will
>> > notice that I'm doing exactly that, and the problem occurred when I
>> > was doing the upgrade from Octopus to Pacific. I'm nowhere near Quincy
>> > yet. The original goal was to move from Nautilus to Quincy, but I have
>> > gone to Octopus (no problems) and now to Pacific (problems).
>>
>> I did not, apologies -- though do see my second message about mon/mgr
>> ordering...
>>
>> When you say "the cluster becomes unresponsive" -- does the client I/O
>> lock up, or do you mean that `ceph -s` and such hangs?
>>
>> It may help to look at the Pacific mons via the asok and see if they
>> respond in such a state (and what their status is), if I/O is not locked
>> up and you can afford to leave things that way for a couple of minutes:
>>
>> $ ceph daemon mon.name mon_status
>>
>> Cheers,
>> Tyler

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
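
[Editor's note] For the "which mgr is active" check suggested above, a minimal
sketch of one way to do it on a package-based (non-cephadm) deployment.
<short-hostname> is a placeholder for the host running a non-Octopus mgr, and
note that these `ceph ...` commands go through the mons, so they may hang in
the same way `ceph -s` does while the cluster is in that state:

$ ceph mgr stat                                  # reports "active_name", i.e. which mgr currently holds the active role
$ ceph versions                                  # breaks down running daemon versions, including the mgrs
$ sudo systemctl stop ceph-mgr@<short-hostname>  # run on each non-Octopus mgr host, leaving only the Octopus mgr up

Once only the Octopus mgr remains, the mon on that host can be stopped and
upgraded as Tyler describes.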