Re: Problem with upgrade

Eugen Block <eblock@xxxxxx> · Fri, 27 Oct 2023 06:32:07 +0000

What do the two Pacific monitors log when you stop the Octopus mon?  
And when all of them are up, which one is the leader?

ceph mon stat --format json | jq

Zitat von Tyler Stachecki <stachecki.tyler@xxxxxxxxx>:

On Thu, Oct 26, 2023, 8:11 PM Jorge Garcia <jgarcia@xxxxxxxxxxxx> wrote:

Oh, I meant that "ceph -s" just hangs. I didn't even try to look at the
I/O. Maybe I can do that, but the "ceph -s" hang just freaked me out.

Also, I know that the recommended order is mon->mgr->osd->mds->rgw, but
when you run mgr on the same hardware as the monitors, it's hard to not
upgrade both at the same time. Particularly if you're upgrading the whole
machine at once. Here's where upgrading to the new container method will
help a lot! FWIW, the managers seem to be running fine.

I recently did something like this, so I understand that it's difficult.
Most of my testing and prep-work was centered around exactly this problem,
which was avoided by first upgrading mons/mgrs to an interim OS while
remaining on Octopus -- solely for the purposes of opening an avenue from
Octopus to Quincy separate from tbe OS upgrade.

In my pre-prod resting, trying to upgrade the mons/mgrs without that middle
step that allowed mgrs to be upgraded separately did result in `ceph -s`
locking up. Client I/O remained non-impacted in this state though.

Maybe look at which mgr is active and/or try stopping all but the Octopus
mgr when stopping the mon as well?

Cheers,
Tyler

On Thu, Oct 26, 2023 at 4:57 PM Tyler Stachecki <stachecki.tyler@xxxxxxxxx>
wrote:

On Thu, Oct 26, 2023 at 6:52 PM Jorge Garcia <jgarcia@xxxxxxxxxxxx>
wrote:
>
> Hi Tyler,
>
> Maybe you didn't read the full message, but in the message you will
notice that I'm doing exactly that, and the problem just occurred when I
was doing the upgrade from Octopus to Pacific. I'm nowhere near Quincy yet.
The original goal was to move from Nautilus to Quincy, but I have gone to
Octopus (no problems) and now to Pacific (problems).

I did not, apologies -- though do see my second message about ordering
mon/mgr ordering...

When you say "the cluster becomes unresponsive" -- does the client I/O
lock up, or do you mean that `ceph -s` and such hangs?

May help to look to Pacific mons via the asok and see if they respond
in such a state (and their status) if I/O is not locked up and you can
afford to leave it in that state for a couple minutes:
$ ceph daemon mon.name mon_status

Cheers,
Tyler

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx