Ah yes, this is a real classic ;-) I assume that after bootstrapping
the first node no update to the ceph.conf was done. Anyway, good luck
with the rest of the upgrade!
Quoting Jorge Garcia <jgarcia@xxxxxxxxxxxx>:
I think I figured it out. The problem was that my ceph.conf file only
listed the first machine in mon_initial_members and in mon_host. I'm not
sure why. I added the other monitors, restarted the monitors and the
managers, and everything is now working as expected. I have now upgraded
all the monitors and all the managers to Pacific and Rocky 9. Now on to the
OSDs. Well, maybe next week...
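For anyone who hits the same thing, the change was roughly the following
(host names and IPs below are just placeholders, not our real ones), plus a
restart of the mon and mgr units on each node -- we're still on package
installs with systemd, not cephadm:

  [global]
  mon_initial_members = mon1, mon2, mon3
  mon_host = 10.0.0.1, 10.0.0.2, 10.0.0.3

  $ systemctl restart ceph-mon.target ceph-mgr.target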
On Thu, Oct 26, 2023 at 5:37 PM Tyler Stachecki <stachecki.tyler@xxxxxxxxx>
wrote:
On Thu, Oct 26, 2023, 8:11 PM Jorge Garcia <jgarcia@xxxxxxxxxxxx> wrote:
Oh, I meant that "ceph -s" just hangs. I didn't even try to look at the
I/O. Maybe I can do that, but the "ceph -s" hang just freaked me out.
Also, I know that the recommended order is mon->mgr->osd->mds->rgw, but
when you run mgr on the same hardware as the monitors, it's hard to not
upgrade both at the same time. Particularly if you're upgrading the whole
machine at once. Here's where upgrading to the new container method will
help a lot! FWIW, the managers seem to be running fine.
I recently did something like this, so I understand that it's difficult.
Most of my testing and prep-work was centered around exactly this problem,
which was avoided by first upgrading mons/mgrs to an interim OS while
remaining on Octopus -- solely for the purpose of opening an avenue from
Octopus to Quincy separate from the OS upgrade.
In my pre-prod testing, trying to upgrade the mons/mgrs without that
middle step that allowed the mgrs to be upgraded separately did result in
`ceph -s` locking up. Client I/O remained unaffected in this state, though.
Maybe look at which mgr is active and/or try stopping all but the Octopus
mgr when stopping the mon as well?
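Something along these lines, for example (the mgr name is a placeholder, and
this assumes package-based systemd units rather than cephadm):

  $ ceph mgr stat                    # shows which mgr is currently active
  $ systemctl stop ceph-mgr@<name>   # on the hosts running the non-Octopus mgrs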
Cheers,
Tyler
On Thu, Oct 26, 2023 at 4:57 PM Tyler Stachecki <
stachecki.tyler@xxxxxxxxx> wrote:
On Thu, Oct 26, 2023 at 6:52 PM Jorge Garcia <jgarcia@xxxxxxxxxxxx>
wrote:
>
> Hi Tyler,
>
> Maybe you didn't read the full message, but you will notice that I'm
> doing exactly that, and the problem just occurred when I was doing the
> upgrade from Octopus to Pacific. I'm nowhere near Quincy yet.
> The original goal was to move from Nautilus to Quincy, but I have gone to
> Octopus (no problems) and now to Pacific (problems).
I did not, apologies -- though do see my second message about mon/mgr
ordering...
When you say "the cluster becomes unresponsive" -- does the client I/O
lock up, or do you mean that `ceph -s` and such hangs?
It may help to look at the Pacific mons via the asok and see if they
respond in such a state (and what their status is), if I/O is not locked up
and you can afford to leave it like that for a couple of minutes:
$ ceph daemon mon.name mon_status
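If the asok does respond, quorum_status over the same socket is also worth
a look, to see whether the mons still agree on a quorum:

$ ceph daemon mon.name quorum_status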
Cheers,
Tyler
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx