Hi,
> I've checked, checked, and checked again that the individual config
> files all point towards the correct IP subnet for the monitors, and
> I cannot find any trace of the old subnet's IP address in any config
> file (that I can find).
What are those "individual config files"? The ones underneath
/var/lib/ceph/{FSID}/mgr.{MGR}/config? Did you also look in the config
store? I'd try something like:
ceph config dump | grep "192\.168\." (or whatever your IP range was)
ceph config get mgr public_network (just in case you accidentally used that)
ceph config get mon public_network (does it match your actual setup?)
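If the config store comes up clean, a recursive grep over the files on
each host might still turn something up - a sketch, assuming 192.168.
was the old range (adjust to yours):
grep -rn "192\.168\." /var/lib/ceph/
grep -rn "192\.168\." /etc/ceph/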
Could it be that you're trying to start the wrong MGRs? Maybe
leftovers from earlier failed attempts or something? Does
'cephadm ls --no-detail | grep mgr' on all hosts reveal more MGRs
than you expect?
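If that does reveal stray MGRs, something along these lines should
remove them (daemon name and FSID are placeholders, obviously):
cephadm rm-daemon --name mgr.<name> --fsid <FSID> --force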
One possible and relatively quick manual workaround would be to set
up an MGR the legacy way [1], which basically means adding a keyring
(that should work as long as the MONs have a quorum) and starting the
daemon:
ceph-mgr -i $name
Note that you'll need the ceph-mgr package installed on that host.
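The whole sequence from [1] would be roughly this, with $name as a
placeholder for whatever MGR ID you choose:
mkdir /var/lib/ceph/mgr/ceph-$name
ceph auth get-or-create mgr.$name mon 'allow profile mgr' \
  osd 'allow *' mds 'allow *' > /var/lib/ceph/mgr/ceph-$name/keyring
ceph-mgr -i $name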
You could then convert it with cephadm. But maybe it's not necessary
if you get the existing containers up.
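For the later conversion, cephadm has an adopt command for legacy
daemons, something like (again, $name is a placeholder):
cephadm adopt --style legacy --name mgr.$name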
[1] https://docs.ceph.com/en/nautilus/mgr/administrator/#manual-setup
Quoting duluxoz <duluxoz@xxxxxxxxx>:
Hi All,
I don't know how it's happened (bad backup/restore, bad config file
somewhere, I don't know) but my (DEV) Ceph Cluster is in a very bad
state, and I'm looking for pointers/help in getting it back running
(unfortunately, a complete rebuild/restore is *not* an option).
This is on Ceph Reef (on Rocky 9) which was converted to CephAdm
from a manual install a few weeks ago (which worked). Five days ago
everything went "t!ts-up" (an Ozzie technical ICT term meaning
nothing works :-) )
So, my (first?) issue is that I can't get any Managers to come up
clean. Each one tries to connect on an IP subnet that hasn't existed
for a couple of years.
The second issue is that (possibly because of the first) every `ceph
orch` command just hangs. Cephadm commands work fine.
I've checked, checked, and checked again that the individual config
files all point towards the correct IP subnet for the monitors, and
I cannot find any trace of the old subnet's IP address in any config
file (that I can find).
For the record I am *not* a "podman guy" so there may be something
there that's causing my issue(s?) but I don't know where to even
begin to look.
Any/all logs simply show that the Manager(s) try to come up, can't
find an address in the "old" subnet, and so fail - nothing else
helpful (at least to me).
I've even pulled a copy of the monmap and it's showing the correct
IP subnet addresses for the monitors.
The firewalls are all set correctly and audit2allow shows nothing is
out of place, as does disabling SELinux (i.e. no change).
A `ceph -s` shows I've got no active managers (and that a monitor is
down - that's my third issue), plus a whole bunch of OSDs and PGs
aren't happy either. I have, though, got a monitor quorum.
So, what should I be looking at / where should I be looking? Any
help is greatly *greatly* appreciated.
Cheers
Dulux-Oz
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx