Hi,
> I've checked, checked, and checked again that the individual config
> files all point towards the correct IP subnet for the monitors, and
> I cannot find any trace of the old subnet's IP address in any config
> file (that I can find).
What are those "individual config files"? The ones underneath
/var/lib/ceph/{FSID}/mgr.{MGR}/config? Did you also look in the config
store? I'd try something like:
ceph config dump | grep "192\.168\." (or whatever your IP range was)
ceph config get mgr public_network (just in case you accidentally used that)
ceph config get mon public_network (does it match your actual setup?)
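If the config store comes up clean, a recursive grep over the files on
each host might still turn something up - a sketch, assuming 192.168.
was the old range (adjust to yours):
grep -rn "192\.168\." /var/lib/ceph/
grep -rn "192\.168\." /etc/ceph/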
Could it be that you're trying to start the wrong MGRs? Maybe
leftovers from earlier failed attempts or something? Does
'cephadm ls --no-detail | grep mgr' on all hosts reveal more MGRs
than you expect?
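If that does reveal stray MGRs, something along these lines should
remove them (daemon name and FSID are placeholders, obviously):
cephadm rm-daemon --name mgr.<name> --fsid <FSID> --force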
One possible and relatively quick manual workaround would be to set
up an MGR the legacy way [1], which basically means adding a keyring
(that should work as long as the MONs have a quorum) and starting the
daemon:
ceph-mgr -i $name
Note that you'll need the ceph-mgr package installed on that host.
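The whole sequence from [1] would be roughly this, with $name as a
placeholder for whatever MGR ID you choose:
mkdir /var/lib/ceph/mgr/ceph-$name
ceph auth get-or-create mgr.$name mon 'allow profile mgr' \
  osd 'allow *' mds 'allow *' > /var/lib/ceph/mgr/ceph-$name/keyring
ceph-mgr -i $name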
You could then convert it with cephadm. But maybe it's not necessary
if you get the existing containers up.
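For the later conversion, cephadm has an adopt command for legacy
daemons, something like (again, $name is a placeholder):
cephadm adopt --style legacy --name mgr.$name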
[1] https://docs.ceph.com/en/nautilus/mgr/administrator/#manual-setup
Quoting duluxoz <duluxoz@xxxxxxxxx>:
Hi All,
I don't know how it's happened (bad backup/restore, bad config file
somewhere, I don't know) but my (DEV) Ceph Cluster is in a very bad
state, and I'm looking for pointers/help in getting it back running
(unfortunately, a complete rebuild/restore is *not* an option).
This is on Ceph Reef (on Rocky 9) which was converted to CephAdm
from a manual install a few weeks ago (which worked). Five days ago
everything went "t!ts-up" (an Ozzie technical ICT term meaning
nothing works :-) )
So, my (first?) issue is that I can't get any Managers to come up
clean. Each one tries to connect on an IP subnet that hasn't existed
for a couple of years.
The second issue is that (possibly because of the first) every `ceph
orch` command just hangs. Cephadm commands work fine.
I've checked, checked, and checked again that the individual config
files all point towards the correct IP subnet for the monitors, and
I cannot find any trace of the old subnet's IP address in any config
file (that I can find).
For the record I am *not* a "podman guy" so there may be something
there that's causing my issue(s?) but I don't know where to even
begin to look.
Any/all logs simply show that the Manager(s) try to come up, can't
find an address in the "old" subnet, and so fail - nothing else
helpful (at least to me).
I've even pulled a copy of the monmap and it's showing the correct
IP subnet addresses for the monitors.
The firewalls are all set correctly and audit2allow shows nothing is
out of place, as does disabling SELinux (i.e. no change).
A `ceph -s` shows I've got no active managers (and that a monitor is
down - that's my third issue), plus a whole bunch of OSDs and PGs
aren't happy either. I have, though, got a monitor quorum.
So, what should I be looking at / where should I be looking? Any
help is greatly *greatly* appreciated.
Cheers
Dulux-Oz
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx