Re: Ceph Cluster Config File Locations?

Hi,

> I've checked, checked, and checked again that the individual config files all point towards the correct ip subnet for the monitors, and I cannot find any trace of the old subnet's ip address in any config file (that I can find).

What are those "individual config files"? The ones underneath /var/lib/ceph/{FSID}/mgr.{MGR}/config? Did you also look in the config store? I'd try something like:

ceph config dump | grep "192\.168\."   # or whatever your old IP range was
ceph config get mgr public_network     # just in case you accidentally set that
ceph config get mon public_network     # does it match your actual setup?
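
If it helps, the on-disk files can be swept for the old range in one go as well. This is only a rough sketch: find_old_subnet is a made-up helper name, and the /var/lib/ceph path and the "192.168." prefix in the example are assumptions you'd adapt to your setup.

```shell
# Hypothetical helper: list every text file under a directory that
# mentions a given (literal) subnet prefix.
find_old_subnet() {
    dir=$1
    prefix=$2    # literal prefix, e.g. "192.168."
    # -r recurse, -I skip binary files, -l print file names only,
    # -F treat the prefix as a fixed string (so dots are not wildcards)
    grep -rIlF -- "$prefix" "$dir" 2>/dev/null | sort
}

# Example (path and prefix are assumptions):
#   find_old_subnet /var/lib/ceph "192.168."
```

Any file it prints would be worth opening to see which daemon it belongs to.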

Could it be that the MGRs you're trying to start aren't the ones you think they are, for example leftovers from earlier failed attempts? Does 'cephadm ls --no-detail | grep mgr' on all hosts reveal more MGRs than you expect?
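
To make that comparison across hosts a bit easier, something like the following could reduce the output to just the daemon names. It is a sketch under assumptions: list_mgr_names is a made-up helper, and it assumes 'cephadm ls' prints JSON entries with a "name" field such as "mgr.<host>.<id>".

```shell
# Hypothetical helper: read `cephadm ls` JSON on stdin and print the
# unique MGR daemon names, one per line.
list_mgr_names() {
    grep -o '"name": *"mgr\.[^"]*"' \
        | sed 's/.*"\(mgr\.[^"]*\)"/\1/' \
        | sort -u
}

# Example, to be run on each host:
#   cephadm ls --no-detail | list_mgr_names
```

Anything in that list that you don't recognise is a candidate for a stale daemon directory.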

One possible and relatively quick manual workaround would be to set up a MGR the legacy way [1], which basically means adding a keyring (that should work if the MONs have a quorum) and starting the daemon:

ceph-mgr -i $name

Note that you'll need the ceph-mgr package installed on that host. You could then convert it with cephadm. But maybe it's not necessary if you get the existing containers up.
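
Roughly, the manual steps from [1] could look like the sketch below. It only prints the commands so you can review them before running anything. print_mgr_setup is a made-up helper name, and the data path assumes the default cluster name "ceph"; the caps are the ones given in the linked docs.

```shell
# Hypothetical dry-run helper: print the manual MGR setup commands for
# a given daemon name instead of executing them.
print_mgr_setup() {
    name=$1
    # Assumption: default cluster name "ceph", so the legacy data path
    # is /var/lib/ceph/mgr/ceph-<name>
    data_dir="/var/lib/ceph/mgr/ceph-$name"
    cat <<EOF
mkdir -p $data_dir
ceph auth get-or-create mgr.$name mon 'allow profile mgr' osd 'allow *' mds 'allow *' -o $data_dir/keyring
ceph-mgr -i $name
EOF
}

# Example: print_mgr_setup "$(hostname -s)"
```

Depending on which user the daemon runs as, the ownership of the data directory may need adjusting as well.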

[1] https://docs.ceph.com/en/nautilus/mgr/administrator/#manual-setup

Quoting duluxoz <duluxoz@xxxxxxxxx>:

Hi All,

I don't know how it's happened (bad backup/restore, bad config file somewhere, I don't know) but my (DEV) Ceph Cluster is in a very bad state, and I'm looking for pointers/help in getting it back running (unfortunately, a complete rebuild/restore is *not* an option).

This is on Ceph Reef (on Rocky 9) which was converted to CephAdm from a manual install a few weeks ago (which worked). Five days ago everything went "t!ts-up" (an Ozzie technical ICT term meaning nothing works :-) )

So, my (first?) issue is that I can't get any Managers to come up clean. Each one tries to connect on an ip subnet which doesn't exist any longer and hasn't for a couple of years.

The second issue is that (possibly because of the first) every `ceph orch` command just hangs. Cephadm commands work fine.

I've checked, checked, and checked again that the individual config files all point towards the correct ip subnet for the monitors, and I cannot find any trace of the old subnet's ip address in any config file (that I can find).

For the record I am *not* a "podman guy" so there may be something there that's causing my issue(s?) but I don't know where to even begin to look.

Any/all logs simply state that the Manager(s) try to come up, can't find an address in the "old" subnet, and so fail - nothing else helpful (at least to me).

I've even pulled a copy of the monmap and it's showing the correct ip subnet addresses for the monitors.

The firewalls are all set correctly and an audit2allow shows nothing is out of place, as does disabling SELinux (ie no change).

A `ceph -s` shows I've got no active managers (and that a monitor is down - that's my third issue), plus a whole bunch of osds and pgs aren't happy either. I have, though, got a monitor quorum.

So, what should I be looking at / where should I be looking? Any help is greatly *greatly* appreciated.

Cheers

Dulux-Oz
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

