Re: One host down osd status error

Eugen Block <eblock@xxxxxx> · Fri, 21 Mar 2025 09:23:30 +0000

I know. Unfortunately, this has been an issue for two or three years
now. The first thing I (and many others) suggest when anything stops
working is to fail the mgr. My impression is that over the past years
more and more features were added to the mgr while the default configs
haven't changed, causing it to fail silently or at least misbehave. I
created a tracker [0] for one specific issue I saw on a customer
cluster last year. My theory is that, because the defaults are too
low, the mgr communication between MONs, OSDs, MGRs etc. gets flooded
and some messages get lost. But I haven't found a way to reproduce it
in test clusters yet, so it's still only a theory.

[0] https://tracker.ceph.com/issues/66310
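
For anyone who wants to script the "fail the mgr" step, here is a
minimal sketch. It is not taken from the tracker or from this thread.
It assumes the `ceph` CLI is on PATH with an admin keyring, and that
`ceph mgr stat` returns JSON containing `active_name`, `available` and
`num_standby`. The helper names are hypothetical. It uses `ceph mgr
fail` to trigger the failover instead of stopping the systemd unit, and
it refuses to act when no standby is reported.

```python
import json
import subprocess


def parse_mgr_stat(raw: str) -> dict:
    """Pick the fields of interest out of `ceph mgr stat` JSON output."""
    stat = json.loads(raw)
    return {
        "active": stat.get("active_name", ""),
        "available": bool(stat.get("available", False)),
        "standbys": int(stat.get("num_standby", 0)),
    }


def build_fail_command(stat: dict) -> list:
    """Return the `ceph mgr fail` command, or raise if a failover looks unsafe."""
    if not stat["active"]:
        raise RuntimeError("no active mgr reported")
    if stat["standbys"] < 1:
        raise RuntimeError("no standby mgr reported; nothing could take over")
    return ["ceph", "mgr", "fail", stat["active"]]


def fail_active_mgr(dry_run: bool = True) -> list:
    """Query the cluster and fail the active mgr (only if dry_run is False)."""
    raw = subprocess.run(
        ["ceph", "mgr", "stat", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    cmd = build_fail_command(parse_mgr_stat(raw))
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

Calling `fail_active_mgr()` with the default `dry_run=True` only returns
the command it would run, so it can be checked before anything is
failed over.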

Quoting Marcus <marcus@xxxxxxxxxx>:

Hi,
Thanks for the tip Eugen!!

I stopped the active mgr via systemd, so the cluster failed over to
another mgr. After this it all worked fine!
I started the mgr via systemd again and it came back up as a standby.
I suppose the mgr had some hiccup; I did not find anything specific
in the log.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
