Based on my experience, that error comes from one of three possible causes:
1. The machine in question doesn't have proper security keys
2. The machine in question is short on resources - especially RAM
3. The machine in question has its brains scrambled: cosmic rays
flipping critical RAM bits, bugs in OS software, whatever. Rebooting is
the only fix for that.
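For what it's worth, the traceback you posted dies at `assert metadata` in the mgr status module, which suggests the mgr simply has no metadata cached for the OSDs on the host that went down. A minimal sketch of that failure mode (the function body and data shapes here are my illustration, not the actual module code) and of how a tolerant version could degrade gracefully instead of crashing:

```python
# Hypothetical sketch of the mgr status module's per-OSD loop.
# metadata_by_osd stands in for the mgr's cached OSD metadata;
# entries for OSDs on a dead host may be missing entirely.

def handle_osd_status(osd_ids, metadata_by_osd):
    rows = []
    for osd_id in osd_ids:
        metadata = metadata_by_osd.get(osd_id)
        # The real module does `assert metadata`, so one missing entry
        # aborts the whole command with AssertionError / EINVAL.
        # A tolerant version would emit a placeholder row instead:
        if not metadata:
            rows.append((osd_id, "unknown", "down"))
            continue
        rows.append((osd_id, metadata["hostname"], "up"))
    return rows
```

If that is what's happening, the command should start working again once the host is back (or its OSDs are removed from the map) and the mgr repopulates its metadata.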
On 3/20/25 07:10, Marcus wrote:
Hi all,
We are running a Ceph cluster with a filesystem, spanning 5 servers.
Ceph version: 19.2.0 squid
If I run: ceph osd status when all hosts are online, the output is as it
should be and it prints the status of all OSDs. If just a couple of OSDs
are down, the status is still printed and those specific OSDs are marked
as down.
One of the servers went down and we ended up with a health warning.
If I run: ceph osd stat
I get the information that 64 out of 80 OSDs are in.
If I try to run: ceph osd status
I get a Python error:
Error EINVAL: Traceback (most recent call last):
File "/usr/share/ceph/mgr/mgr_module.py", line 1864, in _handle_command
return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/share/ceph/mgr/mgr_module.py", line 499, in call
return self.func(mgr, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/share/ceph/mgr/status/module.py", line 337, in handle_osd_status
assert metadata
AssertionError
I suppose this is some type of bug that appears when one host is down?
Thanks!
Marcus
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx