Re: "ceph orch" not working anymore

Thanks for the update. Too bad you didn't find a way around it. I guess it would require a real deep dive into the systems to understand what actually happened, which is unfortunately difficult to do via mail. And of course there's a chance you'll hit this issue again, which I hope won't happen.

Quoting Malte Stroem <malte.stroem@xxxxxxxxx>:

Hello Eugen,

thanks a lot. We got our down time today to work on the cluster.

However, nothing worked. Even with Ceph 19.

None of the ceph orch commands work.

Error ENOENT: No orchestrator configured (try `ceph orch set backend`)

This has nothing to do with osd_remove_queue.

Restoring the MON quorum with three MONs, and also running three MGRs with Squid, did not help at all.

I still think this can be fixed somehow. Perhaps by editing the mon store, but I don't know where.

We decided to deploy a new cluster since backups are available.

Thanks again everybody.

Best,
Malte

On 18.10.24 16:37, Eugen Block wrote:
Hi Malte,

So I would suggest bringing up a new MGR, issuing a failover to that MGR, and seeing if you get the orchestrator to work again. It should suffice to change the container_image in the unit.run file (/var/lib/ceph/{FSID}/mgr.{MGR}/unit.run):

CONTAINER_IMAGE={NEWER IMAGE}

So stop one MGR, change the container image, start it, and make sure it takes over as the active MGR.
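The unit.run edit described above can be sketched as follows. This is a minimal illustration, not a tested procedure: the image tags and the sample file are placeholders, and the real file lives at /var/lib/ceph/{FSID}/mgr.{MGR}/unit.run.

```python
import os
import re
import tempfile

def swap_container_image(unit_run_path, new_image):
    """Replace the CONTAINER_IMAGE= line in a cephadm unit.run file."""
    with open(unit_run_path) as f:
        text = f.read()
    # rewrite only the CONTAINER_IMAGE line, leaving the rest of the file intact
    text = re.sub(r"^CONTAINER_IMAGE=.*$",
                  "CONTAINER_IMAGE=" + new_image, text, flags=re.M)
    with open(unit_run_path, "w") as f:
        f.write(text)

# demo on a stand-in file; image names below are made up for illustration
path = os.path.join(tempfile.mkdtemp(), "unit.run")
with open(path, "w") as f:
    f.write("set -e\nCONTAINER_IMAGE=quay.io/ceph/ceph:v18.2.4\n")
swap_container_image(path, "quay.io/ceph/ceph:v19.2.0")
print(open(path).read())
```

On a real node you would stop the MGR's systemd unit first, edit the file, start the unit again, and then fail over to that MGR.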

But I would like to know if I could replace the cephadm on one running node, stop the MGR and deploy a new MGR on that node with this:

https://docs.ceph.com/en/latest/cephadm/troubleshooting/#manually-deploying-a-manager-daemon

cephadm --image <container-image> deploy --fsid <fsid> --name mgr.hostname.smfvfd --config-json config-json.json

This approach probably works as well, but I haven't tried that yet.

And I still do not know what places cephadm... under /var/lib/ceph/fsid.

Does that happen when I enable the orchestrator in the MGR?

And can I replace that cephadm by hand?

The orchestrator would automatically download the respective cephadm image into that directory if you changed the container_image config value(s). But I wouldn't do that, because you could break your cluster: if a MON, OSD or some other Ceph daemon needed to be redeployed for some reason, you would effectively upgrade it. That's why I suggest starting only a single MGR daemon with a newer version to see how it goes. If you get the orchestrator to work again, I would "downgrade" it again and see what happens next.


Quoting Eugen Block <eblock@xxxxxx>:

I’m on a mobile phone, so I can’t go into much detail right now.
But I don’t think it’s necessary to rebuild an entire node, just a MGR. Otherwise you risk cluster integrity if you redeploy a MON with a newer image as well. I’ll respond later in more detail.

Quoting Malte Stroem <malte.stroem@xxxxxxxxx>:

Well, thank you, Eugen. That is what I planned to do.

Rebuild the broken node and start a MON and a MGR there with the latest images. Then I will stop the other MGRs and have a look if it's working.

But I would like to know if I could replace the cephadm on one running node, stop the MGR and deploy a new MGR on that node with this:

https://docs.ceph.com/en/latest/cephadm/troubleshooting/#manually-deploying-a-manager-daemon

cephadm --image <container-image> deploy --fsid <fsid> --name mgr.hostname.smfvfd --config-json config-json.json

And I still do not know what places cephadm... under /var/lib/ceph/fsid.

Does that happen when I enable the orchestrator in the MGR?

And can I replace that cephadm by hand?

Best,
Malte

On 18.10.24 12:11, Eugen Block wrote:
Okay, then I misinterpreted your former statement:

 I think there are entries of the OSDs from the broken node we removed.

So the stack trace in the log points to the osd_remove_queue, but I don't understand why it's empty. Is there still some OSD removal going on or something? Did you paste your current cluster status already? You could probably try starting a Squid mgr daemon by replacing the container image in the unit.run file and see how that goes.

Quoting Malte Stroem <malte.stroem@xxxxxxxxx>:

Hello Eugen,

thanks a lot. However:

ceph config-key get mgr/cephadm/osd_remove_queue

is empty!

Damn.

So should I get a new cephadm with the diff included?

Best,
Malte

On 17.10.24 23:48, Eugen Block wrote:
Save the current output to a file:

ceph config-key get mgr/cephadm/osd_remove_queue > remove_queue.json

Then remove the original_weight key from the JSON and set the modified value again with:
ceph config-key set …
Then fail the mgr.
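The edit between the `get` and the `set` can be sketched like this. A minimal sketch of the fix from the tracker issue, assuming the value of mgr/cephadm/osd_remove_queue is a JSON list of OSD dicts; field names other than original_weight are made up for illustration.

```python
import json

def strip_original_weight(raw):
    """Drop the 'original_weight' field from every OSD entry in the
    mgr/cephadm/osd_remove_queue JSON (a list of OSD dicts)."""
    osds = json.loads(raw)
    for osd in osds:
        # remove the field the cephadm module chokes on; ignore entries without it
        osd.pop("original_weight", None)
    return json.dumps(osds)

# sample queue entry (illustrative only)
raw = json.dumps([{"osd_id": 12, "draining": False, "original_weight": 1.7}])
fixed = strip_original_weight(raw)
print(fixed)
# afterwards: ceph config-key set mgr/cephadm/osd_remove_queue '<fixed JSON>'
# and then fail the mgr
```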

Quoting Malte Stroem <malte.stroem@xxxxxxxxx>:

Hello Frederic, Hello Eugen,

yes, but I am not sure how to do it.

The link says:

the config-key responsible was mgr/cephadm/osd_remove_queue

This is what it looked like before.  After removing the original_weight field and setting the variable again, the cephadm module loads and orch works.

So now: Do I remove the value of mgr/cephadm/osd_remove_queue?

Or:

What is meant by:

"After removing the original_weight field and setting the variable again, the cephadm module loads and orch works."

I can enter a MGR's container and open the file:

/usr/share/ceph/mgr/cephadm/services/osd.py

But what is meant by "removing the original_weight field and setting the variable again" and what JSON do you mean, Eugen?

osd_obj = OSD.from_json(osd, rm_util=self.rm_util)

Code looks like this:

def load_from_store(self) -> None:
    with self.lock:
        for k, v in self.mgr.get_store_prefix('osd_remove_queue').items():
            for osd in json.loads(v):
                logger.debug(f"Loading osd ->{osd} from store")
                osd_obj = OSD.from_json(osd, rm_util=self.rm_util)
                if osd_obj is not None:
                    self.osds.add(osd_obj)

I am a bit lost here.

Best,
Malte

On 17.10.24 21:50, Eugen Block wrote:
I appreciate your kind words. 😎🙂
Frederics link has the correct answer, remove the respective field from the json.

Quoting Malte Stroem <malte.stroem@xxxxxxxxx>:

You're so cool, Eugen. Somehow you seem to find out everything.

Yes, this seems to be the issue and I suspected a bug there.

Looking here:

https://github.com/ceph/ceph/blob/main/src/pybind/mgr/cephadm/services/osd.py

The diff is included in the code.

What can I do now? Get the latest cephadm and put it on the node?

What about the cephadm under /var/lib/ceph/fsid?

I am not sure how to continue.

I would download the latest cephadm and put it under /usr/sbin.

Then disable the module with

ceph mgr module disable cephadm

and enable it

ceph mgr module enable cephadm

Best,
Malte

On 17.10.24 19:20, Eugen Block wrote:
Oh, why didn’t you mention earlier that you removed OSDs? 😄 It sounds like this one:

https://tracker.ceph.com/issues/67329

Quoting Malte Stroem <malte.stroem@xxxxxxxxx>:

Hello Redouane,

thank you. Interesting.

ceph config-key dump

shows about 42000 lines.

What can I search for? Something with OSDs.

But there are thousands of entries.

And if I find something, how can I fix that?

I think there are entries of the OSDs from the broken node we removed.

Best,
Malte

On 17.10.24 17:46, Redouane Kachach wrote:
So basically it's failing here:

 self.to_remove_osds.load_from_store()

This function is responsible for loading specs from the mon-store. The
information is stored in JSON format, and it seems the stored JSON for
the OSD(s) is not valid for some reason. You can see what's stored in
the mon-store by running:

ceph config-key dump

Don't share the information publicly here, especially if it's a
production cluster, as it may contain sensitive information about your cluster.
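Since the dump is huge, you can narrow it down before looking at anything. A hedged sketch, assuming `ceph config-key dump` prints a JSON object of key -> value (the sample dump below is made up):

```python
import json

def find_osd_remove_keys(dump_json):
    """Filter a config-key dump for cephadm OSD-removal entries."""
    dump = json.loads(dump_json)
    # keep only keys mentioning the OSD removal queue
    return {k: v for k, v in dump.items() if "osd_remove" in k}

# illustrative stand-in for `ceph config-key dump` output
sample = json.dumps({
    "mgr/cephadm/osd_remove_queue": "[{\"osd_id\": 12}]",
    "config/mon/debug_mon": "10",
})
hits = find_osd_remove_keys(sample)
print(list(hits))
```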

Best,
Redo.






On Thu, Oct 17, 2024 at 5:04 PM Malte Stroem <malte.stroem@xxxxxxxxx> wrote:

Thanks Eugen & Redouane,

of course I tried enabling and disabling the cephadm module for the MGRs.

Running ceph mgr module enable cephadm produces this output in the MGR log:

-1 mgr load Failed to construct class in 'cephadm'
-1 mgr load Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 619, in __init__
    self.to_remove_osds.load_from_store()
  File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 922, in load_from_store
    for osd in json.loads(v):
  File "/lib64/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/lib64/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/lib64/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

-1 mgr operator() Failed to run module in active mode ('cephadm')

This comes from inside the MGR container, because it's Python 3.9. On the
hosts it's Python 3.11.
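For what it's worth, the exact error in that traceback is what json.loads produces when it is fed an empty string, which is consistent with the config-key holding no valid JSON. A minimal reproduction:

```python
import json

# json.loads("") fails at the very first character, matching the log
try:
    json.loads("")
except json.JSONDecodeError as e:
    msg = str(e)

print(msg)  # Expecting value: line 1 column 1 (char 0)
```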

I am thinking of redeploying an MGR.

Can I stop the existing MGRs?

Redeploying with ceph orch does not work, of course, but I think this
will work:


https://docs.ceph.com/en/latest/cephadm/troubleshooting/#manually-deploying-a-manager-daemon

because standalone cephadm is working, crazy as it sounds.

What do you think?

Best,
Malte

On 17.10.24 12:49, Eugen Block wrote:
Hi,

if you just execute cephadm commands, those are issued locally on the
hosts; they won't confirm an orchestrator issue.
What does the active MGR log? It could show a stack trace or error
messages which could point to a root cause.

What about the cephadm files under /var/lib/ceph/fsid? Can I replace
the latest?

Those are the cephadm versions the orchestrator actually uses; it will
just download them again from your registry (or upstream).
Can you share:

ceph -s
ceph versions
MGR logs (active MGR)

Thanks,
Eugen

Quoting Malte Stroem <malte.stroem@xxxxxxxxx>:

Hello,

I am still struggling here and do not know the root cause of this issue.

Searching the list, I found lots of people who have had the same or a
similar problem over the last few years.

However, there is no solution for our cluster.

Disabling and enabling the cephadm module does not work. There are no error messages. When we run "ceph orch..." we get the error message:

Error ENOENT: No orchestrator configured (try `ceph orch set backend`)

But every single cephadm command works!

cephadm ls for example.

Stopping and restarting the MGRs did not help. Removing the .asok
files did not help.

I am thinking of stopping both MGRs and trying to deploy a new MGR like this:

https://docs.ceph.com/en/latest/cephadm/troubleshooting/#manually-deploying-a-manager-daemon

How can I find the root cause? Is cephadm somehow broken?

What about the cephadm files under /var/lib/ceph/fsid? Can I replace
the latest?

Best,
Malte

On 16.10.24 14:54, Malte Stroem wrote:
Hi Laimis,

that did not work. ceph orch still does not work.

Best,
Malte

On 16.10.24 14:12, Malte Stroem wrote:
Thank you, Laimis.

And you got the same error message? That's strange.

In the meantime I will check for connected clients. No Kubernetes
and no CephFS, but RGWs.

Best,
Malte

On 16.10.24 14:01, Laimis Juzeliūnas wrote:
Hi Malte,

We faced this recently when upgrading to Squid from the latest Reef. As a temporary workaround we disabled the balancer with ‘ceph
balancer off’ and restarted the mgr daemons.
We suspect older clients (from Kubernetes RBD mounts as well as CephFS mounts) on servers with incompatible client versions, but
have yet to dig through it.

Best,
Laimis J.

On 16 Oct 2024, at 14:57, Malte Stroem <malte.stroem@xxxxxxxxx>
wrote:

Error ENOENT: No orchestrator configured (try `ceph orch set
backend`)

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx