Re: Error ENOENT: Module not found


 



I suggest wiping the removed OSDs and letting cephadm recreate them; that is probably easier than trying to reintegrate them. You also seem to be using encrypted OSDs, which can be trickier to add back manually than "regular" OSDs.
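
For reference, a rough sketch of that approach (host name and device path are placeholders, and zapping destroys whatever is left on the device, so double-check you have the right one):

ceph orch device zap <host> /dev/<device> --force

If the drive shows up as available afterwards and your OSD service spec still matches it, cephadm should redeploy the OSD on its own.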

But if the failure domain is at host level, and we remove a host and its OSDs, should it not recover?

If you had enough hosts, it would recover. But with your EC profile you need 5 hosts and you removed one, so only your replicated pools (size 3) would be able to recover.
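
If you want to double-check the numbers, something along these lines should show the profile and its failure domain (pool and profile names below are placeholders):

ceph osd pool get <ec-pool> erasure_code_profile
ceph osd erasure-code-profile get <profile-name>
ceph osd pool get <ec-pool> crush_rule

With k=3, m=2 and crush-failure-domain=host, CRUSH needs five distinct hosts to place all five chunks.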

Quoting Devender Singh <devender@xxxxxxxxxx>:

Hello Eugen

I re-added my node but am facing an auth issue with the OSDs. On the host I can see some of the OSDs up and running, but they are not showing in the dashboard under OSD.

“inutes ago - daemon:osd.101
 auth get failed: failed to find osd.101 in keyring retval: -2”

# bash unit.run
--> Failed to activate via raw: did not find any matching OSD to activate
--> Running ceph config-key get dm-crypt/osd/b2781be2-010d-485b-82de-65f869563eaf/luks
Running command: /usr/bin/ceph --cluster ceph --name client.osd-lockbox.b2781be2-010d-485b-82de-65f869563eaf --keyring /var/lib/ceph/osd/ceph-101/lockbox.keyring config-key get dm-crypt/osd/b2781be2-010d-485b-82de-65f869563eaf/luks
 stderr: 2025-01-26T02:48:38.010+0000 7f3caaffd640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
 stderr: 2025-01-26T02:48:38.010+0000 7f3caa7fc640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
 stderr: 2025-01-26T02:48:38.010+0000 7f3cab7fe640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
 stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
--> Failed to activate via LVM: Unable to retrieve dmcrypt secret
--> Failed to activate via simple: 'Namespace' object has no attribute 'json_config'
--> Failed to activate any OSD(s)
debug 2025-01-26T02:48:38.506+0000 7f1bccacc640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
debug 2025-01-26T02:48:38.506+0000 7f1bcd2cd640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
debug 2025-01-26T02:48:41.506+0000 7f1bcd2cd640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
failed to fetch mon config (--no-mon-config to skip)
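
(For what it's worth, the "failed to find osd.101 in keyring" and lockbox errors look like the cluster no longer has auth entries for these OSDs. Assuming the OSD IDs were purged along with the host, something like the following should show whether the keys and dm-crypt secret still exist; the entity names are taken from the log above:

ceph auth get osd.101
ceph auth get client.osd-lockbox.b2781be2-010d-485b-82de-65f869563eaf
ceph config-key get dm-crypt/osd/b2781be2-010d-485b-82de-65f869563eaf/luks

If these return ENOENT, the original keys and secrets are gone, which would point towards zapping and redeploying rather than reactivating.)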


But if the failure domain is at host level, and we remove a host and its OSDs, should it not recover?

Regards
Dev


On Jan 25, 2025, at 1:34 PM, Eugen Block <eblock@xxxxxx> wrote:

Hi,

But now the issue is that my cluster is showing misplaced objects, whereas I had 5 nodes with a host failure domain, with an R3 pool (size 3, min_size 2) and EC 3+2.

The math is pretty straightforward: with 5 chunks (k=3, m=2) you need (at least) 5 hosts. So you should add the host back to be able to recover. I would even suggest adding two more hosts so you can sustain the failure of one entire host. There are ways to recover in the current state (change the failure domain to OSD via a crush rule), but I really don't recommend that; I only mention it for the sake of completeness. I strongly suggest re-adding the fifth host (and thinking about adding a sixth).
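
Just to make the "for completeness" option concrete: the idea would be to create a second EC rule with the same k+m but an OSD-level failure domain and point the pool at it, roughly like this (profile, rule and pool names are made up):

ceph osd erasure-code-profile set ec32-osd k=3 m=2 crush-failure-domain=osd
ceph osd crush rule create-erasure ec32-osd-rule ec32-osd
ceph osd pool set <ec-pool> crush_rule ec32-osd-rule

But then two chunks of a PG can land on the same host, and losing that host leaves those PGs with no spare chunk at all, so again: re-add the host instead.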

Regards,
Eugen

Quoting Devender Singh <devender@xxxxxxxxxx>:

+Eugen
Let's follow up on "No recovery after removing node - active+undersized+degraded -- removed osd using purge…" here.

Sorry, I missed the Ceph version, which is 18.2.4 (5 nodes with 22 OSDs each, where I removed one node and caused all this mess).

Regards
Dev



On Jan 25, 2025, at 11:34 AM, Devender Singh <devender@xxxxxxxxxx> wrote:

Hello Frédéric


Thanks for your reply. Yes, I also faced this issue after draining and removing the node. So I used the same commands, removed "original_weight" from the output of ceph config-key get mgr/cephadm/osd_remove_queue, and injected the file again, which resolved the orch issue.

“Error ENOENT: Module not found - ceph orch commands stopped working

ceph config-key get mgr/cephadm/osd_remove_queue > osd_remove_queue.json

Then remove only the "original_weight" key from that JSON and upload it back to the config-key store:

ceph config-key set mgr/cephadm/osd_remove_queue -i osd_remove_queue_modified.json

Then fail the mgr:

ceph mgr fail”
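
In case it helps anyone else hitting this, the whole round trip can be scripted; a rough sketch, assuming the stored value is a JSON array of OSD removal entries (which is what it looked like here):

ceph config-key get mgr/cephadm/osd_remove_queue > osd_remove_queue.json
jq 'map(del(.original_weight))' osd_remove_queue.json > osd_remove_queue_modified.json
ceph config-key set mgr/cephadm/osd_remove_queue -i osd_remove_queue_modified.json
ceph mgr fail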


But now the issue is that my cluster is showing misplaced objects, whereas I had 5 nodes with a host failure domain, with an R3 pool (size 3, min_size 2) and EC 3+2.

# ceph -s
 cluster:
   id:     384d7590-d018-11ee-b74c-5b2acfe0b35c
   health: HEALTH_WARN
            Degraded data redundancy: 2848547/29106793 objects degraded (9.787%), 105 pgs degraded, 132 pgs undersized

 services:
   mon: 4 daemons, quorum node1,node5,node4,node2 (age 12h)
   mgr: node1.cvknae(active, since 12h), standbys: node4.foomun
   mds: 2/2 daemons up, 2 standby
   osd: 95 osds: 95 up (since 16h), 95 in (since 21h); 124 remapped pgs
   rgw: 2 daemons active (2 hosts, 1 zones)

 data:
   volumes: 2/2 healthy
   pools:   18 pools, 817 pgs
   objects: 6.06M objects, 20 TiB
   usage:   30 TiB used, 302 TiB / 332 TiB avail
   pgs:     2848547/29106793 objects degraded (9.787%)
            2617833/29106793 objects misplaced (8.994%)
            561 active+clean
            124 active+clean+remapped
            105 active+undersized+degraded
            27  active+undersized

 io:
   client:   1.4 MiB/s rd, 4.0 MiB/s wr, 25 op/s rd, 545 op/s wr

And when using 'ceph config-key ls', it is still showing the old node and its OSDs.

# ceph config-key ls|grep -i 03n
   "config-history/135/+osd/host:node3/osd_memory_target",
   "config-history/14990/+osd/host:node3/osd_memory_target",
   "config-history/14990/-osd/host:node3/osd_memory_target",
   "config-history/15003/+osd/host:node3/osd_memory_target",
   "config-history/15003/-osd/host:node3/osd_memory_target",
   "config-history/15016/+osd/host:node3/osd_memory_target",
   "config-history/15016/-osd/host:node3/osd_memory_target",
   "config-history/15017/+osd/host:node3/osd_memory_target",
   "config-history/15017/-osd/host:node3/osd_memory_target",
   "config-history/15022/+osd/host:node3/osd_memory_target",
   "config-history/15022/-osd/host:node3/osd_memory_target",
   "config-history/15024/+osd/host:node3/osd_memory_target",
   "config-history/15024/-osd/host:node3/osd_memory_target",
   "config-history/15025/+osd/host:node3/osd_memory_target",
   "config-history/15025/-osd/host:node3/osd_memory_target",
   "config-history/153/+osd/host:node3/osd_memory_target",
   "config-history/153/-osd/host:node3/osd_memory_target",
   "config-history/165/+mon.node3/container_image",
   "config-history/171/-mon.node3/container_image",
   "config-history/176/+client.crash.node3/container_image",
   "config-history/182/-client.crash.node3/container_image",
   "config-history/4276/+osd/host:node3/osd_memory_target",
   "config-history/4276/-osd/host:node3/osd_memory_target",
   "config-history/433/+client.ceph-exporter.node3/container_image",
   "config-history/439/-client.ceph-exporter.node3/container_image",
   "config-history/459/+osd/host:node3/osd_memory_target",
   "config-history/459/-osd/host:node3/osd_memory_target",
   "config-history/465/+osd/host:node3/osd_memory_target",
   "config-history/465/-osd/host:node3/osd_memory_target",
   "config-history/4867/+osd/host:node3/osd_memory_target",
   "config-history/4867/-osd/host:node3/osd_memory_target",
   "config-history/4878/+mon.node3/container_image",
   "config-history/4884/-mon.node3/container_image",
   "config-history/4889/+client.crash.node3/container_image",
   "config-history/4895/-client.crash.node3/container_image",
   "config-history/5139/+mds.k8s-dev-cephfs.node3.iebxqn/container_image",
   "config-history/5142/-mds.k8s-dev-cephfs.node3.iebxqn/container_image",
   "config-history/5150/+client.ceph-exporter.node3/container_image",
   "config-history/5156/-client.ceph-exporter.node3/container_image",
   "config-history/5179/+osd/host:node3/osd_memory_target",
   "config-history/5179/-osd/host:node3/osd_memory_target",
   "config-history/5183/+client.rgw.sea-dev.node3.betyqd/rgw_frontends",
   "config-history/5189/+osd/host:node3/osd_memory_target",
   "config-history/5189/-osd/host:node3/osd_memory_target",
   "config-history/6929/-client.rgw.sea-dev.node3.betyqd/rgw_frontends",
   "config-history/6933/+osd/host:node3/osd_memory_target",
   "config-history/6933/-osd/host:node3/osd_memory_target",
   "config-history/9710/+osd/host:node3/osd_memory_target",
   "config-history/9710/-osd/host:node3/osd_memory_target",
   "config/osd/host:node3/osd_memory_target”,


Regards
Dev

On Jan 25, 2025, at 4:39 AM, Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx> wrote:

Hi,

I've seen this happening on a test cluster after draining a host that also had a MGR service. Can you check if Eugen's solution here [1] helps in your case? And maybe investigate 'ceph config-key ls' for any issues in config keys?

Regards,
Frédéric.

[1] https://www.spinics.net/lists/ceph-users/msg83667.html

From: Devender Singh <devender@xxxxxxxxxx>
Sent: Saturday, January 25, 2025, 06:27
To: Fnu Virender Kumar
Cc: ceph-users
Subject: Re: Error ENOENT: Module not found

Thanks for your reply… but those commands are not working, since it is an always-on module. Strangely, it still shows the error:

# ceph  mgr module enable orchestrator
module 'orchestrator' is already enabled (always-on)

# ceph orch set backend    (returns successfully…)

# ceph orch ls
Error ENOENT: No orchestrator configured (try `ceph orch set backend`)

It keeps coming back to the same error.
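
(I guess the backend may need to be pointed explicitly at the cephadm module rather than left empty, so I assume the intended sequence would be roughly:

ceph mgr module enable cephadm
ceph orch set backend cephadm
ceph mgr fail

but please correct me if that's the wrong direction.)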

Root cause: I removed a host and its OSDs, and after some time the above error started appearing on its own.

Earlier I had 5 nodes, but now 4. The cluster is showing unclean PGs but is not doing anything about them.

But the big error is the Error ENOENT.


Regards
Dev

> On Jan 24, 2025, at 4:59 PM, Fnu Virender Kumar <virenderk@xxxxxxxxxxxx> wrote:
>
> Did you try
>
> ceph mgr module enable orchestrator
> ceph orch set backend
> ceph orch ls
>
> Check the mgr service daemon as well
> ceph -s
>
>
> Regards
> Virender
> From: Devender Singh <devender@xxxxxxxxxx>
> Sent: Friday, January 24, 2025 6:34:43 PM
> To: ceph-users <ceph-users@xxxxxxx>
> Subject:  Error ENOENT: Module not found
>
>
> Hello all
>
> Any quick fix for …
>
> root@sea-devnode1:~# ceph orch ls
> Error ENOENT: Module not found
>
>
> Regards
> Dev



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



