Re: Error ENOENT: Module not found

Hello Eugen,

I re-added my node, but now I'm facing an auth issue with the OSDs. On the host I can see some of the OSDs up and running, but they are not showing in the dashboard under OSD.

"...minutes ago - daemon:osd.101
 auth get failed: failed to find osd.101 in keyring retval: -2"
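(For reference, whether those auth entries still exist can be checked with something like the commands below; osd.101 and the lockbox client name are taken from the unit.run output further down, so adjust if your ids differ:)

# ceph auth get osd.101
# ceph auth get client.osd-lockbox.b2781be2-010d-485b-82de-65f869563eaf
# ceph osd tree | grep -w 'osd.101'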

# bash unit.run
--> Failed to activate via raw: did not find any matching OSD to activate
--> Running ceph config-key get dm-crypt/osd/b2781be2-010d-485b-82de-65f869563eaf/luks
Running command: /usr/bin/ceph --cluster ceph --name client.osd-lockbox.b2781be2-010d-485b-82de-65f869563eaf --keyring /var/lib/ceph/osd/ceph-101/lockbox.keyring config-key get dm-crypt/osd/b2781be2-010d-485b-82de-65f869563eaf/luks
 stderr: 2025-01-26T02:48:38.010+0000 7f3caaffd640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
 stderr: 2025-01-26T02:48:38.010+0000 7f3caa7fc640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
 stderr: 2025-01-26T02:48:38.010+0000 7f3cab7fe640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
 stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
--> Failed to activate via LVM: Unable to retrieve dmcrypt secret
--> Failed to activate via simple: 'Namespace' object has no attribute 'json_config'
--> Failed to activate any OSD(s)
debug 2025-01-26T02:48:38.506+0000 7f1bccacc640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
debug 2025-01-26T02:48:38.506+0000 7f1bcd2cd640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
debug 2025-01-26T02:48:41.506+0000 7f1bcd2cd640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
failed to fetch mon config (--no-mon-config to skip)
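If the osd.101 and lockbox keys really are missing from 'ceph auth', one possible way forward is to re-create them from the keyrings still present in the OSD's data directory and then retry activation. This is only a sketch, not a verified cephadm procedure: the caps shown are the usual OSD profile caps, the lockbox caps should be copied from a still-working encrypted OSD, and the keyring paths are the in-container paths from the unit.run output above (on the host, cephadm keeps the daemon's files under /var/lib/ceph/<fsid>/osd.101/, assuming they survived the node re-add).

# ceph auth get client.osd-lockbox.<uuid-of-a-working-encrypted-osd>    (note its caps)
# ceph auth add osd.101 mon 'allow profile osd' mgr 'allow profile osd' osd 'allow *' -i /var/lib/ceph/osd/ceph-101/keyring
# ceph auth add client.osd-lockbox.b2781be2-010d-485b-82de-65f869563eaf <caps copied from the working entry> -i /var/lib/ceph/osd/ceph-101/lockbox.keyring
# bash unit.run    (retry activation)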


But if the failure domain is at the host level and we remove a host and its OSDs, shouldn't the cluster be able to recover?
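(For what it's worth, with a host failure domain an EC 3+2 PG needs its 5 chunks on 5 different hosts, so with only 4 hosts left those PGs can only stay undersized until a fifth host is back; the size-3 replicated pools can still remap onto the remaining hosts. To double-check what each pool actually requires, something like the following should show it; the profile and rule names are placeholders:)

# ceph osd pool ls detail
# ceph osd erasure-code-profile get <profile-name>
# ceph osd crush rule dump <rule-name>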

Regards
Dev


> On Jan 25, 2025, at 1:34 PM, Eugen Block <eblock@xxxxxx> wrote:
> 
> Hi,
> 
>>> But now the issue is that my cluster is showing misplaced objects, even though I had 5 nodes with a host failure domain, replicated pools (size 3, min_size 2), and an EC 3+2 pool.
> 
> the math is pretty straightforward: with 5 chunks (k=3, m=2) you need (at least) 5 hosts. So you should add the host back to be able to recover. I would even suggest adding two more hosts so you can sustain the failure of one entire host.
> There are ways to recover in the current state (change the failure domain to OSD via the crush rule), but I really don't recommend that; I only mention it for the sake of completeness. I strongly suggest re-adding the fifth host (and thinking about adding a sixth).
> 
> Regards,
> Eugen
> 
> Quoting Devender Singh <devender@xxxxxxxxxx>:
> 
>> +Eugen
>> Let's follow the thread "No recovery after removing node - active+undersized+degraded -- removed osd using purge…" here.
>> 
>> Sorry, I missed mentioning the Ceph version, which is 18.2.4 (5 nodes with 22 OSDs each, where I removed one node and caused all this mess).
>> 
>> Regards
>> Dev
>> 
>> 
>> 
>>> On Jan 25, 2025, at 11:34 AM, Devender Singh <devender@xxxxxxxxxx> wrote:
>>> 
>>> Hello Frédéric,
>>> 
>>> 
>>> Thanks for your reply. Yes, I also faced this issue after draining and removing the node.
>>> So I used the same commands: dumped the queue with 'ceph config-key get mgr/cephadm/osd_remove_queue', removed "original_weight", and injected the file again, which resolved the orch issue.
>>> 
>>> “Error ENOENT: Module not found - ceph orch commands stopped working
>>> 
>>> ceph config-key get mgr/cephadm/osd_remove_queue > osd_remove_queue.json
>>> 
>>> Then remove only the "original_weight" key from that JSON and upload it back to the config-key store:
>>> 
>>> ceph config-key set mgr/cephadm/osd_remove_queue -i osd_remove_queue_modified.json
>>> 
>>> Then fail the mgr:
>>> 
>>> ceph mgr fail”
>>> 
>>> 
>>> But now the issue is that my cluster is showing misplaced objects, even though I had 5 nodes with a host failure domain, replicated pools (size 3, min_size 2), and an EC 3+2 pool.
>>> 
>>> # ceph -s
>>>  cluster:
>>>    id:     384d7590-d018-11ee-b74c-5b2acfe0b35c
>>>    health: HEALTH_WARN
>>>            Degraded data redundancy: 2848547/29106793 objects degraded (9.787%), 105 pgs degraded, 132 pgs undersized
>>> 
>>>  services:
>>>    mon: 4 daemons, quorum node1,node5,node4,node2 (age 12h)
>>>    mgr: node1.cvknae(active, since 12h), standbys: node4.foomun
>>>    mds: 2/2 daemons up, 2 standby
>>>    osd: 95 osds: 95 up (since 16h), 95 in (since 21h); 124 remapped pgs
>>>    rgw: 2 daemons active (2 hosts, 1 zones)
>>> 
>>>  data:
>>>    volumes: 2/2 healthy
>>>    pools:   18 pools, 817 pgs
>>>    objects: 6.06M objects, 20 TiB
>>>    usage:   30 TiB used, 302 TiB / 332 TiB avail
>>>    pgs:     2848547/29106793 objects degraded (9.787%)
>>>             2617833/29106793 objects misplaced (8.994%)
>>>             561 active+clean
>>>             124 active+clean+remapped
>>>             105 active+undersized+degraded
>>>             27  active+undersized
>>> 
>>>  io:
>>>    client:   1.4 MiB/s rd, 4.0 MiB/s wr, 25 op/s rd, 545 op/s wr
>>> 
>>> And 'ceph config-key ls' is still showing the old node and its OSDs.
>>> 
>>> # ceph config-key ls|grep -i 03n
>>>    "config-history/135/+osd/host:node3/osd_memory_target",
>>>    "config-history/14990/+osd/host:node3/osd_memory_target",
>>>    "config-history/14990/-osd/host:node3/osd_memory_target",
>>>    "config-history/15003/+osd/host:node3/osd_memory_target",
>>>    "config-history/15003/-osd/host:node3/osd_memory_target",
>>>    "config-history/15016/+osd/host:node3/osd_memory_target",
>>>    "config-history/15016/-osd/host:node3/osd_memory_target",
>>>    "config-history/15017/+osd/host:node3/osd_memory_target",
>>>    "config-history/15017/-osd/host:node3/osd_memory_target",
>>>    "config-history/15022/+osd/host:node3/osd_memory_target",
>>>    "config-history/15022/-osd/host:node3/osd_memory_target",
>>>    "config-history/15024/+osd/host:node3/osd_memory_target",
>>>    "config-history/15024/-osd/host:node3/osd_memory_target",
>>>    "config-history/15025/+osd/host:node3/osd_memory_target",
>>>    "config-history/15025/-osd/host:node3/osd_memory_target",
>>>    "config-history/153/+osd/host:node3/osd_memory_target",
>>>    "config-history/153/-osd/host:node3/osd_memory_target",
>>>    "config-history/165/+mon.node3/container_image",
>>>    "config-history/171/-mon.node3/container_image",
>>>    "config-history/176/+client.crash.node3/container_image",
>>>    "config-history/182/-client.crash.node3/container_image",
>>>    "config-history/4276/+osd/host:node3/osd_memory_target",
>>>    "config-history/4276/-osd/host:node3/osd_memory_target",
>>>    "config-history/433/+client.ceph-exporter.node3/container_image",
>>>    "config-history/439/-client.ceph-exporter.node3/container_image",
>>>    "config-history/459/+osd/host:node3/osd_memory_target",
>>>    "config-history/459/-osd/host:node3/osd_memory_target",
>>>    "config-history/465/+osd/host:node3/osd_memory_target",
>>>    "config-history/465/-osd/host:node3/osd_memory_target",
>>>    "config-history/4867/+osd/host:node3/osd_memory_target",
>>>    "config-history/4867/-osd/host:node3/osd_memory_target",
>>>    "config-history/4878/+mon.node3/container_image",
>>>    "config-history/4884/-mon.node3/container_image",
>>>    "config-history/4889/+client.crash.node3/container_image",
>>>    "config-history/4895/-client.crash.node3/container_image",
>>>    "config-history/5139/+mds.k8s-dev-cephfs.node3.iebxqn/container_image",
>>>    "config-history/5142/-mds.k8s-dev-cephfs.node3.iebxqn/container_image",
>>>    "config-history/5150/+client.ceph-exporter.node3/container_image",
>>>    "config-history/5156/-client.ceph-exporter.node3/container_image",
>>>    "config-history/5179/+osd/host:node3/osd_memory_target",
>>>    "config-history/5179/-osd/host:node3/osd_memory_target",
>>>    "config-history/5183/+client.rgw.sea-dev.node3.betyqd/rgw_frontends",
>>>    "config-history/5189/+osd/host:node3/osd_memory_target",
>>>    "config-history/5189/-osd/host:node3/osd_memory_target",
>>>    "config-history/6929/-client.rgw.sea-dev.node3.betyqd/rgw_frontends",
>>>    "config-history/6933/+osd/host:node3/osd_memory_target",
>>>    "config-history/6933/-osd/host:node3/osd_memory_target",
>>>    "config-history/9710/+osd/host:node3/osd_memory_target",
>>>    "config-history/9710/-osd/host:node3/osd_memory_target",
>>>    "config/osd/host:node3/osd_memory_target",
>>> 
>>> 
>>> Regards
>>> Dev
>>> 
>>>> On Jan 25, 2025, at 4:39 AM, Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I've seen this happen on a test cluster after draining a host that also had a MGR service. Can you check if Eugen's solution here [1] helps in your case? And maybe investigate 'ceph config-key ls' for any issues in the config keys?
>>>> 
>>>> Regards,
>>>> Frédéric.
>>>> 
>>>> [1] https://www.spinics.net/lists/ceph-users/msg83667.html
>>>> 
>>>> From: Devender Singh <devender@xxxxxxxxxx>
>>>> Sent: Saturday, January 25, 2025 06:27
>>>> To: Fnu Virender Kumar
>>>> Cc: ceph-users
>>>> Subject: Re: Error ENOENT: Module not found
>>>> 
>>>> Thanks for your reply… but those commands are not working, since it's an always-on module. Strangely, it still shows the error:
>>>> 
>>>> # ceph  mgr module enable orchestrator
>>>> module 'orchestrator' is already enabled (always-on)
>>>> 
>>>> # ceph orch set backend    (returns successfully…)
>>>> 
>>>> # ceph orch ls
>>>> Error ENOENT: No orchestrator configured (try `ceph orch set backend`)
>>>> 
>>>> It keeps coming back to the same error.
>>>> 
>>>> Root cause: I removed a host and its OSDs, and after some time the above error started appearing on its own.
>>>> 
>>>> Earlier the cluster had 5 nodes, now 4. The cluster is showing unclean PGs but isn't doing anything about them.
>>>> 
>>>> But the big error is 'Error ENOENT':
>>>> 
>>>> 
>>>> Regards
>>>> Dev
>>>> 
>>>> > On Jan 24, 2025, at 4:59 PM, Fnu Virender Kumar <virenderk@xxxxxxxxxxxx> wrote:
>>>> >
>>>> > Did you try
>>>> >
>>>> > ceph mgr module enable orchestrator
>>>> > ceph orch set backend
>>>> > ceph orch ls
>>>> >
>>>> > Check the mgr service daemon as well
>>>> > ceph -s
>>>> >
>>>> >
>>>> > Regards
>>>> > Virender
>>>> > From: Devender Singh <devender@xxxxxxxxxx>
>>>> > Sent: Friday, January 24, 2025 6:34:43 PM
>>>> > To: ceph-users <ceph-users@xxxxxxx>
>>>> > Subject:  Error ENOENT: Module not found
>>>> >
>>>> >
>>>> > Hello all
>>>> >
>>>> > Any quick fix for …
>>>> >
>>>> > root@sea-devnode1:~# ceph orch ls
>>>> > Error ENOENT: Module not found
>>>> >
>>>> >
>>>> > Regards
>>>> > Dev
>>>> 
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



