Re: Error ENOENT: Module not found


 



I suggest wiping the removed OSDs and letting cephadm recreate them; that is probably easier than trying to reintegrate them. You also seem to be using encrypted OSDs, which can be trickier to add back manually than "regular" OSDs.
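
For reference, a rough sketch of that approach (host name and device path are placeholders, and zapping destroys whatever is left on the device, so double-check you have the right one):

ceph orch device zap <host> /dev/<device> --force

If the drive shows up as available afterwards and your OSD service spec still matches it, cephadm should redeploy the OSD on its own.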

But if the failure domain is at host level, and we remove a host and its OSDs, should it not recover?

If you had enough hosts, it would recover. But with your EC profile you need 5 hosts and you removed one, so only your replicated pools (size 3) would be able to recover.
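
If you want to double-check the numbers, something along these lines should show the profile and its failure domain (pool and profile names below are placeholders):

ceph osd pool get <ec-pool> erasure_code_profile
ceph osd erasure-code-profile get <profile-name>
ceph osd pool get <ec-pool> crush_rule

With k=3, m=2 and crush-failure-domain=host, CRUSH needs five distinct hosts to place all five chunks.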

Quoting Devender Singh <devender@xxxxxxxxxx>:

Hello Eugen

I re-added my node but am facing an auth issue with the OSDs. On the host I can see some of the OSDs up and running, but they are not showing in the dashboard under OSD.

“inutes ago - daemon:osd.101
 auth get failed: failed to find osd.101 in keyring retval: -2”

# bash unit.run
--> Failed to activate via raw: did not find any matching OSD to activate
--> Running ceph config-key get dm-crypt/osd/b2781be2-010d-485b-82de-65f869563eaf/luks
Running command: /usr/bin/ceph --cluster ceph --name client.osd-lockbox.b2781be2-010d-485b-82de-65f869563eaf --keyring /var/lib/ceph/osd/ceph-101/lockbox.keyring config-key get dm-crypt/osd/b2781be2-010d-485b-82de-65f869563eaf/luks
 stderr: 2025-01-26T02:48:38.010+0000 7f3caaffd640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
 stderr: 2025-01-26T02:48:38.010+0000 7f3caa7fc640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
 stderr: 2025-01-26T02:48:38.010+0000 7f3cab7fe640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
 stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
--> Failed to activate via LVM: Unable to retrieve dmcrypt secret
--> Failed to activate via simple: 'Namespace' object has no attribute 'json_config'
--> Failed to activate any OSD(s)
debug 2025-01-26T02:48:38.506+0000 7f1bccacc640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
debug 2025-01-26T02:48:38.506+0000 7f1bcd2cd640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
debug 2025-01-26T02:48:41.506+0000 7f1bcd2cd640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
failed to fetch mon config (--no-mon-config to skip)
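
(For what it's worth, the "failed to find osd.101 in keyring" and lockbox errors look like the cluster no longer has auth entries for these OSDs. Assuming the OSD IDs were purged along with the host, something like the following should show whether the keys and dm-crypt secret still exist; the entity names are taken from the log above:

ceph auth get osd.101
ceph auth get client.osd-lockbox.b2781be2-010d-485b-82de-65f869563eaf
ceph config-key get dm-crypt/osd/b2781be2-010d-485b-82de-65f869563eaf/luks

If these return ENOENT, the original keys and secrets are gone, which would point towards zapping and redeploying rather than reactivating.)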


But if the failure domain is at host level, and we remove a host and its OSDs, should it not recover?

Regards
Dev


On Jan 25, 2025, at 1:34 PM, Eugen Block <eblock@xxxxxx> wrote:

Hi,

But now the issue is that my cluster is showing misplaced objects, whereas I had 5 nodes with a host failure domain, with an R3 pool (size 3, min_size 2) and EC 3+2.

The math is pretty straightforward: with 5 chunks (k=3, m=2) you need (at least) 5 hosts. So you should add the host back to be able to recover. I would even suggest adding two more hosts so you can sustain the failure of one entire host. There are ways to recover in the current state (change the failure domain to OSD via a crush rule), but I really don't recommend that; I only mention it for the sake of completeness. I strongly suggest re-adding the fifth host (and thinking about adding a sixth).
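
Just to make the "for completeness" option concrete: the idea would be to create a second EC rule with the same k+m but an OSD-level failure domain and point the pool at it, roughly like this (profile, rule and pool names are made up):

ceph osd erasure-code-profile set ec32-osd k=3 m=2 crush-failure-domain=osd
ceph osd crush rule create-erasure ec32-osd-rule ec32-osd
ceph osd pool set <ec-pool> crush_rule ec32-osd-rule

But then two chunks of a PG can land on the same host, and losing that host leaves those PGs with no spare chunk at all, so again: re-add the host instead.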

Regards,
Eugen

Quoting Devender Singh <devender@xxxxxxxxxx>:

+Eugen
Let's follow up on "No recovery after removing node - active+undersized+degraded -- removed osd using purge…" here.

Sorry, I missed the Ceph version, which is 18.2.4 (5 nodes with 22 OSDs each, where I removed one node and caused all this mess).

Regards
Dev



On Jan 25, 2025, at 11:34 AM, Devender Singh <devender@xxxxxxxxxx> wrote:

Hello Frédéric


Thanks for your reply. Yes, I also faced this issue after draining and removing the node. So I used the same commands, removed "original_weight" from the output of ceph config-key get mgr/cephadm/osd_remove_queue, and injected the file again, which resolved the orch issue.

“Error ENOENT: Module not found - ceph orch commands stopped working

ceph config-key get mgr/cephadm/osd_remove_queue > osd_remove_queue.json

Then remove only the "original_weight" key from that JSON and upload it back to the config-key store:

ceph config-key set mgr/cephadm/osd_remove_queue -i osd_remove_queue_modified.json

Then fail the mgr:

ceph mgr fail”
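
In case it helps anyone else hitting this, the whole round trip can be scripted; a rough sketch, assuming the stored value is a JSON array of OSD removal entries (which is what it looked like here):

ceph config-key get mgr/cephadm/osd_remove_queue > osd_remove_queue.json
jq 'map(del(.original_weight))' osd_remove_queue.json > osd_remove_queue_modified.json
ceph config-key set mgr/cephadm/osd_remove_queue -i osd_remove_queue_modified.json
ceph mgr fail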


But now the issue is that my cluster is showing misplaced objects, whereas I had 5 nodes with a host failure domain, with an R3 pool (size 3, min_size 2) and EC 3+2.

# ceph -s
 cluster:
   id:     384d7590-d018-11ee-b74c-5b2acfe0b35c
   health: HEALTH_WARN
            Degraded data redundancy: 2848547/29106793 objects degraded (9.787%), 105 pgs degraded, 132 pgs undersized

 services:
   mon: 4 daemons, quorum node1,node5,node4,node2 (age 12h)
   mgr: node1.cvknae(active, since 12h), standbys: node4.foomun
   mds: 2/2 daemons up, 2 standby
   osd: 95 osds: 95 up (since 16h), 95 in (since 21h); 124 remapped pgs
   rgw: 2 daemons active (2 hosts, 1 zones)

 data:
   volumes: 2/2 healthy
   pools:   18 pools, 817 pgs
   objects: 6.06M objects, 20 TiB
   usage:   30 TiB used, 302 TiB / 332 TiB avail
   pgs:     2848547/29106793 objects degraded (9.787%)
            2617833/29106793 objects misplaced (8.994%)
            561 active+clean
            124 active+clean+remapped
            105 active+undersized+degraded
            27  active+undersized

 io:
   client:   1.4 MiB/s rd, 4.0 MiB/s wr, 25 op/s rd, 545 op/s wr

And when using 'ceph config-key ls', it is still showing the old node and its OSDs.

# ceph config-key ls|grep -i 03n
   "config-history/135/+osd/host:node3/osd_memory_target",
   "config-history/14990/+osd/host:node3/osd_memory_target",
   "config-history/14990/-osd/host:node3/osd_memory_target",
   "config-history/15003/+osd/host:node3/osd_memory_target",
   "config-history/15003/-osd/host:node3/osd_memory_target",
   "config-history/15016/+osd/host:node3/osd_memory_target",
   "config-history/15016/-osd/host:node3/osd_memory_target",
   "config-history/15017/+osd/host:node3/osd_memory_target",
   "config-history/15017/-osd/host:node3/osd_memory_target",
   "config-history/15022/+osd/host:node3/osd_memory_target",
   "config-history/15022/-osd/host:node3/osd_memory_target",
   "config-history/15024/+osd/host:node3/osd_memory_target",
   "config-history/15024/-osd/host:node3/osd_memory_target",
   "config-history/15025/+osd/host:node3/osd_memory_target",
   "config-history/15025/-osd/host:node3/osd_memory_target",
   "config-history/153/+osd/host:node3/osd_memory_target",
   "config-history/153/-osd/host:node3/osd_memory_target",
   "config-history/165/+mon.node3/container_image",
   "config-history/171/-mon.node3/container_image",
   "config-history/176/+client.crash.node3/container_image",
   "config-history/182/-client.crash.node3/container_image",
   "config-history/4276/+osd/host:node3/osd_memory_target",
   "config-history/4276/-osd/host:node3/osd_memory_target",
   "config-history/433/+client.ceph-exporter.node3/container_image",
   "config-history/439/-client.ceph-exporter.node3/container_image",
   "config-history/459/+osd/host:node3/osd_memory_target",
   "config-history/459/-osd/host:node3/osd_memory_target",
   "config-history/465/+osd/host:node3/osd_memory_target",
   "config-history/465/-osd/host:node3/osd_memory_target",
   "config-history/4867/+osd/host:node3/osd_memory_target",
   "config-history/4867/-osd/host:node3/osd_memory_target",
   "config-history/4878/+mon.node3/container_image",
   "config-history/4884/-mon.node3/container_image",
   "config-history/4889/+client.crash.node3/container_image",
   "config-history/4895/-client.crash.node3/container_image",
   "config-history/5139/+mds.k8s-dev-cephfs.node3.iebxqn/container_image",
   "config-history/5142/-mds.k8s-dev-cephfs.node3.iebxqn/container_image",
   "config-history/5150/+client.ceph-exporter.node3/container_image",
   "config-history/5156/-client.ceph-exporter.node3/container_image",
   "config-history/5179/+osd/host:node3/osd_memory_target",
   "config-history/5179/-osd/host:node3/osd_memory_target",
   "config-history/5183/+client.rgw.sea-dev.node3.betyqd/rgw_frontends",
   "config-history/5189/+osd/host:node3/osd_memory_target",
   "config-history/5189/-osd/host:node3/osd_memory_target",
   "config-history/6929/-client.rgw.sea-dev.node3.betyqd/rgw_frontends",
   "config-history/6933/+osd/host:node3/osd_memory_target",
   "config-history/6933/-osd/host:node3/osd_memory_target",
   "config-history/9710/+osd/host:node3/osd_memory_target",
   "config-history/9710/-osd/host:node3/osd_memory_target",
   "config/osd/host:node3/osd_memory_target”,


Regards
Dev

On Jan 25, 2025, at 4:39 AM, Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx> wrote:

Hi,

I've seen this happening on a test cluster after draining a host that also had a MGR service. Can you check if Eugen's solution here [1] helps in your case? And maybe investigate 'ceph config-key ls' for any issues in config keys?

Regards,
Frédéric.

[1] https://www.spinics.net/lists/ceph-users/msg83667.html

From: Devender Singh <devender@xxxxxxxxxx>
Sent: Saturday, January 25, 2025, 06:27
To: Fnu Virender Kumar
Cc: ceph-users
Subject: Re: Error ENOENT: Module not found

Thanks for your reply… but those commands are not working, since it is an always-on module. Strangely, it still shows the error:

# ceph  mgr module enable orchestrator
module 'orchestrator' is already enabled (always-on)

# ceph orch set backend    (returns successfully…)

# ceph orch ls
Error ENOENT: No orchestrator configured (try `ceph orch set backend`)

It keeps coming back to the same error.
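
(I guess the backend may need to be pointed explicitly at the cephadm module rather than left empty, so I assume the intended sequence would be roughly:

ceph mgr module enable cephadm
ceph orch set backend cephadm
ceph mgr fail

but please correct me if that's the wrong direction.)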

Root cause: I removed a host and its OSDs, and after some time the above error started appearing on its own.

Earlier I had 5 nodes, but now 4. The cluster is showing unclean PGs but is not doing anything about them.

But the big error is the Error ENOENT.


Regards
Dev

> On Jan 24, 2025, at 4:59 PM, Fnu Virender Kumar <virenderk@xxxxxxxxxxxx> wrote:
>
> Did you try
>
> ceph mgr module enable orchestrator
> ceph orch set backend
> ceph orch ls
>
> Check the mgr service daemon as well
> ceph -s
>
>
> Regards
> Virender
> From: Devender Singh <devender@xxxxxxxxxx>
> Sent: Friday, January 24, 2025 6:34:43 PM
> To: ceph-users <ceph-users@xxxxxxx>
> Subject:  Error ENOENT: Module not found
>
>
> Hello all
>
> Any quick fix for …
>
> root@sea-devnode1:~# ceph orch ls
> Error ENOENT: Module not found
>
>
> Regards
> Dev



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



