Re: Cephadm cannot acquire lock

Hi Sebastian,

Following your suggestion, I've found this process:

/usr/bin/python3 /var/lib/ceph/<FSID>/cephadm.f77d9d71514a634758d4ad41ab6eef36d25386c99d8b365310ad41f9b74d5ce6 --image ceph/ceph@sha256:9b04c0f15704c49591640a37c7adfd40ffad0a4b42fecb950c3407687cb4f29a ceph-volume --fsid <FSID> -- lvm list --format json

That process had been running for more than 12 hours, so I killed it, and cephadm could then acquire the lock. Shortly afterwards the process started again, and I can see that it is running on all the nodes (we have 3 nodes). I tried running the same command from the command line on every node, and it works fine; here is the output: https://pastebin.com/v58Nyxdx.
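For anyone hitting the same symptom, here is a minimal Python sketch of how such long-running processes could be spotted on a node. It is only an illustration: the function names, the `ceph-volume` search string, and the one-hour threshold are my own assumptions, and it relies on the `etimes` (elapsed seconds) column of the procps `ps`.

```python
import subprocess


def find_long_running(ps_output, needle="ceph-volume", min_seconds=3600):
    """Return (pid, elapsed_seconds, command) for matching long-running processes.

    ps_output is expected to have three columns: pid, elapsed seconds, args.
    needle and min_seconds are illustrative defaults, not cephadm settings.
    """
    hits = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        # Skip the header row and anything that does not look like a process row.
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue
        pid, elapsed, args = int(parts[0]), int(parts[1]), parts[2]
        if needle in args and elapsed >= min_seconds:
            hits.append((pid, elapsed, args))
    return hits


def list_processes():
    """Run ps and return its output; etimes is the elapsed time in seconds."""
    result = subprocess.run(
        ["ps", "-eo", "pid,etimes,args"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `find_long_running(list_processes())` on each node would list the candidates; whether a given process is really stuck still has to be judged by hand.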

What could cause this process to hang when the orchestrator launches it, given that it runs fine from the command line?

Thank you, kind regards.

On 02/09/2021 05:19, Sebastian Wagner wrote:

On 31.08.21 at 04:05, fcid wrote:
Hi ceph community,

I'm having some trouble trying to delete an OSD.

I've been using cephadm in one of our clusters and it has worked fine, but lately, after an OSD failure, I cannot delete the OSD using the orchestrator. Since the orchestrator is not working (for some unknown reason), I tried to delete the OSD manually using the following command:

ceph osd purge <id> --yes-i-really-mean-it

This command removed the OSD from the CRUSH map, but then the warning CEPHADM_FAILED_DAEMON appeared. So the next step is to delete the daemon on the server that used to host the failed OSD. The command I used here was the following:

cephadm rm-daemon --name osd.<id> --fsid <FSID>

But this command does not work because, according to the log, cephadm cannot acquire the lock:

2021-08-30 21:50:09,712 DEBUG Lock 139899822730784 not acquired on /run/cephadm/$FSID.lock, waiting 0.05 seconds ...
2021-08-30 21:50:09,762 DEBUG Acquiring lock 139899822730784 on /run/cephadm/$FSID.lock
2021-08-30 21:50:09,763 DEBUG Lock 139899822730784 not acquired on /run/cephadm/$FSID.lock, waiting 0.05 seconds ...

The file /run/cephadm/$FSID.lock does exist. Can I safely remove it? What should I check before doing so?

Yes, provided you're sure that no other cephadm process is stuck (check with `ps`).
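As a complement to checking `ps`, the lock itself can be probed. The sketch below assumes that cephadm takes an flock-style advisory lock on the lock file (an assumption on my part, not something stated in this thread); `lock_is_held` is a hypothetical helper name. It tries a non-blocking exclusive lock and reports whether some other process already holds one.

```python
import fcntl
import os


def lock_is_held(path):
    """Return True if another open file description holds an flock on path.

    Assumes the lock is an flock(2)-style advisory lock. The probe takes the
    lock for an instant when it is free, so do not run it in a tight loop.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        try:
            # Non-blocking attempt: fails immediately if someone else holds it.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        # We got the lock, so nobody else had it; release it right away.
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)
```

For example, `lock_is_held("/run/cephadm/<FSID>.lock")` (with the real FSID substituted) returning True would suggest that a live process still owns the lock, in which case removing the file is probably not the right fix.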


I'd really appreciate any hint you can give on this matter.

Thanks! regards.


--
AltaVoz <https://www.altavoz.net/> 	
Fernando Cid
Ingeniero de Operaciones
www.altavoz.net <https://www.altavoz.net/>
Ubicación AltaVoz 	
Viña del Mar: 2 Poniente 355 of 53 <https://www.altavoz.net/altavoz/contacto> | +56 32 276 8060 <tel:+56322768060> Santiago: Antonio Bellet 292 of 701 <https://www.altavoz.net/altavoz/contacto> | +56 2 2585 4264 <tel:+562225854264>
