Hi Sebastian,
Following your suggestion, I've found this process:
/usr/bin/python3
/var/lib/ceph/<FSID>/cephadm.f77d9d71514a634758d4ad41ab6eef36d25386c99d8b365310ad41f9b74d5ce6
--image
ceph/ceph@sha256:9b04c0f15704c49591640a37c7adfd40ffad0a4b42fecb950c3407687cb4f29a
ceph-volume --fsid <FSID> -- lvm list --format json
That process had been running for more than 12 hours, so I killed it,
and then cephadm could acquire the lock. Shortly afterwards the process
started again, and I can see that it is running on all the nodes (we have 3
nodes). I tried executing the same command on all the nodes, from the
command line, and it works fine. Here is the output:
https://pastebin.com/v58Nyxdx
What could be causing this process to get stuck when it is launched by the
orchestrator, given that launching it from the command line works fine?
Thank you, kind regards.
On 02/09/2021 05:19, Sebastian Wagner wrote:
On 31.08.21 at 04:05, fcid wrote:
Hi ceph community,
I'm having some trouble trying to delete an OSD.
I've been using cephadm in one of our clusters and it works fine,
but lately, after an OSD failure, I cannot delete the OSD using the
orchestrator. Since the orchestrator is not working (for some unknown
reason), I tried to delete the OSD manually using the following command:
ceph osd purge <id> --yes-i-really-mean-it
This command removed the OSD from the CRUSH map, but then the warning
CEPHADM_FAILED_DAEMON appeared. So the next step was to delete the daemon
on the server that used to host the failed OSD. The command I used
here was the following:
cephadm rm-daemon --name osd.<id> --fsid <FSID>
But this command does not work because, according to the log, cephadm
cannot acquire the lock:
2021-08-30 21:50:09,712 DEBUG Lock 139899822730784 not acquired on
/run/cephadm/$FSID.lock, waiting 0.05 seconds ...
2021-08-30 21:50:09,762 DEBUG Acquiring lock 139899822730784 on
/run/cephadm/$FSID.lock
2021-08-30 21:50:09,763 DEBUG Lock 139899822730784 not acquired on
/run/cephadm/$FSID.lock, waiting 0.05 seconds ...
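The repeating "not acquired ... waiting 0.05 seconds" pattern in the log above is what a polling lock loop looks like from the outside. The following is only a minimal sketch of that pattern, assuming an advisory flock-style lock on the file; it is not cephadm's actual code, and the names try_acquire and release are made up for illustration. With this kind of lock, the lock belongs to the process that holds the file open, so the lock file existing on disk does not by itself mean that the lock is held.

```python
import fcntl
import os
import time


def try_acquire(lock_path, timeout=1.0, poll_interval=0.05):
    """Poll for an exclusive advisory lock on lock_path.

    Returns an open file descriptor that holds the lock,
    or None if the lock could not be taken within `timeout` seconds.
    (Hypothetical helper, for illustration only.)
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    deadline = time.monotonic() + timeout
    while True:
        try:
            # LOCK_NB makes flock fail immediately instead of blocking.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd
        except BlockingIOError:
            # Someone else holds the lock: give up or wait and retry.
            if time.monotonic() >= deadline:
                os.close(fd)
                return None
            time.sleep(poll_interval)


def release(fd):
    """Release the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

Under this assumption, a second caller keeps polling for as long as the first holder is alive, which would match a long-running process keeping every later cephadm call waiting.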
The file /run/cephadm/$FSID.lock does exist. Can I safely remove it?
What should I check before doing so?
Yes, provided you're sure that no other cephadm process is stuck
(e.g. check with `ps`).
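The `ps` check can be scripted. The sketch below is one possible way to do it, not an official cephadm tool: it assumes a Linux procps `ps` that supports the `etimes` (elapsed seconds) column, and the function names, the "cephadm" match string, and the one-hour default threshold are arbitrary choices made for illustration.

```python
import subprocess


def parse_ps(ps_output, needle="cephadm", min_seconds=3600):
    """Return (pid, elapsed_seconds, command) for matching long-running processes.

    Expects the output of `ps -eo pid,etimes,args`.
    """
    matches = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        # Skip the header line and anything that is not "pid seconds command".
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue
        pid, elapsed, args = int(parts[0]), int(parts[1]), parts[2]
        if needle in args and elapsed >= min_seconds:
            matches.append((pid, elapsed, args))
    return matches


def list_long_running(needle="cephadm", min_seconds=3600):
    """Run ps and report matching processes older than min_seconds."""
    out = subprocess.run(
        ["ps", "-eo", "pid,etimes,args"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_ps(out, needle, min_seconds)
```

Calling list_long_running(min_seconds=12 * 3600) on a node would list any command line containing "cephadm" that has been running for at least 12 hours. Whether such a process is actually stuck still has to be judged by hand.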
I'd really appreciate any hint you can give regarding this matter.
Thanks! Regards.
--
AltaVoz <https://www.altavoz.net/>
Fernando Cid
Operations Engineer
www.altavoz.net <https://www.altavoz.net/>
AltaVoz locations
Viña del Mar: 2 Poniente 355 of 53
<https://www.altavoz.net/altavoz/contacto> | +56 32 276 8060
<tel:+56322768060>
Santiago: Antonio Bellet 292 of 701
<https://www.altavoz.net/altavoz/contacto> | +56 2 2585 4264
<tel:+562225854264>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx