It may be this:

https://tracker.ceph.com/issues/50526
https://github.com/alfredodeza/remoto/issues/62

Which we resolved with:

https://github.com/alfredodeza/remoto/pull/63

What version of ceph are you running, and is it impacted by the above?

David

On Thu, Sep 2, 2021 at 9:53 AM fcid <fcid@xxxxxxxxxxx> wrote:
>
> Hi Sebastian,
>
> Following your suggestion, I've found this process:
>
> /usr/bin/python3
> /var/lib/ceph/<FSID>/cephadm.f77d9d71514a634758d4ad41ab6eef36d25386c99d8b365310ad41f9b74d5ce6
> --image
> ceph/ceph@sha256:9b04c0f15704c49591640a37c7adfd40ffad0a4b42fecb950c3407687cb4f29a
> ceph-volume --fsid <FSID> -- lvm list --format json
>
> That process had been running for more than 12 hours, so I killed it,
> and then cephadm could acquire the lock. Shortly after, the process
> started again, and I can see that it is running on all the nodes (we
> have 3 nodes). I tried executing the same command on all the nodes,
> from the command line, and it works fine; here is the output:
> https://pastebin.com/v58Nyxdx
>
> What could be causing this process to get stuck when it is launched by
> the orchestrator, given that launching it from the command line works
> fine?
>
> Thank you, kind regards.
>
> On 02/09/2021 05:19, Sebastian Wagner wrote:
> >
> > On 31.08.21 at 04:05, fcid wrote:
> >> Hi ceph community,
> >>
> >> I'm having some trouble trying to delete an OSD.
> >>
> >> I've been using cephadm in one of our clusters and it works fine,
> >> but lately, after an OSD failure, I cannot delete it using the
> >> orchestrator. Since the orchestrator is not working (for some unknown
> >> reason), I tried to manually delete the OSD using the following command:
> >>
> >> ceph osd purge <id> --yes-i-really-mean-it
> >>
> >> This command removed the OSD from the crush map, but then the warning
> >> CEPHADM_FAILED_DAEMON appeared. So the next step is to delete the
> >> daemon on the server that used to host the failed OSD. The command I
> >> used here was the following:
> >>
> >> cephadm rm-daemon --name osd.<id> --fsid <FSID>
> >>
> >> But this command does not work because, according to the log, cephadm
> >> cannot acquire the lock:
> >>
> >> 2021-08-30 21:50:09,712 DEBUG Lock 139899822730784 not acquired on
> >> /run/cephadm/$FSID.lock, waiting 0.05 seconds ...
> >> 2021-08-30 21:50:09,762 DEBUG Acquiring lock 139899822730784 on
> >> /run/cephadm/$FSID.lock
> >> 2021-08-30 21:50:09,763 DEBUG Lock 139899822730784 not acquired on
> >> /run/cephadm/$FSID.lock, waiting 0.05 seconds ...
> >>
> >> The file /run/cephadm/$FSID.lock does exist. Can I safely remove it?
> >> What should I check before doing such a task?
> >
> > Yes, in case you're sure that no other cephadm process (i.e. call
> > `ps`) is stuck.
> >
> >>
> >> I'll really appreciate any hint you can give regarding this matter.
> >>
> >> Thanks! Regards.
> >>
>
> --
> AltaVoz <https://www.altavoz.net/>
> Fernando Cid
> Operations Engineer
> www.altavoz.net <https://www.altavoz.net/>
> Viña del Mar: 2 Poniente 355 of 53 | +56 32 276 8060
> Santiago: Antonio Bellet 292 of 701 | +56 2 2585 4264

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
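
[Editor's sketch] For readers hitting the same symptom, the checks suggested in this
thread can be summarized as the commands below. This is a minimal sketch, assuming a
cephadm-managed cluster with the lock path shown in the poster's log; the <FSID>, <PID>
and <id> placeholders are illustrative, not values from the poster's cluster.

    # Compare the running version against the remoto deadlock fix
    # referenced in https://tracker.ceph.com/issues/50526
    ceph versions

    # On the affected host: look for a cephadm ceph-volume process that
    # may be stuck and holding the per-FSID lock
    ps aux | grep '[c]ephadm.*ceph-volume'

    # If one has been hung for hours, kill it, then confirm nothing
    # still has the lock file open
    kill <PID>
    fuser -v /run/cephadm/<FSID>.lock

    # Only once no cephadm process remains should the stale lock file be
    # removed, after which the original cleanup can be retried
    rm /run/cephadm/<FSID>.lock
    cephadm rm-daemon --name osd.<id> --fsid <FSID>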