It may be this:

https://tracker.ceph.com/issues/50526
https://github.com/alfredodeza/remoto/issues/62

Which we resolved with:

https://github.com/alfredodeza/remoto/pull/63

What version of ceph are you running, and is it impacted by the above?

David

On Thu, Sep 2, 2021 at 9:53 AM fcid <fcid@xxxxxxxxxxx> wrote:
>
> Hi Sebastian,
>
> Following your suggestion, I've found this process:
>
> /usr/bin/python3
> /var/lib/ceph/<FSID>/cephadm.f77d9d71514a634758d4ad41ab6eef36d25386c99d8b365310ad41f9b74d5ce6
> --image
> ceph/ceph@sha256:9b04c0f15704c49591640a37c7adfd40ffad0a4b42fecb950c3407687cb4f29a
> ceph-volume --fsid <FSID> -- lvm list --format json
>
> That process had been running for more than 12 hours, so I killed it,
> and then cephadm could acquire the lock. Shortly after, the process
> started again, and I can see that it is running on all the nodes (we
> have 3 nodes). I tried executing the same command on all the nodes,
> from the command line, and it works fine; here is the output:
> https://pastebin.com/v58Nyxdx
>
> What could be causing this process to get stuck when it is launched by
> the orchestrator, given that launching it from the command line works
> fine?
>
> Thank you, kind regards.
>
> On 02/09/2021 05:19, Sebastian Wagner wrote:
> >
> > On 31.08.21 at 04:05, fcid wrote:
> >> Hi ceph community,
> >>
> >> I'm having some trouble trying to delete an OSD.
> >>
> >> I've been using cephadm in one of our clusters and it works fine,
> >> but lately, after an OSD failure, I cannot delete it using the
> >> orchestrator. Since the orchestrator is not working (for some unknown
> >> reason), I tried to manually delete the OSD using the following command:
> >>
> >> ceph osd purge <id> --yes-i-really-mean-it
> >>
> >> This command removed the OSD from the crush map, but then the warning
> >> CEPHADM_FAILED_DAEMON appeared. So the next step is to delete the
> >> daemon on the server that used to host the failed OSD. The command I
> >> used here was the following:
> >>
> >> cephadm rm-daemon --name osd.<id> --fsid <FSID>
> >>
> >> But this command does not work because, according to the log, cephadm
> >> cannot acquire the lock:
> >>
> >> 2021-08-30 21:50:09,712 DEBUG Lock 139899822730784 not acquired on
> >> /run/cephadm/$FSID.lock, waiting 0.05 seconds ...
> >> 2021-08-30 21:50:09,762 DEBUG Acquiring lock 139899822730784 on
> >> /run/cephadm/$FSID.lock
> >> 2021-08-30 21:50:09,763 DEBUG Lock 139899822730784 not acquired on
> >> /run/cephadm/$FSID.lock, waiting 0.05 seconds ...
> >>
> >> The file /run/cephadm/$FSID.lock does exist. Can I safely remove it?
> >> What should I check before doing such a task?
> >
> > Yes, in case you're sure that no other cephadm process (i.e. call
> > `ps`) is stuck.
> >
> >>
> >> I'll really appreciate any hint you can give regarding this matter.
> >>
> >> Thanks! Regards.
> >>
>
> --
> AltaVoz <https://www.altavoz.net/>
> Fernando Cid
> Operations Engineer
> www.altavoz.net <https://www.altavoz.net/>
> Viña del Mar: 2 Poniente 355 of 53 | +56 32 276 8060
> Santiago: Antonio Bellet 292 of 701 | +56 2 2585 4264

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
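
[Editor's sketch] For readers hitting the same symptom, the checks suggested in this
thread can be summarized as the commands below. This is a minimal sketch, assuming a
cephadm-managed cluster with the lock path shown in the poster's log; the <FSID>, <PID>
and <id> placeholders are illustrative, not values from the poster's cluster.

    # Compare the running version against the remoto deadlock fix
    # referenced in https://tracker.ceph.com/issues/50526
    ceph versions

    # On the affected host: look for a cephadm ceph-volume process that
    # may be stuck and holding the per-FSID lock
    ps aux | grep '[c]ephadm.*ceph-volume'

    # If one has been hung for hours, kill it, then confirm nothing
    # still has the lock file open
    kill <PID>
    fuser -v /run/cephadm/<FSID>.lock

    # Only once no cephadm process remains should the stale lock file be
    # removed, after which the original cleanup can be retried
    rm /run/cephadm/<FSID>.lock
    cephadm rm-daemon --name osd.<id> --fsid <FSID>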