On Fri, Apr 23, 2021 at 9:16 AM Boris Behrens <bb@xxxxxxxxx> wrote:
>
> On Thu, Apr 22, 2021 at 20:59, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>
>> On Thu, Apr 22, 2021 at 7:33 PM Boris Behrens <bb@xxxxxxxxx> wrote:
>> >
>> > On Thu, Apr 22, 2021 at 18:30, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>> >>
>> >> On Thu, Apr 22, 2021 at 6:00 PM Boris Behrens <bb@xxxxxxxxx> wrote:
>> >> >
>> >> > On Thu, Apr 22, 2021 at 17:27, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>> >> >>
>> >> >> On Thu, Apr 22, 2021 at 5:08 PM Boris Behrens <bb@xxxxxxxxx> wrote:
>> >> >> >
>> >> >> > On Thu, Apr 22, 2021 at 16:43, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>> >> >> >>
>> >> >> >> On Thu, Apr 22, 2021 at 4:20 PM Boris Behrens <bb@xxxxxxxxx> wrote:
>> >> >> >> >
>> >> >> >> > Hi,
>> >> >> >> >
>> >> >> >> > I have a customer VM that is running fine, but I can not make snapshots
>> >> >> >> > anymore.
>> >> >> >> > rbd snap create rbd/IMAGE@test-bb-1
>> >> >> >> > just hangs forever.
>> >> >> >>
>> >> >> >> Hi Boris,
>> >> >> >>
>> >> >> >> Run
>> >> >> >>
>> >> >> >> $ rbd snap create rbd/IMAGE@test-bb-1 --debug-ms=1 --debug-rbd=20
>> >> >> >>
>> >> >> >> let it hang for a few minutes and attach the output.
>> >> >> >
>> >> >> > I just pasted a short snip here: https://pastebin.com/B3Xgpbzd
>> >> >> > If you need more I can give it to you, but the output is very large.
>> >> >>
>> >> >> Paste the first couple thousand lines (i.e. from the very beginning),
>> >> >> that should be enough.
>> >> >>
>> >> > sure: https://pastebin.com/GsKpLbqG
>> >> >
>> >> > good luck :)
>> >>
>> >> What is the output of "rbd status"? I know you said it shows one
>> >> watcher, but I need to see it.
>> >>
>> > sure
>> > # rbd status rbd/IMAGE
>> > Watchers:
>> > watcher=[fd00:2380:2:43::11]:0/3919389201 client.136378749 cookie=139968010125312
>> >
>
> Hi Ilya,
> thank you a lot for your support.
>
> This might be another hanging snapshot scheduler that got removed afterwards.
> Sorry for that.
>
> https://pastebin.com/TBZs7Mvb
>
> I just created a new paste and added status and lock ls at the top and at the bottom.
> The 2nd watcher disappears after a minute or so.
> All commands are done within one minute.

This snippet confirms my suspicion. Unfortunately, without a verbose
log from that VM from three days ago (i.e. when it got into this state),
it's hard to tell what exactly went wrong.

The problem is that the VM doesn't consider itself to be the rightful
owner of the lock. So when "rbd snap create" requests the lock from it
in order to make a snapshot, the VM just ignores the request: even
though it owns the lock, its internal record appears to be out of sync.

I'd suggest kicking it by restarting osd36. If the VM is active, it
should reacquire the lock and hopefully update its internal record as
expected. If "rbd snap create" still hangs after that, it would mean
that we have a reproducer and can gather logs on the VM side.

What versions of qemu/librbd and ceph are in use (both on the VM side
and on the side where you are running "rbd snap create")?

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx