Re: rbd snap create not working and just hangs forever

On Fri, Apr 23, 2021 at 9:16 AM Boris Behrens <bb@xxxxxxxxx> wrote:
>
>
>
> On Thu, Apr 22, 2021 at 8:59 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>
>> On Thu, Apr 22, 2021 at 7:33 PM Boris Behrens <bb@xxxxxxxxx> wrote:
>> >
>> >
>> >
>> > On Thu, Apr 22, 2021 at 6:30 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>> >>
>> >> On Thu, Apr 22, 2021 at 6:00 PM Boris Behrens <bb@xxxxxxxxx> wrote:
>> >> >
>> >> >
>> >> >
>> >> > On Thu, Apr 22, 2021 at 5:27 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>> >> >>
>> >> >> On Thu, Apr 22, 2021 at 5:08 PM Boris Behrens <bb@xxxxxxxxx> wrote:
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Thu, Apr 22, 2021 at 4:43 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>> >> >> >>
>> >> >> >> On Thu, Apr 22, 2021 at 4:20 PM Boris Behrens <bb@xxxxxxxxx> wrote:
>> >> >> >> >
>> >> >> >> > Hi,
>> >> >> >> >
>> >> >> >> > I have a customer VM that is running fine, but I cannot create snapshots
>> >> >> >> > anymore.
>> >> >> >> > rbd snap create rbd/IMAGE@test-bb-1
>> >> >> >> > just hangs forever.
>> >> >> >>
>> >> >> >> Hi Boris,
>> >> >> >>
>> >> >> >> Run
>> >> >> >>
>> >> >> >> $ rbd snap create rbd/IMAGE@test-bb-1 --debug-ms=1 --debug-rbd=20
>> >> >> >>
>> >> >> >> let it hang for a few minutes and attach the output.
>> >> >> >
>> >> >> >
>> >> >> > I just pasted a short snip here: https://pastebin.com/B3Xgpbzd
>> >> >> > If you need more I can give it to you, but the output is very large.
>> >> >>
>> >> >> Paste the first couple thousand lines (i.e. from the very beginning),
>> >> >> that should be enough.
>> >> >>
>> >> > sure: https://pastebin.com/GsKpLbqG
>> >> >
>> >> > good luck :)
>> >>
>> >> What is the output of "rbd status"?  I know you said it shows one
>> >> watcher, but I need to see it.
>> >>
>> >>
>> > sure
>> > # rbd status rbd/IMAGE
>> > Watchers:
>> > watcher=[fd00:2380:2:43::11]:0/3919389201 client.136378749 cookie=139968010125312
>>
>
> Hi Ilya,
> thank you a lot for your support.
>
> This might have been another hanging snapshot scheduler that got removed afterwards.
> Sorry for that.
>
> https://pastebin.com/TBZs7Mvb
>
> I just created a new paste and added status and lock ls at the top and at the bottom.
> The 2nd watcher disappears after a minute or so.
> All commands are done within one minute.

This snippet confirms my suspicion.  Unfortunately without a verbose
log from that VM from three days ago (i.e. when it got into this state)
it's hard to tell what exactly went wrong.

The problem is that the VM doesn't consider itself to be the rightful
owner of the lock and so when "rbd snap create" requests the lock from
it in order to make a snapshot, the VM just ignores the request because
even though it owns the lock, its record appears to be out of sync.
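
To compare what "rbd status" and "rbd lock ls" report, something along
these lines may help.  It is only a rough sketch: the helper names are
made up for illustration, and it merely pattern-matches client.<id>
tokens in saved output rather than relying on an exact column layout.
It cannot see the VM's internal record, so a match here does not rule
out the problem described above.

```shell
#!/bin/sh
# Print each distinct client.<id> token found on stdin, one per line.
# Intended for saved "rbd status" or "rbd lock ls" output; it only
# pattern-matches and makes no assumption about the column layout.
clients_in() {
    grep -oE 'client\.[0-9]+' | sort -u
}

# Hypothetical helper: succeed if every lock holder is also a watcher.
#   $1 = file with saved "rbd status" output
#   $2 = file with saved "rbd lock ls" output
lockers_are_watchers() {
    watchers=$(clients_in < "$1")
    for locker in $(clients_in < "$2"); do
        # -x: whole-line match, -F: treat the dot literally
        printf '%s\n' "$watchers" | grep -qxF "$locker" || return 1
    done
    return 0
}
```

For example, save "rbd status rbd/IMAGE" to status.txt and
"rbd lock ls rbd/IMAGE" to locks.txt, then run
"lockers_are_watchers status.txt locks.txt && echo consistent".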

I'd suggest kicking it by restarting osd36.  If the VM is active, it
should reacquire the lock and hopefully update its internal record as
expected.  If "rbd snap create" still hangs after that, it would mean
that we have a reproducer and can gather logs on the VM side.
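
When retrying after the restart, it may be convenient to put a time
limit on the command so that a hang is reported instead of blocking the
shell.  A minimal sketch, assuming timeout(1) from GNU coreutils is
available; the helper name and the 120 second limit used below are
arbitrary choices, not anything rbd prescribes.

```shell
#!/bin/sh
# Run a command with a time limit and report whether it finished or hung.
# Relies on timeout(1) from GNU coreutils, which exits with status 124
# when the limit is reached.
#   $1 = limit in seconds, remaining arguments = command to run
run_with_limit() {
    limit=$1
    shift
    timeout "$limit" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "hung: no result after ${limit}s"
    elif [ "$rc" -eq 0 ]; then
        echo "completed"
    else
        echo "failed with exit status $rc"
    fi
    return "$rc"
}
```

Usage would be along the lines of
"run_with_limit 120 rbd snap create rbd/IMAGE@test-bb-1".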

What versions of qemu/librbd and ceph are in use (both on the VM side and
on the side where you are running "rbd snap create")?
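
One way to collect this on each host is sketched below.  The helper
name is made up, and the binary names in the usage line are assumptions
about your setup (the qemu binary in particular may be named
differently).  Keep in mind that a VM that has been running for a long
time uses the librbd that was loaded when it started, which may be
older than what is installed now.

```shell
#!/bin/sh
# Print the first line of "<command> --version", or a note if the
# command is not present on this host.
report_version() {
    if command -v "$1" >/dev/null 2>&1; then
        "$1" --version 2>&1 | head -n 1
    else
        echo "$1: not installed here"
    fi
}
```

For example:
"for c in rbd ceph qemu-system-x86_64; do report_version "$c"; done".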

Thanks,

                Ilya


