Re: rbd snap create not working and just hangs forever

On Fri, Apr 23, 2021 at 12:46 PM Boris Behrens <bb@xxxxxxxxx> wrote:
>
>
>
> On Fri, Apr 23, 2021 at 12:16 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>
>> On Fri, Apr 23, 2021 at 12:03 PM Boris Behrens <bb@xxxxxxxxx> wrote:
>> >
>> >
>> >
>> > On Fri, Apr 23, 2021 at 11:52 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>> >>
>> >>
>> >> This snippet confirms my suspicion.  Unfortunately without a verbose
>> >> log from that VM from three days ago (i.e. when it got into this state)
>> >> it's hard to tell what exactly went wrong.
>> >>
>> >> The problem is that the VM doesn't consider itself to be the rightful
>> >> owner of the lock and so when "rbd snap create" requests the lock from
>> >> it in order to make a snapshot, the VM just ignores the request because
>> >> even though it owns the lock, its record appears to be out of sync.
>> >>
>> >> I'd suggest kicking it by restarting osd36.  If the VM is active, it
>> >> should reacquire the lock and hopefully update its internal record as
>> >> expected.  If "rbd snap create" still hangs after that, it would mean
>> >> that we have a reproducer and can gather logs on the VM side.
>> >>
>> >> What versions of qemu/librbd and ceph are in use (both on the VM side
>> >> and on the side where you are running "rbd snap create")?
>> >>
>> > I just stopped the OSD, waited some seconds and started it again.
>> > I still can't create snapshots.
>> >
>> > Ceph version is 14.2.18 across the board
>> > qemu is 4.1.0-1
>> > as we use krbd, the kernel version is 5.2.9-arch1-1-ARCH
>> >
>> > How can I gather more logs to debug it?
>>
>> Are you saying that this image is mapped and the lock is held by the
>> kernel client?  It doesn't look that way from the logs you shared.
>
> We use krbd instead of librbd (at least that is my understanding), but qemu is doing the kvm/rbd stuff.

I'm going to assume that by "qemu is doing the kvm/rbd stuff", you
mean that you are using the librbd driver inside qemu and that this
image is opened by qemu (i.e. that driver).  If you don't know what
access method is being used, debugging this might be challenging ;)
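
One rough way to tell the two access methods apart is to look at the qemu
command line on the hypervisor.  The helper below is only a heuristic sketch:
the function name is made up for illustration, and it assumes that a
kernel-mapped image shows up as a /dev/rbd* block device while the librbd
driver shows up as an "rbd:pool/image" drive string or a "driver":"rbd"
blockdev entry.  Other invocation styles would come back as "unknown".

```shell
# Heuristic sketch: classify a qemu command line as krbd or librbd.
# classify_qemu_cmdline is a hypothetical helper, not part of any tool.
classify_qemu_cmdline() {
    case "$1" in
        # A kernel-mapped image is handed to qemu as a plain block device.
        */dev/rbd*) echo krbd ;;
        # The librbd driver is usually referenced as rbd:pool/image or,
        # with -blockdev JSON, as "driver":"rbd".
        *rbd:*|*'"driver":"rbd"'*) echo librbd ;;
        *) echo unknown ;;
    esac
}
```

Feed it the arguments of the qemu process in question, for example the output
of "ps -o args= -p <qemu pid>".  Independently of that, "rbd showmapped" on
the hypervisor lists the images that are mapped through the kernel client.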

Let's start with the same output: "rbd lock ls", "rbd status" and "rbd
snap create --debug-ms=1 --debug-rbd=20".  It should be different after
osd36 was restarted.
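
To grab all three outputs in one go, something along these lines should work.
It is a minimal sketch: the function name, the RBD override variable and the
image and snapshot names are placeholders, not anything from your setup.

```shell
# Sketch: run the three diagnostic commands for one image.
# collect_rbd_diag and the RBD variable are hypothetical conveniences;
# RBD only exists so a different rbd binary can be substituted.
collect_rbd_diag() {
    image="$1"   # image spec, e.g. mypool/myimage (placeholder)
    snap="$2"    # throwaway snapshot name (placeholder)
    rbd_bin="${RBD:-rbd}"

    echo "== rbd lock ls $image =="
    "$rbd_bin" lock ls "$image"

    echo "== rbd status $image =="
    "$rbd_bin" status "$image"

    echo "== rbd snap create $image@$snap =="
    # This is the step that is expected to hang; interrupt it with
    # Ctrl-C once it has produced enough debug output.
    "$rbd_bin" snap create "$image@$snap" --debug-ms=1 --debug-rbd=20
}
```

Run it as "collect_rbd_diag mypool/myimage debugsnap > rbd-diag.log 2>&1" so
that both regular and debug output end up in the same file.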

Thanks,

                Ilya


