That should obviously be

unmap()
{
    rbd-nbd unmap
}
trap unmap EXIT

On Wed, Dec 21, 2022 at 10:32 PM Josef Johansson <josef86@xxxxxxxxx> wrote:
>
> Right, I actually ended up deadlocking rbd-nbd, that's why I switched
> over to rbd-replay.
> The flow was
>
> rbd-nbd map &
> unmap()
> {
>     rbd-nbd unmap
> }
> while true; do
>     lsblk --noempty /dev/nbd0
>     r=$?
>     [ $r -eq 32 ] && continue
>     [ $r -eq 0 ] && break
> done
> dd if=/dev/random of=/dev/nbd0 bs=4096 count=1 oflag=sync
>
> What I did was to ctrl+c the process directly after I started it. Maybe
> adding the following just before the dd would be enough.
> Sadly I have to reboot the whole VM afterwards :)
>
> deadlock()
> {
>     sleep 0.1
>     exit 1
> }
> deadlock &
>
> On Wed, Dec 21, 2022 at 10:22 PM Sam Perman <sam@xxxxxxxx> wrote:
> >
> > Thanks, I'll take a look at that. For reference, the deadlock we are seeing
> > looks similar to the one described at the bottom of this issue:
> > https://tracker.ceph.com/issues/52088
> >
> > thanks
> > sam
> >
> > On Wed, Dec 21, 2022 at 4:04 PM Josef Johansson <josef86@xxxxxxxxx> wrote:
> >>
> >> Hi,
> >>
> >> I made some progress with my testing on a similar issue. Maybe the test
> >> will be easy to adapt to your case.
> >>
> >> https://tracker.ceph.com/issues/57396
> >>
> >> What I can say, though, is that I don't see the deadlock problem in my testing.
> >>
> >> Cheers
> >> -Josef
> >>
> >> On Wed, 21 Dec 2022 at 22:00, Sam Perman <sam@xxxxxxxx> wrote:
> >>>
> >>> Hello!
> >>>
> >>> I'm trying to chase down a deadlock we occasionally see on the client side
> >>> when using rbd-nbd and have a question about a lingering process we are
> >>> seeing.
> >>>
> >>> I have a simple test script that will execute the following in order:
> >>>
> >>> * use rbd to create a new image
> >>> * use rbd-nbd to map the image locally
> >>> * mkfs a file system
> >>> * mount the image locally
> >>> * use dd to write some dummy data
> >>> * unmount the device
> >>> * use rbd-nbd to unmap the image
> >>> * use rbd to remove the image
> >>>
> >>> After this is all done, there is a lingering process that I'm curious
> >>> about.
> >>>
> >>> The process is called "[kworker/u9:0-knbd0-recv]" (in state "I") and is a
> >>> child of "[kthreadd]" (in state "S").
> >>>
> >>> Is this normal? I don't see any specific problems with it, but I'm
> >>> eventually going to ramp up this test to use a lot of concurrency to see if
> >>> I can reproduce the deadlock we are seeing, and want to make sure I'm
> >>> starting clean.
> >>>
> >>> Thanks for any insight you have!
> >>> sam
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
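
[Editor's note: below is a minimal sketch of the repro flow Josef describes, with the
trap-based unmap correction applied. The image spec "testpool/testimg" and the device
path /dev/nbd0 are assumptions for illustration, not taken from the thread; the original
snippets omit the map arguments.]

#!/usr/bin/env bash
# Sketch: map an image with rbd-nbd, wait for the nbd device to become usable,
# then issue a single synchronous write. Cleanup happens via the EXIT trap,
# so an early ctrl+c still attempts an unmap.
set -u

unmap()
{
    # Device path is an assumption; adjust to whatever rbd-nbd mapped.
    rbd-nbd unmap /dev/nbd0
}
trap unmap EXIT

# Image spec is illustrative only.
rbd-nbd map testpool/testimg &

# Wait until the device shows up with a backing connection.
# In Josef's test, lsblk --noempty exits 32 while the device is still empty
# and 0 once it is ready.
while true; do
    lsblk --noempty /dev/nbd0
    r=$?
    [ $r -eq 32 ] && continue
    [ $r -eq 0 ] && break
done

# Optionally schedule an early exit just before the write to provoke the
# shutdown-while-mapping case described above (commented out by default).
deadlock()
{
    sleep 0.1
    exit 1
}
# deadlock &

# Single synchronous 4 KiB write against the mapped device.
dd if=/dev/random of=/dev/nbd0 bs=4096 count=1 oflag=sync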
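
[Editor's note: for reference, a sketch of the create/map/mkfs/mount/dd/umount/unmap/remove
sequence Sam outlines. The pool and image names, size, filesystem type, and mount point are
illustrative assumptions; the original post does not give them.]

#!/usr/bin/env bash
# Sketch of the eight-step test sequence from the original message.
set -eu

POOL=rbd            # assumption
IMAGE=nbdtest       # assumption
MNT=/mnt/nbdtest    # assumption

rbd create --size 1G "${POOL}/${IMAGE}"      # use rbd to create a new image
DEV=$(rbd-nbd map "${POOL}/${IMAGE}")        # map it; prints the device, e.g. /dev/nbd0
mkfs.ext4 "${DEV}"                           # mkfs a file system (ext4 chosen here)
mkdir -p "${MNT}"
mount "${DEV}" "${MNT}"                      # mount the image locally

# Write some dummy data synchronously.
dd if=/dev/zero of="${MNT}/dummy" bs=1M count=16 oflag=sync

umount "${MNT}"                              # unmount the device
rbd-nbd unmap "${DEV}"                       # unmap the image
rbd rm "${POOL}/${IMAGE}"                    # remove the image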