Right, I actually ended up deadlocking rbd-nbd; that's why I switched over
to rbd-replay. The flow was:

rbd-nbd map &

unmap() {
    rbd-nbd unmap
}

# wait for /dev/nbd0 to show up (lsblk exits 32 while the device is absent)
while true; do
    lsblk --noempty /dev/nbd0
    r=$?
    [ $r -eq 32 ] && continue
    [ $r -eq 0 ] && break
done

dd if=/dev/random of=/dev/nbd0 bs=4096 count=1 oflag=sync

What I did was to ctrl+c the process right after I started it. Maybe adding
the following just before the dd would be enough. Sadly I have to reboot the
whole VM afterwards :)

deadlock() {
    sleep 0.1
    exit 1
}
deadlock &

On Wed, Dec 21, 2022 at 10:22 PM Sam Perman <sam@xxxxxxxx> wrote:
>
> Thanks, I'll take a look at that. For reference, the deadlock we are
> seeing looks similar to the one described at the bottom of this issue:
> https://tracker.ceph.com/issues/52088
>
> thanks
> sam
>
> On Wed, Dec 21, 2022 at 4:04 PM Josef Johansson <josef86@xxxxxxxxx> wrote:
>>
>> Hi,
>>
>> I made some progress with my testing on a similar issue. Maybe the test
>> will be easy to adapt to your case.
>>
>> https://tracker.ceph.com/issues/57396
>>
>> What I can say, though, is that I don't see the deadlock problem in my
>> testing.
>>
>> Cheers
>> -Josef
>>
>> On Wed, 21 Dec 2022 at 22:00, Sam Perman <sam@xxxxxxxx> wrote:
>>>
>>> Hello!
>>>
>>> I'm trying to chase down a deadlock we occasionally see on the client
>>> side when using rbd-nbd and have a question about a lingering process
>>> we are seeing.
>>>
>>> I have a simple test script that will execute the following in order:
>>>
>>> * use rbd to create a new image
>>> * use rbd-nbd to map the image locally
>>> * mkfs a file system
>>> * mount the image locally
>>> * use dd to write some dummy data
>>> * unmount the device
>>> * use rbd-nbd to unmap the image
>>> * use rbd to remove the image
>>>
>>> After this is all done, there is a lingering process that I'm curious
>>> about.
>>>
>>> The process is called "[kworker/u9:0-knbd0-recv]" (in state "I") and is
>>> a child of "[kthreadd]" (in state "S").
>>>
>>> Is this normal? I don't see any specific problems with it, but I'm
>>> eventually going to ramp up this test to use a lot of concurrency to see
>>> if I can reproduce the deadlock we are seeing, and want to make sure I'm
>>> starting clean.
>>>
>>> Thanks for any insight you have!
>>> sam
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
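
For reference, a minimal sketch of the test sequence Sam describes above.
The pool and image names, mount point, filesystem and dd parameters are
assumptions for illustration only, not taken from the thread:

#!/bin/bash
# Minimal sketch only; run as root. Pool/image names, mount point,
# filesystem and dd parameters are assumed values.
set -e

pool=rbd
img=nbd-test
mnt=/mnt/nbd-test

rbd create "$pool/$img" --size 1024     # create a new 1 GiB image
dev=$(rbd-nbd map "$pool/$img")         # map it locally; prints e.g. /dev/nbd0
mkfs.ext4 "$dev"                        # put a filesystem on the device
mkdir -p "$mnt"
mount "$dev" "$mnt"                     # mount it locally
dd if=/dev/zero of="$mnt/dummy" bs=1M count=16 oflag=sync   # write dummy data
umount "$mnt"                           # unmount the device
rbd-nbd unmap "$dev"                    # unmap the image
rbd rm "$pool/$img"                     # remove the image

The idea, following Sam's description, would be to loop this or run several
copies concurrently to try to reproduce the deadlock.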