On Wed, Sep 14, 2022 at 10:41:05AM +0200, Ilya Dryomov wrote:
> On Wed, Sep 14, 2022 at 5:49 AM Chris Dunlop <chris@xxxxxxxxxxxx> wrote:
>> On Tue, Sep 13, 2022 at 01:43:16PM +0200, Ilya Dryomov wrote:
>>> On Tue, Sep 13, 2022 at 3:44 AM Chris Dunlop <chris@xxxxxxxxxxxx> wrote:
>>>> What can make a "rbd unmap" fail, assuming the device is not mounted
>>>> and not (obviously) open by any other processes?
>>>>
>>>> linux-5.15.58
>>>> ceph-16.2.9
>>>>
>>>> I have multiple XFS on rbd filesystems, and often create rbd snapshots,
>>>> map and read-only mount the snapshot, perform some work on the fs, then
>>>> unmount and unmap. The unmap regularly (about 1 in 10 times) fails
>>>> like:
>>>>
>>>> $ sudo rbd unmap /dev/rbd29
>>>> rbd: sysfs write failed
>>>> rbd: unmap failed: (16) Device or resource busy

tl;dr problem solved: there WAS a process holding the rbd device open.
The culprit was a 'pvs' command being run periodically by 'ceph-volume'.
When the 'rbd unmap' was run while the 'pvs' command was running, the
unmap would fail.

It turns out the 'dd' command in my test script was only instrumental
insofar as it made the test run long enough to intersect with the
periodic 'pvs'. I had been thinking the 'dd' was causing the rbd data
to be buffered in the kernel, and perhaps the buffered data would
sometimes not be cleared immediately, causing the rbd unmap to fail.
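
For what it's worth, the collision should be easy to provoke
deliberately. A minimal sketch, assuming "rbd" is still listed in
lvm.conf (see below) so 'pvs' actually opens the rbd devices:

----------------------------------------------------------------------
# Run in a second terminal while the map/unmap test loops: each 'pvs'
# invocation briefly opens the block devices it scans, including
# /dev/rbdN when "rbd" is in the lvm.conf devices/types list.
while :; do pvs > /dev/null 2>&1; done
----------------------------------------------------------------------
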
The conflicting 'pvs' command was a bit tricky to catch because it was
only running for a very short time, so the 'pvs' would be gone by the
time I'd run 'lsof'. The key to finding the problem was to look through
the processes as quickly as possible upon an unmap failure, e.g.:
----------------------------------------------------------------------
if ! rbd device unmap "${dev}"; then
    # Find every process holding the device open: /proc/<pid>/fd
    # entries that are symlinks to ${dev}.
    while read -r p; do
        p=${p#/proc/}; p=${p%%/*}       # /proc/<pid>/fd/<n> -> <pid>
        (( p == prevp )) && continue    # skip duplicate fds from the same pid
        prevp=$p
        # Log the culprit's command line, plus its parent and grandparent.
        printf '%(%F %T)T %d\t%s\n' -1 "${p}" "$(tr '\0' ' ' < /proc/${p}/cmdline)"
        pp=$(awk '$1=="PPid:"{print $2}' /proc/${p}/status)
        printf '+ %d\t%s\n' "${pp}" "$(tr '\0' ' ' < /proc/${pp}/cmdline)"
        ppp=$(awk '$1=="PPid:"{print $2}' /proc/${pp}/status)
        printf '+ %d\t%s\n' "${ppp}" "$(tr '\0' ' ' < /proc/${ppp}/cmdline)"
    done < <(
        find /proc/[0-9]*/fd -lname "${dev}" 2> /dev/null
    )
fi
----------------------------------------------------------------------
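
A simpler one-shot check is 'fuser' (from psmisc), although it has the
same timing problem as 'lsof', i.e. you have to catch the opener in the
act:

--
fuser -v /dev/rbd29
--
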
Note that 'pvs' normally does NOT scan rbd devices: you have to
explicitly add "rbd" to the lvm.conf element for "List of additional
acceptable block device types", e.g.:
/etc/lvm/lvm.conf
--
devices {
    types = [ "rbd", 1024 ]
}
--
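
In case it's useful, I believe 'lvmconfig' will show the devices/types
setting as lvm actually sees it:

--
# print the devices/types setting currently in effect
lvmconfig devices/types
--
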
I'd previously enabled the rbd scanning when testing some lvm-on-rbd
stuff.
After removing rbd from the lvm.conf I was able to run through my unmap
test 150 times without a single unmap failure.

>> ---------------------------------------------------------------------
>> #!/bin/bash
>> set -e
>> rbdname=pool/name
>> for ((i=0; ++i<=50; )); do
>>     dev=$(rbd map "${rbdname}")
>>     ts "${i}: ${dev}"
>>     dd if="${dev}" of=/dev/null bs=1G count=1
>>     for ((j=0; ++j; )); do
>>         rbd unmap "${dev}" && break
>>         sleep 1m
>>     done
>>     (( j > 1 )) && echo "$j minutes to unmap"
>> done
>> ---------------------------------------------------------------------
>> This failed at about the same rate, i.e. around 1 in 10. This time it
>> only took 2 minutes each time to successfully unmap after the initial
>> unmap failed - I'm not sure if this is due to the test change (no
>> mount), or related to how busy the machine is otherwise.
> I would suggest repeating this test with "sleep 1s" to get a better
> idea of how long it really takes.

With "sleep 1s" it was generally successful the 2nd time around. I'm a
bit puzzled at this because I'm certain, before I started scripting this
test, I was doing many unmap attempts before finally successfully
unmapping. I was convinced it was a matter of waiting for "something" to
time out before the device was released, and in the meantime 'lsof'
wasn't showing anything with the device open. It's implausible I was
running into the 'pvs' command each of those times, so what was actually
going on there is a bit of a mystery.

> I don't think so. To confirm, now that there is no filesystem in the
> mix, replace "rbd unmap" with "rbd unmap -o force". If that fixes the
> issue, RBD is very unlikely to have anything to do with it because all
> "force" does is it overrides the "is this device still open" check
> at the very top of "rbd unmap" handler in the kernel.

I'd already confirmed "-o force" (or --force) would remove the device
but I was concerned that could possibly cause data corruption if/when
using a writable rbd so I wanted to get to the bottom of the problem.
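
(For reference, the forced unmap is just something like:)

--
sudo rbd unmap -o force /dev/rbd29
--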

> systemd-udevd may open block devices behind your back. "rbd unmap"
> command actually does a retry internally to work around that:

Huh, interesting.

> Perhaps it is hitting "udevadm settle" timeout on your system?
> "strace -f" might be useful here.

A good suggestion, although using 'strace' wasn't necessary in the end.
Thanks for your help!
Chris