On Wed, Sep 14, 2022 at 10:41:05AM +0200, Ilya Dryomov wrote:
> On Wed, Sep 14, 2022 at 5:49 AM Chris Dunlop <chris@xxxxxxxxxxxx> wrote:
>> On Tue, Sep 13, 2022 at 01:43:16PM +0200, Ilya Dryomov wrote:
>>> On Tue, Sep 13, 2022 at 3:44 AM Chris Dunlop <chris@xxxxxxxxxxxx> wrote:
>>>> What can make a "rbd unmap" fail, assuming the device is not mounted
>>>> and not (obviously) open by any other processes?
>>>>
>>>> linux-5.15.58
>>>> ceph-16.2.9
>>>>
>>>> I have multiple XFS on rbd filesystems, and often create rbd snapshots,
>>>> map and read-only mount the snapshot, perform some work on the fs, then
>>>> unmount and unmap. The unmap regularly (about 1 in 10 times) fails
>>>> like:
>>>>
>>>> $ sudo rbd unmap /dev/rbd29
>>>> rbd: sysfs write failed
>>>> rbd: unmap failed: (16) Device or resource busy

tl;dr problem solved: there WAS a process holding the rbd device open.
The culprit was a 'pvs' command being run periodically by 'ceph-volume'.
When the 'rbd unmap' was run while the 'pvs' command was running, the
unmap would fail.

It turns out the 'dd' command in my test script was only instrumental
insofar as it made the test run long enough to intersect with the
periodic 'pvs'. I had been thinking the 'dd' was causing the rbd data
to be buffered in the kernel, and perhaps the buffered data would
sometimes not be cleared immediately, causing the rbd unmap to fail.
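
For what it's worth, the collision should be easy to provoke
deliberately. A minimal sketch, assuming "rbd" is still listed in
lvm.conf (see below) so 'pvs' actually opens the rbd devices:

----------------------------------------------------------------------
# Run in a second terminal while the map/unmap test loops: each 'pvs'
# invocation briefly opens the block devices it scans, including
# /dev/rbdN when "rbd" is in the lvm.conf devices/types list.
while :; do pvs > /dev/null 2>&1; done
----------------------------------------------------------------------
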
The conflicting 'pvs' command was a bit tricky to catch because it was
only running for a very short time, so the 'pvs' would be gone by the
time I'd run 'lsof'. The key to finding the problem was to look through
the processes as quickly as possible upon an unmap failure, e.g.:
----------------------------------------------------------------------
if ! rbd device unmap "${dev}"; then
    # Find every process holding the device open: /proc/<pid>/fd
    # entries that are symlinks to ${dev}.
    while read -r p; do
        p=${p#/proc/}; p=${p%%/*}       # /proc/<pid>/fd/<n> -> <pid>
        (( p == prevp )) && continue    # skip duplicate fds from the same pid
        prevp=$p
        # Log the culprit's command line, plus its parent and grandparent.
        printf '%(%F %T)T %d\t%s\n' -1 "${p}" "$(tr '\0' ' ' < /proc/${p}/cmdline)"
        pp=$(awk '$1=="PPid:"{print $2}' /proc/${p}/status)
        printf '+ %d\t%s\n' "${pp}" "$(tr '\0' ' ' < /proc/${pp}/cmdline)"
        ppp=$(awk '$1=="PPid:"{print $2}' /proc/${pp}/status)
        printf '+ %d\t%s\n' "${ppp}" "$(tr '\0' ' ' < /proc/${ppp}/cmdline)"
    done < <(
        find /proc/[0-9]*/fd -lname "${dev}" 2> /dev/null
    )
fi
----------------------------------------------------------------------
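
A simpler one-shot check is 'fuser' (from psmisc), although it has the
same timing problem as 'lsof', i.e. you have to catch the opener in the
act:

--
fuser -v /dev/rbd29
--
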
Note that 'pvs' normally does NOT scan rbd devices: you have to
explicitly add "rbd" to the lvm.conf element for "List of additional
acceptable block device types", e.g.:
/etc/lvm/lvm.conf
--
devices {
    types = [ "rbd", 1024 ]
}
--
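
In case it's useful, I believe 'lvmconfig' will show the devices/types
setting as lvm actually sees it:

--
# print the devices/types setting currently in effect
lvmconfig devices/types
--
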
I'd previously enabled the rbd scanning when testing some lvm-on-rbd
stuff.
After removing rbd from the lvm.conf I was able to run through my unmap
test 150 times without a single unmap failure.

>> ---------------------------------------------------------------------
>> #!/bin/bash
>> set -e
>> rbdname=pool/name
>> for ((i=0; ++i<=50; )); do
>>     dev=$(rbd map "${rbdname}")
>>     ts "${i}: ${dev}"
>>     dd if="${dev}" of=/dev/null bs=1G count=1
>>     for ((j=0; ++j; )); do
>>         rbd unmap "${dev}" && break
>>         sleep 1m
>>     done
>>     (( j > 1 )) && echo "$j minutes to unmap"
>> done
>> ---------------------------------------------------------------------
>> This failed at about the same rate, i.e. around 1 in 10. This time it
>> only took 2 minutes each time to successfully unmap after the initial
>> unmap failed - I'm not sure if this is due to the test change (no
>> mount), or related to how busy the machine is otherwise.
> I would suggest repeating this test with "sleep 1s" to get a better
> idea of how long it really takes.

With "sleep 1s" it was generally successful the 2nd time around. I'm a
bit puzzled at this because I'm certain, before I started scripting this
test, I was doing many unmap attempts before finally successfully
unmapping. I was convinced it was a matter of waiting for "something" to
time out before the device was released, and in the meantime 'lsof'
wasn't showing anything with the device open. It's implausible I was
running into the 'pvs' command each of those times, so what was actually
going on there is a bit of a mystery.

> I don't think so. To confirm, now that there is no filesystem in the
> mix, replace "rbd unmap" with "rbd unmap -o force". If that fixes the
> issue, RBD is very unlikely to have anything to do with it because all
> "force" does is it overrides the "is this device still open" check
> at the very top of "rbd unmap" handler in the kernel.

I'd already confirmed "-o force" (or --force) would remove the device
but I was concerned that could possibly cause data corruption if/when
using a writable rbd so I wanted to get to the bottom of the problem.
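
(For reference, the forced unmap is just something like:)

--
sudo rbd unmap -o force /dev/rbd29
--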

> systemd-udevd may open block devices behind your back. "rbd unmap"
> command actually does a retry internally to work around that:

Huh, interesting.

> Perhaps it is hitting "udevadm settle" timeout on your system?
> "strace -f" might be useful here.

A good suggestion, although using 'strace' wasn't necessary in the end.
Thanks for your help!
Chris