What can make an "rbd unmap" fail, assuming the device is not mounted and
not (obviously) open by any other processes?
I have multiple XFS filesystems on rbd devices, and often create rbd
snapshots, map and read-only mount the snapshot, perform some work on the
fs, then unmount and unmap. The unmap regularly (about 1 in 10 times)
fails like:
$ sudo rbd unmap /dev/rbd29
rbd: sysfs write failed
rbd: unmap failed: (16) Device or resource busy
I've double checked the device is no longer mounted, and, using "lsof"
etc., nothing has the device open.
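Checks of this kind can be sketched roughly as below. This is only an
illustrative sketch, not the exact commands used: "dev_refs" is a
hypothetical helper name, and looking at other processes' mount namespaces
and at the sysfs holders directory are additional assumptions about where a
lingering reference might show up.

```shell
#!/bin/bash
# Sketch: print anything that might still reference a block device.
# dev_refs is a hypothetical helper, not part of rbd or ceph.
dev_refs() {
    local dev=$1 name
    name=$(basename "${dev}")

    # Mounts of the device in any process's mount namespace, not just
    # our own. Reading other users' mountinfo files needs root.
    grep -l -w -e "${dev}" /proc/[0-9]*/mountinfo 2>/dev/null

    # Stacked block devices (e.g. dm, md) that hold this one open.
    ls "/sys/block/${name}/holders" 2>/dev/null

    # Processes that have the device node itself open.
    if command -v lsof >/dev/null 2>&1; then
        lsof "${dev}" 2>/dev/null
    fi

    return 0
}
```

E.g. "sudo bash -c 'source ./dev_refs.sh; dev_refs /dev/rbd29'" (the file
name is likewise made up). Empty output would mean none of these three
places shows a user of the device.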
A "rbd unmap -f" can unmap the "busy" device but I'm concerned this may
have undesirable consequences, e.g. ceph resource leakage, or even
potential data corruption on non-read-only mounts.
I've found that waiting "a while", e.g. 5-30 minutes, will usually allow
the "busy" device to be unmapped without the -f flag.
A simple "map/mount/read/unmount/unmap" test sees the unmap fail about 1
in 10 times. When it fails it often takes 30 min or more for the unmap to
finally succeed. E.g.:
----------------------------------------
#!/bin/bash
set -e
rbdname=pool/name
for ((i=0; ++i<=50; )); do
  dev=$(rbd map "${rbdname}")
  mount -oro,norecovery,nouuid "${dev}" /mnt/test
  dd if="/mnt/test/big-file" of=/dev/null bs=1G count=1
  umount /mnt/test
  # blockdev --flushbufs "${dev}"
  # retry the unmap every 5 minutes until it succeeds
  for ((j=0; ++j; )); do
    rbd unmap "${dev}" && break
    sleep 5m
  done
done
----------------------------------------
Running "blockdev --flushbufs" prior to the unmap doesn't change the unmap
failures.
What can I look at to see what's causing these unmaps to fail?
Chris