Re: rbd unmap fails with "Device or resource busy"

Hi Ilya,

On Tue, Sep 13, 2022 at 01:43:16PM +0200, Ilya Dryomov wrote:
> On Tue, Sep 13, 2022 at 3:44 AM Chris Dunlop <chris@xxxxxxxxxxxx> wrote:
> > What can make a "rbd unmap" fail, assuming the device is not mounted and not (obviously) open by any other processes?
> >
> > linux-5.15.58
> > ceph-16.2.9
> >
> > I have multiple XFS on rbd filesystems, and often create rbd snapshots, map and read-only mount the snapshot, perform some work on the fs, then unmount and unmap. The unmap regularly (about 1 in 10 times) fails like:
> >
> > $ sudo rbd unmap /dev/rbd29
> > rbd: sysfs write failed
> > rbd: unmap failed: (16) Device or resource busy
> >
> > I've double checked the device is no longer mounted, and, using "lsof" etc., nothing has the device open.
>
> One thing that "lsof" is oblivious to is multipath, see
> https://tracker.ceph.com/issues/12763.

The server is not using multipath: there's no multipathd running, and:

$ find /dev/mapper/ -name '*mpath*'

...finds nothing.
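For anyone else chasing this: another stacked-device check that doesn't rely on lsof is the per-device "holders" directory in sysfs, which lists anything (device-mapper, md, etc.) sitting on top of the block device. A minimal sketch (rbd29 is just the example device from above; adjust as needed):

```shell
#!/bin/bash
# Report anything stacked on top of a block device via sysfs.
# check_holders takes a bare device name, e.g. "rbd29".
check_holders() {
    local dev=$1
    local holders=/sys/block/${dev}/holders
    if [ -d "${holders}" ] && [ -n "$(ls -A "${holders}" 2>/dev/null)" ]; then
        echo "${dev} held by: $(ls "${holders}" | tr '\n' ' ')"
    else
        echo "no holders for ${dev}"
    fi
}

check_holders rbd29
```

An empty holders directory rules out multipath/LVM/md sitting on the device, though not a plain open file descriptor.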

> > I've found that waiting "a while", e.g. 5-30 minutes, will usually allow the "busy" device to be unmapped without the -f flag.
>
> A "Device or resource busy" error from "rbd unmap" clearly indicates
> that the block device is still open by something.  In this case -- you
> are mounting a block-level snapshot of an XFS filesystem whose "HEAD"
> is already mounted -- perhaps it could be some background XFS worker
> thread?  I'm not sure if the "nouuid" mount option solves all issues there.

Good suggestion, I should have considered that first. I've now tried it without the mount at all, i.e. with no XFS or other filesystem:

------------------------------------------------------------------------------
#!/bin/bash
set -e
rbdname=pool/name
for ((i = 1; i <= 50; i++)); do
  dev=$(rbd map "${rbdname}")
  echo "${i}: ${dev}" | ts   # ts (moreutils) timestamps the line
  dd if="${dev}" of=/dev/null bs=1G count=1
  # retry the unmap every minute until it succeeds
  for ((j = 1; ; j++)); do
    rbd unmap "${dev}" && break
    sleep 1m
  done
  (( j > 1 )) && echo "$j minutes to unmap"
done
------------------------------------------------------------------------------

This failed at about the same rate, around 1 in 10. This time each failed unmap succeeded after only 2 minutes of retrying - I'm not sure whether that's due to the test change (no mount) or to how busy the machine is otherwise.

The upshot is, it definitely looks like there's something related to the underlying rbd that's preventing the unmap.
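One thing I'll look at next is the image's watchers on the Ceph side, since a lingering watcher would suggest the kernel client still considers the image open. A sketch along these lines (the RBD_CMD indirection is purely a hypothetical hook so the snippet can be exercised without a live cluster; in real use it's just "rbd"):

```shell
#!/bin/bash
# Print the watchers on an rbd image via "rbd status"; a watcher that
# persists after unmount suggests the image is still held open.
# RBD_CMD is a hypothetical override hook; defaults to the real rbd CLI.
show_watchers() {
    local image=$1
    ${RBD_CMD:-rbd} status "${image}"
}
```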

> Have you encountered this error in other scenarios, i.e. without
> mounting snapshots this way or with ext4 instead of XFS?

I've seen the same issue after unmounting r/w filesystems, but I don't do that nearly as often so it hasn't been a pain point. However, per the test above, the issue is unrelated to the mount.
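In the meantime the practical workaround has been a retry loop around the unmap, roughly like this (a sketch; UNMAP_CMD is only a hypothetical hook so the function can be tested without a mapped device - in real use it defaults to "rbd unmap"):

```shell
#!/bin/bash
# Retry "rbd unmap" with a fixed one-minute delay until it succeeds
# or the retry budget is exhausted.
# UNMAP_CMD is a hypothetical override; defaults to the real command.
unmap_with_retry() {
    local dev=$1 max=${2:-30} tries=0
    until ${UNMAP_CMD:-rbd unmap} "${dev}"; do
        tries=$((tries + 1))
        if [ "${tries}" -ge "${max}" ]; then
            echo "giving up on ${dev} after ${tries} attempts" >&2
            return 1
        fi
        sleep 60
    done
    echo "${dev} unmapped after ${tries} retries"
}
```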

Cheers,

Chris



