Re: rbd unmap fails with "Device or resource busy"

On Wed, Sep 14, 2022 at 5:49 AM Chris Dunlop <chris@xxxxxxxxxxxx> wrote:
>
> Hi Ilya,
>
> On Tue, Sep 13, 2022 at 01:43:16PM +0200, Ilya Dryomov wrote:
> > On Tue, Sep 13, 2022 at 3:44 AM Chris Dunlop <chris@xxxxxxxxxxxx> wrote:
> >> What can make a "rbd unmap" fail, assuming the device is not mounted
> >> and not (obviously) open by any other processes?
> >>
> >> linux-5.15.58
> >> ceph-16.2.9
> >>
> >> I have multiple XFS on rbd filesystems, and often create rbd snapshots,
> >> map and read-only mount the snapshot, perform some work on the fs, then
> >> unmount and unmap. The unmap regularly (about 1 in 10 times) fails
> >> like:
> >>
> >> $ sudo rbd unmap /dev/rbd29
> >> rbd: sysfs write failed
> >> rbd: unmap failed: (16) Device or resource busy
> >>
> >> I've double checked the device is no longer mounted, and, using "lsof"
> >> etc., nothing has the device open.
> >
> > One thing that "lsof" is oblivious to is multipath, see
> > https://tracker.ceph.com/issues/12763.
>
> The server is not using multipath - e.g. there's no multipathd, and:
>
> $ find /dev/mapper/ -name '*mpath*'
>
> ...finds nothing.
>
> >> I've found that waiting "a while", e.g. 5-30 minutes, will usually
> >> allow the "busy" device to be unmapped without the -f flag.
> >
> > "Device or resource busy" error from "rbd unmap" clearly indicates
> > that the block device is still open by something.  In this case -- you
> > are mounting a block-level snapshot of an XFS filesystem whose "HEAD"
> > is already mounted -- perhaps it could be some background XFS worker
> > thread?  I'm not sure if the "nouuid" mount option solves all issues there.
>
> Good suggestion, I should have considered that first. I've now tried it
> without the mount at all, i.e. with no XFS or other filesystem:
>
> ------------------------------------------------------------------------------
> #!/bin/bash
> set -e
> rbdname=pool/name
> for ((i=0; ++i<=50; )); do
>    dev=$(rbd map "${rbdname}")
>    ts "${i}: ${dev}"
>    dd if="${dev}" of=/dev/null bs=1G count=1
>    for ((j=0; ++j; )); do
>      rbd unmap "${dev}" && break
>      sleep 1m
>    done
>    (( j > 1 )) && echo "$j minutes to unmap"
> done
> ------------------------------------------------------------------------------
>
> This failed at about the same rate, i.e. around 1 in 10. This time it only
> took 2 minutes each time to successfully unmap after the initial unmap
> failed - I'm not sure if this is due to the test change (no mount), or
> related to how busy the machine is otherwise.

I would suggest repeating this test with "sleep 1s" to get a better
idea of how long it really takes.
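
For example, the inner retry loop could look something like this (just
a sketch, using bash's built-in SECONDS counter; everything else as in
your script):

  # retry every second and report how long the device actually stayed busy
  SECONDS=0
  until rbd unmap "${dev}"; do
    sleep 1s
  done
  (( SECONDS > 0 )) && echo "${SECONDS}s to unmap"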

>
> The upshot is, it definitely looks like there's something related to the
> underlying rbd that's preventing the unmap.

I don't think so.  To confirm, now that there is no filesystem in the
mix, replace "rbd unmap" with "rbd unmap -o force".  If that fixes the
issue, RBD is very unlikely to have anything to do with it because all
"force" does is it overrides the "is this device still open" check
at the very top of "rbd unmap" handler in the kernel.
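
In your test loop that would be just the one line below; the rest of
the script can stay the same:

  # bypass the kernel's open-count check; if this never fails while plain
  # "rbd unmap" does, something external is still holding the device open
  rbd unmap -o force "${dev}"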

systemd-udevd may open block devices behind your back.  The "rbd unmap"
command actually does a retry internally to work around that:

  /*
   * On final device close(), kernel sends a block change event, in
   * response to which udev apparently runs blkid on the device.  This
   * makes unmap fail with EBUSY, if issued right after final close().
   * Try to circumvent this with a retry before turning to udev.
   */
  for (int tries = 0; ; tries++) {
    int sysfs_r = sysfs_write_rbd_remove(buf);
    if (sysfs_r == -EBUSY && tries < 2) {
      if (!tries) {
        usleep(250 * 1000);
      } else if (!(flags & KRBD_CTX_F_NOUDEV)) {
        /*
         * libudev does not provide the "wait until the queue is empty"
         * API or the sufficient amount of primitives to build it from.
         */
        std::string err = run_cmd("udevadm", "settle", "--timeout", "10",
                                  (char *)NULL);
        if (!err.empty())
          std::cerr << "rbd: " << err << std::endl;
      }

Perhaps it is hitting the "udevadm settle" timeout on your system?
"strace -f" might be useful here.

Thanks,

                Ilya


