Re: rbd unmap fails with "Device or resource busy"

Hi all,

On Fri, Sep 23, 2022 at 11:47:11AM +0200, Ilya Dryomov wrote:
On Fri, Sep 23, 2022 at 5:58 AM Chris Dunlop <chris@xxxxxxxxxxxx> wrote:
On Wed, Sep 21, 2022 at 12:40:54PM +0200, Ilya Dryomov wrote:
On Wed, Sep 21, 2022 at 3:36 AM Chris Dunlop <chris@xxxxxxxxxxxx> wrote:
On Tue, Sep 13, 2022 at 3:44 AM Chris Dunlop <chris@xxxxxxxxxxxx> wrote:
What can make a "rbd unmap" fail, assuming the device is not mounted and not (obviously) open by any other processes?

OK, I'm confident I now understand the cause of this problem. The particular machine where I'm mounting the rbd snapshots is also running some containerised ceph services. The ceph containers (bind-)mount the entire host filesystem hierarchy on startup, and if a ceph container happens to start up whilst an rbd device is mounted, the container also ends up with the rbd mounted, preventing the host from unmapping the device even after the host has unmounted it. (More below.)
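(As an aside, in case it's useful to anyone chasing a similar EBUSY: a quick way of seeing which processes' mount namespaces still reference the device is to scan /proc/<pid>/mountinfo. The sketch below is only an illustration of that idea - the /dev/rbd0 path is just an example device - and is roughly equivalent to grepping /proc/*/mountinfo by hand.)

/*
 * Illustrative sketch only: walk /proc/<pid>/mountinfo and report which
 * processes' mount namespaces still contain a given device.  A container
 * that has (r)bind-mounted the host filesystem shows up here even after
 * the host itself has unmounted the device, which is what leaves
 * "rbd unmap" failing with EBUSY.
 *
 * Build: cc -o mount-holders mount-holders.c
 * Usage: ./mount-holders /dev/rbd0      (device path is just an example)
 */
#include <ctype.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <device>\n", argv[0]);
        return 1;
    }

    DIR *proc = opendir("/proc");
    if (!proc) {
        perror("opendir /proc");
        return 1;
    }

    struct dirent *de;
    while ((de = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                       /* only look at <pid> directories */

        char path[PATH_MAX];
        snprintf(path, sizeof(path), "/proc/%s/mountinfo", de->d_name);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;                       /* process may have exited */

        char line[4096];
        while (fgets(line, sizeof(line), f)) {
            if (strstr(line, argv[1])) {
                /* report the pid and its mount namespace id */
                char ns[PATH_MAX], id[64];
                snprintf(ns, sizeof(ns), "/proc/%s/ns/mnt", de->d_name);
                ssize_t n = readlink(ns, id, sizeof(id) - 1);
                id[n > 0 ? n : 0] = '\0';
                printf("pid %s holds %s (%s)\n", de->d_name, argv[1], id);
                break;
            }
        }
        fclose(f);
    }
    closedir(proc);
    return 0;
}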

This brings up a couple of issues...

Why is the ceph container getting access to the entire host filesystem in the first place?

Even if I mount an rbd device with the "unbindable" mount option, which is specifically supposed to prevent bind mounts of that filesystem, the ceph containers still get the mount - how/why??

If the ceph containers really do need access to the entire host filesystem, perhaps it would be better to do a "slave" mount, so if/when the host unmounts a filesystem it's also unmounted in the container[s]. (Of course this also means any filesystems newly mounted on the host would also appear in the containers - but that happens anyway if the container is newly started.)

Thanks for the great analysis! I think the ceph-volume container does it because of [1]. I'm not sure about "cephadm shell". There is also the node-exporter container that needs access to the host for gathering metrics.

[1] https://tracker.ceph.com/issues/52926

I'm guessing ceph-volume may need to see the host mounts so it can detect whether a disk is in use. Could this also be done on the host (like issue 52926 says is being done with pv/vg/lv commands), removing the need to have the entire host filesystem hierarchy available in the container?

Similarly, I would have thought the node-exporter container only needs access to ceph-specific files/directories rather than the whole system.

On Tue, Sep 27, 2022 at 12:55:37PM +0200, Ilya Dryomov wrote:
On Fri, Sep 23, 2022 at 3:06 PM Guillaume Abrioux <gabrioux@xxxxxxxxxx> wrote:
On Fri, 23 Sept 2022 at 05:59, Chris Dunlop <chris@xxxxxxxxxxxx> wrote:
If the ceph containers really do need access to the entire host filesystem, perhaps it would be better to do a "slave" mount,

Yes, I think a mount with 'slave' propagation should fix your issue. I plan to do some tests next week and work on a patch.

Thanks Guillaume.

I wanted to share an observation that there seem to be two cases here: actual containers (e.g. an OSD container), and cephadm shell, which is technically also a container but may be regarded by users as a shell ("window") with some binaries and configuration files injected into it.

For my part, I don't see or use a cephadm shell as a normal shell with additional stuff injected. At the very least the host root filesystem location has changed to /rootfs, so it's obviously not a standard shell.

In fact I was quite surprised that the rootfs and all the other mounts unrelated to ceph were available at all. I'm still not convinced it's a good idea.

In my conception a cephadm shell is a mini virtual machine, specifically for inspecting and managing ceph-specific areas *only*.

I guess it's really a difference of philosophy. I only use a cephadm shell when I explicitly need to do something with ceph, and I drop back out of the cephadm shell (and its associated privileges!) as soon as I'm done with that specific task. For everything else I'll be in my (non-privileged) host shell. I can imagine (although I must say I'd be surprised) that others may use the cephadm shell as a matter of course, for managing the whole machine? Then again, given issue 52926 quoted above, that sounds like a bad idea if, for instance, the lvm commands should NOT be run in the container "in order to avoid lvm metadata corruption" - i.e. it's not safe to assume a cephadm shell is a normal shell.

I would argue the goal should be to remove access to the general host filesystem(s) from the ceph containers altogether where possible.

I'll also admit that, generally, it's probably a bad idea to be doing things unrelated to ceph on a box hosting ceph. But that's the way this particular system has grown and unfortunately it will take quite a bit of time, effort, and expense to change this now.

For the former, a unidirectional propagation such that when something is unmounted on the host it is also unmounted in the container is all that is needed. However, for the latter, a bidirectional propagation such that when something is mounted in this shell it is also mounted on the host (and therefore in all other windows) seems desirable.

What do you think about going with MS_SLAVE for the former and MS_SHARED for the latter?

Personally I would find it surprising and unexpected (i.e. potentially a source of trouble) for mount changes done in a container (including a "shell" container) to affect the host. But again, that may be the difference of philosophy regarding the cephadm shell mentioned above.
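For reference, here is how I read the two options, written out as the underlying mount(2) calls (a sketch only - the /rootfs path is just a hypothetical target where a container has bind-mounted the host root, and the real change would presumably be made by the container runtime when setting up the bind mount rather than by a standalone program like this):

/*
 * Sketch of the two propagation choices, as the underlying mount(2) calls
 * (equivalent to "mount --make-rslave" / "mount --make-rshared").
 *
 *   MS_SLAVE  - host mounts/unmounts propagate into the container, but
 *               nothing mounted inside the container leaks back out.
 *   MS_SHARED - propagation is bidirectional, so a mount made inside
 *               (e.g. in a cephadm shell) also appears on the host.
 *
 * Build: cc -o propagation propagation.c
 * Usage: ./propagation /rootfs slave|shared   (/rootfs is hypothetical)
 */
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <mountpoint> slave|shared\n", argv[0]);
        return 1;
    }

    unsigned long flags = MS_REC;           /* apply to the whole subtree */
    if (strcmp(argv[2], "slave") == 0)
        flags |= MS_SLAVE;
    else if (strcmp(argv[2], "shared") == 0)
        flags |= MS_SHARED;
    else {
        fprintf(stderr, "unknown propagation type: %s\n", argv[2]);
        return 1;
    }

    /* For a propagation-type change, source and fstype are ignored. */
    if (mount("none", argv[1], NULL, flags, NULL) != 0) {
        perror("mount");
        return 1;
    }
    return 0;
}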

Chris


