Hey

On Thu, Apr 9, 2020 at 10:27 AM Christian Brauner
<christian.brauner@xxxxxxxxxx> wrote:
> On Thu, Apr 09, 2020 at 07:39:18AM +0200, David Rheinsberg wrote:
> > With loopfs in place, any process can create its own user_ns, mount
> > their private loopfs and create as many loop-devices as they want.
> > Hence, this limit does not serve as an effective global
> > resource-control. Secondly, anyone with access to `loop-control` can
> > now create loop instances until this limit is hit, thus causing anyone
> > else to be unable to create more. This effectively prevents you from
> > sharing a loopfs between non-trusting parties. I am unsure where that
> > limit would actually be used?
>
> Restricting it globally indeed wasn't the intended use-case for it. This
> was more so that you can specify an instance limit, bind-mount that
> instance into several places and sufficiently locked down users cannot
> exceed the instance limit.

But then these users can each exhaust the limit individually. As such,
you cannot share this instance across users that have no
trust-relationship. Fine with me, but I still don't understand in which
scenario the limit would be useful. Anyone can create a user-ns, create
a new loopfs mount, and just happily create more loop-devices. So what
is so special about restricting the number of devices on a _single_
mount instance?

> I don't think we'd be getting much out of a global limit per se. I
> think the initial namespace being able to reserve a bunch of devices
> they can always rely on being able to create when they need them is
> more interesting. This is similar to what devpts implements with the
> "reserved" mount option and what I initially proposed for binderfs.
> For the latter it was deemed unnecessary by others so I dropped it
> from loopfs too.

The `reserve` of devpts has a fixed 2-tier system: a global limit, and
an init-ns reserve. This does nothing to protect one container from
another.

Furthermore, how do you intend to keep user-space from creating an
unbounded number of loop devices? Unless I am mistaken, with your
proposal *any* process can create a new loopfs with a basically
unlimited number of loop-devices, thus easily triggering unbounded
kernel allocations. I think this needs to be accounted. The classic way
is to put a per-uid limit into `struct user_struct` (as done by pipes,
mlock, epoll, mq, etc.). An alternative is `struct ucounts`, which
allows hierarchical management (inotify uses that, as an example).

> I also expect most users to pre-create devices in the initial namespace
> instance they need (e.g. similar to what binderfs does or what loop
> devices currently have). Does that make sense to you?

Our use-case is to get programmatic access to loop-devices, so we can
build customer images on request (especially to create XFS images,
since mkfs.xfs cannot write them, IIRC). We would be perfectly happy
with a kernel-interface that takes a file-descriptor to a regular file
and returns us a file-descriptor to a newly created block device (which
is automatically destroyed when the last file-descriptor to it is
closed). This would be ideal *to us*, since it would do automatic
cleanup on crashes. We don't need any representation of the loop-device
in the file-system, as long as we can somehow mount it (either by
passing the bdev-FD to the new mount-api, or by using /proc/self/fd/ as
mount-source).
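For illustration, this is roughly what the flow looks like for us with
today's loop-control interface; a minimal sketch, where the image path,
mount target and fs-type are placeholders and most error handling is
trimmed. Note what is missing: nothing cleans up the device if we crash
half-way through.

#include <fcntl.h>
#include <linux/loop.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mount.h>
#include <unistd.h>

/* Attach an image file to a fresh loop device and mount it through
 * /proc/self/fd/, so we only ever deal with file-descriptors after
 * the initial setup. */
static int attach_and_mount(const char *image, const char *target)
{
        char dev[64], src[64];
        int ctl, img, loop, nr;

        ctl = open("/dev/loop-control", O_RDWR | O_CLOEXEC);
        nr = ioctl(ctl, LOOP_CTL_GET_FREE);      /* allocate a free device */
        close(ctl);
        if (nr < 0)
                return -1;

        snprintf(dev, sizeof(dev), "/dev/loop%d", nr);
        img = open(image, O_RDWR | O_CLOEXEC);
        loop = open(dev, O_RDWR | O_CLOEXEC);
        if (ioctl(loop, LOOP_SET_FD, img) < 0)   /* bind image to device */
                return -1;
        close(img);

        /* /proc/self/fd/<loop> resolves to the block device node */
        snprintf(src, sizeof(src), "/proc/self/fd/%d", loop);
        if (mount(src, target, "xfs", 0, NULL) < 0)
                return -1;

        /* no automatic teardown: if we crash before LOOP_CLR_FD, the
         * device and the attached image linger */
        return loop;
}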
With your proposed loopfs we could achieve something close to it: mount
a private loopfs, create a loop-device, and rely on the automatic
cleanup when the mount-namespace is destroyed.
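If I read your series correctly, that would look roughly like the
sketch below. To be clear, this is hypothetical: the fs-type name
"loop", the per-instance `loop-control` node, and the device naming
are assumptions based on my reading of the patches, not a confirmed
API.

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/loop.h>
#include <sched.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
        char dev[64];
        int ctl, img, loop, nr;

        /* private user- and mount-namespace; the loopfs instance and
         * its devices are torn down when the mount-ns is destroyed */
        if (unshare(CLONE_NEWUSER | CLONE_NEWNS) < 0)
                return 1;

        mkdir("/tmp/loopfs", 0700);
        /* fs-type name and per-instance loop-control node are
         * assumptions based on the patchset */
        if (mount("none", "/tmp/loopfs", "loop", 0, NULL) < 0)
                return 1;

        ctl = open("/tmp/loopfs/loop-control", O_RDWR | O_CLOEXEC);
        nr = ioctl(ctl, LOOP_CTL_GET_FREE);  /* device appears in instance */
        if (nr < 0)
                return 1;

        snprintf(dev, sizeof(dev), "/tmp/loopfs/loop%d", nr);
        img = open("image.xfs", O_RDWR | O_CLOEXEC);  /* placeholder */
        loop = open(dev, O_RDWR | O_CLOEXEC);
        return ioctl(loop, LOOP_SET_FD, img) < 0;     /* bind as usual */
}

Thanks
David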