On Thu, May 28, 2020 at 01:16:46AM +0200, Christian Brauner wrote:
> I'm also starting to think this isn't even possible or currently doable
> safely.
> The fdtable in the kernel would end up with a dangling pointer, I would
> think. Unless you backtrack all fds that still have a reference into the
> fdtable and refer to that file and close them all in the kernel which I
> don't think is possible and also sounds very dodgy. This also really
> seems like we would be breaking a major contract, namely that fds stay
> valid until userspace calls close, execve(), or exits.

Right, I think I was just using the wrong words? I was looking at it
like a pipe, or a socket, where you still have an fd, but reads return
0, you might get SIGPIPE, etc. The VFS clearly knows what a
"disconnected" fd is, and I had assumed there was general logic for it
to indicate "I'm not here any more".

I recently did something very similar to the pstore filesystem, but I
got to cheat with some massive subsystem locks. In that case I needed to
clear all the inodes out of the tmpfs, so I unlink them all and manage
the data lifetimes pointing back into the (waiting to be unloaded)
backend module by NULLing the pointer back, which is safe because of
how the locking there happens to work. Any open readers, when they
close, will have the last ref count dropped, at which point the record
itself is released too.

Back to the seccomp subject: should "all tasks died" be distinguishable
from "I can't find that notification" in the ioctl()? (i.e. is ENOENT
sufficient, or does there need to be an EIO or ESRCH there?)

-- 
Kees Cook
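
The pipe analogy above can be sketched in a few lines. This is only a
minimal illustration, assuming Linux and CPython (which ignores SIGPIPE
at startup, so the write fails with EPIPE instead of killing the
process). The function name peer_gone_behaviour is made up for the
example, and none of this says how the seccomp notifier fd behaves.

```python
import errno
import os


def peer_gone_behaviour():
    """Return (read_result, write_errno) seen after the other pipe end is closed."""
    # Case 1: the writer goes away; the reader still holds a valid fd.
    r, w = os.pipe()
    os.close(w)
    data = os.read(r, 16)  # EOF: an empty read, not an error
    os.close(r)

    # Case 2: the reader goes away; the writer still holds a valid fd.
    r, w = os.pipe()
    os.close(r)
    try:
        os.write(w, b"x")
        write_errno = None
    except BrokenPipeError as exc:
        # CPython ignores SIGPIPE by default, so the failure shows up as EPIPE.
        # A C program with the default SIGPIPE disposition would be killed here.
        write_errno = exc.errno
    finally:
        os.close(w)
    return data, write_errno


if __name__ == "__main__":
    data, write_errno = peer_gone_behaviour()
    print("read after writer closed:", data)
    print("write after reader closed:", errno.errorcode.get(write_errno))
```

In both cases the surviving fd stays valid until it is closed; only the
results of operations on it change, which is the "disconnected" behaviour
the message refers to.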