On Fri, Mar 06, 2020 at 08:05:22PM +0000, Al Viro wrote:
> On Fri, Mar 06, 2020 at 07:58:23PM +0000, Al Viro wrote:
> > On Fri, Mar 06, 2020 at 07:43:22PM +0000, Al Viro wrote:
> > > On Fri, Mar 06, 2020 at 05:25:49PM +0100, Miklos Szeredi wrote:
> > > > On Tue, Mar 03, 2020 at 08:46:09AM +0100, Miklos Szeredi wrote:
> > > > >
> > > > > I'm doing a patch. Let's see how it fares in the face of all these
> > > > > preconceptions.
> > > >
> > > > Here's a first cut. Doesn't yet have superblock info, just mount info.
> > > > Probably has rough edges, but appears to work.
> > >
> > > For starters, you have just made namespace_sem held over copy_to_user().
> > > This is not going to fly.
> >
> > In case if the above is too terse: you grab your mutex while under
> > namespace_sem (see attach_recursive_mnt()); the same mutex is held
> > while calling dir_emit(). Which can (and normally does) copy data
> > to userland-supplied buffer.
> >
> > NAK for that reason alone, and to be honest I had been too busy
> > suppressing the gag reflex to read and comment any deeper.
> >
> > I really hate that approach, in case it's not clear from the above.
> > To the degree that I don't trust myself to filter out the obscenities
> > if I try to comment on it right now.
> >
> > The only blocking thing we can afford under namespace_sem is GFP_KERNEL
> > allocation.
>
> Incidentally, attach_recursive_mnt() only gets you the root(s) of
> attached tree(s); try mount --rbind and see how much you've missed.

You are misreading mntput_no_expire(), BTW - your get_mount() can
bloody well race with umount(2), hitting the moment when we are done
figuring out whether it's busy but hadn't cleaned ->mnt_ns (let alone
set MNT_DOOMED) yet. If somebody calls umount(2) on a filesystem that
is not mounted anywhere else, they are not supposed to see the sucker
return 0 until the filesystem is shut down. You break that.