Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:

> On Wed, Feb 12, 2020 at 12:35:04PM -0800, Linus Torvalds wrote:
>> On Wed, Feb 12, 2020 at 12:03 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>> >
>> > What's to prevent racing with fs shutdown while you are doing the second part?
>>
>> I was thinking that only the proc_flush_task() code would do this.
>>
>> And that holds a ref to the vfsmount through upid->ns.
>>
>> So I wasn't suggesting doing this in general - just splitting up the
>> implementation of d_invalidate() so that proc_flush_task_mnt() could
>> delay the complex part to after having traversed the RCU-protected
>> list.
>>
>> But hey - I missed this part of the problem originally, so maybe I'm
>> just missing something else this time.  Wouldn't be the first time.
>
> Wait, I thought the whole point of that had been to allow multiple
> procfs instances for the same userns?  Confused...

Multiple procfs instances for the same pidns.  Exactly.  Which would
let people have their own set of procfs mount options without having
to worry about stomping on someone else.

The fundamental problem with multiple procfs instances per pidns is
that there isn't an obvious place to put a vfsmount.

...

Which means we need some way to keep the filesystem from going away
while anyone in the kernel is running proc_flush_task.

One way I can see to solve this, and it would give us cheap readers,
is to have a percpu count of the number of processes in
proc_flush_task.  That would work something like mnt_count.  Then
forbid proc_kill_sb from removing any super block from the list, or
otherwise making progress, until that count goes to zero.  In other
words a cheap-readers, expensive-writer kind of flag that proc_kill_sb
can wait on.

Thinking out loud: perhaps we could add a list_head on task_struct and
a list_head in proc_inode.  That would let us find the inodes, and by
extension the dentries we care about, quickly.  Then in evict_inode we
could remove the proc_inode from the list.
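Roughly the shape I have in mind for the percpu count.  Completely
untested, not even compiled, and every name in it
(proc_flush_task_count, proc_flush_enter/exit, proc_flush_barrier) is
made up for illustration:

#include <linux/percpu.h>
#include <linux/cpumask.h>
#include <linux/wait.h>

static DEFINE_PER_CPU(long, proc_flush_task_count);
static DECLARE_WAIT_QUEUE_HEAD(proc_flush_wq);

/* Cheap reader side: bracket the body of proc_flush_task(). */
static void proc_flush_enter(void)
{
        this_cpu_inc(proc_flush_task_count);
}

static void proc_flush_exit(void)
{
        this_cpu_dec(proc_flush_task_count);
        wake_up(&proc_flush_wq);
}

/* Sum the per-cpu counts; individual cpus may go negative when a task
 * migrates between enter and exit, but the sum stays correct. */
static long proc_flush_count(void)
{
        long sum = 0;
        int cpu;

        for_each_possible_cpu(cpu)
                sum += per_cpu(proc_flush_task_count, cpu);
        return sum;
}

/* Expensive writer side: proc_kill_sb() waits for the flushers to
 * drain before it unhashes the super block and tears it down. */
static void proc_flush_barrier(void)
{
        wait_event(proc_flush_wq, proc_flush_count() == 0);
}

The ordering still needs care: a flusher that has bumped its counter
would have to recheck, under whatever lock protects the super block
list, that the sb it found is still live, otherwise proc_kill_sb can
sum the counters just before the increment lands and tear the sb down
anyway.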
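And the list_head idea might look something like the below.  Equally
untested; task_struct does not have a proc_inodes list and proc_inode
does not have a sibling_inodes member today, those fields and the lock
are invented here:

#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/sched.h>
#include <linux/fs.h>
#include <linux/dcache.h>
#include "internal.h"           /* struct proc_inode */

/* Imagined additions:
 *   struct task_struct { ... struct list_head proc_inodes; ... };
 *   struct proc_inode  { ... struct list_head sibling_inodes; ... };
 */
static DEFINE_SPINLOCK(proc_inodes_lock);

/* When a /proc/<pid>/ inode is set up, hook it onto its task. */
static void proc_link_inode(struct task_struct *task, struct proc_inode *ei)
{
        spin_lock(&proc_inodes_lock);
        list_add(&ei->sibling_inodes, &task->proc_inodes);
        spin_unlock(&proc_inodes_lock);
}

/* Called from evict_inode so a dying inode drops off the list. */
static void proc_unlink_inode(struct proc_inode *ei)
{
        spin_lock(&proc_inodes_lock);
        list_del_init(&ei->sibling_inodes);
        spin_unlock(&proc_inodes_lock);
}

/* proc_flush_task() walks the task's inodes and prunes their dentries,
 * instead of doing a lookup in every mount of proc. */
static void proc_flush_task_inodes(struct task_struct *task)
{
        spin_lock(&proc_inodes_lock);
        while (!list_empty(&task->proc_inodes)) {
                struct proc_inode *ei;
                struct inode *inode;

                ei = list_first_entry(&task->proc_inodes,
                                      struct proc_inode, sibling_inodes);
                inode = igrab(&ei->vfs_inode);
                list_del_init(&ei->sibling_inodes);
                spin_unlock(&proc_inodes_lock);

                if (inode) {
                        d_prune_aliases(inode);
                        iput(inode);
                }
                spin_lock(&proc_inodes_lock);
        }
        spin_unlock(&proc_inodes_lock);
}

That would keep the flush independent of how many procfs mounts exist,
which sidesteps the super block lifetime question for this path
entirely.

Eric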