Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes: > On Wed, Feb 12, 2020 at 1:48 PM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote: >> >> The good news is proc_flush_task isn't exactly called from process exit. >> proc_flush_task is called during zombie clean up. AKA release_task. > > Yeah, that at least avoids some of the nasty locking while dying debug problems. > > But the one I was more worried about was actually the lock contention > issue with lots of processes. The lock is basically a single global > lock in many situations - yes, it's technically per-ns, but in a lot > of cases you really only have one namespace anyway. > > And we've had problems with global locks in this area before, notably > the one you call out: > >> Further after proc_flush_task does it's thing the code goes >> and does "write_lock_irq(&task_list_lock);" > > Yeah, so it's not introducing a new issue, but it is potentially > making something we already know is bad even worse. > >> What would be downside of having a mutex for a list of proc superblocks? >> A mutex that is taken for both reading and writing the list. > > That's what the original patch actually was, and I was hoping we could > avoid that thing. > > An rwsem would be possibly better, since most cases by far are likely > about reading. > > And yes, I'm very aware of the task_list_lock, but it's literally why > I don't want to make a new one. > > I'm _hoping_ we can some day come up with something better than > task_list_lock. Yes. I understand that. I occassionally play with ideas, and converted all of proc to rcu to help with situation but I haven't come up with anything clearly better. All of this is why I was really hoping we could have a change in strategy and see if we can make the shrinker be able to better prune proc inodes. I think I have an alternate idea that could work. Add some extra code into proc_task_readdir, that would look for dentries that no longer point to tasks and d_invalidate them. With the same logic probably being called from a few more places as well like proc_pid_readdir, proc_task_lookup, and proc_pid_lookup. We could even optimize it and have a process died flag we set in the superblock. That would would batch up the freeing work until the next time someone reads from proc in a way that would create more dentries. So it would prevent dentries from reaped zombies from growing without bound. Hmm. Given the existence of proc_fill_cache it would really be a good idea if readdir and lookup performed some of the freeing work as well. As on readdir we always populate the dcache for all of the directory entries. I am booked solid for the next little while but if no one beats me to it I will try and code something like that up where at least readdir looks for and invalidates stale dentries. Eric