On Thu, 11 Nov 2021 13:20:11 -0600
ebiederm@xxxxxxxxxxxx (Eric W. Biederman) wrote:

> Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx> writes:
>
> > The goal of this new syscall is to be able to asynchronously free the
> > mm of a dying process. This is especially useful for processes that use
> > huge amounts of memory (e.g. databases or KVM guests). The process is
> > allowed to terminate immediately, while its mm is cleaned/reclaimed
> > asynchronously.
> >
> > A separate process needs to use the process_mmput_async syscall to attach
> > itself to the mm of a running target process. The process will then
> > sleep until the last user of the target mm has gone.
> >
> > When the last user of the mm has gone, instead of synchronously freeing
> > the mm, the attached process is awoken. The syscall will then continue
> > and clean up the target mm.
> >
> > This solution has the advantage that the cleanup of the target mm can
> > be both asynchronous and properly accounted for (e.g. cgroups).
> >
> > Tested on s390x.
> >
> > A separate patch will actually wire up the syscall.
>
> I am a bit confused.
>
> You want the process to report that it has finished immediately,
> and you want the cleanup work to continue on in the background.
>
> Why do you need a separate process?
>
> Why not just modify the process cleanup code to keep the task_struct
> running while allowing waitpid to reap the process (aka allowing
> release_task to run)? All tasks can already be reaped after
> exit_notify in do_exit.
>
> I can see some reasons for wanting an opt-in. It is nice to know all of
> a process's resources have been freed when waitpid succeeds.
>
> Still I don't see why this whole thing isn't exit_mm returning
> the mm_struct when a flag is set, and then having an exit_mm_late
> being called and passed the returned mm after exit_notify.

Never mind: exit_notify is called after cgroup_exit, so a teardown done
after exit_notify would no longer be properly accounted (e.g. to the
process's cgroup).

>
> Or maybe something with schedule_work or task_work, instead of an
> exit_mm_late. I don't see any practical difference.
>
> I really don't see why this needs a whole other process to connect to
> the process you care about asynchronously.
>
> This whole thing seems an exercise in spending lots of resources to free
> resources much later.
>
> Eric