Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx> writes:

> The goal of this new syscall is to be able to asynchronously free the
> mm of a dying process. This is especially useful for processes that use
> huge amounts of memory (e.g. databases or KVM guests). The process is
> allowed to terminate immediately, while its mm is cleaned/reclaimed
> asynchronously.
>
> A separate process needs use the process_mmput_async syscall to attach
> itself to the mm of a running target process. The process will then
> sleep until the last user of the target mm has gone.
>
> When the last user of the mm has gone, instead of synchronously free
> the mm, the attached process is awoken. The syscall will then continue
> and clean up the target mm.
>
> This solution has the advantage that the cleanup of the target mm can
> happen both be asynchronous and properly accounted for (e.g. cgroups).
>
> Tested on s390x.
>
> A separate patch will actually wire up the syscall.

I am a bit confused. You want the process to report that it has
finished immediately, and you want the cleanup work to continue on in
the background.

Why do you need a separate process?

Why not just modify the process cleanup code to keep the task_struct
running while allowing waitpid to reap the process (aka allowing
release_task to run)? All tasks can already be reaped after
exit_notify in do_exit.

I can see some reasons for wanting an opt-in. It is nice to know all
of a process's resources have been freed when waitpid succeeds.

Still I don't see why this whole thing isn't exit_mm returning the
mm_struct when a flag is set, and then having an exit_mm_late being
called and passed the returned mm after exit_notify.

Or maybe something with schedule_work or task_work, instead of an
exit_mm_late. I don't see any practical difference.

I really don't see why this needs a whole other process to connect to
the process you care about asynchronously.

This whole thing seems an exercise in spending lots of resources to
free resources much later.

Eric
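
To make the ordering suggested above concrete (exit_mm handing back the mm when a flag is set, exit_notify, then an exit_mm_late), here is a rough userspace model. It is only a sketch and not kernel code: the *_model types and functions, the defer_mm_teardown flag and the make_task helper are all hypothetical stand-ins that borrow kernel names for readability. All it shows is that with the flag set the task becomes reapable before its mm is torn down, and without it the mm is torn down first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace model only. Names mirror the kernel functions mentioned in
 * the mail, but every type and function here is a hypothetical stand-in. */
struct mm_model {
	size_t pages;
};

struct task_model {
	struct mm_model *mm;
	bool defer_mm_teardown;    /* hypothetical opt-in flag */
	bool reapable;             /* true once exit_notify_model() has run */
	bool mm_freed_before_reap; /* records the ordering we care about */
};

/* Tear down the mm and record whether the task was already reapable. */
static void free_mm_model(struct task_model *t, struct mm_model *mm)
{
	assert(mm != NULL);
	t->mm_freed_before_reap = !t->reapable;
	free(mm);
}

/* Detach the mm. With the flag set, hand it back to the caller instead
 * of freeing it; otherwise free it right away and return NULL. */
static struct mm_model *exit_mm_model(struct task_model *t)
{
	struct mm_model *mm = t->mm;

	t->mm = NULL;
	if (mm == NULL)
		return NULL;
	if (t->defer_mm_teardown)
		return mm;
	free_mm_model(t, mm);
	return NULL;
}

/* Stand-in for the point after which waitpid could reap the task. */
static void exit_notify_model(struct task_model *t)
{
	t->reapable = true;
}

/* Late teardown: runs after the task is already reapable. */
static void exit_mm_late_model(struct task_model *t, struct mm_model *mm)
{
	if (mm != NULL)
		free_mm_model(t, mm);
}

/* The ordering proposed in the mail: exit_mm, exit_notify, exit_mm_late. */
static void do_exit_model(struct task_model *t)
{
	struct mm_model *late = exit_mm_model(t);

	exit_notify_model(t);
	exit_mm_late_model(t, late);
}

/* Build a task that owns a freshly allocated mm. */
static struct task_model make_task(bool defer)
{
	struct task_model t = {0};

	t.mm = malloc(sizeof(*t.mm));
	if (t.mm == NULL)
		abort();
	t.mm->pages = 1024;
	t.defer_mm_teardown = defer;
	return t;
}
```

Calling do_exit_model() on a task built with make_task(true) leaves mm_freed_before_reap false, and on one built with make_task(false) leaves it true. A schedule_work or task_work variant would differ only in who ends up calling the late teardown, not in this ordering.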