Re: [RFC v1 2/4] kernel/fork.c: implement new process_mmput_async syscall

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 11 Nov 2021 13:20:11 -0600
ebiederm@xxxxxxxxxxxx (Eric W. Biederman) wrote:

> Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx> writes:
> 
> > The goal of this new syscall is to be able to asynchronously free the
> > mm of a dying process. This is especially useful for processes that use
> > huge amounts of memory (e.g. databases or KVM guests). The process is
> > allowed to terminate immediately, while its mm is cleaned/reclaimed
> > asynchronously.
> >
> > A separate process needs use the process_mmput_async syscall to attach
> > itself to the mm of a running target process. The process will then
> > sleep until the last user of the target mm has gone.
> >
> > When the last user of the mm has gone, instead of synchronously free
> > the mm, the attached process is awoken. The syscall will then continue
> > and clean up the target mm.
> >
> > This solution has the advantage that the cleanup of the target mm can
> > happen both be asynchronous and properly accounted for (e.g. cgroups).
> >
> > Tested on s390x.
> >
> > A separate patch will actually wire up the syscall.  
> 
> I am a bit confused.
> 
> You want the process report that it has finished immediately,
> and you want the cleanup work to continue on in the background.
> 
> Why do you need a separate process?
> 
> Why not just modify the process cleanup code to keep the task_struct
> running while allowing waitpid to reap the process (aka allowing
> release_task to run)?  All tasks can be already be reaped after
> exit_notify in do_exit.
> 
> I can see some reasons for wanting an opt-in.  It is nice to know all of
> a processes resources have been freed when waitpid succeeds.
> 
> Still I don't see why this whole thing isn't exit_mm returning
> the mm_sturct when a flag is set, and then having an exit_mm_late
> being called and passed the returned mm after exit_notify.

nevermind, exit_notify is done after cgroup_exit, the teardown would
then not be accounted properly

> 
> Or maybe something with schedule_work or task_work, instead of an
> exit_mm_late.  I don't see any practical difference.
> 
> I really don't see why this needs a whole other process to connect to
> the process you care about asynchronously.
> 
> This whole thing seems an exercise in spending lots of resources to free
> resources much later.
> 
> Eric





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux