Re: [PATCH 1/1] RFC: add pidfd_send_signal flag to reclaim mm while killing a process

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 13 Nov 2020 17:57:02 -0800 Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:

> On Fri, Nov 13, 2020 at 5:18 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Fri, 13 Nov 2020 17:09:37 -0800 Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> >
> > > > > > Seems to me that the ability to reap another process's memory is a
> > > > > > generally useful one, and that it should not be tied to delivering a
> > > > > > signal in this fashion.
> > > > > >
> > > > > > And we do have the new process_madvise(MADV_PAGEOUT).  It may need a
> > > > > > few changes and tweaks, but can't that be used to solve this problem?
> > > > >
> > > > > Thank you for the feedback, Andrew. process_madvise(MADV_DONTNEED) was
> > > > > one of the options recently discussed in
> > > > > https://lore.kernel.org/linux-api/CAJuCfpGz1kPM3G1gZH+09Z7aoWKg05QSAMMisJ7H5MdmRrRhNQ@xxxxxxxxxxxxxx
> > > > > . The thread describes some of the issues with that approach but if we
> > > > > limit it to processes with pending SIGKILL only then I think that
> > > > > would be doable.
> > > >
> > > > Why would it be necessary to read /proc/pid/maps?  I'd have thought
> > > > that a starting effort would be
> > > >
> > > >         madvise((void *)0, (void *)-1, MADV_PAGEOUT)
> > > >
> > > > (after translation into process_madvise() speak).  Which is equivalent
> > > > to the proposed process_madvise(MADV_DONTNEED_MM)?
> > >
> > > Yep, this is very similar to option #3 in
> > > https://lore.kernel.org/linux-api/CAJuCfpGz1kPM3G1gZH+09Z7aoWKg05QSAMMisJ7H5MdmRrRhNQ@xxxxxxxxxxxxxx
> > > and I actually have a tested prototype for that.
> >
> > Why is the `vector=NULL' needed?  Can't `vector' point at a single iovec
> > which spans the whole address range?
> 
> That would be the option #4 from the same discussion and the issues
> noted there are "process_madvise return value can't handle such a
> large number of bytes and there is MAX_RW_COUNT limit on max number of
> bytes one process_madvise call can handle". In my prototype I have a
> special handling for such "bulk operation" to work around the
> MAX_RW_COUNT limitation.

Ah, OK, return value.  Maybe process_madvise() shouldn't have done that
and should have simply returned 0 on success, like madvise().

I guess a special "nuke whole address space" command is OK.  But, again
in the search for generality, the ability to nuke very large amounts of
address space (but not the entire address space) would be better. 

The process_madvise() return value issue could be addressed by adding a
process_madvise() mode which return 0 on success.

And I guess the MAX_RW_COUNT issue is solvable by adding an
import_iovec() arg to say "don't check that".  Along those lines.

It's all sounding a bit painful (but not *too* painful).  But to
reiterate, I do think that adding the ability for a process to shoot
down a large amount of another process's memory is a lot more generally
useful than tying it to SIGKILL, agree?

> >
> > > If that's the
> > > preferred method then I can post it quite quickly.
> >
> > I assume you've tested that prototype.  How did its usefulness compare
> > with this SIGKILL-based approach?
> 
> Just to make sure I understand correctly your question, you are asking
> about performance comparison of:
> 
> // approach in this RFC
> pidfd_send_signal(SIGKILL, SYNC_REAP_MM)
> 
> vs
> 
> // option #4 in the previous RFC
> kill(SIGKILL); process_madvise(vector=NULL, MADV_DONTNEED);
> 
> If so, I have results for the current RFC approach but the previous
> approach was testing on an older device, so don't have
> apples-to-apples comparison results at the moment. I can collect the
> data for fair comparison if desired, however I don't expect a
> noticeable performance difference since they both do pretty much the
> same thing (even on different devices my results are quite close). I
> think it's more a question of which API would be more appropriate.

OK.  I wouldn't expect performance to be very different (and things can
be sped up if so), but the API usefulness might be an issue.  Using
process_madvise() (or similar) makes it a two-step operation, whereas
tying it to SIGKILL&&TASK_UNINTERRUPTIBLE provides a more precise tool.
Any thoughts on this?




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux