On Mon, Jul 12, 2021 at 5:51 AM Jan Engelhardt <jengelh@xxxxxxx> wrote:
>
> On Thursday 2021-07-08 08:05, Suren Baghdasaryan wrote:
> >>
> >> That explains very clearly the requirement, but it raises the question
> >> why this isn't an si_code flag for rt_sigqueueinfo, reusing the existing
> >> system call.
> >
> >I think you are suggesting to use sigqueue() to deliver the signal and
> >perform the reaping when a special value accompanies it. This would be
> >somewhat similar to my early suggestion to use a flag in
> >pidfd_send_signal() (see:
> >https://lore.kernel.org/patchwork/patch/1060407) to implement memory
> >reaping which has another advantage of operation on PIDFDs instead of
> >PIDs which can be recycled.
> >kill()/pidfd_send_signal()/sigqueue() are supposed to deliver the
> >signal and return without blocking. Changing that behavior was
> >considered unacceptable in these discussions.
>
> The way I understood the request is that a userspace program (or perhaps two,
> if so desired) should issue _two_ calls, one to deliver the signal,
> one to perform the reap portion:
>
> uinfo.si_code = SI_QUEUE;
> sigqueue(pid, SIGKILL, &uinfo);
> uinfo.si_code = SI_REAP;
> sigqueue(pid, SIGKILL, &uinfo);

This approach would still lead to the same discussion: by design,
sigqueue()/kill()/pidfd_send_signal() deliver the signal but do not wait
for the recipient to process it. Making them wait would change their
established behavior. We would therefore have to follow the same pattern
and implement memory reaping asynchronously, using a kthread or workqueue,
so the work would not run in the context of the calling process. That is
undesirable: we would lose the ability to control the priority and CPU
affinity of this operation, and the work would not be charged to the
caller. That's why the proposed syscall performs memory reaping in the
caller's context and blocks until the operation is done.
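To make the caller-context argument concrete: because a blocking reap runs in the calling thread, a userspace killer can decide where and at what priority that work runs using ordinary scheduling calls before it issues the reap. The following is only a minimal sketch, assuming Linux with glibc; the helper name pin_and_renice() is hypothetical and not part of any proposed API.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/resource.h>

/* Hypothetical helper (illustration only): restrict the calling thread to
 * a single CPU and set its nice value, so that any work it later performs
 * synchronously, such as a blocking reap syscall, runs with these settings
 * and is accounted to this thread. Returns 0 on success or -errno. */
static int pin_and_renice(int cpu, int nice_val)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means the calling thread. */
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        return -errno;

    /* On Linux, PRIO_PROCESS with who == 0 changes the nice value of the
     * calling thread. Lowering the nice value needs CAP_SYS_NICE (or a
     * suitable RLIMIT_NICE); raising it is always permitted. */
    if (setpriority(PRIO_PROCESS, 0, nice_val) != 0)
        return -errno;

    return 0;
}
```

A workqueue or kthread doing the same reclaim would run under the kernel's own scheduling choices instead, which is the loss of control described above.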
In this proposal, your sequence would look like this:

    pidfd_send_signal(pidfd, SIGKILL, NULL, 0);
    process_reap(pidfd, 0);

except that process_reap() is being renamed to process_mrelease() in the
next revision.
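For illustration only, here is a rough userspace sketch of that two-step sequence, assuming a kernel that provides process_mrelease() and invoking both syscalls through syscall(2), since libc may not ship wrappers. The helper name kill_and_release() is hypothetical, and the fallback syscall numbers (424, 434, 448) are taken from the x86_64 and asm-generic tables; other architectures (e.g. alpha) may differ.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallback numbers (x86_64 / asm-generic tables) in case the installed
 * headers predate these syscalls. */
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_process_mrelease
#define SYS_process_mrelease 448
#endif

/* Hypothetical helper: send SIGKILL through a pidfd, then reclaim the
 * victim's memory synchronously in the caller's context.
 * Returns 0 on success or -errno from whichever step failed. */
static int kill_and_release(int pidfd)
{
    /* Step 1: deliver the signal; this returns without waiting. */
    if (syscall(SYS_pidfd_send_signal, pidfd, SIGKILL, NULL, 0) < 0)
        return -errno;

    /* Step 2: block until the victim's address space has been reaped.
     * The flags argument must be 0. */
    if (syscall(SYS_process_mrelease, pidfd, 0) < 0)
        return -errno;

    return 0;
}
```

Because the victim may finish exiting on its own between the two calls, process_mrelease() can fail with ESRCH when no thread of the victim still holds a reference to its mm; a caller such as a low-memory killer would likely treat that as success.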