Eric and Christian, Ping? If you guys don't like these patches anymore what about something simple like just exporting some helpers to update and check a task's nproc limit. Something like this: diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h index 81cab4b01edc..71b5946be792 100644 --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -98,6 +98,10 @@ int kernel_wait(pid_t pid, int *stat); extern void free_task(struct task_struct *tsk); +extern bool task_is_over_nproc_limit(struct task_struct *tsk); +extern void task_inc_nproc(struct task_struct *tsk); +extern void task_dec_nproc(struct task_struct *tsk); + /* sched_exec is called by processes performing an exec */ #ifdef CONFIG_SMP extern void sched_exec(void); diff --git a/kernel/cred.c b/kernel/cred.c index e10c15f51c1f..c15e7b926013 100644 --- a/kernel/cred.c +++ b/kernel/cred.c @@ -358,7 +358,7 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags) kdebug("share_creds(%p{%d,%d})", p->cred, atomic_read(&p->cred->usage), read_cred_subscribers(p->cred)); - inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1); + task_inc_nproc(p); return 0; } @@ -395,7 +395,7 @@ int copy_creds(struct task_struct *p, unsigned long clone_flags) #endif p->cred = p->real_cred = get_cred(new); - inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1); + task_inc_nproc(p); alter_cred_subscribers(new, 2); validate_creds(new); return 0; diff --git a/kernel/fork.c b/kernel/fork.c index 9d44f2d46c69..88dbe3458d7d 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1964,6 +1964,32 @@ static void copy_oom_score_adj(u64 clone_flags, struct task_struct *tsk) mutex_unlock(&oom_adj_mutex); } +bool task_is_over_nproc_limit(struct task_struct *tsk) +{ + if (!is_ucounts_overlimit(task_ucounts(tsk), UCOUNT_RLIMIT_NPROC, + task_rlimit(tsk, RLIMIT_NPROC))) + return false; + + if (tsk->real_cred->user != INIT_USER && !capable(CAP_SYS_RESOURCE) && + !capable(CAP_SYS_ADMIN)) + return true; + + return false; +} +EXPORT_SYMBOL_GPL(task_is_over_nproc_limit); + +void task_inc_nproc(struct task_struct *tsk) +{ + inc_rlimit_ucounts(task_ucounts(tsk), UCOUNT_RLIMIT_NPROC, 1); +} +EXPORT_SYMBOL_GPL(task_inc_nproc); + +void task_dec_nproc(struct task_struct *tsk) +{ + dec_rlimit_ucounts(task_ucounts(tsk), UCOUNT_RLIMIT_NPROC, 1); +} +EXPORT_SYMBOL_GPL(task_dec_nproc); + /* * This creates a new process as a copy of the old one, * but does not actually start it yet. @@ -2102,11 +2128,8 @@ static __latent_entropy struct task_struct *copy_process( goto bad_fork_free; retval = -EAGAIN; - if (is_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) { - if (p->real_cred->user != INIT_USER && - !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN)) - goto bad_fork_cleanup_count; - } + if (task_is_over_nproc_limit(p)) + goto bad_fork_cleanup_count; current->flags &= ~PF_NPROC_EXCEEDED; /* @@ -2526,7 +2549,7 @@ static __latent_entropy struct task_struct *copy_process( bad_fork_cleanup_delayacct: delayacct_tsk_free(p); bad_fork_cleanup_count: - dec_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1); + task_dec_nproc(p); exit_creds(p); bad_fork_free: WRITE_ONCE(p->__state, TASK_DEAD); On 6/19/22 8:13 PM, Mike Christie wrote: > The following patches were made over Linus's tree. > > Eric and Christian, the vhost maintainer, Michael Tsirkin has ACK'd the > patches. I haven't got any more comments from you guys for a couple > postings now (Jan 8 was the last reply). Are you guys ok to merge them? > > For everyone else that hasn't see this before, the patches allow the > vhost layer to do a copy_process on the thread that does the > VHOST_SET_OWNER ioctl like how io_uring does a copy_process against its > userspace app. This allows the vhost layer's worker threads to inherit > cgroups, namespaces, address space, etc and this worker thread will also > be accounted for against that owner/parent process's RLIMIT_NPROC limit. > > If you are not familiar with qemu and vhost here is more detailed > problem description: > > Qemu will create vhost devices in the kernel which perform network, SCSI, > etc IO and management operations from worker threads created by the > kthread API. Because the kthread API does a copy_process on the kthreadd > thread, the vhost layer has to use kthread_use_mm to access the Qemu > thread's memory and cgroup_attach_task_all to add itself to the Qemu > thread's cgroups. > > The problem with this approach is that we then have to add new functions/ > args/functionality for every thing we want to inherit. I started doing > that here: > > https://lkml.org/lkml/2021/6/23/1233 > > for the RLIMIT_NPROC check, but it seems it might be easier to just > inherit everything from the beginning, becuase I'd need to do something > like that patch several times. > > V10: > - Eric's cleanup patches my vhost flush cleanup patches are merged > upstream, so rebase against Linus's tree which has everything. > V9: > - Rebase against Eric's kthread-cleanups-for-v5.19 branch. Drop patches > no longer needed due to kernel clone arg and pf io worker patches in that > branch. > V8: > - Fix kzalloc GFP use. > - Fix email subject version number. > V7: > - Drop generic user_worker_* helpers and replace with vhost_task specific > ones. > - Drop autoreap patch. Use kernel_wait4 instead. > - Fix issue where vhost.ko could be removed while the worker function is > still running. > V6: > - Rename kernel_worker to user_worker and fix prefixes. > - Add better patch descriptions. > V5: > - Handle kbuild errors by building patchset against current kernel that > has all deps merged. Also add patch to remove create_io_thread code as > it's not used anymore. > - Rebase patchset against current kernel and handle a new vm PF_IO_WORKER > case added in 5.16-rc1. > - Add PF_USER_WORKER flag so we can check it later after the initial > thread creation for the wake up, vm and singal cses. > - Added patch to auto reap the worker thread. > V4: > - Drop NO_SIG patch and replaced with Christian's SIG_IGN patch. > - Merged Christian's kernel_worker_flags_valid helpers into patch 5 that > added the new kernel worker functions. > - Fixed extra "i" issue. > - Added PF_USER_WORKER flag and added check that kernel_worker_start users > had that flag set. Also dropped patches that passed worker flags to > copy_thread and replaced with PF_USER_WORKER check. > V3: > - Add parentheses in p->flag and work_flags check in copy_thread. > - Fix check in arm/arm64 which was doing the reverse of other archs > where it did likely(!flags) instead of unlikely(flags). > V2: > - Rename kernel_copy_process to kernel_worker. > - Instead of exporting functions, make kernel_worker() a proper > function/API that does common work for the caller. > - Instead of adding new fields to kernel_clone_args for each option > make it flag based similar to CLONE_*. > - Drop unused completion struct in vhost. > - Fix compile warnings by merging vhost cgroup cleanup patch and > vhost conversion patch. > > > > _______________________________________________ > Virtualization mailing list > Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx > https://lists.linuxfoundation.org/mailman/listinfo/virtualization _______________________________________________ Virtualization mailing list Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/virtualization